git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Reftable support git-core
@ 2020-01-23 19:41 Han-Wen Nienhuys via GitGitGadget
  2020-01-23 19:41 ` [PATCH 1/5] setup.c: enable repo detection for reftable Han-Wen Nienhuys via GitGitGadget
                   ` (7 more replies)
  0 siblings, 8 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-01-23 19:41 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

Han-Wen Nienhuys (5):
  setup.c: enable repo detection for reftable
  create .git/refs in files-backend.c
  Document how ref iterators and symrefs interact
  Add reftable library
  Reftable support for git-core

 Makefile                  |   23 +-
 builtin/init-db.c         |    2 -
 refs.c                    |   18 +-
 refs.h                    |    2 +
 refs/files-backend.c      |    4 +
 refs/refs-internal.h      |    4 +
 refs/reftable-backend.c   |  780 ++++++++++++++++++++++++++++
 reftable/README.md        |   12 +
 reftable/VERSION          |  293 +++++++++++
 reftable/basics.c         |  180 +++++++
 reftable/basics.h         |   38 ++
 reftable/block.c          |  382 ++++++++++++++
 reftable/block.h          |   71 +++
 reftable/block_test.c     |  145 ++++++
 reftable/blocksource.h    |   20 +
 reftable/bytes.c          |    0
 reftable/constants.h      |   27 +
 reftable/dump.c           |  102 ++++
 reftable/file.c           |   98 ++++
 reftable/iter.c           |  211 ++++++++
 reftable/iter.h           |   56 ++
 reftable/merged.c         |  262 ++++++++++
 reftable/merged.h         |   34 ++
 reftable/merged_test.c    |  247 +++++++++
 reftable/pq.c             |  116 +++++
 reftable/pq.h             |   34 ++
 reftable/reader.c         |  672 ++++++++++++++++++++++++
 reftable/reader.h         |   52 ++
 reftable/record.c         | 1030 +++++++++++++++++++++++++++++++++++++
 reftable/record.h         |   79 +++
 reftable/record_test.c    |  313 +++++++++++
 reftable/reftable.h       |  396 ++++++++++++++
 reftable/reftable_test.c  |  434 ++++++++++++++++
 reftable/slice.c          |  179 +++++++
 reftable/slice.h          |   39 ++
 reftable/slice_test.c     |   38 ++
 reftable/stack.c          |  931 +++++++++++++++++++++++++++++++++
 reftable/stack.h          |   40 ++
 reftable/stack_test.c     |  265 ++++++++++
 reftable/test_framework.c |   60 +++
 reftable/test_framework.h |   63 +++
 reftable/tree.c           |   60 +++
 reftable/tree.h           |   24 +
 reftable/tree_test.c      |   54 ++
 reftable/writer.c         |  586 +++++++++++++++++++++
 reftable/writer.h         |   46 ++
 setup.c                   |   20 +-
 47 files changed, 8530 insertions(+), 12 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/slice_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h


base-commit: 232378479ee6c66206d47a9be175e3a39682aea6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/539
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH 1/5] setup.c: enable repo detection for reftable
  2020-01-23 19:41 [PATCH 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-01-23 19:41 ` Han-Wen Nienhuys via GitGitGadget
  2020-01-23 19:41 ` [PATCH 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-01-23 19:41 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* only check R_OK for .git/refs.

  In the reftable format, tables are stored under $GITDIR/reftable/ and
  the list of tables is managed in a file $GITDIR/refs. Generally, this
  file does not have the +x bit set

* allow missing HEAD if there is a reftable/

Change-Id: I5d22317a15a84c8529aa503ae357a4afba247fe9
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 setup.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/setup.c b/setup.c
index e2a479a64f..e7c7dca701 100644
--- a/setup.c
+++ b/setup.c
@@ -304,22 +304,23 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
  *  - either an objects/ directory _or_ the proper
  *    GIT_OBJECT_DIRECTORY environment variable
  *  - a refs/ directory
- *  - either a HEAD symlink or a HEAD file that is formatted as
- *    a proper "ref:", or a regular file HEAD that has a properly
- *    formatted sha1 object name.
+ *  - a reftable/ director or, either a HEAD ref.
+ *    The HEAD ref should be a HEAD symlink or a HEAD file that is formatted as
+ *    a proper "ref:", or a regular file HEAD that has a properly formatted sha1
+ *    object name.
  */
 int is_git_directory(const char *suspect)
 {
 	struct strbuf path = STRBUF_INIT;
 	int ret = 0;
+        int have_head = 0;
 	size_t len;
 
 	/* Check worktree-related signatures */
 	strbuf_addstr(&path, suspect);
 	strbuf_complete(&path, '/');
 	strbuf_addstr(&path, "HEAD");
-	if (validate_headref(path.buf))
-		goto done;
+	have_head = !validate_headref(path.buf);
 
 	strbuf_reset(&path);
 	get_common_dir(&path, suspect);
@@ -339,9 +340,16 @@ int is_git_directory(const char *suspect)
 
 	strbuf_setlen(&path, len);
 	strbuf_addstr(&path, "/refs");
-	if (access(path.buf, X_OK))
+	if (access(path.buf, R_OK))
 		goto done;
 
+        if (!have_head) {
+                strbuf_setlen(&path, len);
+                strbuf_addstr(&path, "/reftable");
+                if (access(path.buf, X_OK))
+                        goto done;
+        }
+
 	ret = 1;
 done:
 	strbuf_release(&path);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH 2/5] create .git/refs in files-backend.c
  2020-01-23 19:41 [PATCH 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-01-23 19:41 ` [PATCH 1/5] setup.c: enable repo detection for reftable Han-Wen Nienhuys via GitGitGadget
@ 2020-01-23 19:41 ` Han-Wen Nienhuys via GitGitGadget
  2020-01-23 19:41 ` [PATCH 3/5] Document how ref iterators and symrefs interact Han-Wen Nienhuys via GitGitGadget
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-01-23 19:41 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This prepares for supporting the reftable format, which creates a file
in that place.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: I2fc47c89f5ec605734007ceff90321c02474aa92
---
 builtin/init-db.c    | 2 --
 refs/files-backend.c | 4 ++++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 944ec77fe1..45bdea0589 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
 	 */
-	safe_create_dir(git_path("refs"), 1);
-	adjust_shared_perm(git_path("refs"));
 
 	if (refs_init_db(&err))
 		die("failed to set up refs db: %s", err.buf);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0ea66a28b6..f49b6f2ab6 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3158,6 +3158,10 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
 	struct strbuf sb = STRBUF_INIT;
 
+	files_ref_path(refs, &sb, "refs");
+	safe_create_dir(sb.buf, 1);
+ 	// XXX adjust_shared_perm ?
+
 	/*
 	 * Create .git/refs/{heads,tags}
 	 */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH 3/5] Document how ref iterators and symrefs interact
  2020-01-23 19:41 [PATCH 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-01-23 19:41 ` [PATCH 1/5] setup.c: enable repo detection for reftable Han-Wen Nienhuys via GitGitGadget
  2020-01-23 19:41 ` [PATCH 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-01-23 19:41 ` Han-Wen Nienhuys via GitGitGadget
  2020-01-23 19:41 ` [PATCH 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-01-23 19:41 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Change-Id: Ie3ee63c52254c000ef712986246ca28f312b4301
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb..fc18b12340 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -269,6 +269,9 @@ int refs_rename_ref_available(struct ref_store *refs,
  * to the next entry, ref_iterator_advance() aborts the iteration,
  * frees the ref_iterator, and returns ITER_ERROR.
  *
+ * Ref iterators cannot return symref targets, so symbolic refs must be
+ * dereferenced during the iteration.
+ *
  * The reference currently being looked at can be peeled by calling
  * ref_iterator_peel(). This function is often faster than peel_ref(),
  * so it should be preferred when iterating over references.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH 4/5] Add reftable library
  2020-01-23 19:41 [PATCH 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-01-23 19:41 ` [PATCH 3/5] Document how ref iterators and symrefs interact Han-Wen Nienhuys via GitGitGadget
@ 2020-01-23 19:41 ` Han-Wen Nienhuys via GitGitGadget
  2020-01-23 19:41 ` [PATCH 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-01-23 19:41 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: Id396ff42be8b42b9e11f194a32e2f95b8250c109
---
 reftable/README.md        |   12 +
 reftable/VERSION          |  293 +++++++++++
 reftable/basics.c         |  180 +++++++
 reftable/basics.h         |   38 ++
 reftable/block.c          |  382 ++++++++++++++
 reftable/block.h          |   71 +++
 reftable/block_test.c     |  145 ++++++
 reftable/blocksource.h    |   20 +
 reftable/bytes.c          |    0
 reftable/constants.h      |   27 +
 reftable/dump.c           |  102 ++++
 reftable/file.c           |   98 ++++
 reftable/iter.c           |  211 ++++++++
 reftable/iter.h           |   56 ++
 reftable/merged.c         |  262 ++++++++++
 reftable/merged.h         |   34 ++
 reftable/merged_test.c    |  247 +++++++++
 reftable/pq.c             |  116 +++++
 reftable/pq.h             |   34 ++
 reftable/reader.c         |  672 ++++++++++++++++++++++++
 reftable/reader.h         |   52 ++
 reftable/record.c         | 1030 +++++++++++++++++++++++++++++++++++++
 reftable/record.h         |   79 +++
 reftable/record_test.c    |  313 +++++++++++
 reftable/reftable.h       |  396 ++++++++++++++
 reftable/reftable_test.c  |  434 ++++++++++++++++
 reftable/slice.c          |  179 +++++++
 reftable/slice.h          |   39 ++
 reftable/slice_test.c     |   38 ++
 reftable/stack.c          |  931 +++++++++++++++++++++++++++++++++
 reftable/stack.h          |   40 ++
 reftable/stack_test.c     |  265 ++++++++++
 reftable/test_framework.c |   60 +++
 reftable/test_framework.h |   63 +++
 reftable/tree.c           |   60 +++
 reftable/tree.h           |   24 +
 reftable/tree_test.c      |   54 ++
 reftable/writer.c         |  586 +++++++++++++++++++++
 reftable/writer.h         |   46 ++
 39 files changed, 7689 insertions(+)
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/slice_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h

diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 0000000000..cdaf07bbc1
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,12 @@
+The source code in this directory comes from https://github.com/google/reftable
+and can be updated by doing:
+
+   rm -rf reftable-repo && \
+   git clone https://github.com/google/reftable reftable-repo && \
+   cp reftable-repo/c/*.[ch] reftable/ && \
+   cp reftable-repo/LICENSE reftable/
+   git --git-dir reftable-repo/.git show --no-patch origin/master \
+      > reftable/VERSION
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 0000000000..8f3013ead1
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,293 @@
+commit 887250932a9c39a15aee26aef59df2300c77c03f
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Wed Jan 22 19:50:16 2020 +0100
+
+    C: implement reflog expiry
+
+diff --git a/c/reftable.h b/c/reftable.h
+index 2f44973..d760af9 100644
+--- a/c/reftable.h
++++ b/c/reftable.h
+@@ -359,8 +359,17 @@ void stack_destroy(struct stack *st);
+ /* reloads the stack if necessary. */
+ int stack_reload(struct stack *st);
+ 
+-/* compacts all reftables into a giant table. */
+-int stack_compact_all(struct stack *st);
++/* Policy for expiring reflog entries. */
++struct log_expiry_config {
++  /* Drop entries older than this timestamp */
++  uint64_t time;
++
++  /* Drop older entries */
++  uint64_t min_update_index;
++};
++
++/* compacts all reftables into a giant table. Expire reflog entries if config is non-NULL */
++int stack_compact_all(struct stack *st, struct log_expiry_config *config);
+ 
+ /* heuristically compact unbalanced table stack. */
+ int stack_auto_compact(struct stack *st);
+diff --git a/c/stack.c b/c/stack.c
+index d67828f..641ac52 100644
+--- a/c/stack.c
++++ b/c/stack.c
+@@ -119,7 +119,7 @@ static struct reader **stack_copy_readers(struct stack *st, int cur_len) {
+   return cur;
+ }
+ 
+-static int stack_reload_once(struct stack *st, char **names) {
++static int stack_reload_once(struct stack *st, char **names, bool reuse_open) {
+   int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+   struct reader **cur = stack_copy_readers(st, cur_len);
+   int err = 0;
+@@ -136,7 +136,7 @@ static int stack_reload_once(struct stack *st, char **names) {
+ 
+     // this is linear; we assume compaction keeps the number of tables
+     // under control so this is not quadratic.
+-    for (int j = 0; j < cur_len; j++) {
++    for (int j = 0; reuse_open && j < cur_len; j++) {
+       if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+         rd = cur[j];
+         cur[j] = NULL;
+@@ -207,7 +207,7 @@ static int tv_cmp(struct timeval *a, struct timeval *b) {
+   return udiff;
+ }
+ 
+-int stack_reload(struct stack *st) {
++static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open) {
+   struct timeval deadline = {};
+   int err = gettimeofday(&deadline, NULL);
+   int64_t delay = 0;
+@@ -238,7 +238,7 @@ int stack_reload(struct stack *st) {
+       free_names(names);
+       return err;
+     }
+-    err = stack_reload_once(st, names);
++    err = stack_reload_once(st, names, reuse_open);
+     if (err == 0) {
+       free_names(names);
+       break;
+@@ -269,6 +269,10 @@ int stack_reload(struct stack *st) {
+   return 0;
+ }
+ 
++int stack_reload(struct stack *st) {
++  return stack_reload_maybe_reuse(st, true);
++}
++
+ // -1 = error
+ // 0 = up to date
+ // 1 = changed.
+@@ -471,7 +475,7 @@ uint64_t stack_next_update_index(struct stack *st) {
+ }
+ 
+ static int stack_compact_locked(struct stack *st, int first, int last,
+-                                struct slice *temp_tab) {
++                                struct slice *temp_tab, struct log_expiry_config *config) {
+   struct slice next_name = {};
+   int tab_fd = -1;
+   struct writer *wr = NULL;
+@@ -488,7 +492,7 @@ static int stack_compact_locked(struct stack *st, int first, int last,
+   tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+   wr = new_writer(fd_writer, &tab_fd, &st->config);
+ 
+-  err = stack_write_compact(st, wr, first, last);
++  err = stack_write_compact(st, wr, first, last, config);
+   if (err < 0) {
+     goto exit;
+   }
+@@ -515,7 +519,7 @@ static int stack_compact_locked(struct stack *st, int first, int last,
+ }
+ 
+ int stack_write_compact(struct stack *st, struct writer *wr, int first,
+-                        int last) {
++                        int last, struct log_expiry_config *config) {
+   int subtabs_len = last - first + 1;
+   struct reader **subtabs = calloc(sizeof(struct reader *), last - first + 1);
+   struct merged_table *mt = NULL;
+@@ -580,6 +584,16 @@ int stack_write_compact(struct stack *st, struct writer *wr, int first,
+       continue;
+     }
+ 
++    // XXX collect stats?
++
++    if (config != NULL && config->time > 0 && log.time < config->time) {
++      continue;
++    }
++
++    if (config != NULL && config->min_update_index > 0 && log.update_index < config->min_update_index) {
++      continue;
++    }
++
+     err = writer_add_log(wr, &log);
+     if (err < 0) {
+       break;
+@@ -599,7 +613,7 @@ int stack_write_compact(struct stack *st, struct writer *wr, int first,
+ }
+ 
+ // <  0: error. 0 == OK, > 0 attempt failed; could retry.
+-static int stack_compact_range(struct stack *st, int first, int last) {
++static int stack_compact_range(struct stack *st, int first, int last, struct log_expiry_config *expiry) {
+   struct slice temp_tab_name = {};
+   struct slice new_table_name = {};
+   struct slice lock_file_name = {};
+@@ -612,7 +626,7 @@ static int stack_compact_range(struct stack *st, int first, int last) {
+   char **delete_on_success = calloc(sizeof(char *), compact_count + 1);
+   char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
+ 
+-  if (first >= last) {
++  if (first > last || (expiry == NULL && first == last)) {
+     err = 0;
+     goto exit;
+   }
+@@ -676,7 +690,7 @@ static int stack_compact_range(struct stack *st, int first, int last) {
+   }
+   have_lock = false;
+ 
+-  err = stack_compact_locked(st, first, last, &temp_tab_name);
++  err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
+   if (err < 0) {
+     goto exit;
+   }
+@@ -739,10 +753,12 @@ static int stack_compact_range(struct stack *st, int first, int last) {
+   have_lock = false;
+ 
+   for (char **p = delete_on_success; *p; p++) {
+-    unlink(*p);
++    if (0 != strcmp(*p, slice_as_string(&new_table_path))) {
++      unlink(*p);
++    }
+   }
+ 
+-  err = stack_reload(st);
++  err = stack_reload_maybe_reuse(st, first < last);
+ exit:
+   for (char **p = subtable_locks; *p; p++) {
+     unlink(*p);
+@@ -764,12 +780,12 @@ static int stack_compact_range(struct stack *st, int first, int last) {
+   return err;
+ }
+ 
+-int stack_compact_all(struct stack *st) {
+-  return stack_compact_range(st, 0, st->merged->stack_len - 1);
++int stack_compact_all(struct stack *st, struct log_expiry_config *config) {
++  return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+ }
+ 
+-static int stack_compact_range_stats(struct stack *st, int first, int last) {
+-  int err = stack_compact_range(st, first, last);
++static int stack_compact_range_stats(struct stack *st, int first, int last, struct log_expiry_config *config) {
++  int err = stack_compact_range(st, first, last, config);
+   if (err > 0) {
+     st->stats.failures++;
+   }
+@@ -856,7 +872,7 @@ int stack_auto_compact(struct stack *st) {
+   struct segment seg = suggest_compaction_segment(sizes, st->merged->stack_len);
+   free(sizes);
+   if (segment_size(&seg) > 0) {
+-    return stack_compact_range_stats(st, seg.start, seg.end - 1);
++    return stack_compact_range_stats(st, seg.start, seg.end - 1, NULL);
+   }
+ 
+   return 0;
+diff --git a/c/stack.h b/c/stack.h
+index 8cb2cb0..092a03b 100644
+--- a/c/stack.h
++++ b/c/stack.h
+@@ -25,7 +25,7 @@ int read_lines(const char *filename, char ***lines);
+ int stack_try_add(struct stack *st,
+                   int (*write_table)(struct writer *wr, void *arg), void *arg);
+ int stack_write_compact(struct stack *st, struct writer *wr, int first,
+-                        int last);
++                        int last, struct log_expiry_config *config);
+ int fastlog2(uint64_t sz);
+ 
+ struct segment {
+diff --git a/c/stack_test.c b/c/stack_test.c
+index c49d8b8..8c0cde0 100644
+--- a/c/stack_test.c
++++ b/c/stack_test.c
+@@ -121,7 +121,7 @@ void test_stack_add(void) {
+     assert_err(err);
+   }
+ 
+-  err = stack_compact_all(st);
++  err = stack_compact_all(st, NULL);
+   assert_err(err);
+ 
+   for (int i = 0; i < N; i++) {
+@@ -186,7 +186,73 @@ void test_suggest_compaction_segment(void) {
+   }
+ }
+ 
++void test_reflog_expire(void) {
++  char dir[256] = "/tmp/stack.XXXXXX";
++  assert(mkdtemp(dir));
++  printf("%s\n", dir);
++  char fn[256] = "";
++  strcat(fn, dir);
++  strcat(fn, "/refs");
++
++  struct write_options cfg = {};
++  struct stack *st = NULL;
++  int err = new_stack(&st, dir, fn, cfg);
++  assert_err(err);
++
++  struct log_record logs[20] = {};
++  int N = ARRAYSIZE(logs) -1;
++  for (int i = 1; i <= N; i++) {
++    char buf[256];
++    sprintf(buf, "branch%02d", i);
++
++    logs[i].ref_name = strdup(buf);
++    logs[i].update_index = i;
++    logs[i].time = i;
++    logs[i].new_hash = malloc(SHA1_SIZE);
++    logs[i].email = strdup("identity@invalid");
++    set_test_hash(logs[i].new_hash, i);
++  }
++
++  for (int i = 1; i <= N; i++) {
++    int err = stack_add(st, &write_test_log, &logs[i]);
++    assert_err(err);
++  }
++
++  err = stack_compact_all(st, NULL);
++  assert_err(err);
++
++  struct log_expiry_config expiry = {
++    .time = 10,
++  };
++  err = stack_compact_all(st, &expiry);
++  assert_err(err);
++
++  struct log_record log = {};
++  err = stack_read_log(st, logs[9].ref_name, &log);
++  assert(err == 1);
++
++  err = stack_read_log(st, logs[11].ref_name, &log);
++  assert_err(err);
++
++  expiry.min_update_index = 15;
++  err = stack_compact_all(st, &expiry);
++  assert_err(err);
++
++  err = stack_read_log(st, logs[14].ref_name, &log);
++  assert(err == 1);
++
++  err = stack_read_log(st, logs[16].ref_name, &log);
++  assert_err(err);
++
++  // cleanup
++  stack_destroy(st);
++  for (int i = 0; i < N; i++) {
++    log_record_clear(&logs[i]);
++  }
++}
++
+ int main() {
++  add_test_case("test_reflog_expire", test_reflog_expire);
+   add_test_case("test_suggest_compaction_segment",
+                 &test_suggest_compaction_segment);
+   add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 0000000000..72c478cbea
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,180 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+void put_u24(byte *out, uint32_t i) {
+  out[0] = (byte)((i >> 16) & 0xff);
+  out[1] = (byte)((i >> 8) & 0xff);
+  out[2] = (byte)((i)&0xff);
+}
+
+uint32_t get_u24(byte *in) {
+  return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 | (uint32_t)(in[2]);
+}
+
+void put_u32(byte *out, uint32_t i) {
+  out[0] = (byte)((i >> 24) & 0xff);
+  out[1] = (byte)((i >> 16) & 0xff);
+  out[2] = (byte)((i >> 8) & 0xff);
+  out[3] = (byte)((i)&0xff);
+}
+
+uint32_t get_u32(byte *in) {
+  return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
+         (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
+}
+
+void put_u64(byte *out, uint64_t v) {
+  for (int i = sizeof(uint64_t); i--;) {
+    out[i] = (byte)(v & 0xff);
+    v >>= 8;
+  }
+}
+
+uint64_t get_u64(byte *out) {
+  uint64_t v = 0;
+  for (int i = 0; i < sizeof(uint64_t); i++) {
+    v = (v << 8) | (byte)(out[i] & 0xff);
+  }
+  return v;
+}
+
+void put_u16(byte *out, uint16_t i) {
+  out[0] = (byte)((i >> 8) & 0xff);
+  out[1] = (byte)((i)&0xff);
+}
+
+uint16_t get_u16(byte *in) {
+  return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
+}
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(int sz, int (*f)(int k, void *args), void *args) {
+  int lo = 0;
+  int hi = sz;
+
+  /* invariant: (hi == sz) || f(hi) == true
+     (lo == 0 && f(0) == true) || fi(lo) == false
+   */
+  while (hi - lo > 1) {
+    int mid = lo + (hi - lo) / 2;
+
+    int val = f(mid, args);
+    if (val) {
+      hi = mid;
+    } else {
+      lo = mid;
+    }
+  }
+
+  if (lo == 0) {
+    if (f(0, args)) {
+      return 0;
+    } else {
+      return 1;
+    }
+  }
+
+  return hi;
+}
+
+void free_names(char **a) {
+  char **p = a;
+  if (p == NULL) {
+    return;
+  }
+  while (*p) {
+    free(*p);
+    p++;
+  }
+  free(a);
+}
+
+int names_length(char **names) {
+  int len = 0;
+  for (char **p = names; *p; p++) {
+    len++;
+  }
+  return len;
+}
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp) {
+  char **names = NULL;
+  int names_cap = 0;
+  int names_len = 0;
+
+  char *p = buf;
+  char *end = buf + size;
+  while (p < end) {
+    char *next = strchr(p, '\n');
+    if (next != NULL) {
+      *next = 0;
+    } else {
+      next = end;
+    }
+    if (p < next) {
+      if (names_len == names_cap) {
+        names_cap = 2 * names_cap + 1;
+        names = realloc(names, names_cap * sizeof(char *));
+      }
+      names[names_len++] = strdup(p);
+    }
+    p = next + 1;
+  }
+
+  if (names_len == names_cap) {
+    names_cap = 2 * names_cap + 1;
+    names = realloc(names, names_cap * sizeof(char *));
+  }
+
+  names[names_len] = NULL;
+  *namesp = names;
+}
+
+int names_equal(char **a, char **b) {
+  while (*a && *b) {
+    if (0 != strcmp(*a, *b)) {
+      return 0;
+    }
+
+    a++;
+    b++;
+  }
+
+  return *a == *b;
+}
+
+const char *error_str(int err) {
+  switch (err) {
+    case IO_ERROR:
+      return "I/O error";
+    case FORMAT_ERROR:
+      return "FORMAT_ERROR";
+    case NOT_EXIST_ERROR:
+      return "NOT_EXIST_ERROR";
+    case LOCK_ERROR:
+      return "LOCK_ERROR";
+    case API_ERROR:
+      return "API_ERROR";
+    case ZLIB_ERROR:
+      return "ZLIB_ERROR";
+    case -1:
+      return "general error";
+    default:
+      return "unknown error code";
+  }
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 0000000000..c28903aa2e
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,38 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include <stdint.h>
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+#define ARRAYSIZE(a) sizeof(a) / sizeof(a[0])
+
+void put_u24(byte *out, uint32_t i);
+uint32_t get_u24(byte *in);
+
+uint64_t get_u64(byte *in);
+void put_u64(byte *out, uint64_t i);
+
+void put_u32(byte *out, uint32_t i);
+uint32_t get_u32(byte *in);
+
+void put_u16(byte *out, uint16_t i);
+uint16_t get_u16(byte *in);
+int binsearch(int sz, int (*f)(int k, void *args), void *args);
+
+void free_names(char **a);
+void parse_names(char *buf, int size, char ***namesp);
+int names_equal(char **a, char **b);
+int names_length(char **names);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 0000000000..079331b6b9
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,382 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "blocksource.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+                                  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+                       uint32_t block_size, uint32_t header_off,
+                       int hash_size) {
+  bw->buf = buf;
+  bw->hash_size = hash_size;
+  bw->block_size = block_size;
+  bw->header_off = header_off;
+  bw->buf[header_off] = typ;
+  bw->next = header_off + 4;
+  bw->restart_interval = 16;
+  bw->entries = 0;
+}
+
+byte block_writer_type(struct block_writer *bw) {
+  return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct record rec) {
+  struct slice empty = {};
+  struct slice last =
+      w->entries % w->restart_interval == 0 ? empty : w->last_key;
+  struct slice out = {
+      .buf = w->buf + w->next,
+      .len = w->block_size - w->next,
+  };
+
+  struct slice start = out;
+
+  bool restart = false;
+  struct slice key = {};
+  int n = 0;
+
+  record_key(rec, &key);
+  n = encode_key(&restart, out, last, key, record_val_type(rec));
+  if (n < 0) {
+    goto err;
+  }
+  out.buf += n;
+  out.len -= n;
+
+  n = record_encode(rec, out, w->hash_size);
+  if (n < 0) {
+    goto err;
+  }
+
+  out.buf += n;
+  out.len -= n;
+
+  if (block_writer_register_restart(w, start.len - out.len, restart, key) < 0) {
+    goto err;
+  }
+
+  free(slice_yield(&key));
+  return 0;
+
+err:
+  free(slice_yield(&key));
+  return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+                                  struct slice key) {
+  int rlen = w->restart_len;
+  if (rlen >= MAX_RESTARTS) {
+    restart = false;
+  }
+
+  if (restart) {
+    rlen++;
+  }
+  if (2 + 3 * rlen + n > w->block_size - w->next) {
+    return -1;
+  }
+  if (restart) {
+    if (w->restart_len == w->restart_cap) {
+      w->restart_cap = w->restart_cap * 2 + 1;
+      w->restarts = realloc(w->restarts, sizeof(uint32_t) * w->restart_cap);
+    }
+
+    w->restarts[w->restart_len++] = w->next;
+  }
+
+  w->next += n;
+  slice_copy(&w->last_key, key);
+  w->entries++;
+  return 0;
+}
+
+int block_writer_finish(struct block_writer *w) {
+  for (int i = 0; i < w->restart_len; i++) {
+    put_u24(w->buf + w->next, w->restarts[i]);
+    w->next += 3;
+  }
+
+  put_u16(w->buf + w->next, w->restart_len);
+  w->next += 2;
+  put_u24(w->buf + 1 + w->header_off, w->next);
+
+  if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+    int block_header_skip = 4 + w->header_off;
+    struct slice compressed = {};
+    uLongf dest_len = 0, src_len = 0;
+    slice_resize(&compressed, w->next - block_header_skip);
+
+    dest_len = compressed.len;
+    src_len = w->next - block_header_skip;
+
+    if (Z_OK != compress2(compressed.buf, &dest_len, w->buf + block_header_skip,
+                          src_len, 9)) {
+      free(slice_yield(&compressed));
+      return ZLIB_ERROR;
+    }
+    memcpy(w->buf + block_header_skip, compressed.buf, dest_len);
+    w->next = dest_len + block_header_skip;
+  }
+  return w->next;
+}
+
+byte block_reader_type(struct block_reader *r) {
+  return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct block *block,
+                      uint32_t header_off, uint32_t table_block_size,
+                      int hash_size) {
+  uint32_t full_block_size = table_block_size;
+  byte typ = block->data[header_off];
+  uint32_t sz = get_u24(block->data + header_off + 1);
+
+  if (!is_block_type(typ)) {
+    return FORMAT_ERROR;
+  }
+
+  if (typ == BLOCK_TYPE_LOG) {
+    struct slice uncompressed = {};
+    int block_header_skip = 4 + header_off;
+    uLongf dst_len = sz - block_header_skip;
+    uLongf src_len = block->len - block_header_skip;
+
+    slice_resize(&uncompressed, sz);
+    memcpy(uncompressed.buf, block->data, block_header_skip);
+
+    if (Z_OK != uncompress2(uncompressed.buf + block_header_skip, &dst_len,
+                            block->data + block_header_skip, &src_len)) {
+      free(slice_yield(&uncompressed));
+      return ZLIB_ERROR;
+    }
+
+    block_source_return_block(block->source, block);
+    block->data = uncompressed.buf;
+    block->len = dst_len; /* XXX: 4 bytes missing? */
+    block->source = malloc_block_source();
+    full_block_size = src_len + block_header_skip;
+  } else if (full_block_size == 0) {
+    full_block_size = sz;
+  } else if (sz < full_block_size && sz < block->len && block->data[sz] != 0) {
+    // If the block is smaller than the full block size,
+    // it is padded (data followed by '\0') or the next
+    // block is unaligned.
+    full_block_size = sz;
+  }
+
+  {
+    uint16_t restart_count = get_u16(block->data + sz - 2);
+    uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+    byte *restart_bytes = block->data + restart_start;
+
+    // transfer ownership.
+    br->block = *block;
+    block->data = NULL;
+    block->len = 0;
+
+    br->hash_size = hash_size;
+    br->block_len = restart_start;
+    br->full_block_size = full_block_size;
+    br->header_off = header_off;
+    br->restart_count = restart_count;
+    br->restart_bytes = restart_bytes;
+  }
+
+  return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i) {
+  return get_u24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it) {
+  it->br = br;
+  slice_resize(&it->last_key, 0);
+  it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+  struct slice key;
+  struct block_reader *r;
+  int error;
+};
+
+static int restart_key_less(int idx, void *args) {
+  struct restart_find_args *a = (struct restart_find_args *)args;
+  uint32_t off = block_reader_restart_offset(a->r, idx);
+  struct slice in = {
+      .buf = a->r->block.data + off,
+      .len = a->r->block_len - off,
+  };
+
+  /* the restart key is verbatim in the block, so this could avoid the
+     alloc for decoding the key */
+  struct slice rkey = {};
+  struct slice last_key = {};
+  byte unused_extra;
+  int n = decode_key(&rkey, &unused_extra, last_key, in);
+  if (n < 0) {
+    a->error = 1;
+    return -1;
+  }
+
+  {
+    int result = slice_compare(a->key, rkey);
+    free(slice_yield(&rkey));
+    return result;
+  }
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src) {
+  dest->br = src->br;
+  dest->next_off = src->next_off;
+  slice_copy(&dest->last_key, src->last_key);
+}
+
+// return < 0 for error, 0 for OK, > 0 for EOF.
+int block_iter_next(struct block_iter *it, struct record rec) {
+  if (it->next_off >= it->br->block_len) {
+    return 1;
+  }
+
+  {
+    struct slice in = {
+        .buf = it->br->block.data + it->next_off,
+        .len = it->br->block_len - it->next_off,
+    };
+    struct slice start = in;
+    struct slice key = {};
+    byte extra;
+    int n = decode_key(&key, &extra, it->last_key, in);
+    if (n < 0) {
+      return -1;
+    }
+
+    in.buf += n;
+    in.len -= n;
+    n = record_decode(rec, key, extra, in, it->br->hash_size);
+    if (n < 0) {
+      return -1;
+    }
+    in.buf += n;
+    in.len -= n;
+
+    slice_copy(&it->last_key, key);
+    it->next_off += start.len - in.len;
+    free(slice_yield(&key));
+    return 0;
+  }
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key) {
+  struct slice empty = {};
+  int off = br->header_off + 4;
+  struct slice in = {
+      .buf = br->block.data + off,
+      .len = br->block_len - off,
+  };
+
+  byte extra = 0;
+  int n = decode_key(key, &extra, empty, in);
+  if (n < 0) {
+    return n;
+  }
+  return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want) {
+  return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it) {
+  free(slice_yield(&it->last_key));
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+                      struct slice want) {
+  struct restart_find_args args = {
+      .key = want,
+      .r = br,
+  };
+
+  int i = binsearch(br->restart_count, &restart_key_less, &args);
+  if (args.error) {
+    return -1;
+  }
+
+  it->br = br;
+  if (i > 0) {
+    i--;
+    it->next_off = block_reader_restart_offset(br, i);
+  } else {
+    it->next_off = br->header_off + 4;
+  }
+
+  {
+    struct record rec = new_record(block_reader_type(br));
+    struct slice key = {};
+    int result = 0;
+    int err = 0;
+    struct block_iter next = {};
+    while (true) {
+      block_iter_copy_from(&next, it);
+
+      err = block_iter_next(&next, rec);
+      if (err < 0) {
+        result = -1;
+        goto exit;
+      }
+
+      record_key(rec, &key);
+      if (err > 0 || slice_compare(key, want) >= 0) {
+        result = 0;
+        goto exit;
+      }
+
+      block_iter_copy_from(it, &next);
+    }
+
+  exit:
+    free(slice_yield(&key));
+    free(slice_yield(&next.last_key));
+    record_clear(rec);
+    free(record_yield(&rec));
+
+    return result;
+  }
+}
+
+void block_writer_reset(struct block_writer *bw) {
+  bw->restart_len = 0;
+  bw->last_key.len = 0;
+}
+
+void block_writer_clear(struct block_writer *bw) {
+  free(bw->restarts);
+  bw->restarts = NULL;
+  free(slice_yield(&bw->last_key));
+  // the block is not owned.
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 0000000000..bc89f99aeb
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,71 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+struct block_writer {
+  byte *buf;
+  uint32_t block_size;
+  uint32_t header_off;
+  int restart_interval;
+  int hash_size;
+
+  uint32_t next;
+  uint32_t *restarts;
+  uint32_t restart_len;
+  uint32_t restart_cap;
+  struct slice last_key;
+  int entries;
+};
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+                       uint32_t block_size, uint32_t header_off, int hash_size);
+byte block_writer_type(struct block_writer *bw);
+int block_writer_add(struct block_writer *w, struct record rec);
+int block_writer_finish(struct block_writer *w);
+void block_writer_reset(struct block_writer *bw);
+void block_writer_clear(struct block_writer *bw);
+
+struct block_reader {
+  uint32_t header_off;
+  struct block block;
+  int hash_size;
+
+  // size of the data, excluding restart data.
+  uint32_t block_len;
+  byte *restart_bytes;
+  uint32_t full_block_size;
+  uint16_t restart_count;
+};
+
+struct block_iter {
+  struct block_reader *br;
+  struct slice last_key;
+  uint32_t next_off;
+};
+
+int block_reader_init(struct block_reader *br, struct block *bl,
+                      uint32_t header_off, uint32_t table_block_size,
+                      int hash_size);
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+                      struct slice want);
+byte block_reader_type(struct block_reader *r);
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+int block_iter_next(struct block_iter *it, struct record rec);
+int block_iter_seek(struct block_iter *it, struct slice want);
+void block_iter_close(struct block_iter *it);
+
+#endif
diff --git a/reftable/block_test.c b/reftable/block_test.c
new file mode 100644
index 0000000000..efc3d48a83
--- /dev/null
+++ b/reftable/block_test.c
@@ -0,0 +1,145 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include <string.h>
+
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+
+struct binsearch_args {
+  int key;
+  int *arr;
+};
+
+static int binsearch_func(int i, void *void_args) {
+  struct binsearch_args *args = (struct binsearch_args *)void_args;
+
+  return args->key < args->arr[i];
+}
+
+void test_binsearch() {
+  int arr[] = {2, 4, 6, 8, 10};
+  int sz = ARRAYSIZE(arr);
+  struct binsearch_args args = {
+      .arr = arr,
+  };
+
+  for (int i = 1; i < 11; i++) {
+    args.key = i;
+    int res = binsearch(sz, &binsearch_func, &args);
+
+    if (res < sz) {
+      assert(args.key < arr[res]);
+      if (res > 0) {
+        assert(args.key >= arr[res - 1]);
+      }
+    } else {
+      assert(args.key == 10 || args.key == 11);
+    }
+  }
+}
+
+void test_block_read_write() {
+  const int header_off = 21;  // random
+  const int N = 30;
+  char *names[N];
+  const int block_size = 1024;
+  struct block block = {};
+  block.data = calloc(block_size, 1);
+  block.len = block_size;
+
+  struct block_writer bw = {};
+  block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size, header_off,
+                    SHA1_SIZE);
+  struct ref_record ref = {};
+  struct record rec = {};
+  record_from_ref(&rec, &ref);
+
+  for (int i = 0; i < N; i++) {
+    char name[100];
+    sprintf(name, "branch%02d", i);
+
+    byte hash[SHA1_SIZE];
+    memset(hash, i, sizeof(hash));
+
+    ref.ref_name = name;
+    ref.value = hash;
+    names[i] = strdup(name);
+    int n = block_writer_add(&bw, rec);
+    ref.ref_name = NULL;
+    ref.value = NULL;
+    assert(n == 0);
+  }
+
+  int n = block_writer_finish(&bw);
+  assert(n > 0);
+
+  block_writer_clear(&bw);
+
+  struct block_reader br = {};
+  block_reader_init(&br, &block, header_off, block_size, SHA1_SIZE);
+
+  struct block_iter it = {};
+  block_reader_start(&br, &it);
+
+  int j = 0;
+  while (true) {
+    int r = block_iter_next(&it, rec);
+    assert(r >= 0);
+    if (r > 0) {
+      break;
+    }
+    assert_streq(names[j], ref.ref_name);
+    j++;
+  }
+
+  record_clear(rec);
+  block_iter_close(&it);
+
+  struct slice want = {};
+  for (int i = 0; i < N; i++) {
+    slice_set_string(&want, names[i]);
+
+    struct block_iter it = {};
+    int n = block_reader_seek(&br, &it, want);
+    assert(n == 0);
+
+    n = block_iter_next(&it, rec);
+    assert(n == 0);
+
+    assert_streq(names[i], ref.ref_name);
+
+    want.len--;
+    n = block_reader_seek(&br, &it, want);
+    assert(n == 0);
+
+    n = block_iter_next(&it, rec);
+    assert(n == 0);
+    assert_streq(names[10 * (i / 10)], ref.ref_name);
+
+    block_iter_close(&it);
+  }
+
+  record_clear(rec);
+  free(block.data);
+  free(slice_yield(&want));
+  for (int i = 0; i < N; i++) {
+    free(names[i]);
+  }
+}
+
+int main() {
+  add_test_case("binsearch", &test_binsearch);
+  add_test_case("block_read_write", &test_block_read_write);
+  test_main();
+}
diff --git a/reftable/blocksource.h b/reftable/blocksource.h
new file mode 100644
index 0000000000..269b3f2e7e
--- /dev/null
+++ b/reftable/blocksource.h
@@ -0,0 +1,20 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCKSOURCE_H
+#define BLOCKSOURCE_H
+
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+int block_source_read_block(struct block_source source, struct block *dest,
+                            uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source source);
+
+#endif
diff --git a/reftable/bytes.c b/reftable/bytes.c
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 0000000000..cd35704610
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,27 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+#define VERSION 1
+#define HEADER_SIZE 24
+#define FOOTER_SIZE 68
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 0000000000..2eed81ab90
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,102 @@
+// Copyright 2020 Google Inc. All rights reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+#include "reftable.h"
+
+static int dump_table(const char *tablename) {
+  struct block_source src = {};
+  int err = block_source_from_file(&src, tablename);
+  if (err < 0) {
+    return err;
+  }
+
+  struct reader *r = NULL;
+  err = new_reader(&r, src, tablename);
+  if (err < 0) {
+    return err;
+  }
+
+  {
+    struct iterator it = {};
+    err = reader_seek_ref(r, &it, "");
+    if (err < 0) {
+      return err;
+    }
+
+    struct ref_record ref = {};
+    while (1) {
+      err = iterator_next_ref(it, &ref);
+      if (err > 0) {
+        break;
+      }
+      if (err < 0) {
+        return err;
+      }
+      ref_record_print(&ref, 20);
+    }
+    iterator_destroy(&it);
+    ref_record_clear(&ref);
+  }
+
+  {
+    struct iterator it = {};
+    err = reader_seek_log(r, &it, "");
+    if (err < 0) {
+      return err;
+    }
+    struct log_record log = {};
+    while (1) {
+      err = iterator_next_log(it, &log);
+      if (err > 0) {
+        break;
+      }
+      if (err < 0) {
+        return err;
+      }
+      log_record_print(&log, 20);
+    }
+    iterator_destroy(&it);
+    log_record_clear(&log);
+  }
+  return 0;
+}
+
+int main(int argc, char *argv[]) {
+  int opt;
+  const char *table = NULL;
+  while ((opt = getopt(argc, argv, "t:")) != -1) {
+    switch (opt) {
+      case 't':
+        table = strdup(optarg);
+        break;
+      case '?':
+        printf("usage: %s [-table tablefile]\n", argv[0]);
+        return 2;
+        break;
+    }
+  }
+
+  if (table != NULL) {
+    int err = dump_table(table);
+    if (err < 0) {
+      fprintf(stderr, "%s: %s: %s\n", argv[0], table, error_str(err));
+      return 1;
+    }
+  }
+  return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 0000000000..358cd3332e
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,98 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+  int fd;
+  uint64_t size;
+};
+
+static uint64_t file_size(void *b) {
+  return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct block *dest) {
+  memset(dest->data, 0xff, dest->len);
+  free(dest->data);
+}
+
+static void file_close(void *b) {
+  int fd = ((struct file_block_source *)b)->fd;
+  if (fd > 0) {
+    close(fd);
+    ((struct file_block_source *)b)->fd = 0;
+  }
+
+  free(b);
+}
+
+static int file_read_block(void *v, struct block *dest, uint64_t off,
+                           uint32_t size) {
+  struct file_block_source *b = (struct file_block_source *)v;
+  assert(off + size <= b->size);
+  dest->data = malloc(size);
+  if (pread(b->fd, dest->data, size, off) != size) {
+    return -1;
+  }
+  dest->len = size;
+  return size;
+}
+
+struct block_source_vtable file_vtable = {
+    .size = &file_size,
+    .read_block = &file_read_block,
+    .return_block = &file_return_block,
+    .close = &file_close,
+};
+
+int block_source_from_file(struct block_source *bs, const char *name) {
+  struct stat st = {};
+  int err = 0;
+  int fd = open(name, O_RDONLY);
+  if (fd < 0) {
+    if (errno == ENOENT) {
+      return NOT_EXIST_ERROR;
+    }
+    return -1;
+  }
+
+  err = fstat(fd, &st);
+  if (err < 0) {
+    return -1;
+  }
+
+  {
+    struct file_block_source *p = calloc(sizeof(struct file_block_source), 1);
+    p->size = st.st_size;
+    p->fd = fd;
+
+    bs->ops = &file_vtable;
+    bs->arg = p;
+  }
+  return 0;
+}
+
+int fd_writer(void *arg, byte *data, int sz) {
+  int *fdp = (int *)arg;
+  return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 0000000000..12fb7ed48a
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,211 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include <stdlib.h>
+#include <string.h>
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct iterator it) { return it.ops == NULL; }
+
+static int empty_iterator_next(void *arg, struct record rec) { return 1; }
+
+static void empty_iterator_close(void *arg) {}
+
+struct iterator_vtable empty_vtable = {
+    .next = &empty_iterator_next,
+    .close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct iterator *it) {
+  it->iter_arg = NULL;
+  it->ops = &empty_vtable;
+}
+
+int iterator_next(struct iterator it, struct record rec) {
+  return it.ops->next(it.iter_arg, rec);
+}
+
+void iterator_destroy(struct iterator *it) {
+  if (it->ops == NULL) {
+    return;
+  }
+  it->ops->close(it->iter_arg);
+  it->ops = NULL;
+  free(it->iter_arg);
+  it->iter_arg = NULL;
+}
+
+int iterator_next_ref(struct iterator it, struct ref_record *ref) {
+  struct record rec = {};
+  record_from_ref(&rec, ref);
+  return iterator_next(it, rec);
+}
+
+int iterator_next_log(struct iterator it, struct log_record *log) {
+  struct record rec = {};
+  record_from_log(&rec, log);
+  return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg) {
+  struct filtering_ref_iterator *fri =
+      (struct filtering_ref_iterator *)iter_arg;
+  free(slice_yield(&fri->oid));
+  iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec) {
+  struct filtering_ref_iterator *fri =
+      (struct filtering_ref_iterator *)iter_arg;
+  struct ref_record *ref = (struct ref_record *)rec.data;
+
+  while (true) {
+    int err = iterator_next_ref(fri->it, ref);
+    if (err != 0) {
+      return err;
+    }
+
+    if (fri->double_check) {
+      struct iterator it = {};
+
+      int err = reader_seek_ref(fri->r, &it, ref->ref_name);
+      if (err == 0) {
+        err = iterator_next_ref(it, ref);
+      }
+
+      iterator_destroy(&it);
+
+      if (err < 0) {
+        return err;
+      }
+
+      if (err > 0) {
+        continue;
+      }
+    }
+
+    if ((ref->target_value != NULL &&
+         0 == memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+        (ref->value != NULL &&
+         0 == memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+      return 0;
+    }
+  }
+}
+
+struct iterator_vtable filtering_ref_iterator_vtable = {
+    .next = &filtering_ref_iterator_next,
+    .close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *it,
+                                          struct filtering_ref_iterator *fri) {
+  it->iter_arg = fri;
+  it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p) {
+  struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+  block_iter_close(&it->cur);
+  reader_return_block(it->r, &it->block_reader.block);
+  free(slice_yield(&it->oid));
+}
+
+static int indexed_table_ref_iter_next_block(
+    struct indexed_table_ref_iter *it) {
+  if (it->offset_idx == it->offset_len) {
+    it->finished = true;
+    return 1;
+  }
+
+  reader_return_block(it->r, &it->block_reader.block);
+
+  {
+    uint64_t off = it->offsets[it->offset_idx++];
+    int err =
+        reader_init_block_reader(it->r, &it->block_reader, off, BLOCK_TYPE_REF);
+    if (err < 0) {
+      return err;
+    }
+    if (err > 0) {
+      // indexed block does not exist.
+      return FORMAT_ERROR;
+    }
+  }
+  block_reader_start(&it->block_reader, &it->cur);
+  return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec) {
+  struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+  struct ref_record *ref = (struct ref_record *)rec.data;
+
+  while (true) {
+    int err = block_iter_next(&it->cur, rec);
+    if (err < 0) {
+      return err;
+    }
+
+    if (err > 0) {
+      err = indexed_table_ref_iter_next_block(it);
+      if (err < 0) {
+        return err;
+      }
+
+      if (it->finished) {
+        return 1;
+      }
+      continue;
+    }
+
+    if (0 == memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+        0 == memcmp(it->oid.buf, ref->value, it->oid.len)) {
+      return 0;
+    }
+  }
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+                               struct reader *r, byte *oid, int oid_len,
+                               uint64_t *offsets, int offset_len) {
+  struct indexed_table_ref_iter *itr =
+      calloc(sizeof(struct indexed_table_ref_iter), 1);
+  int err = 0;
+
+  itr->r = r;
+  slice_resize(&itr->oid, oid_len);
+  memcpy(itr->oid.buf, oid, oid_len);
+
+  itr->offsets = offsets;
+  itr->offset_len = offset_len;
+
+  err = indexed_table_ref_iter_next_block(itr);
+  if (err < 0) {
+    free(itr);
+  } else {
+    *dest = itr;
+  }
+  return err;
+}
+
+struct iterator_vtable indexed_table_ref_iter_vtable = {
+    .next = &indexed_table_ref_iter_next,
+    .close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+                                          struct indexed_table_ref_iter *itr) {
+  it->iter_arg = itr;
+  it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 0000000000..2590618a06
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,56 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct iterator_vtable {
+  int (*next)(void *iter_arg, struct record rec);
+  void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct iterator *it);
+int iterator_next(struct iterator it, struct record rec);
+bool iterator_is_null(struct iterator it);
+
+struct filtering_ref_iterator {
+  struct reader *r;
+  struct slice oid;
+  bool double_check;
+  struct iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *,
+                                          struct filtering_ref_iterator *);
+
+struct indexed_table_ref_iter {
+  struct reader *r;
+  struct slice oid;
+
+  // mutable
+  uint64_t *offsets;
+
+  // Points to the next offset to read.
+  int offset_idx;
+  int offset_len;
+  struct block_reader block_reader;
+  struct block_iter cur;
+  bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+                                          struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+                               struct reader *r, byte *oid, int oid_len,
+                               uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 0000000000..baf0997647
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,262 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include <stdlib.h>
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi) {
+  for (int i = 0; i < mi->stack_len; i++) {
+    struct record rec = new_record(mi->typ);
+    int err = iterator_next(mi->stack[i], rec);
+    if (err < 0) {
+      return err;
+    }
+
+    if (err > 0) {
+      iterator_destroy(&mi->stack[i]);
+      record_clear(rec);
+      free(record_yield(&rec));
+    } else {
+      struct pq_entry e = {
+          .rec = rec,
+          .index = i,
+      };
+      merged_iter_pqueue_add(&mi->pq, e);
+    }
+  }
+
+  return 0;
+}
+
+static void merged_iter_close(void *p) {
+  struct merged_iter *mi = (struct merged_iter *)p;
+  merged_iter_pqueue_clear(&mi->pq);
+  for (int i = 0; i < mi->stack_len; i++) {
+    iterator_destroy(&mi->stack[i]);
+  }
+  free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, int idx) {
+  if (iterator_is_null(mi->stack[idx])) {
+    return 0;
+  }
+
+  {
+    struct record rec = new_record(mi->typ);
+    struct pq_entry e = {
+        .rec = rec,
+        .index = idx,
+    };
+    int err = iterator_next(mi->stack[idx], rec);
+    if (err < 0) {
+      return err;
+    }
+
+    if (err > 0) {
+      iterator_destroy(&mi->stack[idx]);
+      record_clear(rec);
+      free(record_yield(&rec));
+      return 0;
+    }
+
+    merged_iter_pqueue_add(&mi->pq, e);
+  }
+  return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec) {
+  struct slice entry_key = {};
+  struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
+  int err = merged_iter_advance_subiter(mi, entry.index);
+  if (err < 0) {
+    return err;
+  }
+
+  record_key(entry.rec, &entry_key);
+  while (!merged_iter_pqueue_is_empty(mi->pq)) {
+    struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+    struct slice k = {};
+    int err = 0, cmp = 0;
+
+    record_key(top.rec, &k);
+
+    cmp = slice_compare(k, entry_key);
+    free(slice_yield(&k));
+
+    if (cmp > 0) {
+      break;
+    }
+
+    merged_iter_pqueue_remove(&mi->pq);
+    err = merged_iter_advance_subiter(mi, top.index);
+    if (err < 0) {
+      return err;
+    }
+    record_clear(top.rec);
+    free(record_yield(&top.rec));
+  }
+
+  record_copy_from(rec, entry.rec, mi->hash_size);
+  record_clear(entry.rec);
+  free(record_yield(&entry.rec));
+  free(slice_yield(&entry_key));
+  return 0;
+}
+
+static int merged_iter_next_void(void *p, struct record rec) {
+  struct merged_iter *mi = (struct merged_iter *)p;
+  if (merged_iter_pqueue_is_empty(mi->pq)) {
+    return 1;
+  }
+
+  return merged_iter_next(mi, rec);
+}
+
+struct iterator_vtable merged_iter_vtable = {
+    .next = &merged_iter_next_void,
+    .close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct iterator *it,
+                                      struct merged_iter *mi) {
+  it->iter_arg = mi;
+  it->ops = &merged_iter_vtable;
+}
+
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n) {
+  uint64_t last_max = 0;
+  uint64_t first_min = 0;
+  for (int i = 0; i < n; i++) {
+    struct reader *r = stack[i];
+    if (i > 0 && last_max >= reader_min_update_index(r)) {
+      return FORMAT_ERROR;
+    }
+    if (i == 0) {
+      first_min = reader_min_update_index(r);
+    }
+
+    last_max = reader_max_update_index(r);
+  }
+
+  {
+    struct merged_table m = {
+        .stack = stack,
+        .stack_len = n,
+        .min = first_min,
+        .max = last_max,
+        .hash_size = SHA1_SIZE,
+    };
+
+    *dest = calloc(sizeof(struct merged_table), 1);
+    **dest = m;
+  }
+  return 0;
+}
+
+void merged_table_close(struct merged_table *mt) {
+  for (int i = 0; i < mt->stack_len; i++) {
+    reader_free(mt->stack[i]);
+  }
+  free(mt->stack);
+  mt->stack = NULL;
+  mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct merged_table *mt) {
+  free(mt->stack);
+  mt->stack = NULL;
+  mt->stack_len = 0;
+}
+
+void merged_table_free(struct merged_table *mt) {
+  if (mt == NULL) {
+    return;
+  }
+  merged_table_clear(mt);
+  free(mt);
+}
+
+uint64_t merged_max_update_index(struct merged_table *mt) { return mt->max; }
+
+uint64_t merged_min_update_index(struct merged_table *mt) { return mt->min; }
+
+static int merged_table_seek_record(struct merged_table *mt,
+                                    struct iterator *it, struct record rec) {
+  struct iterator *iters = calloc(sizeof(struct iterator), mt->stack_len);
+  struct merged_iter merged = {
+      .stack = iters,
+      .typ = record_type(rec),
+      .hash_size = mt->hash_size,
+  };
+  int n = 0;
+  int err = 0;
+  for (int i = 0; i < mt->stack_len && err == 0; i++) {
+    int e = reader_seek(mt->stack[i], &iters[n], rec);
+    if (e < 0) {
+      err = e;
+    }
+    if (e == 0) {
+      n++;
+    }
+  }
+  if (err < 0) {
+    for (int i = 0; i < n; i++) {
+      iterator_destroy(&iters[i]);
+    }
+    free(iters);
+    return err;
+  }
+
+  merged.stack_len = n, err = merged_iter_init(&merged);
+  if (err < 0) {
+    merged_iter_close(&merged);
+    return err;
+  }
+
+  {
+    struct merged_iter *p = malloc(sizeof(struct merged_iter));
+    *p = merged;
+    iterator_from_merged_iter(it, p);
+  }
+  return 0;
+}
+
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+                          const char *name) {
+  struct ref_record ref = {
+      .ref_name = (char *)name,
+  };
+  struct record rec = {};
+  record_from_ref(&rec, &ref);
+  return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+                             const char *name, uint64_t update_index) {
+  struct log_record log = {
+      .ref_name = (char *)name,
+      .update_index = update_index,
+  };
+  struct record rec = {};
+  record_from_log(&rec, &log);
+  return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+                          const char *name) {
+  uint64_t max = ~((uint64_t)0);
+  return merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 0000000000..d9d7c7dc19
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct merged_table {
+  struct reader **stack;
+  int stack_len;
+  int hash_size;
+
+  uint64_t min;
+  uint64_t max;
+};
+
+struct merged_iter {
+  struct iterator *stack;
+  int hash_size;
+  int stack_len;
+  byte typ;
+  struct merged_iter_pqueue pq;
+} merged_iter;
+
+void merged_table_clear(struct merged_table *mt);
+
+#endif
diff --git a/reftable/merged_test.c b/reftable/merged_test.c
new file mode 100644
index 0000000000..81edc86a57
--- /dev/null
+++ b/reftable/merged_test.c
@@ -0,0 +1,247 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include <string.h>
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "pq.h"
+#include "reader.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+
+void test_pq(void) {
+  char *names[54] = {};
+  int N = ARRAYSIZE(names) - 1;
+
+  for (int i = 0; i < N; i++) {
+    char name[100];
+    sprintf(name, "%02d", i);
+    names[i] = strdup(name);
+  }
+
+  struct merged_iter_pqueue pq = {};
+
+  int i = 1;
+  do {
+    struct record rec = new_record(BLOCK_TYPE_REF);
+    record_as_ref(rec)->ref_name = names[i];
+
+    struct pq_entry e = {
+        .rec = rec,
+    };
+    merged_iter_pqueue_add(&pq, e);
+    merged_iter_pqueue_check(pq);
+    i = (i * 7) % N;
+  } while (i != 1);
+
+  const char *last = NULL;
+  while (!merged_iter_pqueue_is_empty(pq)) {
+    struct pq_entry e = merged_iter_pqueue_remove(&pq);
+    merged_iter_pqueue_check(pq);
+    struct ref_record *ref = record_as_ref(e.rec);
+
+    if (last != NULL) {
+      assert(strcmp(last, ref->ref_name) < 0);
+    }
+    last = ref->ref_name;
+    ref->ref_name = NULL;
+    free(ref);
+  }
+
+  for (int i = 0; i < N; i++) {
+    free(names[i]);
+  }
+
+  merged_iter_pqueue_clear(&pq);
+}
+
+void write_test_table(struct slice *buf, struct ref_record refs[], int n) {
+  int min = 0xffffffff;
+  int max = 0;
+  for (int i = 0; i < n; i++) {
+    uint64_t ui = refs[i].update_index;
+    if (ui > max) {
+      max = ui;
+    }
+    if (ui < min) {
+      min = ui;
+    }
+  }
+
+  struct write_options opts = {
+      .block_size = 256,
+  };
+
+  struct writer *w = new_writer(&slice_write_void, buf, &opts);
+  writer_set_limits(w, min, max);
+
+  for (int i = 0; i < n; i++) {
+    uint64_t before = refs[i].update_index;
+    int n = writer_add_ref(w, &refs[i]);
+    assert(n == 0);
+    assert(before == refs[i].update_index);
+  }
+
+  int err = writer_close(w);
+  assert(err == 0);
+
+  writer_free(w);
+  w = NULL;
+}
+
+static struct merged_table *merged_table_from_records(struct ref_record **refs,
+                                                      int *sizes,
+                                                      struct slice *buf,
+                                                      int n) {
+  struct block_source *source = calloc(n, sizeof(*source));
+  struct reader **rd = calloc(n, sizeof(*rd));
+  for (int i = 0; i < n; i++) {
+    write_test_table(&buf[i], refs[i], sizes[i]);
+    block_source_from_slice(&source[i], &buf[i]);
+
+    int err = new_reader(&rd[i], source[i], "name");
+    assert(err == 0);
+  }
+
+  struct merged_table *mt = NULL;
+  int err = new_merged_table(&mt, rd, n);
+  assert(err == 0);
+  return mt;
+}
+
+void test_merged_between(void) {
+  byte hash1[SHA1_SIZE];
+  byte hash2[SHA1_SIZE];
+
+  set_test_hash(hash1, 1);
+  set_test_hash(hash2, 2);
+  struct ref_record r1[] = {{
+      .ref_name = "b",
+      .update_index = 1,
+      .value = hash1,
+  }};
+  struct ref_record r2[] = {{
+      .ref_name = "a",
+      .update_index = 2,
+  }};
+
+  struct ref_record *refs[] = {r1, r2};
+  int sizes[] = {1, 1};
+  struct slice bufs[2] = {};
+  struct merged_table *mt = merged_table_from_records(refs, sizes, bufs, 2);
+
+  struct iterator it = {};
+  int err = merged_table_seek_ref(mt, &it, "a");
+  assert(err == 0);
+
+  struct ref_record ref = {};
+  err = iterator_next_ref(it, &ref);
+  assert_err(err);
+  assert(ref.update_index == 2);
+}
+
+void test_merged(void) {
+  byte hash1[SHA1_SIZE];
+  byte hash2[SHA1_SIZE];
+
+  set_test_hash(hash1, 1);
+  set_test_hash(hash2, 2);
+  struct ref_record r1[] = {{
+                                .ref_name = "a",
+                                .update_index = 1,
+                                .value = hash1,
+                            },
+                            {
+                                .ref_name = "b",
+                                .update_index = 1,
+                                .value = hash1,
+                            },
+                            {
+                                .ref_name = "c",
+                                .update_index = 1,
+                                .value = hash1,
+                            }};
+  struct ref_record r2[] = {{
+      .ref_name = "a",
+      .update_index = 2,
+  }};
+  struct ref_record r3[] = {
+      {
+          .ref_name = "c",
+          .update_index = 3,
+          .value = hash2,
+      },
+      {
+          .ref_name = "d",
+          .update_index = 3,
+          .value = hash1,
+      },
+  };
+
+  struct ref_record *refs[] = {r1, r2, r3};
+  int sizes[3] = {3, 1, 2};
+  struct slice bufs[3] = {};
+
+  struct merged_table *mt = merged_table_from_records(refs, sizes, bufs, 3);
+
+  struct iterator it = {};
+  int err = merged_table_seek_ref(mt, &it, "a");
+  assert(err == 0);
+
+  struct ref_record *out = NULL;
+  int len = 0;
+  int cap = 0;
+  while (len < 100) {  // cap loops/recursion.
+    struct ref_record ref = {};
+    int err = iterator_next_ref(it, &ref);
+    if (err > 0) {
+      break;
+    }
+    if (len == cap) {
+      cap = 2 * cap + 1;
+      out = realloc(out, sizeof(struct ref_record) * cap);
+    }
+    out[len++] = ref;
+  }
+  iterator_destroy(&it);
+
+  struct ref_record want[] = {
+      r2[0],
+      r1[1],
+      r3[0],
+      r3[1],
+  };
+  assert(ARRAYSIZE(want) == len);
+  for (int i = 0; i < len; i++) {
+    assert(ref_record_equal(&want[i], &out[i], SHA1_SIZE));
+  }
+  for (int i = 0; i < len; i++) {
+    ref_record_clear(&out[i]);
+  }
+  free(out);
+
+  for (int i = 0; i < 3; i++) {
+    free(slice_yield(&bufs[i]));
+  }
+  merged_table_close(mt);
+  merged_table_free(mt);
+}
+
+// XXX test refs_for(oid)
+
+int main() {
+  add_test_case("test_merged_between", &test_merged_between);
+  add_test_case("test_pq", &test_pq);
+  add_test_case("test_merged", &test_merged);
+  test_main();
+}
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 0000000000..3ea2f1bb3b
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,116 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include <assert.h>
+#include <stdlib.h>
+
+int pq_less(struct pq_entry a, struct pq_entry b) {
+  struct slice ak = {};
+  struct slice bk = {};
+  int cmp = 0;
+  record_key(a.rec, &ak);
+  record_key(b.rec, &bk);
+
+  cmp = slice_compare(ak, bk);
+
+  free(slice_yield(&ak));
+  free(slice_yield(&bk));
+
+  if (cmp == 0) {
+    return a.index > b.index;
+  }
+
+  return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq) {
+  return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq) {
+  return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq) {
+  for (int i = 1; i < pq.len; i++) {
+    int parent = (i - 1) / 2;
+
+    assert(pq_less(pq.heap[parent], pq.heap[i]));
+  }
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq) {
+  int i = 0;
+  struct pq_entry e = pq->heap[0];
+  pq->heap[0] = pq->heap[pq->len - 1];
+  pq->len--;
+
+  i = 0;
+  while (i < pq->len) {
+    int min = i;
+    int j = 2 * i + 1;
+    int k = 2 * i + 2;
+    if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+      min = j;
+    }
+    if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+      min = k;
+    }
+
+    if (min == i) {
+      break;
+    }
+
+    {
+      struct pq_entry tmp = pq->heap[min];
+      pq->heap[min] = pq->heap[i];
+      pq->heap[i] = tmp;
+    }
+
+    i = min;
+  }
+
+  return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e) {
+  int i = 0;
+  if (pq->len == pq->cap) {
+    pq->cap = 2 * pq->cap + 1;
+    pq->heap = realloc(pq->heap, pq->cap * sizeof(struct pq_entry));
+  }
+
+  pq->heap[pq->len++] = e;
+  i = pq->len - 1;
+  while (i > 0) {
+    int j = (i - 1) / 2;
+    if (pq_less(pq->heap[j], pq->heap[i])) {
+      break;
+    }
+
+    {
+      struct pq_entry tmp = pq->heap[j];
+      pq->heap[j] = pq->heap[i];
+      pq->heap[i] = tmp;
+    }
+
+    i = j;
+  }
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq) {
+  for (int i = 0; i < pq->len; i++) {
+    record_clear(pq->heap[i].rec);
+    free(record_yield(&pq->heap[i].rec));
+  }
+  free(pq->heap);
+  pq->heap = NULL;
+  pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 0000000000..c323da76cb
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+  struct record rec;
+  int index;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+  struct pq_entry *heap;
+  int len;
+  int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 0000000000..fd4800dcd6
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,672 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <zlib.h>
+
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct block_source source) {
+  return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct block_source source, struct block *dest,
+                            uint64_t off, uint32_t size) {
+  int result = source.ops->read_block(source.arg, dest, off, size);
+  dest->source = source;
+  return result;
+}
+
+void block_source_return_block(struct block_source source,
+                               struct block *blockp) {
+  source.ops->return_block(source.arg, blockp);
+  blockp->data = NULL;
+  blockp->len = 0;
+  blockp->source.ops = NULL;
+  blockp->source.arg = NULL;
+}
+
+void block_source_close(struct block_source *source) {
+  if (source->ops == NULL) {
+    return;
+  }
+
+  source->ops->close(source->arg);
+  source->ops = NULL;
+}
+
+static struct reader_offsets *reader_offsets_for(struct reader *r, byte typ) {
+  switch (typ) {
+    case BLOCK_TYPE_REF:
+      return &r->ref_offsets;
+    case BLOCK_TYPE_LOG:
+      return &r->log_offsets;
+    case BLOCK_TYPE_OBJ:
+      return &r->obj_offsets;
+  }
+  abort();
+}
+
+static int reader_get_block(struct reader *r, struct block *dest, uint64_t off,
+                            uint32_t sz) {
+  if (off >= r->size) {
+    return 0;
+  }
+
+  if (off + sz > r->size) {
+    sz = r->size - off;
+  }
+
+  return block_source_read_block(r->source, dest, off, sz);
+}
+
+void reader_return_block(struct reader *r, struct block *p) {
+  block_source_return_block(r->source, p);
+}
+
+const char *reader_name(struct reader *r) { return r->name; }
+
+static int parse_footer(struct reader *r, byte *footer, byte *header) {
+  byte *f = footer;
+  int err = 0;
+  if (memcmp(f, "REFT", 4)) {
+    err = FORMAT_ERROR;
+    goto exit;
+  }
+  f += 4;
+
+  if (0 != memcmp(footer, header, HEADER_SIZE)) {
+    err = FORMAT_ERROR;
+    goto exit;
+  }
+
+  {
+    byte version = *f++;
+    if (version != 1) {
+      err = FORMAT_ERROR;
+      goto exit;
+    }
+  }
+
+  r->block_size = get_u24(f);
+
+  f += 3;
+  r->min_update_index = get_u64(f);
+  f += 8;
+  r->max_update_index = get_u64(f);
+  f += 8;
+
+  r->ref_offsets.index_offset = get_u64(f);
+  f += 8;
+
+  r->obj_offsets.offset = get_u64(f);
+  f += 8;
+
+  r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+  r->obj_offsets.offset >>= 5;
+
+  r->obj_offsets.index_offset = get_u64(f);
+  f += 8;
+  r->log_offsets.offset = get_u64(f);
+  f += 8;
+  r->log_offsets.index_offset = get_u64(f);
+  f += 8;
+
+  {
+    uint32_t computed_crc = crc32(0, footer, f - footer);
+    uint32_t file_crc = get_u32(f);
+    f += 4;
+    if (computed_crc != file_crc) {
+      err = FORMAT_ERROR;
+      goto exit;
+    }
+  }
+
+  {
+    byte first_block_typ = header[HEADER_SIZE];
+    r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+    r->ref_offsets.offset = 0;
+    r->log_offsets.present =
+        (first_block_typ == BLOCK_TYPE_LOG || r->log_offsets.offset > 0);
+    r->obj_offsets.present = r->obj_offsets.offset > 0;
+  }
+  err = 0;
+exit:
+  return err;
+}
+
+int init_reader(struct reader *r, struct block_source source,
+                const char *name) {
+  struct block footer = {};
+  struct block header = {};
+  int err = 0;
+
+  memset(r, 0, sizeof(struct reader));
+  r->size = block_source_size(source) - FOOTER_SIZE;
+  r->source = source;
+  r->name = strdup(name);
+  r->hash_size = SHA1_SIZE;
+
+  err = block_source_read_block(source, &footer, r->size, FOOTER_SIZE);
+  if (err != FOOTER_SIZE) {
+    err = IO_ERROR;
+    goto exit;
+  }
+
+  // Need +1 to read type of first block.
+  err = reader_get_block(r, &header, 0, HEADER_SIZE + 1);
+  if (err != HEADER_SIZE + 1) {
+    err = IO_ERROR;
+    goto exit;
+  }
+
+  err = parse_footer(r, footer.data, header.data);
+exit:
+  block_source_return_block(r->source, &footer);
+  block_source_return_block(r->source, &header);
+  return err;
+}
+
+struct table_iter {
+  struct reader *r;
+  byte typ;
+  uint64_t block_off;
+  struct block_iter bi;
+  bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+                                 struct table_iter *src) {
+  dest->r = src->r;
+  dest->typ = src->typ;
+  dest->block_off = src->block_off;
+  dest->finished = src->finished;
+  block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec) {
+  int res = block_iter_next(&ti->bi, rec);
+  if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+    ((struct ref_record *)rec.data)->update_index += ti->r->min_update_index;
+  }
+
+  return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti) {
+  if (ti->bi.br == NULL) {
+    return;
+  }
+  reader_return_block(ti->r, &ti->bi.br->block);
+  free(ti->bi.br);
+  ti->bi.br = NULL;
+
+  ti->bi.last_key.len = 0;
+  ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off) {
+  int32_t result = 0;
+
+  if (off == 0) {
+    data += 24;
+  }
+
+  *typ = data[0];
+  if (is_block_type(*typ)) {
+    result = get_u24(data + 1);
+  }
+  return result;
+}
+
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+                             uint64_t next_off, byte want_typ) {
+  int32_t guess_block_size = r->block_size ? r->block_size : DEFAULT_BLOCK_SIZE;
+  struct block block = {};
+  byte block_typ = 0;
+  int err = 0;
+  uint32_t header_off = next_off ? 0 : HEADER_SIZE;
+  int32_t block_size = 0;
+
+  if (next_off >= r->size) {
+    return 1;
+  }
+
+  err = reader_get_block(r, &block, next_off, guess_block_size);
+  if (err < 0) {
+    return err;
+  }
+
+  block_size = extract_block_size(block.data, &block_typ, next_off);
+  if (block_size < 0) {
+    return block_size;
+  }
+
+  if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+    reader_return_block(r, &block);
+    return 1;
+  }
+
+  if (block_size > guess_block_size) {
+    reader_return_block(r, &block);
+    err = reader_get_block(r, &block, next_off, block_size);
+    if (err < 0) {
+      return err;
+    }
+  }
+
+  return block_reader_init(br, &block, header_off, r->block_size, r->hash_size);
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+                                 struct table_iter *src) {
+  uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+  struct block_reader br = {};
+  int err = 0;
+
+  dest->r = src->r;
+  dest->typ = src->typ;
+  dest->block_off = next_block_off;
+
+  err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+  if (err > 0) {
+    dest->finished = true;
+    return 1;
+  }
+  if (err != 0) {
+    return err;
+  }
+
+  {
+    struct block_reader *brp = malloc(sizeof(struct block_reader));
+    *brp = br;
+
+    dest->finished = false;
+    block_reader_start(brp, &dest->bi);
+  }
+  return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec) {
+  if (record_type(rec) != ti->typ) {
+    return API_ERROR;
+  }
+
+  while (true) {
+    struct table_iter next = {};
+    int err = 0;
+    if (ti->finished) {
+      return 1;
+    }
+
+    err = table_iter_next_in_block(ti, rec);
+    if (err <= 0) {
+      return err;
+    }
+
+    err = table_iter_next_block(&next, ti);
+    if (err != 0) {
+      ti->finished = true;
+    }
+    table_iter_block_done(ti);
+    if (err != 0) {
+      return err;
+    }
+    table_iter_copy_from(ti, &next);
+    block_iter_close(&next.bi);
+  }
+}
+
+static int table_iter_next_void(void *ti, struct record rec) {
+  return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p) {
+  struct table_iter *ti = (struct table_iter *)p;
+  table_iter_block_done(ti);
+  block_iter_close(&ti->bi);
+}
+
+struct iterator_vtable table_iter_vtable = {
+    .next = &table_iter_next_void,
+    .close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct iterator *it,
+                                     struct table_iter *ti) {
+  it->iter_arg = ti;
+  it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reader *r, struct table_iter *ti,
+                                uint64_t off, byte typ) {
+  struct block_reader br = {};
+  struct block_reader *brp = NULL;
+
+  int err = reader_init_block_reader(r, &br, off, typ);
+  if (err != 0) {
+    return err;
+  }
+
+  brp = malloc(sizeof(struct block_reader));
+  *brp = br;
+  ti->r = r;
+  ti->typ = block_reader_type(brp);
+  ti->block_off = off;
+  block_reader_start(brp, &ti->bi);
+  return 0;
+}
+
+static int reader_start(struct reader *r, struct table_iter *ti, byte typ,
+                        bool index) {
+  struct reader_offsets *offs = reader_offsets_for(r, typ);
+  uint64_t off = offs->offset;
+  if (index) {
+    off = offs->index_offset;
+    if (off == 0) {
+      return 1;
+    }
+    typ = BLOCK_TYPE_INDEX;
+  }
+
+  return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reader *r, struct table_iter *ti,
+                              struct record want) {
+  struct record rec = new_record(record_type(want));
+  struct slice want_key = {};
+  struct slice got_key = {};
+  struct table_iter next = {};
+  int err = -1;
+  record_key(want, &want_key);
+
+  while (true) {
+    err = table_iter_next_block(&next, ti);
+    if (err < 0) {
+      goto exit;
+    }
+
+    if (err > 0) {
+      break;
+    }
+
+    err = block_reader_first_key(next.bi.br, &got_key);
+    if (err < 0) {
+      goto exit;
+    }
+    {
+      int cmp = slice_compare(got_key, want_key);
+      if (cmp > 0) {
+        table_iter_block_done(&next);
+        break;
+      }
+    }
+
+    table_iter_block_done(ti);
+    table_iter_copy_from(ti, &next);
+  }
+
+  err = block_iter_seek(&ti->bi, want_key);
+  if (err < 0) {
+    goto exit;
+  }
+  err = 0;
+
+exit:
+  block_iter_close(&next.bi);
+  record_clear(rec);
+  free(record_yield(&rec));
+  free(slice_yield(&want_key));
+  free(slice_yield(&got_key));
+  return err;
+}
+
+static int reader_seek_indexed(struct reader *r, struct iterator *it,
+                               struct record rec) {
+  struct index_record want_index = {};
+  struct record want_index_rec = {};
+  struct index_record index_result = {};
+  struct record index_result_rec = {};
+  struct table_iter index_iter = {};
+  struct table_iter next = {};
+  int err = 0;
+
+  record_key(rec, &want_index.last_key);
+  record_from_index(&want_index_rec, &want_index);
+  record_from_index(&index_result_rec, &index_result);
+
+  err = reader_start(r, &index_iter, record_type(rec), true);
+  if (err < 0) {
+    goto exit;
+  }
+
+  err = reader_seek_linear(r, &index_iter, want_index_rec);
+  while (true) {
+    err = table_iter_next(&index_iter, index_result_rec);
+    table_iter_block_done(&index_iter);
+    if (err != 0) {
+      goto exit;
+    }
+
+    err = reader_table_iter_at(r, &next, index_result.offset, 0);
+    if (err != 0) {
+      goto exit;
+    }
+
+    err = block_iter_seek(&next.bi, want_index.last_key);
+    if (err < 0) {
+      goto exit;
+    }
+
+    if (next.typ == record_type(rec)) {
+      err = 0;
+      break;
+    }
+
+    if (next.typ != BLOCK_TYPE_INDEX) {
+      err = FORMAT_ERROR;
+      break;
+    }
+
+    table_iter_copy_from(&index_iter, &next);
+  }
+
+  if (err == 0) {
+    struct table_iter *malloced = calloc(sizeof(struct table_iter), 1);
+    table_iter_copy_from(malloced, &next);
+    iterator_from_table_iter(it, malloced);
+  }
+exit:
+  block_iter_close(&next.bi);
+  table_iter_close(&index_iter);
+  record_clear(want_index_rec);
+  record_clear(index_result_rec);
+  return err;
+}
+
+static int reader_seek_internal(struct reader *r, struct iterator *it,
+                                struct record rec) {
+  struct reader_offsets *offs = reader_offsets_for(r, record_type(rec));
+  uint64_t idx = offs->index_offset;
+  struct table_iter ti = {};
+  int err = 0;
+  if (idx > 0) {
+    return reader_seek_indexed(r, it, rec);
+  }
+
+  err = reader_start(r, &ti, record_type(rec), false);
+  if (err < 0) {
+    return err;
+  }
+  err = reader_seek_linear(r, &ti, rec);
+  if (err < 0) {
+    return err;
+  }
+
+  {
+    struct table_iter *p = malloc(sizeof(struct table_iter));
+    *p = ti;
+    iterator_from_table_iter(it, p);
+  }
+
+  return 0;
+}
+
+int reader_seek(struct reader *r, struct iterator *it, struct record rec) {
+  byte typ = record_type(rec);
+
+  struct reader_offsets *offs = reader_offsets_for(r, typ);
+  if (!offs->present) {
+    iterator_set_empty(it);
+    return 0;
+  }
+
+  return reader_seek_internal(r, it, rec);
+}
+
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name) {
+  struct ref_record ref = {
+      .ref_name = (char *)name,
+  };
+  struct record rec = {};
+  record_from_ref(&rec, &ref);
+  return reader_seek(r, it, rec);
+}
+
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+                    uint64_t update_index) {
+  struct log_record log = {
+      .ref_name = (char *)name,
+      .update_index = update_index,
+  };
+  struct record rec = {};
+  record_from_log(&rec, &log);
+  return reader_seek(r, it, rec);
+}
+
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name) {
+  uint64_t max = ~((uint64_t)0);
+  return reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reader *r) {
+  block_source_close(&r->source);
+  free(r->name);
+  r->name = NULL;
+}
+
+int new_reader(struct reader **p, struct block_source src, char const *name) {
+  struct reader *rd = calloc(sizeof(struct reader), 1);
+  int err = init_reader(rd, src, name);
+  if (err == 0) {
+    *p = rd;
+  } else {
+    free(rd);
+  }
+  return err;
+}
+
+void reader_free(struct reader *r) {
+  reader_close(r);
+  free(r);
+}
+
+static int reader_refs_for_indexed(struct reader *r, struct iterator *it,
+                                   byte *oid) {
+  struct obj_record want = {
+      .hash_prefix = oid,
+      .hash_prefix_len = r->object_id_len,
+  };
+  struct record want_rec = {};
+  struct iterator oit = {};
+  struct obj_record got = {};
+  struct record got_rec = {};
+  int err = 0;
+
+  record_from_obj(&want_rec, &want);
+
+  err = reader_seek(r, &oit, want_rec);
+  if (err != 0) {
+    return err;
+  }
+
+  record_from_obj(&got_rec, &got);
+  err = iterator_next(oit, got_rec);
+  iterator_destroy(&oit);
+  if (err < 0) {
+    return err;
+  }
+
+  if (err > 0 || memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+    iterator_set_empty(it);
+    return 0;
+  }
+
+  {
+    struct indexed_table_ref_iter *itr = NULL;
+    err = new_indexed_table_ref_iter(&itr, r, oid, r->hash_size, got.offsets,
+                                     got.offset_len);
+    if (err < 0) {
+      record_clear(got_rec);
+      return err;
+    }
+    got.offsets = NULL;
+    record_clear(got_rec);
+
+    iterator_from_indexed_table_ref_iter(it, itr);
+  }
+
+  return 0;
+}
+
+static int reader_refs_for_unindexed(struct reader *r, struct iterator *it,
+                                     byte *oid, int oid_len) {
+  struct table_iter *ti = calloc(sizeof(struct table_iter), 1);
+  struct filtering_ref_iterator *filter = NULL;
+  int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+  if (err < 0) {
+    free(ti);
+    return err;
+  }
+
+  filter = calloc(sizeof(struct filtering_ref_iterator), 1);
+  slice_resize(&filter->oid, oid_len);
+  memcpy(filter->oid.buf, oid, oid_len);
+  filter->r = r;
+  filter->double_check = false;
+  iterator_from_table_iter(&filter->it, ti);
+
+  iterator_from_filtering_ref_iterator(it, filter);
+  return 0;
+}
+
+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
+                    int oid_len) {
+  if (r->obj_offsets.present) {
+    return reader_refs_for_indexed(r, it, oid);
+  }
+  return reader_refs_for_unindexed(r, it, oid, oid_len);
+}
+
+uint64_t reader_max_update_index(struct reader *r) {
+  return r->max_update_index;
+}
+
+uint64_t reader_min_update_index(struct reader *r) {
+  return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 0000000000..20dff91fc6
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,52 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+
+int block_source_read_block(struct block_source source, struct block *dest,
+                            uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source *source);
+
+struct reader_offsets {
+  bool present;
+  uint64_t offset;
+  uint64_t index_offset;
+};
+
+struct reader {
+  struct block_source source;
+  char *name;
+  int hash_size;
+  uint64_t size;
+  uint32_t block_size;
+  uint64_t min_update_index;
+  uint64_t max_update_index;
+  int object_id_len;
+
+  struct reader_offsets ref_offsets;
+  struct reader_offsets obj_offsets;
+  struct reader_offsets log_offsets;
+};
+
+int init_reader(struct reader *r, struct block_source source, const char *name);
+int reader_seek(struct reader *r, struct iterator *it, struct record rec);
+void reader_close(struct reader *r);
+const char *reader_name(struct reader *r);
+void reader_return_block(struct reader *r, struct block *p);
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+                             uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 0000000000..e28eb0670d
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1030 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include <assert.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "constants.h"
+#include "reftable.h"
+
+int is_block_type(byte typ) {
+  switch (typ) {
+    case BLOCK_TYPE_REF:
+    case BLOCK_TYPE_LOG:
+    case BLOCK_TYPE_OBJ:
+    case BLOCK_TYPE_INDEX:
+      return true;
+  }
+  return false;
+}
+
+int get_var_int(uint64_t *dest, struct slice in) {
+  int ptr = 0;
+  uint64_t val;
+
+  if (in.len == 0) {
+    return -1;
+  }
+  val = in.buf[ptr] & 0x7f;
+
+  while (in.buf[ptr] & 0x80) {
+    ptr++;
+    if (ptr > in.len) {
+      return -1;
+    }
+    val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+  }
+
+  *dest = val;
+  return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val) {
+  byte buf[10] = {};
+  int i = 9;
+  buf[i] = (byte)(val & 0x7f);
+  i--;
+  while (true) {
+    val >>= 7;
+    if (!val) {
+      break;
+    }
+    val--;
+    buf[i] = 0x80 | (byte)(val & 0x7f);
+    i--;
+  }
+
+  {
+    int n = sizeof(buf) - i - 1;
+    if (dest.len < n) {
+      return -1;
+    }
+    memcpy(dest.buf, &buf[i + 1], n);
+    return n;
+  }
+}
+
+int common_prefix_size(struct slice a, struct slice b) {
+  int p = 0;
+  while (p < a.len && p < b.len) {
+    if (a.buf[p] != b.buf[p]) {
+      break;
+    }
+    p++;
+  }
+
+  return p;
+}
+
+static int decode_string(struct slice *dest, struct slice in) {
+  int start_len = in.len;
+  uint64_t tsize = 0;
+  int n = get_var_int(&tsize, in);
+  if (n <= 0) {
+    return -1;
+  }
+  in.buf += n;
+  in.len -= n;
+  if (in.len < tsize) {
+    return -1;
+  }
+
+  slice_resize(dest, tsize + 1);
+  dest->buf[tsize] = 0;
+  memcpy(dest->buf, in.buf, tsize);
+  in.buf += tsize;
+  in.len -= tsize;
+
+  return start_len - in.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+               struct slice key, byte extra) {
+  struct slice start = dest;
+  int prefix_len = common_prefix_size(prev_key, key);
+  uint64_t suffix_len = key.len - prefix_len;
+  int n = put_var_int(dest, (uint64_t)prefix_len);
+  if (n < 0) {
+    return -1;
+  }
+  dest.buf += n;
+  dest.len -= n;
+
+  *restart = (prefix_len == 0);
+
+  n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+  if (n < 0) {
+    return -1;
+  }
+  dest.buf += n;
+  dest.len -= n;
+
+  if (dest.len < suffix_len) {
+    return -1;
+  }
+  memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+  dest.buf += suffix_len;
+  dest.len -= suffix_len;
+
+  return start.len - dest.len;
+}
+
+static byte ref_record_type(void) { return BLOCK_TYPE_REF; }
+
+static void ref_record_key(const void *r, struct slice *dest) {
+  const struct ref_record *rec = (const struct ref_record *)r;
+  slice_set_string(dest, rec->ref_name);
+}
+
+static void ref_record_copy_from(void *rec, const void *src_rec,
+                                 int hash_size) {
+  struct ref_record *ref = (struct ref_record *)rec;
+  struct ref_record *src = (struct ref_record *)src_rec;
+  assert(hash_size > 0);
+
+  // This is simple and correct, but we could probably reuse the hash fields.
+  ref_record_clear(ref);
+  if (src->ref_name != NULL) {
+    ref->ref_name = strdup(src->ref_name);
+  }
+
+  if (src->target != NULL) {
+    ref->target = strdup(src->target);
+  }
+
+  if (src->target_value != NULL) {
+    ref->target_value = malloc(hash_size);
+    memcpy(ref->target_value, src->target_value, hash_size);
+  }
+
+  if (src->value != NULL) {
+    ref->value = malloc(hash_size);
+    memcpy(ref->value, src->value, hash_size);
+  }
+  ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c) {
+  if (c <= 9) {
+    return '0' + c;
+  }
+  return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size) {
+  assert(hash_size > 0);
+  if (src != NULL) {
+    for (int i = 0; i < hash_size; i++) {
+      dest[2 * i] = hexdigit(src[i] >> 4);
+      dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+    }
+    dest[2 * hash_size] = 0;
+  }
+}
+
+void ref_record_print(struct ref_record *ref, int hash_size) {
+  char hex[SHA256_SIZE + 1] = {};
+
+  printf("ref{%s(%ld) ", ref->ref_name, ref->update_index);
+  if (ref->value != NULL) {
+    hex_format(hex, ref->value, hash_size);
+    printf("%s", hex);
+  }
+  if (ref->target_value != NULL) {
+    hex_format(hex, ref->target_value, hash_size);
+    printf(" (T %s)", hex);
+  }
+  if (ref->target != NULL) {
+    printf("=> %s", ref->target);
+  }
+  printf("}\n");
+}
+
+static void ref_record_clear_void(void *rec) {
+  ref_record_clear((struct ref_record *)rec);
+}
+
+void ref_record_clear(struct ref_record *ref) {
+  free(ref->ref_name);
+  free(ref->target);
+  free(ref->target_value);
+  free(ref->value);
+  memset(ref, 0, sizeof(struct ref_record));
+}
+
+static byte ref_record_val_type(const void *rec) {
+  const struct ref_record *r = (const struct ref_record *)rec;
+  if (r->value != NULL) {
+    if (r->target_value != NULL) {
+      return 2;
+    } else {
+      return 1;
+    }
+  } else if (r->target != NULL) {
+    return 3;
+  }
+  return 0;
+}
+
+static int encode_string(char *str, struct slice s) {
+  struct slice start = s;
+  int l = strlen(str);
+  int n = put_var_int(s, l);
+  if (n < 0) {
+    return -1;
+  }
+  s.buf += n;
+  s.len -= n;
+  if (s.len < l) {
+    return -1;
+  }
+  memcpy(s.buf, str, l);
+  s.buf += l;
+  s.len -= l;
+
+  return start.len - s.len;
+}
+
+static int ref_record_encode(const void *rec, struct slice s, int hash_size) {
+  const struct ref_record *r = (const struct ref_record *)rec;
+  struct slice start = s;
+  int n = put_var_int(s, r->update_index);
+  assert(hash_size > 0);
+  if (n < 0) {
+    return -1;
+  }
+  s.buf += n;
+  s.len -= n;
+
+  if (r->value != NULL) {
+    if (s.len < hash_size) {
+      return -1;
+    }
+    memcpy(s.buf, r->value, hash_size);
+    s.buf += hash_size;
+    s.len -= hash_size;
+  }
+
+  if (r->target_value != NULL) {
+    if (s.len < hash_size) {
+      return -1;
+    }
+    memcpy(s.buf, r->target_value, hash_size);
+    s.buf += hash_size;
+    s.len -= hash_size;
+  }
+
+  if (r->target != NULL) {
+    int n = encode_string(r->target, s);
+    if (n < 0) {
+      return -1;
+    }
+    s.buf += n;
+    s.len -= n;
+  }
+
+  return start.len - s.len;
+}
+
+static int ref_record_decode(void *rec, struct slice key, byte val_type,
+                             struct slice in, int hash_size) {
+  struct ref_record *r = (struct ref_record *)rec;
+  struct slice start = in;
+  bool seen_value = false;
+  bool seen_target_value = false;
+  bool seen_target = false;
+
+  int n = get_var_int(&r->update_index, in);
+  if (n < 0) {
+    return n;
+  }
+  assert(hash_size > 0);
+
+  in.buf += n;
+  in.len -= n;
+
+  r->ref_name = realloc(r->ref_name, key.len + 1);
+  memcpy(r->ref_name, key.buf, key.len);
+  r->ref_name[key.len] = 0;
+
+  switch (val_type) {
+    case 1:
+    case 2:
+      if (in.len < hash_size) {
+        return -1;
+      }
+
+      if (r->value == NULL) {
+        r->value = malloc(hash_size);
+      }
+      seen_value = true;
+      memcpy(r->value, in.buf, hash_size);
+      in.buf += hash_size;
+      in.len -= hash_size;
+      if (val_type == 1) {
+        break;
+      }
+      if (r->target_value == NULL) {
+        r->target_value = malloc(hash_size);
+      }
+      seen_target_value = true;
+      memcpy(r->target_value, in.buf, hash_size);
+      in.buf += hash_size;
+      in.len -= hash_size;
+      break;
+    case 3: {
+      struct slice dest = {};
+      int n = decode_string(&dest, in);
+      if (n < 0) {
+        return -1;
+      }
+      in.buf += n;
+      in.len -= n;
+      seen_target = true;
+      r->target = (char *)slice_as_string(&dest);
+    } break;
+
+    case 0:
+      break;
+    default:
+      abort();
+      break;
+  }
+
+  if (!seen_target && r->target != NULL) {
+    free(r->target);
+    r->target = NULL;
+  }
+  if (!seen_target_value && r->target_value != NULL) {
+    free(r->target_value);
+    r->target_value = NULL;
+  }
+  if (!seen_value && r->value != NULL) {
+    free(r->value);
+    r->value = NULL;
+  }
+
+  return start.len - in.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+               struct slice in) {
+  int start_len = in.len;
+  uint64_t prefix_len = 0;
+  uint64_t suffix_len = 0;
+  int n = get_var_int(&prefix_len, in);
+  if (n < 0) {
+    return -1;
+  }
+  in.buf += n;
+  in.len -= n;
+
+  if (prefix_len > last_key.len) {
+    return -1;
+  }
+
+  n = get_var_int(&suffix_len, in);
+  if (n <= 0) {
+    return -1;
+  }
+  in.buf += n;
+  in.len -= n;
+
+  *extra = (byte)(suffix_len & 0x7);
+  suffix_len >>= 3;
+
+  if (in.len < suffix_len) {
+    return -1;
+  }
+
+  slice_resize(key, suffix_len + prefix_len);
+  memcpy(key->buf, last_key.buf, prefix_len);
+
+  memcpy(key->buf + prefix_len, in.buf, suffix_len);
+  in.buf += suffix_len;
+  in.len -= suffix_len;
+
+  return start_len - in.len;
+}
+
+struct record_vtable ref_record_vtable = {
+    .key = &ref_record_key,
+    .type = &ref_record_type,
+    .copy_from = &ref_record_copy_from,
+    .val_type = &ref_record_val_type,
+    .encode = &ref_record_encode,
+    .decode = &ref_record_decode,
+    .clear = &ref_record_clear_void,
+};
+
+static byte obj_record_type(void) { return BLOCK_TYPE_OBJ; }
+
+static void obj_record_key(const void *r, struct slice *dest) {
+  const struct obj_record *rec = (const struct obj_record *)r;
+  slice_resize(dest, rec->hash_prefix_len);
+  memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec,
+                                 int hash_size) {
+  struct obj_record *ref = (struct obj_record *)rec;
+  const struct obj_record *src = (const struct obj_record *)src_rec;
+
+  *ref = *src;
+  ref->hash_prefix = malloc(ref->hash_prefix_len);
+  memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
+
+  {
+    int olen = ref->offset_len * sizeof(uint64_t);
+    ref->offsets = malloc(olen);
+    memcpy(ref->offsets, src->offsets, olen);
+  }
+}
+
+static void obj_record_clear(void *rec) {
+  struct obj_record *ref = (struct obj_record *)rec;
+  free(ref->hash_prefix);
+  free(ref->offsets);
+  memset(ref, 0, sizeof(struct obj_record));
+}
+
+static byte obj_record_val_type(const void *rec) {
+  struct obj_record *r = (struct obj_record *)rec;
+  if (r->offset_len > 0 && r->offset_len < 8) {
+    return r->offset_len;
+  }
+  return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size) {
+  struct obj_record *r = (struct obj_record *)rec;
+  struct slice start = s;
+  int n = 0;
+  if (r->offset_len == 0 || r->offset_len >= 8) {
+    n = put_var_int(s, r->offset_len);
+    if (n < 0) {
+      return -1;
+    }
+    s.buf += n;
+    s.len -= n;
+  }
+  if (r->offset_len == 0) {
+    return start.len - s.len;
+  }
+  n = put_var_int(s, r->offsets[0]);
+  if (n < 0) {
+    return -1;
+  }
+  s.buf += n;
+  s.len -= n;
+
+  {
+    uint64_t last = r->offsets[0];
+    for (int i = 1; i < r->offset_len; i++) {
+      int n = put_var_int(s, r->offsets[i] - last);
+      if (n < 0) {
+        return -1;
+      }
+      s.buf += n;
+      s.len -= n;
+      last = r->offsets[i];
+    }
+  }
+  return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+                             struct slice in, int hash_size) {
+  struct slice start = in;
+  struct obj_record *r = (struct obj_record *)rec;
+  uint64_t count = val_type;
+  int n = 0;
+  r->hash_prefix = malloc(key.len);
+  memcpy(r->hash_prefix, key.buf, key.len);
+  r->hash_prefix_len = key.len;
+
+  if (val_type == 0) {
+    n = get_var_int(&count, in);
+    if (n < 0) {
+      return n;
+    }
+
+    in.buf += n;
+    in.len -= n;
+  }
+
+  r->offsets = NULL;
+  r->offset_len = 0;
+  if (count == 0) {
+    return start.len - in.len;
+  }
+
+  r->offsets = malloc(count * sizeof(uint64_t));
+  r->offset_len = count;
+
+  n = get_var_int(&r->offsets[0], in);
+  if (n < 0) {
+    return n;
+  }
+
+  in.buf += n;
+  in.len -= n;
+
+  {
+    uint64_t last = r->offsets[0];
+    int j = 1;
+    while (j < count) {
+      uint64_t delta = 0;
+      int n = get_var_int(&delta, in);
+      if (n < 0) {
+        return n;
+      }
+
+      in.buf += n;
+      in.len -= n;
+
+      last = r->offsets[j] = (delta + last);
+      j++;
+    }
+  }
+  return start.len - in.len;
+}
+
+struct record_vtable obj_record_vtable = {
+    .key = &obj_record_key,
+    .type = &obj_record_type,
+    .copy_from = &obj_record_copy_from,
+    .val_type = &obj_record_val_type,
+    .encode = &obj_record_encode,
+    .decode = &obj_record_decode,
+    .clear = &obj_record_clear,
+};
+
+void log_record_print(struct log_record *log, int hash_size) {
+  char hex[SHA256_SIZE + 1] = {};
+
+  printf("log{%s(%ld) %s <%s> %lu %04d\n", log->ref_name, log->update_index,
+         log->name, log->email, log->time, log->tz_offset);
+  hex_format(hex, log->old_hash, hash_size);
+  printf("%s => ", hex);
+  hex_format(hex, log->new_hash, hash_size);
+  printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static byte log_record_type(void) { return BLOCK_TYPE_LOG; }
+
+static void log_record_key(const void *r, struct slice *dest) {
+  const struct log_record *rec = (const struct log_record *)r;
+  int len = strlen(rec->ref_name);
+  uint64_t ts = 0;
+  slice_resize(dest, len + 9);
+  memcpy(dest->buf, rec->ref_name, len + 1);
+  ts = (~ts) - rec->update_index;
+  put_u64(dest->buf + 1 + len, ts);
+}
+
+static void log_record_copy_from(void *rec, const void *src_rec,
+                                 int hash_size) {
+  struct log_record *dst = (struct log_record *)rec;
+  const struct log_record *src = (const struct log_record *)src_rec;
+
+  *dst = *src;
+  dst->ref_name = strdup(dst->ref_name);
+  dst->email = strdup(dst->email);
+  dst->name = strdup(dst->name);
+  dst->message = strdup(dst->message);
+  if (dst->new_hash != NULL) {
+    dst->new_hash = malloc(hash_size);
+    memcpy(dst->new_hash, src->new_hash, hash_size);
+  }
+  if (dst->old_hash != NULL) {
+    dst->old_hash = malloc(hash_size);
+    memcpy(dst->old_hash, src->old_hash, hash_size);
+  }
+}
+
+static void log_record_clear_void(void *rec) {
+  struct log_record *r = (struct log_record *)rec;
+  log_record_clear(r);
+}
+
+void log_record_clear(struct log_record *r) {
+  free(r->ref_name);
+  free(r->new_hash);
+  free(r->old_hash);
+  free(r->name);
+  free(r->email);
+  free(r->message);
+  memset(r, 0, sizeof(struct log_record));
+}
+
+static byte log_record_val_type(const void *rec) { return 1; }
+
+static byte zero[SHA256_SIZE] = {};
+
+static int log_record_encode(const void *rec, struct slice s, int hash_size) {
+  struct log_record *r = (struct log_record *)rec;
+  struct slice start = s;
+  int n = 0;
+  byte *oldh = r->old_hash;
+  byte *newh = r->new_hash;
+  if (oldh == NULL) {
+    oldh = zero;
+  }
+  if (newh == NULL) {
+    newh = zero;
+  }
+
+  if (s.len < 2 * hash_size) {
+    return -1;
+  }
+
+  memcpy(s.buf, oldh, hash_size);
+  memcpy(s.buf + hash_size, newh, hash_size);
+  s.buf += 2 * hash_size;
+  s.len -= 2 * hash_size;
+
+  n = encode_string(r->name ? r->name : "", s);
+  if (n < 0) {
+    return -1;
+  }
+  s.len -= n;
+  s.buf += n;
+
+  n = encode_string(r->email ? r->email : "", s);
+  if (n < 0) {
+    return -1;
+  }
+  s.len -= n;
+  s.buf += n;
+
+  n = put_var_int(s, r->time);
+  if (n < 0) {
+    return -1;
+  }
+  s.buf += n;
+  s.len -= n;
+
+  if (s.len < 2) {
+    return -1;
+  }
+
+  put_u16(s.buf, r->tz_offset);
+  s.buf += 2;
+  s.len -= 2;
+
+  n = encode_string(r->message ? r->message : "", s);
+  if (n < 0) {
+    return -1;
+  }
+  s.len -= n;
+  s.buf += n;
+
+  return start.len - s.len;
+}
+
+static int log_record_decode(void *rec, struct slice key, byte val_type,
+                             struct slice in, int hash_size) {
+  struct slice start = in;
+  struct log_record *r = (struct log_record *)rec;
+  uint64_t max = 0;
+  uint64_t ts = 0;
+  struct slice dest = {};
+  int n;
+
+  if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+    return FORMAT_ERROR;
+  }
+
+  r->ref_name = realloc(r->ref_name, key.len - 8);
+  memcpy(r->ref_name, key.buf, key.len - 8);
+  ts = get_u64(key.buf + key.len - 8);
+
+  r->update_index = (~max) - ts;
+
+  if (in.len < 2 * hash_size) {
+    return FORMAT_ERROR;
+  }
+
+  r->old_hash = realloc(r->old_hash, hash_size);
+  r->new_hash = realloc(r->new_hash, hash_size);
+
+  memcpy(r->old_hash, in.buf, hash_size);
+  memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+  in.buf += 2 * hash_size;
+  in.len -= 2 * hash_size;
+
+  n = decode_string(&dest, in);
+  if (n < 0) {
+    goto error;
+  }
+  in.len -= n;
+  in.buf += n;
+
+  r->name = realloc(r->name, dest.len + 1);
+  memcpy(r->name, dest.buf, dest.len);
+  r->name[dest.len] = 0;
+
+  slice_resize(&dest, 0);
+  n = decode_string(&dest, in);
+  if (n < 0) {
+    goto error;
+  }
+  in.len -= n;
+  in.buf += n;
+
+  r->email = realloc(r->email, dest.len + 1);
+  memcpy(r->email, dest.buf, dest.len);
+  r->email[dest.len] = 0;
+
+  ts = 0;
+  n = get_var_int(&ts, in);
+  if (n < 0) {
+    goto error;
+  }
+  in.len -= n;
+  in.buf += n;
+  r->time = ts;
+  if (in.len < 2) {
+    goto error;
+  }
+
+  r->tz_offset = get_u16(in.buf);
+  in.buf += 2;
+  in.len -= 2;
+
+  slice_resize(&dest, 0);
+  n = decode_string(&dest, in);
+  if (n < 0) {
+    goto error;
+  }
+  in.len -= n;
+  in.buf += n;
+
+  r->message = realloc(r->message, dest.len + 1);
+  memcpy(r->message, dest.buf, dest.len);
+  r->message[dest.len] = 0;
+
+  return start.len - in.len;
+
+error:
+  free(slice_yield(&dest));
+  return FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b) {
+  char *empty = "";
+  if (a == NULL) {
+    a = empty;
+  }
+  if (b == NULL) {
+    b = empty;
+  }
+  return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz) {
+  if (a == NULL) {
+    a = zero;
+  }
+  if (b == NULL) {
+    b = zero;
+  }
+  return 0 == memcmp(a, b, sz);
+}
+
+bool log_record_equal(struct log_record *a, struct log_record *b,
+                      int hash_size) {
+  return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+         null_streq(a->message, b->message) &&
+         zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+         zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+         a->time == b->time && a->tz_offset == b->tz_offset &&
+         a->update_index == b->update_index;
+}
+
+struct record_vtable log_record_vtable = {
+    .key = &log_record_key,
+    .type = &log_record_type,
+    .copy_from = &log_record_copy_from,
+    .val_type = &log_record_val_type,
+    .encode = &log_record_encode,
+    .decode = &log_record_decode,
+    .clear = &log_record_clear_void,
+};
+
+struct record new_record(byte typ) {
+  struct record rec;
+  switch (typ) {
+    case BLOCK_TYPE_REF: {
+      struct ref_record *r = calloc(1, sizeof(struct ref_record));
+      record_from_ref(&rec, r);
+      return rec;
+    }
+
+    case BLOCK_TYPE_OBJ: {
+      struct obj_record *r = calloc(1, sizeof(struct obj_record));
+      record_from_obj(&rec, r);
+      return rec;
+    }
+    case BLOCK_TYPE_LOG: {
+      struct log_record *r = calloc(1, sizeof(struct log_record));
+      record_from_log(&rec, r);
+      return rec;
+    }
+    case BLOCK_TYPE_INDEX: {
+      struct index_record *r = calloc(1, sizeof(struct index_record));
+      record_from_index(&rec, r);
+      return rec;
+    }
+  }
+  abort();
+  return rec;
+}
+
+static byte index_record_type(void) { return BLOCK_TYPE_INDEX; }
+
+static void index_record_key(const void *r, struct slice *dest) {
+  struct index_record *rec = (struct index_record *)r;
+  slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+                                   int hash_size) {
+  struct index_record *dst = (struct index_record *)rec;
+  struct index_record *src = (struct index_record *)src_rec;
+
+  slice_copy(&dst->last_key, src->last_key);
+  dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec) {
+  struct index_record *idx = (struct index_record *)rec;
+  free(slice_yield(&idx->last_key));
+}
+
+static byte index_record_val_type(const void *rec) { return 0; }
+
+static int index_record_encode(const void *rec, struct slice out,
+                               int hash_size) {
+  const struct index_record *r = (const struct index_record *)rec;
+  struct slice start = out;
+
+  int n = put_var_int(out, r->offset);
+  if (n < 0) {
+    return n;
+  }
+
+  out.buf += n;
+  out.len -= n;
+
+  return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+                               struct slice in, int hash_size) {
+  struct slice start = in;
+  struct index_record *r = (struct index_record *)rec;
+  int n = 0;
+
+  slice_copy(&r->last_key, key);
+
+  n = get_var_int(&r->offset, in);
+  if (n < 0) {
+    return n;
+  }
+
+  in.buf += n;
+  in.len -= n;
+  return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+    .key = &index_record_key,
+    .type = &index_record_type,
+    .copy_from = &index_record_copy_from,
+    .val_type = &index_record_val_type,
+    .encode = &index_record_encode,
+    .decode = &index_record_decode,
+    .clear = &index_record_clear,
+};
+
+void record_key(struct record rec, struct slice *dest) {
+  rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec) { return rec.ops->type(); }
+
+int record_encode(struct record rec, struct slice dest, int hash_size) {
+  return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size) {
+  assert(src.ops->type() == rec.ops->type());
+
+  rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec) { return rec.ops->val_type(rec.data); }
+
+int record_decode(struct record rec, struct slice key, byte extra,
+                  struct slice src, int hash_size) {
+  return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec) { return rec.ops->clear(rec.data); }
+
+void record_from_ref(struct record *rec, struct ref_record *ref_rec) {
+  rec->data = ref_rec;
+  rec->ops = &ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec) {
+  rec->data = obj_rec;
+  rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec) {
+  rec->data = index_rec;
+  rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct log_record *log_rec) {
+  rec->data = log_rec;
+  rec->ops = &log_record_vtable;
+}
+
+void *record_yield(struct record *rec) {
+  void *p = rec->data;
+  rec->data = NULL;
+  return p;
+}
+
+struct ref_record *record_as_ref(struct record rec) {
+  assert(record_type(rec) == BLOCK_TYPE_REF);
+  return (struct ref_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size) {
+  if (a != NULL && b != NULL) {
+    return 0 == memcmp(a, b, hash_size);
+  }
+
+  return a == b;
+}
+
+static bool str_equal(char *a, char *b) {
+  if (a != NULL && b != NULL) {
+    return 0 == strcmp(a, b);
+  }
+
+  return a == b;
+}
+
+bool ref_record_equal(struct ref_record *a, struct ref_record *b,
+                      int hash_size) {
+  assert(hash_size > 0);
+  return 0 == strcmp(a->ref_name, b->ref_name) &&
+         a->update_index == b->update_index &&
+         hash_equal(a->value, b->value, hash_size) &&
+         hash_equal(a->target_value, b->target_value, hash_size) &&
+         str_equal(a->target, b->target);
+}
+
+int ref_record_compare_name(const void *a, const void *b) {
+  return strcmp(((struct ref_record *)a)->ref_name,
+                ((struct ref_record *)b)->ref_name);
+}
+
+bool ref_record_is_deletion(const struct ref_record *ref) {
+  return ref->value == NULL && ref->target == NULL && ref->target_value == NULL;
+}
+
+int log_record_compare_key(const void *a, const void *b) {
+  struct log_record *la = (struct log_record *)a;
+  struct log_record *lb = (struct log_record *)b;
+
+  int cmp  = strcmp(la->ref_name, lb->ref_name);
+  if (cmp) {
+    return cmp;
+  }
+  if (la->update_index > lb->update_index) {
+    return -1;
+  }
+  return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool log_record_is_deletion(const struct log_record *log) {
+  // XXX
+  return false;
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 0000000000..c6245d6cf1
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,79 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+struct record_vtable {
+  void (*key)(const void *rec, struct slice *dest);
+  byte (*type)(void);
+  void (*copy_from)(void *rec, const void *src, int hash_size);
+  byte (*val_type)(const void *rec);
+  int (*encode)(const void *rec, struct slice dest, int hash_size);
+  int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+                int hash_size);
+  void (*clear)(void *rec);
+};
+
+/* record is a generic wrapper for differnt types of records. */
+struct record {
+  void *data;
+  struct record_vtable *ops;
+};
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+int common_prefix_size(struct slice a, struct slice b);
+
+int is_block_type(byte typ);
+struct record new_record(byte typ);
+
+extern struct record_vtable ref_record_vtable;
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+               struct slice key, byte extra);
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+               struct slice in);
+
+struct index_record {
+  struct slice last_key;
+  uint64_t offset;
+};
+
+struct obj_record {
+  byte *hash_prefix;
+  int hash_prefix_len;
+  uint64_t *offsets;
+  int offset_len;
+};
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+                  struct slice src, int hash_size);
+void record_clear(struct record rec);
+void *record_yield(struct record *rec);
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct ref_record *refrec);
+void record_from_log(struct record *rec, struct log_record *logrec);
+struct ref_record *record_as_ref(struct record ref);
+
+// for qsort.
+int ref_record_compare_name(const void *a, const void *b);
+
+// for qsort.
+int log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/record_test.c b/reftable/record_test.c
new file mode 100644
index 0000000000..71c6d0dd97
--- /dev/null
+++ b/reftable/record_test.c
@@ -0,0 +1,313 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include <string.h>
+
+#include "basics.h"
+#include "constants.h"
+#include "reftable.h"
+#include "test_framework.h"
+
+void varint_roundtrip() {
+  uint64_t inputs[] = {0,
+                       1,
+                       27,
+                       127,
+                       128,
+                       257,
+                       4096,
+                       ((uint64_t)1 << 63),
+                       ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1};
+  for (int i = 0; i < ARRAYSIZE(inputs); i++) {
+    byte dest[10];
+
+    struct slice out = {.buf = dest, .len = 10, .cap = 10};
+
+    uint64_t in = inputs[i];
+    int n = put_var_int(out, in);
+    assert(n > 0);
+    out.len = n;
+
+    uint64_t got = 0;
+    n = get_var_int(&got, out);
+    assert(n > 0);
+
+    assert(got == in);
+  }
+}
+
+void test_common_prefix() {
+  struct {
+    const char *a, *b;
+    int want;
+  } cases[] = {
+      {"abc", "ab", 2},
+      {"", "abc", 0},
+      {"abc", "abd", 2},
+      {"abc", "pqr", 0},
+  };
+
+  for (int i = 0; i < ARRAYSIZE(cases); i++) {
+    struct slice a = {};
+    struct slice b = {};
+    slice_set_string(&a, cases[i].a);
+    slice_set_string(&b, cases[i].b);
+
+    int got = common_prefix_size(a, b);
+    assert(got == cases[i].want);
+
+    free(slice_yield(&a));
+    free(slice_yield(&b));
+  }
+}
+
+void set_hash(byte *h, int j) {
+  for (int i = 0; i < SHA1_SIZE; i++) {
+    h[i] = (j >> i) & 0xff;
+  }
+}
+
+void test_ref_record_roundtrip() {
+  for (int i = 0; i <= 3; i++) {
+    printf("subtest %d\n", i);
+    struct ref_record in = {};
+    switch (i) {
+      case 0:
+        break;
+      case 1:
+        in.value = malloc(SHA1_SIZE);
+        set_hash(in.value, 1);
+        break;
+      case 2:
+        in.value = malloc(SHA1_SIZE);
+        set_hash(in.value, 1);
+        in.target_value = malloc(SHA1_SIZE);
+        set_hash(in.target_value, 2);
+        break;
+      case 3:
+        in.target = strdup("target");
+        break;
+    }
+    in.ref_name = strdup("refs/heads/master");
+
+    struct record rec = {};
+    record_from_ref(&rec, &in);
+    assert(record_val_type(rec) == i);
+    byte buf[1024];
+    struct slice key = {};
+    record_key(rec, &key);
+    struct slice dest = {
+        .buf = buf,
+        .len = sizeof(buf),
+    };
+    int n = record_encode(rec, dest, SHA1_SIZE);
+    assert(n > 0);
+
+    struct ref_record out = {};
+    struct record rec_out = {};
+    record_from_ref(&rec_out, &out);
+    int m = record_decode(rec_out, key, i, dest, SHA1_SIZE);
+    assert(n == m);
+
+    assert((out.value != NULL) == (in.value != NULL));
+    assert((out.target_value != NULL) == (in.target_value != NULL));
+    assert((out.target != NULL) == (in.target != NULL));
+    free(slice_yield(&key));
+    record_clear(rec_out);
+    ref_record_clear(&in);
+  }
+}
+
+void test_log_record_roundtrip() {
+  struct log_record in = {
+      .ref_name = strdup("refs/heads/master"),
+      .old_hash = malloc(SHA1_SIZE),
+      .new_hash = malloc(SHA1_SIZE),
+      .name = strdup("han-wen"),
+      .email = strdup("hanwen@google.com"),
+      .message = strdup("test"),
+      .update_index = 42,
+      .time = 1577123507,
+      .tz_offset = 100,
+  };
+
+  struct record rec = {};
+  record_from_log(&rec, &in);
+
+  struct slice key = {};
+  record_key(rec, &key);
+
+  byte buf[1024];
+  struct slice dest = {
+      .buf = buf,
+      .len = sizeof(buf),
+  };
+
+  int n = record_encode(rec, dest, SHA1_SIZE);
+  assert(n > 0);
+
+  struct log_record out = {};
+  struct record rec_out = {};
+  record_from_log(&rec_out, &out);
+  int valtype = record_val_type(rec);
+  int m = record_decode(rec_out, key, valtype, dest, SHA1_SIZE);
+  assert(n == m);
+
+  assert(log_record_equal(&in, &out, SHA1_SIZE));
+  log_record_clear(&in);
+  free(slice_yield(&key));
+  record_clear(rec_out);
+}
+
+void test_u24_roundtrip() {
+  uint32_t in = 0x112233;
+  byte dest[3];
+
+  put_u24(dest, in);
+  uint32_t out = get_u24(dest);
+  assert(in == out);
+}
+
+void test_key_roundtrip() {
+  struct slice dest = {}, last_key = {}, key = {}, roundtrip = {};
+
+  slice_resize(&dest, 1024);
+  slice_set_string(&last_key, "refs/heads/master");
+  slice_set_string(&key, "refs/tags/bla");
+
+  bool restart;
+  byte extra = 6;
+  int n = encode_key(&restart, dest, last_key, key, extra);
+  assert(!restart);
+  assert(n > 0);
+
+  byte rt_extra;
+  int m = decode_key(&roundtrip, &rt_extra, last_key, dest);
+  assert(n == m);
+  assert(slice_equal(key, roundtrip));
+  assert(rt_extra == extra);
+
+  free(slice_yield(&last_key));
+  free(slice_yield(&key));
+  free(slice_yield(&dest));
+  free(slice_yield(&roundtrip));
+}
+
+void print_bytes(byte *p, int l) {
+  for (int i = 0; i < l; i++) {
+    byte c = *p;
+    if (c < 32) {
+      c = '.';
+    }
+    printf("%02x[%c] ", p[i], c);
+  }
+  printf("(%d)\n", l);
+}
+
+void test_obj_record_roundtrip() {
+  byte testHash1[SHA1_SIZE] = {};
+  set_hash(testHash1, 1);
+  uint64_t till9[] = {1, 2, 3, 4, 500, 600, 700, 800, 9000};
+
+  struct obj_record recs[3] = {{
+                                   .hash_prefix = testHash1,
+                                   .hash_prefix_len = 5,
+                                   .offsets = till9,
+                                   .offset_len = 3,
+                               },
+                               {
+                                   .hash_prefix = testHash1,
+                                   .hash_prefix_len = 5,
+                                   .offsets = till9,
+                                   .offset_len = 9,
+                               },
+                               {
+                                   .hash_prefix = testHash1,
+                                   .hash_prefix_len = 5,
+                               }
+
+  };
+  for (int i = 0; i < ARRAYSIZE(recs); i++) {
+    printf("subtest %d\n", i);
+    struct obj_record in = recs[i];
+    byte buf[1024];
+    struct record rec = {};
+    record_from_obj(&rec, &in);
+    struct slice key = {};
+    record_key(rec, &key);
+    struct slice dest = {
+        .buf = buf,
+        .len = sizeof(buf),
+    };
+    int n = record_encode(rec, dest, SHA1_SIZE);
+    assert(n > 0);
+    byte extra = record_val_type(rec);
+    struct obj_record out = {};
+    struct record rec_out = {};
+    record_from_obj(&rec_out, &out);
+    int m = record_decode(rec_out, key, extra, dest, SHA1_SIZE);
+    assert(n == m);
+
+    assert(in.hash_prefix_len == out.hash_prefix_len);
+    assert(in.offset_len == out.offset_len);
+
+    assert(0 == memcmp(in.hash_prefix, out.hash_prefix, in.hash_prefix_len));
+    assert(0 ==
+           memcmp(in.offsets, out.offsets, sizeof(uint64_t) * in.offset_len));
+    free(slice_yield(&key));
+    record_clear(rec_out);
+  }
+}
+
+void test_index_record_roundtrip() {
+  struct index_record in = {.offset = 42};
+
+  slice_set_string(&in.last_key, "refs/heads/master");
+
+  struct slice key = {};
+  struct record rec = {};
+  record_from_index(&rec, &in);
+  record_key(rec, &key);
+
+  assert(0 == slice_compare(key, in.last_key));
+
+  byte buf[1024];
+  struct slice dest = {
+      .buf = buf,
+      .len = sizeof(buf),
+  };
+  int n = record_encode(rec, dest, SHA1_SIZE);
+  assert(n > 0);
+
+  byte extra = record_val_type(rec);
+  struct index_record out = {};
+  struct record out_rec;
+  record_from_index(&out_rec, &out);
+  int m = record_decode(out_rec, key, extra, dest, SHA1_SIZE);
+  assert(m == n);
+
+  assert(in.offset == out.offset);
+
+  record_clear(out_rec);
+  free(slice_yield(&key));
+  free(slice_yield(&in.last_key));
+}
+
+int main() {
+  add_test_case("test_log_record_roundtrip", &test_log_record_roundtrip);
+  add_test_case("test_ref_record_roundtrip", &test_ref_record_roundtrip);
+  add_test_case("varint_roundtrip", &varint_roundtrip);
+  add_test_case("test_key_roundtrip", &test_key_roundtrip);
+  add_test_case("test_common_prefix", &test_common_prefix);
+  add_test_case("test_obj_record_roundtrip", &test_obj_record_roundtrip);
+  add_test_case("test_index_record_roundtrip", &test_index_record_roundtrip);
+  add_test_case("test_u24_roundtrip", &test_u24_roundtrip);
+  test_main();
+}
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 0000000000..d760af973d
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,396 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+
+typedef uint8_t byte;
+typedef byte bool;
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct block_source {
+  struct block_source_vtable *ops;
+  void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct block {
+  byte *data;
+  int len;
+  struct block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct block_source_vtable {
+  /* returns the size of a block source */
+  uint64_t (*size)(void *source);
+
+  /* reads a segment from the block source. It is an error to read
+     beyond the end of the block */
+  int (*read_block)(void *source, struct block *dest, uint64_t off,
+                    uint32_t size);
+  /* mark the block as read; may return the data back to malloc */
+  void (*return_block)(void *source, struct block *blockp);
+
+  /* release all resources associated with the block source */
+  void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int block_source_from_file(struct block_source *block_src, const char *name);
+
+/* write_options sets options for writing a single reftable. */
+struct write_options {
+  /* do not pad out blocks to block size. */
+  bool unpadded;
+
+  /* the blocksize. Should be less than 2^24. */
+  uint32_t block_size;
+
+  /* do not generate a SHA1 => ref index. */
+  bool skip_index_objects;
+
+  /* how often to write complete keys in each block. */
+  int restart_interval;
+};
+
+/* ref_record holds a ref database entry target_value */
+struct ref_record {
+  char *ref_name;        /* Name of the ref, malloced. */
+  uint64_t update_index; /* Logical timestamp at which this value is written */
+  byte *value;           /* SHA1, or NULL. malloced. */
+  byte *target_value;    /* peeled annotated tag, or NULL. malloced. */
+  char *target;          /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+bool ref_record_is_deletion(const struct ref_record *ref);
+
+/* prints a ref_record onto stdout */
+void ref_record_print(struct ref_record *ref, int hash_size);
+
+/* frees and nulls all pointer values. */
+void ref_record_clear(struct ref_record *ref);
+
+/* returns whether two ref_records are the same */
+bool ref_record_equal(struct ref_record *a, struct ref_record *b,
+                      int hash_size);
+
+/* log_record holds a reflog entry */
+struct log_record {
+  char *ref_name;
+  uint64_t update_index;
+  byte *new_hash;
+  byte *old_hash;
+  char *name;
+  char *email;
+  uint64_t time;
+  int16_t tz_offset;
+  char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+bool log_record_is_deletion(const struct log_record *log);
+
+/* frees and nulls all pointer values. */
+void log_record_clear(struct log_record *log);
+
+/* returns whether two records are equal. */
+bool log_record_equal(struct log_record *a, struct log_record *b,
+                      int hash_size);
+
+void log_record_print(struct log_record *log, int hash_size);
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct iterator {
+  struct iterator_vtable *ops;
+  void *iter_arg;
+};
+
+/* reads the next ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_ref(struct iterator it, struct ref_record *ref);
+
+/* reads the next log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_log(struct iterator it, struct log_record *log);
+
+/* releases resources associated with an iterator. */
+void iterator_destroy(struct iterator *it);
+
+/* block_stats holds statistics for a single block type */
+struct block_stats {
+  /* total number of entries written */
+  int entries;
+  /* total number of key restarts */
+  int restarts;
+  /* total number of blocks */
+  int blocks;
+  /* total number of index blocks */
+  int index_blocks;
+  /* depth of the index */
+  int max_index_level;
+
+  /* offset of the first block for this type */
+  uint64_t offset;
+  /* offset of the top level index block for this type, or 0 if not present */
+  uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct stats {
+  /* total number of blocks written. */
+  int blocks;
+  /* stats for ref data */
+  struct block_stats ref_stats;
+  /* stats for the SHA1 to ref map. */
+  struct block_stats obj_stats;
+  /* stats for index blocks */
+  struct block_stats idx_stats;
+  /* stats for log blocks */
+  struct block_stats log_stats;
+
+  /* disambiguation length of shortened object IDs. */
+  int object_id_len;
+};
+
+/* different types of errors */
+
+/* Unexpected file system behavior */
+#define IO_ERROR -2
+
+/* Format inconsistency on reading data
+ */
+#define FORMAT_ERROR -3
+
+/* File does not exist. Returned from block_source_from_file(),  because it
+   needs special handling in stack.
+*/
+#define NOT_EXIST_ERROR -4
+
+/* Trying to write out-of-date data. */
+#define LOCK_ERROR -5
+
+/* Misuse of the API:
+   - on writing a ref_record outside the table limits
+   - on writing a ref or log record before the stack's next_update_index
+   - on reading a ref_record from log iterator, or vice versa.
+ */
+#define API_ERROR -6
+
+/* Decompression error */
+#define ZLIB_ERROR -7
+
+const char *error_str(int err);
+
+/* new_writer creates a new writer */
+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
+                          void *writer_arg, struct write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int fd_writer(void *fdp, byte *data, int size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   stack_next_update_index(), or API_ERROR is returned.
+ */
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max);
+
+/* adds a ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   writer_set_limits(), or API_ERROR is returned.
+ */
+int writer_add_ref(struct writer *w, struct ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n);
+
+/* adds a log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int writer_add_log(struct writer *w, struct log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int writer_add_logs(struct writer *w, struct log_record *logs, int n);
+
+
+/* writer_close finalizes the reftable. The writer is retained so statistics can
+ * be inspected. */
+int writer_close(struct writer *w);
+
+/* writer_stats returns the statistics on the reftable being written. */
+struct stats *writer_stats(struct writer *w);
+
+/* writer_free deallocates memory for the writer */
+void writer_free(struct writer *w);
+
+struct reader;
+
+/* new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp.  The name is used for creating a
+ * stack. Typically, it is the basename of the file.
+ */
+int new_reader(struct reader **pp, struct block_source, const char *name);
+
+/* reader_seek_ref returns an iterator where 'name' would be inserted in the
+   table.
+
+   example:
+
+   struct reader *r = NULL;
+   int err = new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct iterator it = {};
+   err = reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct ref_record ref = {};
+   while (1) {
+     err = iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   iterator_destroy(&it);
+   ref_record_clear(&ref);
+ */
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name);
+
+/* seek to logs for the given name, older than update_index. */
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+                       uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reader_free(struct reader *);
+
+/* return an iterator for the refs pointing to oid */
+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
+                    int oid_len);
+
+/* return the max_update_index for a table */
+uint64_t reader_max_update_index(struct reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reader_min_update_index(struct reader *r);
+
+/* a merged table is implements seeking/iterating over a stack of tables. */
+struct merged_table;
+
+/* new_merged_table creates a new merged table. It takes ownership of the stack
+   array.
+*/
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n);
+
+/* returns an iterator positioned just before 'name' */
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+                          const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+                             const char *name, uint64_t update_index);
+
+/* like merged_table_seek_log_at but look for the newest entry. */
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+                          const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t merged_max_update_index(struct merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t merged_min_update_index(struct merged_table *mt);
+
+/* closes readers for the merged tables */
+void merged_table_close(struct merged_table *mt);
+
+/* releases memory for the merged_table */
+void merged_table_free(struct merged_table *m);
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct stack;
+
+/* open a new reftable stack. The tables will be stored in 'dir', while the list
+   of tables is in 'list_file'. Typically, this should be .git/reftables and
+   .git/refs respectively.
+*/
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+              struct write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t stack_next_update_index(struct stack *st);
+
+/* add a new table to the stack. The write_table function must call
+   writer_set_limits, add refs and return an error value. */
+int stack_add(struct stack *st,
+              int (*write_table)(struct writer *wr, void *write_arg),
+              void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct merged_table *stack_merged_table(struct stack *st);
+
+/* frees all resources associated with the stack. */
+void stack_destroy(struct stack *st);
+
+/* reloads the stack if necessary. */
+int stack_reload(struct stack *st);
+
+/* Policy for expiring reflog entries. */
+struct log_expiry_config {
+  /* Drop entries older than this timestamp */
+  uint64_t time;
+
+  /* Drop older entries */
+  uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is non-NULL */
+int stack_compact_all(struct stack *st, struct log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int stack_auto_compact(struct stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_ref(struct stack *st, const char *refname,
+                   struct ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_log(struct stack *st, const char *refname,
+                   struct log_record *log);
+
+/* statistics on past compactions. */
+struct compaction_stats {
+  uint64_t bytes;
+  int attempts;
+  int failures;
+};
+
+struct compaction_stats *stack_compaction_stats(struct stack *st);
+
+#endif
diff --git a/reftable/reftable_test.c b/reftable/reftable_test.c
new file mode 100644
index 0000000000..8341654c49
--- /dev/null
+++ b/reftable/reftable_test.c
@@ -0,0 +1,434 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+
+#include <string.h>
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "record.h"
+#include "test_framework.h"
+
+static const int update_index = 5;
+
+void test_buffer(void) {
+  struct slice buf = {};
+
+  byte in[] = "hello";
+  slice_write(&buf, in, sizeof(in));
+  struct block_source source;
+  block_source_from_slice(&source, &buf);
+  assert(block_source_size(source) == 6);
+  struct block out = {};
+  int n = block_source_read_block(source, &out, 0, sizeof(in));
+  assert(n == sizeof(in));
+  assert(0 == memcmp(in, out.data, n));
+  block_source_return_block(source, &out);
+
+  n = block_source_read_block(source, &out, 1, 2);
+  assert(n == 2);
+  assert(0 == memcmp(out.data, "el", 2));
+
+  block_source_return_block(source, &out);
+  block_source_close(&source);
+  free(slice_yield(&buf));
+}
+
+void write_table(char ***names, struct slice *buf, int N, int block_size) {
+  *names = calloc(sizeof(char *), N + 1);
+
+  struct write_options opts = {
+      .block_size = block_size,
+  };
+
+  struct writer *w = new_writer(&slice_write_void, buf, &opts);
+
+  writer_set_limits(w, update_index, update_index);
+  {
+    struct ref_record ref = {};
+    for (int i = 0; i < N; i++) {
+      byte hash[SHA1_SIZE];
+      set_test_hash(hash, i);
+
+      char name[100];
+      sprintf(name, "refs/heads/branch%02d", i);
+
+      ref.ref_name = name;
+      ref.value = hash;
+      ref.update_index = update_index;
+      (*names)[i] = strdup(name);
+
+      fflush(stdout);
+      int n = writer_add_ref(w, &ref);
+      assert(n == 0);
+    }
+  }
+  int n = writer_close(w);
+  assert(n == 0);
+
+  struct stats *stats = writer_stats(w);
+  for (int i = 0; i < stats->ref_stats.blocks; i++) {
+    int off = i * opts.block_size;
+    if (off == 0) {
+      off = HEADER_SIZE;
+    }
+    assert(buf->buf[off] == 'r');
+  }
+
+  writer_free(w);
+  w = NULL;
+}
+
+void test_log_write_read(void) {
+  int N = 2;
+  char **names = calloc(sizeof(char *), N + 1);
+
+  struct write_options opts = {
+      .block_size = 256,
+  };
+
+  struct slice buf = {};
+  struct writer *w = new_writer(&slice_write_void, &buf, &opts);
+
+  writer_set_limits(w, 0, N);
+  {
+    struct ref_record ref = {};
+    for (int i = 0; i < N; i++) {
+      char name[256];
+      sprintf(name, "b%02d%0*d", i, 130, 7);
+      names[i] = strdup(name);
+      puts(name);
+      ref.ref_name = name;
+      ref.update_index = i;
+
+      int err = writer_add_ref(w, &ref);
+      assert_err(err);
+    }
+  }
+
+  {
+    struct log_record log = {};
+    for (int i = 0; i < N; i++) {
+      byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
+      set_test_hash(hash1, i);
+      set_test_hash(hash2, i + 1);
+
+      log.ref_name = names[i];
+      log.update_index = i;
+      log.old_hash = hash1;
+      log.new_hash = hash2;
+
+      int err = writer_add_log(w, &log);
+      assert_err(err);
+    }
+  }
+
+  int n = writer_close(w);
+  assert(n == 0);
+
+  struct stats *stats = writer_stats(w);
+  assert(stats->log_stats.blocks > 0);
+  writer_free(w);
+  w = NULL;
+
+  struct block_source source = {};
+  block_source_from_slice(&source, &buf);
+
+  struct reader rd = {};
+  int err = init_reader(&rd, source, "file.log");
+  assert(err == 0);
+
+  {
+    struct iterator it = {};
+    err = reader_seek_ref(&rd, &it, names[N-1]);
+    assert(err == 0);
+
+    struct ref_record ref = {};
+    err = iterator_next_ref(it, &ref);
+    assert_err(err);
+
+    // end of iteration.
+    err = iterator_next_ref(it, &ref);
+    assert(0 < err);
+
+    iterator_destroy(&it);
+    ref_record_clear(&ref);
+  }
+
+  {
+    struct iterator it = {};
+    err = reader_seek_log(&rd, &it, "");
+    assert(err == 0);
+
+    struct log_record log = {};
+    int i = 0;
+    while (true) {
+      int err = iterator_next_log(it, &log);
+      if (err > 0) {
+        break;
+      }
+
+      assert_err(err);
+      assert_streq(names[i], log.ref_name);
+      assert(i == log.update_index);
+      i++;
+    }
+
+    assert(i == N);
+    iterator_destroy(&it);
+  }
+
+  // cleanup.
+  free(slice_yield(&buf));
+  free_names(names);
+  reader_close(&rd);
+}
+
+void test_table_read_write_sequential(void) {
+  char **names;
+  struct slice buf = {};
+  int N = 50;
+  write_table(&names, &buf, N, 256);
+
+  struct block_source source = {};
+  block_source_from_slice(&source, &buf);
+
+  struct reader rd = {};
+  int err = init_reader(&rd, source, "file.ref");
+  assert(err == 0);
+
+  struct iterator it = {};
+  err = reader_seek_ref(&rd, &it, "");
+  assert(err == 0);
+
+  int j = 0;
+  while (true) {
+    struct ref_record ref = {};
+    int r = iterator_next_ref(it, &ref);
+    assert(r >= 0);
+    if (r > 0) {
+      break;
+    }
+    assert(0 == strcmp(names[j], ref.ref_name));
+    assert(update_index == ref.update_index);
+
+    j++;
+    ref_record_clear(&ref);
+  }
+  assert(j == N);
+  iterator_destroy(&it);
+  free(slice_yield(&buf));
+  free_names(names);
+
+  reader_close(&rd);
+}
+
+void test_table_write_small_table(void) {
+  char **names;
+  struct slice buf = {};
+  int N = 1;
+  write_table(&names, &buf, N, 4096);
+  assert(buf.len < 200);
+  free(slice_yield(&buf));
+  free_names(names);
+}
+
+void test_table_read_api(void) {
+  char **names;
+  struct slice buf = {};
+  int N = 50;
+  write_table(&names, &buf, N, 256);
+
+  struct reader rd = {};
+  struct block_source source = {};
+  block_source_from_slice(&source, &buf);
+
+  int err = init_reader(&rd, source, "file.ref");
+  assert(err == 0);
+
+  struct iterator it = {};
+  err = reader_seek_ref(&rd, &it, names[0]);
+  assert(err == 0);
+
+  struct log_record log = {};
+  err = iterator_next_log(it, &log);
+  assert(err == API_ERROR);
+
+  free(slice_yield(&buf));
+  for (int i = 0; i < N; i++) {
+    free(names[i]);
+  }
+  free(names);
+  reader_close(&rd);
+}
+
+void test_table_read_write_seek(bool index) {
+  char **names;
+  struct slice buf = {};
+  int N = 50;
+  write_table(&names, &buf, N, 256);
+
+  struct reader rd = {};
+  struct block_source source = {};
+  block_source_from_slice(&source, &buf);
+
+  int err = init_reader(&rd, source, "file.ref");
+  assert(err == 0);
+
+  if (!index) {
+    rd.ref_offsets.index_offset = 0;
+  }
+
+  for (int i = 1; i < N; i++) {
+    struct iterator it = {};
+    int err = reader_seek_ref(&rd, &it, names[i]);
+    assert(err == 0);
+    struct ref_record ref = {};
+    err = iterator_next_ref(it, &ref);
+    assert(err == 0);
+    assert(0 == strcmp(names[i], ref.ref_name));
+    assert(i == ref.value[0]);
+
+    ref_record_clear(&ref);
+    iterator_destroy(&it);
+  }
+
+  free(slice_yield(&buf));
+  for (int i = 0; i < N; i++) {
+    free(names[i]);
+  }
+  free(names);
+  reader_close(&rd);
+}
+
+void test_table_read_write_seek_linear(void) {
+  test_table_read_write_seek(false);
+}
+
+void test_table_read_write_seek_index(void) {
+  test_table_read_write_seek(true);
+}
+
+void test_table_refs_for(bool indexed) {
+  int N = 50;
+
+  char **want_names = calloc(sizeof(char *), N + 1);
+
+  int want_names_len = 0;
+  byte want_hash[SHA1_SIZE];
+  set_test_hash(want_hash, 4);
+
+  struct write_options opts = {
+      .block_size = 256,
+  };
+
+  struct slice buf = {};
+  struct writer *w = new_writer(&slice_write_void, &buf, &opts);
+  {
+    struct ref_record ref = {};
+    for (int i = 0; i < N; i++) {
+      byte hash[SHA1_SIZE];
+      memset(hash, i, sizeof(hash));
+      char fill[51] = {};
+      memset(fill, 'x', 50);
+      char name[100];
+      // Put the variable part in the start
+      sprintf(name, "br%02d%s", i, fill);
+      name[40] = 0;
+      ref.ref_name = name;
+
+      byte hash1[SHA1_SIZE];
+      byte hash2[SHA1_SIZE];
+
+      set_test_hash(hash1, i / 4);
+      set_test_hash(hash2, 3 + i / 4);
+      ref.value = hash1;
+      ref.target_value = hash2;
+
+      // 80 bytes / entry, so 3 entries per block. Yields 17 blocks.
+      int n = writer_add_ref(w, &ref);
+      assert(n == 0);
+
+      if (0 == memcmp(hash1, want_hash, SHA1_SIZE) ||
+          0 == memcmp(hash2, want_hash, SHA1_SIZE)) {
+        want_names[want_names_len++] = strdup(name);
+      }
+    }
+  }
+
+  int n = writer_close(w);
+  assert(n == 0);
+
+  writer_free(w);
+  w = NULL;
+
+  struct reader rd;
+  struct block_source source = {};
+  block_source_from_slice(&source, &buf);
+
+  int err = init_reader(&rd, source, "file.ref");
+  assert(err == 0);
+  if (!indexed) {
+    rd.obj_offsets.present = 0;
+  }
+
+  struct iterator it = {};
+  err = reader_seek_ref(&rd, &it, "");
+  assert(err == 0);
+  iterator_destroy(&it);
+
+  err = reader_refs_for(&rd, &it, want_hash, SHA1_SIZE);
+  assert(err == 0);
+
+  struct ref_record ref = {};
+
+  int j = 0;
+  while (true) {
+    int err = iterator_next_ref(it, &ref);
+    assert(err >= 0);
+    if (err > 0) {
+      break;
+    }
+
+    assert(j < want_names_len);
+    assert(0 == strcmp(ref.ref_name, want_names[j]));
+    j++;
+    ref_record_clear(&ref);
+  }
+  assert(j == want_names_len);
+
+  free(slice_yield(&buf));
+  free_names(want_names);
+  iterator_destroy(&it);
+  reader_close(&rd);
+}
+
+void test_table_refs_for_no_index(void) { test_table_refs_for(false); }
+
+void test_table_refs_for_obj_index(void) { test_table_refs_for(true); }
+
+int main() {
+  add_test_case("test_log_write_read", test_log_write_read);
+  add_test_case("test_table_write_small_table", &test_table_write_small_table);
+  add_test_case("test_buffer", &test_buffer);
+  add_test_case("test_table_read_api", &test_table_read_api);
+  add_test_case("test_table_read_write_sequential",
+                &test_table_read_write_sequential);
+  add_test_case("test_table_read_write_seek_linear",
+                &test_table_read_write_seek_linear);
+  add_test_case("test_table_read_write_seek_index",
+                &test_table_read_write_seek_index);
+  add_test_case("test_table_read_write_refs_for_no_index",
+                &test_table_refs_for_no_index);
+  add_test_case("test_table_read_write_refs_for_obj_index",
+                &test_table_refs_for_obj_index);
+  test_main();
+}
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 0000000000..d4e4b5c532
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,179 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include <assert.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str) {
+  if (str == NULL) {
+    s->len = 0;
+    return;
+  }
+
+  {
+    int l = strlen(str);
+    l++;  // \0
+    slice_resize(s, l);
+    memcpy(s->buf, str, l);
+    s->len = l - 1;
+  }
+}
+
+void slice_resize(struct slice *s, int l) {
+  if (s->cap < l) {
+    int c = s->cap * 2;
+    if (c < l) {
+      c = l;
+    }
+    s->cap = c;
+    s->buf = realloc(s->buf, s->cap);
+  }
+  s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s) {
+  int l1 = d->len;
+  int l2 = strlen(s);
+
+  slice_resize(d, l2 + l1);
+  memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a) {
+  int end = s->len;
+  slice_resize(s, s->len + a.len);
+  memcpy(s->buf + end, a.buf, a.len);
+}
+
+byte *slice_yield(struct slice *s) {
+  byte *p = s->buf;
+  s->buf = NULL;
+  s->cap = 0;
+  s->len = 0;
+  return p;
+}
+
+void slice_copy(struct slice *dest, struct slice src) {
+  slice_resize(dest, src.len);
+  memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s) {
+  if (s->cap == s->len) {
+    int l = s->len;
+    slice_resize(s, l + 1);
+    s->len = l;
+  }
+  s->buf[s->len] = 0;
+  return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in) {
+  struct slice s = {};
+  slice_resize(&s, in.len + 1);
+  s.buf[in.len] = 0;
+  memcpy(s.buf, in.buf, in.len);
+  return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b) {
+  if (a.len != b.len) {
+    return 0;
+  }
+  return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b) {
+  int min = a.len < b.len ? a.len : b.len;
+  int res = memcmp(a.buf, b.buf, min);
+  if (res != 0) {
+    return res;
+  }
+  if (a.len < b.len) {
+    return -1;
+  } else if (a.len > b.len) {
+    return 1;
+  } else {
+    return 0;
+  }
+}
+
+int slice_write(struct slice *b, byte *data, int sz) {
+  if (b->len + sz > b->cap) {
+    int newcap = 2 * b->cap + 1;
+    if (newcap < b->len + sz) {
+      newcap = (b->len + sz);
+    }
+    b->buf = realloc(b->buf, newcap);
+    b->cap = newcap;
+  }
+
+  memcpy(b->buf + b->len, data, sz);
+  b->len += sz;
+  return sz;
+}
+
+int slice_write_void(void *b, byte *data, int sz) {
+  return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b) { return ((struct slice *)b)->len; }
+
+static void slice_return_block(void *b, struct block *dest) {
+  memset(dest->data, 0xff, dest->len);
+  free(dest->data);
+}
+
+static void slice_close(void *b) {}
+
+static int slice_read_block(void *v, struct block *dest, uint64_t off,
+                            uint32_t size) {
+  struct slice *b = (struct slice *)v;
+  assert(off + size <= b->len);
+  dest->data = calloc(size, 1);
+  memcpy(dest->data, b->buf + off, size);
+  dest->len = size;
+  return size;
+}
+
+struct block_source_vtable slice_vtable = {
+    .size = &slice_size,
+    .read_block = &slice_read_block,
+    .return_block = &slice_return_block,
+    .close = &slice_close,
+};
+
+void block_source_from_slice(struct block_source *bs, struct slice *buf) {
+  bs->ops = &slice_vtable;
+  bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct block *dest) {
+  memset(dest->data, 0xff, dest->len);
+  free(dest->data);
+}
+
+struct block_source_vtable malloc_vtable = {
+    .return_block = &malloc_return_block,
+};
+
+struct block_source malloc_block_source_instance = {
+    .ops = &malloc_vtable,
+};
+
+struct block_source malloc_block_source(void) {
+  return malloc_block_source_instance;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 0000000000..6a304f31f4
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+struct slice {
+  byte *buf;
+  int len;
+  int cap;
+};
+
+void slice_set_string(struct slice *dest, const char *);
+void slice_append_string(struct slice *dest, const char *);
+char *slice_to_string(struct slice src);
+const char *slice_as_string(struct slice *src);
+bool slice_equal(struct slice a, struct slice b);
+byte *slice_yield(struct slice *s);
+void slice_copy(struct slice *dest, struct slice src);
+void slice_resize(struct slice *s, int l);
+int slice_compare(struct slice a, struct slice b);
+int slice_write(struct slice *b, byte *data, int sz);
+int slice_write_void(void *b, byte *data, int sz);
+void slice_append(struct slice *dest, struct slice add);
+
+struct block_source;
+void block_source_from_slice(struct block_source *bs, struct slice *buf);
+
+struct block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/slice_test.c b/reftable/slice_test.c
new file mode 100644
index 0000000000..c2c0545b70
--- /dev/null
+++ b/reftable/slice_test.c
@@ -0,0 +1,38 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include <assert.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+
+void test_slice(void) {
+  struct slice s = {};
+  slice_set_string(&s, "abc");
+  assert(0 == strcmp("abc", slice_as_string(&s)));
+
+  struct slice t = {};
+  slice_set_string(&t, "pqr");
+
+  slice_append(&s, t);
+  assert(0 == strcmp("abcpqr", slice_as_string(&s)));
+
+  free(slice_yield(&s));
+  free(slice_yield(&t));
+}
+
+int main() {
+  add_test_case("test_slice", &test_slice);
+  test_main();
+}
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 0000000000..641ac521dd
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,931 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include "merged.h"
+#include "reader.h"
+#include "reftable.h"
+#include "writer.h"
+
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+              struct write_options config) {
+  struct stack *p = calloc(sizeof(struct stack), 1);
+  int err = 0;
+  *dest = NULL;
+  p->list_file = strdup(list_file);
+  p->reftable_dir = strdup(dir);
+  p->config = config;
+
+  err = stack_reload(p);
+  if (err < 0) {
+    stack_destroy(p);
+  } else {
+    *dest = p;
+  }
+  return err;
+}
+
+static int fread_lines(FILE *f, char ***namesp) {
+  long size = 0;
+  int err = fseek(f, 0, SEEK_END);
+  char *buf = NULL;
+  if (err < 0) {
+    err = IO_ERROR;
+    goto exit;
+  }
+  size = ftell(f);
+  if (size < 0) {
+    err = IO_ERROR;
+    goto exit;
+  }
+  err = fseek(f, 0, SEEK_SET);
+  if (err < 0) {
+    err = IO_ERROR;
+    goto exit;
+  }
+
+  buf = malloc(size + 1);
+  if (fread(buf, 1, size, f) != size) {
+    err = IO_ERROR;
+    goto exit;
+  }
+  buf[size] = 0;
+
+  parse_names(buf, size, namesp);
+exit:
+  free(buf);
+  return err;
+}
+
+int read_lines(const char *filename, char ***namesp) {
+  FILE *f = fopen(filename, "r");
+  int err = 0;
+  if (f == NULL) {
+    if (errno == ENOENT) {
+      *namesp = calloc(sizeof(char *), 1);
+      return 0;
+    }
+
+    return IO_ERROR;
+  }
+  err = fread_lines(f, namesp);
+  fclose(f);
+  return err;
+}
+
+struct merged_table *stack_merged_table(struct stack *st) {
+  return st->merged;
+}
+
+/* Close and free the stack */
+void stack_destroy(struct stack *st) {
+  if (st->merged == NULL) {
+    return;
+  }
+
+  merged_table_close(st->merged);
+  merged_table_free(st->merged);
+  st->merged = NULL;
+
+  free(st->list_file);
+  st->list_file = NULL;
+  free(st->reftable_dir);
+  st->reftable_dir = NULL;
+  free(st);
+}
+
+static struct reader **stack_copy_readers(struct stack *st, int cur_len) {
+  struct reader **cur = calloc(sizeof(struct reader *), cur_len);
+  for (int i = 0; i < cur_len; i++) {
+    cur[i] = st->merged->stack[i];
+  }
+  return cur;
+}
+
+static int stack_reload_once(struct stack *st, char **names, bool reuse_open) {
+  int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+  struct reader **cur = stack_copy_readers(st, cur_len);
+  int err = 0;
+  int names_len = names_length(names);
+  struct reader **new_tables = malloc(sizeof(struct reader *) * names_len);
+  int new_tables_len = 0;
+  struct merged_table *new_merged = NULL;
+
+  struct slice table_path = {};
+
+  while (*names) {
+    struct reader *rd = NULL;
+    char *name = *names++;
+
+    // this is linear; we assume compaction keeps the number of tables
+    // under control so this is not quadratic.
+    for (int j = 0; reuse_open && j < cur_len; j++) {
+      if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+        rd = cur[j];
+        cur[j] = NULL;
+        break;
+      }
+    }
+
+    if (rd == NULL) {
+      struct block_source src = {};
+      slice_set_string(&table_path, st->reftable_dir);
+      slice_append_string(&table_path, "/");
+      slice_append_string(&table_path, name);
+
+      err = block_source_from_file(&src, slice_as_string(&table_path));
+      if (err < 0) {
+        goto exit;
+      }
+
+      err = new_reader(&rd, src, name);
+      if (err < 0) {
+        goto exit;
+      }
+    }
+
+    new_tables[new_tables_len++] = rd;
+  }
+
+  // success!
+  err = new_merged_table(&new_merged, new_tables, new_tables_len);
+  if (err < 0) {
+    goto exit;
+  }
+
+  new_tables = NULL;
+  new_tables_len = 0;
+  if (st->merged != NULL) {
+    merged_table_clear(st->merged);
+    merged_table_free(st->merged);
+  }
+  st->merged = new_merged;
+
+  for (int i = 0; i < cur_len; i++) {
+    if (cur[i] != NULL) {
+      reader_close(cur[i]);
+      reader_free(cur[i]);
+    }
+  }
+
+exit:
+  free(slice_yield(&table_path));
+  for (int i = 0; i < new_tables_len; i++) {
+    reader_close(new_tables[i]);
+  }
+  free(new_tables);
+  free(cur);
+  return err;
+}
+
+// return negative if a before b.
+static int tv_cmp(struct timeval *a, struct timeval *b) {
+  time_t diff = a->tv_sec - b->tv_sec;
+  suseconds_t udiff = a->tv_usec - b->tv_usec;
+
+  if (diff != 0) {
+    return diff;
+  }
+
+  return udiff;
+}
+
+static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open) {
+  struct timeval deadline = {};
+  int err = gettimeofday(&deadline, NULL);
+  int64_t delay = 0;
+  int tries = 0;
+  if (err < 0) {
+    return err;
+  }
+
+  deadline.tv_sec += 3;
+  while (true) {
+    char **names = NULL;
+    char **names_after = NULL;
+    struct timeval now = {};
+    int err = gettimeofday(&now, NULL);
+    if (err < 0) {
+      return err;
+    }
+
+    // Only look at deadlines after the first few times. This
+    // simplifies debugging in GDB
+    tries++;
+    if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+      break;
+    }
+
+    err = read_lines(st->list_file, &names);
+    if (err < 0) {
+      free_names(names);
+      return err;
+    }
+    err = stack_reload_once(st, names, reuse_open);
+    if (err == 0) {
+      free_names(names);
+      break;
+    }
+    if (err != NOT_EXIST_ERROR) {
+      free_names(names);
+      return err;
+    }
+
+    err = read_lines(st->list_file, &names_after);
+    if (err < 0) {
+      free_names(names);
+      return err;
+    }
+
+    if (names_equal(names_after, names)) {
+      free_names(names);
+      free_names(names_after);
+      return -1;
+    }
+    free_names(names);
+    free_names(names_after);
+
+    delay = delay + (delay * rand()) / RAND_MAX + 100;
+    usleep(delay);
+  }
+
+  return 0;
+}
+
+int stack_reload(struct stack *st) {
+  return stack_reload_maybe_reuse(st, true);
+}
+
+// -1 = error
+// 0 = up to date
+// 1 = changed.
+static int stack_uptodate(struct stack *st) {
+  char **names = NULL;
+  int err = read_lines(st->list_file, &names);
+  if (err < 0) {
+    return err;
+  }
+
+  for (int i = 0; i < st->merged->stack_len; i++) {
+    if (names[i] == NULL) {
+      err = 1;
+      goto exit;
+    }
+
+    if (0 != strcmp(st->merged->stack[i]->name, names[i])) {
+      err = 1;
+      goto exit;
+    }
+  }
+
+  if (names[st->merged->stack_len] != NULL) {
+    err = 1;
+    goto exit;
+  }
+
+exit:
+  free_names(names);
+  return err;
+}
+
+int stack_add(struct stack *st, int (*write)(struct writer *wr, void *arg),
+              void *arg) {
+  int err = stack_try_add(st, write, arg);
+  if (err < 0) {
+    if (err == LOCK_ERROR) {
+      err = stack_reload(st);
+    }
+    return err;
+  }
+
+  return stack_auto_compact(st);
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max) {
+  char buf[1024];
+  sprintf(buf, "%012lx-%012lx", min, max);
+  slice_set_string(dest, buf);
+}
+
+int stack_try_add(struct stack *st,
+                  int (*write_table)(struct writer *wr, void *arg), void *arg) {
+  struct slice lock_name = {};
+  struct slice temp_tab_name = {};
+  struct slice tab_name = {};
+  struct slice next_name = {};
+  struct slice table_list = {};
+  struct writer *wr = NULL;
+  int err = 0;
+  int tab_fd = 0;
+  int lock_fd = 0;
+  uint64_t next_update_index = 0;
+
+  slice_set_string(&lock_name, st->list_file);
+  slice_append_string(&lock_name, ".lock");
+
+  lock_fd =
+      open(slice_as_string(&lock_name), O_EXCL | O_CREAT | O_WRONLY, 0644);
+  if (lock_fd < 0) {
+    if (errno == EEXIST) {
+      err = LOCK_ERROR;
+      goto exit;
+    }
+    err = IO_ERROR;
+    goto exit;
+  }
+
+  err = stack_uptodate(st);
+  if (err < 0) {
+    goto exit;
+  }
+
+  if (err > 1) {
+    err = LOCK_ERROR;
+    goto exit;
+  }
+
+  next_update_index = stack_next_update_index(st);
+
+  slice_resize(&next_name, 0);
+  format_name(&next_name, next_update_index, next_update_index);
+
+  slice_set_string(&temp_tab_name, st->reftable_dir);
+  slice_append_string(&temp_tab_name, "/");
+  slice_append(&temp_tab_name, next_name);
+  slice_append_string(&temp_tab_name, "XXXXXX");
+
+  tab_fd = mkstemp((char *)slice_as_string(&temp_tab_name));
+  if (tab_fd < 0) {
+    err = IO_ERROR;
+    goto exit;
+  }
+
+  wr = new_writer(fd_writer, &tab_fd, &st->config);
+  err = write_table(wr, arg);
+  if (err < 0) {
+    goto exit;
+  }
+
+  err = writer_close(wr);
+  if (err < 0) {
+    goto exit;
+  }
+
+  err = close(tab_fd);
+  tab_fd = 0;
+  if (err < 0) {
+    err = IO_ERROR;
+    goto exit;
+  }
+
+  if (wr->min_update_index < next_update_index) {
+    err = API_ERROR;
+    goto exit;
+  }
+
+  for (int i = 0; i < st->merged->stack_len; i++) {
+    slice_append_string(&table_list, st->merged->stack[i]->name);
+    slice_append_string(&table_list, "\n");
+  }
+
+  format_name(&next_name, wr->min_update_index, wr->max_update_index);
+  slice_append_string(&next_name, ".ref");
+  slice_append(&table_list, next_name);
+  slice_append_string(&table_list, "\n");
+
+  slice_set_string(&tab_name, st->reftable_dir);
+  slice_append_string(&tab_name, "/");
+  slice_append(&tab_name, next_name);
+
+  err = rename(slice_as_string(&temp_tab_name), slice_as_string(&tab_name));
+  if (err < 0) {
+    err = IO_ERROR;
+    goto exit;
+  }
+  free(slice_yield(&temp_tab_name));
+
+  err = write(lock_fd, table_list.buf, table_list.len);
+  if (err < 0) {
+    err = IO_ERROR;
+    goto exit;
+  }
+  err = close(lock_fd);
+  lock_fd = 0;
+  if (err < 0) {
+    unlink(slice_as_string(&tab_name));
+    err = IO_ERROR;
+    goto exit;
+  }
+
+  err = rename(slice_as_string(&lock_name), st->list_file);
+  if (err < 0) {
+    unlink(slice_as_string(&tab_name));
+    err = IO_ERROR;
+    goto exit;
+  }
+
+  err = stack_reload(st);
+exit:
+  if (tab_fd > 0) {
+    close(tab_fd);
+    tab_fd = 0;
+  }
+  if (temp_tab_name.len > 0) {
+    unlink(slice_as_string(&temp_tab_name));
+  }
+  unlink(slice_as_string(&lock_name));
+
+  if (lock_fd > 0) {
+    close(lock_fd);
+    lock_fd = 0;
+  }
+
+  free(slice_yield(&lock_name));
+  free(slice_yield(&temp_tab_name));
+  free(slice_yield(&tab_name));
+  free(slice_yield(&next_name));
+  free(slice_yield(&table_list));
+  writer_free(wr);
+  return err;
+}
+
+uint64_t stack_next_update_index(struct stack *st) {
+  int sz = st->merged->stack_len;
+  if (sz > 0) {
+    return reader_max_update_index(st->merged->stack[sz - 1]) + 1;
+  }
+  return 1;
+}
+
+static int stack_compact_locked(struct stack *st, int first, int last,
+                                struct slice *temp_tab, struct log_expiry_config *config) {
+  struct slice next_name = {};
+  int tab_fd = -1;
+  struct writer *wr = NULL;
+  int err = 0;
+
+  format_name(&next_name, reader_min_update_index(st->merged->stack[first]),
+              reader_max_update_index(st->merged->stack[first]));
+
+  slice_set_string(temp_tab, st->reftable_dir);
+  slice_append_string(temp_tab, "/");
+  slice_append(temp_tab, next_name);
+  slice_append_string(temp_tab, "XXXXXX");
+
+  tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+  wr = new_writer(fd_writer, &tab_fd, &st->config);
+
+  err = stack_write_compact(st, wr, first, last, config);
+  if (err < 0) {
+    goto exit;
+  }
+  err = writer_close(wr);
+  if (err < 0) {
+    goto exit;
+  }
+  writer_free(wr);
+
+  err = close(tab_fd);
+  tab_fd = 0;
+
+exit:
+  if (tab_fd > 0) {
+    close(tab_fd);
+    tab_fd = 0;
+  }
+  if (err != 0 && temp_tab->len > 0) {
+    unlink(slice_as_string(temp_tab));
+    free(slice_yield(temp_tab));
+  }
+  free(slice_yield(&next_name));
+  return err;
+}
+
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+                        int last, struct log_expiry_config *config) {
+  int subtabs_len = last - first + 1;
+  struct reader **subtabs = calloc(sizeof(struct reader *), last - first + 1);
+  struct merged_table *mt = NULL;
+  int err = 0;
+  struct iterator it = {};
+  struct ref_record ref = {};
+  struct log_record log = {};
+
+  for (int i = first, j = 0; i <= last; i++) {
+    struct reader *t = st->merged->stack[i];
+    subtabs[j++] = t;
+    st->stats.bytes += t->size;
+  }
+  writer_set_limits(wr, st->merged->stack[first]->min_update_index,
+                    st->merged->stack[last]->max_update_index);
+
+  err = new_merged_table(&mt, subtabs, subtabs_len);
+  if (err < 0) {
+    free(subtabs);
+    goto exit;
+  }
+
+  err = merged_table_seek_ref(mt, &it, "");
+  if (err < 0) {
+    goto exit;
+  }
+
+  while (true) {
+    err = iterator_next_ref(it, &ref);
+    if (err > 0) {
+      err = 0;
+      break;
+    }
+    if (err < 0) {
+      break;
+    }
+    if (first == 0 && ref_record_is_deletion(&ref)) {
+      continue;
+    }
+
+    err = writer_add_ref(wr, &ref);
+    if (err < 0) {
+      break;
+    }
+  }
+
+  err = merged_table_seek_log(mt, &it, "");
+  if (err < 0) {
+    goto exit;
+  }
+
+  while (true) {
+    err = iterator_next_log(it, &log);
+    if (err > 0) {
+      err = 0;
+      break;
+    }
+    if (err < 0) {
+      break;
+    }
+    if (first == 0 && log_record_is_deletion(&log)) {
+      continue;
+    }
+
+    // XXX collect stats?
+
+    if (config != NULL && config->time > 0 && log.time < config->time) {
+      continue;
+    }
+
+    if (config != NULL && config->min_update_index > 0 && log.update_index < config->min_update_index) {
+      continue;
+    }
+
+    err = writer_add_log(wr, &log);
+    if (err < 0) {
+      break;
+    }
+  }
+
+
+exit:
+  iterator_destroy(&it);
+  if (mt != NULL) {
+    merged_table_clear(mt);
+    merged_table_free(mt);
+  }
+  ref_record_clear(&ref);
+
+  return err;
+}
+
+// <  0: error. 0 == OK, > 0 attempt failed; could retry.
+static int stack_compact_range(struct stack *st, int first, int last, struct log_expiry_config *expiry) {
+  struct slice temp_tab_name = {};
+  struct slice new_table_name = {};
+  struct slice lock_file_name = {};
+  struct slice ref_list_contents = {};
+  struct slice new_table_path = {};
+  int err = 0;
+  bool have_lock = false;
+  int lock_file_fd = 0;
+  int compact_count = last - first + 1;
+  char **delete_on_success = calloc(sizeof(char *), compact_count + 1);
+  char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
+
+  if (first > last || (expiry == NULL && first == last)) {
+    err = 0;
+    goto exit;
+  }
+
+  st->stats.attempts++;
+
+  slice_set_string(&lock_file_name, st->list_file);
+  slice_append_string(&lock_file_name, ".lock");
+
+  lock_file_fd =
+      open(slice_as_string(&lock_file_name), O_EXCL | O_CREAT | O_WRONLY, 0644);
+  if (lock_file_fd < 0) {
+    if (errno == EEXIST) {
+      err = 1;
+    } else {
+      err = IO_ERROR;
+    }
+    goto exit;
+  }
+  have_lock = true;
+  err = stack_uptodate(st);
+  if (err != 0) {
+    goto exit;
+  }
+
+  for (int i = first, j = 0; i <= last; i++) {
+    struct slice subtab_name = {};
+    struct slice subtab_lock = {};
+    slice_set_string(&subtab_name, st->reftable_dir);
+    slice_append_string(&subtab_name, "/");
+    slice_append_string(&subtab_name, reader_name(st->merged->stack[i]));
+
+    slice_copy(&subtab_lock, subtab_name);
+    slice_append_string(&subtab_lock, ".lock");
+
+    {
+      int sublock_file_fd = open(slice_as_string(&subtab_lock),
+                                 O_EXCL | O_CREAT | O_WRONLY, 0644);
+      if (sublock_file_fd > 0) {
+        close(sublock_file_fd);
+      } else if (sublock_file_fd < 0) {
+        if (errno == EEXIST) {
+          err = 1;
+        }
+        err = IO_ERROR;
+      }
+    }
+
+    subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+    delete_on_success[j] = (char *)slice_as_string(&subtab_name);
+    j++;
+
+    if (err != 0) {
+      goto exit;
+    }
+  }
+
+  err = unlink(slice_as_string(&lock_file_name));
+  if (err < 0) {
+    goto exit;
+  }
+  have_lock = false;
+
+  err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
+  if (err < 0) {
+    goto exit;
+  }
+
+  lock_file_fd =
+      open(slice_as_string(&lock_file_name), O_EXCL | O_CREAT | O_WRONLY, 0644);
+  if (lock_file_fd < 0) {
+    if (errno == EEXIST) {
+      err = 1;
+    } else {
+      err = IO_ERROR;
+    }
+    goto exit;
+  }
+  have_lock = true;
+
+  format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+              st->merged->stack[last]->max_update_index);
+  slice_append_string(&new_table_name, ".ref");
+
+  slice_set_string(&new_table_path, st->reftable_dir);
+  slice_append_string(&new_table_path, "/");
+
+  slice_append(&new_table_path, new_table_name);
+
+  err =
+      rename(slice_as_string(&temp_tab_name), slice_as_string(&new_table_path));
+  if (err < 0) {
+    goto exit;
+  }
+
+  for (int i = 0; i < first; i++) {
+    slice_append_string(&ref_list_contents, st->merged->stack[i]->name);
+    slice_append_string(&ref_list_contents, "\n");
+  }
+  slice_append(&ref_list_contents, new_table_name);
+  slice_append_string(&ref_list_contents, "\n");
+  for (int i = last + 1; i < st->merged->stack_len; i++) {
+    slice_append_string(&ref_list_contents, st->merged->stack[i]->name);
+    slice_append_string(&ref_list_contents, "\n");
+  }
+
+  err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+  if (err < 0) {
+    unlink(slice_as_string(&new_table_path));
+    goto exit;
+  }
+  err = close(lock_file_fd);
+  lock_file_fd = 0;
+  if (err < 0) {
+    unlink(slice_as_string(&new_table_path));
+    goto exit;
+  }
+
+  err = rename(slice_as_string(&lock_file_name), st->list_file);
+  if (err < 0) {
+    unlink(slice_as_string(&new_table_path));
+    goto exit;
+  }
+  have_lock = false;
+
+  for (char **p = delete_on_success; *p; p++) {
+    if (0 != strcmp(*p, slice_as_string(&new_table_path))) {
+      unlink(*p);
+    }
+  }
+
+  err = stack_reload_maybe_reuse(st, first < last);
+exit:
+  for (char **p = subtable_locks; *p; p++) {
+    unlink(*p);
+  }
+  free_names(delete_on_success);
+  free_names(subtable_locks);
+  if (lock_file_fd > 0) {
+    close(lock_file_fd);
+    lock_file_fd = 0;
+  }
+  if (have_lock) {
+    unlink(slice_as_string(&lock_file_name));
+  }
+  free(slice_yield(&new_table_name));
+  free(slice_yield(&new_table_path));
+  free(slice_yield(&ref_list_contents));
+  free(slice_yield(&temp_tab_name));
+  free(slice_yield(&lock_file_name));
+  return err;
+}
+
+int stack_compact_all(struct stack *st, struct log_expiry_config *config) {
+  return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct stack *st, int first, int last, struct log_expiry_config *config) {
+  int err = stack_compact_range(st, first, last, config);
+  if (err > 0) {
+    st->stats.failures++;
+  }
+  return err;
+}
+
+static int segment_size(struct segment *s) { return s->end - s->start; }
+
+int fastlog2(uint64_t sz) {
+  int l = 0;
+  assert(sz > 0);
+  for (; sz; sz /= 2) {
+    l++;
+  }
+  return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n) {
+  struct segment *segs = calloc(sizeof(struct segment), n);
+  int next = 0;
+  struct segment cur = {};
+  for (int i = 0; i < n; i++) {
+    int log = fastlog2(sizes[i]);
+    if (cur.log != log && cur.bytes > 0) {
+      struct segment fresh = {
+          .start = i,
+      };
+
+      segs[next++] = cur;
+      cur = fresh;
+    }
+
+    cur.log = log;
+    cur.end = i + 1;
+    cur.bytes += sizes[i];
+  }
+
+  segs[next++] = cur;
+  *seglen = next;
+  return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n) {
+  int seglen = 0;
+  struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+  struct segment min_seg = {
+      .log = 64,
+  };
+  for (int i = 0; i < seglen; i++) {
+    if (segment_size(&segs[i]) == 1) {
+      continue;
+    }
+
+    if (segs[i].log < min_seg.log) {
+      min_seg = segs[i];
+    }
+  }
+
+  while (min_seg.start > 0) {
+    int prev = min_seg.start - 1;
+    if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+      break;
+    }
+
+    min_seg.start = prev;
+    min_seg.bytes += sizes[prev];
+  }
+
+  free(segs);
+  return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct stack *st) {
+  uint64_t *sizes = calloc(sizeof(uint64_t), st->merged->stack_len);
+  for (int i = 0; i < st->merged->stack_len; i++) {
+    // overhead is 24 + 68 = 92.
+    sizes[i] = st->merged->stack[i]->size - 91;
+  }
+  return sizes;
+}
+
+int stack_auto_compact(struct stack *st) {
+  uint64_t *sizes = stack_table_sizes_for_compaction(st);
+  struct segment seg = suggest_compaction_segment(sizes, st->merged->stack_len);
+  free(sizes);
+  if (segment_size(&seg) > 0) {
+    return stack_compact_range_stats(st, seg.start, seg.end - 1, NULL);
+  }
+
+  return 0;
+}
+
+struct compaction_stats *stack_compaction_stats(struct stack *st) {
+  return &st->stats;
+}
+
+int stack_read_ref(struct stack *st, const char *refname,
+                   struct ref_record *ref) {
+  struct iterator it = {};
+  struct merged_table *mt = stack_merged_table(st);
+  int err = merged_table_seek_ref(mt, &it, refname);
+  if (err) {
+    goto exit;
+  }
+
+  err = iterator_next_ref(it, ref);
+  if (err) {
+    goto exit;
+  }
+
+  if (0 != strcmp(ref->ref_name, refname) || ref_record_is_deletion(ref)) {
+    err = 1;
+    goto exit;
+  }
+
+exit:
+  iterator_destroy(&it);
+  return err;
+}
+
+int stack_read_log(struct stack *st, const char *refname,
+                   struct log_record *log) {
+  struct iterator it = {};
+  struct merged_table *mt = stack_merged_table(st);
+  int err = merged_table_seek_log(mt, &it, refname);
+  if (err) {
+    goto exit;
+  }
+
+  err = iterator_next_log(it, log);
+  if (err) {
+    goto exit;
+  }
+
+  if (0 != strcmp(log->ref_name, refname) || log_record_is_deletion(log)) {
+    err = 1;
+    goto exit;
+  }
+
+exit:
+  iterator_destroy(&it);
+  return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 0000000000..092a03b894
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,40 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+
+struct stack {
+  char *list_file;
+  char *reftable_dir;
+
+  struct write_options config;
+
+  struct merged_table *merged;
+  struct compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct stack *st,
+                  int (*write_table)(struct writer *wr, void *arg), void *arg);
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+                        int last, struct log_expiry_config *config);
+int fastlog2(uint64_t sz);
+
+struct segment {
+  int start, end;
+  int log;
+  uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
new file mode 100644
index 0000000000..8c0cde0579
--- /dev/null
+++ b/reftable/stack_test.c
@@ -0,0 +1,265 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include <string.h>
+#include <unistd.h>
+
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+
+void test_read_file(void) {
+  char fn[256] = "/tmp/stack.XXXXXX";
+  int fd = mkstemp(fn);
+  assert(fd > 0);
+
+  char out[1024] = "line1\n\nline2\nline3";
+
+  int n = write(fd, out, strlen(out));
+  assert(n == strlen(out));
+  int err = close(fd);
+  assert(err >= 0);
+
+  char **names = NULL;
+  err = read_lines(fn, &names);
+  assert_err(err);
+
+  char *want[] = {"line1", "line2", "line3"};
+  for (int i = 0; names[i] != NULL; i++) {
+    assert(0 == strcmp(want[i], names[i]));
+  }
+  free_names(names);
+  remove(fn);
+}
+
+void test_parse_names(void) {
+  char buf[] = "line\n";
+  char **names = NULL;
+  parse_names(buf, strlen(buf), &names);
+
+  assert(NULL != names[0]);
+  assert(0 == strcmp(names[0], "line"));
+  assert(NULL == names[1]);
+  free_names(names);
+}
+
+void test_names_equal(void) {
+  char *a[] = {"a", "b", "c", NULL};
+  char *b[] = {"a", "b", "d", NULL};
+  char *c[] = {"a", "b", NULL};
+
+  assert(names_equal(a, a));
+  assert(!names_equal(a, b));
+  assert(!names_equal(a, c));
+}
+
+int write_test_ref(struct writer *wr, void *arg) {
+  struct ref_record *ref = arg;
+
+  writer_set_limits(wr, ref->update_index, ref->update_index);
+  int err = writer_add_ref(wr, ref);
+
+  return err;
+}
+
+int write_test_log(struct writer *wr, void *arg) {
+  struct log_record *log = arg;
+
+  writer_set_limits(wr, log->update_index, log->update_index);
+  int err = writer_add_log(wr, log);
+
+  return err;
+}
+
+void test_stack_add(void) {
+  char dir[256] = "/tmp/stack.XXXXXX";
+  assert(mkdtemp(dir));
+  printf("%s\n", dir);
+  char fn[256] = "";
+  strcat(fn, dir);
+  strcat(fn, "/refs");
+
+  struct write_options cfg = {};
+  struct stack *st = NULL;
+  int err = new_stack(&st, dir, fn, cfg);
+  assert_err(err);
+
+  struct ref_record refs[2] = {};
+  struct log_record logs[2] = {};
+  int N = ARRAYSIZE(refs);
+  for (int i = 0; i < N; i++) {
+    char buf[256];
+    sprintf(buf, "branch%02d", i);
+    refs[i].ref_name = strdup(buf);
+    refs[i].value = malloc(SHA1_SIZE);
+    refs[i].update_index = i + 1;
+    set_test_hash(refs[i].value, i);
+
+    logs[i].ref_name = strdup(buf);
+    logs[i].update_index = N + i+1;
+    logs[i].new_hash = malloc(SHA1_SIZE);
+    logs[i].email = strdup("identity@invalid");
+    set_test_hash(logs[i].new_hash, i);
+  }
+
+  for (int i = 0; i < N; i++) {
+    int err = stack_add(st, &write_test_ref, &refs[i]);
+    assert_err(err);
+  }
+
+  for (int i = 0; i < N; i++) {
+    int err = stack_add(st, &write_test_log, &logs[i]);
+    assert_err(err);
+  }
+
+  err = stack_compact_all(st, NULL);
+  assert_err(err);
+
+  for (int i = 0; i < N; i++) {
+    struct ref_record dest = {};
+    int err = stack_read_ref(st, refs[i].ref_name, &dest);
+    assert_err(err);
+    assert(ref_record_equal(&dest, refs + i, SHA1_SIZE));
+    ref_record_clear(&dest);
+  }
+
+  for (int i = 0; i < N; i++) {
+    struct log_record dest = {};
+    int err = stack_read_log(st, refs[i].ref_name, &dest);
+    assert_err(err);
+    assert(log_record_equal(&dest, logs + i, SHA1_SIZE));
+    log_record_clear(&dest);
+  }
+
+  // cleanup
+  stack_destroy(st);
+  for (int i = 0; i < N; i++) {
+    ref_record_clear(&refs[i]);
+    log_record_clear(&logs[i]);
+  }
+}
+
+void test_log2(void) {
+  assert(1 == fastlog2(3));
+  assert(2 == fastlog2(4));
+  assert(2 == fastlog2(5));
+}
+
+void test_sizes_to_segments(void) {
+  uint64_t sizes[] = {2, 3, 4, 5, 7, 9};
+  // .................0  1  2  3  4  5
+
+  int seglen = 0;
+  struct segment *segs = sizes_to_segments(&seglen, sizes, ARRAYSIZE(sizes));
+  assert(segs[2].log == 3);
+  assert(segs[2].start == 5);
+  assert(segs[2].end == 6);
+
+  assert(segs[1].log == 2);
+  assert(segs[1].start == 2);
+  assert(segs[1].end == 5);
+  free(segs);
+}
+
+void test_suggest_compaction_segment(void) {
+  {
+    uint64_t sizes[] = {128, 64, 17, 16, 9, 9, 9, 16, 16};
+    // .................0    1    2  3   4  5  6
+    struct segment min = suggest_compaction_segment(sizes, ARRAYSIZE(sizes));
+    assert(min.start == 2);
+    assert(min.end == 7);
+  }
+
+  {
+    uint64_t sizes[] = {64, 32, 16, 8, 4, 2};
+    struct segment result = suggest_compaction_segment(sizes, ARRAYSIZE(sizes));
+    assert(result.start == result.end);
+  }
+}
+
+void test_reflog_expire(void) {
+  char dir[256] = "/tmp/stack.XXXXXX";
+  assert(mkdtemp(dir));
+  printf("%s\n", dir);
+  char fn[256] = "";
+  strcat(fn, dir);
+  strcat(fn, "/refs");
+
+  struct write_options cfg = {};
+  struct stack *st = NULL;
+  int err = new_stack(&st, dir, fn, cfg);
+  assert_err(err);
+
+  struct log_record logs[20] = {};
+  int N = ARRAYSIZE(logs) -1;
+  for (int i = 1; i <= N; i++) {
+    char buf[256];
+    sprintf(buf, "branch%02d", i);
+
+    logs[i].ref_name = strdup(buf);
+    logs[i].update_index = i;
+    logs[i].time = i;
+    logs[i].new_hash = malloc(SHA1_SIZE);
+    logs[i].email = strdup("identity@invalid");
+    set_test_hash(logs[i].new_hash, i);
+  }
+
+  for (int i = 1; i <= N; i++) {
+    int err = stack_add(st, &write_test_log, &logs[i]);
+    assert_err(err);
+  }
+
+  err = stack_compact_all(st, NULL);
+  assert_err(err);
+
+  struct log_expiry_config expiry = {
+    .time = 10,
+  };
+  err = stack_compact_all(st, &expiry);
+  assert_err(err);
+
+  struct log_record log = {};
+  err = stack_read_log(st, logs[9].ref_name, &log);
+  assert(err == 1);
+
+  err = stack_read_log(st, logs[11].ref_name, &log);
+  assert_err(err);
+
+  expiry.min_update_index = 15;
+  err = stack_compact_all(st, &expiry);
+  assert_err(err);
+
+  err = stack_read_log(st, logs[14].ref_name, &log);
+  assert(err == 1);
+
+  err = stack_read_log(st, logs[16].ref_name, &log);
+  assert_err(err);
+
+  // cleanup
+  stack_destroy(st);
+  for (int i = 0; i < N; i++) {
+    log_record_clear(&logs[i]);
+  }
+}
+
+int main() {
+  add_test_case("test_reflog_expire", test_reflog_expire);
+  add_test_case("test_suggest_compaction_segment",
+                &test_suggest_compaction_segment);
+  add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
+  add_test_case("test_log2", &test_log2);
+  add_test_case("test_parse_names", &test_parse_names);
+  add_test_case("test_read_file", &test_read_file);
+  add_test_case("test_names_equal", &test_names_equal);
+  add_test_case("test_stack_add", &test_stack_add);
+  test_main();
+}
diff --git a/reftable/test_framework.c b/reftable/test_framework.c
new file mode 100644
index 0000000000..4ebf886f35
--- /dev/null
+++ b/reftable/test_framework.c
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "test_framework.h"
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "constants.h"
+
+struct test_case **test_cases;
+int test_case_len;
+int test_case_cap;
+
+struct test_case *new_test_case(const char *name, void (*testfunc)()) {
+  struct test_case *tc = malloc(sizeof(struct test_case));
+  tc->name = name;
+  tc->testfunc = testfunc;
+  return tc;
+}
+
+struct test_case *add_test_case(const char *name, void (*testfunc)()) {
+  struct test_case *tc = new_test_case(name, testfunc);
+  if (test_case_len == test_case_cap) {
+    test_case_cap = 2 * test_case_cap + 1;
+    test_cases = realloc(test_cases, sizeof(struct test_case) * test_case_cap);
+  }
+
+  test_cases[test_case_len++] = tc;
+  return tc;
+}
+
+void test_main() {
+  for (int i = 0; i < test_case_len; i++) {
+    printf("case %s\n", test_cases[i]->name);
+    test_cases[i]->testfunc();
+  }
+}
+
+void set_test_hash(byte *p, int i) { memset(p, (byte)i, SHA1_SIZE); }
+
+void print_names(char **a) {
+  if (a == NULL || *a == NULL) {
+    puts("[]");
+    return;
+  }
+  puts("[");
+  char **p = a;
+  while (*p) {
+    puts(*p);
+    p++;
+  }
+  puts("]");
+}
diff --git a/reftable/test_framework.h b/reftable/test_framework.h
new file mode 100644
index 0000000000..b9defee874
--- /dev/null
+++ b/reftable/test_framework.h
@@ -0,0 +1,63 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TEST_FRAMEWORK_H
+#define TEST_FRAMEWORK_H
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "reftable.h"
+
+#ifdef NDEBUG
+#undef NDEBUG
+#endif
+
+#include <assert.h>
+
+#ifdef assert
+#undef assert
+#endif
+
+#define assert_err(c)                                                        \
+  if (c != 0) {                                                              \
+    fflush(stderr);                                                          \
+    fflush(stdout);                                                          \
+    fprintf(stderr, "%s: %d: error == %d (%s), want 0\n", __FILE__, __LINE__, c, error_str(c)); \
+    abort();                                                                 \
+  }
+
+#define assert_streq(a, b)                                                    \
+  if (0 != strcmp(a, b)) {                                                    \
+    fflush(stderr);                                                           \
+    fflush(stdout);                                                           \
+    fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, __LINE__, #a, a, \
+            #b, b);                                                           \
+    abort();                                                                  \
+  }
+
+#define assert(c)                                                             \
+  if (!(c)) {                                                                 \
+    fflush(stderr);                                                           \
+    fflush(stdout);                                                           \
+    fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, __LINE__, #c); \
+    abort();                                                                  \
+  }
+
+struct test_case {
+  const char *name;
+  void (*testfunc)();
+};
+
+struct test_case *new_test_case(const char *name, void (*testfunc)());
+struct test_case *add_test_case(const char *name, void (*testfunc)());
+void test_main();
+
+void set_test_hash(byte *p, int i);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 0000000000..4bd9dce7c3
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include <stdlib.h>
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+                              int (*compare)(const void *, const void *),
+                              int insert) {
+  if (*rootp == NULL) {
+    if (!insert) {
+      return NULL;
+    } else {
+      struct tree_node *n = calloc(sizeof(struct tree_node), 1);
+      n->key = key;
+      *rootp = n;
+      return *rootp;
+    }
+  }
+
+  {
+    int res = compare(key, (*rootp)->key);
+    if (res < 0) {
+      return tree_search(key, &(*rootp)->left, compare, insert);
+    } else if (res > 0) {
+      return tree_search(key, &(*rootp)->right, compare, insert);
+    }
+  }
+  return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+                void *arg) {
+  if (t->left != NULL) {
+    infix_walk(t->left, action, arg);
+  }
+  action(arg, t->key);
+  if (t->right != NULL) {
+    infix_walk(t->right, action, arg);
+  }
+}
+
+void tree_free(struct tree_node *t) {
+  if (t == NULL) {
+    return;
+  }
+  if (t->left != NULL) {
+    tree_free(t->left);
+  }
+  if (t->right != NULL) {
+    tree_free(t->right);
+  }
+  free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 0000000000..e37b8215c4
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,24 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+struct tree_node {
+  void *key;
+  struct tree_node *left, *right;
+};
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+                              int (*compare)(const void *, const void *),
+                              int insert);
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+                void *arg);
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/tree_test.c b/reftable/tree_test.c
new file mode 100644
index 0000000000..ee4b6530d5
--- /dev/null
+++ b/reftable/tree_test.c
@@ -0,0 +1,54 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+
+static int test_compare(const void *a, const void *b) { return a - b; }
+
+struct curry {
+  void *last;
+};
+
+void check_increasing(void *arg, void *key) {
+  struct curry *c = (struct curry *)arg;
+  if (c->last != NULL) {
+    assert(test_compare(c->last, key) < 0);
+  }
+  c->last = key;
+}
+
+void test_tree() {
+  struct tree_node *root = NULL;
+
+  void *values[11] = {};
+  struct tree_node *nodes[11] = {};
+  int i = 1;
+  do {
+    nodes[i] = tree_search(values + i, &root, &test_compare, 1);
+    i = (i * 7) % 11;
+  } while (i != 1);
+
+  for (int i = 1; i < ARRAYSIZE(nodes); i++) {
+    assert(values + i == nodes[i]->key);
+    assert(nodes[i] == tree_search(values + i, &root, &test_compare, 0));
+  }
+
+  struct curry c = {};
+  infix_walk(root, check_increasing, &c);
+  tree_free(root);
+}
+
+int main() {
+  add_test_case("test_tree", &test_tree);
+  test_main();
+}
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 0000000000..542ad0fa00
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,586 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>  // debug
+#include <stdlib.h>
+#include <string.h>
+#include <zlib.h>
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct block_stats *writer_block_stats(struct writer *w, byte typ) {
+  switch (typ) {
+    case 'r':
+      return &w->stats.ref_stats;
+    case 'o':
+      return &w->stats.obj_stats;
+    case 'i':
+      return &w->stats.idx_stats;
+    case 'g':
+      return &w->stats.log_stats;
+  }
+  assert(false);
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct writer *w, byte *data, size_t len, int padding) {
+  int n = 0;
+  if (w->pending_padding > 0) {
+    byte *zeroed = calloc(w->pending_padding, 1);
+    int n = w->write(w->write_arg, zeroed, w->pending_padding);
+    if (n < 0) {
+      return n;
+    }
+
+    w->pending_padding = 0;
+    free(zeroed);
+  }
+
+  w->pending_padding = padding;
+  n = w->write(w->write_arg, data, len);
+  if (n < 0) {
+    return n;
+  }
+  n += padding;
+  return 0;
+}
+
+static void options_set_defaults(struct write_options *opts) {
+  if (opts->restart_interval == 0) {
+    opts->restart_interval = 16;
+  }
+
+  if (opts->block_size == 0) {
+    opts->block_size = DEFAULT_BLOCK_SIZE;
+  }
+}
+
+static int writer_write_header(struct writer *w, byte *dest) {
+  strcpy((char *)dest, "REFT");
+  dest[4] = 1;  // version
+  put_u24(dest + 5, w->opts.block_size);
+  put_u64(dest + 8, w->min_update_index);
+  put_u64(dest + 16, w->max_update_index);
+  return 24;
+}
+
+static void writer_reinit_block_writer(struct writer *w, byte typ) {
+  int block_start = 0;
+  if (w->next == 0) {
+    block_start = HEADER_SIZE;
+  }
+
+  block_writer_init(&w->block_writer_data, typ, w->block, w->opts.block_size,
+                    block_start, w->hash_size);
+  w->block_writer = &w->block_writer_data;
+  w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
+                          void *writer_arg, struct write_options *opts) {
+  struct writer *wp = calloc(sizeof(struct writer), 1);
+  options_set_defaults(opts);
+  if (opts->block_size >= (1 << 24)) {
+    // TODO - error return?
+    abort();
+  }
+  wp->hash_size = SHA1_SIZE;
+  wp->block = calloc(opts->block_size, 1);
+  wp->write = writer_func;
+  wp->write_arg = writer_arg;
+  wp->opts = *opts;
+  writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+  return wp;
+}
+
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max) {
+  w->min_update_index = min;
+  w->max_update_index = max;
+}
+
+void writer_free(struct writer *w) {
+  free(w->block);
+  free(w);
+}
+
+struct obj_index_tree_node {
+  struct slice hash;
+  uint64_t *offsets;
+  int offset_len;
+  int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b) {
+  return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+                       ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct writer *w, struct slice hash) {
+  uint64_t off = w->next;
+
+  struct obj_index_tree_node want = {.hash = hash};
+
+  struct tree_node *node =
+      tree_search(&want, &w->obj_index_tree, &obj_index_tree_node_compare, 0);
+  struct obj_index_tree_node *key = NULL;
+  if (node == NULL) {
+    key = calloc(sizeof(struct obj_index_tree_node), 1);
+    slice_copy(&key->hash, hash);
+    tree_search((void *)key, &w->obj_index_tree, &obj_index_tree_node_compare,
+                1);
+  } else {
+    key = node->key;
+  }
+
+  if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+    return;
+  }
+
+  if (key->offset_len == key->offset_cap) {
+    key->offset_cap = 2 * key->offset_cap + 1;
+    key->offsets = realloc(key->offsets, sizeof(uint64_t) * key->offset_cap);
+  }
+
+  key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct writer *w, struct record rec) {
+  int result = -1;
+  struct slice key = {};
+  int err = 0;
+  record_key(rec, &key);
+  if (slice_compare(w->last_key, key) >= 0) {
+    goto exit;
+  }
+
+  slice_copy(&w->last_key, key);
+  if (w->block_writer == NULL) {
+    writer_reinit_block_writer(w, record_type(rec));
+  }
+
+  assert(block_writer_type(w->block_writer) == record_type(rec));
+
+  if (block_writer_add(w->block_writer, rec) == 0) {
+    result = 0;
+    goto exit;
+  }
+
+  err = writer_flush_block(w);
+  if (err < 0) {
+    result = err;
+    goto exit;
+  }
+
+  writer_reinit_block_writer(w, record_type(rec));
+  err = block_writer_add(w->block_writer, rec);
+  if (err < 0) {
+    result = err;
+    goto exit;
+  }
+
+  result = 0;
+exit:
+  free(slice_yield(&key));
+  return result;
+}
+
+int writer_add_ref(struct writer *w, struct ref_record *ref) {
+  struct record rec = {};
+  struct ref_record copy = *ref;
+  int err = 0;
+
+  if (ref->update_index < w->min_update_index ||
+      ref->update_index > w->max_update_index) {
+    return API_ERROR;
+  }
+
+  record_from_ref(&rec, &copy);
+  copy.update_index -= w->min_update_index;
+  err = writer_add_record(w, rec);
+  if (err < 0) {
+    return err;
+  }
+
+  if (!w->opts.skip_index_objects && ref->value != NULL) {
+    struct slice h = {
+        .buf = ref->value,
+        .len = w->hash_size,
+    };
+
+    writer_index_hash(w, h);
+  }
+  if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+    struct slice h = {
+        .buf = ref->target_value,
+        .len = w->hash_size,
+    };
+    writer_index_hash(w, h);
+  }
+  return 0;
+}
+
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n) {
+  int err = 0;
+  qsort(refs, n, sizeof(struct ref_record), ref_record_compare_name);
+  for (int i = 0; err == 0 && i < n; i++) {
+    err = writer_add_ref(w, &refs[i]);
+  }
+  return err;
+}
+
+int writer_add_log(struct writer *w, struct log_record *log) {
+  if (w->block_writer != NULL &&
+      block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+    int err = writer_finish_public_section(w);
+    if (err < 0) {
+      return err;
+    }
+  }
+
+  {
+    struct record rec = {};
+    int err;
+    record_from_log(&rec, log);
+    err = writer_add_record(w, rec);
+    return err;
+  }
+}
+
+int writer_add_logs(struct writer *w, struct log_record *logs, int n) {
+  int err = 0;
+  qsort(logs, n, sizeof(struct log_record), log_record_compare_key);
+  for (int i = 0; err == 0 && i < n; i++) {
+    err = writer_add_log(w, &logs[i]);
+  }
+  return err;
+}
+
+
+static int writer_finish_section(struct writer *w) {
+  byte typ = block_writer_type(w->block_writer);
+  uint64_t index_start = 0;
+  int max_level = 0;
+  int threshold = w->opts.unpadded ? 1 : 3;
+  int before_blocks = w->stats.idx_stats.blocks;
+  int err = writer_flush_block(w);
+  if (err < 0) {
+    return err;
+  }
+
+  while (w->index_len > threshold) {
+    struct index_record *idx = NULL;
+    int idx_len = 0;
+
+    max_level++;
+    index_start = w->next;
+    writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+    idx = w->index;
+    idx_len = w->index_len;
+
+    w->index = NULL;
+    w->index_len = 0;
+    w->index_cap = 0;
+    for (int i = 0; i < idx_len; i++) {
+      struct record rec = {};
+      record_from_index(&rec, idx + i);
+      if (block_writer_add(w->block_writer, rec) == 0) {
+        continue;
+      }
+
+      {
+        int err = writer_flush_block(w);
+        if (err < 0) {
+          return err;
+        }
+      }
+
+      writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+      err = block_writer_add(w->block_writer, rec);
+      assert(err == 0);
+    }
+    for (int i = 0; i < idx_len; i++) {
+      free(slice_yield(&idx[i].last_key));
+    }
+    free(idx);
+  }
+
+  writer_clear_index(w);
+
+  err = writer_flush_block(w);
+  if (err < 0) {
+    return err;
+  }
+
+  {
+    struct block_stats *bstats = writer_block_stats(w, typ);
+    bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
+    bstats->index_offset = index_start;
+    bstats->max_index_level = max_level;
+  }
+
+  // Reinit lastKey, as the next section can start with any key.
+  w->last_key.len = 0;
+
+  return 0;
+}
+
+struct common_prefix_arg {
+  struct slice *last;
+  int max;
+};
+
+static void update_common(void *void_arg, void *key) {
+  struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+  struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+  if (arg->last != NULL) {
+    int n = common_prefix_size(entry->hash, *arg->last);
+    if (n > arg->max) {
+      arg->max = n;
+    }
+  }
+  arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+  struct writer *w;
+  int err;
+};
+
+static void write_object_record(void *void_arg, void *key) {
+  struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+  struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+  struct obj_record obj_rec = {
+      .hash_prefix = entry->hash.buf,
+      .hash_prefix_len = arg->w->stats.object_id_len,
+      .offsets = entry->offsets,
+      .offset_len = entry->offset_len,
+  };
+  struct record rec = {};
+  if (arg->err < 0) {
+    goto exit;
+  }
+
+  record_from_obj(&rec, &obj_rec);
+  arg->err = block_writer_add(arg->w->block_writer, rec);
+  if (arg->err == 0) {
+    goto exit;
+  }
+
+  arg->err = writer_flush_block(arg->w);
+  if (arg->err < 0) {
+    goto exit;
+  }
+
+  writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+  arg->err = block_writer_add(arg->w->block_writer, rec);
+  if (arg->err == 0) {
+    goto exit;
+  }
+  obj_rec.offset_len = 0;
+  arg->err = block_writer_add(arg->w->block_writer, rec);
+
+  // Should be able to write into a fresh block.
+  assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key) {
+  struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+  free(entry->offsets);
+  entry->offsets = NULL;
+  free(slice_yield(&entry->hash));
+  free(entry);
+}
+
+static int writer_dump_object_index(struct writer *w) {
+  struct write_record_arg closure = {.w = w};
+  struct common_prefix_arg common = {};
+  if (w->obj_index_tree != NULL) {
+    infix_walk(w->obj_index_tree, &update_common, &common);
+  }
+  w->stats.object_id_len = common.max + 1;
+
+  writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+  if (w->obj_index_tree != NULL) {
+    infix_walk(w->obj_index_tree, &write_object_record, &closure);
+  }
+
+  if (closure.err < 0) {
+    return closure.err;
+  }
+  return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct writer *w) {
+  byte typ = 0;
+  int err = 0;
+
+  if (w->block_writer == NULL) {
+    return 0;
+  }
+
+  typ = block_writer_type(w->block_writer);
+  err = writer_finish_section(w);
+  if (err < 0) {
+    return err;
+  }
+  if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+      w->stats.ref_stats.index_blocks > 0) {
+    err = writer_dump_object_index(w);
+    if (err < 0) {
+      return err;
+    }
+  }
+
+  if (w->obj_index_tree != NULL) {
+    infix_walk(w->obj_index_tree, &object_record_free, NULL);
+    tree_free(w->obj_index_tree);
+    w->obj_index_tree = NULL;
+  }
+
+  w->block_writer = NULL;
+  return 0;
+}
+
+int writer_close(struct writer *w) {
+  byte footer[68];
+  byte *p = footer;
+
+  writer_finish_public_section(w);
+
+  writer_write_header(w, footer);
+  p += 24;
+  put_u64(p, w->stats.ref_stats.index_offset);
+  p += 8;
+  put_u64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+  p += 8;
+  put_u64(p, w->stats.obj_stats.index_offset);
+  p += 8;
+
+  put_u64(p, w->stats.log_stats.offset);
+  p += 8;
+  put_u64(p, w->stats.log_stats.index_offset);
+  p += 8;
+
+  put_u32(p, crc32(0, footer, p - footer));
+  p += 4;
+  w->pending_padding = 0;
+
+  {
+    int n = padded_write(w, footer, sizeof(footer), 0);
+    if (n < 0) {
+      return n;
+    }
+  }
+
+  // free up memory.
+  block_writer_clear(&w->block_writer_data);
+  writer_clear_index(w);
+  free(slice_yield(&w->last_key));
+  return 0;
+}
+
+void writer_clear_index(struct writer *w) {
+  for (int i = 0; i < w->index_len; i++) {
+    free(slice_yield(&w->index[i].last_key));
+  }
+
+  free(w->index);
+  w->index = NULL;
+  w->index_len = 0;
+  w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct writer *w) {
+  byte typ = block_writer_type(w->block_writer);
+  struct block_stats *bstats = writer_block_stats(w, typ);
+  uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+  int raw_bytes = block_writer_finish(w->block_writer);
+  int padding = 0;
+  int err = 0;
+  if (raw_bytes < 0) {
+    return raw_bytes;
+  }
+
+  if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+    padding = w->opts.block_size - raw_bytes;
+  }
+
+  if (block_typ_off > 0) {
+    bstats->offset = block_typ_off;
+  }
+
+  bstats->entries += w->block_writer->entries;
+  bstats->restarts += w->block_writer->restart_len;
+  bstats->blocks++;
+  w->stats.blocks++;
+
+  if (debug) {
+    fprintf(stderr, "block %c off %ld sz %d (%d)\n", typ, w->next, raw_bytes,
+            get_u24(w->block + w->block_writer->header_off + 1));
+  }
+
+  if (w->next == 0) {
+    writer_write_header(w, w->block);
+  }
+
+  err = padded_write(w, w->block, raw_bytes, padding);
+  if (err < 0) {
+    return err;
+  }
+
+  if (w->index_cap == w->index_len) {
+    w->index_cap = 2 * w->index_cap + 1;
+    w->index = realloc(w->index, sizeof(struct index_record) * w->index_cap);
+  }
+
+  {
+    struct index_record ir = {
+        .offset = w->next,
+    };
+    slice_copy(&ir.last_key, w->block_writer->last_key);
+    w->index[w->index_len] = ir;
+  }
+
+  w->index_len++;
+  w->next += padding + raw_bytes;
+  block_writer_reset(&w->block_writer_data);
+  w->block_writer = NULL;
+  return 0;
+}
+
+int writer_flush_block(struct writer *w) {
+  if (w->block_writer == NULL) {
+    return 0;
+  }
+  if (w->block_writer->entries == 0) {
+    return 0;
+  }
+  return writer_flush_nonempty_block(w);
+}
+
+struct stats *writer_stats(struct writer *w) {
+  return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 0000000000..1f1b472727
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,46 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct writer {
+  int (*write)(void *, byte *, int);
+  void *write_arg;
+  int pending_padding;
+  int hash_size;
+  struct slice last_key;
+
+  uint64_t next;
+  uint64_t min_update_index, max_update_index;
+  struct write_options opts;
+
+  byte *block;
+  struct block_writer *block_writer;
+  struct block_writer block_writer_data;
+  struct index_record *index;
+  int index_len;
+  int index_cap;
+
+  // tree for use with tsearch
+  struct tree_node *obj_index_tree;
+
+  struct stats stats;
+};
+
+int writer_flush_block(struct writer *w);
+void writer_clear_index(struct writer *w);
+int writer_finish_public_section(struct writer *w);
+
+#endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH 5/5] Reftable support for git-core
  2020-01-23 19:41 [PATCH 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                   ` (3 preceding siblings ...)
  2020-01-23 19:41 ` [PATCH 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-01-23 19:41 ` Han-Wen Nienhuys via GitGitGadget
  2020-01-23 21:44 ` [PATCH 0/5] Reftable support git-core Junio C Hamano
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-01-23 19:41 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

$ ~/vc/git/git init
warning: templates not found in /usr/local/google/home/hanwen/share/git-core/templates
Initialized empty Git repository in /tmp/qz/.git/
$ echo q > a
$ ~/vc/git/git add a
$ ~/vc/git/git commit -mx
fatal: not a git repository (or any of the parent directories): .git
[master (root-commit) 373d969] x
 1 file changed, 1 insertion(+)
 create mode 100644 a
$ ~/vc/git/git show-ref
373d96972fca9b63595740bba3898a762778ba20 HEAD
373d96972fca9b63595740bba3898a762778ba20 refs/heads/master
$ ls -l .git/reftable/
total 12
-rw------- 1 hanwen primarygroup  126 Jan 23 20:08 000000000001-000000000001.ref
-rw------- 1 hanwen primarygroup 4277 Jan 23 20:08 000000000002-000000000002.ref
$ go run ~/vc/reftable/cmd/dump.go  -table /tmp/qz/.git/reftable/000000000002-000000000002.ref
** DEBUG **
name /tmp/qz/.git/reftable/000000000002-000000000002.ref, sz 4209: 'r' reftable.readerOffsets{Present:true, Offset:0x0, IndexOffset:0x0}, 'o' reftable.readerOffsets{Present:false, Offset:0x0, IndexOffset:0x0} 'g' reftable.readerOffsets{Present:true, Offset:0x1000, IndexOffset:0x0}
** REFS **
reftable.RefRecord{RefName:"refs/heads/master", UpdateIndex:0x2, Value:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, TargetValue:[]uint8(nil), Target:""}
** LOGS **
reftable.LogRecord{RefName:"HEAD", UpdateIndex:0x2, New:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, Old:[]uint8{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Name:"Han-Wen Nienhuys", Email:"hanwen@google.com", Time:0x5e29ef27, TZOffset:100, Message:"commit (initial): x\n"}

TODO:
 * resolve the design problem with reflog expiry.

Change-Id: I225ee6317b7911edf9aa95f43299f6c7c4511914
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile                |  23 +-
 refs.c                  |  18 +-
 refs.h                  |   2 +
 refs/refs-internal.h    |   1 +
 refs/reftable-backend.c | 780 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 820 insertions(+), 4 deletions(-)
 create mode 100644 refs/reftable-backend.c

diff --git a/Makefile b/Makefile
index 09f98b777c..4f0bcbfbcb 100644
--- a/Makefile
+++ b/Makefile
@@ -813,6 +813,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += command-list.h
 
@@ -958,6 +959,7 @@ LIB_OBJS += rebase-interactive.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1162,7 +1164,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2343,11 +2345,27 @@ VCSSVN_OBJS += vcs-svn/fast_export.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/bytes.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2484,6 +2502,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
diff --git a/refs.c b/refs.c
index 1ab0bb54d3..b4764c9a68 100644
--- a/refs.c
+++ b/refs.c
@@ -20,7 +20,7 @@
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1839,10 +1839,22 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
 static struct ref_store *ref_store_init(const char *gitdir,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct strbuf refs_path = STRBUF_INIT;
+
+        // XXX this should probably come from a git config setting and not
+        // default to reftable.
+	const char *be_name = "reftable";
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	strbuf_addstr(&refs_path, gitdir);
+	strbuf_addstr(&refs_path, "/refs");
+
+	if (!is_directory(refs_path.buf)) {
+		be_name = "reftable";
+	}
+	strbuf_release(&refs_path);
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
diff --git a/refs.h b/refs.h
index 545029c6d8..4eedee37c9 100644
--- a/refs.h
+++ b/refs.h
@@ -456,6 +456,8 @@ int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+// XXX which ordering are these? Newest or oldest first?
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index fc18b12340..2744aa57f2 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -659,6 +659,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 0000000000..6fcb72dd95
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,780 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *reftable_dir;
+	char *table_list_file;
+	struct stack *stack;
+};
+
+static void clear_log_record(struct log_record* log) {
+        log->old_hash = NULL;
+        log->new_hash = NULL;
+        log->message = NULL;
+        log->ref_name = NULL;
+        log_record_clear(log);
+}
+
+static void fill_log_record(struct log_record* log) {
+        const char* info = git_committer_info(0);
+        struct ident_split split = {};
+        int result = split_ident_line(&split, info, strlen(info));
+        int sign = 1;
+        assert(0==result);
+
+        log_record_clear(log);
+        log->name = xstrndup(split.name_begin, split.name_end - split.name_begin);
+        log->email = xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+        log->time = atol(split.date_begin);
+        if (*split.tz_begin == '-') {
+                sign = -1;
+                split.tz_begin ++;
+        }
+        if (*split.tz_begin == '+') {
+                sign = 1;
+                split.tz_begin ++;
+        }
+
+        log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *reftable_ref_store_create(const char *path,
+                                                   unsigned int store_flags) {
+  	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct write_options cfg = {
+		.block_size = 4096,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+
+	strbuf_addf(&sb, "%s/refs", path);
+	refs->table_list_file = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_release(&sb);
+
+	refs->err = new_stack(&refs->stack, refs->reftable_dir, refs->table_list_file, cfg);
+
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
+	FILE *f = fopen(refs->table_list_file, "a");
+	if (f == NULL) {
+	  return -1;
+	}
+	fclose(f);
+
+	safe_create_dir(refs->reftable_dir, 1);
+	return 0;
+}
+
+
+struct reftable_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct ref_record ref;
+	struct object_id oid;
+        struct ref_store *ref_store;
+	char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+        while (1) {
+                struct reftable_iterator *ri = (struct reftable_iterator*) ref_iterator;
+                int err = iterator_next_ref(ri->iter, &ri->ref);
+                if (err > 0) {
+                        return ITER_DONE;
+                }
+                if (err < 0) {
+                        return ITER_ERROR;
+                }
+
+                ri->base.refname = ri->ref.ref_name;
+                if (ri->prefix != NULL && 0 != strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+                        return ITER_DONE;
+                }
+
+                ri->base.flags = 0;
+                if (ri->ref.value != NULL) {
+                        memcpy(ri->oid.hash, ri->ref.value, GIT_SHA1_RAWSZ);
+                } else if (ri->ref.target != NULL) {
+                        int out_flags = 0;
+                        const char *resolved =
+                                refs_resolve_ref_unsafe(ri->ref_store, ri->ref.ref_name, RESOLVE_REF_READING,
+                                                        &ri->oid, &out_flags);
+                        ri->base.flags = out_flags;
+                        if (resolved == NULL && !(ri->base.flags & DO_FOR_EACH_INCLUDE_BROKEN)
+                            && (ri->base.flags & REF_ISBROKEN)) {
+                                continue;
+                        }
+                }
+                ri->base.oid  = &ri->oid;
+                return ITER_OK;
+        }
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator*) ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		memcpy(peeled->hash, ri->ref.target_value, GIT_SHA1_RAWSZ);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator*) ref_iterator;
+	ref_record_clear(&ri->ref);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance,
+	reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *reftable_ref_iterator_begin(
+		struct ref_store *ref_store,
+		const char *prefix, unsigned int flags)
+{
+	struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
+	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct merged_table *mt = NULL;
+	int err = 0;
+	if (refs->err) {
+		// XXX ?
+		return NULL;
+	}
+
+	mt = stack_merged_table(refs->stack);
+
+	// XXX something with flags?
+	err = merged_table_seek_ref(mt, &ri->iter, prefix);
+	// XXX what to do with err?
+	assert(err == 0);
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->base.oid = &ri->oid;
+        ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				    struct ref_transaction *transaction,
+				    struct strbuf *err)
+{
+	struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
+	(void)refs;
+	return 0;
+}
+
+
+static int ref_update_cmp(const void *a, const void *b) {
+	return strcmp(((struct ref_update*)a)->refname, ((struct ref_update*)b)->refname);
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname, struct object_id *want_oid) {
+        struct object_id out_oid = {};
+        int out_flags = 0;
+        const char *resolved =
+                refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING,
+                                        &out_oid, &out_flags);
+        if (is_null_oid(want_oid) != (resolved == NULL)) {
+                return LOCK_ERROR;
+        }
+
+        if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+                return LOCK_ERROR;
+        }
+
+        return 0;
+}
+
+static int write_transaction_table(struct writer *writer, void *arg) {
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct reftable_ref_store *refs
+		= (struct reftable_ref_store*)transaction->ref_store;
+	uint64_t ts = stack_next_update_index(refs->stack);
+	int err = 0;
+	// XXX - are we allowed to mutate the input data?
+	qsort(transaction->updates, transaction->nr, sizeof(struct ref_update*),
+	      ref_update_cmp);
+	writer_set_limits(writer, ts, ts);
+
+	for (int i = 0; i <  transaction->nr; i++) {
+		struct ref_update * u = transaction->updates[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store, u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i <  transaction->nr; i++) {
+		struct ref_update * u = transaction->updates[i];
+		if (u->flags & REF_HAVE_NEW) {
+                        struct object_id out_oid = {};
+                        int out_flags = 0;
+                        // XXX who owns the memory here?
+                        const char *resolved
+                                = refs_resolve_ref_unsafe(transaction->ref_store, u->refname, 0, &out_oid, &out_flags);
+                        struct ref_record ref = {};
+                        ref.ref_name = (char*)(resolved ? resolved : u->refname);
+                        ref.value = u->new_oid.hash;
+                        ref.update_index = ts;
+                        err = writer_add_ref(writer, &ref);
+                        if (err < 0) {
+                                goto exit;
+                        }
+		}
+	}
+
+
+	for (int i = 0; i <  transaction->nr; i++) {
+		struct ref_update * u = transaction->updates[i];
+                struct log_record log = {};
+                fill_log_record(&log);
+
+                log.ref_name = (char*)u->refname;
+                log.old_hash = u->old_oid.hash;
+                log.new_hash = u->new_oid.hash;
+                log.update_index = ts;
+                log.message = u->msg;
+
+                err = writer_add_log(writer, &log);
+                clear_log_record(&log);
+                if (err < 0) { return err; }
+        }
+exit:
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
+	int err = stack_add(refs->stack,
+		  &write_transaction_table,
+		  transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure %s", error_str(err));
+		return -1;
+ 	}
+
+	return 0;
+}
+
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				     struct ref_transaction *transaction,
+				     struct strbuf *err)
+{
+        return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+
+
+struct write_delete_refs_arg {
+	struct stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+        unsigned int flags;
+};
+
+static int write_delete_refs_table(struct writer *writer, void *argv) {
+	struct write_delete_refs_arg *arg = (struct write_delete_refs_arg*)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	int err = 0;
+
+        writer_set_limits(writer, ts, ts);
+        for (int i = 0; i < arg->refnames->nr; i++) {
+                struct ref_record ref = {
+                        .ref_name = (char*) arg->refnames->items[i].string,
+                        .update_index = ts,
+                };
+                err = writer_add_ref(writer, &ref);
+                if (err < 0) {
+                        return err;
+                }
+        }
+
+        for (int i = 0; i < arg->refnames->nr; i++) {
+                struct log_record log = {};
+                fill_log_record(&log);
+                log.message = xstrdup(arg->logmsg);
+                log.new_hash = NULL;
+
+                // XXX should lookup old oid.
+                log.old_hash = NULL;
+                log.update_index = ts;
+                log.ref_name = (char*) arg->refnames->items[i].string;
+
+                err = writer_add_log(writer, &log);
+                clear_log_record(&log);
+                if (err < 0) {
+                        return err;
+                }
+        }
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+                                struct string_list *refnames, unsigned int flags)
+{
+	struct reftable_ref_store *refs = (struct reftable_ref_store*)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+                .flags = flags,
+	};
+	return stack_add(refs->stack, &write_delete_refs_table, &arg);
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct reftable_ref_store *refs = (struct reftable_ref_store*)ref_store;
+        // XXX reflog expiry.
+	return stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct stack *stack;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct writer *writer, void *arg) {
+	struct write_create_symref_arg *create = (struct write_create_symref_arg*)arg;
+	uint64_t ts = stack_next_update_index(create->stack);
+	int err = 0;
+
+	struct ref_record ref = {
+		.ref_name = (char*) create->refname,
+		.target = (char*) create->target,
+		.update_index = ts,
+	};
+	writer_set_limits(writer, ts, ts);
+	err = writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	// XXX reflog?
+
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct reftable_ref_store *refs = (struct reftable_ref_store*)ref_store;
+	struct write_create_symref_arg arg = {
+		.stack = refs->stack,
+		.refname = refname,
+		.target = target,
+		.logmsg = logmsg
+	};
+	return stack_add(refs->stack, &write_create_symref_table, &arg);
+}
+
+
+struct write_rename_arg {
+	struct stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct writer *writer, void *argv) {
+	struct write_rename_arg *arg = (struct write_rename_arg*)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	struct ref_record ref = {};
+	int err = stack_read_ref(arg->stack, arg->oldname, &ref);
+	if (err) {
+		goto exit;
+	}
+
+	// XXX should check that dest doesn't exist?
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct ref_record todo[2]  = {};
+		todo[0].ref_name =  (char *) arg->oldname;
+		todo[0].update_index = ts;
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+        if (ref.value != NULL) {
+                struct log_record todo[2] = {};
+                fill_log_record(&todo[0]);
+                fill_log_record(&todo[1]);
+
+                todo[0].ref_name = (char *) arg->oldname;
+                todo[0].update_index = ts;
+                todo[0].message = (char*)arg->logmsg;
+                todo[0].old_hash = ref.value;
+                todo[0].new_hash = NULL;
+
+                todo[1].ref_name = (char *) arg->newname;
+                todo[1].update_index = ts;
+                todo[1].old_hash = NULL;
+                todo[1].new_hash = ref.value;
+                todo[1].message = (char*) arg->logmsg;
+
+		err = writer_add_logs(writer, todo, 2);
+
+                clear_log_record(&todo[0]);
+                clear_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+        } else {
+                // symrefs?
+        }
+
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			    const char *oldrefname, const char *newrefname,
+			    const char *logmsg)
+{
+	struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	return stack_add(refs->stack, &write_rename_table, &arg);
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			   const char *oldrefname, const char *newrefname,
+			   const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+        struct ref_iterator base;
+        struct iterator iter;
+        struct log_record log;
+        struct object_id oid;
+        char *last_name;
+};
+
+static int reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+        struct reftable_reflog_ref_iterator *ri = (struct reftable_reflog_ref_iterator*) ref_iterator;
+
+        while (1) {
+                int err = iterator_next_log(ri->iter, &ri->log);
+                if (err > 0) {
+                        return ITER_DONE;
+                }
+                if (err < 0) {
+                        return ITER_ERROR;
+                }
+
+                ri->base.refname = ri->log.ref_name;
+                if (ri->last_name != NULL && 0 == strcmp(ri->log.ref_name, ri->last_name)) {
+                        continue;
+                }
+
+                free(ri->last_name);
+                ri->last_name = xstrdup(ri->log.ref_name);
+                // XXX const?
+                memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ);
+                return ITER_OK;
+        }
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+                                             struct object_id *peeled)
+{
+        BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri = (struct reftable_reflog_ref_iterator*) ref_iterator;
+	log_record_clear(&ri->log);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance,
+	reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+        struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+        struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
+
+	struct merged_table *mt = stack_merged_table(refs->stack);
+        int err = merged_table_seek_log(mt, &ri->iter, "");
+        if (err < 0) {
+                free(ri);
+                return NULL;
+        }
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable, 1);
+	ri->base.oid = &ri->oid;
+
+	return empty_ref_iterator_begin();
+}
+
+static int reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+                                        const char *refname,
+                                        each_reflog_ent_fn fn, void *cb_data)
+{
+        struct iterator it = { };
+        struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
+        struct merged_table *mt = stack_merged_table(refs->stack);
+        int err = merged_table_seek_log(mt, &it, refname);
+        struct log_record log = {};
+
+        while (err == 0)  {
+                err = iterator_next_log(it, &log);
+                if (err != 0) {
+                        break;
+                }
+
+                if (0 != strcmp(log.ref_name, refname)) {
+                        break;
+                }
+
+                {
+                        struct object_id old_oid = {};
+                        struct object_id new_oid = {};
+
+                        memcpy(&old_oid.hash, log.old_hash, GIT_SHA1_RAWSZ);
+                        memcpy(&new_oid.hash, log.new_hash, GIT_SHA1_RAWSZ);
+
+                        // XXX committer = email? name?
+                        if (fn(&old_oid, &new_oid, log.name, log.time, log.tz_offset, log.message, cb_data)) {
+                                err = -1;
+                                break;
+                        }
+                }
+        }
+
+        log_record_clear(&log);
+        iterator_destroy(&it);
+        if (err > 0) {
+                err = 0;
+        }
+	return err;
+}
+
+static int reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+                                                     const char *refname,
+                                                     each_reflog_ent_fn fn,
+                                                     void *cb_data)
+{
+        struct iterator it = { };
+        struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
+        struct merged_table *mt = stack_merged_table(refs->stack);
+        int err = merged_table_seek_log(mt, &it, refname);
+
+        struct log_record *logs = NULL;
+        int cap = 0;
+        int len = 0;
+
+        printf("oldest first\n");
+        while (err == 0)  {
+                struct log_record log = {};
+                err = iterator_next_log(it, &log);
+                if (err != 0) {
+                        break;
+                }
+
+                if (0 != strcmp(log.ref_name, refname)) {
+                        break;
+                }
+
+                if (len == cap) {
+                        cap  =2*cap + 1;
+                        logs = realloc(logs, cap * sizeof(*logs));
+                }
+
+                logs[len++] = log;
+        }
+
+        for (int i = len; i--; ) {
+                struct log_record *log = &logs[i];
+                struct object_id old_oid = {};
+                struct object_id new_oid = {};
+
+                memcpy(&old_oid.hash, log->old_hash, GIT_SHA1_RAWSZ);
+                memcpy(&new_oid.hash, log->new_hash, GIT_SHA1_RAWSZ);
+
+                // XXX committer = email? name?
+                if (!fn(&old_oid, &new_oid, log->name, log->time, log->tz_offset, log->message, cb_data)) {
+                        err = -1;
+                        break;
+                }
+        }
+
+        for (int i = 0; i < len; i++) {
+                log_record_clear(&logs[i]);
+        }
+        free(logs);
+
+        iterator_destroy(&it);
+        if (err > 0) {
+                err = 0;
+        }
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+                                  const char *refname)
+{
+        // always exists.
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+                                  const char *refname, int force_create,
+                                  struct strbuf *err)
+{
+        return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+                                  const char *refname)
+{
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				const char *refname, const struct object_id *oid,
+				unsigned int flags,
+				reflog_expiry_prepare_fn prepare_fn,
+				reflog_expiry_should_prune_fn should_prune_fn,
+				reflog_expiry_cleanup_fn cleanup_fn,
+				void *policy_cb_data)
+{
+        /*
+          XXX
+
+          This doesn't fit with the reftable API. If we are expiring for space
+          reasons, the expiry should be combined with a compaction, and there
+          should be a policy that can be called for all refnames, not for a
+          single ref name.
+
+          If this is for cleaning up individual entries, we'll have to write
+          extra data to create tombstones.
+         */
+	return 0;
+}
+
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
+	struct ref_record ref = {};
+	int err = stack_read_ref(refs->stack, refname, &ref);
+	if (err) {
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+ 	} else  {
+		memcpy(oid->hash, ref.value, GIT_SHA1_RAWSZ);
+	}
+
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH 0/5] Reftable support git-core
  2020-01-23 19:41 [PATCH 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                   ` (4 preceding siblings ...)
  2020-01-23 19:41 ` [PATCH 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-01-23 21:44 ` Junio C Hamano
  2020-01-27 13:52   ` Han-Wen Nienhuys
  2020-01-23 22:45 ` Stephan Beyer
  2020-01-27 14:22 ` [PATCH v2 " Han-Wen Nienhuys via GitGitGadget
  7 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-01-23 21:44 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This adds the reftable library, and hooks it up as a ref backend.

Just a quick impression before getting into details of individual
steps.

 * With this series, the reftable backend seems to take over as the
   default and only backend.  We would need to design and decide how
   repositories would specify which backend it uses (I personally do
   not think we need to allow more than one backend to be active at
   the same time) before we take this series out of RFC status.

 * What's reftable/VERSION file?  Does it really have what you
   intended to add?

 * Mixed indentation and many whitespace breakages make the code
   distracting to review.

 * Comparison with 0 is written as "if (!strcmp(a, b))" in this
   codebase, and never "if (0 != strcmp(a, b))".

 * We unfortunately do not use var defn "for (int i = 0; ..." yet.

 * We do not use // comments.

Thanks.


> Han-Wen Nienhuys (5):
>   setup.c: enable repo detection for reftable
>   create .git/refs in files-backend.c
>   Document how ref iterators and symrefs interact
>   Add reftable library
>   Reftable support for git-core
>
>  Makefile                  |   23 +-
>  builtin/init-db.c         |    2 -
>  refs.c                    |   18 +-
>  refs.h                    |    2 +
>  refs/files-backend.c      |    4 +
>  refs/refs-internal.h      |    4 +
>  refs/reftable-backend.c   |  780 ++++++++++++++++++++++++++++
>  reftable/README.md        |   12 +
>  reftable/VERSION          |  293 +++++++++++
>  reftable/basics.c         |  180 +++++++
>  reftable/basics.h         |   38 ++
>  reftable/block.c          |  382 ++++++++++++++
>  reftable/block.h          |   71 +++
>  reftable/block_test.c     |  145 ++++++
>  reftable/blocksource.h    |   20 +
>  reftable/bytes.c          |    0
>  reftable/constants.h      |   27 +
>  reftable/dump.c           |  102 ++++
>  reftable/file.c           |   98 ++++
>  reftable/iter.c           |  211 ++++++++
>  reftable/iter.h           |   56 ++
>  reftable/merged.c         |  262 ++++++++++
>  reftable/merged.h         |   34 ++
>  reftable/merged_test.c    |  247 +++++++++
>  reftable/pq.c             |  116 +++++
>  reftable/pq.h             |   34 ++
>  reftable/reader.c         |  672 ++++++++++++++++++++++++
>  reftable/reader.h         |   52 ++
>  reftable/record.c         | 1030 +++++++++++++++++++++++++++++++++++++
>  reftable/record.h         |   79 +++
>  reftable/record_test.c    |  313 +++++++++++
>  reftable/reftable.h       |  396 ++++++++++++++
>  reftable/reftable_test.c  |  434 ++++++++++++++++
>  reftable/slice.c          |  179 +++++++
>  reftable/slice.h          |   39 ++
>  reftable/slice_test.c     |   38 ++
>  reftable/stack.c          |  931 +++++++++++++++++++++++++++++++++
>  reftable/stack.h          |   40 ++
>  reftable/stack_test.c     |  265 ++++++++++
>  reftable/test_framework.c |   60 +++
>  reftable/test_framework.h |   63 +++
>  reftable/tree.c           |   60 +++
>  reftable/tree.h           |   24 +
>  reftable/tree_test.c      |   54 ++
>  reftable/writer.c         |  586 +++++++++++++++++++++
>  reftable/writer.h         |   46 ++
>  setup.c                   |   20 +-
>  47 files changed, 8530 insertions(+), 12 deletions(-)
>  create mode 100644 refs/reftable-backend.c
>  create mode 100644 reftable/README.md
>  create mode 100644 reftable/VERSION
>  create mode 100644 reftable/basics.c
>  create mode 100644 reftable/basics.h
>  create mode 100644 reftable/block.c
>  create mode 100644 reftable/block.h
>  create mode 100644 reftable/block_test.c
>  create mode 100644 reftable/blocksource.h
>  create mode 100644 reftable/bytes.c
>  create mode 100644 reftable/constants.h
>  create mode 100644 reftable/dump.c
>  create mode 100644 reftable/file.c
>  create mode 100644 reftable/iter.c
>  create mode 100644 reftable/iter.h
>  create mode 100644 reftable/merged.c
>  create mode 100644 reftable/merged.h
>  create mode 100644 reftable/merged_test.c
>  create mode 100644 reftable/pq.c
>  create mode 100644 reftable/pq.h
>  create mode 100644 reftable/reader.c
>  create mode 100644 reftable/reader.h
>  create mode 100644 reftable/record.c
>  create mode 100644 reftable/record.h
>  create mode 100644 reftable/record_test.c
>  create mode 100644 reftable/reftable.h
>  create mode 100644 reftable/reftable_test.c
>  create mode 100644 reftable/slice.c
>  create mode 100644 reftable/slice.h
>  create mode 100644 reftable/slice_test.c
>  create mode 100644 reftable/stack.c
>  create mode 100644 reftable/stack.h
>  create mode 100644 reftable/stack_test.c
>  create mode 100644 reftable/test_framework.c
>  create mode 100644 reftable/test_framework.h
>  create mode 100644 reftable/tree.c
>  create mode 100644 reftable/tree.h
>  create mode 100644 reftable/tree_test.c
>  create mode 100644 reftable/writer.c
>  create mode 100644 reftable/writer.h
>
>
> base-commit: 232378479ee6c66206d47a9be175e3a39682aea6
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/539

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH 0/5] Reftable support git-core
  2020-01-23 19:41 [PATCH 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                   ` (5 preceding siblings ...)
  2020-01-23 21:44 ` [PATCH 0/5] Reftable support git-core Junio C Hamano
@ 2020-01-23 22:45 ` Stephan Beyer
  2020-01-27 13:57   ` Han-Wen Nienhuys
  2020-01-27 14:22 ` [PATCH v2 " Han-Wen Nienhuys via GitGitGadget
  7 siblings, 1 reply; 409+ messages in thread
From: Stephan Beyer @ 2020-01-23 22:45 UTC (permalink / raw)
  To: git, Han-Wen Nienhuys

Hi Han-Wen,

On 1/23/20 8:41 PM, Han-Wen Nienhuys via GitGitGadget wrote:
> This adds the reftable library, and hooks it up as a ref backend.
>
> Han-Wen Nienhuys (5):
>   setup.c: enable repo detection for reftable
>   create .git/refs in files-backend.c
>   Document how ref iterators and symrefs interact
>   Add reftable library
>   Reftable support for git-core

I am most of the time just a curious reader on this list but as someone
who has no idea what the reftable library does (except that it can serve
as a "ref backend"), I would expect much more elaborate commit messages.
In particular, patch 4/5 (i.e. the commit message of the "add reftable
library" commit) should describe the purpose of the reftable library at
least briefly; and patch 5/5 should not contain shell commands and
output without context but a short description why the patch is doing
what it is doing. For example, if the use of the reftable library makes
certain git commands a lot faster, it should state that.

Thanks.

Stephan

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH 0/5] Reftable support git-core
  2020-01-23 21:44 ` [PATCH 0/5] Reftable support git-core Junio C Hamano
@ 2020-01-27 13:52   ` Han-Wen Nienhuys
  2020-01-27 13:57     ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-01-27 13:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git

On Thu, Jan 23, 2020 at 10:44 PM Junio C Hamano <gitster@pobox.com> wrote:
> > This adds the reftable library, and hooks it up as a ref backend.
>
> Just a quick impression before getting into details of individual
> steps.
>
>  * With this series, the reftable backend seems to take over as the
>    default and only backend.  We would need to design and decide how
>    repositories would specify which backend it uses (I personally do
>    not think we need to allow more than one backend to be active at
>    the same time) before we take this series out of RFC status.

Yes, obviously. Would you have concrete ideas on how this should work?

>  * What's reftable/VERSION file?  Does it really have what you
>    intended to add?

Fixed, and explained in reftable/VERSION.

>  * Mixed indentation and many whitespace breakages make the code
>    distracting to review.

Do you mean in the changes to refs/reftable-backend.c? Or the imported code?

The imported code is uniformly formatted with clang-format.

Is there a clang-format setting for uniformly formatting to Linux/Git
style? I really love automated formatting, so we don't have to spend
time debating irrelevant details.


>
>  * Comparison with 0 is written as "if (!strcmp(a, b))" in this
>    codebase, and never "if (0 != strcmp(a, b))".
>
>  * We unfortunately do not use var defn "for (int i = 0; ..." yet.
>
>  * We do not use // comments.
>

fixed.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH 0/5] Reftable support git-core
  2020-01-23 22:45 ` Stephan Beyer
@ 2020-01-27 13:57   ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-01-27 13:57 UTC (permalink / raw)
  To: Stephan Beyer; +Cc: git

I added some more background in one of the commit messages. You can
find extensive discussion here:

  https://github.com/google/reftable#background-reading


On Thu, Jan 23, 2020 at 11:45 PM Stephan Beyer <s-beyer@gmx.net> wrote:
>
> Hi Han-Wen,
>
> On 1/23/20 8:41 PM, Han-Wen Nienhuys via GitGitGadget wrote:
> > This adds the reftable library, and hooks it up as a ref backend.
> >
> > Han-Wen Nienhuys (5):
> >   setup.c: enable repo detection for reftable
> >   create .git/refs in files-backend.c
> >   Document how ref iterators and symrefs interact
> >   Add reftable library
> >   Reftable support for git-core
>
> I am most of the time just a curious reader on this list but as someone
> who has no idea what the reftable library does (except that it can serve
> as a "ref backend"), I would expect much more elaborate commit messages.
> In particular, patch 4/5 (i.e. the commit message of the "add reftable
> library" commit) should describe the purpose of the reftable library at
> least briefly; and patch 5/5 should not contain shell commands and
> output without context but a short description why the patch is doing
> what it is doing. For example, if the use of the reftable library makes
> certain git commands a lot faster, it should state that.
>
> Thanks.
>
> Stephan

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH 0/5] Reftable support git-core
  2020-01-27 13:52   ` Han-Wen Nienhuys
@ 2020-01-27 13:57     ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-01-27 13:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git

On Mon, Jan 27, 2020 at 2:52 PM Han-Wen Nienhuys <hanwen@xs4all.nl> wrote:
> >  * Mixed indentation and many whitespace breakages make the code
> >    distracting to review.
>
> Do you mean in the changes to refs/reftable-backend.c? Or the imported code?
>
> The imported code is uniformly formatted with clang-format.
>
> Is there a clang-format setting for uniformly formatting to Linux/Git
> style? I really love automated formatting, so we don't have to spend
> time debating irrelevant details.

Ah, there is a .clang-format entry in the git repository already. Thanks!

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v2 0/5] Reftable support git-core
  2020-01-23 19:41 [PATCH 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                   ` (6 preceding siblings ...)
  2020-01-23 22:45 ` Stephan Beyer
@ 2020-01-27 14:22 ` Han-Wen Nienhuys via GitGitGadget
  2020-01-27 14:22   ` [PATCH v2 1/5] setup.c: enable repo detection for reftable Han-Wen Nienhuys via GitGitGadget
                     ` (5 more replies)
  7 siblings, 6 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-01-27 14:22 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

At this point, I am mainly interested in feedback on the spots marked with
XXX in the Git source code, in particular, how to handle reflog expiry in
this backend.

v2

 * address Jun's nits.
 * address Dscho's portability comments
 * more background in commit messages.

Han-Wen Nienhuys (5):
  setup.c: enable repo detection for reftable
  create .git/refs in files-backend.c
  Document how ref iterators and symrefs interact
  Add reftable library
  Reftable support for git-core

 Makefile                  |   23 +-
 builtin/init-db.c         |    2 -
 refs.c                    |   18 +-
 refs.h                    |    2 +
 refs/files-backend.c      |    4 +
 refs/refs-internal.h      |    4 +
 refs/reftable-backend.c   |  812 +++++++++++++++++++++++++++
 reftable/LICENSE          |   31 ++
 reftable/README.md        |   17 +
 reftable/VERSION          |    5 +
 reftable/basics.c         |  196 +++++++
 reftable/basics.h         |   38 ++
 reftable/block.c          |  401 ++++++++++++++
 reftable/block.h          |   71 +++
 reftable/block_test.c     |  151 +++++
 reftable/blocksource.h    |   20 +
 reftable/bytes.c          |    0
 reftable/config.h         |    1 +
 reftable/constants.h      |   27 +
 reftable/dump.c           |   97 ++++
 reftable/file.c           |   97 ++++
 reftable/iter.c           |  230 ++++++++
 reftable/iter.h           |   56 ++
 reftable/merged.c         |  288 ++++++++++
 reftable/merged.h         |   34 ++
 reftable/merged_test.c    |  258 +++++++++
 reftable/pq.c             |  124 +++++
 reftable/pq.h             |   34 ++
 reftable/reader.c         |  710 ++++++++++++++++++++++++
 reftable/reader.h         |   52 ++
 reftable/record.c         | 1110 +++++++++++++++++++++++++++++++++++++
 reftable/record.h         |   79 +++
 reftable/record_test.c    |  332 +++++++++++
 reftable/reftable.h       |  399 +++++++++++++
 reftable/reftable_test.c  |  481 ++++++++++++++++
 reftable/slice.c          |  199 +++++++
 reftable/slice.h          |   39 ++
 reftable/slice_test.c     |   38 ++
 reftable/stack.c          |  985 ++++++++++++++++++++++++++++++++
 reftable/stack.h          |   40 ++
 reftable/stack_test.c     |  281 ++++++++++
 reftable/system.h         |   36 ++
 reftable/test_framework.c |   67 +++
 reftable/test_framework.h |   64 +++
 reftable/tree.c           |   66 +++
 reftable/tree.h           |   24 +
 reftable/tree_test.c      |   61 ++
 reftable/writer.c         |  624 +++++++++++++++++++++
 reftable/writer.h         |   46 ++
 setup.c                   |   20 +-
 50 files changed, 8782 insertions(+), 12 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/slice_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h


base-commit: bc7a3d4dc04dd719e7c8c35ebd7a6e6651c5c5b6
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v1:

 1:  fd2baf1628 = 1:  174b98f6db setup.c: enable repo detection for reftable
 2:  bc643d0b0c = 2:  d7d642dcf6 create .git/refs in files-backend.c
 3:  1a01e0b1b5 = 3:  9cf185b51f Document how ref iterators and symrefs interact
 4:  3c86bd1d7e ! 4:  2106ff286b Add reftable library
     @@ -2,23 +2,92 @@
      
          Add reftable library
      
     +    Reftable is a new format for storing the ref database. It provides the
     +    following benefits:
     +
     +     * Simple and fast atomic ref transactions, including multiple refs and reflogs.
     +     * Compact storage of ref data.
     +     * Fast look ups of ref data.
     +     * Case-sensitive ref names on Windows/OSX, regardless of file system
     +     * Eliminates file/directory conflicts in ref names
     +
     +    Further context and motivation can be found in background reading:
     +
     +    * Spec: https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md
     +
     +    * Original discussion on JGit-dev:  https://www.eclipse.org/lists/jgit-dev/msg03389.html
     +
     +    * First design discussion on git@vger: https://public-inbox.org/git/CAJo=hJtTp2eA3z9wW9cHo-nA7kK40vVThqh6inXpbCcqfdMP9g@mail.gmail.com/
     +
     +    * Last design discussion on git@vger: https://public-inbox.org/git/CAJo=hJsZcAM9sipdVr7TMD-FD2V2W6_pvMQ791EGCDsDkQ033w@mail.gmail.com/
     +
     +    * First attempt at implementation: https://public-inbox.org/git/CAP8UFD0PPZSjBnxCA7ez91vBuatcHKQ+JUWvTD1iHcXzPBjPBg@mail.gmail.com/
     +
     +    * libgit2 support issue: https://github.com/libgit2/libgit2/issues
     +
     +    * GitLab support issue: https://gitlab.com/gitlab-org/git/issues/6
     +
     +    * go-git support issue: https://github.com/src-d/go-git/issues/1059
     +
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
          Change-Id: Id396ff42be8b42b9e11f194a32e2f95b8250c109
      
     + diff --git a/reftable/LICENSE b/reftable/LICENSE
     + new file mode 100644
     + --- /dev/null
     + +++ b/reftable/LICENSE
     +@@
     ++BSD License
     ++
     ++Copyright (c) 2020, Google LLC
     ++All rights reserved.
     ++
     ++Redistribution and use in source and binary forms, with or without
     ++modification, are permitted provided that the following conditions are
     ++met:
     ++
     ++* Redistributions of source code must retain the above copyright notice,
     ++this list of conditions and the following disclaimer.
     ++
     ++* Redistributions in binary form must reproduce the above copyright
     ++notice, this list of conditions and the following disclaimer in the
     ++documentation and/or other materials provided with the distribution.
     ++
     ++* Neither the name of Google LLC nor the names of its contributors may
     ++be used to endorse or promote products derived from this software
     ++without specific prior written permission.
     ++
     ++THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
     ++"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
     ++LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
     ++A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
     ++OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
     ++SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
     ++LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
     ++DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
     ++THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
     ++(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
     ++OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
     +
       diff --git a/reftable/README.md b/reftable/README.md
       new file mode 100644
       --- /dev/null
       +++ b/reftable/README.md
      @@
     -+The source code in this directory comes from https://github.com/google/reftable
     -+and can be updated by doing:
      +
     -+   rm -rf reftable-repo && \
     -+   git clone https://github.com/google/reftable reftable-repo && \
     ++The source code in this directory comes from https://github.com/google/reftable.
     ++
     ++The VERSION file keeps track of the current version of the reftable library.
     ++
     ++To update the library, do:
     ++
     ++   ((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
     ++    git clone https://github.com/google/reftable reftable-repo) && \
      +   cp reftable-repo/c/*.[ch] reftable/ && \
     -+   cp reftable-repo/LICENSE reftable/
     ++   cp reftable-repo/LICENSE reftable/ &&
      +   git --git-dir reftable-repo/.git show --no-patch origin/master \
     -+      > reftable/VERSION
     ++    > reftable/VERSION && \
     ++   echo '/* empty */' > reftable/config.h
      +
      +Bugfixes should be accompanied by a test and applied to upstream project at
      +https://github.com/google/reftable.
     @@ -28,299 +97,11 @@
       --- /dev/null
       +++ b/reftable/VERSION
      @@
     -+commit 887250932a9c39a15aee26aef59df2300c77c03f
     ++commit c616d53b88657c3a5fe4d2e7243a48effc34c626
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Wed Jan 22 19:50:16 2020 +0100
     -+
     -+    C: implement reflog expiry
     -+
     -+diff --git a/c/reftable.h b/c/reftable.h
     -+index 2f44973..d760af9 100644
     -+--- a/c/reftable.h
     -++++ b/c/reftable.h
     -+@@ -359,8 +359,17 @@ void stack_destroy(struct stack *st);
     -+ /* reloads the stack if necessary. */
     -+ int stack_reload(struct stack *st);
     -+ 
     -+-/* compacts all reftables into a giant table. */
     -+-int stack_compact_all(struct stack *st);
     -++/* Policy for expiring reflog entries. */
     -++struct log_expiry_config {
     -++  /* Drop entries older than this timestamp */
     -++  uint64_t time;
     -++
     -++  /* Drop older entries */
     -++  uint64_t min_update_index;
     -++};
     -++
     -++/* compacts all reftables into a giant table. Expire reflog entries if config is non-NULL */
     -++int stack_compact_all(struct stack *st, struct log_expiry_config *config);
     -+ 
     -+ /* heuristically compact unbalanced table stack. */
     -+ int stack_auto_compact(struct stack *st);
     -+diff --git a/c/stack.c b/c/stack.c
     -+index d67828f..641ac52 100644
     -+--- a/c/stack.c
     -++++ b/c/stack.c
     -+@@ -119,7 +119,7 @@ static struct reader **stack_copy_readers(struct stack *st, int cur_len) {
     -+   return cur;
     -+ }
     -+ 
     -+-static int stack_reload_once(struct stack *st, char **names) {
     -++static int stack_reload_once(struct stack *st, char **names, bool reuse_open) {
     -+   int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
     -+   struct reader **cur = stack_copy_readers(st, cur_len);
     -+   int err = 0;
     -+@@ -136,7 +136,7 @@ static int stack_reload_once(struct stack *st, char **names) {
     -+ 
     -+     // this is linear; we assume compaction keeps the number of tables
     -+     // under control so this is not quadratic.
     -+-    for (int j = 0; j < cur_len; j++) {
     -++    for (int j = 0; reuse_open && j < cur_len; j++) {
     -+       if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
     -+         rd = cur[j];
     -+         cur[j] = NULL;
     -+@@ -207,7 +207,7 @@ static int tv_cmp(struct timeval *a, struct timeval *b) {
     -+   return udiff;
     -+ }
     -+ 
     -+-int stack_reload(struct stack *st) {
     -++static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open) {
     -+   struct timeval deadline = {};
     -+   int err = gettimeofday(&deadline, NULL);
     -+   int64_t delay = 0;
     -+@@ -238,7 +238,7 @@ int stack_reload(struct stack *st) {
     -+       free_names(names);
     -+       return err;
     -+     }
     -+-    err = stack_reload_once(st, names);
     -++    err = stack_reload_once(st, names, reuse_open);
     -+     if (err == 0) {
     -+       free_names(names);
     -+       break;
     -+@@ -269,6 +269,10 @@ int stack_reload(struct stack *st) {
     -+   return 0;
     -+ }
     -+ 
     -++int stack_reload(struct stack *st) {
     -++  return stack_reload_maybe_reuse(st, true);
     -++}
     -++
     -+ // -1 = error
     -+ // 0 = up to date
     -+ // 1 = changed.
     -+@@ -471,7 +475,7 @@ uint64_t stack_next_update_index(struct stack *st) {
     -+ }
     -+ 
     -+ static int stack_compact_locked(struct stack *st, int first, int last,
     -+-                                struct slice *temp_tab) {
     -++                                struct slice *temp_tab, struct log_expiry_config *config) {
     -+   struct slice next_name = {};
     -+   int tab_fd = -1;
     -+   struct writer *wr = NULL;
     -+@@ -488,7 +492,7 @@ static int stack_compact_locked(struct stack *st, int first, int last,
     -+   tab_fd = mkstemp((char *)slice_as_string(temp_tab));
     -+   wr = new_writer(fd_writer, &tab_fd, &st->config);
     -+ 
     -+-  err = stack_write_compact(st, wr, first, last);
     -++  err = stack_write_compact(st, wr, first, last, config);
     -+   if (err < 0) {
     -+     goto exit;
     -+   }
     -+@@ -515,7 +519,7 @@ static int stack_compact_locked(struct stack *st, int first, int last,
     -+ }
     -+ 
     -+ int stack_write_compact(struct stack *st, struct writer *wr, int first,
     -+-                        int last) {
     -++                        int last, struct log_expiry_config *config) {
     -+   int subtabs_len = last - first + 1;
     -+   struct reader **subtabs = calloc(sizeof(struct reader *), last - first + 1);
     -+   struct merged_table *mt = NULL;
     -+@@ -580,6 +584,16 @@ int stack_write_compact(struct stack *st, struct writer *wr, int first,
     -+       continue;
     -+     }
     -+ 
     -++    // XXX collect stats?
     -++
     -++    if (config != NULL && config->time > 0 && log.time < config->time) {
     -++      continue;
     -++    }
     -++
     -++    if (config != NULL && config->min_update_index > 0 && log.update_index < config->min_update_index) {
     -++      continue;
     -++    }
     -++
     -+     err = writer_add_log(wr, &log);
     -+     if (err < 0) {
     -+       break;
     -+@@ -599,7 +613,7 @@ int stack_write_compact(struct stack *st, struct writer *wr, int first,
     -+ }
     -+ 
     -+ // <  0: error. 0 == OK, > 0 attempt failed; could retry.
     -+-static int stack_compact_range(struct stack *st, int first, int last) {
     -++static int stack_compact_range(struct stack *st, int first, int last, struct log_expiry_config *expiry) {
     -+   struct slice temp_tab_name = {};
     -+   struct slice new_table_name = {};
     -+   struct slice lock_file_name = {};
     -+@@ -612,7 +626,7 @@ static int stack_compact_range(struct stack *st, int first, int last) {
     -+   char **delete_on_success = calloc(sizeof(char *), compact_count + 1);
     -+   char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
     -+ 
     -+-  if (first >= last) {
     -++  if (first > last || (expiry == NULL && first == last)) {
     -+     err = 0;
     -+     goto exit;
     -+   }
     -+@@ -676,7 +690,7 @@ static int stack_compact_range(struct stack *st, int first, int last) {
     -+   }
     -+   have_lock = false;
     -+ 
     -+-  err = stack_compact_locked(st, first, last, &temp_tab_name);
     -++  err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
     -+   if (err < 0) {
     -+     goto exit;
     -+   }
     -+@@ -739,10 +753,12 @@ static int stack_compact_range(struct stack *st, int first, int last) {
     -+   have_lock = false;
     -+ 
     -+   for (char **p = delete_on_success; *p; p++) {
     -+-    unlink(*p);
     -++    if (0 != strcmp(*p, slice_as_string(&new_table_path))) {
     -++      unlink(*p);
     -++    }
     -+   }
     -+ 
     -+-  err = stack_reload(st);
     -++  err = stack_reload_maybe_reuse(st, first < last);
     -+ exit:
     -+   for (char **p = subtable_locks; *p; p++) {
     -+     unlink(*p);
     -+@@ -764,12 +780,12 @@ static int stack_compact_range(struct stack *st, int first, int last) {
     -+   return err;
     -+ }
     -+ 
     -+-int stack_compact_all(struct stack *st) {
     -+-  return stack_compact_range(st, 0, st->merged->stack_len - 1);
     -++int stack_compact_all(struct stack *st, struct log_expiry_config *config) {
     -++  return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
     -+ }
     -+ 
     -+-static int stack_compact_range_stats(struct stack *st, int first, int last) {
     -+-  int err = stack_compact_range(st, first, last);
     -++static int stack_compact_range_stats(struct stack *st, int first, int last, struct log_expiry_config *config) {
     -++  int err = stack_compact_range(st, first, last, config);
     -+   if (err > 0) {
     -+     st->stats.failures++;
     -+   }
     -+@@ -856,7 +872,7 @@ int stack_auto_compact(struct stack *st) {
     -+   struct segment seg = suggest_compaction_segment(sizes, st->merged->stack_len);
     -+   free(sizes);
     -+   if (segment_size(&seg) > 0) {
     -+-    return stack_compact_range_stats(st, seg.start, seg.end - 1);
     -++    return stack_compact_range_stats(st, seg.start, seg.end - 1, NULL);
     -+   }
     -+ 
     -+   return 0;
     -+diff --git a/c/stack.h b/c/stack.h
     -+index 8cb2cb0..092a03b 100644
     -+--- a/c/stack.h
     -++++ b/c/stack.h
     -+@@ -25,7 +25,7 @@ int read_lines(const char *filename, char ***lines);
     -+ int stack_try_add(struct stack *st,
     -+                   int (*write_table)(struct writer *wr, void *arg), void *arg);
     -+ int stack_write_compact(struct stack *st, struct writer *wr, int first,
     -+-                        int last);
     -++                        int last, struct log_expiry_config *config);
     -+ int fastlog2(uint64_t sz);
     -+ 
     -+ struct segment {
     -+diff --git a/c/stack_test.c b/c/stack_test.c
     -+index c49d8b8..8c0cde0 100644
     -+--- a/c/stack_test.c
     -++++ b/c/stack_test.c
     -+@@ -121,7 +121,7 @@ void test_stack_add(void) {
     -+     assert_err(err);
     -+   }
     -+ 
     -+-  err = stack_compact_all(st);
     -++  err = stack_compact_all(st, NULL);
     -+   assert_err(err);
     -+ 
     -+   for (int i = 0; i < N; i++) {
     -+@@ -186,7 +186,73 @@ void test_suggest_compaction_segment(void) {
     -+   }
     -+ }
     -+ 
     -++void test_reflog_expire(void) {
     -++  char dir[256] = "/tmp/stack.XXXXXX";
     -++  assert(mkdtemp(dir));
     -++  printf("%s\n", dir);
     -++  char fn[256] = "";
     -++  strcat(fn, dir);
     -++  strcat(fn, "/refs");
     -++
     -++  struct write_options cfg = {};
     -++  struct stack *st = NULL;
     -++  int err = new_stack(&st, dir, fn, cfg);
     -++  assert_err(err);
     -++
     -++  struct log_record logs[20] = {};
     -++  int N = ARRAYSIZE(logs) -1;
     -++  for (int i = 1; i <= N; i++) {
     -++    char buf[256];
     -++    sprintf(buf, "branch%02d", i);
     -++
     -++    logs[i].ref_name = strdup(buf);
     -++    logs[i].update_index = i;
     -++    logs[i].time = i;
     -++    logs[i].new_hash = malloc(SHA1_SIZE);
     -++    logs[i].email = strdup("identity@invalid");
     -++    set_test_hash(logs[i].new_hash, i);
     -++  }
     -++
     -++  for (int i = 1; i <= N; i++) {
     -++    int err = stack_add(st, &write_test_log, &logs[i]);
     -++    assert_err(err);
     -++  }
     -++
     -++  err = stack_compact_all(st, NULL);
     -++  assert_err(err);
     -++
     -++  struct log_expiry_config expiry = {
     -++    .time = 10,
     -++  };
     -++  err = stack_compact_all(st, &expiry);
     -++  assert_err(err);
     -++
     -++  struct log_record log = {};
     -++  err = stack_read_log(st, logs[9].ref_name, &log);
     -++  assert(err == 1);
     -++
     -++  err = stack_read_log(st, logs[11].ref_name, &log);
     -++  assert_err(err);
     -++
     -++  expiry.min_update_index = 15;
     -++  err = stack_compact_all(st, &expiry);
     -++  assert_err(err);
     -++
     -++  err = stack_read_log(st, logs[14].ref_name, &log);
     -++  assert(err == 1);
     -++
     -++  err = stack_read_log(st, logs[16].ref_name, &log);
     -++  assert_err(err);
     -++
     -++  // cleanup
     -++  stack_destroy(st);
     -++  for (int i = 0; i < N; i++) {
     -++    log_record_clear(&logs[i]);
     -++  }
     -++}
     -++
     -+ int main() {
     -++  add_test_case("test_reflog_expire", test_reflog_expire);
     -+   add_test_case("test_suggest_compaction_segment",
     -+                 &test_suggest_compaction_segment);
     -+   add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
     ++Date:   Mon Jan 27 15:05:43 2020 +0100
     ++
     ++    C: ban // comments
      
       diff --git a/reftable/basics.c b/reftable/basics.c
       new file mode 100644
     @@ -337,175 +118,191 @@
      +
      +#include "basics.h"
      +
     -+#include <stdio.h>
     -+#include <stdlib.h>
     -+#include <string.h>
     ++#include "system.h"
      +
     -+void put_u24(byte *out, uint32_t i) {
     -+  out[0] = (byte)((i >> 16) & 0xff);
     -+  out[1] = (byte)((i >> 8) & 0xff);
     -+  out[2] = (byte)((i)&0xff);
     ++void put_u24(byte *out, uint32_t i)
     ++{
     ++	out[0] = (byte)((i >> 16) & 0xff);
     ++	out[1] = (byte)((i >> 8) & 0xff);
     ++	out[2] = (byte)((i)&0xff);
      +}
      +
     -+uint32_t get_u24(byte *in) {
     -+  return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 | (uint32_t)(in[2]);
     ++uint32_t get_u24(byte *in)
     ++{
     ++	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
     ++	       (uint32_t)(in[2]);
      +}
      +
     -+void put_u32(byte *out, uint32_t i) {
     -+  out[0] = (byte)((i >> 24) & 0xff);
     -+  out[1] = (byte)((i >> 16) & 0xff);
     -+  out[2] = (byte)((i >> 8) & 0xff);
     -+  out[3] = (byte)((i)&0xff);
     ++void put_u32(byte *out, uint32_t i)
     ++{
     ++	out[0] = (byte)((i >> 24) & 0xff);
     ++	out[1] = (byte)((i >> 16) & 0xff);
     ++	out[2] = (byte)((i >> 8) & 0xff);
     ++	out[3] = (byte)((i)&0xff);
      +}
      +
     -+uint32_t get_u32(byte *in) {
     -+  return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
     -+         (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
     ++uint32_t get_u32(byte *in)
     ++{
     ++	return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
     ++	       (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
      +}
      +
     -+void put_u64(byte *out, uint64_t v) {
     -+  for (int i = sizeof(uint64_t); i--;) {
     -+    out[i] = (byte)(v & 0xff);
     -+    v >>= 8;
     -+  }
     ++void put_u64(byte *out, uint64_t v)
     ++{
     ++	int i = 0;
     ++	for (i = sizeof(uint64_t); i--;) {
     ++		out[i] = (byte)(v & 0xff);
     ++		v >>= 8;
     ++	}
      +}
      +
     -+uint64_t get_u64(byte *out) {
     -+  uint64_t v = 0;
     -+  for (int i = 0; i < sizeof(uint64_t); i++) {
     -+    v = (v << 8) | (byte)(out[i] & 0xff);
     -+  }
     -+  return v;
     ++uint64_t get_u64(byte *out)
     ++{
     ++	uint64_t v = 0;
     ++	int i = 0;
     ++	for (i = 0; i < sizeof(uint64_t); i++) {
     ++		v = (v << 8) | (byte)(out[i] & 0xff);
     ++	}
     ++	return v;
      +}
      +
     -+void put_u16(byte *out, uint16_t i) {
     -+  out[0] = (byte)((i >> 8) & 0xff);
     -+  out[1] = (byte)((i)&0xff);
     ++void put_u16(byte *out, uint16_t i)
     ++{
     ++	out[0] = (byte)((i >> 8) & 0xff);
     ++	out[1] = (byte)((i)&0xff);
      +}
      +
     -+uint16_t get_u16(byte *in) {
     -+  return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
     ++uint16_t get_u16(byte *in)
     ++{
     ++	return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
      +}
      +
      +/*
      +  find smallest index i in [0, sz) at which f(i) is true, assuming
      +  that f is ascending. Return sz if f(i) is false for all indices.
      +*/
     -+int binsearch(int sz, int (*f)(int k, void *args), void *args) {
     -+  int lo = 0;
     -+  int hi = sz;
     -+
     -+  /* invariant: (hi == sz) || f(hi) == true
     -+     (lo == 0 && f(0) == true) || fi(lo) == false
     -+   */
     -+  while (hi - lo > 1) {
     -+    int mid = lo + (hi - lo) / 2;
     -+
     -+    int val = f(mid, args);
     -+    if (val) {
     -+      hi = mid;
     -+    } else {
     -+      lo = mid;
     -+    }
     -+  }
     -+
     -+  if (lo == 0) {
     -+    if (f(0, args)) {
     -+      return 0;
     -+    } else {
     -+      return 1;
     -+    }
     -+  }
     -+
     -+  return hi;
     -+}
     -+
     -+void free_names(char **a) {
     -+  char **p = a;
     -+  if (p == NULL) {
     -+    return;
     -+  }
     -+  while (*p) {
     -+    free(*p);
     -+    p++;
     -+  }
     -+  free(a);
     -+}
     -+
     -+int names_length(char **names) {
     -+  int len = 0;
     -+  for (char **p = names; *p; p++) {
     -+    len++;
     -+  }
     -+  return len;
     ++int binsearch(int sz, int (*f)(int k, void *args), void *args)
     ++{
     ++	int lo = 0;
     ++	int hi = sz;
     ++
     ++	/* invariant: (hi == sz) || f(hi) == true
     ++	   (lo == 0 && f(0) == true) || fi(lo) == false
     ++	 */
     ++	while (hi - lo > 1) {
     ++		int mid = lo + (hi - lo) / 2;
     ++
     ++		int val = f(mid, args);
     ++		if (val) {
     ++			hi = mid;
     ++		} else {
     ++			lo = mid;
     ++		}
     ++	}
     ++
     ++	if (lo == 0) {
     ++		if (f(0, args)) {
     ++			return 0;
     ++		} else {
     ++			return 1;
     ++		}
     ++	}
     ++
     ++	return hi;
     ++}
     ++
     ++void free_names(char **a)
     ++{
     ++	char **p = a;
     ++	if (p == NULL) {
     ++		return;
     ++	}
     ++	while (*p) {
     ++		free(*p);
     ++		p++;
     ++	}
     ++	free(a);
     ++}
     ++
     ++int names_length(char **names)
     ++{
     ++	int len = 0;
     ++	for (char **p = names; *p; p++) {
     ++		len++;
     ++	}
     ++	return len;
      +}
      +
      +/* parse a newline separated list of names. Empty names are discarded. */
     -+void parse_names(char *buf, int size, char ***namesp) {
     -+  char **names = NULL;
     -+  int names_cap = 0;
     -+  int names_len = 0;
     -+
     -+  char *p = buf;
     -+  char *end = buf + size;
     -+  while (p < end) {
     -+    char *next = strchr(p, '\n');
     -+    if (next != NULL) {
     -+      *next = 0;
     -+    } else {
     -+      next = end;
     -+    }
     -+    if (p < next) {
     -+      if (names_len == names_cap) {
     -+        names_cap = 2 * names_cap + 1;
     -+        names = realloc(names, names_cap * sizeof(char *));
     -+      }
     -+      names[names_len++] = strdup(p);
     -+    }
     -+    p = next + 1;
     -+  }
     -+
     -+  if (names_len == names_cap) {
     -+    names_cap = 2 * names_cap + 1;
     -+    names = realloc(names, names_cap * sizeof(char *));
     -+  }
     -+
     -+  names[names_len] = NULL;
     -+  *namesp = names;
     -+}
     -+
     -+int names_equal(char **a, char **b) {
     -+  while (*a && *b) {
     -+    if (0 != strcmp(*a, *b)) {
     -+      return 0;
     -+    }
     -+
     -+    a++;
     -+    b++;
     -+  }
     -+
     -+  return *a == *b;
     -+}
     -+
     -+const char *error_str(int err) {
     -+  switch (err) {
     -+    case IO_ERROR:
     -+      return "I/O error";
     -+    case FORMAT_ERROR:
     -+      return "FORMAT_ERROR";
     -+    case NOT_EXIST_ERROR:
     -+      return "NOT_EXIST_ERROR";
     -+    case LOCK_ERROR:
     -+      return "LOCK_ERROR";
     -+    case API_ERROR:
     -+      return "API_ERROR";
     -+    case ZLIB_ERROR:
     -+      return "ZLIB_ERROR";
     -+    case -1:
     -+      return "general error";
     -+    default:
     -+      return "unknown error code";
     -+  }
     ++void parse_names(char *buf, int size, char ***namesp)
     ++{
     ++	char **names = NULL;
     ++	int names_cap = 0;
     ++	int names_len = 0;
     ++
     ++	char *p = buf;
     ++	char *end = buf + size;
     ++	while (p < end) {
     ++		char *next = strchr(p, '\n');
     ++		if (next != NULL) {
     ++			*next = 0;
     ++		} else {
     ++			next = end;
     ++		}
     ++		if (p < next) {
     ++			if (names_len == names_cap) {
     ++				names_cap = 2 * names_cap + 1;
     ++				names = realloc(names,
     ++						names_cap * sizeof(char *));
     ++			}
     ++			names[names_len++] = strdup(p);
     ++		}
     ++		p = next + 1;
     ++	}
     ++
     ++	if (names_len == names_cap) {
     ++		names_cap = 2 * names_cap + 1;
     ++		names = realloc(names, names_cap * sizeof(char *));
     ++	}
     ++
     ++	names[names_len] = NULL;
     ++	*namesp = names;
     ++}
     ++
     ++int names_equal(char **a, char **b)
     ++{
     ++	while (*a && *b) {
     ++		if (strcmp(*a, *b)) {
     ++			return 0;
     ++		}
     ++
     ++		a++;
     ++		b++;
     ++	}
     ++
     ++	return *a == *b;
     ++}
     ++
     ++const char *error_str(int err)
     ++{
     ++	switch (err) {
     ++	case IO_ERROR:
     ++		return "I/O error";
     ++	case FORMAT_ERROR:
     ++		return "FORMAT_ERROR";
     ++	case NOT_EXIST_ERROR:
     ++		return "NOT_EXIST_ERROR";
     ++	case LOCK_ERROR:
     ++		return "LOCK_ERROR";
     ++	case API_ERROR:
     ++		return "API_ERROR";
     ++	case ZLIB_ERROR:
     ++		return "ZLIB_ERROR";
     ++	case -1:
     ++		return "general error";
     ++	default:
     ++		return "unknown error code";
     ++	}
      +}
      
       diff --git a/reftable/basics.h b/reftable/basics.h
     @@ -524,7 +321,7 @@
      +#ifndef BASICS_H
      +#define BASICS_H
      +
     -+#include <stdint.h>
     ++#include "system.h"
      +
      +#include "reftable.h"
      +
     @@ -567,10 +364,7 @@
      +
      +#include "block.h"
      +
     -+#include <assert.h>
     -+#include <stdio.h>
     -+#include <stdlib.h>
     -+#include <string.h>
     ++#include "system.h"
      +
      +#include "blocksource.h"
      +#include "constants.h"
     @@ -579,365 +373,387 @@
      +#include "zlib.h"
      +
      +int block_writer_register_restart(struct block_writer *w, int n, bool restart,
     -+                                  struct slice key);
     ++				  struct slice key);
      +
      +void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
     -+                       uint32_t block_size, uint32_t header_off,
     -+                       int hash_size) {
     -+  bw->buf = buf;
     -+  bw->hash_size = hash_size;
     -+  bw->block_size = block_size;
     -+  bw->header_off = header_off;
     -+  bw->buf[header_off] = typ;
     -+  bw->next = header_off + 4;
     -+  bw->restart_interval = 16;
     -+  bw->entries = 0;
     ++		       uint32_t block_size, uint32_t header_off, int hash_size)
     ++{
     ++	bw->buf = buf;
     ++	bw->hash_size = hash_size;
     ++	bw->block_size = block_size;
     ++	bw->header_off = header_off;
     ++	bw->buf[header_off] = typ;
     ++	bw->next = header_off + 4;
     ++	bw->restart_interval = 16;
     ++	bw->entries = 0;
      +}
      +
     -+byte block_writer_type(struct block_writer *bw) {
     -+  return bw->buf[bw->header_off];
     ++byte block_writer_type(struct block_writer *bw)
     ++{
     ++	return bw->buf[bw->header_off];
      +}
      +
      +/* adds the record to the block. Returns -1 if it does not fit, 0 on
      +   success */
     -+int block_writer_add(struct block_writer *w, struct record rec) {
     -+  struct slice empty = {};
     -+  struct slice last =
     -+      w->entries % w->restart_interval == 0 ? empty : w->last_key;
     -+  struct slice out = {
     -+      .buf = w->buf + w->next,
     -+      .len = w->block_size - w->next,
     -+  };
     -+
     -+  struct slice start = out;
     -+
     -+  bool restart = false;
     -+  struct slice key = {};
     -+  int n = 0;
     -+
     -+  record_key(rec, &key);
     -+  n = encode_key(&restart, out, last, key, record_val_type(rec));
     -+  if (n < 0) {
     -+    goto err;
     -+  }
     -+  out.buf += n;
     -+  out.len -= n;
     -+
     -+  n = record_encode(rec, out, w->hash_size);
     -+  if (n < 0) {
     -+    goto err;
     -+  }
     -+
     -+  out.buf += n;
     -+  out.len -= n;
     -+
     -+  if (block_writer_register_restart(w, start.len - out.len, restart, key) < 0) {
     -+    goto err;
     -+  }
     -+
     -+  free(slice_yield(&key));
     -+  return 0;
     ++int block_writer_add(struct block_writer *w, struct record rec)
     ++{
     ++	struct slice empty = {};
     ++	struct slice last = w->entries % w->restart_interval == 0 ? empty :
     ++								    w->last_key;
     ++	struct slice out = {
     ++		.buf = w->buf + w->next,
     ++		.len = w->block_size - w->next,
     ++	};
     ++
     ++	struct slice start = out;
     ++
     ++	bool restart = false;
     ++	struct slice key = {};
     ++	int n = 0;
     ++
     ++	record_key(rec, &key);
     ++	n = encode_key(&restart, out, last, key, record_val_type(rec));
     ++	if (n < 0) {
     ++		goto err;
     ++	}
     ++	out.buf += n;
     ++	out.len -= n;
     ++
     ++	n = record_encode(rec, out, w->hash_size);
     ++	if (n < 0) {
     ++		goto err;
     ++	}
     ++
     ++	out.buf += n;
     ++	out.len -= n;
     ++
     ++	if (block_writer_register_restart(w, start.len - out.len, restart,
     ++					  key) < 0) {
     ++		goto err;
     ++	}
     ++
     ++	free(slice_yield(&key));
     ++	return 0;
      +
      +err:
     -+  free(slice_yield(&key));
     -+  return -1;
     ++	free(slice_yield(&key));
     ++	return -1;
      +}
      +
      +int block_writer_register_restart(struct block_writer *w, int n, bool restart,
     -+                                  struct slice key) {
     -+  int rlen = w->restart_len;
     -+  if (rlen >= MAX_RESTARTS) {
     -+    restart = false;
     -+  }
     -+
     -+  if (restart) {
     -+    rlen++;
     -+  }
     -+  if (2 + 3 * rlen + n > w->block_size - w->next) {
     -+    return -1;
     -+  }
     -+  if (restart) {
     -+    if (w->restart_len == w->restart_cap) {
     -+      w->restart_cap = w->restart_cap * 2 + 1;
     -+      w->restarts = realloc(w->restarts, sizeof(uint32_t) * w->restart_cap);
     -+    }
     -+
     -+    w->restarts[w->restart_len++] = w->next;
     -+  }
     -+
     -+  w->next += n;
     -+  slice_copy(&w->last_key, key);
     -+  w->entries++;
     -+  return 0;
     -+}
     -+
     -+int block_writer_finish(struct block_writer *w) {
     -+  for (int i = 0; i < w->restart_len; i++) {
     -+    put_u24(w->buf + w->next, w->restarts[i]);
     -+    w->next += 3;
     -+  }
     -+
     -+  put_u16(w->buf + w->next, w->restart_len);
     -+  w->next += 2;
     -+  put_u24(w->buf + 1 + w->header_off, w->next);
     -+
     -+  if (block_writer_type(w) == BLOCK_TYPE_LOG) {
     -+    int block_header_skip = 4 + w->header_off;
     -+    struct slice compressed = {};
     -+    uLongf dest_len = 0, src_len = 0;
     -+    slice_resize(&compressed, w->next - block_header_skip);
     -+
     -+    dest_len = compressed.len;
     -+    src_len = w->next - block_header_skip;
     -+
     -+    if (Z_OK != compress2(compressed.buf, &dest_len, w->buf + block_header_skip,
     -+                          src_len, 9)) {
     -+      free(slice_yield(&compressed));
     -+      return ZLIB_ERROR;
     -+    }
     -+    memcpy(w->buf + block_header_skip, compressed.buf, dest_len);
     -+    w->next = dest_len + block_header_skip;
     -+  }
     -+  return w->next;
     -+}
     -+
     -+byte block_reader_type(struct block_reader *r) {
     -+  return r->block.data[r->header_off];
     ++				  struct slice key)
     ++{
     ++	int rlen = w->restart_len;
     ++	if (rlen >= MAX_RESTARTS) {
     ++		restart = false;
     ++	}
     ++
     ++	if (restart) {
     ++		rlen++;
     ++	}
     ++	if (2 + 3 * rlen + n > w->block_size - w->next) {
     ++		return -1;
     ++	}
     ++	if (restart) {
     ++		if (w->restart_len == w->restart_cap) {
     ++			w->restart_cap = w->restart_cap * 2 + 1;
     ++			w->restarts = realloc(
     ++				w->restarts, sizeof(uint32_t) * w->restart_cap);
     ++		}
     ++
     ++		w->restarts[w->restart_len++] = w->next;
     ++	}
     ++
     ++	w->next += n;
     ++	slice_copy(&w->last_key, key);
     ++	w->entries++;
     ++	return 0;
     ++}
     ++
     ++int block_writer_finish(struct block_writer *w)
     ++{
     ++	int i = 0;
     ++	for (i = 0; i < w->restart_len; i++) {
     ++		put_u24(w->buf + w->next, w->restarts[i]);
     ++		w->next += 3;
     ++	}
     ++
     ++	put_u16(w->buf + w->next, w->restart_len);
     ++	w->next += 2;
     ++	put_u24(w->buf + 1 + w->header_off, w->next);
     ++
     ++	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
     ++		int block_header_skip = 4 + w->header_off;
     ++		struct slice compressed = {};
     ++		uLongf dest_len = 0, src_len = 0;
     ++		slice_resize(&compressed, w->next - block_header_skip);
     ++
     ++		dest_len = compressed.len;
     ++		src_len = w->next - block_header_skip;
     ++
     ++		if (Z_OK != compress2(compressed.buf, &dest_len,
     ++				      w->buf + block_header_skip, src_len, 9)) {
     ++			free(slice_yield(&compressed));
     ++			return ZLIB_ERROR;
     ++		}
     ++		memcpy(w->buf + block_header_skip, compressed.buf, dest_len);
     ++		w->next = dest_len + block_header_skip;
     ++	}
     ++	return w->next;
     ++}
     ++
     ++byte block_reader_type(struct block_reader *r)
     ++{
     ++	return r->block.data[r->header_off];
      +}
      +
      +int block_reader_init(struct block_reader *br, struct block *block,
     -+                      uint32_t header_off, uint32_t table_block_size,
     -+                      int hash_size) {
     -+  uint32_t full_block_size = table_block_size;
     -+  byte typ = block->data[header_off];
     -+  uint32_t sz = get_u24(block->data + header_off + 1);
     -+
     -+  if (!is_block_type(typ)) {
     -+    return FORMAT_ERROR;
     -+  }
     -+
     -+  if (typ == BLOCK_TYPE_LOG) {
     -+    struct slice uncompressed = {};
     -+    int block_header_skip = 4 + header_off;
     -+    uLongf dst_len = sz - block_header_skip;
     -+    uLongf src_len = block->len - block_header_skip;
     -+
     -+    slice_resize(&uncompressed, sz);
     -+    memcpy(uncompressed.buf, block->data, block_header_skip);
     -+
     -+    if (Z_OK != uncompress2(uncompressed.buf + block_header_skip, &dst_len,
     -+                            block->data + block_header_skip, &src_len)) {
     -+      free(slice_yield(&uncompressed));
     -+      return ZLIB_ERROR;
     -+    }
     -+
     -+    block_source_return_block(block->source, block);
     -+    block->data = uncompressed.buf;
     -+    block->len = dst_len; /* XXX: 4 bytes missing? */
     -+    block->source = malloc_block_source();
     -+    full_block_size = src_len + block_header_skip;
     -+  } else if (full_block_size == 0) {
     -+    full_block_size = sz;
     -+  } else if (sz < full_block_size && sz < block->len && block->data[sz] != 0) {
     -+    // If the block is smaller than the full block size,
     -+    // it is padded (data followed by '\0') or the next
     -+    // block is unaligned.
     -+    full_block_size = sz;
     -+  }
     -+
     -+  {
     -+    uint16_t restart_count = get_u16(block->data + sz - 2);
     -+    uint32_t restart_start = sz - 2 - 3 * restart_count;
     -+
     -+    byte *restart_bytes = block->data + restart_start;
     -+
     -+    // transfer ownership.
     -+    br->block = *block;
     -+    block->data = NULL;
     -+    block->len = 0;
     -+
     -+    br->hash_size = hash_size;
     -+    br->block_len = restart_start;
     -+    br->full_block_size = full_block_size;
     -+    br->header_off = header_off;
     -+    br->restart_count = restart_count;
     -+    br->restart_bytes = restart_bytes;
     -+  }
     -+
     -+  return 0;
     -+}
     -+
     -+static uint32_t block_reader_restart_offset(struct block_reader *br, int i) {
     -+  return get_u24(br->restart_bytes + 3 * i);
     -+}
     -+
     -+void block_reader_start(struct block_reader *br, struct block_iter *it) {
     -+  it->br = br;
     -+  slice_resize(&it->last_key, 0);
     -+  it->next_off = br->header_off + 4;
     ++		      uint32_t header_off, uint32_t table_block_size,
     ++		      int hash_size)
     ++{
     ++	uint32_t full_block_size = table_block_size;
     ++	byte typ = block->data[header_off];
     ++	uint32_t sz = get_u24(block->data + header_off + 1);
     ++
     ++	if (!is_block_type(typ)) {
     ++		return FORMAT_ERROR;
     ++	}
     ++
     ++	if (typ == BLOCK_TYPE_LOG) {
     ++		struct slice uncompressed = {};
     ++		int block_header_skip = 4 + header_off;
     ++		uLongf dst_len = sz - block_header_skip;
     ++		uLongf src_len = block->len - block_header_skip;
     ++
     ++		slice_resize(&uncompressed, sz);
     ++		memcpy(uncompressed.buf, block->data, block_header_skip);
     ++
     ++		if (Z_OK !=
     ++		    uncompress2(uncompressed.buf + block_header_skip, &dst_len,
     ++				block->data + block_header_skip, &src_len)) {
     ++			free(slice_yield(&uncompressed));
     ++			return ZLIB_ERROR;
     ++		}
     ++
     ++		block_source_return_block(block->source, block);
     ++		block->data = uncompressed.buf;
     ++		block->len = dst_len; /* XXX: 4 bytes missing? */
     ++		block->source = malloc_block_source();
     ++		full_block_size = src_len + block_header_skip;
     ++	} else if (full_block_size == 0) {
     ++		full_block_size = sz;
     ++	} else if (sz < full_block_size && sz < block->len &&
     ++		   block->data[sz] != 0) {
     ++		/* If the block is smaller than the full block size, it is
     ++                   padded (data followed by '\0') or the next block is
     ++                   unaligned. */
     ++		full_block_size = sz;
     ++	}
     ++
     ++	{
     ++		uint16_t restart_count = get_u16(block->data + sz - 2);
     ++		uint32_t restart_start = sz - 2 - 3 * restart_count;
     ++
     ++		byte *restart_bytes = block->data + restart_start;
     ++
     ++		/* transfer ownership. */
     ++		br->block = *block;
     ++		block->data = NULL;
     ++		block->len = 0;
     ++
     ++		br->hash_size = hash_size;
     ++		br->block_len = restart_start;
     ++		br->full_block_size = full_block_size;
     ++		br->header_off = header_off;
     ++		br->restart_count = restart_count;
     ++		br->restart_bytes = restart_bytes;
     ++	}
     ++
     ++	return 0;
     ++}
     ++
     ++static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
     ++{
     ++	return get_u24(br->restart_bytes + 3 * i);
     ++}
     ++
     ++void block_reader_start(struct block_reader *br, struct block_iter *it)
     ++{
     ++	it->br = br;
     ++	slice_resize(&it->last_key, 0);
     ++	it->next_off = br->header_off + 4;
      +}
      +
      +struct restart_find_args {
     -+  struct slice key;
     -+  struct block_reader *r;
     -+  int error;
     ++	struct slice key;
     ++	struct block_reader *r;
     ++	int error;
      +};
      +
     -+static int restart_key_less(int idx, void *args) {
     -+  struct restart_find_args *a = (struct restart_find_args *)args;
     -+  uint32_t off = block_reader_restart_offset(a->r, idx);
     -+  struct slice in = {
     -+      .buf = a->r->block.data + off,
     -+      .len = a->r->block_len - off,
     -+  };
     -+
     -+  /* the restart key is verbatim in the block, so this could avoid the
     -+     alloc for decoding the key */
     -+  struct slice rkey = {};
     -+  struct slice last_key = {};
     -+  byte unused_extra;
     -+  int n = decode_key(&rkey, &unused_extra, last_key, in);
     -+  if (n < 0) {
     -+    a->error = 1;
     -+    return -1;
     -+  }
     -+
     -+  {
     -+    int result = slice_compare(a->key, rkey);
     -+    free(slice_yield(&rkey));
     -+    return result;
     -+  }
     -+}
     -+
     -+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src) {
     -+  dest->br = src->br;
     -+  dest->next_off = src->next_off;
     -+  slice_copy(&dest->last_key, src->last_key);
     -+}
     -+
     -+// return < 0 for error, 0 for OK, > 0 for EOF.
     -+int block_iter_next(struct block_iter *it, struct record rec) {
     -+  if (it->next_off >= it->br->block_len) {
     -+    return 1;
     -+  }
     -+
     -+  {
     -+    struct slice in = {
     -+        .buf = it->br->block.data + it->next_off,
     -+        .len = it->br->block_len - it->next_off,
     -+    };
     -+    struct slice start = in;
     -+    struct slice key = {};
     -+    byte extra;
     -+    int n = decode_key(&key, &extra, it->last_key, in);
     -+    if (n < 0) {
     -+      return -1;
     -+    }
     -+
     -+    in.buf += n;
     -+    in.len -= n;
     -+    n = record_decode(rec, key, extra, in, it->br->hash_size);
     -+    if (n < 0) {
     -+      return -1;
     -+    }
     -+    in.buf += n;
     -+    in.len -= n;
     -+
     -+    slice_copy(&it->last_key, key);
     -+    it->next_off += start.len - in.len;
     -+    free(slice_yield(&key));
     -+    return 0;
     -+  }
     -+}
     -+
     -+int block_reader_first_key(struct block_reader *br, struct slice *key) {
     -+  struct slice empty = {};
     -+  int off = br->header_off + 4;
     -+  struct slice in = {
     -+      .buf = br->block.data + off,
     -+      .len = br->block_len - off,
     -+  };
     -+
     -+  byte extra = 0;
     -+  int n = decode_key(key, &extra, empty, in);
     -+  if (n < 0) {
     -+    return n;
     -+  }
     -+  return 0;
     -+}
     -+
     -+int block_iter_seek(struct block_iter *it, struct slice want) {
     -+  return block_reader_seek(it->br, it, want);
     -+}
     -+
     -+void block_iter_close(struct block_iter *it) {
     -+  free(slice_yield(&it->last_key));
     ++static int restart_key_less(int idx, void *args)
     ++{
     ++	struct restart_find_args *a = (struct restart_find_args *)args;
     ++	uint32_t off = block_reader_restart_offset(a->r, idx);
     ++	struct slice in = {
     ++		.buf = a->r->block.data + off,
     ++		.len = a->r->block_len - off,
     ++	};
     ++
     ++	/* the restart key is verbatim in the block, so this could avoid the
     ++	   alloc for decoding the key */
     ++	struct slice rkey = {};
     ++	struct slice last_key = {};
     ++	byte unused_extra;
     ++	int n = decode_key(&rkey, &unused_extra, last_key, in);
     ++	if (n < 0) {
     ++		a->error = 1;
     ++		return -1;
     ++	}
     ++
     ++	{
     ++		int result = slice_compare(a->key, rkey);
     ++		free(slice_yield(&rkey));
     ++		return result;
     ++	}
     ++}
     ++
     ++void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
     ++{
     ++	dest->br = src->br;
     ++	dest->next_off = src->next_off;
     ++	slice_copy(&dest->last_key, src->last_key);
     ++}
     ++
     ++/* return < 0 for error, 0 for OK, > 0 for EOF. */
     ++int block_iter_next(struct block_iter *it, struct record rec)
     ++{
     ++	if (it->next_off >= it->br->block_len) {
     ++		return 1;
     ++	}
     ++
     ++	{
     ++		struct slice in = {
     ++			.buf = it->br->block.data + it->next_off,
     ++			.len = it->br->block_len - it->next_off,
     ++		};
     ++		struct slice start = in;
     ++		struct slice key = {};
     ++		byte extra;
     ++		int n = decode_key(&key, &extra, it->last_key, in);
     ++		if (n < 0) {
     ++			return -1;
     ++		}
     ++
     ++		in.buf += n;
     ++		in.len -= n;
     ++		n = record_decode(rec, key, extra, in, it->br->hash_size);
     ++		if (n < 0) {
     ++			return -1;
     ++		}
     ++		in.buf += n;
     ++		in.len -= n;
     ++
     ++		slice_copy(&it->last_key, key);
     ++		it->next_off += start.len - in.len;
     ++		free(slice_yield(&key));
     ++		return 0;
     ++	}
     ++}
     ++
     ++int block_reader_first_key(struct block_reader *br, struct slice *key)
     ++{
     ++	struct slice empty = {};
     ++	int off = br->header_off + 4;
     ++	struct slice in = {
     ++		.buf = br->block.data + off,
     ++		.len = br->block_len - off,
     ++	};
     ++
     ++	byte extra = 0;
     ++	int n = decode_key(key, &extra, empty, in);
     ++	if (n < 0) {
     ++		return n;
     ++	}
     ++	return 0;
     ++}
     ++
     ++int block_iter_seek(struct block_iter *it, struct slice want)
     ++{
     ++	return block_reader_seek(it->br, it, want);
     ++}
     ++
     ++void block_iter_close(struct block_iter *it)
     ++{
     ++	free(slice_yield(&it->last_key));
      +}
      +
      +int block_reader_seek(struct block_reader *br, struct block_iter *it,
     -+                      struct slice want) {
     -+  struct restart_find_args args = {
     -+      .key = want,
     -+      .r = br,
     -+  };
     -+
     -+  int i = binsearch(br->restart_count, &restart_key_less, &args);
     -+  if (args.error) {
     -+    return -1;
     -+  }
     -+
     -+  it->br = br;
     -+  if (i > 0) {
     -+    i--;
     -+    it->next_off = block_reader_restart_offset(br, i);
     -+  } else {
     -+    it->next_off = br->header_off + 4;
     -+  }
     -+
     -+  {
     -+    struct record rec = new_record(block_reader_type(br));
     -+    struct slice key = {};
     -+    int result = 0;
     -+    int err = 0;
     -+    struct block_iter next = {};
     -+    while (true) {
     -+      block_iter_copy_from(&next, it);
     -+
     -+      err = block_iter_next(&next, rec);
     -+      if (err < 0) {
     -+        result = -1;
     -+        goto exit;
     -+      }
     -+
     -+      record_key(rec, &key);
     -+      if (err > 0 || slice_compare(key, want) >= 0) {
     -+        result = 0;
     -+        goto exit;
     -+      }
     -+
     -+      block_iter_copy_from(it, &next);
     -+    }
     -+
     -+  exit:
     -+    free(slice_yield(&key));
     -+    free(slice_yield(&next.last_key));
     -+    record_clear(rec);
     -+    free(record_yield(&rec));
     -+
     -+    return result;
     -+  }
     -+}
     -+
     -+void block_writer_reset(struct block_writer *bw) {
     -+  bw->restart_len = 0;
     -+  bw->last_key.len = 0;
     -+}
     -+
     -+void block_writer_clear(struct block_writer *bw) {
     -+  free(bw->restarts);
     -+  bw->restarts = NULL;
     -+  free(slice_yield(&bw->last_key));
     -+  // the block is not owned.
     ++		      struct slice want)
     ++{
     ++	struct restart_find_args args = {
     ++		.key = want,
     ++		.r = br,
     ++	};
     ++
     ++	int i = binsearch(br->restart_count, &restart_key_less, &args);
     ++	if (args.error) {
     ++		return -1;
     ++	}
     ++
     ++	it->br = br;
     ++	if (i > 0) {
     ++		i--;
     ++		it->next_off = block_reader_restart_offset(br, i);
     ++	} else {
     ++		it->next_off = br->header_off + 4;
     ++	}
     ++
     ++	{
     ++		struct record rec = new_record(block_reader_type(br));
     ++		struct slice key = {};
     ++		int result = 0;
     ++		int err = 0;
     ++		struct block_iter next = {};
     ++		while (true) {
     ++			block_iter_copy_from(&next, it);
     ++
     ++			err = block_iter_next(&next, rec);
     ++			if (err < 0) {
     ++				result = -1;
     ++				goto exit;
     ++			}
     ++
     ++			record_key(rec, &key);
     ++			if (err > 0 || slice_compare(key, want) >= 0) {
     ++				result = 0;
     ++				goto exit;
     ++			}
     ++
     ++			block_iter_copy_from(it, &next);
     ++		}
     ++
     ++	exit:
     ++		free(slice_yield(&key));
     ++		free(slice_yield(&next.last_key));
     ++		record_clear(rec);
     ++		free(record_yield(&rec));
     ++
     ++		return result;
     ++	}
     ++}
     ++
     ++void block_writer_reset(struct block_writer *bw)
     ++{
     ++	bw->restart_len = 0;
     ++	bw->last_key.len = 0;
     ++}
     ++
     ++void block_writer_clear(struct block_writer *bw)
     ++{
     ++	free(bw->restarts);
     ++	bw->restarts = NULL;
     ++	free(slice_yield(&bw->last_key));
     ++	/* the block is not owned. */
      +}
      
       diff --git a/reftable/block.h b/reftable/block.h
     @@ -961,22 +777,22 @@
      +#include "reftable.h"
      +
      +struct block_writer {
     -+  byte *buf;
     -+  uint32_t block_size;
     -+  uint32_t header_off;
     -+  int restart_interval;
     -+  int hash_size;
     -+
     -+  uint32_t next;
     -+  uint32_t *restarts;
     -+  uint32_t restart_len;
     -+  uint32_t restart_cap;
     -+  struct slice last_key;
     -+  int entries;
     ++	byte *buf;
     ++	uint32_t block_size;
     ++	uint32_t header_off;
     ++	int restart_interval;
     ++	int hash_size;
     ++
     ++	uint32_t next;
     ++	uint32_t *restarts;
     ++	uint32_t restart_len;
     ++	uint32_t restart_cap;
     ++	struct slice last_key;
     ++	int entries;
      +};
      +
      +void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
     -+                       uint32_t block_size, uint32_t header_off, int hash_size);
     ++		       uint32_t block_size, uint32_t header_off, int hash_size);
      +byte block_writer_type(struct block_writer *bw);
      +int block_writer_add(struct block_writer *w, struct record rec);
      +int block_writer_finish(struct block_writer *w);
     @@ -984,29 +800,29 @@
      +void block_writer_clear(struct block_writer *bw);
      +
      +struct block_reader {
     -+  uint32_t header_off;
     -+  struct block block;
     -+  int hash_size;
     -+
     -+  // size of the data, excluding restart data.
     -+  uint32_t block_len;
     -+  byte *restart_bytes;
     -+  uint32_t full_block_size;
     -+  uint16_t restart_count;
     ++	uint32_t header_off;
     ++	struct block block;
     ++	int hash_size;
     ++
     ++	/* size of the data, excluding restart data. */
     ++	uint32_t block_len;
     ++	byte *restart_bytes;
     ++	uint32_t full_block_size;
     ++	uint16_t restart_count;
      +};
      +
      +struct block_iter {
     -+  struct block_reader *br;
     -+  struct slice last_key;
     -+  uint32_t next_off;
     ++	struct block_reader *br;
     ++	struct slice last_key;
     ++	uint32_t next_off;
      +};
      +
      +int block_reader_init(struct block_reader *br, struct block *bl,
     -+                      uint32_t header_off, uint32_t table_block_size,
     -+                      int hash_size);
     ++		      uint32_t header_off, uint32_t table_block_size,
     ++		      int hash_size);
      +void block_reader_start(struct block_reader *br, struct block_iter *it);
      +int block_reader_seek(struct block_reader *br, struct block_iter *it,
     -+                      struct slice want);
     ++		      struct slice want);
      +byte block_reader_type(struct block_reader *r);
      +int block_reader_first_key(struct block_reader *br, struct slice *key);
      +
     @@ -1032,7 +848,7 @@
      +
      +#include "block.h"
      +
     -+#include <string.h>
     ++#include "system.h"
      +
      +#include "basics.h"
      +#include "constants.h"
     @@ -1041,131 +857,137 @@
      +#include "test_framework.h"
      +
      +struct binsearch_args {
     -+  int key;
     -+  int *arr;
     ++	int key;
     ++	int *arr;
      +};
      +
     -+static int binsearch_func(int i, void *void_args) {
     -+  struct binsearch_args *args = (struct binsearch_args *)void_args;
     -+
     -+  return args->key < args->arr[i];
     -+}
     -+
     -+void test_binsearch() {
     -+  int arr[] = {2, 4, 6, 8, 10};
     -+  int sz = ARRAYSIZE(arr);
     -+  struct binsearch_args args = {
     -+      .arr = arr,
     -+  };
     -+
     -+  for (int i = 1; i < 11; i++) {
     -+    args.key = i;
     -+    int res = binsearch(sz, &binsearch_func, &args);
     -+
     -+    if (res < sz) {
     -+      assert(args.key < arr[res]);
     -+      if (res > 0) {
     -+        assert(args.key >= arr[res - 1]);
     -+      }
     -+    } else {
     -+      assert(args.key == 10 || args.key == 11);
     -+    }
     -+  }
     -+}
     -+
     -+void test_block_read_write() {
     -+  const int header_off = 21;  // random
     -+  const int N = 30;
     -+  char *names[N];
     -+  const int block_size = 1024;
     -+  struct block block = {};
     -+  block.data = calloc(block_size, 1);
     -+  block.len = block_size;
     -+
     -+  struct block_writer bw = {};
     -+  block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size, header_off,
     -+                    SHA1_SIZE);
     -+  struct ref_record ref = {};
     -+  struct record rec = {};
     -+  record_from_ref(&rec, &ref);
     -+
     -+  for (int i = 0; i < N; i++) {
     -+    char name[100];
     -+    sprintf(name, "branch%02d", i);
     -+
     -+    byte hash[SHA1_SIZE];
     -+    memset(hash, i, sizeof(hash));
     -+
     -+    ref.ref_name = name;
     -+    ref.value = hash;
     -+    names[i] = strdup(name);
     -+    int n = block_writer_add(&bw, rec);
     -+    ref.ref_name = NULL;
     -+    ref.value = NULL;
     -+    assert(n == 0);
     -+  }
     -+
     -+  int n = block_writer_finish(&bw);
     -+  assert(n > 0);
     -+
     -+  block_writer_clear(&bw);
     -+
     -+  struct block_reader br = {};
     -+  block_reader_init(&br, &block, header_off, block_size, SHA1_SIZE);
     -+
     -+  struct block_iter it = {};
     -+  block_reader_start(&br, &it);
     -+
     -+  int j = 0;
     -+  while (true) {
     -+    int r = block_iter_next(&it, rec);
     -+    assert(r >= 0);
     -+    if (r > 0) {
     -+      break;
     -+    }
     -+    assert_streq(names[j], ref.ref_name);
     -+    j++;
     -+  }
     -+
     -+  record_clear(rec);
     -+  block_iter_close(&it);
     -+
     -+  struct slice want = {};
     -+  for (int i = 0; i < N; i++) {
     -+    slice_set_string(&want, names[i]);
     -+
     -+    struct block_iter it = {};
     -+    int n = block_reader_seek(&br, &it, want);
     -+    assert(n == 0);
     -+
     -+    n = block_iter_next(&it, rec);
     -+    assert(n == 0);
     -+
     -+    assert_streq(names[i], ref.ref_name);
     -+
     -+    want.len--;
     -+    n = block_reader_seek(&br, &it, want);
     -+    assert(n == 0);
     -+
     -+    n = block_iter_next(&it, rec);
     -+    assert(n == 0);
     -+    assert_streq(names[10 * (i / 10)], ref.ref_name);
     -+
     -+    block_iter_close(&it);
     -+  }
     -+
     -+  record_clear(rec);
     -+  free(block.data);
     -+  free(slice_yield(&want));
     -+  for (int i = 0; i < N; i++) {
     -+    free(names[i]);
     -+  }
     -+}
     -+
     -+int main() {
     -+  add_test_case("binsearch", &test_binsearch);
     -+  add_test_case("block_read_write", &test_block_read_write);
     -+  test_main();
     ++static int binsearch_func(int i, void *void_args)
     ++{
     ++	struct binsearch_args *args = (struct binsearch_args *)void_args;
     ++
     ++	return args->key < args->arr[i];
     ++}
     ++
     ++void test_binsearch()
     ++{
     ++	int arr[] = { 2, 4, 6, 8, 10 };
     ++	int sz = ARRAYSIZE(arr);
     ++	struct binsearch_args args = {
     ++		.arr = arr,
     ++	};
     ++
     ++	int i = 0;
     ++	for (i = 1; i < 11; i++) {
     ++		args.key = i;
     ++		int res = binsearch(sz, &binsearch_func, &args);
     ++
     ++		if (res < sz) {
     ++			assert(args.key < arr[res]);
     ++			if (res > 0) {
     ++				assert(args.key >= arr[res - 1]);
     ++			}
     ++		} else {
     ++			assert(args.key == 10 || args.key == 11);
     ++		}
     ++	}
     ++}
     ++
     ++void test_block_read_write()
     ++{
     ++	const int header_off = 21; /* random */
     ++	const int N = 30;
     ++	char *names[N];
     ++	const int block_size = 1024;
     ++	struct block block = {};
     ++	block.data = calloc(block_size, 1);
     ++	block.len = block_size;
     ++
     ++	struct block_writer bw = {};
     ++	block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size,
     ++			  header_off, SHA1_SIZE);
     ++	struct ref_record ref = {};
     ++	struct record rec = {};
     ++	record_from_ref(&rec, &ref);
     ++
     ++	int i = 0;
     ++	for (i = 0; i < N; i++) {
     ++		char name[100];
     ++		snprintf(name, sizeof(name), "branch%02d", i);
     ++
     ++		byte hash[SHA1_SIZE];
     ++		memset(hash, i, sizeof(hash));
     ++
     ++		ref.ref_name = name;
     ++		ref.value = hash;
     ++		names[i] = strdup(name);
     ++		int n = block_writer_add(&bw, rec);
     ++		ref.ref_name = NULL;
     ++		ref.value = NULL;
     ++		assert(n == 0);
     ++	}
     ++
     ++	int n = block_writer_finish(&bw);
     ++	assert(n > 0);
     ++
     ++	block_writer_clear(&bw);
     ++
     ++	struct block_reader br = {};
     ++	block_reader_init(&br, &block, header_off, block_size, SHA1_SIZE);
     ++
     ++	struct block_iter it = {};
     ++	block_reader_start(&br, &it);
     ++
     ++	int j = 0;
     ++	while (true) {
     ++		int r = block_iter_next(&it, rec);
     ++		assert(r >= 0);
     ++		if (r > 0) {
     ++			break;
     ++		}
     ++		assert_streq(names[j], ref.ref_name);
     ++		j++;
     ++	}
     ++
     ++	record_clear(rec);
     ++	block_iter_close(&it);
     ++
     ++	struct slice want = {};
     ++	for (i = 0; i < N; i++) {
     ++		slice_set_string(&want, names[i]);
     ++
     ++		struct block_iter it = {};
     ++		int n = block_reader_seek(&br, &it, want);
     ++		assert(n == 0);
     ++
     ++		n = block_iter_next(&it, rec);
     ++		assert(n == 0);
     ++
     ++		assert_streq(names[i], ref.ref_name);
     ++
     ++		want.len--;
     ++		n = block_reader_seek(&br, &it, want);
     ++		assert(n == 0);
     ++
     ++		n = block_iter_next(&it, rec);
     ++		assert(n == 0);
     ++		assert_streq(names[10 * (i / 10)], ref.ref_name);
     ++
     ++		block_iter_close(&it);
     ++	}
     ++
     ++	record_clear(rec);
     ++	free(block.data);
     ++	free(slice_yield(&want));
     ++	for (i = 0; i < N; i++) {
     ++		free(names[i]);
     ++	}
     ++}
     ++
     ++int main()
     ++{
     ++	add_test_case("binsearch", &test_binsearch);
     ++	add_test_case("block_read_write", &test_block_read_write);
     ++	test_main();
      +}
      
       diff --git a/reftable/blocksource.h b/reftable/blocksource.h
     @@ -1188,7 +1010,7 @@
      +
      +uint64_t block_source_size(struct block_source source);
      +int block_source_read_block(struct block_source source, struct block *dest,
     -+                            uint64_t off, uint32_t size);
     ++			    uint64_t off, uint32_t size);
      +void block_source_return_block(struct block_source source, struct block *ret);
      +void block_source_close(struct block_source source);
      +
     @@ -1197,6 +1019,13 @@
       diff --git a/reftable/bytes.c b/reftable/bytes.c
       new file mode 100644
      
     + diff --git a/reftable/config.h b/reftable/config.h
     + new file mode 100644
     + --- /dev/null
     + +++ b/reftable/config.h
     +@@
     ++/* empty */
     +
       diff --git a/reftable/constants.h b/reftable/constants.h
       new file mode 100644
       --- /dev/null
     @@ -1235,107 +1064,102 @@
       --- /dev/null
       +++ b/reftable/dump.c
      @@
     -+// Copyright 2020 Google Inc. All rights reserved.
     -+//
     -+// Licensed under the Apache License, Version 2.0 (the "License");
     -+// you may not use this file except in compliance with the License.
     -+// You may obtain a copy of the License at
     -+//
     -+//    http://www.apache.org/licenses/LICENSE-2.0
     -+//
     -+// Unless required by applicable law or agreed to in writing, software
     -+// distributed under the License is distributed on an "AS IS" BASIS,
     -+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     -+// See the License for the specific language governing permissions and
     -+// limitations under the License.
     ++/*
     ++Copyright 2020 Google LLC
      +
     -+#include <stdio.h>
     -+#include <string.h>
     -+#include <unistd.h>
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "system.h"
      +
      +#include "reftable.h"
      +
     -+static int dump_table(const char *tablename) {
     -+  struct block_source src = {};
     -+  int err = block_source_from_file(&src, tablename);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+
     -+  struct reader *r = NULL;
     -+  err = new_reader(&r, src, tablename);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+
     -+  {
     -+    struct iterator it = {};
     -+    err = reader_seek_ref(r, &it, "");
     -+    if (err < 0) {
     -+      return err;
     -+    }
     -+
     -+    struct ref_record ref = {};
     -+    while (1) {
     -+      err = iterator_next_ref(it, &ref);
     -+      if (err > 0) {
     -+        break;
     -+      }
     -+      if (err < 0) {
     -+        return err;
     -+      }
     -+      ref_record_print(&ref, 20);
     -+    }
     -+    iterator_destroy(&it);
     -+    ref_record_clear(&ref);
     -+  }
     -+
     -+  {
     -+    struct iterator it = {};
     -+    err = reader_seek_log(r, &it, "");
     -+    if (err < 0) {
     -+      return err;
     -+    }
     -+    struct log_record log = {};
     -+    while (1) {
     -+      err = iterator_next_log(it, &log);
     -+      if (err > 0) {
     -+        break;
     -+      }
     -+      if (err < 0) {
     -+        return err;
     -+      }
     -+      log_record_print(&log, 20);
     -+    }
     -+    iterator_destroy(&it);
     -+    log_record_clear(&log);
     -+  }
     -+  return 0;
     -+}
     -+
     -+int main(int argc, char *argv[]) {
     -+  int opt;
     -+  const char *table = NULL;
     -+  while ((opt = getopt(argc, argv, "t:")) != -1) {
     -+    switch (opt) {
     -+      case 't':
     -+        table = strdup(optarg);
     -+        break;
     -+      case '?':
     -+        printf("usage: %s [-table tablefile]\n", argv[0]);
     -+        return 2;
     -+        break;
     -+    }
     -+  }
     -+
     -+  if (table != NULL) {
     -+    int err = dump_table(table);
     -+    if (err < 0) {
     -+      fprintf(stderr, "%s: %s: %s\n", argv[0], table, error_str(err));
     -+      return 1;
     -+    }
     -+  }
     -+  return 0;
     ++static int dump_table(const char *tablename)
     ++{
     ++	struct block_source src = {};
     ++	int err = block_source_from_file(&src, tablename);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	struct reader *r = NULL;
     ++	err = new_reader(&r, src, tablename);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	{
     ++		struct iterator it = {};
     ++		err = reader_seek_ref(r, &it, "");
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++
     ++		struct ref_record ref = {};
     ++		while (1) {
     ++			err = iterator_next_ref(it, &ref);
     ++			if (err > 0) {
     ++				break;
     ++			}
     ++			if (err < 0) {
     ++				return err;
     ++			}
     ++			ref_record_print(&ref, 20);
     ++		}
     ++		iterator_destroy(&it);
     ++		ref_record_clear(&ref);
     ++	}
     ++
     ++	{
     ++		struct iterator it = {};
     ++		err = reader_seek_log(r, &it, "");
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++		struct log_record log = {};
     ++		while (1) {
     ++			err = iterator_next_log(it, &log);
     ++			if (err > 0) {
     ++				break;
     ++			}
     ++			if (err < 0) {
     ++				return err;
     ++			}
     ++			log_record_print(&log, 20);
     ++		}
     ++		iterator_destroy(&it);
     ++		log_record_clear(&log);
     ++	}
     ++	return 0;
     ++}
     ++
     ++int main(int argc, char *argv[])
     ++{
     ++	int opt;
     ++	const char *table = NULL;
     ++	while ((opt = getopt(argc, argv, "t:")) != -1) {
     ++		switch (opt) {
     ++		case 't':
     ++			table = strdup(optarg);
     ++			break;
     ++		case '?':
     ++			printf("usage: %s [-table tablefile]\n", argv[0]);
     ++			return 2;
     ++			break;
     ++		}
     ++	}
     ++
     ++	if (table != NULL) {
     ++		int err = dump_table(table);
     ++		if (err < 0) {
     ++			fprintf(stderr, "%s: %s: %s\n", argv[0], table,
     ++				error_str(err));
     ++			return 1;
     ++		}
     ++	}
     ++	return 0;
      +}
      
       diff --git a/reftable/file.c b/reftable/file.c
     @@ -1351,15 +1175,7 @@
      +https://developers.google.com/open-source/licenses/bsd
      +*/
      +
     -+#include <assert.h>
     -+#include <errno.h>
     -+#include <fcntl.h>
     -+#include <stdio.h>
     -+#include <stdlib.h>
     -+#include <string.h>
     -+#include <sys/stat.h>
     -+#include <sys/types.h>
     -+#include <unistd.h>
     ++#include "system.h"
      +
      +#include "block.h"
      +#include "iter.h"
     @@ -1368,78 +1184,85 @@
      +#include "tree.h"
      +
      +struct file_block_source {
     -+  int fd;
     -+  uint64_t size;
     ++	int fd;
     ++	uint64_t size;
      +};
      +
     -+static uint64_t file_size(void *b) {
     -+  return ((struct file_block_source *)b)->size;
     ++static uint64_t file_size(void *b)
     ++{
     ++	return ((struct file_block_source *)b)->size;
      +}
      +
     -+static void file_return_block(void *b, struct block *dest) {
     -+  memset(dest->data, 0xff, dest->len);
     -+  free(dest->data);
     ++static void file_return_block(void *b, struct block *dest)
     ++{
     ++	memset(dest->data, 0xff, dest->len);
     ++	free(dest->data);
      +}
      +
     -+static void file_close(void *b) {
     -+  int fd = ((struct file_block_source *)b)->fd;
     -+  if (fd > 0) {
     -+    close(fd);
     -+    ((struct file_block_source *)b)->fd = 0;
     -+  }
     ++static void file_close(void *b)
     ++{
     ++	int fd = ((struct file_block_source *)b)->fd;
     ++	if (fd > 0) {
     ++		close(fd);
     ++		((struct file_block_source *)b)->fd = 0;
     ++	}
      +
     -+  free(b);
     ++	free(b);
      +}
      +
      +static int file_read_block(void *v, struct block *dest, uint64_t off,
     -+                           uint32_t size) {
     -+  struct file_block_source *b = (struct file_block_source *)v;
     -+  assert(off + size <= b->size);
     -+  dest->data = malloc(size);
     -+  if (pread(b->fd, dest->data, size, off) != size) {
     -+    return -1;
     -+  }
     -+  dest->len = size;
     -+  return size;
     ++			   uint32_t size)
     ++{
     ++	struct file_block_source *b = (struct file_block_source *)v;
     ++	assert(off + size <= b->size);
     ++	dest->data = malloc(size);
     ++	if (pread(b->fd, dest->data, size, off) != size) {
     ++		return -1;
     ++	}
     ++	dest->len = size;
     ++	return size;
      +}
      +
      +struct block_source_vtable file_vtable = {
     -+    .size = &file_size,
     -+    .read_block = &file_read_block,
     -+    .return_block = &file_return_block,
     -+    .close = &file_close,
     ++	.size = &file_size,
     ++	.read_block = &file_read_block,
     ++	.return_block = &file_return_block,
     ++	.close = &file_close,
      +};
      +
     -+int block_source_from_file(struct block_source *bs, const char *name) {
     -+  struct stat st = {};
     -+  int err = 0;
     -+  int fd = open(name, O_RDONLY);
     -+  if (fd < 0) {
     -+    if (errno == ENOENT) {
     -+      return NOT_EXIST_ERROR;
     -+    }
     -+    return -1;
     -+  }
     -+
     -+  err = fstat(fd, &st);
     -+  if (err < 0) {
     -+    return -1;
     -+  }
     -+
     -+  {
     -+    struct file_block_source *p = calloc(sizeof(struct file_block_source), 1);
     -+    p->size = st.st_size;
     -+    p->fd = fd;
     -+
     -+    bs->ops = &file_vtable;
     -+    bs->arg = p;
     -+  }
     -+  return 0;
     -+}
     -+
     -+int fd_writer(void *arg, byte *data, int sz) {
     -+  int *fdp = (int *)arg;
     -+  return write(*fdp, data, sz);
     ++int block_source_from_file(struct block_source *bs, const char *name)
     ++{
     ++	struct stat st = {};
     ++	int err = 0;
     ++	int fd = open(name, O_RDONLY);
     ++	if (fd < 0) {
     ++		if (errno == ENOENT) {
     ++			return NOT_EXIST_ERROR;
     ++		}
     ++		return -1;
     ++	}
     ++
     ++	err = fstat(fd, &st);
     ++	if (err < 0) {
     ++		return -1;
     ++	}
     ++
     ++	{
     ++		struct file_block_source *p =
     ++			calloc(sizeof(struct file_block_source), 1);
     ++		p->size = st.st_size;
     ++		p->fd = fd;
     ++
     ++		bs->ops = &file_vtable;
     ++		bs->arg = p;
     ++	}
     ++	return 0;
     ++}
     ++
     ++int fd_writer(void *arg, byte *data, int sz)
     ++{
     ++	int *fdp = (int *)arg;
     ++	return write(*fdp, data, sz);
      +}
      
       diff --git a/reftable/iter.c b/reftable/iter.c
     @@ -1457,206 +1280,225 @@
      +
      +#include "iter.h"
      +
     -+#include <stdlib.h>
     -+#include <string.h>
     ++#include "system.h"
      +
      +#include "block.h"
      +#include "constants.h"
      +#include "reader.h"
      +#include "reftable.h"
      +
     -+bool iterator_is_null(struct iterator it) { return it.ops == NULL; }
     ++bool iterator_is_null(struct iterator it)
     ++{
     ++	return it.ops == NULL;
     ++}
      +
     -+static int empty_iterator_next(void *arg, struct record rec) { return 1; }
     ++static int empty_iterator_next(void *arg, struct record rec)
     ++{
     ++	return 1;
     ++}
      +
     -+static void empty_iterator_close(void *arg) {}
     ++static void empty_iterator_close(void *arg)
     ++{
     ++}
      +
      +struct iterator_vtable empty_vtable = {
     -+    .next = &empty_iterator_next,
     -+    .close = &empty_iterator_close,
     ++	.next = &empty_iterator_next,
     ++	.close = &empty_iterator_close,
      +};
      +
     -+void iterator_set_empty(struct iterator *it) {
     -+  it->iter_arg = NULL;
     -+  it->ops = &empty_vtable;
     ++void iterator_set_empty(struct iterator *it)
     ++{
     ++	it->iter_arg = NULL;
     ++	it->ops = &empty_vtable;
      +}
      +
     -+int iterator_next(struct iterator it, struct record rec) {
     -+  return it.ops->next(it.iter_arg, rec);
     ++int iterator_next(struct iterator it, struct record rec)
     ++{
     ++	return it.ops->next(it.iter_arg, rec);
      +}
      +
     -+void iterator_destroy(struct iterator *it) {
     -+  if (it->ops == NULL) {
     -+    return;
     -+  }
     -+  it->ops->close(it->iter_arg);
     -+  it->ops = NULL;
     -+  free(it->iter_arg);
     -+  it->iter_arg = NULL;
     ++void iterator_destroy(struct iterator *it)
     ++{
     ++	if (it->ops == NULL) {
     ++		return;
     ++	}
     ++	it->ops->close(it->iter_arg);
     ++	it->ops = NULL;
     ++	free(it->iter_arg);
     ++	it->iter_arg = NULL;
      +}
      +
     -+int iterator_next_ref(struct iterator it, struct ref_record *ref) {
     -+  struct record rec = {};
     -+  record_from_ref(&rec, ref);
     -+  return iterator_next(it, rec);
     ++int iterator_next_ref(struct iterator it, struct ref_record *ref)
     ++{
     ++	struct record rec = {};
     ++	record_from_ref(&rec, ref);
     ++	return iterator_next(it, rec);
      +}
      +
     -+int iterator_next_log(struct iterator it, struct log_record *log) {
     -+  struct record rec = {};
     -+  record_from_log(&rec, log);
     -+  return iterator_next(it, rec);
     ++int iterator_next_log(struct iterator it, struct log_record *log)
     ++{
     ++	struct record rec = {};
     ++	record_from_log(&rec, log);
     ++	return iterator_next(it, rec);
      +}
      +
     -+static void filtering_ref_iterator_close(void *iter_arg) {
     -+  struct filtering_ref_iterator *fri =
     -+      (struct filtering_ref_iterator *)iter_arg;
     -+  free(slice_yield(&fri->oid));
     -+  iterator_destroy(&fri->it);
     ++static void filtering_ref_iterator_close(void *iter_arg)
     ++{
     ++	struct filtering_ref_iterator *fri =
     ++		(struct filtering_ref_iterator *)iter_arg;
     ++	free(slice_yield(&fri->oid));
     ++	iterator_destroy(&fri->it);
      +}
      +
     -+static int filtering_ref_iterator_next(void *iter_arg, struct record rec) {
     -+  struct filtering_ref_iterator *fri =
     -+      (struct filtering_ref_iterator *)iter_arg;
     -+  struct ref_record *ref = (struct ref_record *)rec.data;
     ++static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
     ++{
     ++	struct filtering_ref_iterator *fri =
     ++		(struct filtering_ref_iterator *)iter_arg;
     ++	struct ref_record *ref = (struct ref_record *)rec.data;
      +
     -+  while (true) {
     -+    int err = iterator_next_ref(fri->it, ref);
     -+    if (err != 0) {
     -+      return err;
     -+    }
     ++	while (true) {
     ++		int err = iterator_next_ref(fri->it, ref);
     ++		if (err != 0) {
     ++			return err;
     ++		}
      +
     -+    if (fri->double_check) {
     -+      struct iterator it = {};
     ++		if (fri->double_check) {
     ++			struct iterator it = {};
      +
     -+      int err = reader_seek_ref(fri->r, &it, ref->ref_name);
     -+      if (err == 0) {
     -+        err = iterator_next_ref(it, ref);
     -+      }
     ++			int err = reader_seek_ref(fri->r, &it, ref->ref_name);
     ++			if (err == 0) {
     ++				err = iterator_next_ref(it, ref);
     ++			}
      +
     -+      iterator_destroy(&it);
     ++			iterator_destroy(&it);
      +
     -+      if (err < 0) {
     -+        return err;
     -+      }
     ++			if (err < 0) {
     ++				return err;
     ++			}
      +
     -+      if (err > 0) {
     -+        continue;
     -+      }
     -+    }
     ++			if (err > 0) {
     ++				continue;
     ++			}
     ++		}
      +
     -+    if ((ref->target_value != NULL &&
     -+         0 == memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
     -+        (ref->value != NULL &&
     -+         0 == memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
     -+      return 0;
     -+    }
     -+  }
     ++		if ((ref->target_value != NULL &&
     ++		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
     ++		    (ref->value != NULL &&
     ++		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
     ++			return 0;
     ++		}
     ++	}
      +}
      +
      +struct iterator_vtable filtering_ref_iterator_vtable = {
     -+    .next = &filtering_ref_iterator_next,
     -+    .close = &filtering_ref_iterator_close,
     ++	.next = &filtering_ref_iterator_next,
     ++	.close = &filtering_ref_iterator_close,
      +};
      +
      +void iterator_from_filtering_ref_iterator(struct iterator *it,
     -+                                          struct filtering_ref_iterator *fri) {
     -+  it->iter_arg = fri;
     -+  it->ops = &filtering_ref_iterator_vtable;
     -+}
     -+
     -+static void indexed_table_ref_iter_close(void *p) {
     -+  struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
     -+  block_iter_close(&it->cur);
     -+  reader_return_block(it->r, &it->block_reader.block);
     -+  free(slice_yield(&it->oid));
     -+}
     -+
     -+static int indexed_table_ref_iter_next_block(
     -+    struct indexed_table_ref_iter *it) {
     -+  if (it->offset_idx == it->offset_len) {
     -+    it->finished = true;
     -+    return 1;
     -+  }
     -+
     -+  reader_return_block(it->r, &it->block_reader.block);
     -+
     -+  {
     -+    uint64_t off = it->offsets[it->offset_idx++];
     -+    int err =
     -+        reader_init_block_reader(it->r, &it->block_reader, off, BLOCK_TYPE_REF);
     -+    if (err < 0) {
     -+      return err;
     -+    }
     -+    if (err > 0) {
     -+      // indexed block does not exist.
     -+      return FORMAT_ERROR;
     -+    }
     -+  }
     -+  block_reader_start(&it->block_reader, &it->cur);
     -+  return 0;
     -+}
     -+
     -+static int indexed_table_ref_iter_next(void *p, struct record rec) {
     -+  struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
     -+  struct ref_record *ref = (struct ref_record *)rec.data;
     -+
     -+  while (true) {
     -+    int err = block_iter_next(&it->cur, rec);
     -+    if (err < 0) {
     -+      return err;
     -+    }
     -+
     -+    if (err > 0) {
     -+      err = indexed_table_ref_iter_next_block(it);
     -+      if (err < 0) {
     -+        return err;
     -+      }
     -+
     -+      if (it->finished) {
     -+        return 1;
     -+      }
     -+      continue;
     -+    }
     -+
     -+    if (0 == memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
     -+        0 == memcmp(it->oid.buf, ref->value, it->oid.len)) {
     -+      return 0;
     -+    }
     -+  }
     ++					  struct filtering_ref_iterator *fri)
     ++{
     ++	it->iter_arg = fri;
     ++	it->ops = &filtering_ref_iterator_vtable;
     ++}
     ++
     ++static void indexed_table_ref_iter_close(void *p)
     ++{
     ++	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
     ++	block_iter_close(&it->cur);
     ++	reader_return_block(it->r, &it->block_reader.block);
     ++	free(slice_yield(&it->oid));
     ++}
     ++
     ++static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
     ++{
     ++	if (it->offset_idx == it->offset_len) {
     ++		it->finished = true;
     ++		return 1;
     ++	}
     ++
     ++	reader_return_block(it->r, &it->block_reader.block);
     ++
     ++	{
     ++		uint64_t off = it->offsets[it->offset_idx++];
     ++		int err = reader_init_block_reader(it->r, &it->block_reader,
     ++						   off, BLOCK_TYPE_REF);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++		if (err > 0) {
     ++			/* indexed block does not exist. */
     ++			return FORMAT_ERROR;
     ++		}
     ++	}
     ++	block_reader_start(&it->block_reader, &it->cur);
     ++	return 0;
     ++}
     ++
     ++static int indexed_table_ref_iter_next(void *p, struct record rec)
     ++{
     ++	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
     ++	struct ref_record *ref = (struct ref_record *)rec.data;
     ++
     ++	while (true) {
     ++		int err = block_iter_next(&it->cur, rec);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++
     ++		if (err > 0) {
     ++			err = indexed_table_ref_iter_next_block(it);
     ++			if (err < 0) {
     ++				return err;
     ++			}
     ++
     ++			if (it->finished) {
     ++				return 1;
     ++			}
     ++			continue;
     ++		}
     ++
     ++		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
     ++		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
     ++			return 0;
     ++		}
     ++	}
      +}
      +
      +int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
     -+                               struct reader *r, byte *oid, int oid_len,
     -+                               uint64_t *offsets, int offset_len) {
     -+  struct indexed_table_ref_iter *itr =
     -+      calloc(sizeof(struct indexed_table_ref_iter), 1);
     -+  int err = 0;
     ++			       struct reader *r, byte *oid, int oid_len,
     ++			       uint64_t *offsets, int offset_len)
     ++{
     ++	struct indexed_table_ref_iter *itr =
     ++		calloc(sizeof(struct indexed_table_ref_iter), 1);
     ++	int err = 0;
      +
     -+  itr->r = r;
     -+  slice_resize(&itr->oid, oid_len);
     -+  memcpy(itr->oid.buf, oid, oid_len);
     ++	itr->r = r;
     ++	slice_resize(&itr->oid, oid_len);
     ++	memcpy(itr->oid.buf, oid, oid_len);
      +
     -+  itr->offsets = offsets;
     -+  itr->offset_len = offset_len;
     ++	itr->offsets = offsets;
     ++	itr->offset_len = offset_len;
      +
     -+  err = indexed_table_ref_iter_next_block(itr);
     -+  if (err < 0) {
     -+    free(itr);
     -+  } else {
     -+    *dest = itr;
     -+  }
     -+  return err;
     ++	err = indexed_table_ref_iter_next_block(itr);
     ++	if (err < 0) {
     ++		free(itr);
     ++	} else {
     ++		*dest = itr;
     ++	}
     ++	return err;
      +}
      +
      +struct iterator_vtable indexed_table_ref_iter_vtable = {
     -+    .next = &indexed_table_ref_iter_next,
     -+    .close = &indexed_table_ref_iter_close,
     ++	.next = &indexed_table_ref_iter_next,
     ++	.close = &indexed_table_ref_iter_close,
      +};
      +
      +void iterator_from_indexed_table_ref_iter(struct iterator *it,
     -+                                          struct indexed_table_ref_iter *itr) {
     -+  it->iter_arg = itr;
     -+  it->ops = &indexed_table_ref_iter_vtable;
     ++					  struct indexed_table_ref_iter *itr)
     ++{
     ++	it->iter_arg = itr;
     ++	it->ops = &indexed_table_ref_iter_vtable;
      +}
      
       diff --git a/reftable/iter.h b/reftable/iter.h
     @@ -1680,8 +1522,8 @@
      +#include "slice.h"
      +
      +struct iterator_vtable {
     -+  int (*next)(void *iter_arg, struct record rec);
     -+  void (*close)(void *iter_arg);
     ++	int (*next)(void *iter_arg, struct record rec);
     ++	void (*close)(void *iter_arg);
      +};
      +
      +void iterator_set_empty(struct iterator *it);
     @@ -1689,35 +1531,35 @@
      +bool iterator_is_null(struct iterator it);
      +
      +struct filtering_ref_iterator {
     -+  struct reader *r;
     -+  struct slice oid;
     -+  bool double_check;
     -+  struct iterator it;
     ++	struct reader *r;
     ++	struct slice oid;
     ++	bool double_check;
     ++	struct iterator it;
      +};
      +
      +void iterator_from_filtering_ref_iterator(struct iterator *,
     -+                                          struct filtering_ref_iterator *);
     ++					  struct filtering_ref_iterator *);
      +
      +struct indexed_table_ref_iter {
     -+  struct reader *r;
     -+  struct slice oid;
     -+
     -+  // mutable
     -+  uint64_t *offsets;
     -+
     -+  // Points to the next offset to read.
     -+  int offset_idx;
     -+  int offset_len;
     -+  struct block_reader block_reader;
     -+  struct block_iter cur;
     -+  bool finished;
     ++	struct reader *r;
     ++	struct slice oid;
     ++
     ++	/* mutable */
     ++	uint64_t *offsets;
     ++
     ++	/* Points to the next offset to read. */
     ++	int offset_idx;
     ++	int offset_len;
     ++	struct block_reader block_reader;
     ++	struct block_iter cur;
     ++	bool finished;
      +};
      +
      +void iterator_from_indexed_table_ref_iter(struct iterator *it,
     -+                                          struct indexed_table_ref_iter *itr);
     ++					  struct indexed_table_ref_iter *itr);
      +int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
     -+                               struct reader *r, byte *oid, int oid_len,
     -+                               uint64_t *offsets, int offset_len);
     ++			       struct reader *r, byte *oid, int oid_len,
     ++			       uint64_t *offsets, int offset_len);
      +
      +#endif
      
     @@ -1736,257 +1578,283 @@
      +
      +#include "merged.h"
      +
     -+#include <stdlib.h>
     ++#include "system.h"
      +
      +#include "constants.h"
      +#include "iter.h"
      +#include "pq.h"
      +#include "reader.h"
      +
     -+static int merged_iter_init(struct merged_iter *mi) {
     -+  for (int i = 0; i < mi->stack_len; i++) {
     -+    struct record rec = new_record(mi->typ);
     -+    int err = iterator_next(mi->stack[i], rec);
     -+    if (err < 0) {
     -+      return err;
     -+    }
     -+
     -+    if (err > 0) {
     -+      iterator_destroy(&mi->stack[i]);
     -+      record_clear(rec);
     -+      free(record_yield(&rec));
     -+    } else {
     -+      struct pq_entry e = {
     -+          .rec = rec,
     -+          .index = i,
     -+      };
     -+      merged_iter_pqueue_add(&mi->pq, e);
     -+    }
     -+  }
     -+
     -+  return 0;
     -+}
     -+
     -+static void merged_iter_close(void *p) {
     -+  struct merged_iter *mi = (struct merged_iter *)p;
     -+  merged_iter_pqueue_clear(&mi->pq);
     -+  for (int i = 0; i < mi->stack_len; i++) {
     -+    iterator_destroy(&mi->stack[i]);
     -+  }
     -+  free(mi->stack);
     -+}
     -+
     -+static int merged_iter_advance_subiter(struct merged_iter *mi, int idx) {
     -+  if (iterator_is_null(mi->stack[idx])) {
     -+    return 0;
     -+  }
     -+
     -+  {
     -+    struct record rec = new_record(mi->typ);
     -+    struct pq_entry e = {
     -+        .rec = rec,
     -+        .index = idx,
     -+    };
     -+    int err = iterator_next(mi->stack[idx], rec);
     -+    if (err < 0) {
     -+      return err;
     -+    }
     -+
     -+    if (err > 0) {
     -+      iterator_destroy(&mi->stack[idx]);
     -+      record_clear(rec);
     -+      free(record_yield(&rec));
     -+      return 0;
     -+    }
     -+
     -+    merged_iter_pqueue_add(&mi->pq, e);
     -+  }
     -+  return 0;
     -+}
     -+
     -+static int merged_iter_next(struct merged_iter *mi, struct record rec) {
     -+  struct slice entry_key = {};
     -+  struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
     -+  int err = merged_iter_advance_subiter(mi, entry.index);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+
     -+  record_key(entry.rec, &entry_key);
     -+  while (!merged_iter_pqueue_is_empty(mi->pq)) {
     -+    struct pq_entry top = merged_iter_pqueue_top(mi->pq);
     -+    struct slice k = {};
     -+    int err = 0, cmp = 0;
     -+
     -+    record_key(top.rec, &k);
     -+
     -+    cmp = slice_compare(k, entry_key);
     -+    free(slice_yield(&k));
     -+
     -+    if (cmp > 0) {
     -+      break;
     -+    }
     -+
     -+    merged_iter_pqueue_remove(&mi->pq);
     -+    err = merged_iter_advance_subiter(mi, top.index);
     -+    if (err < 0) {
     -+      return err;
     -+    }
     -+    record_clear(top.rec);
     -+    free(record_yield(&top.rec));
     -+  }
     -+
     -+  record_copy_from(rec, entry.rec, mi->hash_size);
     -+  record_clear(entry.rec);
     -+  free(record_yield(&entry.rec));
     -+  free(slice_yield(&entry_key));
     -+  return 0;
     -+}
     -+
     -+static int merged_iter_next_void(void *p, struct record rec) {
     -+  struct merged_iter *mi = (struct merged_iter *)p;
     -+  if (merged_iter_pqueue_is_empty(mi->pq)) {
     -+    return 1;
     -+  }
     -+
     -+  return merged_iter_next(mi, rec);
     ++static int merged_iter_init(struct merged_iter *mi)
     ++{
     ++	int i = 0;
     ++	for (i = 0; i < mi->stack_len; i++) {
     ++		struct record rec = new_record(mi->typ);
     ++		int err = iterator_next(mi->stack[i], rec);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++
     ++		if (err > 0) {
     ++			iterator_destroy(&mi->stack[i]);
     ++			record_clear(rec);
     ++			free(record_yield(&rec));
     ++		} else {
     ++			struct pq_entry e = {
     ++				.rec = rec,
     ++				.index = i,
     ++			};
     ++			merged_iter_pqueue_add(&mi->pq, e);
     ++		}
     ++	}
     ++
     ++	return 0;
     ++}
     ++
     ++static void merged_iter_close(void *p)
     ++{
     ++	struct merged_iter *mi = (struct merged_iter *)p;
     ++	int i = 0;
     ++	merged_iter_pqueue_clear(&mi->pq);
     ++	for (i = 0; i < mi->stack_len; i++) {
     ++		iterator_destroy(&mi->stack[i]);
     ++	}
     ++	free(mi->stack);
     ++}
     ++
     ++static int merged_iter_advance_subiter(struct merged_iter *mi, int idx)
     ++{
     ++	if (iterator_is_null(mi->stack[idx])) {
     ++		return 0;
     ++	}
     ++
     ++	{
     ++		struct record rec = new_record(mi->typ);
     ++		struct pq_entry e = {
     ++			.rec = rec,
     ++			.index = idx,
     ++		};
     ++		int err = iterator_next(mi->stack[idx], rec);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++
     ++		if (err > 0) {
     ++			iterator_destroy(&mi->stack[idx]);
     ++			record_clear(rec);
     ++			free(record_yield(&rec));
     ++			return 0;
     ++		}
     ++
     ++		merged_iter_pqueue_add(&mi->pq, e);
     ++	}
     ++	return 0;
     ++}
     ++
     ++static int merged_iter_next(struct merged_iter *mi, struct record rec)
     ++{
     ++	struct slice entry_key = {};
     ++	struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
     ++	int err = merged_iter_advance_subiter(mi, entry.index);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	record_key(entry.rec, &entry_key);
     ++	while (!merged_iter_pqueue_is_empty(mi->pq)) {
     ++		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
     ++		struct slice k = {};
     ++		int err = 0, cmp = 0;
     ++
     ++		record_key(top.rec, &k);
     ++
     ++		cmp = slice_compare(k, entry_key);
     ++		free(slice_yield(&k));
     ++
     ++		if (cmp > 0) {
     ++			break;
     ++		}
     ++
     ++		merged_iter_pqueue_remove(&mi->pq);
     ++		err = merged_iter_advance_subiter(mi, top.index);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++		record_clear(top.rec);
     ++		free(record_yield(&top.rec));
     ++	}
     ++
     ++	record_copy_from(rec, entry.rec, mi->hash_size);
     ++	record_clear(entry.rec);
     ++	free(record_yield(&entry.rec));
     ++	free(slice_yield(&entry_key));
     ++	return 0;
     ++}
     ++
     ++static int merged_iter_next_void(void *p, struct record rec)
     ++{
     ++	struct merged_iter *mi = (struct merged_iter *)p;
     ++	if (merged_iter_pqueue_is_empty(mi->pq)) {
     ++		return 1;
     ++	}
     ++
     ++	return merged_iter_next(mi, rec);
      +}
      +
      +struct iterator_vtable merged_iter_vtable = {
     -+    .next = &merged_iter_next_void,
     -+    .close = &merged_iter_close,
     ++	.next = &merged_iter_next_void,
     ++	.close = &merged_iter_close,
      +};
      +
      +static void iterator_from_merged_iter(struct iterator *it,
     -+                                      struct merged_iter *mi) {
     -+  it->iter_arg = mi;
     -+  it->ops = &merged_iter_vtable;
     -+}
     -+
     -+int new_merged_table(struct merged_table **dest, struct reader **stack, int n) {
     -+  uint64_t last_max = 0;
     -+  uint64_t first_min = 0;
     -+  for (int i = 0; i < n; i++) {
     -+    struct reader *r = stack[i];
     -+    if (i > 0 && last_max >= reader_min_update_index(r)) {
     -+      return FORMAT_ERROR;
     -+    }
     -+    if (i == 0) {
     -+      first_min = reader_min_update_index(r);
     -+    }
     -+
     -+    last_max = reader_max_update_index(r);
     -+  }
     -+
     -+  {
     -+    struct merged_table m = {
     -+        .stack = stack,
     -+        .stack_len = n,
     -+        .min = first_min,
     -+        .max = last_max,
     -+        .hash_size = SHA1_SIZE,
     -+    };
     -+
     -+    *dest = calloc(sizeof(struct merged_table), 1);
     -+    **dest = m;
     -+  }
     -+  return 0;
     -+}
     -+
     -+void merged_table_close(struct merged_table *mt) {
     -+  for (int i = 0; i < mt->stack_len; i++) {
     -+    reader_free(mt->stack[i]);
     -+  }
     -+  free(mt->stack);
     -+  mt->stack = NULL;
     -+  mt->stack_len = 0;
     ++				      struct merged_iter *mi)
     ++{
     ++	it->iter_arg = mi;
     ++	it->ops = &merged_iter_vtable;
     ++}
     ++
     ++int new_merged_table(struct merged_table **dest, struct reader **stack, int n)
     ++{
     ++	uint64_t last_max = 0;
     ++	uint64_t first_min = 0;
     ++	int i = 0;
     ++	for (i = 0; i < n; i++) {
     ++		struct reader *r = stack[i];
     ++		if (i > 0 && last_max >= reader_min_update_index(r)) {
     ++			return FORMAT_ERROR;
     ++		}
     ++		if (i == 0) {
     ++			first_min = reader_min_update_index(r);
     ++		}
     ++
     ++		last_max = reader_max_update_index(r);
     ++	}
     ++
     ++	{
     ++		struct merged_table m = {
     ++			.stack = stack,
     ++			.stack_len = n,
     ++			.min = first_min,
     ++			.max = last_max,
     ++			.hash_size = SHA1_SIZE,
     ++		};
     ++
     ++		*dest = calloc(sizeof(struct merged_table), 1);
     ++		**dest = m;
     ++	}
     ++	return 0;
     ++}
     ++
     ++void merged_table_close(struct merged_table *mt)
     ++{
     ++	int i = 0;
     ++	for (i = 0; i < mt->stack_len; i++) {
     ++		reader_free(mt->stack[i]);
     ++	}
     ++	free(mt->stack);
     ++	mt->stack = NULL;
     ++	mt->stack_len = 0;
      +}
      +
      +/* clears the list of subtable, without affecting the readers themselves. */
     -+void merged_table_clear(struct merged_table *mt) {
     -+  free(mt->stack);
     -+  mt->stack = NULL;
     -+  mt->stack_len = 0;
     ++void merged_table_clear(struct merged_table *mt)
     ++{
     ++	free(mt->stack);
     ++	mt->stack = NULL;
     ++	mt->stack_len = 0;
      +}
      +
     -+void merged_table_free(struct merged_table *mt) {
     -+  if (mt == NULL) {
     -+    return;
     -+  }
     -+  merged_table_clear(mt);
     -+  free(mt);
     ++void merged_table_free(struct merged_table *mt)
     ++{
     ++	if (mt == NULL) {
     ++		return;
     ++	}
     ++	merged_table_clear(mt);
     ++	free(mt);
      +}
      +
     -+uint64_t merged_max_update_index(struct merged_table *mt) { return mt->max; }
     ++uint64_t merged_max_update_index(struct merged_table *mt)
     ++{
     ++	return mt->max;
     ++}
      +
     -+uint64_t merged_min_update_index(struct merged_table *mt) { return mt->min; }
     ++uint64_t merged_min_update_index(struct merged_table *mt)
     ++{
     ++	return mt->min;
     ++}
      +
      +static int merged_table_seek_record(struct merged_table *mt,
     -+                                    struct iterator *it, struct record rec) {
     -+  struct iterator *iters = calloc(sizeof(struct iterator), mt->stack_len);
     -+  struct merged_iter merged = {
     -+      .stack = iters,
     -+      .typ = record_type(rec),
     -+      .hash_size = mt->hash_size,
     -+  };
     -+  int n = 0;
     -+  int err = 0;
     -+  for (int i = 0; i < mt->stack_len && err == 0; i++) {
     -+    int e = reader_seek(mt->stack[i], &iters[n], rec);
     -+    if (e < 0) {
     -+      err = e;
     -+    }
     -+    if (e == 0) {
     -+      n++;
     -+    }
     -+  }
     -+  if (err < 0) {
     -+    for (int i = 0; i < n; i++) {
     -+      iterator_destroy(&iters[i]);
     -+    }
     -+    free(iters);
     -+    return err;
     -+  }
     -+
     -+  merged.stack_len = n, err = merged_iter_init(&merged);
     -+  if (err < 0) {
     -+    merged_iter_close(&merged);
     -+    return err;
     -+  }
     -+
     -+  {
     -+    struct merged_iter *p = malloc(sizeof(struct merged_iter));
     -+    *p = merged;
     -+    iterator_from_merged_iter(it, p);
     -+  }
     -+  return 0;
     ++				    struct iterator *it, struct record rec)
     ++{
     ++	struct iterator *iters = calloc(sizeof(struct iterator), mt->stack_len);
     ++	struct merged_iter merged = {
     ++		.stack = iters,
     ++		.typ = record_type(rec),
     ++		.hash_size = mt->hash_size,
     ++	};
     ++	int n = 0;
     ++	int err = 0;
     ++	int i = 0;
     ++	for (i = 0; i < mt->stack_len && err == 0; i++) {
     ++		int e = reader_seek(mt->stack[i], &iters[n], rec);
     ++		if (e < 0) {
     ++			err = e;
     ++		}
     ++		if (e == 0) {
     ++			n++;
     ++		}
     ++	}
     ++	if (err < 0) {
     ++		int i = 0;
     ++		for (i = 0; i < n; i++) {
     ++			iterator_destroy(&iters[i]);
     ++		}
     ++		free(iters);
     ++		return err;
     ++	}
     ++
     ++	merged.stack_len = n, err = merged_iter_init(&merged);
     ++	if (err < 0) {
     ++		merged_iter_close(&merged);
     ++		return err;
     ++	}
     ++
     ++	{
     ++		struct merged_iter *p = malloc(sizeof(struct merged_iter));
     ++		*p = merged;
     ++		iterator_from_merged_iter(it, p);
     ++	}
     ++	return 0;
      +}
      +
      +int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
     -+                          const char *name) {
     -+  struct ref_record ref = {
     -+      .ref_name = (char *)name,
     -+  };
     -+  struct record rec = {};
     -+  record_from_ref(&rec, &ref);
     -+  return merged_table_seek_record(mt, it, rec);
     ++			  const char *name)
     ++{
     ++	struct ref_record ref = {
     ++		.ref_name = (char *)name,
     ++	};
     ++	struct record rec = {};
     ++	record_from_ref(&rec, &ref);
     ++	return merged_table_seek_record(mt, it, rec);
      +}
      +
      +int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
     -+                             const char *name, uint64_t update_index) {
     -+  struct log_record log = {
     -+      .ref_name = (char *)name,
     -+      .update_index = update_index,
     -+  };
     -+  struct record rec = {};
     -+  record_from_log(&rec, &log);
     -+  return merged_table_seek_record(mt, it, rec);
     ++			     const char *name, uint64_t update_index)
     ++{
     ++	struct log_record log = {
     ++		.ref_name = (char *)name,
     ++		.update_index = update_index,
     ++	};
     ++	struct record rec = {};
     ++	record_from_log(&rec, &log);
     ++	return merged_table_seek_record(mt, it, rec);
      +}
      +
      +int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
     -+                          const char *name) {
     -+  uint64_t max = ~((uint64_t)0);
     -+  return merged_table_seek_log_at(mt, it, name, max);
     ++			  const char *name)
     ++{
     ++	uint64_t max = ~((uint64_t)0);
     ++	return merged_table_seek_log_at(mt, it, name, max);
      +}
      
       diff --git a/reftable/merged.h b/reftable/merged.h
     @@ -2009,20 +1877,20 @@
      +#include "reftable.h"
      +
      +struct merged_table {
     -+  struct reader **stack;
     -+  int stack_len;
     -+  int hash_size;
     ++	struct reader **stack;
     ++	int stack_len;
     ++	int hash_size;
      +
     -+  uint64_t min;
     -+  uint64_t max;
     ++	uint64_t min;
     ++	uint64_t max;
      +};
      +
      +struct merged_iter {
     -+  struct iterator *stack;
     -+  int hash_size;
     -+  int stack_len;
     -+  byte typ;
     -+  struct merged_iter_pqueue pq;
     ++	struct iterator *stack;
     ++	int hash_size;
     ++	int stack_len;
     ++	byte typ;
     ++	struct merged_iter_pqueue pq;
      +} merged_iter;
      +
      +void merged_table_clear(struct merged_table *mt);
     @@ -2044,7 +1912,7 @@
      +
      +#include "merged.h"
      +
     -+#include <string.h>
     ++#include "system.h"
      +
      +#include "basics.h"
      +#include "block.h"
     @@ -2055,231 +1923,242 @@
      +#include "reftable.h"
      +#include "test_framework.h"
      +
     -+void test_pq(void) {
     -+  char *names[54] = {};
     -+  int N = ARRAYSIZE(names) - 1;
     -+
     -+  for (int i = 0; i < N; i++) {
     -+    char name[100];
     -+    sprintf(name, "%02d", i);
     -+    names[i] = strdup(name);
     -+  }
     -+
     -+  struct merged_iter_pqueue pq = {};
     -+
     -+  int i = 1;
     -+  do {
     -+    struct record rec = new_record(BLOCK_TYPE_REF);
     -+    record_as_ref(rec)->ref_name = names[i];
     -+
     -+    struct pq_entry e = {
     -+        .rec = rec,
     -+    };
     -+    merged_iter_pqueue_add(&pq, e);
     -+    merged_iter_pqueue_check(pq);
     -+    i = (i * 7) % N;
     -+  } while (i != 1);
     -+
     -+  const char *last = NULL;
     -+  while (!merged_iter_pqueue_is_empty(pq)) {
     -+    struct pq_entry e = merged_iter_pqueue_remove(&pq);
     -+    merged_iter_pqueue_check(pq);
     -+    struct ref_record *ref = record_as_ref(e.rec);
     -+
     -+    if (last != NULL) {
     -+      assert(strcmp(last, ref->ref_name) < 0);
     -+    }
     -+    last = ref->ref_name;
     -+    ref->ref_name = NULL;
     -+    free(ref);
     -+  }
     -+
     -+  for (int i = 0; i < N; i++) {
     -+    free(names[i]);
     -+  }
     -+
     -+  merged_iter_pqueue_clear(&pq);
     -+}
     -+
     -+void write_test_table(struct slice *buf, struct ref_record refs[], int n) {
     -+  int min = 0xffffffff;
     -+  int max = 0;
     -+  for (int i = 0; i < n; i++) {
     -+    uint64_t ui = refs[i].update_index;
     -+    if (ui > max) {
     -+      max = ui;
     -+    }
     -+    if (ui < min) {
     -+      min = ui;
     -+    }
     -+  }
     -+
     -+  struct write_options opts = {
     -+      .block_size = 256,
     -+  };
     -+
     -+  struct writer *w = new_writer(&slice_write_void, buf, &opts);
     -+  writer_set_limits(w, min, max);
     -+
     -+  for (int i = 0; i < n; i++) {
     -+    uint64_t before = refs[i].update_index;
     -+    int n = writer_add_ref(w, &refs[i]);
     -+    assert(n == 0);
     -+    assert(before == refs[i].update_index);
     -+  }
     -+
     -+  int err = writer_close(w);
     -+  assert(err == 0);
     -+
     -+  writer_free(w);
     -+  w = NULL;
     ++void test_pq(void)
     ++{
     ++	char *names[54] = {};
     ++	int N = ARRAYSIZE(names) - 1;
     ++
     ++	int i = 0;
     ++	for (i = 0; i < N; i++) {
     ++		char name[100];
     ++		snprintf(name, sizeof(name), "%02d", i);
     ++		names[i] = strdup(name);
     ++	}
     ++
     ++	struct merged_iter_pqueue pq = {};
     ++
     ++	i = 1;
     ++	do {
     ++		struct record rec = new_record(BLOCK_TYPE_REF);
     ++		record_as_ref(rec)->ref_name = names[i];
     ++
     ++		struct pq_entry e = {
     ++			.rec = rec,
     ++		};
     ++		merged_iter_pqueue_add(&pq, e);
     ++		merged_iter_pqueue_check(pq);
     ++		i = (i * 7) % N;
     ++	} while (i != 1);
     ++
     ++	const char *last = NULL;
     ++	while (!merged_iter_pqueue_is_empty(pq)) {
     ++		struct pq_entry e = merged_iter_pqueue_remove(&pq);
     ++		merged_iter_pqueue_check(pq);
     ++		struct ref_record *ref = record_as_ref(e.rec);
     ++
     ++		if (last != NULL) {
     ++			assert(strcmp(last, ref->ref_name) < 0);
     ++		}
     ++		last = ref->ref_name;
     ++		ref->ref_name = NULL;
     ++		free(ref);
     ++	}
     ++
     ++	for (i = 0; i < N; i++) {
     ++		free(names[i]);
     ++	}
     ++
     ++	merged_iter_pqueue_clear(&pq);
     ++}
     ++
     ++void write_test_table(struct slice *buf, struct ref_record refs[], int n)
     ++{
     ++	int min = 0xffffffff;
     ++	int max = 0;
     ++	int i = 0;
     ++	for (i = 0; i < n; i++) {
     ++		uint64_t ui = refs[i].update_index;
     ++		if (ui > max) {
     ++			max = ui;
     ++		}
     ++		if (ui < min) {
     ++			min = ui;
     ++		}
     ++	}
     ++
     ++	struct write_options opts = {
     ++		.block_size = 256,
     ++	};
     ++
     ++	struct writer *w = new_writer(&slice_write_void, buf, &opts);
     ++	writer_set_limits(w, min, max);
     ++
     ++	for (i = 0; i < n; i++) {
     ++		uint64_t before = refs[i].update_index;
     ++		int n = writer_add_ref(w, &refs[i]);
     ++		assert(n == 0);
     ++		assert(before == refs[i].update_index);
     ++	}
     ++
     ++	int err = writer_close(w);
     ++	assert(err == 0);
     ++
     ++	writer_free(w);
     ++	w = NULL;
      +}
      +
      +static struct merged_table *merged_table_from_records(struct ref_record **refs,
     -+                                                      int *sizes,
     -+                                                      struct slice *buf,
     -+                                                      int n) {
     -+  struct block_source *source = calloc(n, sizeof(*source));
     -+  struct reader **rd = calloc(n, sizeof(*rd));
     -+  for (int i = 0; i < n; i++) {
     -+    write_test_table(&buf[i], refs[i], sizes[i]);
     -+    block_source_from_slice(&source[i], &buf[i]);
     -+
     -+    int err = new_reader(&rd[i], source[i], "name");
     -+    assert(err == 0);
     -+  }
     -+
     -+  struct merged_table *mt = NULL;
     -+  int err = new_merged_table(&mt, rd, n);
     -+  assert(err == 0);
     -+  return mt;
     -+}
     -+
     -+void test_merged_between(void) {
     -+  byte hash1[SHA1_SIZE];
     -+  byte hash2[SHA1_SIZE];
     -+
     -+  set_test_hash(hash1, 1);
     -+  set_test_hash(hash2, 2);
     -+  struct ref_record r1[] = {{
     -+      .ref_name = "b",
     -+      .update_index = 1,
     -+      .value = hash1,
     -+  }};
     -+  struct ref_record r2[] = {{
     -+      .ref_name = "a",
     -+      .update_index = 2,
     -+  }};
     -+
     -+  struct ref_record *refs[] = {r1, r2};
     -+  int sizes[] = {1, 1};
     -+  struct slice bufs[2] = {};
     -+  struct merged_table *mt = merged_table_from_records(refs, sizes, bufs, 2);
     -+
     -+  struct iterator it = {};
     -+  int err = merged_table_seek_ref(mt, &it, "a");
     -+  assert(err == 0);
     -+
     -+  struct ref_record ref = {};
     -+  err = iterator_next_ref(it, &ref);
     -+  assert_err(err);
     -+  assert(ref.update_index == 2);
     -+}
     -+
     -+void test_merged(void) {
     -+  byte hash1[SHA1_SIZE];
     -+  byte hash2[SHA1_SIZE];
     -+
     -+  set_test_hash(hash1, 1);
     -+  set_test_hash(hash2, 2);
     -+  struct ref_record r1[] = {{
     -+                                .ref_name = "a",
     -+                                .update_index = 1,
     -+                                .value = hash1,
     -+                            },
     -+                            {
     -+                                .ref_name = "b",
     -+                                .update_index = 1,
     -+                                .value = hash1,
     -+                            },
     -+                            {
     -+                                .ref_name = "c",
     -+                                .update_index = 1,
     -+                                .value = hash1,
     -+                            }};
     -+  struct ref_record r2[] = {{
     -+      .ref_name = "a",
     -+      .update_index = 2,
     -+  }};
     -+  struct ref_record r3[] = {
     -+      {
     -+          .ref_name = "c",
     -+          .update_index = 3,
     -+          .value = hash2,
     -+      },
     -+      {
     -+          .ref_name = "d",
     -+          .update_index = 3,
     -+          .value = hash1,
     -+      },
     -+  };
     -+
     -+  struct ref_record *refs[] = {r1, r2, r3};
     -+  int sizes[3] = {3, 1, 2};
     -+  struct slice bufs[3] = {};
     -+
     -+  struct merged_table *mt = merged_table_from_records(refs, sizes, bufs, 3);
     -+
     -+  struct iterator it = {};
     -+  int err = merged_table_seek_ref(mt, &it, "a");
     -+  assert(err == 0);
     -+
     -+  struct ref_record *out = NULL;
     -+  int len = 0;
     -+  int cap = 0;
     -+  while (len < 100) {  // cap loops/recursion.
     -+    struct ref_record ref = {};
     -+    int err = iterator_next_ref(it, &ref);
     -+    if (err > 0) {
     -+      break;
     -+    }
     -+    if (len == cap) {
     -+      cap = 2 * cap + 1;
     -+      out = realloc(out, sizeof(struct ref_record) * cap);
     -+    }
     -+    out[len++] = ref;
     -+  }
     -+  iterator_destroy(&it);
     -+
     -+  struct ref_record want[] = {
     -+      r2[0],
     -+      r1[1],
     -+      r3[0],
     -+      r3[1],
     -+  };
     -+  assert(ARRAYSIZE(want) == len);
     -+  for (int i = 0; i < len; i++) {
     -+    assert(ref_record_equal(&want[i], &out[i], SHA1_SIZE));
     -+  }
     -+  for (int i = 0; i < len; i++) {
     -+    ref_record_clear(&out[i]);
     -+  }
     -+  free(out);
     -+
     -+  for (int i = 0; i < 3; i++) {
     -+    free(slice_yield(&bufs[i]));
     -+  }
     -+  merged_table_close(mt);
     -+  merged_table_free(mt);
     -+}
     -+
     -+// XXX test refs_for(oid)
     -+
     -+int main() {
     -+  add_test_case("test_merged_between", &test_merged_between);
     -+  add_test_case("test_pq", &test_pq);
     -+  add_test_case("test_merged", &test_merged);
     -+  test_main();
     ++						      int *sizes,
     ++						      struct slice *buf, int n)
     ++{
     ++	struct block_source *source = calloc(n, sizeof(*source));
     ++	struct reader **rd = calloc(n, sizeof(*rd));
     ++	int i = 0;
     ++	for (i = 0; i < n; i++) {
     ++		write_test_table(&buf[i], refs[i], sizes[i]);
     ++		block_source_from_slice(&source[i], &buf[i]);
     ++
     ++		int err = new_reader(&rd[i], source[i], "name");
     ++		assert(err == 0);
     ++	}
     ++
     ++	struct merged_table *mt = NULL;
     ++	int err = new_merged_table(&mt, rd, n);
     ++	assert(err == 0);
     ++	return mt;
     ++}
     ++
     ++void test_merged_between(void)
     ++{
     ++	byte hash1[SHA1_SIZE];
     ++	byte hash2[SHA1_SIZE];
     ++
     ++	set_test_hash(hash1, 1);
     ++	set_test_hash(hash2, 2);
     ++	struct ref_record r1[] = { {
     ++		.ref_name = "b",
     ++		.update_index = 1,
     ++		.value = hash1,
     ++	} };
     ++	struct ref_record r2[] = { {
     ++		.ref_name = "a",
     ++		.update_index = 2,
     ++	} };
     ++
     ++	struct ref_record *refs[] = { r1, r2 };
     ++	int sizes[] = { 1, 1 };
     ++	struct slice bufs[2] = {};
     ++	struct merged_table *mt =
     ++		merged_table_from_records(refs, sizes, bufs, 2);
     ++
     ++	struct iterator it = {};
     ++	int err = merged_table_seek_ref(mt, &it, "a");
     ++	assert(err == 0);
     ++
     ++	struct ref_record ref = {};
     ++	err = iterator_next_ref(it, &ref);
     ++	assert_err(err);
     ++	assert(ref.update_index == 2);
     ++}
     ++
     ++void test_merged(void)
     ++{
     ++	byte hash1[SHA1_SIZE];
     ++	byte hash2[SHA1_SIZE];
     ++
     ++	set_test_hash(hash1, 1);
     ++	set_test_hash(hash2, 2);
     ++	struct ref_record r1[] = { {
     ++					   .ref_name = "a",
     ++					   .update_index = 1,
     ++					   .value = hash1,
     ++				   },
     ++				   {
     ++					   .ref_name = "b",
     ++					   .update_index = 1,
     ++					   .value = hash1,
     ++				   },
     ++				   {
     ++					   .ref_name = "c",
     ++					   .update_index = 1,
     ++					   .value = hash1,
     ++				   } };
     ++	struct ref_record r2[] = { {
     ++		.ref_name = "a",
     ++		.update_index = 2,
     ++	} };
     ++	struct ref_record r3[] = {
     ++		{
     ++			.ref_name = "c",
     ++			.update_index = 3,
     ++			.value = hash2,
     ++		},
     ++		{
     ++			.ref_name = "d",
     ++			.update_index = 3,
     ++			.value = hash1,
     ++		},
     ++	};
     ++
     ++	struct ref_record *refs[] = { r1, r2, r3 };
     ++	int sizes[3] = { 3, 1, 2 };
     ++	struct slice bufs[3] = {};
     ++
     ++	struct merged_table *mt =
     ++		merged_table_from_records(refs, sizes, bufs, 3);
     ++
     ++	struct iterator it = {};
     ++	int err = merged_table_seek_ref(mt, &it, "a");
     ++	assert(err == 0);
     ++
     ++	struct ref_record *out = NULL;
     ++	int len = 0;
     ++	int cap = 0;
     ++	while (len < 100) { /* cap loops/recursion. */
     ++		struct ref_record ref = {};
     ++		int err = iterator_next_ref(it, &ref);
     ++		if (err > 0) {
     ++			break;
     ++		}
     ++		if (len == cap) {
     ++			cap = 2 * cap + 1;
     ++			out = realloc(out, sizeof(struct ref_record) * cap);
     ++		}
     ++		out[len++] = ref;
     ++	}
     ++	iterator_destroy(&it);
     ++
     ++	struct ref_record want[] = {
     ++		r2[0],
     ++		r1[1],
     ++		r3[0],
     ++		r3[1],
     ++	};
     ++	assert(ARRAYSIZE(want) == len);
     ++	int i = 0;
     ++	for (i = 0; i < len; i++) {
     ++		assert(ref_record_equal(&want[i], &out[i], SHA1_SIZE));
     ++	}
     ++	for (i = 0; i < len; i++) {
     ++		ref_record_clear(&out[i]);
     ++	}
     ++	free(out);
     ++
     ++	for (i = 0; i < 3; i++) {
     ++		free(slice_yield(&bufs[i]));
     ++	}
     ++	merged_table_close(mt);
     ++	merged_table_free(mt);
     ++}
     ++
     ++/* XXX test refs_for(oid) */
     ++
     ++int main()
     ++{
     ++	add_test_case("test_merged_between", &test_merged_between);
     ++	add_test_case("test_pq", &test_pq);
     ++	add_test_case("test_merged", &test_merged);
     ++	test_main();
      +}
      
       diff --git a/reftable/pq.c b/reftable/pq.c
     @@ -2297,111 +2176,119 @@
      +
      +#include "pq.h"
      +
     -+#include <assert.h>
     -+#include <stdlib.h>
     ++#include "system.h"
      +
     -+int pq_less(struct pq_entry a, struct pq_entry b) {
     -+  struct slice ak = {};
     -+  struct slice bk = {};
     -+  int cmp = 0;
     -+  record_key(a.rec, &ak);
     -+  record_key(b.rec, &bk);
     ++int pq_less(struct pq_entry a, struct pq_entry b)
     ++{
     ++	struct slice ak = {};
     ++	struct slice bk = {};
     ++	int cmp = 0;
     ++	record_key(a.rec, &ak);
     ++	record_key(b.rec, &bk);
      +
     -+  cmp = slice_compare(ak, bk);
     ++	cmp = slice_compare(ak, bk);
     ++
     ++	free(slice_yield(&ak));
     ++	free(slice_yield(&bk));
      +
     -+  free(slice_yield(&ak));
     -+  free(slice_yield(&bk));
     ++	if (cmp == 0) {
     ++		return a.index > b.index;
     ++	}
      +
     -+  if (cmp == 0) {
     -+    return a.index > b.index;
     -+  }
     -+
     -+  return cmp < 0;
     ++	return cmp < 0;
      +}
      +
     -+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq) {
     -+  return pq.heap[0];
     ++struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
     ++{
     ++	return pq.heap[0];
      +}
      +
     -+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq) {
     -+  return pq.len == 0;
     ++bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
     ++{
     ++	return pq.len == 0;
      +}
      +
     -+void merged_iter_pqueue_check(struct merged_iter_pqueue pq) {
     -+  for (int i = 1; i < pq.len; i++) {
     -+    int parent = (i - 1) / 2;
     ++void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
     ++{
     ++	int i = 0;
     ++	for (i = 1; i < pq.len; i++) {
     ++		int parent = (i - 1) / 2;
      +
     -+    assert(pq_less(pq.heap[parent], pq.heap[i]));
     -+  }
     ++		assert(pq_less(pq.heap[parent], pq.heap[i]));
     ++	}
      +}
      +
     -+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq) {
     -+  int i = 0;
     -+  struct pq_entry e = pq->heap[0];
     -+  pq->heap[0] = pq->heap[pq->len - 1];
     -+  pq->len--;
     ++struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
     ++{
     ++	int i = 0;
     ++	struct pq_entry e = pq->heap[0];
     ++	pq->heap[0] = pq->heap[pq->len - 1];
     ++	pq->len--;
      +
     -+  i = 0;
     -+  while (i < pq->len) {
     -+    int min = i;
     -+    int j = 2 * i + 1;
     -+    int k = 2 * i + 2;
     -+    if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
     -+      min = j;
     -+    }
     -+    if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
     -+      min = k;
     -+    }
     ++	i = 0;
     ++	while (i < pq->len) {
     ++		int min = i;
     ++		int j = 2 * i + 1;
     ++		int k = 2 * i + 2;
     ++		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
     ++			min = j;
     ++		}
     ++		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
     ++			min = k;
     ++		}
      +
     -+    if (min == i) {
     -+      break;
     -+    }
     ++		if (min == i) {
     ++			break;
     ++		}
      +
     -+    {
     -+      struct pq_entry tmp = pq->heap[min];
     -+      pq->heap[min] = pq->heap[i];
     -+      pq->heap[i] = tmp;
     -+    }
     ++		{
     ++			struct pq_entry tmp = pq->heap[min];
     ++			pq->heap[min] = pq->heap[i];
     ++			pq->heap[i] = tmp;
     ++		}
      +
     -+    i = min;
     -+  }
     ++		i = min;
     ++	}
      +
     -+  return e;
     ++	return e;
      +}
      +
     -+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e) {
     -+  int i = 0;
     -+  if (pq->len == pq->cap) {
     -+    pq->cap = 2 * pq->cap + 1;
     -+    pq->heap = realloc(pq->heap, pq->cap * sizeof(struct pq_entry));
     -+  }
     -+
     -+  pq->heap[pq->len++] = e;
     -+  i = pq->len - 1;
     -+  while (i > 0) {
     -+    int j = (i - 1) / 2;
     -+    if (pq_less(pq->heap[j], pq->heap[i])) {
     -+      break;
     -+    }
     ++void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
     ++{
     ++	int i = 0;
     ++	if (pq->len == pq->cap) {
     ++		pq->cap = 2 * pq->cap + 1;
     ++		pq->heap = realloc(pq->heap, pq->cap * sizeof(struct pq_entry));
     ++	}
      +
     -+    {
     -+      struct pq_entry tmp = pq->heap[j];
     -+      pq->heap[j] = pq->heap[i];
     -+      pq->heap[i] = tmp;
     -+    }
     ++	pq->heap[pq->len++] = e;
     ++	i = pq->len - 1;
     ++	while (i > 0) {
     ++		int j = (i - 1) / 2;
     ++		if (pq_less(pq->heap[j], pq->heap[i])) {
     ++			break;
     ++		}
      +
     -+    i = j;
     -+  }
     -+}
     ++		{
     ++			struct pq_entry tmp = pq->heap[j];
     ++			pq->heap[j] = pq->heap[i];
     ++			pq->heap[i] = tmp;
     ++		}
      +
     -+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq) {
     -+  for (int i = 0; i < pq->len; i++) {
     -+    record_clear(pq->heap[i].rec);
     -+    free(record_yield(&pq->heap[i].rec));
     -+  }
     -+  free(pq->heap);
     -+  pq->heap = NULL;
     -+  pq->len = pq->cap = 0;
     ++		i = j;
     ++	}
     ++}
     ++
     ++void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
     ++{
     ++	int i = 0;
     ++	for (i = 0; i < pq->len; i++) {
     ++		record_clear(pq->heap[i].rec);
     ++		free(record_yield(&pq->heap[i].rec));
     ++	}
     ++	free(pq->heap);
     ++	pq->heap = NULL;
     ++	pq->len = pq->cap = 0;
      +}
      
       diff --git a/reftable/pq.h b/reftable/pq.h
     @@ -2423,16 +2310,16 @@
      +#include "record.h"
      +
      +struct pq_entry {
     -+  struct record rec;
     -+  int index;
     ++	struct record rec;
     ++	int index;
      +};
      +
      +int pq_less(struct pq_entry a, struct pq_entry b);
      +
      +struct merged_iter_pqueue {
     -+  struct pq_entry *heap;
     -+  int len;
     -+  int cap;
     ++	struct pq_entry *heap;
     ++	int len;
     ++	int cap;
      +};
      +
      +struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
     @@ -2459,10 +2346,7 @@
      +
      +#include "reader.h"
      +
     -+#include <stdio.h>
     -+#include <stdlib.h>
     -+#include <string.h>
     -+#include <zlib.h>
     ++#include "system.h"
      +
      +#include "block.h"
      +#include "constants.h"
     @@ -2471,655 +2355,696 @@
      +#include "reftable.h"
      +#include "tree.h"
      +
     -+uint64_t block_source_size(struct block_source source) {
     -+  return source.ops->size(source.arg);
     ++uint64_t block_source_size(struct block_source source)
     ++{
     ++	return source.ops->size(source.arg);
      +}
      +
      +int block_source_read_block(struct block_source source, struct block *dest,
     -+                            uint64_t off, uint32_t size) {
     -+  int result = source.ops->read_block(source.arg, dest, off, size);
     -+  dest->source = source;
     -+  return result;
     ++			    uint64_t off, uint32_t size)
     ++{
     ++	int result = source.ops->read_block(source.arg, dest, off, size);
     ++	dest->source = source;
     ++	return result;
      +}
      +
     -+void block_source_return_block(struct block_source source,
     -+                               struct block *blockp) {
     -+  source.ops->return_block(source.arg, blockp);
     -+  blockp->data = NULL;
     -+  blockp->len = 0;
     -+  blockp->source.ops = NULL;
     -+  blockp->source.arg = NULL;
     ++void block_source_return_block(struct block_source source, struct block *blockp)
     ++{
     ++	source.ops->return_block(source.arg, blockp);
     ++	blockp->data = NULL;
     ++	blockp->len = 0;
     ++	blockp->source.ops = NULL;
     ++	blockp->source.arg = NULL;
      +}
      +
     -+void block_source_close(struct block_source *source) {
     -+  if (source->ops == NULL) {
     -+    return;
     -+  }
     ++void block_source_close(struct block_source *source)
     ++{
     ++	if (source->ops == NULL) {
     ++		return;
     ++	}
      +
     -+  source->ops->close(source->arg);
     -+  source->ops = NULL;
     ++	source->ops->close(source->arg);
     ++	source->ops = NULL;
      +}
      +
     -+static struct reader_offsets *reader_offsets_for(struct reader *r, byte typ) {
     -+  switch (typ) {
     -+    case BLOCK_TYPE_REF:
     -+      return &r->ref_offsets;
     -+    case BLOCK_TYPE_LOG:
     -+      return &r->log_offsets;
     -+    case BLOCK_TYPE_OBJ:
     -+      return &r->obj_offsets;
     -+  }
     -+  abort();
     ++static struct reader_offsets *reader_offsets_for(struct reader *r, byte typ)
     ++{
     ++	switch (typ) {
     ++	case BLOCK_TYPE_REF:
     ++		return &r->ref_offsets;
     ++	case BLOCK_TYPE_LOG:
     ++		return &r->log_offsets;
     ++	case BLOCK_TYPE_OBJ:
     ++		return &r->obj_offsets;
     ++	}
     ++	abort();
      +}
      +
      +static int reader_get_block(struct reader *r, struct block *dest, uint64_t off,
     -+                            uint32_t sz) {
     -+  if (off >= r->size) {
     -+    return 0;
     -+  }
     -+
     -+  if (off + sz > r->size) {
     -+    sz = r->size - off;
     -+  }
     -+
     -+  return block_source_read_block(r->source, dest, off, sz);
     -+}
     -+
     -+void reader_return_block(struct reader *r, struct block *p) {
     -+  block_source_return_block(r->source, p);
     -+}
     -+
     -+const char *reader_name(struct reader *r) { return r->name; }
     -+
     -+static int parse_footer(struct reader *r, byte *footer, byte *header) {
     -+  byte *f = footer;
     -+  int err = 0;
     -+  if (memcmp(f, "REFT", 4)) {
     -+    err = FORMAT_ERROR;
     -+    goto exit;
     -+  }
     -+  f += 4;
     -+
     -+  if (0 != memcmp(footer, header, HEADER_SIZE)) {
     -+    err = FORMAT_ERROR;
     -+    goto exit;
     -+  }
     -+
     -+  {
     -+    byte version = *f++;
     -+    if (version != 1) {
     -+      err = FORMAT_ERROR;
     -+      goto exit;
     -+    }
     -+  }
     -+
     -+  r->block_size = get_u24(f);
     -+
     -+  f += 3;
     -+  r->min_update_index = get_u64(f);
     -+  f += 8;
     -+  r->max_update_index = get_u64(f);
     -+  f += 8;
     -+
     -+  r->ref_offsets.index_offset = get_u64(f);
     -+  f += 8;
     -+
     -+  r->obj_offsets.offset = get_u64(f);
     -+  f += 8;
     -+
     -+  r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
     -+  r->obj_offsets.offset >>= 5;
     -+
     -+  r->obj_offsets.index_offset = get_u64(f);
     -+  f += 8;
     -+  r->log_offsets.offset = get_u64(f);
     -+  f += 8;
     -+  r->log_offsets.index_offset = get_u64(f);
     -+  f += 8;
     -+
     -+  {
     -+    uint32_t computed_crc = crc32(0, footer, f - footer);
     -+    uint32_t file_crc = get_u32(f);
     -+    f += 4;
     -+    if (computed_crc != file_crc) {
     -+      err = FORMAT_ERROR;
     -+      goto exit;
     -+    }
     -+  }
     -+
     -+  {
     -+    byte first_block_typ = header[HEADER_SIZE];
     -+    r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
     -+    r->ref_offsets.offset = 0;
     -+    r->log_offsets.present =
     -+        (first_block_typ == BLOCK_TYPE_LOG || r->log_offsets.offset > 0);
     -+    r->obj_offsets.present = r->obj_offsets.offset > 0;
     -+  }
     -+  err = 0;
     ++			    uint32_t sz)
     ++{
     ++	if (off >= r->size) {
     ++		return 0;
     ++	}
     ++
     ++	if (off + sz > r->size) {
     ++		sz = r->size - off;
     ++	}
     ++
     ++	return block_source_read_block(r->source, dest, off, sz);
     ++}
     ++
     ++void reader_return_block(struct reader *r, struct block *p)
     ++{
     ++	block_source_return_block(r->source, p);
     ++}
     ++
     ++const char *reader_name(struct reader *r)
     ++{
     ++	return r->name;
     ++}
     ++
     ++static int parse_footer(struct reader *r, byte *footer, byte *header)
     ++{
     ++	byte *f = footer;
     ++	int err = 0;
     ++	if (memcmp(f, "REFT", 4)) {
     ++		err = FORMAT_ERROR;
     ++		goto exit;
     ++	}
     ++	f += 4;
     ++
     ++	if (memcmp(footer, header, HEADER_SIZE)) {
     ++		err = FORMAT_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	{
     ++		byte version = *f++;
     ++		if (version != 1) {
     ++			err = FORMAT_ERROR;
     ++			goto exit;
     ++		}
     ++	}
     ++
     ++	r->block_size = get_u24(f);
     ++
     ++	f += 3;
     ++	r->min_update_index = get_u64(f);
     ++	f += 8;
     ++	r->max_update_index = get_u64(f);
     ++	f += 8;
     ++
     ++	r->ref_offsets.index_offset = get_u64(f);
     ++	f += 8;
     ++
     ++	r->obj_offsets.offset = get_u64(f);
     ++	f += 8;
     ++
     ++	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
     ++	r->obj_offsets.offset >>= 5;
     ++
     ++	r->obj_offsets.index_offset = get_u64(f);
     ++	f += 8;
     ++	r->log_offsets.offset = get_u64(f);
     ++	f += 8;
     ++	r->log_offsets.index_offset = get_u64(f);
     ++	f += 8;
     ++
     ++	{
     ++		uint32_t computed_crc = crc32(0, footer, f - footer);
     ++		uint32_t file_crc = get_u32(f);
     ++		f += 4;
     ++		if (computed_crc != file_crc) {
     ++			err = FORMAT_ERROR;
     ++			goto exit;
     ++		}
     ++	}
     ++
     ++	{
     ++		byte first_block_typ = header[HEADER_SIZE];
     ++		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
     ++		r->ref_offsets.offset = 0;
     ++		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
     ++					  r->log_offsets.offset > 0);
     ++		r->obj_offsets.present = r->obj_offsets.offset > 0;
     ++	}
     ++	err = 0;
      +exit:
     -+  return err;
     -+}
     -+
     -+int init_reader(struct reader *r, struct block_source source,
     -+                const char *name) {
     -+  struct block footer = {};
     -+  struct block header = {};
     -+  int err = 0;
     -+
     -+  memset(r, 0, sizeof(struct reader));
     -+  r->size = block_source_size(source) - FOOTER_SIZE;
     -+  r->source = source;
     -+  r->name = strdup(name);
     -+  r->hash_size = SHA1_SIZE;
     -+
     -+  err = block_source_read_block(source, &footer, r->size, FOOTER_SIZE);
     -+  if (err != FOOTER_SIZE) {
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+
     -+  // Need +1 to read type of first block.
     -+  err = reader_get_block(r, &header, 0, HEADER_SIZE + 1);
     -+  if (err != HEADER_SIZE + 1) {
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+
     -+  err = parse_footer(r, footer.data, header.data);
     ++	return err;
     ++}
     ++
     ++int init_reader(struct reader *r, struct block_source source, const char *name)
     ++{
     ++	struct block footer = {};
     ++	struct block header = {};
     ++	int err = 0;
     ++
     ++	memset(r, 0, sizeof(struct reader));
     ++	r->size = block_source_size(source) - FOOTER_SIZE;
     ++	r->source = source;
     ++	r->name = strdup(name);
     ++	r->hash_size = SHA1_SIZE;
     ++
     ++	err = block_source_read_block(source, &footer, r->size, FOOTER_SIZE);
     ++	if (err != FOOTER_SIZE) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	/* Need +1 to read type of first block. */
     ++	err = reader_get_block(r, &header, 0, HEADER_SIZE + 1);
     ++	if (err != HEADER_SIZE + 1) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	err = parse_footer(r, footer.data, header.data);
      +exit:
     -+  block_source_return_block(r->source, &footer);
     -+  block_source_return_block(r->source, &header);
     -+  return err;
     ++	block_source_return_block(r->source, &footer);
     ++	block_source_return_block(r->source, &header);
     ++	return err;
      +}
      +
      +struct table_iter {
     -+  struct reader *r;
     -+  byte typ;
     -+  uint64_t block_off;
     -+  struct block_iter bi;
     -+  bool finished;
     ++	struct reader *r;
     ++	byte typ;
     ++	uint64_t block_off;
     ++	struct block_iter bi;
     ++	bool finished;
      +};
      +
      +static void table_iter_copy_from(struct table_iter *dest,
     -+                                 struct table_iter *src) {
     -+  dest->r = src->r;
     -+  dest->typ = src->typ;
     -+  dest->block_off = src->block_off;
     -+  dest->finished = src->finished;
     -+  block_iter_copy_from(&dest->bi, &src->bi);
     ++				 struct table_iter *src)
     ++{
     ++	dest->r = src->r;
     ++	dest->typ = src->typ;
     ++	dest->block_off = src->block_off;
     ++	dest->finished = src->finished;
     ++	block_iter_copy_from(&dest->bi, &src->bi);
      +}
      +
     -+static int table_iter_next_in_block(struct table_iter *ti, struct record rec) {
     -+  int res = block_iter_next(&ti->bi, rec);
     -+  if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
     -+    ((struct ref_record *)rec.data)->update_index += ti->r->min_update_index;
     -+  }
     ++static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
     ++{
     ++	int res = block_iter_next(&ti->bi, rec);
     ++	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
     ++		((struct ref_record *)rec.data)->update_index +=
     ++			ti->r->min_update_index;
     ++	}
      +
     -+  return res;
     ++	return res;
      +}
      +
     -+static void table_iter_block_done(struct table_iter *ti) {
     -+  if (ti->bi.br == NULL) {
     -+    return;
     -+  }
     -+  reader_return_block(ti->r, &ti->bi.br->block);
     -+  free(ti->bi.br);
     -+  ti->bi.br = NULL;
     ++static void table_iter_block_done(struct table_iter *ti)
     ++{
     ++	if (ti->bi.br == NULL) {
     ++		return;
     ++	}
     ++	reader_return_block(ti->r, &ti->bi.br->block);
     ++	free(ti->bi.br);
     ++	ti->bi.br = NULL;
      +
     -+  ti->bi.last_key.len = 0;
     -+  ti->bi.next_off = 0;
     ++	ti->bi.last_key.len = 0;
     ++	ti->bi.next_off = 0;
      +}
      +
     -+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off) {
     -+  int32_t result = 0;
     ++static int32_t extract_block_size(byte *data, byte *typ, uint64_t off)
     ++{
     ++	int32_t result = 0;
      +
     -+  if (off == 0) {
     -+    data += 24;
     -+  }
     ++	if (off == 0) {
     ++		data += 24;
     ++	}
      +
     -+  *typ = data[0];
     -+  if (is_block_type(*typ)) {
     -+    result = get_u24(data + 1);
     -+  }
     -+  return result;
     ++	*typ = data[0];
     ++	if (is_block_type(*typ)) {
     ++		result = get_u24(data + 1);
     ++	}
     ++	return result;
      +}
      +
      +int reader_init_block_reader(struct reader *r, struct block_reader *br,
     -+                             uint64_t next_off, byte want_typ) {
     -+  int32_t guess_block_size = r->block_size ? r->block_size : DEFAULT_BLOCK_SIZE;
     -+  struct block block = {};
     -+  byte block_typ = 0;
     -+  int err = 0;
     -+  uint32_t header_off = next_off ? 0 : HEADER_SIZE;
     -+  int32_t block_size = 0;
     -+
     -+  if (next_off >= r->size) {
     -+    return 1;
     -+  }
     -+
     -+  err = reader_get_block(r, &block, next_off, guess_block_size);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+
     -+  block_size = extract_block_size(block.data, &block_typ, next_off);
     -+  if (block_size < 0) {
     -+    return block_size;
     -+  }
     -+
     -+  if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
     -+    reader_return_block(r, &block);
     -+    return 1;
     -+  }
     -+
     -+  if (block_size > guess_block_size) {
     -+    reader_return_block(r, &block);
     -+    err = reader_get_block(r, &block, next_off, block_size);
     -+    if (err < 0) {
     -+      return err;
     -+    }
     -+  }
     -+
     -+  return block_reader_init(br, &block, header_off, r->block_size, r->hash_size);
     ++			     uint64_t next_off, byte want_typ)
     ++{
     ++	int32_t guess_block_size = r->block_size ? r->block_size :
     ++						   DEFAULT_BLOCK_SIZE;
     ++	struct block block = {};
     ++	byte block_typ = 0;
     ++	int err = 0;
     ++	uint32_t header_off = next_off ? 0 : HEADER_SIZE;
     ++	int32_t block_size = 0;
     ++
     ++	if (next_off >= r->size) {
     ++		return 1;
     ++	}
     ++
     ++	err = reader_get_block(r, &block, next_off, guess_block_size);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	block_size = extract_block_size(block.data, &block_typ, next_off);
     ++	if (block_size < 0) {
     ++		return block_size;
     ++	}
     ++
     ++	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
     ++		reader_return_block(r, &block);
     ++		return 1;
     ++	}
     ++
     ++	if (block_size > guess_block_size) {
     ++		reader_return_block(r, &block);
     ++		err = reader_get_block(r, &block, next_off, block_size);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++	}
     ++
     ++	return block_reader_init(br, &block, header_off, r->block_size,
     ++				 r->hash_size);
      +}
      +
      +static int table_iter_next_block(struct table_iter *dest,
     -+                                 struct table_iter *src) {
     -+  uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
     -+  struct block_reader br = {};
     -+  int err = 0;
     -+
     -+  dest->r = src->r;
     -+  dest->typ = src->typ;
     -+  dest->block_off = next_block_off;
     -+
     -+  err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
     -+  if (err > 0) {
     -+    dest->finished = true;
     -+    return 1;
     -+  }
     -+  if (err != 0) {
     -+    return err;
     -+  }
     -+
     -+  {
     -+    struct block_reader *brp = malloc(sizeof(struct block_reader));
     -+    *brp = br;
     -+
     -+    dest->finished = false;
     -+    block_reader_start(brp, &dest->bi);
     -+  }
     -+  return 0;
     -+}
     -+
     -+static int table_iter_next(struct table_iter *ti, struct record rec) {
     -+  if (record_type(rec) != ti->typ) {
     -+    return API_ERROR;
     -+  }
     -+
     -+  while (true) {
     -+    struct table_iter next = {};
     -+    int err = 0;
     -+    if (ti->finished) {
     -+      return 1;
     -+    }
     -+
     -+    err = table_iter_next_in_block(ti, rec);
     -+    if (err <= 0) {
     -+      return err;
     -+    }
     -+
     -+    err = table_iter_next_block(&next, ti);
     -+    if (err != 0) {
     -+      ti->finished = true;
     -+    }
     -+    table_iter_block_done(ti);
     -+    if (err != 0) {
     -+      return err;
     -+    }
     -+    table_iter_copy_from(ti, &next);
     -+    block_iter_close(&next.bi);
     -+  }
     -+}
     -+
     -+static int table_iter_next_void(void *ti, struct record rec) {
     -+  return table_iter_next((struct table_iter *)ti, rec);
     -+}
     -+
     -+static void table_iter_close(void *p) {
     -+  struct table_iter *ti = (struct table_iter *)p;
     -+  table_iter_block_done(ti);
     -+  block_iter_close(&ti->bi);
     ++				 struct table_iter *src)
     ++{
     ++	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
     ++	struct block_reader br = {};
     ++	int err = 0;
     ++
     ++	dest->r = src->r;
     ++	dest->typ = src->typ;
     ++	dest->block_off = next_block_off;
     ++
     ++	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
     ++	if (err > 0) {
     ++		dest->finished = true;
     ++		return 1;
     ++	}
     ++	if (err != 0) {
     ++		return err;
     ++	}
     ++
     ++	{
     ++		struct block_reader *brp = malloc(sizeof(struct block_reader));
     ++		*brp = br;
     ++
     ++		dest->finished = false;
     ++		block_reader_start(brp, &dest->bi);
     ++	}
     ++	return 0;
     ++}
     ++
     ++static int table_iter_next(struct table_iter *ti, struct record rec)
     ++{
     ++	if (record_type(rec) != ti->typ) {
     ++		return API_ERROR;
     ++	}
     ++
     ++	while (true) {
     ++		struct table_iter next = {};
     ++		int err = 0;
     ++		if (ti->finished) {
     ++			return 1;
     ++		}
     ++
     ++		err = table_iter_next_in_block(ti, rec);
     ++		if (err <= 0) {
     ++			return err;
     ++		}
     ++
     ++		err = table_iter_next_block(&next, ti);
     ++		if (err != 0) {
     ++			ti->finished = true;
     ++		}
     ++		table_iter_block_done(ti);
     ++		if (err != 0) {
     ++			return err;
     ++		}
     ++		table_iter_copy_from(ti, &next);
     ++		block_iter_close(&next.bi);
     ++	}
     ++}
     ++
     ++static int table_iter_next_void(void *ti, struct record rec)
     ++{
     ++	return table_iter_next((struct table_iter *)ti, rec);
     ++}
     ++
     ++static void table_iter_close(void *p)
     ++{
     ++	struct table_iter *ti = (struct table_iter *)p;
     ++	table_iter_block_done(ti);
     ++	block_iter_close(&ti->bi);
      +}
      +
      +struct iterator_vtable table_iter_vtable = {
     -+    .next = &table_iter_next_void,
     -+    .close = &table_iter_close,
     ++	.next = &table_iter_next_void,
     ++	.close = &table_iter_close,
      +};
      +
     -+static void iterator_from_table_iter(struct iterator *it,
     -+                                     struct table_iter *ti) {
     -+  it->iter_arg = ti;
     -+  it->ops = &table_iter_vtable;
     ++static void iterator_from_table_iter(struct iterator *it, struct table_iter *ti)
     ++{
     ++	it->iter_arg = ti;
     ++	it->ops = &table_iter_vtable;
      +}
      +
      +static int reader_table_iter_at(struct reader *r, struct table_iter *ti,
     -+                                uint64_t off, byte typ) {
     -+  struct block_reader br = {};
     -+  struct block_reader *brp = NULL;
     ++				uint64_t off, byte typ)
     ++{
     ++	struct block_reader br = {};
     ++	struct block_reader *brp = NULL;
      +
     -+  int err = reader_init_block_reader(r, &br, off, typ);
     -+  if (err != 0) {
     -+    return err;
     -+  }
     ++	int err = reader_init_block_reader(r, &br, off, typ);
     ++	if (err != 0) {
     ++		return err;
     ++	}
      +
     -+  brp = malloc(sizeof(struct block_reader));
     -+  *brp = br;
     -+  ti->r = r;
     -+  ti->typ = block_reader_type(brp);
     -+  ti->block_off = off;
     -+  block_reader_start(brp, &ti->bi);
     -+  return 0;
     ++	brp = malloc(sizeof(struct block_reader));
     ++	*brp = br;
     ++	ti->r = r;
     ++	ti->typ = block_reader_type(brp);
     ++	ti->block_off = off;
     ++	block_reader_start(brp, &ti->bi);
     ++	return 0;
      +}
      +
      +static int reader_start(struct reader *r, struct table_iter *ti, byte typ,
     -+                        bool index) {
     -+  struct reader_offsets *offs = reader_offsets_for(r, typ);
     -+  uint64_t off = offs->offset;
     -+  if (index) {
     -+    off = offs->index_offset;
     -+    if (off == 0) {
     -+      return 1;
     -+    }
     -+    typ = BLOCK_TYPE_INDEX;
     -+  }
     ++			bool index)
     ++{
     ++	struct reader_offsets *offs = reader_offsets_for(r, typ);
     ++	uint64_t off = offs->offset;
     ++	if (index) {
     ++		off = offs->index_offset;
     ++		if (off == 0) {
     ++			return 1;
     ++		}
     ++		typ = BLOCK_TYPE_INDEX;
     ++	}
      +
     -+  return reader_table_iter_at(r, ti, off, typ);
     ++	return reader_table_iter_at(r, ti, off, typ);
      +}
      +
      +static int reader_seek_linear(struct reader *r, struct table_iter *ti,
     -+                              struct record want) {
     -+  struct record rec = new_record(record_type(want));
     -+  struct slice want_key = {};
     -+  struct slice got_key = {};
     -+  struct table_iter next = {};
     -+  int err = -1;
     -+  record_key(want, &want_key);
     -+
     -+  while (true) {
     -+    err = table_iter_next_block(&next, ti);
     -+    if (err < 0) {
     -+      goto exit;
     -+    }
     -+
     -+    if (err > 0) {
     -+      break;
     -+    }
     -+
     -+    err = block_reader_first_key(next.bi.br, &got_key);
     -+    if (err < 0) {
     -+      goto exit;
     -+    }
     -+    {
     -+      int cmp = slice_compare(got_key, want_key);
     -+      if (cmp > 0) {
     -+        table_iter_block_done(&next);
     -+        break;
     -+      }
     -+    }
     -+
     -+    table_iter_block_done(ti);
     -+    table_iter_copy_from(ti, &next);
     -+  }
     -+
     -+  err = block_iter_seek(&ti->bi, want_key);
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+  err = 0;
     ++			      struct record want)
     ++{
     ++	struct record rec = new_record(record_type(want));
     ++	struct slice want_key = {};
     ++	struct slice got_key = {};
     ++	struct table_iter next = {};
     ++	int err = -1;
     ++	record_key(want, &want_key);
     ++
     ++	while (true) {
     ++		err = table_iter_next_block(&next, ti);
     ++		if (err < 0) {
     ++			goto exit;
     ++		}
     ++
     ++		if (err > 0) {
     ++			break;
     ++		}
     ++
     ++		err = block_reader_first_key(next.bi.br, &got_key);
     ++		if (err < 0) {
     ++			goto exit;
     ++		}
     ++		{
     ++			int cmp = slice_compare(got_key, want_key);
     ++			if (cmp > 0) {
     ++				table_iter_block_done(&next);
     ++				break;
     ++			}
     ++		}
     ++
     ++		table_iter_block_done(ti);
     ++		table_iter_copy_from(ti, &next);
     ++	}
     ++
     ++	err = block_iter_seek(&ti->bi, want_key);
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++	err = 0;
      +
      +exit:
     -+  block_iter_close(&next.bi);
     -+  record_clear(rec);
     -+  free(record_yield(&rec));
     -+  free(slice_yield(&want_key));
     -+  free(slice_yield(&got_key));
     -+  return err;
     ++	block_iter_close(&next.bi);
     ++	record_clear(rec);
     ++	free(record_yield(&rec));
     ++	free(slice_yield(&want_key));
     ++	free(slice_yield(&got_key));
     ++	return err;
      +}
      +
      +static int reader_seek_indexed(struct reader *r, struct iterator *it,
     -+                               struct record rec) {
     -+  struct index_record want_index = {};
     -+  struct record want_index_rec = {};
     -+  struct index_record index_result = {};
     -+  struct record index_result_rec = {};
     -+  struct table_iter index_iter = {};
     -+  struct table_iter next = {};
     -+  int err = 0;
     -+
     -+  record_key(rec, &want_index.last_key);
     -+  record_from_index(&want_index_rec, &want_index);
     -+  record_from_index(&index_result_rec, &index_result);
     -+
     -+  err = reader_start(r, &index_iter, record_type(rec), true);
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+
     -+  err = reader_seek_linear(r, &index_iter, want_index_rec);
     -+  while (true) {
     -+    err = table_iter_next(&index_iter, index_result_rec);
     -+    table_iter_block_done(&index_iter);
     -+    if (err != 0) {
     -+      goto exit;
     -+    }
     -+
     -+    err = reader_table_iter_at(r, &next, index_result.offset, 0);
     -+    if (err != 0) {
     -+      goto exit;
     -+    }
     -+
     -+    err = block_iter_seek(&next.bi, want_index.last_key);
     -+    if (err < 0) {
     -+      goto exit;
     -+    }
     -+
     -+    if (next.typ == record_type(rec)) {
     -+      err = 0;
     -+      break;
     -+    }
     -+
     -+    if (next.typ != BLOCK_TYPE_INDEX) {
     -+      err = FORMAT_ERROR;
     -+      break;
     -+    }
     -+
     -+    table_iter_copy_from(&index_iter, &next);
     -+  }
     -+
     -+  if (err == 0) {
     -+    struct table_iter *malloced = calloc(sizeof(struct table_iter), 1);
     -+    table_iter_copy_from(malloced, &next);
     -+    iterator_from_table_iter(it, malloced);
     -+  }
     ++			       struct record rec)
     ++{
     ++	struct index_record want_index = {};
     ++	struct record want_index_rec = {};
     ++	struct index_record index_result = {};
     ++	struct record index_result_rec = {};
     ++	struct table_iter index_iter = {};
     ++	struct table_iter next = {};
     ++	int err = 0;
     ++
     ++	record_key(rec, &want_index.last_key);
     ++	record_from_index(&want_index_rec, &want_index);
     ++	record_from_index(&index_result_rec, &index_result);
     ++
     ++	err = reader_start(r, &index_iter, record_type(rec), true);
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	err = reader_seek_linear(r, &index_iter, want_index_rec);
     ++	while (true) {
     ++		err = table_iter_next(&index_iter, index_result_rec);
     ++		table_iter_block_done(&index_iter);
     ++		if (err != 0) {
     ++			goto exit;
     ++		}
     ++
     ++		err = reader_table_iter_at(r, &next, index_result.offset, 0);
     ++		if (err != 0) {
     ++			goto exit;
     ++		}
     ++
     ++		err = block_iter_seek(&next.bi, want_index.last_key);
     ++		if (err < 0) {
     ++			goto exit;
     ++		}
     ++
     ++		if (next.typ == record_type(rec)) {
     ++			err = 0;
     ++			break;
     ++		}
     ++
     ++		if (next.typ != BLOCK_TYPE_INDEX) {
     ++			err = FORMAT_ERROR;
     ++			break;
     ++		}
     ++
     ++		table_iter_copy_from(&index_iter, &next);
     ++	}
     ++
     ++	if (err == 0) {
     ++		struct table_iter *malloced =
     ++			calloc(sizeof(struct table_iter), 1);
     ++		table_iter_copy_from(malloced, &next);
     ++		iterator_from_table_iter(it, malloced);
     ++	}
      +exit:
     -+  block_iter_close(&next.bi);
     -+  table_iter_close(&index_iter);
     -+  record_clear(want_index_rec);
     -+  record_clear(index_result_rec);
     -+  return err;
     ++	block_iter_close(&next.bi);
     ++	table_iter_close(&index_iter);
     ++	record_clear(want_index_rec);
     ++	record_clear(index_result_rec);
     ++	return err;
      +}
      +
      +static int reader_seek_internal(struct reader *r, struct iterator *it,
     -+                                struct record rec) {
     -+  struct reader_offsets *offs = reader_offsets_for(r, record_type(rec));
     -+  uint64_t idx = offs->index_offset;
     -+  struct table_iter ti = {};
     -+  int err = 0;
     -+  if (idx > 0) {
     -+    return reader_seek_indexed(r, it, rec);
     -+  }
     -+
     -+  err = reader_start(r, &ti, record_type(rec), false);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+  err = reader_seek_linear(r, &ti, rec);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+
     -+  {
     -+    struct table_iter *p = malloc(sizeof(struct table_iter));
     -+    *p = ti;
     -+    iterator_from_table_iter(it, p);
     -+  }
     -+
     -+  return 0;
     -+}
     -+
     -+int reader_seek(struct reader *r, struct iterator *it, struct record rec) {
     -+  byte typ = record_type(rec);
     -+
     -+  struct reader_offsets *offs = reader_offsets_for(r, typ);
     -+  if (!offs->present) {
     -+    iterator_set_empty(it);
     -+    return 0;
     -+  }
     -+
     -+  return reader_seek_internal(r, it, rec);
     -+}
     -+
     -+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name) {
     -+  struct ref_record ref = {
     -+      .ref_name = (char *)name,
     -+  };
     -+  struct record rec = {};
     -+  record_from_ref(&rec, &ref);
     -+  return reader_seek(r, it, rec);
     ++				struct record rec)
     ++{
     ++	struct reader_offsets *offs = reader_offsets_for(r, record_type(rec));
     ++	uint64_t idx = offs->index_offset;
     ++	struct table_iter ti = {};
     ++	int err = 0;
     ++	if (idx > 0) {
     ++		return reader_seek_indexed(r, it, rec);
     ++	}
     ++
     ++	err = reader_start(r, &ti, record_type(rec), false);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++	err = reader_seek_linear(r, &ti, rec);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	{
     ++		struct table_iter *p = malloc(sizeof(struct table_iter));
     ++		*p = ti;
     ++		iterator_from_table_iter(it, p);
     ++	}
     ++
     ++	return 0;
     ++}
     ++
     ++int reader_seek(struct reader *r, struct iterator *it, struct record rec)
     ++{
     ++	byte typ = record_type(rec);
     ++
     ++	struct reader_offsets *offs = reader_offsets_for(r, typ);
     ++	if (!offs->present) {
     ++		iterator_set_empty(it);
     ++		return 0;
     ++	}
     ++
     ++	return reader_seek_internal(r, it, rec);
     ++}
     ++
     ++int reader_seek_ref(struct reader *r, struct iterator *it, const char *name)
     ++{
     ++	struct ref_record ref = {
     ++		.ref_name = (char *)name,
     ++	};
     ++	struct record rec = {};
     ++	record_from_ref(&rec, &ref);
     ++	return reader_seek(r, it, rec);
      +}
      +
      +int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
     -+                    uint64_t update_index) {
     -+  struct log_record log = {
     -+      .ref_name = (char *)name,
     -+      .update_index = update_index,
     -+  };
     -+  struct record rec = {};
     -+  record_from_log(&rec, &log);
     -+  return reader_seek(r, it, rec);
     ++		       uint64_t update_index)
     ++{
     ++	struct log_record log = {
     ++		.ref_name = (char *)name,
     ++		.update_index = update_index,
     ++	};
     ++	struct record rec = {};
     ++	record_from_log(&rec, &log);
     ++	return reader_seek(r, it, rec);
      +}
      +
     -+int reader_seek_log(struct reader *r, struct iterator *it, const char *name) {
     -+  uint64_t max = ~((uint64_t)0);
     -+  return reader_seek_log_at(r, it, name, max);
     ++int reader_seek_log(struct reader *r, struct iterator *it, const char *name)
     ++{
     ++	uint64_t max = ~((uint64_t)0);
     ++	return reader_seek_log_at(r, it, name, max);
      +}
      +
     -+void reader_close(struct reader *r) {
     -+  block_source_close(&r->source);
     -+  free(r->name);
     -+  r->name = NULL;
     ++void reader_close(struct reader *r)
     ++{
     ++	block_source_close(&r->source);
     ++	free(r->name);
     ++	r->name = NULL;
      +}
      +
     -+int new_reader(struct reader **p, struct block_source src, char const *name) {
     -+  struct reader *rd = calloc(sizeof(struct reader), 1);
     -+  int err = init_reader(rd, src, name);
     -+  if (err == 0) {
     -+    *p = rd;
     -+  } else {
     -+    free(rd);
     -+  }
     -+  return err;
     ++int new_reader(struct reader **p, struct block_source src, char const *name)
     ++{
     ++	struct reader *rd = calloc(sizeof(struct reader), 1);
     ++	int err = init_reader(rd, src, name);
     ++	if (err == 0) {
     ++		*p = rd;
     ++	} else {
     ++		free(rd);
     ++	}
     ++	return err;
      +}
      +
     -+void reader_free(struct reader *r) {
     -+  reader_close(r);
     -+  free(r);
     ++void reader_free(struct reader *r)
     ++{
     ++	reader_close(r);
     ++	free(r);
      +}
      +
      +static int reader_refs_for_indexed(struct reader *r, struct iterator *it,
     -+                                   byte *oid) {
     -+  struct obj_record want = {
     -+      .hash_prefix = oid,
     -+      .hash_prefix_len = r->object_id_len,
     -+  };
     -+  struct record want_rec = {};
     -+  struct iterator oit = {};
     -+  struct obj_record got = {};
     -+  struct record got_rec = {};
     -+  int err = 0;
     -+
     -+  record_from_obj(&want_rec, &want);
     -+
     -+  err = reader_seek(r, &oit, want_rec);
     -+  if (err != 0) {
     -+    return err;
     -+  }
     -+
     -+  record_from_obj(&got_rec, &got);
     -+  err = iterator_next(oit, got_rec);
     -+  iterator_destroy(&oit);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+
     -+  if (err > 0 || memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
     -+    iterator_set_empty(it);
     -+    return 0;
     -+  }
     -+
     -+  {
     -+    struct indexed_table_ref_iter *itr = NULL;
     -+    err = new_indexed_table_ref_iter(&itr, r, oid, r->hash_size, got.offsets,
     -+                                     got.offset_len);
     -+    if (err < 0) {
     -+      record_clear(got_rec);
     -+      return err;
     -+    }
     -+    got.offsets = NULL;
     -+    record_clear(got_rec);
     -+
     -+    iterator_from_indexed_table_ref_iter(it, itr);
     -+  }
     -+
     -+  return 0;
     ++				   byte *oid)
     ++{
     ++	struct obj_record want = {
     ++		.hash_prefix = oid,
     ++		.hash_prefix_len = r->object_id_len,
     ++	};
     ++	struct record want_rec = {};
     ++	struct iterator oit = {};
     ++	struct obj_record got = {};
     ++	struct record got_rec = {};
     ++	int err = 0;
     ++
     ++	record_from_obj(&want_rec, &want);
     ++
     ++	err = reader_seek(r, &oit, want_rec);
     ++	if (err != 0) {
     ++		return err;
     ++	}
     ++
     ++	record_from_obj(&got_rec, &got);
     ++	err = iterator_next(oit, got_rec);
     ++	iterator_destroy(&oit);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	if (err > 0 ||
     ++	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
     ++		iterator_set_empty(it);
     ++		return 0;
     ++	}
     ++
     ++	{
     ++		struct indexed_table_ref_iter *itr = NULL;
     ++		err = new_indexed_table_ref_iter(&itr, r, oid, r->hash_size,
     ++						 got.offsets, got.offset_len);
     ++		if (err < 0) {
     ++			record_clear(got_rec);
     ++			return err;
     ++		}
     ++		got.offsets = NULL;
     ++		record_clear(got_rec);
     ++
     ++		iterator_from_indexed_table_ref_iter(it, itr);
     ++	}
     ++
     ++	return 0;
      +}
      +
      +static int reader_refs_for_unindexed(struct reader *r, struct iterator *it,
     -+                                     byte *oid, int oid_len) {
     -+  struct table_iter *ti = calloc(sizeof(struct table_iter), 1);
     -+  struct filtering_ref_iterator *filter = NULL;
     -+  int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
     -+  if (err < 0) {
     -+    free(ti);
     -+    return err;
     -+  }
     -+
     -+  filter = calloc(sizeof(struct filtering_ref_iterator), 1);
     -+  slice_resize(&filter->oid, oid_len);
     -+  memcpy(filter->oid.buf, oid, oid_len);
     -+  filter->r = r;
     -+  filter->double_check = false;
     -+  iterator_from_table_iter(&filter->it, ti);
     -+
     -+  iterator_from_filtering_ref_iterator(it, filter);
     -+  return 0;
     ++				     byte *oid, int oid_len)
     ++{
     ++	struct table_iter *ti = calloc(sizeof(struct table_iter), 1);
     ++	struct filtering_ref_iterator *filter = NULL;
     ++	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
     ++	if (err < 0) {
     ++		free(ti);
     ++		return err;
     ++	}
     ++
     ++	filter = calloc(sizeof(struct filtering_ref_iterator), 1);
     ++	slice_resize(&filter->oid, oid_len);
     ++	memcpy(filter->oid.buf, oid, oid_len);
     ++	filter->r = r;
     ++	filter->double_check = false;
     ++	iterator_from_table_iter(&filter->it, ti);
     ++
     ++	iterator_from_filtering_ref_iterator(it, filter);
     ++	return 0;
      +}
      +
      +int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
     -+                    int oid_len) {
     -+  if (r->obj_offsets.present) {
     -+    return reader_refs_for_indexed(r, it, oid);
     -+  }
     -+  return reader_refs_for_unindexed(r, it, oid, oid_len);
     ++		    int oid_len)
     ++{
     ++	if (r->obj_offsets.present) {
     ++		return reader_refs_for_indexed(r, it, oid);
     ++	}
     ++	return reader_refs_for_unindexed(r, it, oid, oid_len);
      +}
      +
     -+uint64_t reader_max_update_index(struct reader *r) {
     -+  return r->max_update_index;
     ++uint64_t reader_max_update_index(struct reader *r)
     ++{
     ++	return r->max_update_index;
      +}
      +
     -+uint64_t reader_min_update_index(struct reader *r) {
     -+  return r->min_update_index;
     ++uint64_t reader_min_update_index(struct reader *r)
     ++{
     ++	return r->min_update_index;
      +}
      
       diff --git a/reftable/reader.h b/reftable/reader.h
     @@ -3145,29 +3070,29 @@
      +uint64_t block_source_size(struct block_source source);
      +
      +int block_source_read_block(struct block_source source, struct block *dest,
     -+                            uint64_t off, uint32_t size);
     ++			    uint64_t off, uint32_t size);
      +void block_source_return_block(struct block_source source, struct block *ret);
      +void block_source_close(struct block_source *source);
      +
      +struct reader_offsets {
     -+  bool present;
     -+  uint64_t offset;
     -+  uint64_t index_offset;
     ++	bool present;
     ++	uint64_t offset;
     ++	uint64_t index_offset;
      +};
      +
      +struct reader {
     -+  struct block_source source;
     -+  char *name;
     -+  int hash_size;
     -+  uint64_t size;
     -+  uint32_t block_size;
     -+  uint64_t min_update_index;
     -+  uint64_t max_update_index;
     -+  int object_id_len;
     -+
     -+  struct reader_offsets ref_offsets;
     -+  struct reader_offsets obj_offsets;
     -+  struct reader_offsets log_offsets;
     ++	struct block_source source;
     ++	char *name;
     ++	int hash_size;
     ++	uint64_t size;
     ++	uint32_t block_size;
     ++	uint64_t min_update_index;
     ++	uint64_t max_update_index;
     ++	int object_id_len;
     ++
     ++	struct reader_offsets ref_offsets;
     ++	struct reader_offsets obj_offsets;
     ++	struct reader_offsets log_offsets;
      +};
      +
      +int init_reader(struct reader *r, struct block_source source, const char *name);
     @@ -3176,7 +3101,7 @@
      +const char *reader_name(struct reader *r);
      +void reader_return_block(struct reader *r, struct block *p);
      +int reader_init_block_reader(struct reader *r, struct block_reader *br,
     -+                             uint64_t next_off, byte want_typ);
     ++			     uint64_t next_off, byte want_typ);
      +
      +#endif
      
     @@ -3195,1025 +3120,1105 @@
      +
      +#include "record.h"
      +
     -+#include <assert.h>
     -+#include <stdio.h>
     -+#include <stdlib.h>
     -+#include <string.h>
     ++#include "system.h"
      +
      +#include "constants.h"
      +#include "reftable.h"
      +
     -+int is_block_type(byte typ) {
     -+  switch (typ) {
     -+    case BLOCK_TYPE_REF:
     -+    case BLOCK_TYPE_LOG:
     -+    case BLOCK_TYPE_OBJ:
     -+    case BLOCK_TYPE_INDEX:
     -+      return true;
     -+  }
     -+  return false;
     -+}
     -+
     -+int get_var_int(uint64_t *dest, struct slice in) {
     -+  int ptr = 0;
     -+  uint64_t val;
     -+
     -+  if (in.len == 0) {
     -+    return -1;
     -+  }
     -+  val = in.buf[ptr] & 0x7f;
     -+
     -+  while (in.buf[ptr] & 0x80) {
     -+    ptr++;
     -+    if (ptr > in.len) {
     -+      return -1;
     -+    }
     -+    val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
     -+  }
     -+
     -+  *dest = val;
     -+  return ptr + 1;
     -+}
     -+
     -+int put_var_int(struct slice dest, uint64_t val) {
     -+  byte buf[10] = {};
     -+  int i = 9;
     -+  buf[i] = (byte)(val & 0x7f);
     -+  i--;
     -+  while (true) {
     -+    val >>= 7;
     -+    if (!val) {
     -+      break;
     -+    }
     -+    val--;
     -+    buf[i] = 0x80 | (byte)(val & 0x7f);
     -+    i--;
     -+  }
     -+
     -+  {
     -+    int n = sizeof(buf) - i - 1;
     -+    if (dest.len < n) {
     -+      return -1;
     -+    }
     -+    memcpy(dest.buf, &buf[i + 1], n);
     -+    return n;
     -+  }
     -+}
     -+
     -+int common_prefix_size(struct slice a, struct slice b) {
     -+  int p = 0;
     -+  while (p < a.len && p < b.len) {
     -+    if (a.buf[p] != b.buf[p]) {
     -+      break;
     -+    }
     -+    p++;
     -+  }
     -+
     -+  return p;
     -+}
     -+
     -+static int decode_string(struct slice *dest, struct slice in) {
     -+  int start_len = in.len;
     -+  uint64_t tsize = 0;
     -+  int n = get_var_int(&tsize, in);
     -+  if (n <= 0) {
     -+    return -1;
     -+  }
     -+  in.buf += n;
     -+  in.len -= n;
     -+  if (in.len < tsize) {
     -+    return -1;
     -+  }
     -+
     -+  slice_resize(dest, tsize + 1);
     -+  dest->buf[tsize] = 0;
     -+  memcpy(dest->buf, in.buf, tsize);
     -+  in.buf += tsize;
     -+  in.len -= tsize;
     -+
     -+  return start_len - in.len;
     ++int is_block_type(byte typ)
     ++{
     ++	switch (typ) {
     ++	case BLOCK_TYPE_REF:
     ++	case BLOCK_TYPE_LOG:
     ++	case BLOCK_TYPE_OBJ:
     ++	case BLOCK_TYPE_INDEX:
     ++		return true;
     ++	}
     ++	return false;
     ++}
     ++
     ++int get_var_int(uint64_t *dest, struct slice in)
     ++{
     ++	int ptr = 0;
     ++	uint64_t val;
     ++
     ++	if (in.len == 0) {
     ++		return -1;
     ++	}
     ++	val = in.buf[ptr] & 0x7f;
     ++
     ++	while (in.buf[ptr] & 0x80) {
     ++		ptr++;
     ++		if (ptr > in.len) {
     ++			return -1;
     ++		}
     ++		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
     ++	}
     ++
     ++	*dest = val;
     ++	return ptr + 1;
     ++}
     ++
     ++int put_var_int(struct slice dest, uint64_t val)
     ++{
     ++	byte buf[10] = {};
     ++	int i = 9;
     ++	buf[i] = (byte)(val & 0x7f);
     ++	i--;
     ++	while (true) {
     ++		val >>= 7;
     ++		if (!val) {
     ++			break;
     ++		}
     ++		val--;
     ++		buf[i] = 0x80 | (byte)(val & 0x7f);
     ++		i--;
     ++	}
     ++
     ++	{
     ++		int n = sizeof(buf) - i - 1;
     ++		if (dest.len < n) {
     ++			return -1;
     ++		}
     ++		memcpy(dest.buf, &buf[i + 1], n);
     ++		return n;
     ++	}
     ++}
     ++
     ++int common_prefix_size(struct slice a, struct slice b)
     ++{
     ++	int p = 0;
     ++	while (p < a.len && p < b.len) {
     ++		if (a.buf[p] != b.buf[p]) {
     ++			break;
     ++		}
     ++		p++;
     ++	}
     ++
     ++	return p;
     ++}
     ++
     ++static int decode_string(struct slice *dest, struct slice in)
     ++{
     ++	int start_len = in.len;
     ++	uint64_t tsize = 0;
     ++	int n = get_var_int(&tsize, in);
     ++	if (n <= 0) {
     ++		return -1;
     ++	}
     ++	in.buf += n;
     ++	in.len -= n;
     ++	if (in.len < tsize) {
     ++		return -1;
     ++	}
     ++
     ++	slice_resize(dest, tsize + 1);
     ++	dest->buf[tsize] = 0;
     ++	memcpy(dest->buf, in.buf, tsize);
     ++	in.buf += tsize;
     ++	in.len -= tsize;
     ++
     ++	return start_len - in.len;
      +}
      +
      +int encode_key(bool *restart, struct slice dest, struct slice prev_key,
     -+               struct slice key, byte extra) {
     -+  struct slice start = dest;
     -+  int prefix_len = common_prefix_size(prev_key, key);
     -+  uint64_t suffix_len = key.len - prefix_len;
     -+  int n = put_var_int(dest, (uint64_t)prefix_len);
     -+  if (n < 0) {
     -+    return -1;
     -+  }
     -+  dest.buf += n;
     -+  dest.len -= n;
     -+
     -+  *restart = (prefix_len == 0);
     -+
     -+  n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
     -+  if (n < 0) {
     -+    return -1;
     -+  }
     -+  dest.buf += n;
     -+  dest.len -= n;
     -+
     -+  if (dest.len < suffix_len) {
     -+    return -1;
     -+  }
     -+  memcpy(dest.buf, key.buf + prefix_len, suffix_len);
     -+  dest.buf += suffix_len;
     -+  dest.len -= suffix_len;
     -+
     -+  return start.len - dest.len;
     -+}
     -+
     -+static byte ref_record_type(void) { return BLOCK_TYPE_REF; }
     -+
     -+static void ref_record_key(const void *r, struct slice *dest) {
     -+  const struct ref_record *rec = (const struct ref_record *)r;
     -+  slice_set_string(dest, rec->ref_name);
     -+}
     -+
     -+static void ref_record_copy_from(void *rec, const void *src_rec,
     -+                                 int hash_size) {
     -+  struct ref_record *ref = (struct ref_record *)rec;
     -+  struct ref_record *src = (struct ref_record *)src_rec;
     -+  assert(hash_size > 0);
     -+
     -+  // This is simple and correct, but we could probably reuse the hash fields.
     -+  ref_record_clear(ref);
     -+  if (src->ref_name != NULL) {
     -+    ref->ref_name = strdup(src->ref_name);
     -+  }
     -+
     -+  if (src->target != NULL) {
     -+    ref->target = strdup(src->target);
     -+  }
     -+
     -+  if (src->target_value != NULL) {
     -+    ref->target_value = malloc(hash_size);
     -+    memcpy(ref->target_value, src->target_value, hash_size);
     -+  }
     -+
     -+  if (src->value != NULL) {
     -+    ref->value = malloc(hash_size);
     -+    memcpy(ref->value, src->value, hash_size);
     -+  }
     -+  ref->update_index = src->update_index;
     -+}
     -+
     -+static char hexdigit(int c) {
     -+  if (c <= 9) {
     -+    return '0' + c;
     -+  }
     -+  return 'a' + (c - 10);
     -+}
     -+
     -+static void hex_format(char *dest, byte *src, int hash_size) {
     -+  assert(hash_size > 0);
     -+  if (src != NULL) {
     -+    for (int i = 0; i < hash_size; i++) {
     -+      dest[2 * i] = hexdigit(src[i] >> 4);
     -+      dest[2 * i + 1] = hexdigit(src[i] & 0xf);
     -+    }
     -+    dest[2 * hash_size] = 0;
     -+  }
     -+}
     -+
     -+void ref_record_print(struct ref_record *ref, int hash_size) {
     -+  char hex[SHA256_SIZE + 1] = {};
     -+
     -+  printf("ref{%s(%ld) ", ref->ref_name, ref->update_index);
     -+  if (ref->value != NULL) {
     -+    hex_format(hex, ref->value, hash_size);
     -+    printf("%s", hex);
     -+  }
     -+  if (ref->target_value != NULL) {
     -+    hex_format(hex, ref->target_value, hash_size);
     -+    printf(" (T %s)", hex);
     -+  }
     -+  if (ref->target != NULL) {
     -+    printf("=> %s", ref->target);
     -+  }
     -+  printf("}\n");
     -+}
     -+
     -+static void ref_record_clear_void(void *rec) {
     -+  ref_record_clear((struct ref_record *)rec);
     -+}
     -+
     -+void ref_record_clear(struct ref_record *ref) {
     -+  free(ref->ref_name);
     -+  free(ref->target);
     -+  free(ref->target_value);
     -+  free(ref->value);
     -+  memset(ref, 0, sizeof(struct ref_record));
     -+}
     -+
     -+static byte ref_record_val_type(const void *rec) {
     -+  const struct ref_record *r = (const struct ref_record *)rec;
     -+  if (r->value != NULL) {
     -+    if (r->target_value != NULL) {
     -+      return 2;
     -+    } else {
     -+      return 1;
     -+    }
     -+  } else if (r->target != NULL) {
     -+    return 3;
     -+  }
     -+  return 0;
     -+}
     -+
     -+static int encode_string(char *str, struct slice s) {
     -+  struct slice start = s;
     -+  int l = strlen(str);
     -+  int n = put_var_int(s, l);
     -+  if (n < 0) {
     -+    return -1;
     -+  }
     -+  s.buf += n;
     -+  s.len -= n;
     -+  if (s.len < l) {
     -+    return -1;
     -+  }
     -+  memcpy(s.buf, str, l);
     -+  s.buf += l;
     -+  s.len -= l;
     -+
     -+  return start.len - s.len;
     -+}
     -+
     -+static int ref_record_encode(const void *rec, struct slice s, int hash_size) {
     -+  const struct ref_record *r = (const struct ref_record *)rec;
     -+  struct slice start = s;
     -+  int n = put_var_int(s, r->update_index);
     -+  assert(hash_size > 0);
     -+  if (n < 0) {
     -+    return -1;
     -+  }
     -+  s.buf += n;
     -+  s.len -= n;
     -+
     -+  if (r->value != NULL) {
     -+    if (s.len < hash_size) {
     -+      return -1;
     -+    }
     -+    memcpy(s.buf, r->value, hash_size);
     -+    s.buf += hash_size;
     -+    s.len -= hash_size;
     -+  }
     -+
     -+  if (r->target_value != NULL) {
     -+    if (s.len < hash_size) {
     -+      return -1;
     -+    }
     -+    memcpy(s.buf, r->target_value, hash_size);
     -+    s.buf += hash_size;
     -+    s.len -= hash_size;
     -+  }
     -+
     -+  if (r->target != NULL) {
     -+    int n = encode_string(r->target, s);
     -+    if (n < 0) {
     -+      return -1;
     -+    }
     -+    s.buf += n;
     -+    s.len -= n;
     -+  }
     -+
     -+  return start.len - s.len;
     ++	       struct slice key, byte extra)
     ++{
     ++	struct slice start = dest;
     ++	int prefix_len = common_prefix_size(prev_key, key);
     ++	uint64_t suffix_len = key.len - prefix_len;
     ++	int n = put_var_int(dest, (uint64_t)prefix_len);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
     ++	dest.buf += n;
     ++	dest.len -= n;
     ++
     ++	*restart = (prefix_len == 0);
     ++
     ++	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
     ++	dest.buf += n;
     ++	dest.len -= n;
     ++
     ++	if (dest.len < suffix_len) {
     ++		return -1;
     ++	}
     ++	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
     ++	dest.buf += suffix_len;
     ++	dest.len -= suffix_len;
     ++
     ++	return start.len - dest.len;
     ++}
     ++
     ++static byte ref_record_type(void)
     ++{
     ++	return BLOCK_TYPE_REF;
     ++}
     ++
     ++static void ref_record_key(const void *r, struct slice *dest)
     ++{
     ++	const struct ref_record *rec = (const struct ref_record *)r;
     ++	slice_set_string(dest, rec->ref_name);
     ++}
     ++
     ++static void ref_record_copy_from(void *rec, const void *src_rec, int hash_size)
     ++{
     ++	struct ref_record *ref = (struct ref_record *)rec;
     ++	struct ref_record *src = (struct ref_record *)src_rec;
     ++	assert(hash_size > 0);
     ++
     ++	/* This is simple and correct, but we could probably reuse the hash
     ++           fields. */
     ++	ref_record_clear(ref);
     ++	if (src->ref_name != NULL) {
     ++		ref->ref_name = strdup(src->ref_name);
     ++	}
     ++
     ++	if (src->target != NULL) {
     ++		ref->target = strdup(src->target);
     ++	}
     ++
     ++	if (src->target_value != NULL) {
     ++		ref->target_value = malloc(hash_size);
     ++		memcpy(ref->target_value, src->target_value, hash_size);
     ++	}
     ++
     ++	if (src->value != NULL) {
     ++		ref->value = malloc(hash_size);
     ++		memcpy(ref->value, src->value, hash_size);
     ++	}
     ++	ref->update_index = src->update_index;
     ++}
     ++
     ++static char hexdigit(int c)
     ++{
     ++	if (c <= 9) {
     ++		return '0' + c;
     ++	}
     ++	return 'a' + (c - 10);
     ++}
     ++
     ++static void hex_format(char *dest, byte *src, int hash_size)
     ++{
     ++	assert(hash_size > 0);
     ++	if (src != NULL) {
     ++		int i = 0;
     ++		for (i = 0; i < hash_size; i++) {
     ++			dest[2 * i] = hexdigit(src[i] >> 4);
     ++			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
     ++		}
     ++		dest[2 * hash_size] = 0;
     ++	}
     ++}
     ++
     ++void ref_record_print(struct ref_record *ref, int hash_size)
     ++{
     ++	char hex[SHA256_SIZE + 1] = {};
     ++
     ++	printf("ref{%s(%" PRIdMAX ") ", ref->ref_name, ref->update_index);
     ++	if (ref->value != NULL) {
     ++		hex_format(hex, ref->value, hash_size);
     ++		printf("%s", hex);
     ++	}
     ++	if (ref->target_value != NULL) {
     ++		hex_format(hex, ref->target_value, hash_size);
     ++		printf(" (T %s)", hex);
     ++	}
     ++	if (ref->target != NULL) {
     ++		printf("=> %s", ref->target);
     ++	}
     ++	printf("}\n");
     ++}
     ++
     ++static void ref_record_clear_void(void *rec)
     ++{
     ++	ref_record_clear((struct ref_record *)rec);
     ++}
     ++
     ++void ref_record_clear(struct ref_record *ref)
     ++{
     ++	free(ref->ref_name);
     ++	free(ref->target);
     ++	free(ref->target_value);
     ++	free(ref->value);
     ++	memset(ref, 0, sizeof(struct ref_record));
     ++}
     ++
     ++static byte ref_record_val_type(const void *rec)
     ++{
     ++	const struct ref_record *r = (const struct ref_record *)rec;
     ++	if (r->value != NULL) {
     ++		if (r->target_value != NULL) {
     ++			return 2;
     ++		} else {
     ++			return 1;
     ++		}
     ++	} else if (r->target != NULL) {
     ++		return 3;
     ++	}
     ++	return 0;
     ++}
     ++
     ++static int encode_string(char *str, struct slice s)
     ++{
     ++	struct slice start = s;
     ++	int l = strlen(str);
     ++	int n = put_var_int(s, l);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
     ++	s.buf += n;
     ++	s.len -= n;
     ++	if (s.len < l) {
     ++		return -1;
     ++	}
     ++	memcpy(s.buf, str, l);
     ++	s.buf += l;
     ++	s.len -= l;
     ++
     ++	return start.len - s.len;
     ++}
     ++
     ++static int ref_record_encode(const void *rec, struct slice s, int hash_size)
     ++{
     ++	const struct ref_record *r = (const struct ref_record *)rec;
     ++	struct slice start = s;
     ++	int n = put_var_int(s, r->update_index);
     ++	assert(hash_size > 0);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
     ++	s.buf += n;
     ++	s.len -= n;
     ++
     ++	if (r->value != NULL) {
     ++		if (s.len < hash_size) {
     ++			return -1;
     ++		}
     ++		memcpy(s.buf, r->value, hash_size);
     ++		s.buf += hash_size;
     ++		s.len -= hash_size;
     ++	}
     ++
     ++	if (r->target_value != NULL) {
     ++		if (s.len < hash_size) {
     ++			return -1;
     ++		}
     ++		memcpy(s.buf, r->target_value, hash_size);
     ++		s.buf += hash_size;
     ++		s.len -= hash_size;
     ++	}
     ++
     ++	if (r->target != NULL) {
     ++		int n = encode_string(r->target, s);
     ++		if (n < 0) {
     ++			return -1;
     ++		}
     ++		s.buf += n;
     ++		s.len -= n;
     ++	}
     ++
     ++	return start.len - s.len;
      +}
      +
      +static int ref_record_decode(void *rec, struct slice key, byte val_type,
     -+                             struct slice in, int hash_size) {
     -+  struct ref_record *r = (struct ref_record *)rec;
     -+  struct slice start = in;
     -+  bool seen_value = false;
     -+  bool seen_target_value = false;
     -+  bool seen_target = false;
     -+
     -+  int n = get_var_int(&r->update_index, in);
     -+  if (n < 0) {
     -+    return n;
     -+  }
     -+  assert(hash_size > 0);
     -+
     -+  in.buf += n;
     -+  in.len -= n;
     -+
     -+  r->ref_name = realloc(r->ref_name, key.len + 1);
     -+  memcpy(r->ref_name, key.buf, key.len);
     -+  r->ref_name[key.len] = 0;
     -+
     -+  switch (val_type) {
     -+    case 1:
     -+    case 2:
     -+      if (in.len < hash_size) {
     -+        return -1;
     -+      }
     -+
     -+      if (r->value == NULL) {
     -+        r->value = malloc(hash_size);
     -+      }
     -+      seen_value = true;
     -+      memcpy(r->value, in.buf, hash_size);
     -+      in.buf += hash_size;
     -+      in.len -= hash_size;
     -+      if (val_type == 1) {
     -+        break;
     -+      }
     -+      if (r->target_value == NULL) {
     -+        r->target_value = malloc(hash_size);
     -+      }
     -+      seen_target_value = true;
     -+      memcpy(r->target_value, in.buf, hash_size);
     -+      in.buf += hash_size;
     -+      in.len -= hash_size;
     -+      break;
     -+    case 3: {
     -+      struct slice dest = {};
     -+      int n = decode_string(&dest, in);
     -+      if (n < 0) {
     -+        return -1;
     -+      }
     -+      in.buf += n;
     -+      in.len -= n;
     -+      seen_target = true;
     -+      r->target = (char *)slice_as_string(&dest);
     -+    } break;
     -+
     -+    case 0:
     -+      break;
     -+    default:
     -+      abort();
     -+      break;
     -+  }
     -+
     -+  if (!seen_target && r->target != NULL) {
     -+    free(r->target);
     -+    r->target = NULL;
     -+  }
     -+  if (!seen_target_value && r->target_value != NULL) {
     -+    free(r->target_value);
     -+    r->target_value = NULL;
     -+  }
     -+  if (!seen_value && r->value != NULL) {
     -+    free(r->value);
     -+    r->value = NULL;
     -+  }
     -+
     -+  return start.len - in.len;
     ++			     struct slice in, int hash_size)
     ++{
     ++	struct ref_record *r = (struct ref_record *)rec;
     ++	struct slice start = in;
     ++	bool seen_value = false;
     ++	bool seen_target_value = false;
     ++	bool seen_target = false;
     ++
     ++	int n = get_var_int(&r->update_index, in);
     ++	if (n < 0) {
     ++		return n;
     ++	}
     ++	assert(hash_size > 0);
     ++
     ++	in.buf += n;
     ++	in.len -= n;
     ++
     ++	r->ref_name = realloc(r->ref_name, key.len + 1);
     ++	memcpy(r->ref_name, key.buf, key.len);
     ++	r->ref_name[key.len] = 0;
     ++
     ++	switch (val_type) {
     ++	case 1:
     ++	case 2:
     ++		if (in.len < hash_size) {
     ++			return -1;
     ++		}
     ++
     ++		if (r->value == NULL) {
     ++			r->value = malloc(hash_size);
     ++		}
     ++		seen_value = true;
     ++		memcpy(r->value, in.buf, hash_size);
     ++		in.buf += hash_size;
     ++		in.len -= hash_size;
     ++		if (val_type == 1) {
     ++			break;
     ++		}
     ++		if (r->target_value == NULL) {
     ++			r->target_value = malloc(hash_size);
     ++		}
     ++		seen_target_value = true;
     ++		memcpy(r->target_value, in.buf, hash_size);
     ++		in.buf += hash_size;
     ++		in.len -= hash_size;
     ++		break;
     ++	case 3: {
     ++		struct slice dest = {};
     ++		int n = decode_string(&dest, in);
     ++		if (n < 0) {
     ++			return -1;
     ++		}
     ++		in.buf += n;
     ++		in.len -= n;
     ++		seen_target = true;
     ++		r->target = (char *)slice_as_string(&dest);
     ++	} break;
     ++
     ++	case 0:
     ++		break;
     ++	default:
     ++		abort();
     ++		break;
     ++	}
     ++
     ++	if (!seen_target && r->target != NULL) {
     ++		free(r->target);
     ++		r->target = NULL;
     ++	}
     ++	if (!seen_target_value && r->target_value != NULL) {
     ++		free(r->target_value);
     ++		r->target_value = NULL;
     ++	}
     ++	if (!seen_value && r->value != NULL) {
     ++		free(r->value);
     ++		r->value = NULL;
     ++	}
     ++
     ++	return start.len - in.len;
      +}
      +
      +int decode_key(struct slice *key, byte *extra, struct slice last_key,
     -+               struct slice in) {
     -+  int start_len = in.len;
     -+  uint64_t prefix_len = 0;
     -+  uint64_t suffix_len = 0;
     -+  int n = get_var_int(&prefix_len, in);
     -+  if (n < 0) {
     -+    return -1;
     -+  }
     -+  in.buf += n;
     -+  in.len -= n;
     -+
     -+  if (prefix_len > last_key.len) {
     -+    return -1;
     -+  }
     -+
     -+  n = get_var_int(&suffix_len, in);
     -+  if (n <= 0) {
     -+    return -1;
     -+  }
     -+  in.buf += n;
     -+  in.len -= n;
     -+
     -+  *extra = (byte)(suffix_len & 0x7);
     -+  suffix_len >>= 3;
     -+
     -+  if (in.len < suffix_len) {
     -+    return -1;
     -+  }
     -+
     -+  slice_resize(key, suffix_len + prefix_len);
     -+  memcpy(key->buf, last_key.buf, prefix_len);
     -+
     -+  memcpy(key->buf + prefix_len, in.buf, suffix_len);
     -+  in.buf += suffix_len;
     -+  in.len -= suffix_len;
     -+
     -+  return start_len - in.len;
     ++	       struct slice in)
     ++{
     ++	int start_len = in.len;
     ++	uint64_t prefix_len = 0;
     ++	uint64_t suffix_len = 0;
     ++	int n = get_var_int(&prefix_len, in);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
     ++	in.buf += n;
     ++	in.len -= n;
     ++
     ++	if (prefix_len > last_key.len) {
     ++		return -1;
     ++	}
     ++
     ++	n = get_var_int(&suffix_len, in);
     ++	if (n <= 0) {
     ++		return -1;
     ++	}
     ++	in.buf += n;
     ++	in.len -= n;
     ++
     ++	*extra = (byte)(suffix_len & 0x7);
     ++	suffix_len >>= 3;
     ++
     ++	if (in.len < suffix_len) {
     ++		return -1;
     ++	}
     ++
     ++	slice_resize(key, suffix_len + prefix_len);
     ++	memcpy(key->buf, last_key.buf, prefix_len);
     ++
     ++	memcpy(key->buf + prefix_len, in.buf, suffix_len);
     ++	in.buf += suffix_len;
     ++	in.len -= suffix_len;
     ++
     ++	return start_len - in.len;
      +}
      +
      +struct record_vtable ref_record_vtable = {
     -+    .key = &ref_record_key,
     -+    .type = &ref_record_type,
     -+    .copy_from = &ref_record_copy_from,
     -+    .val_type = &ref_record_val_type,
     -+    .encode = &ref_record_encode,
     -+    .decode = &ref_record_decode,
     -+    .clear = &ref_record_clear_void,
     ++	.key = &ref_record_key,
     ++	.type = &ref_record_type,
     ++	.copy_from = &ref_record_copy_from,
     ++	.val_type = &ref_record_val_type,
     ++	.encode = &ref_record_encode,
     ++	.decode = &ref_record_decode,
     ++	.clear = &ref_record_clear_void,
      +};
      +
     -+static byte obj_record_type(void) { return BLOCK_TYPE_OBJ; }
     -+
     -+static void obj_record_key(const void *r, struct slice *dest) {
     -+  const struct obj_record *rec = (const struct obj_record *)r;
     -+  slice_resize(dest, rec->hash_prefix_len);
     -+  memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
     -+}
     -+
     -+static void obj_record_copy_from(void *rec, const void *src_rec,
     -+                                 int hash_size) {
     -+  struct obj_record *ref = (struct obj_record *)rec;
     -+  const struct obj_record *src = (const struct obj_record *)src_rec;
     -+
     -+  *ref = *src;
     -+  ref->hash_prefix = malloc(ref->hash_prefix_len);
     -+  memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
     -+
     -+  {
     -+    int olen = ref->offset_len * sizeof(uint64_t);
     -+    ref->offsets = malloc(olen);
     -+    memcpy(ref->offsets, src->offsets, olen);
     -+  }
     -+}
     -+
     -+static void obj_record_clear(void *rec) {
     -+  struct obj_record *ref = (struct obj_record *)rec;
     -+  free(ref->hash_prefix);
     -+  free(ref->offsets);
     -+  memset(ref, 0, sizeof(struct obj_record));
     -+}
     -+
     -+static byte obj_record_val_type(const void *rec) {
     -+  struct obj_record *r = (struct obj_record *)rec;
     -+  if (r->offset_len > 0 && r->offset_len < 8) {
     -+    return r->offset_len;
     -+  }
     -+  return 0;
     -+}
     -+
     -+static int obj_record_encode(const void *rec, struct slice s, int hash_size) {
     -+  struct obj_record *r = (struct obj_record *)rec;
     -+  struct slice start = s;
     -+  int n = 0;
     -+  if (r->offset_len == 0 || r->offset_len >= 8) {
     -+    n = put_var_int(s, r->offset_len);
     -+    if (n < 0) {
     -+      return -1;
     -+    }
     -+    s.buf += n;
     -+    s.len -= n;
     -+  }
     -+  if (r->offset_len == 0) {
     -+    return start.len - s.len;
     -+  }
     -+  n = put_var_int(s, r->offsets[0]);
     -+  if (n < 0) {
     -+    return -1;
     -+  }
     -+  s.buf += n;
     -+  s.len -= n;
     -+
     -+  {
     -+    uint64_t last = r->offsets[0];
     -+    for (int i = 1; i < r->offset_len; i++) {
     -+      int n = put_var_int(s, r->offsets[i] - last);
     -+      if (n < 0) {
     -+        return -1;
     -+      }
     -+      s.buf += n;
     -+      s.len -= n;
     -+      last = r->offsets[i];
     -+    }
     -+  }
     -+  return start.len - s.len;
     ++static byte obj_record_type(void)
     ++{
     ++	return BLOCK_TYPE_OBJ;
     ++}
     ++
     ++static void obj_record_key(const void *r, struct slice *dest)
     ++{
     ++	const struct obj_record *rec = (const struct obj_record *)r;
     ++	slice_resize(dest, rec->hash_prefix_len);
     ++	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
     ++}
     ++
     ++static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
     ++{
     ++	struct obj_record *ref = (struct obj_record *)rec;
     ++	const struct obj_record *src = (const struct obj_record *)src_rec;
     ++
     ++	*ref = *src;
     ++	ref->hash_prefix = malloc(ref->hash_prefix_len);
     ++	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
     ++
     ++	{
     ++		int olen = ref->offset_len * sizeof(uint64_t);
     ++		ref->offsets = malloc(olen);
     ++		memcpy(ref->offsets, src->offsets, olen);
     ++	}
     ++}
     ++
     ++static void obj_record_clear(void *rec)
     ++{
     ++	struct obj_record *ref = (struct obj_record *)rec;
     ++	free(ref->hash_prefix);
     ++	free(ref->offsets);
     ++	memset(ref, 0, sizeof(struct obj_record));
     ++}
     ++
     ++static byte obj_record_val_type(const void *rec)
     ++{
     ++	struct obj_record *r = (struct obj_record *)rec;
     ++	if (r->offset_len > 0 && r->offset_len < 8) {
     ++		return r->offset_len;
     ++	}
     ++	return 0;
     ++}
     ++
     ++static int obj_record_encode(const void *rec, struct slice s, int hash_size)
     ++{
     ++	struct obj_record *r = (struct obj_record *)rec;
     ++	struct slice start = s;
     ++	int n = 0;
     ++	if (r->offset_len == 0 || r->offset_len >= 8) {
     ++		n = put_var_int(s, r->offset_len);
     ++		if (n < 0) {
     ++			return -1;
     ++		}
     ++		s.buf += n;
     ++		s.len -= n;
     ++	}
     ++	if (r->offset_len == 0) {
     ++		return start.len - s.len;
     ++	}
     ++	n = put_var_int(s, r->offsets[0]);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
     ++	s.buf += n;
     ++	s.len -= n;
     ++
     ++	{
     ++		uint64_t last = r->offsets[0];
     ++		int i = 0;
     ++		for (i = 1; i < r->offset_len; i++) {
     ++			int n = put_var_int(s, r->offsets[i] - last);
     ++			if (n < 0) {
     ++				return -1;
     ++			}
     ++			s.buf += n;
     ++			s.len -= n;
     ++			last = r->offsets[i];
     ++		}
     ++	}
     ++	return start.len - s.len;
      +}
      +
      +static int obj_record_decode(void *rec, struct slice key, byte val_type,
     -+                             struct slice in, int hash_size) {
     -+  struct slice start = in;
     -+  struct obj_record *r = (struct obj_record *)rec;
     -+  uint64_t count = val_type;
     -+  int n = 0;
     -+  r->hash_prefix = malloc(key.len);
     -+  memcpy(r->hash_prefix, key.buf, key.len);
     -+  r->hash_prefix_len = key.len;
     -+
     -+  if (val_type == 0) {
     -+    n = get_var_int(&count, in);
     -+    if (n < 0) {
     -+      return n;
     -+    }
     -+
     -+    in.buf += n;
     -+    in.len -= n;
     -+  }
     -+
     -+  r->offsets = NULL;
     -+  r->offset_len = 0;
     -+  if (count == 0) {
     -+    return start.len - in.len;
     -+  }
     -+
     -+  r->offsets = malloc(count * sizeof(uint64_t));
     -+  r->offset_len = count;
     -+
     -+  n = get_var_int(&r->offsets[0], in);
     -+  if (n < 0) {
     -+    return n;
     -+  }
     -+
     -+  in.buf += n;
     -+  in.len -= n;
     -+
     -+  {
     -+    uint64_t last = r->offsets[0];
     -+    int j = 1;
     -+    while (j < count) {
     -+      uint64_t delta = 0;
     -+      int n = get_var_int(&delta, in);
     -+      if (n < 0) {
     -+        return n;
     -+      }
     -+
     -+      in.buf += n;
     -+      in.len -= n;
     -+
     -+      last = r->offsets[j] = (delta + last);
     -+      j++;
     -+    }
     -+  }
     -+  return start.len - in.len;
     ++			     struct slice in, int hash_size)
     ++{
     ++	struct slice start = in;
     ++	struct obj_record *r = (struct obj_record *)rec;
     ++	uint64_t count = val_type;
     ++	int n = 0;
     ++	r->hash_prefix = malloc(key.len);
     ++	memcpy(r->hash_prefix, key.buf, key.len);
     ++	r->hash_prefix_len = key.len;
     ++
     ++	if (val_type == 0) {
     ++		n = get_var_int(&count, in);
     ++		if (n < 0) {
     ++			return n;
     ++		}
     ++
     ++		in.buf += n;
     ++		in.len -= n;
     ++	}
     ++
     ++	r->offsets = NULL;
     ++	r->offset_len = 0;
     ++	if (count == 0) {
     ++		return start.len - in.len;
     ++	}
     ++
     ++	r->offsets = malloc(count * sizeof(uint64_t));
     ++	r->offset_len = count;
     ++
     ++	n = get_var_int(&r->offsets[0], in);
     ++	if (n < 0) {
     ++		return n;
     ++	}
     ++
     ++	in.buf += n;
     ++	in.len -= n;
     ++
     ++	{
     ++		uint64_t last = r->offsets[0];
     ++		int j = 1;
     ++		while (j < count) {
     ++			uint64_t delta = 0;
     ++			int n = get_var_int(&delta, in);
     ++			if (n < 0) {
     ++				return n;
     ++			}
     ++
     ++			in.buf += n;
     ++			in.len -= n;
     ++
     ++			last = r->offsets[j] = (delta + last);
     ++			j++;
     ++		}
     ++	}
     ++	return start.len - in.len;
      +}
      +
      +struct record_vtable obj_record_vtable = {
     -+    .key = &obj_record_key,
     -+    .type = &obj_record_type,
     -+    .copy_from = &obj_record_copy_from,
     -+    .val_type = &obj_record_val_type,
     -+    .encode = &obj_record_encode,
     -+    .decode = &obj_record_decode,
     -+    .clear = &obj_record_clear,
     ++	.key = &obj_record_key,
     ++	.type = &obj_record_type,
     ++	.copy_from = &obj_record_copy_from,
     ++	.val_type = &obj_record_val_type,
     ++	.encode = &obj_record_encode,
     ++	.decode = &obj_record_decode,
     ++	.clear = &obj_record_clear,
      +};
      +
     -+void log_record_print(struct log_record *log, int hash_size) {
     -+  char hex[SHA256_SIZE + 1] = {};
     -+
     -+  printf("log{%s(%ld) %s <%s> %lu %04d\n", log->ref_name, log->update_index,
     -+         log->name, log->email, log->time, log->tz_offset);
     -+  hex_format(hex, log->old_hash, hash_size);
     -+  printf("%s => ", hex);
     -+  hex_format(hex, log->new_hash, hash_size);
     -+  printf("%s\n\n%s\n}\n", hex, log->message);
     ++void log_record_print(struct log_record *log, int hash_size)
     ++{
     ++	char hex[SHA256_SIZE + 1] = {};
     ++
     ++	printf("log{%s(%" PRIdMAX ") %s <%s> %lu %04d\n", log->ref_name,
     ++	       log->update_index, log->name, log->email, log->time,
     ++	       log->tz_offset);
     ++	hex_format(hex, log->old_hash, hash_size);
     ++	printf("%s => ", hex);
     ++	hex_format(hex, log->new_hash, hash_size);
     ++	printf("%s\n\n%s\n}\n", hex, log->message);
     ++}
     ++
     ++static byte log_record_type(void)
     ++{
     ++	return BLOCK_TYPE_LOG;
     ++}
     ++
     ++static void log_record_key(const void *r, struct slice *dest)
     ++{
     ++	const struct log_record *rec = (const struct log_record *)r;
     ++	int len = strlen(rec->ref_name);
     ++	uint64_t ts = 0;
     ++	slice_resize(dest, len + 9);
     ++	memcpy(dest->buf, rec->ref_name, len + 1);
     ++	ts = (~ts) - rec->update_index;
     ++	put_u64(dest->buf + 1 + len, ts);
     ++}
     ++
     ++static void log_record_copy_from(void *rec, const void *src_rec, int hash_size)
     ++{
     ++	struct log_record *dst = (struct log_record *)rec;
     ++	const struct log_record *src = (const struct log_record *)src_rec;
     ++
     ++	*dst = *src;
     ++	dst->ref_name = strdup(dst->ref_name);
     ++	dst->email = strdup(dst->email);
     ++	dst->name = strdup(dst->name);
     ++	dst->message = strdup(dst->message);
     ++	if (dst->new_hash != NULL) {
     ++		dst->new_hash = malloc(hash_size);
     ++		memcpy(dst->new_hash, src->new_hash, hash_size);
     ++	}
     ++	if (dst->old_hash != NULL) {
     ++		dst->old_hash = malloc(hash_size);
     ++		memcpy(dst->old_hash, src->old_hash, hash_size);
     ++	}
     ++}
     ++
     ++static void log_record_clear_void(void *rec)
     ++{
     ++	struct log_record *r = (struct log_record *)rec;
     ++	log_record_clear(r);
     ++}
     ++
     ++void log_record_clear(struct log_record *r)
     ++{
     ++	free(r->ref_name);
     ++	free(r->new_hash);
     ++	free(r->old_hash);
     ++	free(r->name);
     ++	free(r->email);
     ++	free(r->message);
     ++	memset(r, 0, sizeof(struct log_record));
     ++}
     ++
     ++static byte log_record_val_type(const void *rec)
     ++{
     ++	return 1;
      +}
      +
     -+static byte log_record_type(void) { return BLOCK_TYPE_LOG; }
     -+
     -+static void log_record_key(const void *r, struct slice *dest) {
     -+  const struct log_record *rec = (const struct log_record *)r;
     -+  int len = strlen(rec->ref_name);
     -+  uint64_t ts = 0;
     -+  slice_resize(dest, len + 9);
     -+  memcpy(dest->buf, rec->ref_name, len + 1);
     -+  ts = (~ts) - rec->update_index;
     -+  put_u64(dest->buf + 1 + len, ts);
     -+}
     -+
     -+static void log_record_copy_from(void *rec, const void *src_rec,
     -+                                 int hash_size) {
     -+  struct log_record *dst = (struct log_record *)rec;
     -+  const struct log_record *src = (const struct log_record *)src_rec;
     -+
     -+  *dst = *src;
     -+  dst->ref_name = strdup(dst->ref_name);
     -+  dst->email = strdup(dst->email);
     -+  dst->name = strdup(dst->name);
     -+  dst->message = strdup(dst->message);
     -+  if (dst->new_hash != NULL) {
     -+    dst->new_hash = malloc(hash_size);
     -+    memcpy(dst->new_hash, src->new_hash, hash_size);
     -+  }
     -+  if (dst->old_hash != NULL) {
     -+    dst->old_hash = malloc(hash_size);
     -+    memcpy(dst->old_hash, src->old_hash, hash_size);
     -+  }
     -+}
     -+
     -+static void log_record_clear_void(void *rec) {
     -+  struct log_record *r = (struct log_record *)rec;
     -+  log_record_clear(r);
     -+}
     -+
     -+void log_record_clear(struct log_record *r) {
     -+  free(r->ref_name);
     -+  free(r->new_hash);
     -+  free(r->old_hash);
     -+  free(r->name);
     -+  free(r->email);
     -+  free(r->message);
     -+  memset(r, 0, sizeof(struct log_record));
     -+}
     -+
     -+static byte log_record_val_type(const void *rec) { return 1; }
     -+
      +static byte zero[SHA256_SIZE] = {};
      +
     -+static int log_record_encode(const void *rec, struct slice s, int hash_size) {
     -+  struct log_record *r = (struct log_record *)rec;
     -+  struct slice start = s;
     -+  int n = 0;
     -+  byte *oldh = r->old_hash;
     -+  byte *newh = r->new_hash;
     -+  if (oldh == NULL) {
     -+    oldh = zero;
     -+  }
     -+  if (newh == NULL) {
     -+    newh = zero;
     -+  }
     -+
     -+  if (s.len < 2 * hash_size) {
     -+    return -1;
     -+  }
     -+
     -+  memcpy(s.buf, oldh, hash_size);
     -+  memcpy(s.buf + hash_size, newh, hash_size);
     -+  s.buf += 2 * hash_size;
     -+  s.len -= 2 * hash_size;
     -+
     -+  n = encode_string(r->name ? r->name : "", s);
     -+  if (n < 0) {
     -+    return -1;
     -+  }
     -+  s.len -= n;
     -+  s.buf += n;
     -+
     -+  n = encode_string(r->email ? r->email : "", s);
     -+  if (n < 0) {
     -+    return -1;
     -+  }
     -+  s.len -= n;
     -+  s.buf += n;
     -+
     -+  n = put_var_int(s, r->time);
     -+  if (n < 0) {
     -+    return -1;
     -+  }
     -+  s.buf += n;
     -+  s.len -= n;
     -+
     -+  if (s.len < 2) {
     -+    return -1;
     -+  }
     -+
     -+  put_u16(s.buf, r->tz_offset);
     -+  s.buf += 2;
     -+  s.len -= 2;
     -+
     -+  n = encode_string(r->message ? r->message : "", s);
     -+  if (n < 0) {
     -+    return -1;
     -+  }
     -+  s.len -= n;
     -+  s.buf += n;
     -+
     -+  return start.len - s.len;
     ++static int log_record_encode(const void *rec, struct slice s, int hash_size)
     ++{
     ++	struct log_record *r = (struct log_record *)rec;
     ++	struct slice start = s;
     ++	int n = 0;
     ++	byte *oldh = r->old_hash;
     ++	byte *newh = r->new_hash;
     ++	if (oldh == NULL) {
     ++		oldh = zero;
     ++	}
     ++	if (newh == NULL) {
     ++		newh = zero;
     ++	}
     ++
     ++	if (s.len < 2 * hash_size) {
     ++		return -1;
     ++	}
     ++
     ++	memcpy(s.buf, oldh, hash_size);
     ++	memcpy(s.buf + hash_size, newh, hash_size);
     ++	s.buf += 2 * hash_size;
     ++	s.len -= 2 * hash_size;
     ++
     ++	n = encode_string(r->name ? r->name : "", s);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
     ++	s.len -= n;
     ++	s.buf += n;
     ++
     ++	n = encode_string(r->email ? r->email : "", s);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
     ++	s.len -= n;
     ++	s.buf += n;
     ++
     ++	n = put_var_int(s, r->time);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
     ++	s.buf += n;
     ++	s.len -= n;
     ++
     ++	if (s.len < 2) {
     ++		return -1;
     ++	}
     ++
     ++	put_u16(s.buf, r->tz_offset);
     ++	s.buf += 2;
     ++	s.len -= 2;
     ++
     ++	n = encode_string(r->message ? r->message : "", s);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
     ++	s.len -= n;
     ++	s.buf += n;
     ++
     ++	return start.len - s.len;
      +}
      +
      +static int log_record_decode(void *rec, struct slice key, byte val_type,
     -+                             struct slice in, int hash_size) {
     -+  struct slice start = in;
     -+  struct log_record *r = (struct log_record *)rec;
     -+  uint64_t max = 0;
     -+  uint64_t ts = 0;
     -+  struct slice dest = {};
     -+  int n;
     -+
     -+  if (key.len <= 9 || key.buf[key.len - 9] != 0) {
     -+    return FORMAT_ERROR;
     -+  }
     -+
     -+  r->ref_name = realloc(r->ref_name, key.len - 8);
     -+  memcpy(r->ref_name, key.buf, key.len - 8);
     -+  ts = get_u64(key.buf + key.len - 8);
     -+
     -+  r->update_index = (~max) - ts;
     -+
     -+  if (in.len < 2 * hash_size) {
     -+    return FORMAT_ERROR;
     -+  }
     -+
     -+  r->old_hash = realloc(r->old_hash, hash_size);
     -+  r->new_hash = realloc(r->new_hash, hash_size);
     -+
     -+  memcpy(r->old_hash, in.buf, hash_size);
     -+  memcpy(r->new_hash, in.buf + hash_size, hash_size);
     -+
     -+  in.buf += 2 * hash_size;
     -+  in.len -= 2 * hash_size;
     -+
     -+  n = decode_string(&dest, in);
     -+  if (n < 0) {
     -+    goto error;
     -+  }
     -+  in.len -= n;
     -+  in.buf += n;
     -+
     -+  r->name = realloc(r->name, dest.len + 1);
     -+  memcpy(r->name, dest.buf, dest.len);
     -+  r->name[dest.len] = 0;
     -+
     -+  slice_resize(&dest, 0);
     -+  n = decode_string(&dest, in);
     -+  if (n < 0) {
     -+    goto error;
     -+  }
     -+  in.len -= n;
     -+  in.buf += n;
     -+
     -+  r->email = realloc(r->email, dest.len + 1);
     -+  memcpy(r->email, dest.buf, dest.len);
     -+  r->email[dest.len] = 0;
     -+
     -+  ts = 0;
     -+  n = get_var_int(&ts, in);
     -+  if (n < 0) {
     -+    goto error;
     -+  }
     -+  in.len -= n;
     -+  in.buf += n;
     -+  r->time = ts;
     -+  if (in.len < 2) {
     -+    goto error;
     -+  }
     -+
     -+  r->tz_offset = get_u16(in.buf);
     -+  in.buf += 2;
     -+  in.len -= 2;
     -+
     -+  slice_resize(&dest, 0);
     -+  n = decode_string(&dest, in);
     -+  if (n < 0) {
     -+    goto error;
     -+  }
     -+  in.len -= n;
     -+  in.buf += n;
     -+
     -+  r->message = realloc(r->message, dest.len + 1);
     -+  memcpy(r->message, dest.buf, dest.len);
     -+  r->message[dest.len] = 0;
     -+
     -+  return start.len - in.len;
     ++			     struct slice in, int hash_size)
     ++{
     ++	struct slice start = in;
     ++	struct log_record *r = (struct log_record *)rec;
     ++	uint64_t max = 0;
     ++	uint64_t ts = 0;
     ++	struct slice dest = {};
     ++	int n;
     ++
     ++	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
     ++		return FORMAT_ERROR;
     ++	}
     ++
     ++	r->ref_name = realloc(r->ref_name, key.len - 8);
     ++	memcpy(r->ref_name, key.buf, key.len - 8);
     ++	ts = get_u64(key.buf + key.len - 8);
     ++
     ++	r->update_index = (~max) - ts;
     ++
     ++	if (in.len < 2 * hash_size) {
     ++		return FORMAT_ERROR;
     ++	}
     ++
     ++	r->old_hash = realloc(r->old_hash, hash_size);
     ++	r->new_hash = realloc(r->new_hash, hash_size);
     ++
     ++	memcpy(r->old_hash, in.buf, hash_size);
     ++	memcpy(r->new_hash, in.buf + hash_size, hash_size);
     ++
     ++	in.buf += 2 * hash_size;
     ++	in.len -= 2 * hash_size;
     ++
     ++	n = decode_string(&dest, in);
     ++	if (n < 0) {
     ++		goto error;
     ++	}
     ++	in.len -= n;
     ++	in.buf += n;
     ++
     ++	r->name = realloc(r->name, dest.len + 1);
     ++	memcpy(r->name, dest.buf, dest.len);
     ++	r->name[dest.len] = 0;
     ++
     ++	slice_resize(&dest, 0);
     ++	n = decode_string(&dest, in);
     ++	if (n < 0) {
     ++		goto error;
     ++	}
     ++	in.len -= n;
     ++	in.buf += n;
     ++
     ++	r->email = realloc(r->email, dest.len + 1);
     ++	memcpy(r->email, dest.buf, dest.len);
     ++	r->email[dest.len] = 0;
     ++
     ++	ts = 0;
     ++	n = get_var_int(&ts, in);
     ++	if (n < 0) {
     ++		goto error;
     ++	}
     ++	in.len -= n;
     ++	in.buf += n;
     ++	r->time = ts;
     ++	if (in.len < 2) {
     ++		goto error;
     ++	}
     ++
     ++	r->tz_offset = get_u16(in.buf);
     ++	in.buf += 2;
     ++	in.len -= 2;
     ++
     ++	slice_resize(&dest, 0);
     ++	n = decode_string(&dest, in);
     ++	if (n < 0) {
     ++		goto error;
     ++	}
     ++	in.len -= n;
     ++	in.buf += n;
     ++
     ++	r->message = realloc(r->message, dest.len + 1);
     ++	memcpy(r->message, dest.buf, dest.len);
     ++	r->message[dest.len] = 0;
     ++
     ++	return start.len - in.len;
      +
      +error:
     -+  free(slice_yield(&dest));
     -+  return FORMAT_ERROR;
     -+}
     -+
     -+static bool null_streq(char *a, char *b) {
     -+  char *empty = "";
     -+  if (a == NULL) {
     -+    a = empty;
     -+  }
     -+  if (b == NULL) {
     -+    b = empty;
     -+  }
     -+  return 0 == strcmp(a, b);
     -+}
     -+
     -+static bool zero_hash_eq(byte *a, byte *b, int sz) {
     -+  if (a == NULL) {
     -+    a = zero;
     -+  }
     -+  if (b == NULL) {
     -+    b = zero;
     -+  }
     -+  return 0 == memcmp(a, b, sz);
     -+}
     -+
     -+bool log_record_equal(struct log_record *a, struct log_record *b,
     -+                      int hash_size) {
     -+  return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
     -+         null_streq(a->message, b->message) &&
     -+         zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
     -+         zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
     -+         a->time == b->time && a->tz_offset == b->tz_offset &&
     -+         a->update_index == b->update_index;
     ++	free(slice_yield(&dest));
     ++	return FORMAT_ERROR;
     ++}
     ++
     ++static bool null_streq(char *a, char *b)
     ++{
     ++	char *empty = "";
     ++	if (a == NULL) {
     ++		a = empty;
     ++	}
     ++	if (b == NULL) {
     ++		b = empty;
     ++	}
     ++	return 0 == strcmp(a, b);
     ++}
     ++
     ++static bool zero_hash_eq(byte *a, byte *b, int sz)
     ++{
     ++	if (a == NULL) {
     ++		a = zero;
     ++	}
     ++	if (b == NULL) {
     ++		b = zero;
     ++	}
     ++	return !memcmp(a, b, sz);
     ++}
     ++
     ++bool log_record_equal(struct log_record *a, struct log_record *b, int hash_size)
     ++{
     ++	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
     ++	       null_streq(a->message, b->message) &&
     ++	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
     ++	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
     ++	       a->time == b->time && a->tz_offset == b->tz_offset &&
     ++	       a->update_index == b->update_index;
      +}
      +
      +struct record_vtable log_record_vtable = {
     -+    .key = &log_record_key,
     -+    .type = &log_record_type,
     -+    .copy_from = &log_record_copy_from,
     -+    .val_type = &log_record_val_type,
     -+    .encode = &log_record_encode,
     -+    .decode = &log_record_decode,
     -+    .clear = &log_record_clear_void,
     ++	.key = &log_record_key,
     ++	.type = &log_record_type,
     ++	.copy_from = &log_record_copy_from,
     ++	.val_type = &log_record_val_type,
     ++	.encode = &log_record_encode,
     ++	.decode = &log_record_decode,
     ++	.clear = &log_record_clear_void,
      +};
      +
     -+struct record new_record(byte typ) {
     -+  struct record rec;
     -+  switch (typ) {
     -+    case BLOCK_TYPE_REF: {
     -+      struct ref_record *r = calloc(1, sizeof(struct ref_record));
     -+      record_from_ref(&rec, r);
     -+      return rec;
     -+    }
     -+
     -+    case BLOCK_TYPE_OBJ: {
     -+      struct obj_record *r = calloc(1, sizeof(struct obj_record));
     -+      record_from_obj(&rec, r);
     -+      return rec;
     -+    }
     -+    case BLOCK_TYPE_LOG: {
     -+      struct log_record *r = calloc(1, sizeof(struct log_record));
     -+      record_from_log(&rec, r);
     -+      return rec;
     -+    }
     -+    case BLOCK_TYPE_INDEX: {
     -+      struct index_record *r = calloc(1, sizeof(struct index_record));
     -+      record_from_index(&rec, r);
     -+      return rec;
     -+    }
     -+  }
     -+  abort();
     -+  return rec;
     -+}
     -+
     -+static byte index_record_type(void) { return BLOCK_TYPE_INDEX; }
     -+
     -+static void index_record_key(const void *r, struct slice *dest) {
     -+  struct index_record *rec = (struct index_record *)r;
     -+  slice_copy(dest, rec->last_key);
     ++struct record new_record(byte typ)
     ++{
     ++	struct record rec;
     ++	switch (typ) {
     ++	case BLOCK_TYPE_REF: {
     ++		struct ref_record *r = calloc(1, sizeof(struct ref_record));
     ++		record_from_ref(&rec, r);
     ++		return rec;
     ++	}
     ++
     ++	case BLOCK_TYPE_OBJ: {
     ++		struct obj_record *r = calloc(1, sizeof(struct obj_record));
     ++		record_from_obj(&rec, r);
     ++		return rec;
     ++	}
     ++	case BLOCK_TYPE_LOG: {
     ++		struct log_record *r = calloc(1, sizeof(struct log_record));
     ++		record_from_log(&rec, r);
     ++		return rec;
     ++	}
     ++	case BLOCK_TYPE_INDEX: {
     ++		struct index_record *r = calloc(1, sizeof(struct index_record));
     ++		record_from_index(&rec, r);
     ++		return rec;
     ++	}
     ++	}
     ++	abort();
     ++	return rec;
     ++}
     ++
     ++static byte index_record_type(void)
     ++{
     ++	return BLOCK_TYPE_INDEX;
     ++}
     ++
     ++static void index_record_key(const void *r, struct slice *dest)
     ++{
     ++	struct index_record *rec = (struct index_record *)r;
     ++	slice_copy(dest, rec->last_key);
      +}
      +
      +static void index_record_copy_from(void *rec, const void *src_rec,
     -+                                   int hash_size) {
     -+  struct index_record *dst = (struct index_record *)rec;
     -+  struct index_record *src = (struct index_record *)src_rec;
     ++				   int hash_size)
     ++{
     ++	struct index_record *dst = (struct index_record *)rec;
     ++	struct index_record *src = (struct index_record *)src_rec;
      +
     -+  slice_copy(&dst->last_key, src->last_key);
     -+  dst->offset = src->offset;
     ++	slice_copy(&dst->last_key, src->last_key);
     ++	dst->offset = src->offset;
      +}
      +
     -+static void index_record_clear(void *rec) {
     -+  struct index_record *idx = (struct index_record *)rec;
     -+  free(slice_yield(&idx->last_key));
     ++static void index_record_clear(void *rec)
     ++{
     ++	struct index_record *idx = (struct index_record *)rec;
     ++	free(slice_yield(&idx->last_key));
      +}
      +
     -+static byte index_record_val_type(const void *rec) { return 0; }
     ++static byte index_record_val_type(const void *rec)
     ++{
     ++	return 0;
     ++}
      +
     -+static int index_record_encode(const void *rec, struct slice out,
     -+                               int hash_size) {
     -+  const struct index_record *r = (const struct index_record *)rec;
     -+  struct slice start = out;
     ++static int index_record_encode(const void *rec, struct slice out, int hash_size)
     ++{
     ++	const struct index_record *r = (const struct index_record *)rec;
     ++	struct slice start = out;
      +
     -+  int n = put_var_int(out, r->offset);
     -+  if (n < 0) {
     -+    return n;
     -+  }
     ++	int n = put_var_int(out, r->offset);
     ++	if (n < 0) {
     ++		return n;
     ++	}
      +
     -+  out.buf += n;
     -+  out.len -= n;
     ++	out.buf += n;
     ++	out.len -= n;
      +
     -+  return start.len - out.len;
     ++	return start.len - out.len;
      +}
      +
      +static int index_record_decode(void *rec, struct slice key, byte val_type,
     -+                               struct slice in, int hash_size) {
     -+  struct slice start = in;
     -+  struct index_record *r = (struct index_record *)rec;
     -+  int n = 0;
     ++			       struct slice in, int hash_size)
     ++{
     ++	struct slice start = in;
     ++	struct index_record *r = (struct index_record *)rec;
     ++	int n = 0;
      +
     -+  slice_copy(&r->last_key, key);
     ++	slice_copy(&r->last_key, key);
      +
     -+  n = get_var_int(&r->offset, in);
     -+  if (n < 0) {
     -+    return n;
     -+  }
     ++	n = get_var_int(&r->offset, in);
     ++	if (n < 0) {
     ++		return n;
     ++	}
      +
     -+  in.buf += n;
     -+  in.len -= n;
     -+  return start.len - in.len;
     ++	in.buf += n;
     ++	in.len -= n;
     ++	return start.len - in.len;
      +}
      +
      +struct record_vtable index_record_vtable = {
     -+    .key = &index_record_key,
     -+    .type = &index_record_type,
     -+    .copy_from = &index_record_copy_from,
     -+    .val_type = &index_record_val_type,
     -+    .encode = &index_record_encode,
     -+    .decode = &index_record_decode,
     -+    .clear = &index_record_clear,
     ++	.key = &index_record_key,
     ++	.type = &index_record_type,
     ++	.copy_from = &index_record_copy_from,
     ++	.val_type = &index_record_val_type,
     ++	.encode = &index_record_encode,
     ++	.decode = &index_record_decode,
     ++	.clear = &index_record_clear,
      +};
      +
     -+void record_key(struct record rec, struct slice *dest) {
     -+  rec.ops->key(rec.data, dest);
     ++void record_key(struct record rec, struct slice *dest)
     ++{
     ++	rec.ops->key(rec.data, dest);
      +}
      +
     -+byte record_type(struct record rec) { return rec.ops->type(); }
     ++byte record_type(struct record rec)
     ++{
     ++	return rec.ops->type();
     ++}
      +
     -+int record_encode(struct record rec, struct slice dest, int hash_size) {
     -+  return rec.ops->encode(rec.data, dest, hash_size);
     ++int record_encode(struct record rec, struct slice dest, int hash_size)
     ++{
     ++	return rec.ops->encode(rec.data, dest, hash_size);
      +}
      +
     -+void record_copy_from(struct record rec, struct record src, int hash_size) {
     -+  assert(src.ops->type() == rec.ops->type());
     ++void record_copy_from(struct record rec, struct record src, int hash_size)
     ++{
     ++	assert(src.ops->type() == rec.ops->type());
      +
     -+  rec.ops->copy_from(rec.data, src.data, hash_size);
     ++	rec.ops->copy_from(rec.data, src.data, hash_size);
      +}
      +
     -+byte record_val_type(struct record rec) { return rec.ops->val_type(rec.data); }
     ++byte record_val_type(struct record rec)
     ++{
     ++	return rec.ops->val_type(rec.data);
     ++}
      +
      +int record_decode(struct record rec, struct slice key, byte extra,
     -+                  struct slice src, int hash_size) {
     -+  return rec.ops->decode(rec.data, key, extra, src, hash_size);
     ++		  struct slice src, int hash_size)
     ++{
     ++	return rec.ops->decode(rec.data, key, extra, src, hash_size);
      +}
      +
     -+void record_clear(struct record rec) { return rec.ops->clear(rec.data); }
     ++void record_clear(struct record rec)
     ++{
     ++	return rec.ops->clear(rec.data);
     ++}
      +
     -+void record_from_ref(struct record *rec, struct ref_record *ref_rec) {
     -+  rec->data = ref_rec;
     -+  rec->ops = &ref_record_vtable;
     ++void record_from_ref(struct record *rec, struct ref_record *ref_rec)
     ++{
     ++	rec->data = ref_rec;
     ++	rec->ops = &ref_record_vtable;
      +}
      +
     -+void record_from_obj(struct record *rec, struct obj_record *obj_rec) {
     -+  rec->data = obj_rec;
     -+  rec->ops = &obj_record_vtable;
     ++void record_from_obj(struct record *rec, struct obj_record *obj_rec)
     ++{
     ++	rec->data = obj_rec;
     ++	rec->ops = &obj_record_vtable;
      +}
      +
     -+void record_from_index(struct record *rec, struct index_record *index_rec) {
     -+  rec->data = index_rec;
     -+  rec->ops = &index_record_vtable;
     ++void record_from_index(struct record *rec, struct index_record *index_rec)
     ++{
     ++	rec->data = index_rec;
     ++	rec->ops = &index_record_vtable;
      +}
      +
     -+void record_from_log(struct record *rec, struct log_record *log_rec) {
     -+  rec->data = log_rec;
     -+  rec->ops = &log_record_vtable;
     ++void record_from_log(struct record *rec, struct log_record *log_rec)
     ++{
     ++	rec->data = log_rec;
     ++	rec->ops = &log_record_vtable;
      +}
      +
     -+void *record_yield(struct record *rec) {
     -+  void *p = rec->data;
     -+  rec->data = NULL;
     -+  return p;
     ++void *record_yield(struct record *rec)
     ++{
     ++	void *p = rec->data;
     ++	rec->data = NULL;
     ++	return p;
      +}
      +
     -+struct ref_record *record_as_ref(struct record rec) {
     -+  assert(record_type(rec) == BLOCK_TYPE_REF);
     -+  return (struct ref_record *)rec.data;
     ++struct ref_record *record_as_ref(struct record rec)
     ++{
     ++	assert(record_type(rec) == BLOCK_TYPE_REF);
     ++	return (struct ref_record *)rec.data;
      +}
      +
     -+static bool hash_equal(byte *a, byte *b, int hash_size) {
     -+  if (a != NULL && b != NULL) {
     -+    return 0 == memcmp(a, b, hash_size);
     -+  }
     ++static bool hash_equal(byte *a, byte *b, int hash_size)
     ++{
     ++	if (a != NULL && b != NULL) {
     ++		return !memcmp(a, b, hash_size);
     ++	}
      +
     -+  return a == b;
     ++	return a == b;
      +}
      +
     -+static bool str_equal(char *a, char *b) {
     -+  if (a != NULL && b != NULL) {
     -+    return 0 == strcmp(a, b);
     -+  }
     ++static bool str_equal(char *a, char *b)
     ++{
     ++	if (a != NULL && b != NULL) {
     ++		return 0 == strcmp(a, b);
     ++	}
      +
     -+  return a == b;
     ++	return a == b;
      +}
      +
     -+bool ref_record_equal(struct ref_record *a, struct ref_record *b,
     -+                      int hash_size) {
     -+  assert(hash_size > 0);
     -+  return 0 == strcmp(a->ref_name, b->ref_name) &&
     -+         a->update_index == b->update_index &&
     -+         hash_equal(a->value, b->value, hash_size) &&
     -+         hash_equal(a->target_value, b->target_value, hash_size) &&
     -+         str_equal(a->target, b->target);
     ++bool ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size)
     ++{
     ++	assert(hash_size > 0);
     ++	return 0 == strcmp(a->ref_name, b->ref_name) &&
     ++	       a->update_index == b->update_index &&
     ++	       hash_equal(a->value, b->value, hash_size) &&
     ++	       hash_equal(a->target_value, b->target_value, hash_size) &&
     ++	       str_equal(a->target, b->target);
      +}
      +
     -+int ref_record_compare_name(const void *a, const void *b) {
     -+  return strcmp(((struct ref_record *)a)->ref_name,
     -+                ((struct ref_record *)b)->ref_name);
     ++int ref_record_compare_name(const void *a, const void *b)
     ++{
     ++	return strcmp(((struct ref_record *)a)->ref_name,
     ++		      ((struct ref_record *)b)->ref_name);
      +}
      +
     -+bool ref_record_is_deletion(const struct ref_record *ref) {
     -+  return ref->value == NULL && ref->target == NULL && ref->target_value == NULL;
     ++bool ref_record_is_deletion(const struct ref_record *ref)
     ++{
     ++	return ref->value == NULL && ref->target == NULL &&
     ++	       ref->target_value == NULL;
      +}
      +
     -+int log_record_compare_key(const void *a, const void *b) {
     -+  struct log_record *la = (struct log_record *)a;
     -+  struct log_record *lb = (struct log_record *)b;
     ++int log_record_compare_key(const void *a, const void *b)
     ++{
     ++	struct log_record *la = (struct log_record *)a;
     ++	struct log_record *lb = (struct log_record *)b;
      +
     -+  int cmp  = strcmp(la->ref_name, lb->ref_name);
     -+  if (cmp) {
     -+    return cmp;
     -+  }
     -+  if (la->update_index > lb->update_index) {
     -+    return -1;
     -+  }
     -+  return (la->update_index < lb->update_index) ? 1 : 0;
     ++	int cmp = strcmp(la->ref_name, lb->ref_name);
     ++	if (cmp) {
     ++		return cmp;
     ++	}
     ++	if (la->update_index > lb->update_index) {
     ++		return -1;
     ++	}
     ++	return (la->update_index < lb->update_index) ? 1 : 0;
      +}
      +
     -+bool log_record_is_deletion(const struct log_record *log) {
     -+  // XXX
     -+  return false;
     ++bool log_record_is_deletion(const struct log_record *log)
     ++{
     ++	/* XXX */
     ++	return false;
      +}
      
       diff --git a/reftable/record.h b/reftable/record.h
     @@ -4236,20 +4241,20 @@
      +#include "slice.h"
      +
      +struct record_vtable {
     -+  void (*key)(const void *rec, struct slice *dest);
     -+  byte (*type)(void);
     -+  void (*copy_from)(void *rec, const void *src, int hash_size);
     -+  byte (*val_type)(const void *rec);
     -+  int (*encode)(const void *rec, struct slice dest, int hash_size);
     -+  int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
     -+                int hash_size);
     -+  void (*clear)(void *rec);
     ++	void (*key)(const void *rec, struct slice *dest);
     ++	byte (*type)(void);
     ++	void (*copy_from)(void *rec, const void *src, int hash_size);
     ++	byte (*val_type)(const void *rec);
     ++	int (*encode)(const void *rec, struct slice dest, int hash_size);
     ++	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
     ++		      int hash_size);
     ++	void (*clear)(void *rec);
      +};
      +
      +/* record is a generic wrapper for differnt types of records. */
      +struct record {
     -+  void *data;
     -+  struct record_vtable *ops;
     ++	void *data;
     ++	struct record_vtable *ops;
      +};
      +
      +int get_var_int(uint64_t *dest, struct slice in);
     @@ -4262,20 +4267,20 @@
      +extern struct record_vtable ref_record_vtable;
      +
      +int encode_key(bool *restart, struct slice dest, struct slice prev_key,
     -+               struct slice key, byte extra);
     ++	       struct slice key, byte extra);
      +int decode_key(struct slice *key, byte *extra, struct slice last_key,
     -+               struct slice in);
     ++	       struct slice in);
      +
      +struct index_record {
     -+  struct slice last_key;
     -+  uint64_t offset;
     ++	struct slice last_key;
     ++	uint64_t offset;
      +};
      +
      +struct obj_record {
     -+  byte *hash_prefix;
     -+  int hash_prefix_len;
     -+  uint64_t *offsets;
     -+  int offset_len;
     ++	byte *hash_prefix;
     ++	int hash_prefix_len;
     ++	uint64_t *offsets;
     ++	int offset_len;
      +};
      +
      +void record_key(struct record rec, struct slice *dest);
     @@ -4284,7 +4289,7 @@
      +byte record_val_type(struct record rec);
      +int record_encode(struct record rec, struct slice dest, int hash_size);
      +int record_decode(struct record rec, struct slice key, byte extra,
     -+                  struct slice src, int hash_size);
     ++		  struct slice src, int hash_size);
      +void record_clear(struct record rec);
      +void *record_yield(struct record *rec);
      +void record_from_obj(struct record *rec, struct obj_record *objrec);
     @@ -4293,10 +4298,10 @@
      +void record_from_log(struct record *rec, struct log_record *logrec);
      +struct ref_record *record_as_ref(struct record ref);
      +
     -+// for qsort.
     ++/* for qsort. */
      +int ref_record_compare_name(const void *a, const void *b);
      +
     -+// for qsort.
     ++/* for qsort. */
      +int log_record_compare_key(const void *a, const void *b);
      +
      +#endif
     @@ -4316,308 +4321,327 @@
      +
      +#include "record.h"
      +
     -+#include <string.h>
     ++#include "system.h"
      +
      +#include "basics.h"
      +#include "constants.h"
      +#include "reftable.h"
      +#include "test_framework.h"
      +
     -+void varint_roundtrip() {
     -+  uint64_t inputs[] = {0,
     -+                       1,
     -+                       27,
     -+                       127,
     -+                       128,
     -+                       257,
     -+                       4096,
     -+                       ((uint64_t)1 << 63),
     -+                       ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1};
     -+  for (int i = 0; i < ARRAYSIZE(inputs); i++) {
     -+    byte dest[10];
     -+
     -+    struct slice out = {.buf = dest, .len = 10, .cap = 10};
     -+
     -+    uint64_t in = inputs[i];
     -+    int n = put_var_int(out, in);
     -+    assert(n > 0);
     -+    out.len = n;
     -+
     -+    uint64_t got = 0;
     -+    n = get_var_int(&got, out);
     -+    assert(n > 0);
     -+
     -+    assert(got == in);
     -+  }
     -+}
     -+
     -+void test_common_prefix() {
     -+  struct {
     -+    const char *a, *b;
     -+    int want;
     -+  } cases[] = {
     -+      {"abc", "ab", 2},
     -+      {"", "abc", 0},
     -+      {"abc", "abd", 2},
     -+      {"abc", "pqr", 0},
     -+  };
     -+
     -+  for (int i = 0; i < ARRAYSIZE(cases); i++) {
     -+    struct slice a = {};
     -+    struct slice b = {};
     -+    slice_set_string(&a, cases[i].a);
     -+    slice_set_string(&b, cases[i].b);
     -+
     -+    int got = common_prefix_size(a, b);
     -+    assert(got == cases[i].want);
     -+
     -+    free(slice_yield(&a));
     -+    free(slice_yield(&b));
     -+  }
     -+}
     -+
     -+void set_hash(byte *h, int j) {
     -+  for (int i = 0; i < SHA1_SIZE; i++) {
     -+    h[i] = (j >> i) & 0xff;
     -+  }
     -+}
     -+
     -+void test_ref_record_roundtrip() {
     -+  for (int i = 0; i <= 3; i++) {
     -+    printf("subtest %d\n", i);
     -+    struct ref_record in = {};
     -+    switch (i) {
     -+      case 0:
     -+        break;
     -+      case 1:
     -+        in.value = malloc(SHA1_SIZE);
     -+        set_hash(in.value, 1);
     -+        break;
     -+      case 2:
     -+        in.value = malloc(SHA1_SIZE);
     -+        set_hash(in.value, 1);
     -+        in.target_value = malloc(SHA1_SIZE);
     -+        set_hash(in.target_value, 2);
     -+        break;
     -+      case 3:
     -+        in.target = strdup("target");
     -+        break;
     -+    }
     -+    in.ref_name = strdup("refs/heads/master");
     -+
     -+    struct record rec = {};
     -+    record_from_ref(&rec, &in);
     -+    assert(record_val_type(rec) == i);
     -+    byte buf[1024];
     -+    struct slice key = {};
     -+    record_key(rec, &key);
     -+    struct slice dest = {
     -+        .buf = buf,
     -+        .len = sizeof(buf),
     -+    };
     -+    int n = record_encode(rec, dest, SHA1_SIZE);
     -+    assert(n > 0);
     -+
     -+    struct ref_record out = {};
     -+    struct record rec_out = {};
     -+    record_from_ref(&rec_out, &out);
     -+    int m = record_decode(rec_out, key, i, dest, SHA1_SIZE);
     -+    assert(n == m);
     -+
     -+    assert((out.value != NULL) == (in.value != NULL));
     -+    assert((out.target_value != NULL) == (in.target_value != NULL));
     -+    assert((out.target != NULL) == (in.target != NULL));
     -+    free(slice_yield(&key));
     -+    record_clear(rec_out);
     -+    ref_record_clear(&in);
     -+  }
     -+}
     -+
     -+void test_log_record_roundtrip() {
     -+  struct log_record in = {
     -+      .ref_name = strdup("refs/heads/master"),
     -+      .old_hash = malloc(SHA1_SIZE),
     -+      .new_hash = malloc(SHA1_SIZE),
     -+      .name = strdup("han-wen"),
     -+      .email = strdup("hanwen@google.com"),
     -+      .message = strdup("test"),
     -+      .update_index = 42,
     -+      .time = 1577123507,
     -+      .tz_offset = 100,
     -+  };
     -+
     -+  struct record rec = {};
     -+  record_from_log(&rec, &in);
     -+
     -+  struct slice key = {};
     -+  record_key(rec, &key);
     -+
     -+  byte buf[1024];
     -+  struct slice dest = {
     -+      .buf = buf,
     -+      .len = sizeof(buf),
     -+  };
     -+
     -+  int n = record_encode(rec, dest, SHA1_SIZE);
     -+  assert(n > 0);
     -+
     -+  struct log_record out = {};
     -+  struct record rec_out = {};
     -+  record_from_log(&rec_out, &out);
     -+  int valtype = record_val_type(rec);
     -+  int m = record_decode(rec_out, key, valtype, dest, SHA1_SIZE);
     -+  assert(n == m);
     -+
     -+  assert(log_record_equal(&in, &out, SHA1_SIZE));
     -+  log_record_clear(&in);
     -+  free(slice_yield(&key));
     -+  record_clear(rec_out);
     -+}
     -+
     -+void test_u24_roundtrip() {
     -+  uint32_t in = 0x112233;
     -+  byte dest[3];
     -+
     -+  put_u24(dest, in);
     -+  uint32_t out = get_u24(dest);
     -+  assert(in == out);
     -+}
     -+
     -+void test_key_roundtrip() {
     -+  struct slice dest = {}, last_key = {}, key = {}, roundtrip = {};
     -+
     -+  slice_resize(&dest, 1024);
     -+  slice_set_string(&last_key, "refs/heads/master");
     -+  slice_set_string(&key, "refs/tags/bla");
     -+
     -+  bool restart;
     -+  byte extra = 6;
     -+  int n = encode_key(&restart, dest, last_key, key, extra);
     -+  assert(!restart);
     -+  assert(n > 0);
     -+
     -+  byte rt_extra;
     -+  int m = decode_key(&roundtrip, &rt_extra, last_key, dest);
     -+  assert(n == m);
     -+  assert(slice_equal(key, roundtrip));
     -+  assert(rt_extra == extra);
     -+
     -+  free(slice_yield(&last_key));
     -+  free(slice_yield(&key));
     -+  free(slice_yield(&dest));
     -+  free(slice_yield(&roundtrip));
     -+}
     -+
     -+void print_bytes(byte *p, int l) {
     -+  for (int i = 0; i < l; i++) {
     -+    byte c = *p;
     -+    if (c < 32) {
     -+      c = '.';
     -+    }
     -+    printf("%02x[%c] ", p[i], c);
     -+  }
     -+  printf("(%d)\n", l);
     -+}
     -+
     -+void test_obj_record_roundtrip() {
     -+  byte testHash1[SHA1_SIZE] = {};
     -+  set_hash(testHash1, 1);
     -+  uint64_t till9[] = {1, 2, 3, 4, 500, 600, 700, 800, 9000};
     -+
     -+  struct obj_record recs[3] = {{
     -+                                   .hash_prefix = testHash1,
     -+                                   .hash_prefix_len = 5,
     -+                                   .offsets = till9,
     -+                                   .offset_len = 3,
     -+                               },
     -+                               {
     -+                                   .hash_prefix = testHash1,
     -+                                   .hash_prefix_len = 5,
     -+                                   .offsets = till9,
     -+                                   .offset_len = 9,
     -+                               },
     -+                               {
     -+                                   .hash_prefix = testHash1,
     -+                                   .hash_prefix_len = 5,
     -+                               }
     -+
     -+  };
     -+  for (int i = 0; i < ARRAYSIZE(recs); i++) {
     -+    printf("subtest %d\n", i);
     -+    struct obj_record in = recs[i];
     -+    byte buf[1024];
     -+    struct record rec = {};
     -+    record_from_obj(&rec, &in);
     -+    struct slice key = {};
     -+    record_key(rec, &key);
     -+    struct slice dest = {
     -+        .buf = buf,
     -+        .len = sizeof(buf),
     -+    };
     -+    int n = record_encode(rec, dest, SHA1_SIZE);
     -+    assert(n > 0);
     -+    byte extra = record_val_type(rec);
     -+    struct obj_record out = {};
     -+    struct record rec_out = {};
     -+    record_from_obj(&rec_out, &out);
     -+    int m = record_decode(rec_out, key, extra, dest, SHA1_SIZE);
     -+    assert(n == m);
     -+
     -+    assert(in.hash_prefix_len == out.hash_prefix_len);
     -+    assert(in.offset_len == out.offset_len);
     -+
     -+    assert(0 == memcmp(in.hash_prefix, out.hash_prefix, in.hash_prefix_len));
     -+    assert(0 ==
     -+           memcmp(in.offsets, out.offsets, sizeof(uint64_t) * in.offset_len));
     -+    free(slice_yield(&key));
     -+    record_clear(rec_out);
     -+  }
     -+}
     -+
     -+void test_index_record_roundtrip() {
     -+  struct index_record in = {.offset = 42};
     -+
     -+  slice_set_string(&in.last_key, "refs/heads/master");
     -+
     -+  struct slice key = {};
     -+  struct record rec = {};
     -+  record_from_index(&rec, &in);
     -+  record_key(rec, &key);
     -+
     -+  assert(0 == slice_compare(key, in.last_key));
     -+
     -+  byte buf[1024];
     -+  struct slice dest = {
     -+      .buf = buf,
     -+      .len = sizeof(buf),
     -+  };
     -+  int n = record_encode(rec, dest, SHA1_SIZE);
     -+  assert(n > 0);
     -+
     -+  byte extra = record_val_type(rec);
     -+  struct index_record out = {};
     -+  struct record out_rec;
     -+  record_from_index(&out_rec, &out);
     -+  int m = record_decode(out_rec, key, extra, dest, SHA1_SIZE);
     -+  assert(m == n);
     -+
     -+  assert(in.offset == out.offset);
     -+
     -+  record_clear(out_rec);
     -+  free(slice_yield(&key));
     -+  free(slice_yield(&in.last_key));
     -+}
     -+
     -+int main() {
     -+  add_test_case("test_log_record_roundtrip", &test_log_record_roundtrip);
     -+  add_test_case("test_ref_record_roundtrip", &test_ref_record_roundtrip);
     -+  add_test_case("varint_roundtrip", &varint_roundtrip);
     -+  add_test_case("test_key_roundtrip", &test_key_roundtrip);
     -+  add_test_case("test_common_prefix", &test_common_prefix);
     -+  add_test_case("test_obj_record_roundtrip", &test_obj_record_roundtrip);
     -+  add_test_case("test_index_record_roundtrip", &test_index_record_roundtrip);
     -+  add_test_case("test_u24_roundtrip", &test_u24_roundtrip);
     -+  test_main();
     ++void varint_roundtrip()
     ++{
     ++	uint64_t inputs[] = { 0,
     ++			      1,
     ++			      27,
     ++			      127,
     ++			      128,
     ++			      257,
     ++			      4096,
     ++			      ((uint64_t)1 << 63),
     ++			      ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1 };
     ++	int i = 0;
     ++	for (i = 0; i < ARRAYSIZE(inputs); i++) {
     ++		byte dest[10];
     ++
     ++		struct slice out = { .buf = dest, .len = 10, .cap = 10 };
     ++
     ++		uint64_t in = inputs[i];
     ++		int n = put_var_int(out, in);
     ++		assert(n > 0);
     ++		out.len = n;
     ++
     ++		uint64_t got = 0;
     ++		n = get_var_int(&got, out);
     ++		assert(n > 0);
     ++
     ++		assert(got == in);
     ++	}
     ++}
     ++
     ++void test_common_prefix()
     ++{
     ++	struct {
     ++		const char *a, *b;
     ++		int want;
     ++	} cases[] = {
     ++		{ "abc", "ab", 2 },
     ++		{ "", "abc", 0 },
     ++		{ "abc", "abd", 2 },
     ++		{ "abc", "pqr", 0 },
     ++	};
     ++
     ++	int i = 0;
     ++	for (i = 0; i < ARRAYSIZE(cases); i++) {
     ++		struct slice a = {};
     ++		struct slice b = {};
     ++		slice_set_string(&a, cases[i].a);
     ++		slice_set_string(&b, cases[i].b);
     ++
     ++		int got = common_prefix_size(a, b);
     ++		assert(got == cases[i].want);
     ++
     ++		free(slice_yield(&a));
     ++		free(slice_yield(&b));
     ++	}
     ++}
     ++
     ++void set_hash(byte *h, int j)
     ++{
     ++	int i = 0;
     ++	for (i = 0; i < SHA1_SIZE; i++) {
     ++		h[i] = (j >> i) & 0xff;
     ++	}
     ++}
     ++
     ++void test_ref_record_roundtrip()
     ++{
     ++	int i = 0;
     ++	for (i = 0; i <= 3; i++) {
     ++		printf("subtest %d\n", i);
     ++		struct ref_record in = {};
     ++		switch (i) {
     ++		case 0:
     ++			break;
     ++		case 1:
     ++			in.value = malloc(SHA1_SIZE);
     ++			set_hash(in.value, 1);
     ++			break;
     ++		case 2:
     ++			in.value = malloc(SHA1_SIZE);
     ++			set_hash(in.value, 1);
     ++			in.target_value = malloc(SHA1_SIZE);
     ++			set_hash(in.target_value, 2);
     ++			break;
     ++		case 3:
     ++			in.target = strdup("target");
     ++			break;
     ++		}
     ++		in.ref_name = strdup("refs/heads/master");
     ++
     ++		struct record rec = {};
     ++		record_from_ref(&rec, &in);
     ++		assert(record_val_type(rec) == i);
     ++		byte buf[1024];
     ++		struct slice key = {};
     ++		record_key(rec, &key);
     ++		struct slice dest = {
     ++			.buf = buf,
     ++			.len = sizeof(buf),
     ++		};
     ++		int n = record_encode(rec, dest, SHA1_SIZE);
     ++		assert(n > 0);
     ++
     ++		struct ref_record out = {};
     ++		struct record rec_out = {};
     ++		record_from_ref(&rec_out, &out);
     ++		int m = record_decode(rec_out, key, i, dest, SHA1_SIZE);
     ++		assert(n == m);
     ++
     ++		assert((out.value != NULL) == (in.value != NULL));
     ++		assert((out.target_value != NULL) == (in.target_value != NULL));
     ++		assert((out.target != NULL) == (in.target != NULL));
     ++		free(slice_yield(&key));
     ++		record_clear(rec_out);
     ++		ref_record_clear(&in);
     ++	}
     ++}
     ++
     ++void test_log_record_roundtrip()
     ++{
     ++	struct log_record in = {
     ++		.ref_name = strdup("refs/heads/master"),
     ++		.old_hash = malloc(SHA1_SIZE),
     ++		.new_hash = malloc(SHA1_SIZE),
     ++		.name = strdup("han-wen"),
     ++		.email = strdup("hanwen@google.com"),
     ++		.message = strdup("test"),
     ++		.update_index = 42,
     ++		.time = 1577123507,
     ++		.tz_offset = 100,
     ++	};
     ++
     ++	struct record rec = {};
     ++	record_from_log(&rec, &in);
     ++
     ++	struct slice key = {};
     ++	record_key(rec, &key);
     ++
     ++	byte buf[1024];
     ++	struct slice dest = {
     ++		.buf = buf,
     ++		.len = sizeof(buf),
     ++	};
     ++
     ++	int n = record_encode(rec, dest, SHA1_SIZE);
     ++	assert(n > 0);
     ++
     ++	struct log_record out = {};
     ++	struct record rec_out = {};
     ++	record_from_log(&rec_out, &out);
     ++	int valtype = record_val_type(rec);
     ++	int m = record_decode(rec_out, key, valtype, dest, SHA1_SIZE);
     ++	assert(n == m);
     ++
     ++	assert(log_record_equal(&in, &out, SHA1_SIZE));
     ++	log_record_clear(&in);
     ++	free(slice_yield(&key));
     ++	record_clear(rec_out);
     ++}
     ++
     ++void test_u24_roundtrip()
     ++{
     ++	uint32_t in = 0x112233;
     ++	byte dest[3];
     ++
     ++	put_u24(dest, in);
     ++	uint32_t out = get_u24(dest);
     ++	assert(in == out);
     ++}
     ++
     ++void test_key_roundtrip()
     ++{
     ++	struct slice dest = {}, last_key = {}, key = {}, roundtrip = {};
     ++
     ++	slice_resize(&dest, 1024);
     ++	slice_set_string(&last_key, "refs/heads/master");
     ++	slice_set_string(&key, "refs/tags/bla");
     ++
     ++	bool restart;
     ++	byte extra = 6;
     ++	int n = encode_key(&restart, dest, last_key, key, extra);
     ++	assert(!restart);
     ++	assert(n > 0);
     ++
     ++	byte rt_extra;
     ++	int m = decode_key(&roundtrip, &rt_extra, last_key, dest);
     ++	assert(n == m);
     ++	assert(slice_equal(key, roundtrip));
     ++	assert(rt_extra == extra);
     ++
     ++	free(slice_yield(&last_key));
     ++	free(slice_yield(&key));
     ++	free(slice_yield(&dest));
     ++	free(slice_yield(&roundtrip));
     ++}
     ++
     ++void print_bytes(byte *p, int l)
     ++{
     ++	int i = 0;
     ++	for (i = 0; i < l; i++) {
     ++		byte c = *p;
     ++		if (c < 32) {
     ++			c = '.';
     ++		}
     ++		printf("%02x[%c] ", p[i], c);
     ++	}
     ++	printf("(%d)\n", l);
     ++}
     ++
     ++void test_obj_record_roundtrip()
     ++{
     ++	byte testHash1[SHA1_SIZE] = {};
     ++	set_hash(testHash1, 1);
     ++	uint64_t till9[] = { 1, 2, 3, 4, 500, 600, 700, 800, 9000 };
     ++
     ++	struct obj_record recs[3] = { {
     ++					      .hash_prefix = testHash1,
     ++					      .hash_prefix_len = 5,
     ++					      .offsets = till9,
     ++					      .offset_len = 3,
     ++				      },
     ++				      {
     ++					      .hash_prefix = testHash1,
     ++					      .hash_prefix_len = 5,
     ++					      .offsets = till9,
     ++					      .offset_len = 9,
     ++				      },
     ++				      {
     ++					      .hash_prefix = testHash1,
     ++					      .hash_prefix_len = 5,
     ++				      }
     ++
     ++	};
     ++	int i = 0;
     ++	for (i = 0; i < ARRAYSIZE(recs); i++) {
     ++		printf("subtest %d\n", i);
     ++		struct obj_record in = recs[i];
     ++		byte buf[1024];
     ++		struct record rec = {};
     ++		record_from_obj(&rec, &in);
     ++		struct slice key = {};
     ++		record_key(rec, &key);
     ++		struct slice dest = {
     ++			.buf = buf,
     ++			.len = sizeof(buf),
     ++		};
     ++		int n = record_encode(rec, dest, SHA1_SIZE);
     ++		assert(n > 0);
     ++		byte extra = record_val_type(rec);
     ++		struct obj_record out = {};
     ++		struct record rec_out = {};
     ++		record_from_obj(&rec_out, &out);
     ++		int m = record_decode(rec_out, key, extra, dest, SHA1_SIZE);
     ++		assert(n == m);
     ++
     ++		assert(in.hash_prefix_len == out.hash_prefix_len);
     ++		assert(in.offset_len == out.offset_len);
     ++
     ++		assert(!memcmp(in.hash_prefix, out.hash_prefix,
     ++			       in.hash_prefix_len));
     ++		assert(0 == memcmp(in.offsets, out.offsets,
     ++				   sizeof(uint64_t) * in.offset_len));
     ++		free(slice_yield(&key));
     ++		record_clear(rec_out);
     ++	}
     ++}
     ++
     ++void test_index_record_roundtrip()
     ++{
     ++	struct index_record in = { .offset = 42 };
     ++
     ++	slice_set_string(&in.last_key, "refs/heads/master");
     ++
     ++	struct slice key = {};
     ++	struct record rec = {};
     ++	record_from_index(&rec, &in);
     ++	record_key(rec, &key);
     ++
     ++	assert(0 == slice_compare(key, in.last_key));
     ++
     ++	byte buf[1024];
     ++	struct slice dest = {
     ++		.buf = buf,
     ++		.len = sizeof(buf),
     ++	};
     ++	int n = record_encode(rec, dest, SHA1_SIZE);
     ++	assert(n > 0);
     ++
     ++	byte extra = record_val_type(rec);
     ++	struct index_record out = {};
     ++	struct record out_rec;
     ++	record_from_index(&out_rec, &out);
     ++	int m = record_decode(out_rec, key, extra, dest, SHA1_SIZE);
     ++	assert(m == n);
     ++
     ++	assert(in.offset == out.offset);
     ++
     ++	record_clear(out_rec);
     ++	free(slice_yield(&key));
     ++	free(slice_yield(&in.last_key));
     ++}
     ++
     ++int main()
     ++{
     ++	add_test_case("test_log_record_roundtrip", &test_log_record_roundtrip);
     ++	add_test_case("test_ref_record_roundtrip", &test_ref_record_roundtrip);
     ++	add_test_case("varint_roundtrip", &varint_roundtrip);
     ++	add_test_case("test_key_roundtrip", &test_key_roundtrip);
     ++	add_test_case("test_common_prefix", &test_common_prefix);
     ++	add_test_case("test_obj_record_roundtrip", &test_obj_record_roundtrip);
     ++	add_test_case("test_index_record_roundtrip",
     ++		      &test_index_record_roundtrip);
     ++	add_test_case("test_u24_roundtrip", &test_u24_roundtrip);
     ++	test_main();
      +}
      
       diff --git a/reftable/reftable.h b/reftable/reftable.h
     @@ -4636,7 +4660,7 @@
      +#ifndef REFTABLE_H
      +#define REFTABLE_H
      +
     -+#include <stdint.h>
     ++#include "system.h"
      +
      +typedef uint8_t byte;
      +typedef byte bool;
     @@ -4645,33 +4669,33 @@
      +   It is generally passed around by value.
      + */
      +struct block_source {
     -+  struct block_source_vtable *ops;
     -+  void *arg;
     ++	struct block_source_vtable *ops;
     ++	void *arg;
      +};
      +
      +/* a contiguous segment of bytes. It keeps track of its generating block_source
      +   so it can return itself into the pool.
      +*/
      +struct block {
     -+  byte *data;
     -+  int len;
     -+  struct block_source source;
     ++	byte *data;
     ++	int len;
     ++	struct block_source source;
      +};
      +
      +/* block_source_vtable are the operations that make up block_source */
      +struct block_source_vtable {
     -+  /* returns the size of a block source */
     -+  uint64_t (*size)(void *source);
     -+
     -+  /* reads a segment from the block source. It is an error to read
     -+     beyond the end of the block */
     -+  int (*read_block)(void *source, struct block *dest, uint64_t off,
     -+                    uint32_t size);
     -+  /* mark the block as read; may return the data back to malloc */
     -+  void (*return_block)(void *source, struct block *blockp);
     -+
     -+  /* release all resources associated with the block source */
     -+  void (*close)(void *source);
     ++	/* returns the size of a block source */
     ++	uint64_t (*size)(void *source);
     ++
     ++	/* reads a segment from the block source. It is an error to read
     ++	   beyond the end of the block */
     ++	int (*read_block)(void *source, struct block *dest, uint64_t off,
     ++			  uint32_t size);
     ++	/* mark the block as read; may return the data back to malloc */
     ++	void (*return_block)(void *source, struct block *blockp);
     ++
     ++	/* release all resources associated with the block source */
     ++	void (*close)(void *source);
      +};
      +
      +/* opens a file on the file system as a block_source */
     @@ -4679,26 +4703,27 @@
      +
      +/* write_options sets options for writing a single reftable. */
      +struct write_options {
     -+  /* do not pad out blocks to block size. */
     -+  bool unpadded;
     ++	/* do not pad out blocks to block size. */
     ++	bool unpadded;
      +
     -+  /* the blocksize. Should be less than 2^24. */
     -+  uint32_t block_size;
     ++	/* the blocksize. Should be less than 2^24. */
     ++	uint32_t block_size;
      +
     -+  /* do not generate a SHA1 => ref index. */
     -+  bool skip_index_objects;
     ++	/* do not generate a SHA1 => ref index. */
     ++	bool skip_index_objects;
      +
     -+  /* how often to write complete keys in each block. */
     -+  int restart_interval;
     ++	/* how often to write complete keys in each block. */
     ++	int restart_interval;
      +};
      +
      +/* ref_record holds a ref database entry target_value */
      +struct ref_record {
     -+  char *ref_name;        /* Name of the ref, malloced. */
     -+  uint64_t update_index; /* Logical timestamp at which this value is written */
     -+  byte *value;           /* SHA1, or NULL. malloced. */
     -+  byte *target_value;    /* peeled annotated tag, or NULL. malloced. */
     -+  char *target;          /* symref, or NULL. malloced. */
     ++	char *ref_name; /* Name of the ref, malloced. */
     ++	uint64_t update_index; /* Logical timestamp at which this value is
     ++				  written */
     ++	byte *value; /* SHA1, or NULL. malloced. */
     ++	byte *target_value; /* peeled annotated tag, or NULL. malloced. */
     ++	char *target; /* symref, or NULL. malloced. */
      +};
      +
      +/* returns whether 'ref' represents a deletion */
     @@ -4712,19 +4737,19 @@
      +
      +/* returns whether two ref_records are the same */
      +bool ref_record_equal(struct ref_record *a, struct ref_record *b,
     -+                      int hash_size);
     ++		      int hash_size);
      +
      +/* log_record holds a reflog entry */
      +struct log_record {
     -+  char *ref_name;
     -+  uint64_t update_index;
     -+  byte *new_hash;
     -+  byte *old_hash;
     -+  char *name;
     -+  char *email;
     -+  uint64_t time;
     -+  int16_t tz_offset;
     -+  char *message;
     ++	char *ref_name;
     ++	uint64_t update_index;
     ++	byte *new_hash;
     ++	byte *old_hash;
     ++	char *name;
     ++	char *email;
     ++	uint64_t time;
     ++	int16_t tz_offset;
     ++	char *message;
      +};
      +
      +/* returns whether 'ref' represents the deletion of a log record. */
     @@ -4735,7 +4760,7 @@
      +
      +/* returns whether two records are equal. */
      +bool log_record_equal(struct log_record *a, struct log_record *b,
     -+                      int hash_size);
     ++		      int hash_size);
      +
      +void log_record_print(struct log_record *log, int hash_size);
      +
     @@ -4743,8 +4768,8 @@
      +   reftable. It is generally passed around by value.
      +*/
      +struct iterator {
     -+  struct iterator_vtable *ops;
     -+  void *iter_arg;
     ++	struct iterator_vtable *ops;
     ++	void *iter_arg;
      +};
      +
      +/* reads the next ref_record. Returns < 0 for error, 0 for OK and > 0:
     @@ -4762,38 +4787,39 @@
      +
      +/* block_stats holds statistics for a single block type */
      +struct block_stats {
     -+  /* total number of entries written */
     -+  int entries;
     -+  /* total number of key restarts */
     -+  int restarts;
     -+  /* total number of blocks */
     -+  int blocks;
     -+  /* total number of index blocks */
     -+  int index_blocks;
     -+  /* depth of the index */
     -+  int max_index_level;
     -+
     -+  /* offset of the first block for this type */
     -+  uint64_t offset;
     -+  /* offset of the top level index block for this type, or 0 if not present */
     -+  uint64_t index_offset;
     ++	/* total number of entries written */
     ++	int entries;
     ++	/* total number of key restarts */
     ++	int restarts;
     ++	/* total number of blocks */
     ++	int blocks;
     ++	/* total number of index blocks */
     ++	int index_blocks;
     ++	/* depth of the index */
     ++	int max_index_level;
     ++
     ++	/* offset of the first block for this type */
     ++	uint64_t offset;
     ++	/* offset of the top level index block for this type, or 0 if not
     ++	 * present */
     ++	uint64_t index_offset;
      +};
      +
      +/* stats holds overall statistics for a single reftable */
      +struct stats {
     -+  /* total number of blocks written. */
     -+  int blocks;
     -+  /* stats for ref data */
     -+  struct block_stats ref_stats;
     -+  /* stats for the SHA1 to ref map. */
     -+  struct block_stats obj_stats;
     -+  /* stats for index blocks */
     -+  struct block_stats idx_stats;
     -+  /* stats for log blocks */
     -+  struct block_stats log_stats;
     -+
     -+  /* disambiguation length of shortened object IDs. */
     -+  int object_id_len;
     ++	/* total number of blocks written. */
     ++	int blocks;
     ++	/* stats for ref data */
     ++	struct block_stats ref_stats;
     ++	/* stats for the SHA1 to ref map. */
     ++	struct block_stats obj_stats;
     ++	/* stats for index blocks */
     ++	struct block_stats idx_stats;
     ++	/* stats for log blocks */
     ++	struct block_stats log_stats;
     ++
     ++	/* disambiguation length of shortened object IDs. */
     ++	int object_id_len;
      +};
      +
      +/* different types of errors */
     @@ -4814,6 +4840,7 @@
      +#define LOCK_ERROR -5
      +
      +/* Misuse of the API:
     ++   - on writing a record with NULL ref_name.
      +   - on writing a ref_record outside the table limits
      +   - on writing a ref or log record before the stack's next_update_index
      +   - on reading a ref_record from log iterator, or vice versa.
     @@ -4827,7 +4854,7 @@
      +
      +/* new_writer creates a new writer */
      +struct writer *new_writer(int (*writer_func)(void *, byte *, int),
     -+                          void *writer_arg, struct write_options *opts);
     ++			  void *writer_arg, struct write_options *opts);
      +
      +/* write to a file descriptor. fdp should be an int* pointing to the fd. */
      +int fd_writer(void *fdp, byte *data, int size);
     @@ -4857,7 +4884,6 @@
      +   key before adding. */
      +int writer_add_logs(struct writer *w, struct log_record *logs, int n);
      +
     -+
      +/* writer_close finalizes the reftable. The writer is retained so statistics can
      + * be inspected. */
      +int writer_close(struct writer *w);
     @@ -4905,7 +4931,7 @@
      +
      +/* seek to logs for the given name, older than update_index. */
      +int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
     -+                       uint64_t update_index);
     ++		       uint64_t update_index);
      +
      +/* seek to newest log entry for given name. */
      +int reader_seek_log(struct reader *r, struct iterator *it, const char *name);
     @@ -4915,7 +4941,7 @@
      +
      +/* return an iterator for the refs pointing to oid */
      +int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
     -+                    int oid_len);
     ++		    int oid_len);
      +
      +/* return the max_update_index for a table */
      +uint64_t reader_max_update_index(struct reader *r);
     @@ -4933,15 +4959,15 @@
      +
      +/* returns an iterator positioned just before 'name' */
      +int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
     -+                          const char *name);
     ++			  const char *name);
      +
      +/* returns an iterator for log entry, at given update_index */
      +int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
     -+                             const char *name, uint64_t update_index);
     ++			     const char *name, uint64_t update_index);
      +
      +/* like merged_table_seek_log_at but look for the newest entry. */
      +int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
     -+                          const char *name);
     ++			  const char *name);
      +
      +/* returns the max update_index covered by this merged table. */
      +uint64_t merged_max_update_index(struct merged_table *mt);
     @@ -4964,7 +4990,7 @@
      +   .git/refs respectively.
      +*/
      +int new_stack(struct stack **dest, const char *dir, const char *list_file,
     -+              struct write_options config);
     ++	      struct write_options config);
      +
      +/* returns the update_index at which a next table should be written. */
      +uint64_t stack_next_update_index(struct stack *st);
     @@ -4972,8 +4998,8 @@
      +/* add a new table to the stack. The write_table function must call
      +   writer_set_limits, add refs and return an error value. */
      +int stack_add(struct stack *st,
     -+              int (*write_table)(struct writer *wr, void *write_arg),
     -+              void *write_arg);
     ++	      int (*write_table)(struct writer *wr, void *write_arg),
     ++	      void *write_arg);
      +
      +/* returns the merged_table for seeking. This table is valid until the
      +   next write or reload, and should not be closed or deleted.
     @@ -4988,14 +5014,15 @@
      +
      +/* Policy for expiring reflog entries. */
      +struct log_expiry_config {
     -+  /* Drop entries older than this timestamp */
     -+  uint64_t time;
     ++	/* Drop entries older than this timestamp */
     ++	uint64_t time;
      +
     -+  /* Drop older entries */
     -+  uint64_t min_update_index;
     ++	/* Drop older entries */
     ++	uint64_t min_update_index;
      +};
      +
     -+/* compacts all reftables into a giant table. Expire reflog entries if config is non-NULL */
     ++/* compacts all reftables into a giant table. Expire reflog entries if config is
     ++ * non-NULL */
      +int stack_compact_all(struct stack *st, struct log_expiry_config *config);
      +
      +/* heuristically compact unbalanced table stack. */
     @@ -5004,18 +5031,18 @@
      +/* convenience function to read a single ref. Returns < 0 for error, 0
      +   for success, and 1 if ref not found. */
      +int stack_read_ref(struct stack *st, const char *refname,
     -+                   struct ref_record *ref);
     ++		   struct ref_record *ref);
      +
      +/* convenience function to read a single log. Returns < 0 for error, 0
      +   for success, and 1 if ref not found. */
      +int stack_read_log(struct stack *st, const char *refname,
     -+                   struct log_record *log);
     ++		   struct log_record *log);
      +
      +/* statistics on past compactions. */
      +struct compaction_stats {
     -+  uint64_t bytes;
     -+  int attempts;
     -+  int failures;
     ++	uint64_t bytes;
     ++	int attempts;
     ++	int failures;
      +};
      +
      +struct compaction_stats *stack_compaction_stats(struct stack *st);
     @@ -5037,7 +5064,7 @@
      +
      +#include "reftable.h"
      +
     -+#include <string.h>
     ++#include "system.h"
      +
      +#include "basics.h"
      +#include "block.h"
     @@ -5048,418 +5075,465 @@
      +
      +static const int update_index = 5;
      +
     -+void test_buffer(void) {
     -+  struct slice buf = {};
     -+
     -+  byte in[] = "hello";
     -+  slice_write(&buf, in, sizeof(in));
     -+  struct block_source source;
     -+  block_source_from_slice(&source, &buf);
     -+  assert(block_source_size(source) == 6);
     -+  struct block out = {};
     -+  int n = block_source_read_block(source, &out, 0, sizeof(in));
     -+  assert(n == sizeof(in));
     -+  assert(0 == memcmp(in, out.data, n));
     -+  block_source_return_block(source, &out);
     -+
     -+  n = block_source_read_block(source, &out, 1, 2);
     -+  assert(n == 2);
     -+  assert(0 == memcmp(out.data, "el", 2));
     -+
     -+  block_source_return_block(source, &out);
     -+  block_source_close(&source);
     -+  free(slice_yield(&buf));
     -+}
     -+
     -+void write_table(char ***names, struct slice *buf, int N, int block_size) {
     -+  *names = calloc(sizeof(char *), N + 1);
     -+
     -+  struct write_options opts = {
     -+      .block_size = block_size,
     -+  };
     -+
     -+  struct writer *w = new_writer(&slice_write_void, buf, &opts);
     -+
     -+  writer_set_limits(w, update_index, update_index);
     -+  {
     -+    struct ref_record ref = {};
     -+    for (int i = 0; i < N; i++) {
     -+      byte hash[SHA1_SIZE];
     -+      set_test_hash(hash, i);
     -+
     -+      char name[100];
     -+      sprintf(name, "refs/heads/branch%02d", i);
     -+
     -+      ref.ref_name = name;
     -+      ref.value = hash;
     -+      ref.update_index = update_index;
     -+      (*names)[i] = strdup(name);
     -+
     -+      fflush(stdout);
     -+      int n = writer_add_ref(w, &ref);
     -+      assert(n == 0);
     -+    }
     -+  }
     -+  int n = writer_close(w);
     -+  assert(n == 0);
     -+
     -+  struct stats *stats = writer_stats(w);
     -+  for (int i = 0; i < stats->ref_stats.blocks; i++) {
     -+    int off = i * opts.block_size;
     -+    if (off == 0) {
     -+      off = HEADER_SIZE;
     -+    }
     -+    assert(buf->buf[off] == 'r');
     -+  }
     -+
     -+  writer_free(w);
     -+  w = NULL;
     -+}
     -+
     -+void test_log_write_read(void) {
     -+  int N = 2;
     -+  char **names = calloc(sizeof(char *), N + 1);
     -+
     -+  struct write_options opts = {
     -+      .block_size = 256,
     -+  };
     -+
     -+  struct slice buf = {};
     -+  struct writer *w = new_writer(&slice_write_void, &buf, &opts);
     -+
     -+  writer_set_limits(w, 0, N);
     -+  {
     -+    struct ref_record ref = {};
     -+    for (int i = 0; i < N; i++) {
     -+      char name[256];
     -+      sprintf(name, "b%02d%0*d", i, 130, 7);
     -+      names[i] = strdup(name);
     -+      puts(name);
     -+      ref.ref_name = name;
     -+      ref.update_index = i;
     -+
     -+      int err = writer_add_ref(w, &ref);
     -+      assert_err(err);
     -+    }
     -+  }
     -+
     -+  {
     -+    struct log_record log = {};
     -+    for (int i = 0; i < N; i++) {
     -+      byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
     -+      set_test_hash(hash1, i);
     -+      set_test_hash(hash2, i + 1);
     -+
     -+      log.ref_name = names[i];
     -+      log.update_index = i;
     -+      log.old_hash = hash1;
     -+      log.new_hash = hash2;
     -+
     -+      int err = writer_add_log(w, &log);
     -+      assert_err(err);
     -+    }
     -+  }
     -+
     -+  int n = writer_close(w);
     -+  assert(n == 0);
     -+
     -+  struct stats *stats = writer_stats(w);
     -+  assert(stats->log_stats.blocks > 0);
     -+  writer_free(w);
     -+  w = NULL;
     -+
     -+  struct block_source source = {};
     -+  block_source_from_slice(&source, &buf);
     -+
     -+  struct reader rd = {};
     -+  int err = init_reader(&rd, source, "file.log");
     -+  assert(err == 0);
     -+
     -+  {
     -+    struct iterator it = {};
     -+    err = reader_seek_ref(&rd, &it, names[N-1]);
     -+    assert(err == 0);
     -+
     -+    struct ref_record ref = {};
     -+    err = iterator_next_ref(it, &ref);
     -+    assert_err(err);
     -+
     -+    // end of iteration.
     -+    err = iterator_next_ref(it, &ref);
     -+    assert(0 < err);
     -+
     -+    iterator_destroy(&it);
     -+    ref_record_clear(&ref);
     -+  }
     -+
     -+  {
     -+    struct iterator it = {};
     -+    err = reader_seek_log(&rd, &it, "");
     -+    assert(err == 0);
     -+
     -+    struct log_record log = {};
     -+    int i = 0;
     -+    while (true) {
     -+      int err = iterator_next_log(it, &log);
     -+      if (err > 0) {
     -+        break;
     -+      }
     -+
     -+      assert_err(err);
     -+      assert_streq(names[i], log.ref_name);
     -+      assert(i == log.update_index);
     -+      i++;
     -+    }
     -+
     -+    assert(i == N);
     -+    iterator_destroy(&it);
     -+  }
     -+
     -+  // cleanup.
     -+  free(slice_yield(&buf));
     -+  free_names(names);
     -+  reader_close(&rd);
     -+}
     -+
     -+void test_table_read_write_sequential(void) {
     -+  char **names;
     -+  struct slice buf = {};
     -+  int N = 50;
     -+  write_table(&names, &buf, N, 256);
     -+
     -+  struct block_source source = {};
     -+  block_source_from_slice(&source, &buf);
     -+
     -+  struct reader rd = {};
     -+  int err = init_reader(&rd, source, "file.ref");
     -+  assert(err == 0);
     -+
     -+  struct iterator it = {};
     -+  err = reader_seek_ref(&rd, &it, "");
     -+  assert(err == 0);
     -+
     -+  int j = 0;
     -+  while (true) {
     -+    struct ref_record ref = {};
     -+    int r = iterator_next_ref(it, &ref);
     -+    assert(r >= 0);
     -+    if (r > 0) {
     -+      break;
     -+    }
     -+    assert(0 == strcmp(names[j], ref.ref_name));
     -+    assert(update_index == ref.update_index);
     -+
     -+    j++;
     -+    ref_record_clear(&ref);
     -+  }
     -+  assert(j == N);
     -+  iterator_destroy(&it);
     -+  free(slice_yield(&buf));
     -+  free_names(names);
     -+
     -+  reader_close(&rd);
     -+}
     -+
     -+void test_table_write_small_table(void) {
     -+  char **names;
     -+  struct slice buf = {};
     -+  int N = 1;
     -+  write_table(&names, &buf, N, 4096);
     -+  assert(buf.len < 200);
     -+  free(slice_yield(&buf));
     -+  free_names(names);
     -+}
     -+
     -+void test_table_read_api(void) {
     -+  char **names;
     -+  struct slice buf = {};
     -+  int N = 50;
     -+  write_table(&names, &buf, N, 256);
     -+
     -+  struct reader rd = {};
     -+  struct block_source source = {};
     -+  block_source_from_slice(&source, &buf);
     -+
     -+  int err = init_reader(&rd, source, "file.ref");
     -+  assert(err == 0);
     -+
     -+  struct iterator it = {};
     -+  err = reader_seek_ref(&rd, &it, names[0]);
     -+  assert(err == 0);
     -+
     -+  struct log_record log = {};
     -+  err = iterator_next_log(it, &log);
     -+  assert(err == API_ERROR);
     -+
     -+  free(slice_yield(&buf));
     -+  for (int i = 0; i < N; i++) {
     -+    free(names[i]);
     -+  }
     -+  free(names);
     -+  reader_close(&rd);
     -+}
     -+
     -+void test_table_read_write_seek(bool index) {
     -+  char **names;
     -+  struct slice buf = {};
     -+  int N = 50;
     -+  write_table(&names, &buf, N, 256);
     -+
     -+  struct reader rd = {};
     -+  struct block_source source = {};
     -+  block_source_from_slice(&source, &buf);
     -+
     -+  int err = init_reader(&rd, source, "file.ref");
     -+  assert(err == 0);
     -+
     -+  if (!index) {
     -+    rd.ref_offsets.index_offset = 0;
     -+  }
     -+
     -+  for (int i = 1; i < N; i++) {
     -+    struct iterator it = {};
     -+    int err = reader_seek_ref(&rd, &it, names[i]);
     -+    assert(err == 0);
     -+    struct ref_record ref = {};
     -+    err = iterator_next_ref(it, &ref);
     -+    assert(err == 0);
     -+    assert(0 == strcmp(names[i], ref.ref_name));
     -+    assert(i == ref.value[0]);
     -+
     -+    ref_record_clear(&ref);
     -+    iterator_destroy(&it);
     -+  }
     -+
     -+  free(slice_yield(&buf));
     -+  for (int i = 0; i < N; i++) {
     -+    free(names[i]);
     -+  }
     -+  free(names);
     -+  reader_close(&rd);
     -+}
     -+
     -+void test_table_read_write_seek_linear(void) {
     -+  test_table_read_write_seek(false);
     -+}
     -+
     -+void test_table_read_write_seek_index(void) {
     -+  test_table_read_write_seek(true);
     -+}
     -+
     -+void test_table_refs_for(bool indexed) {
     -+  int N = 50;
     -+
     -+  char **want_names = calloc(sizeof(char *), N + 1);
     -+
     -+  int want_names_len = 0;
     -+  byte want_hash[SHA1_SIZE];
     -+  set_test_hash(want_hash, 4);
     -+
     -+  struct write_options opts = {
     -+      .block_size = 256,
     -+  };
     -+
     -+  struct slice buf = {};
     -+  struct writer *w = new_writer(&slice_write_void, &buf, &opts);
     -+  {
     -+    struct ref_record ref = {};
     -+    for (int i = 0; i < N; i++) {
     -+      byte hash[SHA1_SIZE];
     -+      memset(hash, i, sizeof(hash));
     -+      char fill[51] = {};
     -+      memset(fill, 'x', 50);
     -+      char name[100];
     -+      // Put the variable part in the start
     -+      sprintf(name, "br%02d%s", i, fill);
     -+      name[40] = 0;
     -+      ref.ref_name = name;
     -+
     -+      byte hash1[SHA1_SIZE];
     -+      byte hash2[SHA1_SIZE];
     -+
     -+      set_test_hash(hash1, i / 4);
     -+      set_test_hash(hash2, 3 + i / 4);
     -+      ref.value = hash1;
     -+      ref.target_value = hash2;
     -+
     -+      // 80 bytes / entry, so 3 entries per block. Yields 17 blocks.
     -+      int n = writer_add_ref(w, &ref);
     -+      assert(n == 0);
     -+
     -+      if (0 == memcmp(hash1, want_hash, SHA1_SIZE) ||
     -+          0 == memcmp(hash2, want_hash, SHA1_SIZE)) {
     -+        want_names[want_names_len++] = strdup(name);
     -+      }
     -+    }
     -+  }
     -+
     -+  int n = writer_close(w);
     -+  assert(n == 0);
     -+
     -+  writer_free(w);
     -+  w = NULL;
     -+
     -+  struct reader rd;
     -+  struct block_source source = {};
     -+  block_source_from_slice(&source, &buf);
     -+
     -+  int err = init_reader(&rd, source, "file.ref");
     -+  assert(err == 0);
     -+  if (!indexed) {
     -+    rd.obj_offsets.present = 0;
     -+  }
     -+
     -+  struct iterator it = {};
     -+  err = reader_seek_ref(&rd, &it, "");
     -+  assert(err == 0);
     -+  iterator_destroy(&it);
     -+
     -+  err = reader_refs_for(&rd, &it, want_hash, SHA1_SIZE);
     -+  assert(err == 0);
     -+
     -+  struct ref_record ref = {};
     -+
     -+  int j = 0;
     -+  while (true) {
     -+    int err = iterator_next_ref(it, &ref);
     -+    assert(err >= 0);
     -+    if (err > 0) {
     -+      break;
     -+    }
     -+
     -+    assert(j < want_names_len);
     -+    assert(0 == strcmp(ref.ref_name, want_names[j]));
     -+    j++;
     -+    ref_record_clear(&ref);
     -+  }
     -+  assert(j == want_names_len);
     -+
     -+  free(slice_yield(&buf));
     -+  free_names(want_names);
     -+  iterator_destroy(&it);
     -+  reader_close(&rd);
     -+}
     -+
     -+void test_table_refs_for_no_index(void) { test_table_refs_for(false); }
     -+
     -+void test_table_refs_for_obj_index(void) { test_table_refs_for(true); }
     -+
     -+int main() {
     -+  add_test_case("test_log_write_read", test_log_write_read);
     -+  add_test_case("test_table_write_small_table", &test_table_write_small_table);
     -+  add_test_case("test_buffer", &test_buffer);
     -+  add_test_case("test_table_read_api", &test_table_read_api);
     -+  add_test_case("test_table_read_write_sequential",
     -+                &test_table_read_write_sequential);
     -+  add_test_case("test_table_read_write_seek_linear",
     -+                &test_table_read_write_seek_linear);
     -+  add_test_case("test_table_read_write_seek_index",
     -+                &test_table_read_write_seek_index);
     -+  add_test_case("test_table_read_write_refs_for_no_index",
     -+                &test_table_refs_for_no_index);
     -+  add_test_case("test_table_read_write_refs_for_obj_index",
     -+                &test_table_refs_for_obj_index);
     -+  test_main();
     ++void test_buffer(void)
     ++{
     ++	struct slice buf = {};
     ++
     ++	byte in[] = "hello";
     ++	slice_write(&buf, in, sizeof(in));
     ++	struct block_source source;
     ++	block_source_from_slice(&source, &buf);
     ++	assert(block_source_size(source) == 6);
     ++	struct block out = {};
     ++	int n = block_source_read_block(source, &out, 0, sizeof(in));
     ++	assert(n == sizeof(in));
     ++	assert(!memcmp(in, out.data, n));
     ++	block_source_return_block(source, &out);
     ++
     ++	n = block_source_read_block(source, &out, 1, 2);
     ++	assert(n == 2);
     ++	assert(!memcmp(out.data, "el", 2));
     ++
     ++	block_source_return_block(source, &out);
     ++	block_source_close(&source);
     ++	free(slice_yield(&buf));
     ++}
     ++
     ++void write_table(char ***names, struct slice *buf, int N, int block_size)
     ++{
     ++	*names = calloc(sizeof(char *), N + 1);
     ++
     ++	struct write_options opts = {
     ++		.block_size = block_size,
     ++	};
     ++
     ++	struct writer *w = new_writer(&slice_write_void, buf, &opts);
     ++
     ++	writer_set_limits(w, update_index, update_index);
     ++	{
     ++		struct ref_record ref = {};
     ++		int i = 0;
     ++		for (i = 0; i < N; i++) {
     ++			byte hash[SHA1_SIZE];
     ++			set_test_hash(hash, i);
     ++
     ++			char name[100];
     ++			snprintf(name, sizeof(name), "refs/heads/branch%02d",
     ++				 i);
     ++
     ++			ref.ref_name = name;
     ++			ref.value = hash;
     ++			ref.update_index = update_index;
     ++			(*names)[i] = strdup(name);
     ++
     ++			int n = writer_add_ref(w, &ref);
     ++			assert(n == 0);
     ++		}
     ++	}
     ++	{
     ++		struct log_record log = {};
     ++		int i = 0;
     ++		for (i = 0; i < N; i++) {
     ++			byte hash[SHA1_SIZE];
     ++			set_test_hash(hash, i);
     ++
     ++			char name[100];
     ++			snprintf(name, sizeof(name), "refs/heads/branch%02d",
     ++				 i);
     ++
     ++			log.ref_name = name;
     ++			log.new_hash = hash;
     ++			log.update_index = update_index;
     ++			log.message = "message";
     ++
     ++			int n = writer_add_log(w, &log);
     ++			assert(n == 0);
     ++		}
     ++	}
     ++
     ++	int n = writer_close(w);
     ++	assert(n == 0);
     ++
     ++	struct stats *stats = writer_stats(w);
     ++	int i = 0;
     ++	for (i = 0; i < stats->ref_stats.blocks; i++) {
     ++		int off = i * opts.block_size;
     ++		if (off == 0) {
     ++			off = HEADER_SIZE;
     ++		}
     ++		assert(buf->buf[off] == 'r');
     ++	}
     ++
     ++	writer_free(w);
     ++	w = NULL;
     ++}
     ++
     ++void test_log_write_read(void)
     ++{
     ++	int N = 2;
     ++	char **names = calloc(sizeof(char *), N + 1);
     ++
     ++	struct write_options opts = {
     ++		.block_size = 256,
     ++	};
     ++
     ++	struct slice buf = {};
     ++	struct writer *w = new_writer(&slice_write_void, &buf, &opts);
     ++
     ++	writer_set_limits(w, 0, N);
     ++	{
     ++		struct ref_record ref = {};
     ++		int i = 0;
     ++		for (i = 0; i < N; i++) {
     ++			char name[256];
     ++			snprintf(name, sizeof(name), "b%02d%0*d", i, 130, 7);
     ++			names[i] = strdup(name);
     ++			puts(name);
     ++			ref.ref_name = name;
     ++			ref.update_index = i;
     ++
     ++			int err = writer_add_ref(w, &ref);
     ++			assert_err(err);
     ++		}
     ++	}
     ++
     ++	{
     ++		struct log_record log = {};
     ++		int i = 0;
     ++		for (i = 0; i < N; i++) {
     ++			byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
     ++			set_test_hash(hash1, i);
     ++			set_test_hash(hash2, i + 1);
     ++
     ++			log.ref_name = names[i];
     ++			log.update_index = i;
     ++			log.old_hash = hash1;
     ++			log.new_hash = hash2;
     ++
     ++			int err = writer_add_log(w, &log);
     ++			assert_err(err);
     ++		}
     ++	}
     ++
     ++	int n = writer_close(w);
     ++	assert(n == 0);
     ++
     ++	struct stats *stats = writer_stats(w);
     ++	assert(stats->log_stats.blocks > 0);
     ++	writer_free(w);
     ++	w = NULL;
     ++
     ++	struct block_source source = {};
     ++	block_source_from_slice(&source, &buf);
     ++
     ++	struct reader rd = {};
     ++	int err = init_reader(&rd, source, "file.log");
     ++	assert(err == 0);
     ++
     ++	{
     ++		struct iterator it = {};
     ++		err = reader_seek_ref(&rd, &it, names[N - 1]);
     ++		assert(err == 0);
     ++
     ++		struct ref_record ref = {};
     ++		err = iterator_next_ref(it, &ref);
     ++		assert_err(err);
     ++
     ++		/* end of iteration. */
     ++		err = iterator_next_ref(it, &ref);
     ++		assert(0 < err);
     ++
     ++		iterator_destroy(&it);
     ++		ref_record_clear(&ref);
     ++	}
     ++
     ++	{
     ++		struct iterator it = {};
     ++		err = reader_seek_log(&rd, &it, "");
     ++		assert(err == 0);
     ++
     ++		struct log_record log = {};
     ++		int i = 0;
     ++		while (true) {
     ++			int err = iterator_next_log(it, &log);
     ++			if (err > 0) {
     ++				break;
     ++			}
     ++
     ++			assert_err(err);
     ++			assert_streq(names[i], log.ref_name);
     ++			assert(i == log.update_index);
     ++			i++;
     ++		}
     ++
     ++		assert(i == N);
     ++		iterator_destroy(&it);
     ++	}
     ++
     ++	/* cleanup. */
     ++	free(slice_yield(&buf));
     ++	free_names(names);
     ++	reader_close(&rd);
     ++}
     ++
     ++void test_table_read_write_sequential(void)
     ++{
     ++	char **names;
     ++	struct slice buf = {};
     ++	int N = 50;
     ++	write_table(&names, &buf, N, 256);
     ++
     ++	struct block_source source = {};
     ++	block_source_from_slice(&source, &buf);
     ++
     ++	struct reader rd = {};
     ++	int err = init_reader(&rd, source, "file.ref");
     ++	assert(err == 0);
     ++
     ++	struct iterator it = {};
     ++	err = reader_seek_ref(&rd, &it, "");
     ++	assert(err == 0);
     ++
     ++	int j = 0;
     ++	while (true) {
     ++		struct ref_record ref = {};
     ++		int r = iterator_next_ref(it, &ref);
     ++		assert(r >= 0);
     ++		if (r > 0) {
     ++			break;
     ++		}
     ++		assert(0 == strcmp(names[j], ref.ref_name));
     ++		assert(update_index == ref.update_index);
     ++
     ++		j++;
     ++		ref_record_clear(&ref);
     ++	}
     ++	assert(j == N);
     ++	iterator_destroy(&it);
     ++	free(slice_yield(&buf));
     ++	free_names(names);
     ++
     ++	reader_close(&rd);
     ++}
     ++
     ++void test_table_write_small_table(void)
     ++{
     ++	char **names;
     ++	struct slice buf = {};
     ++	int N = 1;
     ++	write_table(&names, &buf, N, 4096);
     ++	assert(buf.len < 200);
     ++	free(slice_yield(&buf));
     ++	free_names(names);
     ++}
     ++
     ++void test_table_read_api(void)
     ++{
     ++	char **names;
     ++	struct slice buf = {};
     ++	int N = 50;
     ++	write_table(&names, &buf, N, 256);
     ++
     ++	struct reader rd = {};
     ++	struct block_source source = {};
     ++	block_source_from_slice(&source, &buf);
     ++
     ++	int err = init_reader(&rd, source, "file.ref");
     ++	assert(err == 0);
     ++
     ++	struct iterator it = {};
     ++	err = reader_seek_ref(&rd, &it, names[0]);
     ++	assert(err == 0);
     ++
     ++	struct log_record log = {};
     ++	err = iterator_next_log(it, &log);
     ++	assert(err == API_ERROR);
     ++
     ++	free(slice_yield(&buf));
     ++	int i = 0;
     ++	for (i = 0; i < N; i++) {
     ++		free(names[i]);
     ++	}
     ++	free(names);
     ++	reader_close(&rd);
     ++}
     ++
     ++void test_table_read_write_seek(bool index)
     ++{
     ++	char **names;
     ++	struct slice buf = {};
     ++	int N = 50;
     ++	write_table(&names, &buf, N, 256);
     ++
     ++	struct reader rd = {};
     ++	struct block_source source = {};
     ++	block_source_from_slice(&source, &buf);
     ++
     ++	int err = init_reader(&rd, source, "file.ref");
     ++	assert(err == 0);
     ++
     ++	if (!index) {
     ++		rd.ref_offsets.index_offset = 0;
     ++	}
     ++
     ++	int i = 0;
     ++	for (i = 1; i < N; i++) {
     ++		struct iterator it = {};
     ++		int err = reader_seek_ref(&rd, &it, names[i]);
     ++		assert(err == 0);
     ++		struct ref_record ref = {};
     ++		err = iterator_next_ref(it, &ref);
     ++		assert(err == 0);
     ++		assert(0 == strcmp(names[i], ref.ref_name));
     ++		assert(i == ref.value[0]);
     ++
     ++		ref_record_clear(&ref);
     ++		iterator_destroy(&it);
     ++	}
     ++
     ++	free(slice_yield(&buf));
     ++	for (i = 0; i < N; i++) {
     ++		free(names[i]);
     ++	}
     ++	free(names);
     ++	reader_close(&rd);
     ++}
     ++
     ++void test_table_read_write_seek_linear(void)
     ++{
     ++	test_table_read_write_seek(false);
     ++}
     ++
     ++void test_table_read_write_seek_index(void)
     ++{
     ++	test_table_read_write_seek(true);
     ++}
     ++
     ++void test_table_refs_for(bool indexed)
     ++{
     ++	int N = 50;
     ++
     ++	char **want_names = calloc(sizeof(char *), N + 1);
     ++
     ++	int want_names_len = 0;
     ++	byte want_hash[SHA1_SIZE];
     ++	set_test_hash(want_hash, 4);
     ++
     ++	struct write_options opts = {
     ++		.block_size = 256,
     ++	};
     ++
     ++	struct slice buf = {};
     ++	struct writer *w = new_writer(&slice_write_void, &buf, &opts);
     ++	{
     ++		struct ref_record ref = {};
     ++		int i = 0;
     ++		for (i = 0; i < N; i++) {
     ++			byte hash[SHA1_SIZE];
     ++			memset(hash, i, sizeof(hash));
     ++			char fill[51] = {};
     ++			memset(fill, 'x', 50);
     ++			char name[100];
     ++			/* Put the variable part in the start */
     ++			snprintf(name, sizeof(name), "br%02d%s", i, fill);
     ++			name[40] = 0;
     ++			ref.ref_name = name;
     ++
     ++			byte hash1[SHA1_SIZE];
     ++			byte hash2[SHA1_SIZE];
     ++
     ++			set_test_hash(hash1, i / 4);
     ++			set_test_hash(hash2, 3 + i / 4);
     ++			ref.value = hash1;
     ++			ref.target_value = hash2;
     ++
     ++			/* 80 bytes / entry, so 3 entries per block. Yields 17 */
     ++			/* blocks. */
     ++			int n = writer_add_ref(w, &ref);
     ++			assert(n == 0);
     ++
     ++			if (!memcmp(hash1, want_hash, SHA1_SIZE) ||
     ++			    !memcmp(hash2, want_hash, SHA1_SIZE)) {
     ++				want_names[want_names_len++] = strdup(name);
     ++			}
     ++		}
     ++	}
     ++
     ++	int n = writer_close(w);
     ++	assert(n == 0);
     ++
     ++	writer_free(w);
     ++	w = NULL;
     ++
     ++	struct reader rd;
     ++	struct block_source source = {};
     ++	block_source_from_slice(&source, &buf);
     ++
     ++	int err = init_reader(&rd, source, "file.ref");
     ++	assert(err == 0);
     ++	if (!indexed) {
     ++		rd.obj_offsets.present = 0;
     ++	}
     ++
     ++	struct iterator it = {};
     ++	err = reader_seek_ref(&rd, &it, "");
     ++	assert(err == 0);
     ++	iterator_destroy(&it);
     ++
     ++	err = reader_refs_for(&rd, &it, want_hash, SHA1_SIZE);
     ++	assert(err == 0);
     ++
     ++	struct ref_record ref = {};
     ++
     ++	int j = 0;
     ++	while (true) {
     ++		int err = iterator_next_ref(it, &ref);
     ++		assert(err >= 0);
     ++		if (err > 0) {
     ++			break;
     ++		}
     ++
     ++		assert(j < want_names_len);
     ++		assert(0 == strcmp(ref.ref_name, want_names[j]));
     ++		j++;
     ++		ref_record_clear(&ref);
     ++	}
     ++	assert(j == want_names_len);
     ++
     ++	free(slice_yield(&buf));
     ++	free_names(want_names);
     ++	iterator_destroy(&it);
     ++	reader_close(&rd);
     ++}
     ++
     ++void test_table_refs_for_no_index(void)
     ++{
     ++	test_table_refs_for(false);
     ++}
     ++
     ++void test_table_refs_for_obj_index(void)
     ++{
     ++	test_table_refs_for(true);
     ++}
     ++
     ++int main()
     ++{
     ++	add_test_case("test_log_write_read", test_log_write_read);
     ++	add_test_case("test_table_write_small_table",
     ++		      &test_table_write_small_table);
     ++	add_test_case("test_buffer", &test_buffer);
     ++	add_test_case("test_table_read_api", &test_table_read_api);
     ++	add_test_case("test_table_read_write_sequential",
     ++		      &test_table_read_write_sequential);
     ++	add_test_case("test_table_read_write_seek_linear",
     ++		      &test_table_read_write_seek_linear);
     ++	add_test_case("test_table_read_write_seek_index",
     ++		      &test_table_read_write_seek_index);
     ++	add_test_case("test_table_read_write_refs_for_no_index",
     ++		      &test_table_refs_for_no_index);
     ++	add_test_case("test_table_read_write_refs_for_obj_index",
     ++		      &test_table_refs_for_obj_index);
     ++	test_main();
      +}
      
       diff --git a/reftable/slice.c b/reftable/slice.c
     @@ -5477,174 +5551,194 @@
      +
      +#include "slice.h"
      +
     -+#include <assert.h>
     -+#include <stdlib.h>
     -+#include <string.h>
     ++#include "system.h"
      +
      +#include "reftable.h"
      +
     -+void slice_set_string(struct slice *s, const char *str) {
     -+  if (str == NULL) {
     -+    s->len = 0;
     -+    return;
     -+  }
     ++void slice_set_string(struct slice *s, const char *str)
     ++{
     ++	if (str == NULL) {
     ++		s->len = 0;
     ++		return;
     ++	}
      +
     -+  {
     -+    int l = strlen(str);
     -+    l++;  // \0
     -+    slice_resize(s, l);
     -+    memcpy(s->buf, str, l);
     -+    s->len = l - 1;
     -+  }
     ++	{
     ++		int l = strlen(str);
     ++		l++; /* \0 */
     ++		slice_resize(s, l);
     ++		memcpy(s->buf, str, l);
     ++		s->len = l - 1;
     ++	}
      +}
      +
     -+void slice_resize(struct slice *s, int l) {
     -+  if (s->cap < l) {
     -+    int c = s->cap * 2;
     -+    if (c < l) {
     -+      c = l;
     -+    }
     -+    s->cap = c;
     -+    s->buf = realloc(s->buf, s->cap);
     -+  }
     -+  s->len = l;
     ++void slice_resize(struct slice *s, int l)
     ++{
     ++	if (s->cap < l) {
     ++		int c = s->cap * 2;
     ++		if (c < l) {
     ++			c = l;
     ++		}
     ++		s->cap = c;
     ++		s->buf = realloc(s->buf, s->cap);
     ++	}
     ++	s->len = l;
      +}
      +
     -+void slice_append_string(struct slice *d, const char *s) {
     -+  int l1 = d->len;
     -+  int l2 = strlen(s);
     ++void slice_append_string(struct slice *d, const char *s)
     ++{
     ++	int l1 = d->len;
     ++	int l2 = strlen(s);
      +
     -+  slice_resize(d, l2 + l1);
     -+  memcpy(d->buf + l1, s, l2);
     ++	slice_resize(d, l2 + l1);
     ++	memcpy(d->buf + l1, s, l2);
      +}
      +
     -+void slice_append(struct slice *s, struct slice a) {
     -+  int end = s->len;
     -+  slice_resize(s, s->len + a.len);
     -+  memcpy(s->buf + end, a.buf, a.len);
     ++void slice_append(struct slice *s, struct slice a)
     ++{
     ++	int end = s->len;
     ++	slice_resize(s, s->len + a.len);
     ++	memcpy(s->buf + end, a.buf, a.len);
      +}
      +
     -+byte *slice_yield(struct slice *s) {
     -+  byte *p = s->buf;
     -+  s->buf = NULL;
     -+  s->cap = 0;
     -+  s->len = 0;
     -+  return p;
     ++byte *slice_yield(struct slice *s)
     ++{
     ++	byte *p = s->buf;
     ++	s->buf = NULL;
     ++	s->cap = 0;
     ++	s->len = 0;
     ++	return p;
      +}
      +
     -+void slice_copy(struct slice *dest, struct slice src) {
     -+  slice_resize(dest, src.len);
     -+  memcpy(dest->buf, src.buf, src.len);
     ++void slice_copy(struct slice *dest, struct slice src)
     ++{
     ++	slice_resize(dest, src.len);
     ++	memcpy(dest->buf, src.buf, src.len);
      +}
      +
      +/* return the underlying data as char*. len is left unchanged, but
      +   a \0 is added at the end. */
     -+const char *slice_as_string(struct slice *s) {
     -+  if (s->cap == s->len) {
     -+    int l = s->len;
     -+    slice_resize(s, l + 1);
     -+    s->len = l;
     -+  }
     -+  s->buf[s->len] = 0;
     -+  return (const char *)s->buf;
     ++const char *slice_as_string(struct slice *s)
     ++{
     ++	if (s->cap == s->len) {
     ++		int l = s->len;
     ++		slice_resize(s, l + 1);
     ++		s->len = l;
     ++	}
     ++	s->buf[s->len] = 0;
     ++	return (const char *)s->buf;
      +}
      +
      +/* return a newly malloced string for this slice */
     -+char *slice_to_string(struct slice in) {
     -+  struct slice s = {};
     -+  slice_resize(&s, in.len + 1);
     -+  s.buf[in.len] = 0;
     -+  memcpy(s.buf, in.buf, in.len);
     -+  return (char *)slice_yield(&s);
     ++char *slice_to_string(struct slice in)
     ++{
     ++	struct slice s = {};
     ++	slice_resize(&s, in.len + 1);
     ++	s.buf[in.len] = 0;
     ++	memcpy(s.buf, in.buf, in.len);
     ++	return (char *)slice_yield(&s);
      +}
      +
     -+bool slice_equal(struct slice a, struct slice b) {
     -+  if (a.len != b.len) {
     -+    return 0;
     -+  }
     -+  return memcmp(a.buf, b.buf, a.len) == 0;
     ++bool slice_equal(struct slice a, struct slice b)
     ++{
     ++	if (a.len != b.len) {
     ++		return 0;
     ++	}
     ++	return memcmp(a.buf, b.buf, a.len) == 0;
      +}
      +
     -+int slice_compare(struct slice a, struct slice b) {
     -+  int min = a.len < b.len ? a.len : b.len;
     -+  int res = memcmp(a.buf, b.buf, min);
     -+  if (res != 0) {
     -+    return res;
     -+  }
     -+  if (a.len < b.len) {
     -+    return -1;
     -+  } else if (a.len > b.len) {
     -+    return 1;
     -+  } else {
     -+    return 0;
     -+  }
     ++int slice_compare(struct slice a, struct slice b)
     ++{
     ++	int min = a.len < b.len ? a.len : b.len;
     ++	int res = memcmp(a.buf, b.buf, min);
     ++	if (res != 0) {
     ++		return res;
     ++	}
     ++	if (a.len < b.len) {
     ++		return -1;
     ++	} else if (a.len > b.len) {
     ++		return 1;
     ++	} else {
     ++		return 0;
     ++	}
      +}
      +
     -+int slice_write(struct slice *b, byte *data, int sz) {
     -+  if (b->len + sz > b->cap) {
     -+    int newcap = 2 * b->cap + 1;
     -+    if (newcap < b->len + sz) {
     -+      newcap = (b->len + sz);
     -+    }
     -+    b->buf = realloc(b->buf, newcap);
     -+    b->cap = newcap;
     -+  }
     ++int slice_write(struct slice *b, byte *data, int sz)
     ++{
     ++	if (b->len + sz > b->cap) {
     ++		int newcap = 2 * b->cap + 1;
     ++		if (newcap < b->len + sz) {
     ++			newcap = (b->len + sz);
     ++		}
     ++		b->buf = realloc(b->buf, newcap);
     ++		b->cap = newcap;
     ++	}
      +
     -+  memcpy(b->buf + b->len, data, sz);
     -+  b->len += sz;
     -+  return sz;
     ++	memcpy(b->buf + b->len, data, sz);
     ++	b->len += sz;
     ++	return sz;
      +}
      +
     -+int slice_write_void(void *b, byte *data, int sz) {
     -+  return slice_write((struct slice *)b, data, sz);
     ++int slice_write_void(void *b, byte *data, int sz)
     ++{
     ++	return slice_write((struct slice *)b, data, sz);
      +}
      +
     -+static uint64_t slice_size(void *b) { return ((struct slice *)b)->len; }
     ++static uint64_t slice_size(void *b)
     ++{
     ++	return ((struct slice *)b)->len;
     ++}
      +
     -+static void slice_return_block(void *b, struct block *dest) {
     -+  memset(dest->data, 0xff, dest->len);
     -+  free(dest->data);
     ++static void slice_return_block(void *b, struct block *dest)
     ++{
     ++	memset(dest->data, 0xff, dest->len);
     ++	free(dest->data);
      +}
      +
     -+static void slice_close(void *b) {}
     ++static void slice_close(void *b)
     ++{
     ++}
      +
      +static int slice_read_block(void *v, struct block *dest, uint64_t off,
     -+                            uint32_t size) {
     -+  struct slice *b = (struct slice *)v;
     -+  assert(off + size <= b->len);
     -+  dest->data = calloc(size, 1);
     -+  memcpy(dest->data, b->buf + off, size);
     -+  dest->len = size;
     -+  return size;
     ++			    uint32_t size)
     ++{
     ++	struct slice *b = (struct slice *)v;
     ++	assert(off + size <= b->len);
     ++	dest->data = calloc(size, 1);
     ++	memcpy(dest->data, b->buf + off, size);
     ++	dest->len = size;
     ++	return size;
      +}
      +
      +struct block_source_vtable slice_vtable = {
     -+    .size = &slice_size,
     -+    .read_block = &slice_read_block,
     -+    .return_block = &slice_return_block,
     -+    .close = &slice_close,
     ++	.size = &slice_size,
     ++	.read_block = &slice_read_block,
     ++	.return_block = &slice_return_block,
     ++	.close = &slice_close,
      +};
      +
     -+void block_source_from_slice(struct block_source *bs, struct slice *buf) {
     -+  bs->ops = &slice_vtable;
     -+  bs->arg = buf;
     ++void block_source_from_slice(struct block_source *bs, struct slice *buf)
     ++{
     ++	bs->ops = &slice_vtable;
     ++	bs->arg = buf;
      +}
      +
     -+static void malloc_return_block(void *b, struct block *dest) {
     -+  memset(dest->data, 0xff, dest->len);
     -+  free(dest->data);
     ++static void malloc_return_block(void *b, struct block *dest)
     ++{
     ++	memset(dest->data, 0xff, dest->len);
     ++	free(dest->data);
      +}
      +
      +struct block_source_vtable malloc_vtable = {
     -+    .return_block = &malloc_return_block,
     ++	.return_block = &malloc_return_block,
      +};
      +
      +struct block_source malloc_block_source_instance = {
     -+    .ops = &malloc_vtable,
     ++	.ops = &malloc_vtable,
      +};
      +
     -+struct block_source malloc_block_source(void) {
     -+  return malloc_block_source_instance;
     ++struct block_source malloc_block_source(void)
     ++{
     ++	return malloc_block_source_instance;
      +}
      
       diff --git a/reftable/slice.h b/reftable/slice.h
     @@ -5667,9 +5761,9 @@
      +#include "reftable.h"
      +
      +struct slice {
     -+  byte *buf;
     -+  int len;
     -+  int cap;
     ++	byte *buf;
     ++	int len;
     ++	int cap;
      +};
      +
      +void slice_set_string(struct slice *dest, const char *);
     @@ -5707,33 +5801,33 @@
      +
      +#include "slice.h"
      +
     -+#include <assert.h>
     -+#include <stdlib.h>
     -+#include <string.h>
     ++#include "system.h"
      +
      +#include "basics.h"
      +#include "record.h"
      +#include "reftable.h"
      +#include "test_framework.h"
      +
     -+void test_slice(void) {
     -+  struct slice s = {};
     -+  slice_set_string(&s, "abc");
     -+  assert(0 == strcmp("abc", slice_as_string(&s)));
     ++void test_slice(void)
     ++{
     ++	struct slice s = {};
     ++	slice_set_string(&s, "abc");
     ++	assert(0 == strcmp("abc", slice_as_string(&s)));
      +
     -+  struct slice t = {};
     -+  slice_set_string(&t, "pqr");
     ++	struct slice t = {};
     ++	slice_set_string(&t, "pqr");
      +
     -+  slice_append(&s, t);
     -+  assert(0 == strcmp("abcpqr", slice_as_string(&s)));
     ++	slice_append(&s, t);
     ++	assert(0 == strcmp("abcpqr", slice_as_string(&s)));
      +
     -+  free(slice_yield(&s));
     -+  free(slice_yield(&t));
     ++	free(slice_yield(&s));
     ++	free(slice_yield(&t));
      +}
      +
     -+int main() {
     -+  add_test_case("test_slice", &test_slice);
     -+  test_main();
     ++int main()
     ++{
     ++	add_test_case("test_slice", &test_slice);
     ++	test_main();
      +}
      
       diff --git a/reftable/stack.c b/reftable/stack.c
     @@ -5751,926 +5845,980 @@
      +
      +#include "stack.h"
      +
     -+#include <assert.h>
     -+#include <errno.h>
     -+#include <fcntl.h>
     -+#include <stdio.h>
     -+#include <stdlib.h>
     -+#include <string.h>
     -+#include <sys/stat.h>
     -+#include <sys/time.h>
     -+#include <sys/types.h>
     -+#include <unistd.h>
     -+
     ++#include "system.h"
      +#include "merged.h"
      +#include "reader.h"
      +#include "reftable.h"
      +#include "writer.h"
      +
      +int new_stack(struct stack **dest, const char *dir, const char *list_file,
     -+              struct write_options config) {
     -+  struct stack *p = calloc(sizeof(struct stack), 1);
     -+  int err = 0;
     -+  *dest = NULL;
     -+  p->list_file = strdup(list_file);
     -+  p->reftable_dir = strdup(dir);
     -+  p->config = config;
     -+
     -+  err = stack_reload(p);
     -+  if (err < 0) {
     -+    stack_destroy(p);
     -+  } else {
     -+    *dest = p;
     -+  }
     -+  return err;
     -+}
     -+
     -+static int fread_lines(FILE *f, char ***namesp) {
     -+  long size = 0;
     -+  int err = fseek(f, 0, SEEK_END);
     -+  char *buf = NULL;
     -+  if (err < 0) {
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+  size = ftell(f);
     -+  if (size < 0) {
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+  err = fseek(f, 0, SEEK_SET);
     -+  if (err < 0) {
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+
     -+  buf = malloc(size + 1);
     -+  if (fread(buf, 1, size, f) != size) {
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+  buf[size] = 0;
     -+
     -+  parse_names(buf, size, namesp);
     ++	      struct write_options config)
     ++{
     ++	struct stack *p = calloc(sizeof(struct stack), 1);
     ++	int err = 0;
     ++	*dest = NULL;
     ++	p->list_file = strdup(list_file);
     ++	p->reftable_dir = strdup(dir);
     ++	p->config = config;
     ++
     ++	err = stack_reload(p);
     ++	if (err < 0) {
     ++		stack_destroy(p);
     ++	} else {
     ++		*dest = p;
     ++	}
     ++	return err;
     ++}
     ++
     ++static int fread_lines(FILE *f, char ***namesp)
     ++{
     ++	long size = 0;
     ++	int err = fseek(f, 0, SEEK_END);
     ++	char *buf = NULL;
     ++	if (err < 0) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++	size = ftell(f);
     ++	if (size < 0) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++	err = fseek(f, 0, SEEK_SET);
     ++	if (err < 0) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	buf = malloc(size + 1);
     ++	if (fread(buf, 1, size, f) != size) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++	buf[size] = 0;
     ++
     ++	parse_names(buf, size, namesp);
      +exit:
     -+  free(buf);
     -+  return err;
     ++	free(buf);
     ++	return err;
      +}
      +
     -+int read_lines(const char *filename, char ***namesp) {
     -+  FILE *f = fopen(filename, "r");
     -+  int err = 0;
     -+  if (f == NULL) {
     -+    if (errno == ENOENT) {
     -+      *namesp = calloc(sizeof(char *), 1);
     -+      return 0;
     -+    }
     ++int read_lines(const char *filename, char ***namesp)
     ++{
     ++	FILE *f = fopen(filename, "r");
     ++	int err = 0;
     ++	if (f == NULL) {
     ++		if (errno == ENOENT) {
     ++			*namesp = calloc(sizeof(char *), 1);
     ++			return 0;
     ++		}
      +
     -+    return IO_ERROR;
     -+  }
     -+  err = fread_lines(f, namesp);
     -+  fclose(f);
     -+  return err;
     ++		return IO_ERROR;
     ++	}
     ++	err = fread_lines(f, namesp);
     ++	fclose(f);
     ++	return err;
      +}
      +
     -+struct merged_table *stack_merged_table(struct stack *st) {
     -+  return st->merged;
     ++struct merged_table *stack_merged_table(struct stack *st)
     ++{
     ++	return st->merged;
      +}
      +
      +/* Close and free the stack */
     -+void stack_destroy(struct stack *st) {
     -+  if (st->merged == NULL) {
     -+    return;
     -+  }
     -+
     -+  merged_table_close(st->merged);
     -+  merged_table_free(st->merged);
     -+  st->merged = NULL;
     -+
     -+  free(st->list_file);
     -+  st->list_file = NULL;
     -+  free(st->reftable_dir);
     -+  st->reftable_dir = NULL;
     -+  free(st);
     -+}
     -+
     -+static struct reader **stack_copy_readers(struct stack *st, int cur_len) {
     -+  struct reader **cur = calloc(sizeof(struct reader *), cur_len);
     -+  for (int i = 0; i < cur_len; i++) {
     -+    cur[i] = st->merged->stack[i];
     -+  }
     -+  return cur;
     -+}
     -+
     -+static int stack_reload_once(struct stack *st, char **names, bool reuse_open) {
     -+  int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
     -+  struct reader **cur = stack_copy_readers(st, cur_len);
     -+  int err = 0;
     -+  int names_len = names_length(names);
     -+  struct reader **new_tables = malloc(sizeof(struct reader *) * names_len);
     -+  int new_tables_len = 0;
     -+  struct merged_table *new_merged = NULL;
     -+
     -+  struct slice table_path = {};
     -+
     -+  while (*names) {
     -+    struct reader *rd = NULL;
     -+    char *name = *names++;
     -+
     -+    // this is linear; we assume compaction keeps the number of tables
     -+    // under control so this is not quadratic.
     -+    for (int j = 0; reuse_open && j < cur_len; j++) {
     -+      if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
     -+        rd = cur[j];
     -+        cur[j] = NULL;
     -+        break;
     -+      }
     -+    }
     -+
     -+    if (rd == NULL) {
     -+      struct block_source src = {};
     -+      slice_set_string(&table_path, st->reftable_dir);
     -+      slice_append_string(&table_path, "/");
     -+      slice_append_string(&table_path, name);
     -+
     -+      err = block_source_from_file(&src, slice_as_string(&table_path));
     -+      if (err < 0) {
     -+        goto exit;
     -+      }
     -+
     -+      err = new_reader(&rd, src, name);
     -+      if (err < 0) {
     -+        goto exit;
     -+      }
     -+    }
     -+
     -+    new_tables[new_tables_len++] = rd;
     -+  }
     -+
     -+  // success!
     -+  err = new_merged_table(&new_merged, new_tables, new_tables_len);
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+
     -+  new_tables = NULL;
     -+  new_tables_len = 0;
     -+  if (st->merged != NULL) {
     -+    merged_table_clear(st->merged);
     -+    merged_table_free(st->merged);
     -+  }
     -+  st->merged = new_merged;
     -+
     -+  for (int i = 0; i < cur_len; i++) {
     -+    if (cur[i] != NULL) {
     -+      reader_close(cur[i]);
     -+      reader_free(cur[i]);
     -+    }
     -+  }
     -+
     ++void stack_destroy(struct stack *st)
     ++{
     ++	if (st->merged == NULL) {
     ++		return;
     ++	}
     ++
     ++	merged_table_close(st->merged);
     ++	merged_table_free(st->merged);
     ++	st->merged = NULL;
     ++
     ++	free(st->list_file);
     ++	st->list_file = NULL;
     ++	free(st->reftable_dir);
     ++	st->reftable_dir = NULL;
     ++	free(st);
     ++}
     ++
     ++static struct reader **stack_copy_readers(struct stack *st, int cur_len)
     ++{
     ++	struct reader **cur = calloc(sizeof(struct reader *), cur_len);
     ++	int i = 0;
     ++	for (i = 0; i < cur_len; i++) {
     ++		cur[i] = st->merged->stack[i];
     ++	}
     ++	return cur;
     ++}
     ++
     ++static int stack_reload_once(struct stack *st, char **names, bool reuse_open)
     ++{
     ++	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
     ++	struct reader **cur = stack_copy_readers(st, cur_len);
     ++	int err = 0;
     ++	int names_len = names_length(names);
     ++	struct reader **new_tables =
     ++		malloc(sizeof(struct reader *) * names_len);
     ++	int new_tables_len = 0;
     ++	struct merged_table *new_merged = NULL;
     ++
     ++	struct slice table_path = {};
     ++
     ++	while (*names) {
     ++		struct reader *rd = NULL;
     ++		char *name = *names++;
     ++
     ++		/* this is linear; we assume compaction keeps the number of
     ++		   tables under control so this is not quadratic. */
     ++		int j = 0;
     ++		for (j = 0; reuse_open && j < cur_len; j++) {
     ++			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
     ++				rd = cur[j];
     ++				cur[j] = NULL;
     ++				break;
     ++			}
     ++		}
     ++
     ++		if (rd == NULL) {
     ++			struct block_source src = {};
     ++			slice_set_string(&table_path, st->reftable_dir);
     ++			slice_append_string(&table_path, "/");
     ++			slice_append_string(&table_path, name);
     ++
     ++			err = block_source_from_file(
     ++				&src, slice_as_string(&table_path));
     ++			if (err < 0) {
     ++				goto exit;
     ++			}
     ++
     ++			err = new_reader(&rd, src, name);
     ++			if (err < 0) {
     ++				goto exit;
     ++			}
     ++		}
     ++
     ++		new_tables[new_tables_len++] = rd;
     ++	}
     ++
     ++	/* success! */
     ++	err = new_merged_table(&new_merged, new_tables, new_tables_len);
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	new_tables = NULL;
     ++	new_tables_len = 0;
     ++	if (st->merged != NULL) {
     ++		merged_table_clear(st->merged);
     ++		merged_table_free(st->merged);
     ++	}
     ++	st->merged = new_merged;
     ++
     ++	{
     ++		int i = 0;
     ++		for (i = 0; i < cur_len; i++) {
     ++			if (cur[i] != NULL) {
     ++				reader_close(cur[i]);
     ++				reader_free(cur[i]);
     ++			}
     ++		}
     ++	}
      +exit:
     -+  free(slice_yield(&table_path));
     -+  for (int i = 0; i < new_tables_len; i++) {
     -+    reader_close(new_tables[i]);
     -+  }
     -+  free(new_tables);
     -+  free(cur);
     -+  return err;
     -+}
     -+
     -+// return negative if a before b.
     -+static int tv_cmp(struct timeval *a, struct timeval *b) {
     -+  time_t diff = a->tv_sec - b->tv_sec;
     -+  suseconds_t udiff = a->tv_usec - b->tv_usec;
     -+
     -+  if (diff != 0) {
     -+    return diff;
     -+  }
     -+
     -+  return udiff;
     -+}
     -+
     -+static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open) {
     -+  struct timeval deadline = {};
     -+  int err = gettimeofday(&deadline, NULL);
     -+  int64_t delay = 0;
     -+  int tries = 0;
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+
     -+  deadline.tv_sec += 3;
     -+  while (true) {
     -+    char **names = NULL;
     -+    char **names_after = NULL;
     -+    struct timeval now = {};
     -+    int err = gettimeofday(&now, NULL);
     -+    if (err < 0) {
     -+      return err;
     -+    }
     -+
     -+    // Only look at deadlines after the first few times. This
     -+    // simplifies debugging in GDB
     -+    tries++;
     -+    if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
     -+      break;
     -+    }
     -+
     -+    err = read_lines(st->list_file, &names);
     -+    if (err < 0) {
     -+      free_names(names);
     -+      return err;
     -+    }
     -+    err = stack_reload_once(st, names, reuse_open);
     -+    if (err == 0) {
     -+      free_names(names);
     -+      break;
     -+    }
     -+    if (err != NOT_EXIST_ERROR) {
     -+      free_names(names);
     -+      return err;
     -+    }
     -+
     -+    err = read_lines(st->list_file, &names_after);
     -+    if (err < 0) {
     -+      free_names(names);
     -+      return err;
     -+    }
     -+
     -+    if (names_equal(names_after, names)) {
     -+      free_names(names);
     -+      free_names(names_after);
     -+      return -1;
     -+    }
     -+    free_names(names);
     -+    free_names(names_after);
     -+
     -+    delay = delay + (delay * rand()) / RAND_MAX + 100;
     -+    usleep(delay);
     -+  }
     -+
     -+  return 0;
     -+}
     -+
     -+int stack_reload(struct stack *st) {
     -+  return stack_reload_maybe_reuse(st, true);
     -+}
     -+
     -+// -1 = error
     -+// 0 = up to date
     -+// 1 = changed.
     -+static int stack_uptodate(struct stack *st) {
     -+  char **names = NULL;
     -+  int err = read_lines(st->list_file, &names);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+
     -+  for (int i = 0; i < st->merged->stack_len; i++) {
     -+    if (names[i] == NULL) {
     -+      err = 1;
     -+      goto exit;
     -+    }
     -+
     -+    if (0 != strcmp(st->merged->stack[i]->name, names[i])) {
     -+      err = 1;
     -+      goto exit;
     -+    }
     -+  }
     -+
     -+  if (names[st->merged->stack_len] != NULL) {
     -+    err = 1;
     -+    goto exit;
     -+  }
     ++	free(slice_yield(&table_path));
     ++	{
     ++		int i = 0;
     ++		for (i = 0; i < new_tables_len; i++) {
     ++			reader_close(new_tables[i]);
     ++		}
     ++	}
     ++	free(new_tables);
     ++	free(cur);
     ++	return err;
     ++}
     ++
     ++/* return negative if a before b. */
     ++static int tv_cmp(struct timeval *a, struct timeval *b)
     ++{
     ++	time_t diff = a->tv_sec - b->tv_sec;
     ++	int udiff = a->tv_usec - b->tv_usec;
     ++
     ++	if (diff != 0) {
     ++		return diff;
     ++	}
     ++
     ++	return udiff;
     ++}
     ++
     ++static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open)
     ++{
     ++	struct timeval deadline = {};
     ++	int err = gettimeofday(&deadline, NULL);
     ++	int64_t delay = 0;
     ++	int tries = 0;
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	deadline.tv_sec += 3;
     ++	while (true) {
     ++		char **names = NULL;
     ++		char **names_after = NULL;
     ++		struct timeval now = {};
     ++		int err = gettimeofday(&now, NULL);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++
     ++		/* Only look at deadlines after the first few times. This
     ++		   simplifies debugging in GDB */
     ++		tries++;
     ++		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
     ++			break;
     ++		}
     ++
     ++		err = read_lines(st->list_file, &names);
     ++		if (err < 0) {
     ++			free_names(names);
     ++			return err;
     ++		}
     ++		err = stack_reload_once(st, names, reuse_open);
     ++		if (err == 0) {
     ++			free_names(names);
     ++			break;
     ++		}
     ++		if (err != NOT_EXIST_ERROR) {
     ++			free_names(names);
     ++			return err;
     ++		}
     ++
     ++		err = read_lines(st->list_file, &names_after);
     ++		if (err < 0) {
     ++			free_names(names);
     ++			return err;
     ++		}
     ++
     ++		if (names_equal(names_after, names)) {
     ++			free_names(names);
     ++			free_names(names_after);
     ++			return -1;
     ++		}
     ++		free_names(names);
     ++		free_names(names_after);
     ++
     ++		delay = delay + (delay * rand()) / RAND_MAX + 100;
     ++		usleep(delay);
     ++	}
     ++
     ++	return 0;
     ++}
     ++
     ++int stack_reload(struct stack *st)
     ++{
     ++	return stack_reload_maybe_reuse(st, true);
     ++}
     ++
     ++/* -1 = error
     ++ 0 = up to date
     ++ 1 = changed. */
     ++static int stack_uptodate(struct stack *st)
     ++{
     ++	char **names = NULL;
     ++	int err = read_lines(st->list_file, &names);
     ++	int i = 0;
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	for (i = 0; i < st->merged->stack_len; i++) {
     ++		if (names[i] == NULL) {
     ++			err = 1;
     ++			goto exit;
     ++		}
     ++
     ++		if (strcmp(st->merged->stack[i]->name, names[i])) {
     ++			err = 1;
     ++			goto exit;
     ++		}
     ++	}
     ++
     ++	if (names[st->merged->stack_len] != NULL) {
     ++		err = 1;
     ++		goto exit;
     ++	}
      +
      +exit:
     -+  free_names(names);
     -+  return err;
     ++	free_names(names);
     ++	return err;
      +}
      +
      +int stack_add(struct stack *st, int (*write)(struct writer *wr, void *arg),
     -+              void *arg) {
     -+  int err = stack_try_add(st, write, arg);
     -+  if (err < 0) {
     -+    if (err == LOCK_ERROR) {
     -+      err = stack_reload(st);
     -+    }
     -+    return err;
     -+  }
     ++	      void *arg)
     ++{
     ++	int err = stack_try_add(st, write, arg);
     ++	if (err < 0) {
     ++		if (err == LOCK_ERROR) {
     ++			err = stack_reload(st);
     ++		}
     ++		return err;
     ++	}
      +
     -+  return stack_auto_compact(st);
     ++	return stack_auto_compact(st);
      +}
      +
     -+static void format_name(struct slice *dest, uint64_t min, uint64_t max) {
     -+  char buf[1024];
     -+  sprintf(buf, "%012lx-%012lx", min, max);
     -+  slice_set_string(dest, buf);
     ++static void format_name(struct slice *dest, uint64_t min, uint64_t max)
     ++{
     ++	char buf[100];
     ++	snprintf(buf, sizeof(buf), "%012lx-%012lx", min, max);
     ++	slice_set_string(dest, buf);
      +}
      +
      +int stack_try_add(struct stack *st,
     -+                  int (*write_table)(struct writer *wr, void *arg), void *arg) {
     -+  struct slice lock_name = {};
     -+  struct slice temp_tab_name = {};
     -+  struct slice tab_name = {};
     -+  struct slice next_name = {};
     -+  struct slice table_list = {};
     -+  struct writer *wr = NULL;
     -+  int err = 0;
     -+  int tab_fd = 0;
     -+  int lock_fd = 0;
     -+  uint64_t next_update_index = 0;
     -+
     -+  slice_set_string(&lock_name, st->list_file);
     -+  slice_append_string(&lock_name, ".lock");
     -+
     -+  lock_fd =
     -+      open(slice_as_string(&lock_name), O_EXCL | O_CREAT | O_WRONLY, 0644);
     -+  if (lock_fd < 0) {
     -+    if (errno == EEXIST) {
     -+      err = LOCK_ERROR;
     -+      goto exit;
     -+    }
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+
     -+  err = stack_uptodate(st);
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+
     -+  if (err > 1) {
     -+    err = LOCK_ERROR;
     -+    goto exit;
     -+  }
     -+
     -+  next_update_index = stack_next_update_index(st);
     -+
     -+  slice_resize(&next_name, 0);
     -+  format_name(&next_name, next_update_index, next_update_index);
     -+
     -+  slice_set_string(&temp_tab_name, st->reftable_dir);
     -+  slice_append_string(&temp_tab_name, "/");
     -+  slice_append(&temp_tab_name, next_name);
     -+  slice_append_string(&temp_tab_name, "XXXXXX");
     -+
     -+  tab_fd = mkstemp((char *)slice_as_string(&temp_tab_name));
     -+  if (tab_fd < 0) {
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+
     -+  wr = new_writer(fd_writer, &tab_fd, &st->config);
     -+  err = write_table(wr, arg);
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+
     -+  err = writer_close(wr);
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+
     -+  err = close(tab_fd);
     -+  tab_fd = 0;
     -+  if (err < 0) {
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+
     -+  if (wr->min_update_index < next_update_index) {
     -+    err = API_ERROR;
     -+    goto exit;
     -+  }
     -+
     -+  for (int i = 0; i < st->merged->stack_len; i++) {
     -+    slice_append_string(&table_list, st->merged->stack[i]->name);
     -+    slice_append_string(&table_list, "\n");
     -+  }
     -+
     -+  format_name(&next_name, wr->min_update_index, wr->max_update_index);
     -+  slice_append_string(&next_name, ".ref");
     -+  slice_append(&table_list, next_name);
     -+  slice_append_string(&table_list, "\n");
     -+
     -+  slice_set_string(&tab_name, st->reftable_dir);
     -+  slice_append_string(&tab_name, "/");
     -+  slice_append(&tab_name, next_name);
     -+
     -+  err = rename(slice_as_string(&temp_tab_name), slice_as_string(&tab_name));
     -+  if (err < 0) {
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+  free(slice_yield(&temp_tab_name));
     -+
     -+  err = write(lock_fd, table_list.buf, table_list.len);
     -+  if (err < 0) {
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+  err = close(lock_fd);
     -+  lock_fd = 0;
     -+  if (err < 0) {
     -+    unlink(slice_as_string(&tab_name));
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+
     -+  err = rename(slice_as_string(&lock_name), st->list_file);
     -+  if (err < 0) {
     -+    unlink(slice_as_string(&tab_name));
     -+    err = IO_ERROR;
     -+    goto exit;
     -+  }
     -+
     -+  err = stack_reload(st);
     ++		  int (*write_table)(struct writer *wr, void *arg), void *arg)
     ++{
     ++	struct slice lock_name = {};
     ++	struct slice temp_tab_name = {};
     ++	struct slice tab_name = {};
     ++	struct slice next_name = {};
     ++	struct slice table_list = {};
     ++	struct writer *wr = NULL;
     ++	int err = 0;
     ++	int tab_fd = 0;
     ++	int lock_fd = 0;
     ++	uint64_t next_update_index = 0;
     ++
     ++	slice_set_string(&lock_name, st->list_file);
     ++	slice_append_string(&lock_name, ".lock");
     ++
     ++	lock_fd = open(slice_as_string(&lock_name), O_EXCL | O_CREAT | O_WRONLY,
     ++		       0644);
     ++	if (lock_fd < 0) {
     ++		if (errno == EEXIST) {
     ++			err = LOCK_ERROR;
     ++			goto exit;
     ++		}
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	err = stack_uptodate(st);
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	if (err > 1) {
     ++		err = LOCK_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	next_update_index = stack_next_update_index(st);
     ++
     ++	slice_resize(&next_name, 0);
     ++	format_name(&next_name, next_update_index, next_update_index);
     ++
     ++	slice_set_string(&temp_tab_name, st->reftable_dir);
     ++	slice_append_string(&temp_tab_name, "/");
     ++	slice_append(&temp_tab_name, next_name);
     ++	slice_append_string(&temp_tab_name, ".temp.XXXXXX");
     ++
     ++	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_name));
     ++	if (tab_fd < 0) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	wr = new_writer(fd_writer, &tab_fd, &st->config);
     ++	err = write_table(wr, arg);
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	err = writer_close(wr);
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	err = close(tab_fd);
     ++	tab_fd = 0;
     ++	if (err < 0) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	if (wr->min_update_index < next_update_index) {
     ++		err = API_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	{
     ++		int i = 0;
     ++		for (i = 0; i < st->merged->stack_len; i++) {
     ++			slice_append_string(&table_list,
     ++					    st->merged->stack[i]->name);
     ++			slice_append_string(&table_list, "\n");
     ++		}
     ++	}
     ++
     ++	format_name(&next_name, wr->min_update_index, wr->max_update_index);
     ++	slice_append_string(&next_name, ".ref");
     ++	slice_append(&table_list, next_name);
     ++	slice_append_string(&table_list, "\n");
     ++
     ++	slice_set_string(&tab_name, st->reftable_dir);
     ++	slice_append_string(&tab_name, "/");
     ++	slice_append(&tab_name, next_name);
     ++
     ++	err = rename(slice_as_string(&temp_tab_name),
     ++		     slice_as_string(&tab_name));
     ++	if (err < 0) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++	free(slice_yield(&temp_tab_name));
     ++
     ++	err = write(lock_fd, table_list.buf, table_list.len);
     ++	if (err < 0) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++	err = close(lock_fd);
     ++	lock_fd = 0;
     ++	if (err < 0) {
     ++		unlink(slice_as_string(&tab_name));
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	err = rename(slice_as_string(&lock_name), st->list_file);
     ++	if (err < 0) {
     ++		unlink(slice_as_string(&tab_name));
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	err = stack_reload(st);
      +exit:
     -+  if (tab_fd > 0) {
     -+    close(tab_fd);
     -+    tab_fd = 0;
     -+  }
     -+  if (temp_tab_name.len > 0) {
     -+    unlink(slice_as_string(&temp_tab_name));
     -+  }
     -+  unlink(slice_as_string(&lock_name));
     -+
     -+  if (lock_fd > 0) {
     -+    close(lock_fd);
     -+    lock_fd = 0;
     -+  }
     -+
     -+  free(slice_yield(&lock_name));
     -+  free(slice_yield(&temp_tab_name));
     -+  free(slice_yield(&tab_name));
     -+  free(slice_yield(&next_name));
     -+  free(slice_yield(&table_list));
     -+  writer_free(wr);
     -+  return err;
     -+}
     -+
     -+uint64_t stack_next_update_index(struct stack *st) {
     -+  int sz = st->merged->stack_len;
     -+  if (sz > 0) {
     -+    return reader_max_update_index(st->merged->stack[sz - 1]) + 1;
     -+  }
     -+  return 1;
     ++	if (tab_fd > 0) {
     ++		close(tab_fd);
     ++		tab_fd = 0;
     ++	}
     ++	if (temp_tab_name.len > 0) {
     ++		unlink(slice_as_string(&temp_tab_name));
     ++	}
     ++	unlink(slice_as_string(&lock_name));
     ++
     ++	if (lock_fd > 0) {
     ++		close(lock_fd);
     ++		lock_fd = 0;
     ++	}
     ++
     ++	free(slice_yield(&lock_name));
     ++	free(slice_yield(&temp_tab_name));
     ++	free(slice_yield(&tab_name));
     ++	free(slice_yield(&next_name));
     ++	free(slice_yield(&table_list));
     ++	writer_free(wr);
     ++	return err;
     ++}
     ++
     ++uint64_t stack_next_update_index(struct stack *st)
     ++{
     ++	int sz = st->merged->stack_len;
     ++	if (sz > 0) {
     ++		return reader_max_update_index(st->merged->stack[sz - 1]) + 1;
     ++	}
     ++	return 1;
      +}
      +
      +static int stack_compact_locked(struct stack *st, int first, int last,
     -+                                struct slice *temp_tab, struct log_expiry_config *config) {
     -+  struct slice next_name = {};
     -+  int tab_fd = -1;
     -+  struct writer *wr = NULL;
     -+  int err = 0;
     -+
     -+  format_name(&next_name, reader_min_update_index(st->merged->stack[first]),
     -+              reader_max_update_index(st->merged->stack[first]));
     -+
     -+  slice_set_string(temp_tab, st->reftable_dir);
     -+  slice_append_string(temp_tab, "/");
     -+  slice_append(temp_tab, next_name);
     -+  slice_append_string(temp_tab, "XXXXXX");
     -+
     -+  tab_fd = mkstemp((char *)slice_as_string(temp_tab));
     -+  wr = new_writer(fd_writer, &tab_fd, &st->config);
     -+
     -+  err = stack_write_compact(st, wr, first, last, config);
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+  err = writer_close(wr);
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+  writer_free(wr);
     -+
     -+  err = close(tab_fd);
     -+  tab_fd = 0;
     ++				struct slice *temp_tab,
     ++				struct log_expiry_config *config)
     ++{
     ++	struct slice next_name = {};
     ++	int tab_fd = -1;
     ++	struct writer *wr = NULL;
     ++	int err = 0;
     ++
     ++	format_name(&next_name,
     ++		    reader_min_update_index(st->merged->stack[first]),
     ++		    reader_max_update_index(st->merged->stack[first]));
     ++
     ++	slice_set_string(temp_tab, st->reftable_dir);
     ++	slice_append_string(temp_tab, "/");
     ++	slice_append(temp_tab, next_name);
     ++	slice_append_string(temp_tab, ".temp.XXXXXX");
     ++
     ++	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
     ++	wr = new_writer(fd_writer, &tab_fd, &st->config);
     ++
     ++	err = stack_write_compact(st, wr, first, last, config);
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++	err = writer_close(wr);
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++	writer_free(wr);
     ++
     ++	err = close(tab_fd);
     ++	tab_fd = 0;
      +
      +exit:
     -+  if (tab_fd > 0) {
     -+    close(tab_fd);
     -+    tab_fd = 0;
     -+  }
     -+  if (err != 0 && temp_tab->len > 0) {
     -+    unlink(slice_as_string(temp_tab));
     -+    free(slice_yield(temp_tab));
     -+  }
     -+  free(slice_yield(&next_name));
     -+  return err;
     ++	if (tab_fd > 0) {
     ++		close(tab_fd);
     ++		tab_fd = 0;
     ++	}
     ++	if (err != 0 && temp_tab->len > 0) {
     ++		unlink(slice_as_string(temp_tab));
     ++		free(slice_yield(temp_tab));
     ++	}
     ++	free(slice_yield(&next_name));
     ++	return err;
      +}
      +
      +int stack_write_compact(struct stack *st, struct writer *wr, int first,
     -+                        int last, struct log_expiry_config *config) {
     -+  int subtabs_len = last - first + 1;
     -+  struct reader **subtabs = calloc(sizeof(struct reader *), last - first + 1);
     -+  struct merged_table *mt = NULL;
     -+  int err = 0;
     -+  struct iterator it = {};
     -+  struct ref_record ref = {};
     -+  struct log_record log = {};
     -+
     -+  for (int i = first, j = 0; i <= last; i++) {
     -+    struct reader *t = st->merged->stack[i];
     -+    subtabs[j++] = t;
     -+    st->stats.bytes += t->size;
     -+  }
     -+  writer_set_limits(wr, st->merged->stack[first]->min_update_index,
     -+                    st->merged->stack[last]->max_update_index);
     -+
     -+  err = new_merged_table(&mt, subtabs, subtabs_len);
     -+  if (err < 0) {
     -+    free(subtabs);
     -+    goto exit;
     -+  }
     -+
     -+  err = merged_table_seek_ref(mt, &it, "");
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+
     -+  while (true) {
     -+    err = iterator_next_ref(it, &ref);
     -+    if (err > 0) {
     -+      err = 0;
     -+      break;
     -+    }
     -+    if (err < 0) {
     -+      break;
     -+    }
     -+    if (first == 0 && ref_record_is_deletion(&ref)) {
     -+      continue;
     -+    }
     -+
     -+    err = writer_add_ref(wr, &ref);
     -+    if (err < 0) {
     -+      break;
     -+    }
     -+  }
     -+
     -+  err = merged_table_seek_log(mt, &it, "");
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+
     -+  while (true) {
     -+    err = iterator_next_log(it, &log);
     -+    if (err > 0) {
     -+      err = 0;
     -+      break;
     -+    }
     -+    if (err < 0) {
     -+      break;
     -+    }
     -+    if (first == 0 && log_record_is_deletion(&log)) {
     -+      continue;
     -+    }
     -+
     -+    // XXX collect stats?
     -+
     -+    if (config != NULL && config->time > 0 && log.time < config->time) {
     -+      continue;
     -+    }
     -+
     -+    if (config != NULL && config->min_update_index > 0 && log.update_index < config->min_update_index) {
     -+      continue;
     -+    }
     -+
     -+    err = writer_add_log(wr, &log);
     -+    if (err < 0) {
     -+      break;
     -+    }
     -+  }
     -+
     ++			int last, struct log_expiry_config *config)
     ++{
     ++	int subtabs_len = last - first + 1;
     ++	struct reader **subtabs =
     ++		calloc(sizeof(struct reader *), last - first + 1);
     ++	struct merged_table *mt = NULL;
     ++	int err = 0;
     ++	struct iterator it = {};
     ++	struct ref_record ref = {};
     ++	struct log_record log = {};
     ++
     ++	int i = 0, j = 0;
     ++	for (i = first, j = 0; i <= last; i++) {
     ++		struct reader *t = st->merged->stack[i];
     ++		subtabs[j++] = t;
     ++		st->stats.bytes += t->size;
     ++	}
     ++	writer_set_limits(wr, st->merged->stack[first]->min_update_index,
     ++			  st->merged->stack[last]->max_update_index);
     ++
     ++	err = new_merged_table(&mt, subtabs, subtabs_len);
     ++	if (err < 0) {
     ++		free(subtabs);
     ++		goto exit;
     ++	}
     ++
     ++	err = merged_table_seek_ref(mt, &it, "");
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	while (true) {
     ++		err = iterator_next_ref(it, &ref);
     ++		if (err > 0) {
     ++			err = 0;
     ++			break;
     ++		}
     ++		if (err < 0) {
     ++			break;
     ++		}
     ++		if (first == 0 && ref_record_is_deletion(&ref)) {
     ++			continue;
     ++		}
     ++
     ++		err = writer_add_ref(wr, &ref);
     ++		if (err < 0) {
     ++			break;
     ++		}
     ++	}
     ++
     ++	err = merged_table_seek_log(mt, &it, "");
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	while (true) {
     ++		err = iterator_next_log(it, &log);
     ++		if (err > 0) {
     ++			err = 0;
     ++			break;
     ++		}
     ++		if (err < 0) {
     ++			break;
     ++		}
     ++		if (first == 0 && log_record_is_deletion(&log)) {
     ++			continue;
     ++		}
     ++
     ++		/* XXX collect stats? */
     ++
     ++		if (config != NULL && config->time > 0 &&
     ++		    log.time < config->time) {
     ++			continue;
     ++		}
     ++
     ++		if (config != NULL && config->min_update_index > 0 &&
     ++		    log.update_index < config->min_update_index) {
     ++			continue;
     ++		}
     ++
     ++		err = writer_add_log(wr, &log);
     ++		if (err < 0) {
     ++			break;
     ++		}
     ++	}
      +
      +exit:
     -+  iterator_destroy(&it);
     -+  if (mt != NULL) {
     -+    merged_table_clear(mt);
     -+    merged_table_free(mt);
     -+  }
     -+  ref_record_clear(&ref);
     -+
     -+  return err;
     -+}
     -+
     -+// <  0: error. 0 == OK, > 0 attempt failed; could retry.
     -+static int stack_compact_range(struct stack *st, int first, int last, struct log_expiry_config *expiry) {
     -+  struct slice temp_tab_name = {};
     -+  struct slice new_table_name = {};
     -+  struct slice lock_file_name = {};
     -+  struct slice ref_list_contents = {};
     -+  struct slice new_table_path = {};
     -+  int err = 0;
     -+  bool have_lock = false;
     -+  int lock_file_fd = 0;
     -+  int compact_count = last - first + 1;
     -+  char **delete_on_success = calloc(sizeof(char *), compact_count + 1);
     -+  char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
     -+
     -+  if (first > last || (expiry == NULL && first == last)) {
     -+    err = 0;
     -+    goto exit;
     -+  }
     -+
     -+  st->stats.attempts++;
     -+
     -+  slice_set_string(&lock_file_name, st->list_file);
     -+  slice_append_string(&lock_file_name, ".lock");
     -+
     -+  lock_file_fd =
     -+      open(slice_as_string(&lock_file_name), O_EXCL | O_CREAT | O_WRONLY, 0644);
     -+  if (lock_file_fd < 0) {
     -+    if (errno == EEXIST) {
     -+      err = 1;
     -+    } else {
     -+      err = IO_ERROR;
     -+    }
     -+    goto exit;
     -+  }
     -+  have_lock = true;
     -+  err = stack_uptodate(st);
     -+  if (err != 0) {
     -+    goto exit;
     -+  }
     -+
     -+  for (int i = first, j = 0; i <= last; i++) {
     -+    struct slice subtab_name = {};
     -+    struct slice subtab_lock = {};
     -+    slice_set_string(&subtab_name, st->reftable_dir);
     -+    slice_append_string(&subtab_name, "/");
     -+    slice_append_string(&subtab_name, reader_name(st->merged->stack[i]));
     -+
     -+    slice_copy(&subtab_lock, subtab_name);
     -+    slice_append_string(&subtab_lock, ".lock");
     -+
     -+    {
     -+      int sublock_file_fd = open(slice_as_string(&subtab_lock),
     -+                                 O_EXCL | O_CREAT | O_WRONLY, 0644);
     -+      if (sublock_file_fd > 0) {
     -+        close(sublock_file_fd);
     -+      } else if (sublock_file_fd < 0) {
     -+        if (errno == EEXIST) {
     -+          err = 1;
     -+        }
     -+        err = IO_ERROR;
     -+      }
     -+    }
     -+
     -+    subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
     -+    delete_on_success[j] = (char *)slice_as_string(&subtab_name);
     -+    j++;
     -+
     -+    if (err != 0) {
     -+      goto exit;
     -+    }
     -+  }
     -+
     -+  err = unlink(slice_as_string(&lock_file_name));
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+  have_lock = false;
     -+
     -+  err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+
     -+  lock_file_fd =
     -+      open(slice_as_string(&lock_file_name), O_EXCL | O_CREAT | O_WRONLY, 0644);
     -+  if (lock_file_fd < 0) {
     -+    if (errno == EEXIST) {
     -+      err = 1;
     -+    } else {
     -+      err = IO_ERROR;
     -+    }
     -+    goto exit;
     -+  }
     -+  have_lock = true;
     -+
     -+  format_name(&new_table_name, st->merged->stack[first]->min_update_index,
     -+              st->merged->stack[last]->max_update_index);
     -+  slice_append_string(&new_table_name, ".ref");
     -+
     -+  slice_set_string(&new_table_path, st->reftable_dir);
     -+  slice_append_string(&new_table_path, "/");
     -+
     -+  slice_append(&new_table_path, new_table_name);
     -+
     -+  err =
     -+      rename(slice_as_string(&temp_tab_name), slice_as_string(&new_table_path));
     -+  if (err < 0) {
     -+    goto exit;
     -+  }
     -+
     -+  for (int i = 0; i < first; i++) {
     -+    slice_append_string(&ref_list_contents, st->merged->stack[i]->name);
     -+    slice_append_string(&ref_list_contents, "\n");
     -+  }
     -+  slice_append(&ref_list_contents, new_table_name);
     -+  slice_append_string(&ref_list_contents, "\n");
     -+  for (int i = last + 1; i < st->merged->stack_len; i++) {
     -+    slice_append_string(&ref_list_contents, st->merged->stack[i]->name);
     -+    slice_append_string(&ref_list_contents, "\n");
     -+  }
     -+
     -+  err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
     -+  if (err < 0) {
     -+    unlink(slice_as_string(&new_table_path));
     -+    goto exit;
     -+  }
     -+  err = close(lock_file_fd);
     -+  lock_file_fd = 0;
     -+  if (err < 0) {
     -+    unlink(slice_as_string(&new_table_path));
     -+    goto exit;
     -+  }
     -+
     -+  err = rename(slice_as_string(&lock_file_name), st->list_file);
     -+  if (err < 0) {
     -+    unlink(slice_as_string(&new_table_path));
     -+    goto exit;
     -+  }
     -+  have_lock = false;
     -+
     -+  for (char **p = delete_on_success; *p; p++) {
     -+    if (0 != strcmp(*p, slice_as_string(&new_table_path))) {
     -+      unlink(*p);
     -+    }
     -+  }
     -+
     -+  err = stack_reload_maybe_reuse(st, first < last);
     ++	iterator_destroy(&it);
     ++	if (mt != NULL) {
     ++		merged_table_clear(mt);
     ++		merged_table_free(mt);
     ++	}
     ++	ref_record_clear(&ref);
     ++
     ++	return err;
     ++}
     ++
     ++/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
     ++static int stack_compact_range(struct stack *st, int first, int last,
     ++			       struct log_expiry_config *expiry)
     ++{
     ++	struct slice temp_tab_name = {};
     ++	struct slice new_table_name = {};
     ++	struct slice lock_file_name = {};
     ++	struct slice ref_list_contents = {};
     ++	struct slice new_table_path = {};
     ++	int err = 0;
     ++	bool have_lock = false;
     ++	int lock_file_fd = 0;
     ++	int compact_count = last - first + 1;
     ++	char **delete_on_success = calloc(sizeof(char *), compact_count + 1);
     ++	char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
     ++	int i = 0;
     ++	int j = 0;
     ++
     ++	if (first > last || (expiry == NULL && first == last)) {
     ++		err = 0;
     ++		goto exit;
     ++	}
     ++
     ++	st->stats.attempts++;
     ++
     ++	slice_set_string(&lock_file_name, st->list_file);
     ++	slice_append_string(&lock_file_name, ".lock");
     ++
     ++	lock_file_fd = open(slice_as_string(&lock_file_name),
     ++			    O_EXCL | O_CREAT | O_WRONLY, 0644);
     ++	if (lock_file_fd < 0) {
     ++		if (errno == EEXIST) {
     ++			err = 1;
     ++		} else {
     ++			err = IO_ERROR;
     ++		}
     ++		goto exit;
     ++	}
     ++	have_lock = true;
     ++	err = stack_uptodate(st);
     ++	if (err != 0) {
     ++		goto exit;
     ++	}
     ++
     ++	for (i = first, j = 0; i <= last; i++) {
     ++		struct slice subtab_name = {};
     ++		struct slice subtab_lock = {};
     ++		slice_set_string(&subtab_name, st->reftable_dir);
     ++		slice_append_string(&subtab_name, "/");
     ++		slice_append_string(&subtab_name,
     ++				    reader_name(st->merged->stack[i]));
     ++
     ++		slice_copy(&subtab_lock, subtab_name);
     ++		slice_append_string(&subtab_lock, ".lock");
     ++
     ++		{
     ++			int sublock_file_fd =
     ++				open(slice_as_string(&subtab_lock),
     ++				     O_EXCL | O_CREAT | O_WRONLY, 0644);
     ++			if (sublock_file_fd > 0) {
     ++				close(sublock_file_fd);
     ++			} else if (sublock_file_fd < 0) {
     ++				if (errno == EEXIST) {
     ++					err = 1;
     ++				}
     ++				err = IO_ERROR;
     ++			}
     ++		}
     ++
     ++		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
     ++		delete_on_success[j] = (char *)slice_as_string(&subtab_name);
     ++		j++;
     ++
     ++		if (err != 0) {
     ++			goto exit;
     ++		}
     ++	}
     ++
     ++	err = unlink(slice_as_string(&lock_file_name));
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++	have_lock = false;
     ++
     ++	err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	lock_file_fd = open(slice_as_string(&lock_file_name),
     ++			    O_EXCL | O_CREAT | O_WRONLY, 0644);
     ++	if (lock_file_fd < 0) {
     ++		if (errno == EEXIST) {
     ++			err = 1;
     ++		} else {
     ++			err = IO_ERROR;
     ++		}
     ++		goto exit;
     ++	}
     ++	have_lock = true;
     ++
     ++	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
     ++		    st->merged->stack[last]->max_update_index);
     ++	slice_append_string(&new_table_name, ".ref");
     ++
     ++	slice_set_string(&new_table_path, st->reftable_dir);
     ++	slice_append_string(&new_table_path, "/");
     ++
     ++	slice_append(&new_table_path, new_table_name);
     ++
     ++	err = rename(slice_as_string(&temp_tab_name),
     ++		     slice_as_string(&new_table_path));
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	for (i = 0; i < first; i++) {
     ++		slice_append_string(&ref_list_contents,
     ++				    st->merged->stack[i]->name);
     ++		slice_append_string(&ref_list_contents, "\n");
     ++	}
     ++	slice_append(&ref_list_contents, new_table_name);
     ++	slice_append_string(&ref_list_contents, "\n");
     ++	for (i = last + 1; i < st->merged->stack_len; i++) {
     ++		slice_append_string(&ref_list_contents,
     ++				    st->merged->stack[i]->name);
     ++		slice_append_string(&ref_list_contents, "\n");
     ++	}
     ++
     ++	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
     ++	if (err < 0) {
     ++		unlink(slice_as_string(&new_table_path));
     ++		goto exit;
     ++	}
     ++	err = close(lock_file_fd);
     ++	lock_file_fd = 0;
     ++	if (err < 0) {
     ++		unlink(slice_as_string(&new_table_path));
     ++		goto exit;
     ++	}
     ++
     ++	err = rename(slice_as_string(&lock_file_name), st->list_file);
     ++	if (err < 0) {
     ++		unlink(slice_as_string(&new_table_path));
     ++		goto exit;
     ++	}
     ++	have_lock = false;
     ++
     ++	for (char **p = delete_on_success; *p; p++) {
     ++		if (strcmp(*p, slice_as_string(&new_table_path))) {
     ++			unlink(*p);
     ++		}
     ++	}
     ++
     ++	err = stack_reload_maybe_reuse(st, first < last);
      +exit:
     -+  for (char **p = subtable_locks; *p; p++) {
     -+    unlink(*p);
     -+  }
     -+  free_names(delete_on_success);
     -+  free_names(subtable_locks);
     -+  if (lock_file_fd > 0) {
     -+    close(lock_file_fd);
     -+    lock_file_fd = 0;
     -+  }
     -+  if (have_lock) {
     -+    unlink(slice_as_string(&lock_file_name));
     -+  }
     -+  free(slice_yield(&new_table_name));
     -+  free(slice_yield(&new_table_path));
     -+  free(slice_yield(&ref_list_contents));
     -+  free(slice_yield(&temp_tab_name));
     -+  free(slice_yield(&lock_file_name));
     -+  return err;
     -+}
     -+
     -+int stack_compact_all(struct stack *st, struct log_expiry_config *config) {
     -+  return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
     -+}
     -+
     -+static int stack_compact_range_stats(struct stack *st, int first, int last, struct log_expiry_config *config) {
     -+  int err = stack_compact_range(st, first, last, config);
     -+  if (err > 0) {
     -+    st->stats.failures++;
     -+  }
     -+  return err;
     -+}
     -+
     -+static int segment_size(struct segment *s) { return s->end - s->start; }
     -+
     -+int fastlog2(uint64_t sz) {
     -+  int l = 0;
     -+  assert(sz > 0);
     -+  for (; sz; sz /= 2) {
     -+    l++;
     -+  }
     -+  return l - 1;
     -+}
     -+
     -+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n) {
     -+  struct segment *segs = calloc(sizeof(struct segment), n);
     -+  int next = 0;
     -+  struct segment cur = {};
     -+  for (int i = 0; i < n; i++) {
     -+    int log = fastlog2(sizes[i]);
     -+    if (cur.log != log && cur.bytes > 0) {
     -+      struct segment fresh = {
     -+          .start = i,
     -+      };
     -+
     -+      segs[next++] = cur;
     -+      cur = fresh;
     -+    }
     -+
     -+    cur.log = log;
     -+    cur.end = i + 1;
     -+    cur.bytes += sizes[i];
     -+  }
     -+
     -+  segs[next++] = cur;
     -+  *seglen = next;
     -+  return segs;
     -+}
     -+
     -+struct segment suggest_compaction_segment(uint64_t *sizes, int n) {
     -+  int seglen = 0;
     -+  struct segment *segs = sizes_to_segments(&seglen, sizes, n);
     -+  struct segment min_seg = {
     -+      .log = 64,
     -+  };
     -+  for (int i = 0; i < seglen; i++) {
     -+    if (segment_size(&segs[i]) == 1) {
     -+      continue;
     -+    }
     -+
     -+    if (segs[i].log < min_seg.log) {
     -+      min_seg = segs[i];
     -+    }
     -+  }
     -+
     -+  while (min_seg.start > 0) {
     -+    int prev = min_seg.start - 1;
     -+    if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
     -+      break;
     -+    }
     -+
     -+    min_seg.start = prev;
     -+    min_seg.bytes += sizes[prev];
     -+  }
     -+
     -+  free(segs);
     -+  return min_seg;
     -+}
     -+
     -+static uint64_t *stack_table_sizes_for_compaction(struct stack *st) {
     -+  uint64_t *sizes = calloc(sizeof(uint64_t), st->merged->stack_len);
     -+  for (int i = 0; i < st->merged->stack_len; i++) {
     -+    // overhead is 24 + 68 = 92.
     -+    sizes[i] = st->merged->stack[i]->size - 91;
     -+  }
     -+  return sizes;
     -+}
     -+
     -+int stack_auto_compact(struct stack *st) {
     -+  uint64_t *sizes = stack_table_sizes_for_compaction(st);
     -+  struct segment seg = suggest_compaction_segment(sizes, st->merged->stack_len);
     -+  free(sizes);
     -+  if (segment_size(&seg) > 0) {
     -+    return stack_compact_range_stats(st, seg.start, seg.end - 1, NULL);
     -+  }
     -+
     -+  return 0;
     -+}
     -+
     -+struct compaction_stats *stack_compaction_stats(struct stack *st) {
     -+  return &st->stats;
     ++	for (char **p = subtable_locks; *p; p++) {
     ++		unlink(*p);
     ++	}
     ++	free_names(delete_on_success);
     ++	free_names(subtable_locks);
     ++	if (lock_file_fd > 0) {
     ++		close(lock_file_fd);
     ++		lock_file_fd = 0;
     ++	}
     ++	if (have_lock) {
     ++		unlink(slice_as_string(&lock_file_name));
     ++	}
     ++	free(slice_yield(&new_table_name));
     ++	free(slice_yield(&new_table_path));
     ++	free(slice_yield(&ref_list_contents));
     ++	free(slice_yield(&temp_tab_name));
     ++	free(slice_yield(&lock_file_name));
     ++	return err;
     ++}
     ++
     ++int stack_compact_all(struct stack *st, struct log_expiry_config *config)
     ++{
     ++	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
     ++}
     ++
     ++static int stack_compact_range_stats(struct stack *st, int first, int last,
     ++				     struct log_expiry_config *config)
     ++{
     ++	int err = stack_compact_range(st, first, last, config);
     ++	if (err > 0) {
     ++		st->stats.failures++;
     ++	}
     ++	return err;
     ++}
     ++
     ++static int segment_size(struct segment *s)
     ++{
     ++	return s->end - s->start;
     ++}
     ++
     ++int fastlog2(uint64_t sz)
     ++{
     ++	int l = 0;
     ++	assert(sz > 0);
     ++	for (; sz; sz /= 2) {
     ++		l++;
     ++	}
     ++	return l - 1;
     ++}
     ++
     ++struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
     ++{
     ++	struct segment *segs = calloc(sizeof(struct segment), n);
     ++	int next = 0;
     ++	struct segment cur = {};
     ++	int i = 0;
     ++	for (i = 0; i < n; i++) {
     ++		int log = fastlog2(sizes[i]);
     ++		if (cur.log != log && cur.bytes > 0) {
     ++			struct segment fresh = {
     ++				.start = i,
     ++			};
     ++
     ++			segs[next++] = cur;
     ++			cur = fresh;
     ++		}
     ++
     ++		cur.log = log;
     ++		cur.end = i + 1;
     ++		cur.bytes += sizes[i];
     ++	}
     ++
     ++	segs[next++] = cur;
     ++	*seglen = next;
     ++	return segs;
     ++}
     ++
     ++struct segment suggest_compaction_segment(uint64_t *sizes, int n)
     ++{
     ++	int seglen = 0;
     ++	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
     ++	struct segment min_seg = {
     ++		.log = 64,
     ++	};
     ++	int i = 0;
     ++	for (i = 0; i < seglen; i++) {
     ++		if (segment_size(&segs[i]) == 1) {
     ++			continue;
     ++		}
     ++
     ++		if (segs[i].log < min_seg.log) {
     ++			min_seg = segs[i];
     ++		}
     ++	}
     ++
     ++	while (min_seg.start > 0) {
     ++		int prev = min_seg.start - 1;
     ++		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
     ++			break;
     ++		}
     ++
     ++		min_seg.start = prev;
     ++		min_seg.bytes += sizes[prev];
     ++	}
     ++
     ++	free(segs);
     ++	return min_seg;
     ++}
     ++
     ++static uint64_t *stack_table_sizes_for_compaction(struct stack *st)
     ++{
     ++	uint64_t *sizes = calloc(sizeof(uint64_t), st->merged->stack_len);
     ++	int i = 0;
     ++	for (i = 0; i < st->merged->stack_len; i++) {
     ++		/* overhead is 24 + 68 = 92. */
     ++		sizes[i] = st->merged->stack[i]->size - 91;
     ++	}
     ++	return sizes;
     ++}
     ++
     ++int stack_auto_compact(struct stack *st)
     ++{
     ++	uint64_t *sizes = stack_table_sizes_for_compaction(st);
     ++	struct segment seg =
     ++		suggest_compaction_segment(sizes, st->merged->stack_len);
     ++	free(sizes);
     ++	if (segment_size(&seg) > 0) {
     ++		return stack_compact_range_stats(st, seg.start, seg.end - 1,
     ++						 NULL);
     ++	}
     ++
     ++	return 0;
     ++}
     ++
     ++struct compaction_stats *stack_compaction_stats(struct stack *st)
     ++{
     ++	return &st->stats;
      +}
      +
      +int stack_read_ref(struct stack *st, const char *refname,
     -+                   struct ref_record *ref) {
     -+  struct iterator it = {};
     -+  struct merged_table *mt = stack_merged_table(st);
     -+  int err = merged_table_seek_ref(mt, &it, refname);
     -+  if (err) {
     -+    goto exit;
     -+  }
     -+
     -+  err = iterator_next_ref(it, ref);
     -+  if (err) {
     -+    goto exit;
     -+  }
     -+
     -+  if (0 != strcmp(ref->ref_name, refname) || ref_record_is_deletion(ref)) {
     -+    err = 1;
     -+    goto exit;
     -+  }
     ++		   struct ref_record *ref)
     ++{
     ++	struct iterator it = {};
     ++	struct merged_table *mt = stack_merged_table(st);
     ++	int err = merged_table_seek_ref(mt, &it, refname);
     ++	if (err) {
     ++		goto exit;
     ++	}
     ++
     ++	err = iterator_next_ref(it, ref);
     ++	if (err) {
     ++		goto exit;
     ++	}
     ++
     ++	if (strcmp(ref->ref_name, refname) || ref_record_is_deletion(ref)) {
     ++		err = 1;
     ++		goto exit;
     ++	}
      +
      +exit:
     -+  iterator_destroy(&it);
     -+  return err;
     ++	iterator_destroy(&it);
     ++	return err;
      +}
      +
      +int stack_read_log(struct stack *st, const char *refname,
     -+                   struct log_record *log) {
     -+  struct iterator it = {};
     -+  struct merged_table *mt = stack_merged_table(st);
     -+  int err = merged_table_seek_log(mt, &it, refname);
     -+  if (err) {
     -+    goto exit;
     -+  }
     -+
     -+  err = iterator_next_log(it, log);
     -+  if (err) {
     -+    goto exit;
     -+  }
     -+
     -+  if (0 != strcmp(log->ref_name, refname) || log_record_is_deletion(log)) {
     -+    err = 1;
     -+    goto exit;
     -+  }
     ++		   struct log_record *log)
     ++{
     ++	struct iterator it = {};
     ++	struct merged_table *mt = stack_merged_table(st);
     ++	int err = merged_table_seek_log(mt, &it, refname);
     ++	if (err) {
     ++		goto exit;
     ++	}
     ++
     ++	err = iterator_next_log(it, log);
     ++	if (err) {
     ++		goto exit;
     ++	}
     ++
     ++	if (strcmp(log->ref_name, refname) || log_record_is_deletion(log)) {
     ++		err = 1;
     ++		goto exit;
     ++	}
      +
      +exit:
     -+  iterator_destroy(&it);
     -+  return err;
     ++	iterator_destroy(&it);
     ++	return err;
      +}
      
       diff --git a/reftable/stack.h b/reftable/stack.h
     @@ -6692,26 +6840,26 @@
      +#include "reftable.h"
      +
      +struct stack {
     -+  char *list_file;
     -+  char *reftable_dir;
     ++	char *list_file;
     ++	char *reftable_dir;
      +
     -+  struct write_options config;
     ++	struct write_options config;
      +
     -+  struct merged_table *merged;
     -+  struct compaction_stats stats;
     ++	struct merged_table *merged;
     ++	struct compaction_stats stats;
      +};
      +
      +int read_lines(const char *filename, char ***lines);
      +int stack_try_add(struct stack *st,
     -+                  int (*write_table)(struct writer *wr, void *arg), void *arg);
     ++		  int (*write_table)(struct writer *wr, void *arg), void *arg);
      +int stack_write_compact(struct stack *st, struct writer *wr, int first,
     -+                        int last, struct log_expiry_config *config);
     ++			int last, struct log_expiry_config *config);
      +int fastlog2(uint64_t sz);
      +
      +struct segment {
     -+  int start, end;
     -+  int log;
     -+  uint64_t bytes;
     ++	int start, end;
     ++	int log;
     ++	uint64_t bytes;
      +};
      +
      +struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
     @@ -6734,8 +6882,7 @@
      +
      +#include "stack.h"
      +
     -+#include <string.h>
     -+#include <unistd.h>
     ++#include "system.h"
      +
      +#include "basics.h"
      +#include "constants.h"
     @@ -6743,252 +6890,311 @@
      +#include "reftable.h"
      +#include "test_framework.h"
      +
     -+void test_read_file(void) {
     -+  char fn[256] = "/tmp/stack.XXXXXX";
     -+  int fd = mkstemp(fn);
     -+  assert(fd > 0);
     ++void test_read_file(void)
     ++{
     ++	char fn[256] = "/tmp/stack.test_read_file.XXXXXX";
     ++	int fd = mkstemp(fn);
     ++	assert(fd > 0);
     ++
     ++	char out[1024] = "line1\n\nline2\nline3";
     ++
     ++	int n = write(fd, out, strlen(out));
     ++	assert(n == strlen(out));
     ++	int err = close(fd);
     ++	assert(err >= 0);
     ++
     ++	char **names = NULL;
     ++	err = read_lines(fn, &names);
     ++	assert_err(err);
     ++
     ++	char *want[] = { "line1", "line2", "line3" };
     ++	int i = 0;
     ++	for (i = 0; names[i] != NULL; i++) {
     ++		assert(0 == strcmp(want[i], names[i]));
     ++	}
     ++	free_names(names);
     ++	remove(fn);
     ++}
     ++
     ++void test_parse_names(void)
     ++{
     ++	char buf[] = "line\n";
     ++	char **names = NULL;
     ++	parse_names(buf, strlen(buf), &names);
     ++
     ++	assert(NULL != names[0]);
     ++	assert(0 == strcmp(names[0], "line"));
     ++	assert(NULL == names[1]);
     ++	free_names(names);
     ++}
     ++
     ++void test_names_equal(void)
     ++{
     ++	char *a[] = { "a", "b", "c", NULL };
     ++	char *b[] = { "a", "b", "d", NULL };
     ++	char *c[] = { "a", "b", NULL };
     ++
     ++	assert(names_equal(a, a));
     ++	assert(!names_equal(a, b));
     ++	assert(!names_equal(a, c));
     ++}
     ++
     ++int write_test_ref(struct writer *wr, void *arg)
     ++{
     ++	struct ref_record *ref = arg;
     ++
     ++	writer_set_limits(wr, ref->update_index, ref->update_index);
     ++	int err = writer_add_ref(wr, ref);
     ++
     ++	return err;
     ++}
     ++
     ++int write_test_log(struct writer *wr, void *arg)
     ++{
     ++	struct log_record *log = arg;
     ++
     ++	writer_set_limits(wr, log->update_index, log->update_index);
     ++	int err = writer_add_log(wr, log);
     ++
     ++	return err;
     ++}
     ++
     ++void test_stack_add(void)
     ++{
     ++	int i = 0;
     ++	char dir[256] = "/tmp/stack.test_stack_add.XXXXXX";
     ++	assert(mkdtemp(dir));
     ++	printf("%s\n", dir);
     ++	char fn[256] = "";
     ++	strcat(fn, dir);
     ++	strcat(fn, "/refs");
     ++
     ++	struct write_options cfg = {};
     ++	struct stack *st = NULL;
     ++	int err = new_stack(&st, dir, fn, cfg);
     ++	assert_err(err);
     ++
     ++	struct ref_record refs[2] = {};
     ++	struct log_record logs[2] = {};
     ++	int N = ARRAYSIZE(refs);
     ++	for (i = 0; i < N; i++) {
     ++		char buf[256];
     ++		snprintf(buf, sizeof(buf), "branch%02d", i);
     ++		refs[i].ref_name = strdup(buf);
     ++		refs[i].value = malloc(SHA1_SIZE);
     ++		refs[i].update_index = i + 1;
     ++		set_test_hash(refs[i].value, i);
     ++
     ++		logs[i].ref_name = strdup(buf);
     ++		logs[i].update_index = N + i + 1;
     ++		logs[i].new_hash = malloc(SHA1_SIZE);
     ++		logs[i].email = strdup("identity@invalid");
     ++		set_test_hash(logs[i].new_hash, i);
     ++	}
     ++
     ++	for (i = 0; i < N; i++) {
     ++		int err = stack_add(st, &write_test_ref, &refs[i]);
     ++		assert_err(err);
     ++	}
     ++
     ++	for (i = 0; i < N; i++) {
     ++		int err = stack_add(st, &write_test_log, &logs[i]);
     ++		assert_err(err);
     ++	}
     ++
     ++	err = stack_compact_all(st, NULL);
     ++	assert_err(err);
     ++
     ++	for (i = 0; i < N; i++) {
     ++		struct ref_record dest = {};
     ++		int err = stack_read_ref(st, refs[i].ref_name, &dest);
     ++		assert_err(err);
     ++		assert(ref_record_equal(&dest, refs + i, SHA1_SIZE));
     ++		ref_record_clear(&dest);
     ++	}
     ++
     ++	for (i = 0; i < N; i++) {
     ++		struct log_record dest = {};
     ++		int err = stack_read_log(st, refs[i].ref_name, &dest);
     ++		assert_err(err);
     ++		assert(log_record_equal(&dest, logs + i, SHA1_SIZE));
     ++		log_record_clear(&dest);
     ++	}
     ++
     ++	/* cleanup */
     ++	stack_destroy(st);
     ++	for (i = 0; i < N; i++) {
     ++		ref_record_clear(&refs[i]);
     ++		log_record_clear(&logs[i]);
     ++	}
     ++}
     ++
     ++void test_log2(void)
     ++{
     ++	assert(1 == fastlog2(3));
     ++	assert(2 == fastlog2(4));
     ++	assert(2 == fastlog2(5));
     ++}
     ++
     ++void test_sizes_to_segments(void)
     ++{
     ++	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
     ++	/* .................0  1  2  3  4  5 */
     ++
     ++	int seglen = 0;
     ++	struct segment *segs =
     ++		sizes_to_segments(&seglen, sizes, ARRAYSIZE(sizes));
     ++	assert(segs[2].log == 3);
     ++	assert(segs[2].start == 5);
     ++	assert(segs[2].end == 6);
     ++
     ++	assert(segs[1].log == 2);
     ++	assert(segs[1].start == 2);
     ++	assert(segs[1].end == 5);
     ++	free(segs);
     ++}
     ++
     ++void test_suggest_compaction_segment(void)
     ++{
     ++	{
     ++		uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
     ++		/* .................0    1    2  3   4  5  6 */
     ++		struct segment min =
     ++			suggest_compaction_segment(sizes, ARRAYSIZE(sizes));
     ++		assert(min.start == 2);
     ++		assert(min.end == 7);
     ++	}
     ++
     ++	{
     ++		uint64_t sizes[] = { 64, 32, 16, 8, 4, 2 };
     ++		struct segment result =
     ++			suggest_compaction_segment(sizes, ARRAYSIZE(sizes));
     ++		assert(result.start == result.end);
     ++	}
     ++}
     ++
     ++void test_reflog_expire(void)
     ++{
     ++	char dir[256] = "/tmp/stack.test_reflog_expire.XXXXXX";
     ++	assert(mkdtemp(dir));
     ++	printf("%s\n", dir);
     ++	char fn[256] = "";
     ++	strcat(fn, dir);
     ++	strcat(fn, "/refs");
     ++
     ++	struct write_options cfg = {};
     ++	struct stack *st = NULL;
     ++	int err = new_stack(&st, dir, fn, cfg);
     ++	assert_err(err);
     ++
     ++	struct log_record logs[20] = {};
     ++	int N = ARRAYSIZE(logs) - 1;
     ++	int i = 0;
     ++	for (i = 1; i <= N; i++) {
     ++		char buf[256];
     ++		snprintf(buf, sizeof(buf), "branch%02d", i);
     ++
     ++		logs[i].ref_name = strdup(buf);
     ++		logs[i].update_index = i;
     ++		logs[i].time = i;
     ++		logs[i].new_hash = malloc(SHA1_SIZE);
     ++		logs[i].email = strdup("identity@invalid");
     ++		set_test_hash(logs[i].new_hash, i);
     ++	}
     ++
     ++	for (i = 1; i <= N; i++) {
     ++		int err = stack_add(st, &write_test_log, &logs[i]);
     ++		assert_err(err);
     ++	}
     ++
     ++	err = stack_compact_all(st, NULL);
     ++	assert_err(err);
     ++
     ++	struct log_expiry_config expiry = {
     ++		.time = 10,
     ++	};
     ++	err = stack_compact_all(st, &expiry);
     ++	assert_err(err);
     ++
     ++	struct log_record log = {};
     ++	err = stack_read_log(st, logs[9].ref_name, &log);
     ++	assert(err == 1);
     ++
     ++	err = stack_read_log(st, logs[11].ref_name, &log);
     ++	assert_err(err);
     ++
     ++	expiry.min_update_index = 15;
     ++	err = stack_compact_all(st, &expiry);
     ++	assert_err(err);
     ++
     ++	err = stack_read_log(st, logs[14].ref_name, &log);
     ++	assert(err == 1);
     ++
     ++	err = stack_read_log(st, logs[16].ref_name, &log);
     ++	assert_err(err);
     ++
     ++	/* cleanup */
     ++	stack_destroy(st);
     ++	for (i = 0; i < N; i++) {
     ++		log_record_clear(&logs[i]);
     ++	}
     ++}
     ++
     ++int main()
     ++{
     ++	add_test_case("test_reflog_expire", test_reflog_expire);
     ++	add_test_case("test_suggest_compaction_segment",
     ++		      &test_suggest_compaction_segment);
     ++	add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
     ++	add_test_case("test_log2", &test_log2);
     ++	add_test_case("test_parse_names", &test_parse_names);
     ++	add_test_case("test_read_file", &test_read_file);
     ++	add_test_case("test_names_equal", &test_names_equal);
     ++	add_test_case("test_stack_add", &test_stack_add);
     ++	test_main();
     ++}
     +
     + diff --git a/reftable/system.h b/reftable/system.h
     + new file mode 100644
     + --- /dev/null
     + +++ b/reftable/system.h
     +@@
     ++/*
     ++Copyright 2020 Google LLC
      +
     -+  char out[1024] = "line1\n\nline2\nline3";
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
      +
     -+  int n = write(fd, out, strlen(out));
     -+  assert(n == strlen(out));
     -+  int err = close(fd);
     -+  assert(err >= 0);
     ++#ifndef SYSTEM_H
     ++#define SYSTEM_H
      +
     -+  char **names = NULL;
     -+  err = read_lines(fn, &names);
     -+  assert_err(err);
     ++#include "config.h"
      +
     -+  char *want[] = {"line1", "line2", "line3"};
     -+  for (int i = 0; names[i] != NULL; i++) {
     -+    assert(0 == strcmp(want[i], names[i]));
     -+  }
     -+  free_names(names);
     -+  remove(fn);
     -+}
     -+
     -+void test_parse_names(void) {
     -+  char buf[] = "line\n";
     -+  char **names = NULL;
     -+  parse_names(buf, strlen(buf), &names);
     -+
     -+  assert(NULL != names[0]);
     -+  assert(0 == strcmp(names[0], "line"));
     -+  assert(NULL == names[1]);
     -+  free_names(names);
     -+}
     -+
     -+void test_names_equal(void) {
     -+  char *a[] = {"a", "b", "c", NULL};
     -+  char *b[] = {"a", "b", "d", NULL};
     -+  char *c[] = {"a", "b", NULL};
     -+
     -+  assert(names_equal(a, a));
     -+  assert(!names_equal(a, b));
     -+  assert(!names_equal(a, c));
     -+}
     -+
     -+int write_test_ref(struct writer *wr, void *arg) {
     -+  struct ref_record *ref = arg;
     -+
     -+  writer_set_limits(wr, ref->update_index, ref->update_index);
     -+  int err = writer_add_ref(wr, ref);
     -+
     -+  return err;
     -+}
     ++#ifndef REFTABLE_STANDALONE
     ++#include "git-compat-util.h"
     ++#else
     ++#include <assert.h>
     ++#include <errno.h>
     ++#include <fcntl.h>
     ++#include <stdint.h>
     ++#include <stdio.h>
     ++#include <stdlib.h>
     ++#include <string.h>
     ++#include <sys/stat.h>
     ++#include <sys/time.h>
     ++#include <sys/types.h>
     ++#include <unistd.h>
     ++#include <zlib.h>
      +
     -+int write_test_log(struct writer *wr, void *arg) {
     -+  struct log_record *log = arg;
     -+
     -+  writer_set_limits(wr, log->update_index, log->update_index);
     -+  int err = writer_add_log(wr, log);
     -+
     -+  return err;
     -+}
     -+
     -+void test_stack_add(void) {
     -+  char dir[256] = "/tmp/stack.XXXXXX";
     -+  assert(mkdtemp(dir));
     -+  printf("%s\n", dir);
     -+  char fn[256] = "";
     -+  strcat(fn, dir);
     -+  strcat(fn, "/refs");
     -+
     -+  struct write_options cfg = {};
     -+  struct stack *st = NULL;
     -+  int err = new_stack(&st, dir, fn, cfg);
     -+  assert_err(err);
     -+
     -+  struct ref_record refs[2] = {};
     -+  struct log_record logs[2] = {};
     -+  int N = ARRAYSIZE(refs);
     -+  for (int i = 0; i < N; i++) {
     -+    char buf[256];
     -+    sprintf(buf, "branch%02d", i);
     -+    refs[i].ref_name = strdup(buf);
     -+    refs[i].value = malloc(SHA1_SIZE);
     -+    refs[i].update_index = i + 1;
     -+    set_test_hash(refs[i].value, i);
     -+
     -+    logs[i].ref_name = strdup(buf);
     -+    logs[i].update_index = N + i+1;
     -+    logs[i].new_hash = malloc(SHA1_SIZE);
     -+    logs[i].email = strdup("identity@invalid");
     -+    set_test_hash(logs[i].new_hash, i);
     -+  }
     -+
     -+  for (int i = 0; i < N; i++) {
     -+    int err = stack_add(st, &write_test_ref, &refs[i]);
     -+    assert_err(err);
     -+  }
     -+
     -+  for (int i = 0; i < N; i++) {
     -+    int err = stack_add(st, &write_test_log, &logs[i]);
     -+    assert_err(err);
     -+  }
     -+
     -+  err = stack_compact_all(st, NULL);
     -+  assert_err(err);
     -+
     -+  for (int i = 0; i < N; i++) {
     -+    struct ref_record dest = {};
     -+    int err = stack_read_ref(st, refs[i].ref_name, &dest);
     -+    assert_err(err);
     -+    assert(ref_record_equal(&dest, refs + i, SHA1_SIZE));
     -+    ref_record_clear(&dest);
     -+  }
     -+
     -+  for (int i = 0; i < N; i++) {
     -+    struct log_record dest = {};
     -+    int err = stack_read_log(st, refs[i].ref_name, &dest);
     -+    assert_err(err);
     -+    assert(log_record_equal(&dest, logs + i, SHA1_SIZE));
     -+    log_record_clear(&dest);
     -+  }
     -+
     -+  // cleanup
     -+  stack_destroy(st);
     -+  for (int i = 0; i < N; i++) {
     -+    ref_record_clear(&refs[i]);
     -+    log_record_clear(&logs[i]);
     -+  }
     -+}
     -+
     -+void test_log2(void) {
     -+  assert(1 == fastlog2(3));
     -+  assert(2 == fastlog2(4));
     -+  assert(2 == fastlog2(5));
     -+}
     -+
     -+void test_sizes_to_segments(void) {
     -+  uint64_t sizes[] = {2, 3, 4, 5, 7, 9};
     -+  // .................0  1  2  3  4  5
     -+
     -+  int seglen = 0;
     -+  struct segment *segs = sizes_to_segments(&seglen, sizes, ARRAYSIZE(sizes));
     -+  assert(segs[2].log == 3);
     -+  assert(segs[2].start == 5);
     -+  assert(segs[2].end == 6);
     -+
     -+  assert(segs[1].log == 2);
     -+  assert(segs[1].start == 2);
     -+  assert(segs[1].end == 5);
     -+  free(segs);
     -+}
     -+
     -+void test_suggest_compaction_segment(void) {
     -+  {
     -+    uint64_t sizes[] = {128, 64, 17, 16, 9, 9, 9, 16, 16};
     -+    // .................0    1    2  3   4  5  6
     -+    struct segment min = suggest_compaction_segment(sizes, ARRAYSIZE(sizes));
     -+    assert(min.start == 2);
     -+    assert(min.end == 7);
     -+  }
     -+
     -+  {
     -+    uint64_t sizes[] = {64, 32, 16, 8, 4, 2};
     -+    struct segment result = suggest_compaction_segment(sizes, ARRAYSIZE(sizes));
     -+    assert(result.start == result.end);
     -+  }
     -+}
     -+
     -+void test_reflog_expire(void) {
     -+  char dir[256] = "/tmp/stack.XXXXXX";
     -+  assert(mkdtemp(dir));
     -+  printf("%s\n", dir);
     -+  char fn[256] = "";
     -+  strcat(fn, dir);
     -+  strcat(fn, "/refs");
     -+
     -+  struct write_options cfg = {};
     -+  struct stack *st = NULL;
     -+  int err = new_stack(&st, dir, fn, cfg);
     -+  assert_err(err);
     -+
     -+  struct log_record logs[20] = {};
     -+  int N = ARRAYSIZE(logs) -1;
     -+  for (int i = 1; i <= N; i++) {
     -+    char buf[256];
     -+    sprintf(buf, "branch%02d", i);
     -+
     -+    logs[i].ref_name = strdup(buf);
     -+    logs[i].update_index = i;
     -+    logs[i].time = i;
     -+    logs[i].new_hash = malloc(SHA1_SIZE);
     -+    logs[i].email = strdup("identity@invalid");
     -+    set_test_hash(logs[i].new_hash, i);
     -+  }
     -+
     -+  for (int i = 1; i <= N; i++) {
     -+    int err = stack_add(st, &write_test_log, &logs[i]);
     -+    assert_err(err);
     -+  }
     -+
     -+  err = stack_compact_all(st, NULL);
     -+  assert_err(err);
     -+
     -+  struct log_expiry_config expiry = {
     -+    .time = 10,
     -+  };
     -+  err = stack_compact_all(st, &expiry);
     -+  assert_err(err);
     -+
     -+  struct log_record log = {};
     -+  err = stack_read_log(st, logs[9].ref_name, &log);
     -+  assert(err == 1);
     -+
     -+  err = stack_read_log(st, logs[11].ref_name, &log);
     -+  assert_err(err);
     -+
     -+  expiry.min_update_index = 15;
     -+  err = stack_compact_all(st, &expiry);
     -+  assert_err(err);
     -+
     -+  err = stack_read_log(st, logs[14].ref_name, &log);
     -+  assert(err == 1);
     -+
     -+  err = stack_read_log(st, logs[16].ref_name, &log);
     -+  assert_err(err);
     -+
     -+  // cleanup
     -+  stack_destroy(st);
     -+  for (int i = 0; i < N; i++) {
     -+    log_record_clear(&logs[i]);
     -+  }
     -+}
     -+
     -+int main() {
     -+  add_test_case("test_reflog_expire", test_reflog_expire);
     -+  add_test_case("test_suggest_compaction_segment",
     -+                &test_suggest_compaction_segment);
     -+  add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
     -+  add_test_case("test_log2", &test_log2);
     -+  add_test_case("test_parse_names", &test_parse_names);
     -+  add_test_case("test_read_file", &test_read_file);
     -+  add_test_case("test_names_equal", &test_names_equal);
     -+  add_test_case("test_stack_add", &test_stack_add);
     -+  test_main();
     -+}
     ++#define PRIuMAX "lu"
     ++#define PRIdMAX "ld"
     ++#define PRIxMAX "lx"
     ++
     ++#endif
     ++
     ++#endif
      
       diff --git a/reftable/test_framework.c b/reftable/test_framework.c
       new file mode 100644
     @@ -7005,9 +7211,7 @@
      +
      +#include "test_framework.h"
      +
     -+#include <stdio.h>
     -+#include <stdlib.h>
     -+#include <string.h>
     ++#include "system.h"
      +
      +#include "constants.h"
      +
     @@ -7015,45 +7219,54 @@
      +int test_case_len;
      +int test_case_cap;
      +
     -+struct test_case *new_test_case(const char *name, void (*testfunc)()) {
     -+  struct test_case *tc = malloc(sizeof(struct test_case));
     -+  tc->name = name;
     -+  tc->testfunc = testfunc;
     -+  return tc;
     -+}
     -+
     -+struct test_case *add_test_case(const char *name, void (*testfunc)()) {
     -+  struct test_case *tc = new_test_case(name, testfunc);
     -+  if (test_case_len == test_case_cap) {
     -+    test_case_cap = 2 * test_case_cap + 1;
     -+    test_cases = realloc(test_cases, sizeof(struct test_case) * test_case_cap);
     -+  }
     -+
     -+  test_cases[test_case_len++] = tc;
     -+  return tc;
     -+}
     -+
     -+void test_main() {
     -+  for (int i = 0; i < test_case_len; i++) {
     -+    printf("case %s\n", test_cases[i]->name);
     -+    test_cases[i]->testfunc();
     -+  }
     -+}
     -+
     -+void set_test_hash(byte *p, int i) { memset(p, (byte)i, SHA1_SIZE); }
     -+
     -+void print_names(char **a) {
     -+  if (a == NULL || *a == NULL) {
     -+    puts("[]");
     -+    return;
     -+  }
     -+  puts("[");
     -+  char **p = a;
     -+  while (*p) {
     -+    puts(*p);
     -+    p++;
     -+  }
     -+  puts("]");
     ++struct test_case *new_test_case(const char *name, void (*testfunc)())
     ++{
     ++	struct test_case *tc = malloc(sizeof(struct test_case));
     ++	tc->name = name;
     ++	tc->testfunc = testfunc;
     ++	return tc;
     ++}
     ++
     ++struct test_case *add_test_case(const char *name, void (*testfunc)())
     ++{
     ++	struct test_case *tc = new_test_case(name, testfunc);
     ++	if (test_case_len == test_case_cap) {
     ++		test_case_cap = 2 * test_case_cap + 1;
     ++		test_cases = realloc(test_cases,
     ++				     sizeof(struct test_case) * test_case_cap);
     ++	}
     ++
     ++	test_cases[test_case_len++] = tc;
     ++	return tc;
     ++}
     ++
     ++void test_main()
     ++{
     ++	int i = 0;
     ++	for (i = 0; i < test_case_len; i++) {
     ++		printf("case %s\n", test_cases[i]->name);
     ++		test_cases[i]->testfunc();
     ++	}
     ++}
     ++
     ++void set_test_hash(byte *p, int i)
     ++{
     ++	memset(p, (byte)i, SHA1_SIZE);
     ++}
     ++
     ++void print_names(char **a)
     ++{
     ++	if (a == NULL || *a == NULL) {
     ++		puts("[]");
     ++		return;
     ++	}
     ++	puts("[");
     ++	char **p = a;
     ++	while (*p) {
     ++		puts(*p);
     ++		p++;
     ++	}
     ++	puts("]");
      +}
      
       diff --git a/reftable/test_framework.h b/reftable/test_framework.h
     @@ -7072,8 +7285,7 @@
      +#ifndef TEST_FRAMEWORK_H
      +#define TEST_FRAMEWORK_H
      +
     -+#include <stdio.h>
     -+#include <stdlib.h>
     ++#include "system.h"
      +
      +#include "reftable.h"
      +
     @@ -7081,40 +7293,42 @@
      +#undef NDEBUG
      +#endif
      +
     -+#include <assert.h>
     ++#include "system.h"
      +
      +#ifdef assert
      +#undef assert
      +#endif
      +
     -+#define assert_err(c)                                                        \
     -+  if (c != 0) {                                                              \
     -+    fflush(stderr);                                                          \
     -+    fflush(stdout);                                                          \
     -+    fprintf(stderr, "%s: %d: error == %d (%s), want 0\n", __FILE__, __LINE__, c, error_str(c)); \
     -+    abort();                                                                 \
     -+  }
     -+
     -+#define assert_streq(a, b)                                                    \
     -+  if (0 != strcmp(a, b)) {                                                    \
     -+    fflush(stderr);                                                           \
     -+    fflush(stdout);                                                           \
     -+    fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, __LINE__, #a, a, \
     -+            #b, b);                                                           \
     -+    abort();                                                                  \
     -+  }
     -+
     -+#define assert(c)                                                             \
     -+  if (!(c)) {                                                                 \
     -+    fflush(stderr);                                                           \
     -+    fflush(stdout);                                                           \
     -+    fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, __LINE__, #c); \
     -+    abort();                                                                  \
     -+  }
     ++#define assert_err(c)                                                 \
     ++	if (c != 0) {                                                 \
     ++		fflush(stderr);                                       \
     ++		fflush(stdout);                                       \
     ++		fprintf(stderr, "%s: %d: error == %d (%s), want 0\n", \
     ++			__FILE__, __LINE__, c, error_str(c));         \
     ++		abort();                                              \
     ++	}
     ++
     ++#define assert_streq(a, b)                                               \
     ++	if (strcmp(a, b)) {                                              \
     ++		fflush(stderr);                                          \
     ++		fflush(stdout);                                          \
     ++		fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, \
     ++			__LINE__, #a, a, #b, b);                         \
     ++		abort();                                                 \
     ++	}
     ++
     ++#define assert(c)                                                          \
     ++	if (!(c)) {                                                        \
     ++		fflush(stderr);                                            \
     ++		fflush(stdout);                                            \
     ++		fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, \
     ++			__LINE__, #c);                                     \
     ++		abort();                                                   \
     ++	}
      +
      +struct test_case {
     -+  const char *name;
     -+  void (*testfunc)();
     ++	const char *name;
     ++	void (*testfunc)();
      +};
      +
      +struct test_case *new_test_case(const char *name, void (*testfunc)());
     @@ -7140,55 +7354,61 @@
      +
      +#include "tree.h"
      +
     -+#include <stdlib.h>
     ++#include "system.h"
      +
      +struct tree_node *tree_search(void *key, struct tree_node **rootp,
     -+                              int (*compare)(const void *, const void *),
     -+                              int insert) {
     -+  if (*rootp == NULL) {
     -+    if (!insert) {
     -+      return NULL;
     -+    } else {
     -+      struct tree_node *n = calloc(sizeof(struct tree_node), 1);
     -+      n->key = key;
     -+      *rootp = n;
     -+      return *rootp;
     -+    }
     -+  }
     -+
     -+  {
     -+    int res = compare(key, (*rootp)->key);
     -+    if (res < 0) {
     -+      return tree_search(key, &(*rootp)->left, compare, insert);
     -+    } else if (res > 0) {
     -+      return tree_search(key, &(*rootp)->right, compare, insert);
     -+    }
     -+  }
     -+  return *rootp;
     ++			      int (*compare)(const void *, const void *),
     ++			      int insert)
     ++{
     ++	if (*rootp == NULL) {
     ++		if (!insert) {
     ++			return NULL;
     ++		} else {
     ++			struct tree_node *n =
     ++				calloc(sizeof(struct tree_node), 1);
     ++			n->key = key;
     ++			*rootp = n;
     ++			return *rootp;
     ++		}
     ++	}
     ++
     ++	{
     ++		int res = compare(key, (*rootp)->key);
     ++		if (res < 0) {
     ++			return tree_search(key, &(*rootp)->left, compare,
     ++					   insert);
     ++		} else if (res > 0) {
     ++			return tree_search(key, &(*rootp)->right, compare,
     ++					   insert);
     ++		}
     ++	}
     ++	return *rootp;
      +}
      +
      +void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
     -+                void *arg) {
     -+  if (t->left != NULL) {
     -+    infix_walk(t->left, action, arg);
     -+  }
     -+  action(arg, t->key);
     -+  if (t->right != NULL) {
     -+    infix_walk(t->right, action, arg);
     -+  }
     -+}
     -+
     -+void tree_free(struct tree_node *t) {
     -+  if (t == NULL) {
     -+    return;
     -+  }
     -+  if (t->left != NULL) {
     -+    tree_free(t->left);
     -+  }
     -+  if (t->right != NULL) {
     -+    tree_free(t->right);
     -+  }
     -+  free(t);
     ++		void *arg)
     ++{
     ++	if (t->left != NULL) {
     ++		infix_walk(t->left, action, arg);
     ++	}
     ++	action(arg, t->key);
     ++	if (t->right != NULL) {
     ++		infix_walk(t->right, action, arg);
     ++	}
     ++}
     ++
     ++void tree_free(struct tree_node *t)
     ++{
     ++	if (t == NULL) {
     ++		return;
     ++	}
     ++	if (t->left != NULL) {
     ++		tree_free(t->left);
     ++	}
     ++	if (t->right != NULL) {
     ++		tree_free(t->right);
     ++	}
     ++	free(t);
      +}
      
       diff --git a/reftable/tree.h b/reftable/tree.h
     @@ -7208,15 +7428,15 @@
      +#define TREE_H
      +
      +struct tree_node {
     -+  void *key;
     -+  struct tree_node *left, *right;
     ++	void *key;
     ++	struct tree_node *left, *right;
      +};
      +
      +struct tree_node *tree_search(void *key, struct tree_node **rootp,
     -+                              int (*compare)(const void *, const void *),
     -+                              int insert);
     ++			      int (*compare)(const void *, const void *),
     ++			      int insert);
      +void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
     -+                void *arg);
     ++		void *arg);
      +void tree_free(struct tree_node *t);
      +
      +#endif
     @@ -7241,44 +7461,51 @@
      +#include "reftable.h"
      +#include "test_framework.h"
      +
     -+static int test_compare(const void *a, const void *b) { return a - b; }
     ++static int test_compare(const void *a, const void *b)
     ++{
     ++	return a - b;
     ++}
      +
      +struct curry {
     -+  void *last;
     ++	void *last;
      +};
      +
     -+void check_increasing(void *arg, void *key) {
     -+  struct curry *c = (struct curry *)arg;
     -+  if (c->last != NULL) {
     -+    assert(test_compare(c->last, key) < 0);
     -+  }
     -+  c->last = key;
     ++void check_increasing(void *arg, void *key)
     ++{
     ++	struct curry *c = (struct curry *)arg;
     ++	if (c->last != NULL) {
     ++		assert(test_compare(c->last, key) < 0);
     ++	}
     ++	c->last = key;
      +}
      +
     -+void test_tree() {
     -+  struct tree_node *root = NULL;
     ++void test_tree()
     ++{
     ++	struct tree_node *root = NULL;
      +
     -+  void *values[11] = {};
     -+  struct tree_node *nodes[11] = {};
     -+  int i = 1;
     -+  do {
     -+    nodes[i] = tree_search(values + i, &root, &test_compare, 1);
     -+    i = (i * 7) % 11;
     -+  } while (i != 1);
     ++	void *values[11] = {};
     ++	struct tree_node *nodes[11] = {};
     ++	int i = 1;
     ++	do {
     ++		nodes[i] = tree_search(values + i, &root, &test_compare, 1);
     ++		i = (i * 7) % 11;
     ++	} while (i != 1);
      +
     -+  for (int i = 1; i < ARRAYSIZE(nodes); i++) {
     -+    assert(values + i == nodes[i]->key);
     -+    assert(nodes[i] == tree_search(values + i, &root, &test_compare, 0));
     -+  }
     ++	for (i = 1; i < ARRAYSIZE(nodes); i++) {
     ++		assert(values + i == nodes[i]->key);
     ++		assert(nodes[i] ==
     ++		       tree_search(values + i, &root, &test_compare, 0));
     ++	}
      +
     -+  struct curry c = {};
     -+  infix_walk(root, check_increasing, &c);
     -+  tree_free(root);
     ++	struct curry c = {};
     ++	infix_walk(root, check_increasing, &c);
     ++	tree_free(root);
      +}
      +
     -+int main() {
     -+  add_test_case("test_tree", &test_tree);
     -+  test_main();
     ++int main()
     ++{
     ++	add_test_case("test_tree", &test_tree);
     ++	test_main();
      +}
      
       diff --git a/reftable/writer.c b/reftable/writer.c
     @@ -7296,12 +7523,7 @@
      +
      +#include "writer.h"
      +
     -+#include <assert.h>
     -+#include <stdint.h>
     -+#include <stdio.h>  // debug
     -+#include <stdlib.h>
     -+#include <string.h>
     -+#include <zlib.h>
     ++#include "system.h"
      +
      +#include "block.h"
      +#include "constants.h"
     @@ -7309,568 +7531,611 @@
      +#include "reftable.h"
      +#include "tree.h"
      +
     -+static struct block_stats *writer_block_stats(struct writer *w, byte typ) {
     -+  switch (typ) {
     -+    case 'r':
     -+      return &w->stats.ref_stats;
     -+    case 'o':
     -+      return &w->stats.obj_stats;
     -+    case 'i':
     -+      return &w->stats.idx_stats;
     -+    case 'g':
     -+      return &w->stats.log_stats;
     -+  }
     -+  assert(false);
     ++static struct block_stats *writer_block_stats(struct writer *w, byte typ)
     ++{
     ++	switch (typ) {
     ++	case 'r':
     ++		return &w->stats.ref_stats;
     ++	case 'o':
     ++		return &w->stats.obj_stats;
     ++	case 'i':
     ++		return &w->stats.idx_stats;
     ++	case 'g':
     ++		return &w->stats.log_stats;
     ++	}
     ++	assert(false);
      +}
      +
      +/* write data, queuing the padding for the next write. Returns negative for
      + * error. */
     -+static int padded_write(struct writer *w, byte *data, size_t len, int padding) {
     -+  int n = 0;
     -+  if (w->pending_padding > 0) {
     -+    byte *zeroed = calloc(w->pending_padding, 1);
     -+    int n = w->write(w->write_arg, zeroed, w->pending_padding);
     -+    if (n < 0) {
     -+      return n;
     -+    }
     -+
     -+    w->pending_padding = 0;
     -+    free(zeroed);
     -+  }
     -+
     -+  w->pending_padding = padding;
     -+  n = w->write(w->write_arg, data, len);
     -+  if (n < 0) {
     -+    return n;
     -+  }
     -+  n += padding;
     -+  return 0;
     -+}
     -+
     -+static void options_set_defaults(struct write_options *opts) {
     -+  if (opts->restart_interval == 0) {
     -+    opts->restart_interval = 16;
     -+  }
     -+
     -+  if (opts->block_size == 0) {
     -+    opts->block_size = DEFAULT_BLOCK_SIZE;
     -+  }
     -+}
     -+
     -+static int writer_write_header(struct writer *w, byte *dest) {
     -+  strcpy((char *)dest, "REFT");
     -+  dest[4] = 1;  // version
     -+  put_u24(dest + 5, w->opts.block_size);
     -+  put_u64(dest + 8, w->min_update_index);
     -+  put_u64(dest + 16, w->max_update_index);
     -+  return 24;
     -+}
     -+
     -+static void writer_reinit_block_writer(struct writer *w, byte typ) {
     -+  int block_start = 0;
     -+  if (w->next == 0) {
     -+    block_start = HEADER_SIZE;
     -+  }
     -+
     -+  block_writer_init(&w->block_writer_data, typ, w->block, w->opts.block_size,
     -+                    block_start, w->hash_size);
     -+  w->block_writer = &w->block_writer_data;
     -+  w->block_writer->restart_interval = w->opts.restart_interval;
     ++static int padded_write(struct writer *w, byte *data, size_t len, int padding)
     ++{
     ++	int n = 0;
     ++	if (w->pending_padding > 0) {
     ++		byte *zeroed = calloc(w->pending_padding, 1);
     ++		int n = w->write(w->write_arg, zeroed, w->pending_padding);
     ++		if (n < 0) {
     ++			return n;
     ++		}
     ++
     ++		w->pending_padding = 0;
     ++		free(zeroed);
     ++	}
     ++
     ++	w->pending_padding = padding;
     ++	n = w->write(w->write_arg, data, len);
     ++	if (n < 0) {
     ++		return n;
     ++	}
     ++	n += padding;
     ++	return 0;
     ++}
     ++
     ++static void options_set_defaults(struct write_options *opts)
     ++{
     ++	if (opts->restart_interval == 0) {
     ++		opts->restart_interval = 16;
     ++	}
     ++
     ++	if (opts->block_size == 0) {
     ++		opts->block_size = DEFAULT_BLOCK_SIZE;
     ++	}
     ++}
     ++
     ++static int writer_write_header(struct writer *w, byte *dest)
     ++{
     ++	memcpy((char *)dest, "REFT", 4);
     ++	dest[4] = 1; /* version */
     ++	put_u24(dest + 5, w->opts.block_size);
     ++	put_u64(dest + 8, w->min_update_index);
     ++	put_u64(dest + 16, w->max_update_index);
     ++	return 24;
     ++}
     ++
     ++static void writer_reinit_block_writer(struct writer *w, byte typ)
     ++{
     ++	int block_start = 0;
     ++	if (w->next == 0) {
     ++		block_start = HEADER_SIZE;
     ++	}
     ++
     ++	block_writer_init(&w->block_writer_data, typ, w->block,
     ++			  w->opts.block_size, block_start, w->hash_size);
     ++	w->block_writer = &w->block_writer_data;
     ++	w->block_writer->restart_interval = w->opts.restart_interval;
      +}
      +
      +struct writer *new_writer(int (*writer_func)(void *, byte *, int),
     -+                          void *writer_arg, struct write_options *opts) {
     -+  struct writer *wp = calloc(sizeof(struct writer), 1);
     -+  options_set_defaults(opts);
     -+  if (opts->block_size >= (1 << 24)) {
     -+    // TODO - error return?
     -+    abort();
     -+  }
     -+  wp->hash_size = SHA1_SIZE;
     -+  wp->block = calloc(opts->block_size, 1);
     -+  wp->write = writer_func;
     -+  wp->write_arg = writer_arg;
     -+  wp->opts = *opts;
     -+  writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
     ++			  void *writer_arg, struct write_options *opts)
     ++{
     ++	struct writer *wp = calloc(sizeof(struct writer), 1);
     ++	options_set_defaults(opts);
     ++	if (opts->block_size >= (1 << 24)) {
     ++		/* TODO - error return? */
     ++		abort();
     ++	}
     ++	wp->hash_size = SHA1_SIZE;
     ++	wp->block = calloc(opts->block_size, 1);
     ++	wp->write = writer_func;
     ++	wp->write_arg = writer_arg;
     ++	wp->opts = *opts;
     ++	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
      +
     -+  return wp;
     ++	return wp;
      +}
      +
     -+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max) {
     -+  w->min_update_index = min;
     -+  w->max_update_index = max;
     ++void writer_set_limits(struct writer *w, uint64_t min, uint64_t max)
     ++{
     ++	w->min_update_index = min;
     ++	w->max_update_index = max;
      +}
      +
     -+void writer_free(struct writer *w) {
     -+  free(w->block);
     -+  free(w);
     ++void writer_free(struct writer *w)
     ++{
     ++	free(w->block);
     ++	free(w);
      +}
      +
      +struct obj_index_tree_node {
     -+  struct slice hash;
     -+  uint64_t *offsets;
     -+  int offset_len;
     -+  int offset_cap;
     ++	struct slice hash;
     ++	uint64_t *offsets;
     ++	int offset_len;
     ++	int offset_cap;
      +};
      +
     -+static int obj_index_tree_node_compare(const void *a, const void *b) {
     -+  return slice_compare(((const struct obj_index_tree_node *)a)->hash,
     -+                       ((const struct obj_index_tree_node *)b)->hash);
     -+}
     -+
     -+static void writer_index_hash(struct writer *w, struct slice hash) {
     -+  uint64_t off = w->next;
     -+
     -+  struct obj_index_tree_node want = {.hash = hash};
     -+
     -+  struct tree_node *node =
     -+      tree_search(&want, &w->obj_index_tree, &obj_index_tree_node_compare, 0);
     -+  struct obj_index_tree_node *key = NULL;
     -+  if (node == NULL) {
     -+    key = calloc(sizeof(struct obj_index_tree_node), 1);
     -+    slice_copy(&key->hash, hash);
     -+    tree_search((void *)key, &w->obj_index_tree, &obj_index_tree_node_compare,
     -+                1);
     -+  } else {
     -+    key = node->key;
     -+  }
     -+
     -+  if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
     -+    return;
     -+  }
     -+
     -+  if (key->offset_len == key->offset_cap) {
     -+    key->offset_cap = 2 * key->offset_cap + 1;
     -+    key->offsets = realloc(key->offsets, sizeof(uint64_t) * key->offset_cap);
     -+  }
     -+
     -+  key->offsets[key->offset_len++] = off;
     -+}
     -+
     -+static int writer_add_record(struct writer *w, struct record rec) {
     -+  int result = -1;
     -+  struct slice key = {};
     -+  int err = 0;
     -+  record_key(rec, &key);
     -+  if (slice_compare(w->last_key, key) >= 0) {
     -+    goto exit;
     -+  }
     -+
     -+  slice_copy(&w->last_key, key);
     -+  if (w->block_writer == NULL) {
     -+    writer_reinit_block_writer(w, record_type(rec));
     -+  }
     -+
     -+  assert(block_writer_type(w->block_writer) == record_type(rec));
     -+
     -+  if (block_writer_add(w->block_writer, rec) == 0) {
     -+    result = 0;
     -+    goto exit;
     -+  }
     -+
     -+  err = writer_flush_block(w);
     -+  if (err < 0) {
     -+    result = err;
     -+    goto exit;
     -+  }
     -+
     -+  writer_reinit_block_writer(w, record_type(rec));
     -+  err = block_writer_add(w->block_writer, rec);
     -+  if (err < 0) {
     -+    result = err;
     -+    goto exit;
     -+  }
     -+
     -+  result = 0;
     ++static int obj_index_tree_node_compare(const void *a, const void *b)
     ++{
     ++	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
     ++			     ((const struct obj_index_tree_node *)b)->hash);
     ++}
     ++
     ++static void writer_index_hash(struct writer *w, struct slice hash)
     ++{
     ++	uint64_t off = w->next;
     ++
     ++	struct obj_index_tree_node want = { .hash = hash };
     ++
     ++	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
     ++					     &obj_index_tree_node_compare, 0);
     ++	struct obj_index_tree_node *key = NULL;
     ++	if (node == NULL) {
     ++		key = calloc(sizeof(struct obj_index_tree_node), 1);
     ++		slice_copy(&key->hash, hash);
     ++		tree_search((void *)key, &w->obj_index_tree,
     ++			    &obj_index_tree_node_compare, 1);
     ++	} else {
     ++		key = node->key;
     ++	}
     ++
     ++	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
     ++		return;
     ++	}
     ++
     ++	if (key->offset_len == key->offset_cap) {
     ++		key->offset_cap = 2 * key->offset_cap + 1;
     ++		key->offsets = realloc(key->offsets,
     ++				       sizeof(uint64_t) * key->offset_cap);
     ++	}
     ++
     ++	key->offsets[key->offset_len++] = off;
     ++}
     ++
     ++static int writer_add_record(struct writer *w, struct record rec)
     ++{
     ++	int result = -1;
     ++	struct slice key = {};
     ++	int err = 0;
     ++	record_key(rec, &key);
     ++	if (slice_compare(w->last_key, key) >= 0) {
     ++		goto exit;
     ++	}
     ++
     ++	slice_copy(&w->last_key, key);
     ++	if (w->block_writer == NULL) {
     ++		writer_reinit_block_writer(w, record_type(rec));
     ++	}
     ++
     ++	assert(block_writer_type(w->block_writer) == record_type(rec));
     ++
     ++	if (block_writer_add(w->block_writer, rec) == 0) {
     ++		result = 0;
     ++		goto exit;
     ++	}
     ++
     ++	err = writer_flush_block(w);
     ++	if (err < 0) {
     ++		result = err;
     ++		goto exit;
     ++	}
     ++
     ++	writer_reinit_block_writer(w, record_type(rec));
     ++	err = block_writer_add(w->block_writer, rec);
     ++	if (err < 0) {
     ++		result = err;
     ++		goto exit;
     ++	}
     ++
     ++	result = 0;
      +exit:
     -+  free(slice_yield(&key));
     -+  return result;
     -+}
     -+
     -+int writer_add_ref(struct writer *w, struct ref_record *ref) {
     -+  struct record rec = {};
     -+  struct ref_record copy = *ref;
     -+  int err = 0;
     -+
     -+  if (ref->update_index < w->min_update_index ||
     -+      ref->update_index > w->max_update_index) {
     -+    return API_ERROR;
     -+  }
     -+
     -+  record_from_ref(&rec, &copy);
     -+  copy.update_index -= w->min_update_index;
     -+  err = writer_add_record(w, rec);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+
     -+  if (!w->opts.skip_index_objects && ref->value != NULL) {
     -+    struct slice h = {
     -+        .buf = ref->value,
     -+        .len = w->hash_size,
     -+    };
     -+
     -+    writer_index_hash(w, h);
     -+  }
     -+  if (!w->opts.skip_index_objects && ref->target_value != NULL) {
     -+    struct slice h = {
     -+        .buf = ref->target_value,
     -+        .len = w->hash_size,
     -+    };
     -+    writer_index_hash(w, h);
     -+  }
     -+  return 0;
     -+}
     -+
     -+int writer_add_refs(struct writer *w, struct ref_record *refs, int n) {
     -+  int err = 0;
     -+  qsort(refs, n, sizeof(struct ref_record), ref_record_compare_name);
     -+  for (int i = 0; err == 0 && i < n; i++) {
     -+    err = writer_add_ref(w, &refs[i]);
     -+  }
     -+  return err;
     -+}
     -+
     -+int writer_add_log(struct writer *w, struct log_record *log) {
     -+  if (w->block_writer != NULL &&
     -+      block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
     -+    int err = writer_finish_public_section(w);
     -+    if (err < 0) {
     -+      return err;
     -+    }
     -+  }
     -+
     -+  {
     -+    struct record rec = {};
     -+    int err;
     -+    record_from_log(&rec, log);
     -+    err = writer_add_record(w, rec);
     -+    return err;
     -+  }
     -+}
     -+
     -+int writer_add_logs(struct writer *w, struct log_record *logs, int n) {
     -+  int err = 0;
     -+  qsort(logs, n, sizeof(struct log_record), log_record_compare_key);
     -+  for (int i = 0; err == 0 && i < n; i++) {
     -+    err = writer_add_log(w, &logs[i]);
     -+  }
     -+  return err;
     -+}
     -+
     -+
     -+static int writer_finish_section(struct writer *w) {
     -+  byte typ = block_writer_type(w->block_writer);
     -+  uint64_t index_start = 0;
     -+  int max_level = 0;
     -+  int threshold = w->opts.unpadded ? 1 : 3;
     -+  int before_blocks = w->stats.idx_stats.blocks;
     -+  int err = writer_flush_block(w);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+
     -+  while (w->index_len > threshold) {
     -+    struct index_record *idx = NULL;
     -+    int idx_len = 0;
     -+
     -+    max_level++;
     -+    index_start = w->next;
     -+    writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
     -+
     -+    idx = w->index;
     -+    idx_len = w->index_len;
     -+
     -+    w->index = NULL;
     -+    w->index_len = 0;
     -+    w->index_cap = 0;
     -+    for (int i = 0; i < idx_len; i++) {
     -+      struct record rec = {};
     -+      record_from_index(&rec, idx + i);
     -+      if (block_writer_add(w->block_writer, rec) == 0) {
     -+        continue;
     -+      }
     -+
     -+      {
     -+        int err = writer_flush_block(w);
     -+        if (err < 0) {
     -+          return err;
     -+        }
     -+      }
     -+
     -+      writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
     -+
     -+      err = block_writer_add(w->block_writer, rec);
     -+      assert(err == 0);
     -+    }
     -+    for (int i = 0; i < idx_len; i++) {
     -+      free(slice_yield(&idx[i].last_key));
     -+    }
     -+    free(idx);
     -+  }
     -+
     -+  writer_clear_index(w);
     -+
     -+  err = writer_flush_block(w);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+
     -+  {
     -+    struct block_stats *bstats = writer_block_stats(w, typ);
     -+    bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
     -+    bstats->index_offset = index_start;
     -+    bstats->max_index_level = max_level;
     -+  }
     -+
     -+  // Reinit lastKey, as the next section can start with any key.
     -+  w->last_key.len = 0;
     -+
     -+  return 0;
     ++	free(slice_yield(&key));
     ++	return result;
     ++}
     ++
     ++int writer_add_ref(struct writer *w, struct ref_record *ref)
     ++{
     ++	struct record rec = {};
     ++	struct ref_record copy = *ref;
     ++	int err = 0;
     ++
     ++	if (ref->ref_name == NULL) {
     ++		return API_ERROR;
     ++	}
     ++	if (ref->update_index < w->min_update_index ||
     ++	    ref->update_index > w->max_update_index) {
     ++		return API_ERROR;
     ++	}
     ++
     ++	record_from_ref(&rec, &copy);
     ++	copy.update_index -= w->min_update_index;
     ++	err = writer_add_record(w, rec);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	if (!w->opts.skip_index_objects && ref->value != NULL) {
     ++		struct slice h = {
     ++			.buf = ref->value,
     ++			.len = w->hash_size,
     ++		};
     ++
     ++		writer_index_hash(w, h);
     ++	}
     ++	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
     ++		struct slice h = {
     ++			.buf = ref->target_value,
     ++			.len = w->hash_size,
     ++		};
     ++		writer_index_hash(w, h);
     ++	}
     ++	return 0;
     ++}
     ++
     ++int writer_add_refs(struct writer *w, struct ref_record *refs, int n)
     ++{
     ++	int err = 0;
     ++	int i = 0;
     ++	qsort(refs, n, sizeof(struct ref_record), ref_record_compare_name);
     ++	for (i = 0; err == 0 && i < n; i++) {
     ++		err = writer_add_ref(w, &refs[i]);
     ++	}
     ++	return err;
     ++}
     ++
     ++int writer_add_log(struct writer *w, struct log_record *log)
     ++{
     ++	if (log->ref_name == NULL) {
     ++		return API_ERROR;
     ++	}
     ++
     ++	if (w->block_writer != NULL &&
     ++	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
     ++		int err = writer_finish_public_section(w);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++	}
     ++
     ++	w->next -= w->pending_padding;
     ++	w->pending_padding = 0;
     ++
     ++	{
     ++		struct record rec = {};
     ++		int err;
     ++		record_from_log(&rec, log);
     ++		err = writer_add_record(w, rec);
     ++		return err;
     ++	}
     ++}
     ++
     ++int writer_add_logs(struct writer *w, struct log_record *logs, int n)
     ++{
     ++	int err = 0;
     ++	int i = 0;
     ++	qsort(logs, n, sizeof(struct log_record), log_record_compare_key);
     ++	for (i = 0; err == 0 && i < n; i++) {
     ++		err = writer_add_log(w, &logs[i]);
     ++	}
     ++	return err;
     ++}
     ++
     ++static int writer_finish_section(struct writer *w)
     ++{
     ++	byte typ = block_writer_type(w->block_writer);
     ++	uint64_t index_start = 0;
     ++	int max_level = 0;
     ++	int threshold = w->opts.unpadded ? 1 : 3;
     ++	int before_blocks = w->stats.idx_stats.blocks;
     ++	int err = writer_flush_block(w);
     ++	int i = 0;
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	while (w->index_len > threshold) {
     ++		struct index_record *idx = NULL;
     ++		int idx_len = 0;
     ++
     ++		max_level++;
     ++		index_start = w->next;
     ++		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
     ++
     ++		idx = w->index;
     ++		idx_len = w->index_len;
     ++
     ++		w->index = NULL;
     ++		w->index_len = 0;
     ++		w->index_cap = 0;
     ++		for (i = 0; i < idx_len; i++) {
     ++			struct record rec = {};
     ++			record_from_index(&rec, idx + i);
     ++			if (block_writer_add(w->block_writer, rec) == 0) {
     ++				continue;
     ++			}
     ++
     ++			{
     ++				int err = writer_flush_block(w);
     ++				if (err < 0) {
     ++					return err;
     ++				}
     ++			}
     ++
     ++			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
     ++
     ++			err = block_writer_add(w->block_writer, rec);
     ++			assert(err == 0);
     ++		}
     ++		for (i = 0; i < idx_len; i++) {
     ++			free(slice_yield(&idx[i].last_key));
     ++		}
     ++		free(idx);
     ++	}
     ++
     ++	writer_clear_index(w);
     ++
     ++	err = writer_flush_block(w);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	{
     ++		struct block_stats *bstats = writer_block_stats(w, typ);
     ++		bstats->index_blocks =
     ++			w->stats.idx_stats.blocks - before_blocks;
     ++		bstats->index_offset = index_start;
     ++		bstats->max_index_level = max_level;
     ++	}
     ++
     ++	/* Reinit lastKey, as the next section can start with any key. */
     ++	w->last_key.len = 0;
     ++
     ++	return 0;
      +}
      +
      +struct common_prefix_arg {
     -+  struct slice *last;
     -+  int max;
     ++	struct slice *last;
     ++	int max;
      +};
      +
     -+static void update_common(void *void_arg, void *key) {
     -+  struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
     -+  struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
     -+  if (arg->last != NULL) {
     -+    int n = common_prefix_size(entry->hash, *arg->last);
     -+    if (n > arg->max) {
     -+      arg->max = n;
     -+    }
     -+  }
     -+  arg->last = &entry->hash;
     ++static void update_common(void *void_arg, void *key)
     ++{
     ++	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
     ++	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
     ++	if (arg->last != NULL) {
     ++		int n = common_prefix_size(entry->hash, *arg->last);
     ++		if (n > arg->max) {
     ++			arg->max = n;
     ++		}
     ++	}
     ++	arg->last = &entry->hash;
      +}
      +
      +struct write_record_arg {
     -+  struct writer *w;
     -+  int err;
     ++	struct writer *w;
     ++	int err;
      +};
      +
     -+static void write_object_record(void *void_arg, void *key) {
     -+  struct write_record_arg *arg = (struct write_record_arg *)void_arg;
     -+  struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
     -+  struct obj_record obj_rec = {
     -+      .hash_prefix = entry->hash.buf,
     -+      .hash_prefix_len = arg->w->stats.object_id_len,
     -+      .offsets = entry->offsets,
     -+      .offset_len = entry->offset_len,
     -+  };
     -+  struct record rec = {};
     -+  if (arg->err < 0) {
     -+    goto exit;
     -+  }
     -+
     -+  record_from_obj(&rec, &obj_rec);
     -+  arg->err = block_writer_add(arg->w->block_writer, rec);
     -+  if (arg->err == 0) {
     -+    goto exit;
     -+  }
     -+
     -+  arg->err = writer_flush_block(arg->w);
     -+  if (arg->err < 0) {
     -+    goto exit;
     -+  }
     -+
     -+  writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
     -+  arg->err = block_writer_add(arg->w->block_writer, rec);
     -+  if (arg->err == 0) {
     -+    goto exit;
     -+  }
     -+  obj_rec.offset_len = 0;
     -+  arg->err = block_writer_add(arg->w->block_writer, rec);
     -+
     -+  // Should be able to write into a fresh block.
     -+  assert(arg->err == 0);
     ++static void write_object_record(void *void_arg, void *key)
     ++{
     ++	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
     ++	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
     ++	struct obj_record obj_rec = {
     ++		.hash_prefix = entry->hash.buf,
     ++		.hash_prefix_len = arg->w->stats.object_id_len,
     ++		.offsets = entry->offsets,
     ++		.offset_len = entry->offset_len,
     ++	};
     ++	struct record rec = {};
     ++	if (arg->err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	record_from_obj(&rec, &obj_rec);
     ++	arg->err = block_writer_add(arg->w->block_writer, rec);
     ++	if (arg->err == 0) {
     ++		goto exit;
     ++	}
     ++
     ++	arg->err = writer_flush_block(arg->w);
     ++	if (arg->err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
     ++	arg->err = block_writer_add(arg->w->block_writer, rec);
     ++	if (arg->err == 0) {
     ++		goto exit;
     ++	}
     ++	obj_rec.offset_len = 0;
     ++	arg->err = block_writer_add(arg->w->block_writer, rec);
     ++
     ++	/* Should be able to write into a fresh block. */
     ++	assert(arg->err == 0);
      +
      +exit:;
      +}
      +
     -+static void object_record_free(void *void_arg, void *key) {
     -+  struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
     -+
     -+  free(entry->offsets);
     -+  entry->offsets = NULL;
     -+  free(slice_yield(&entry->hash));
     -+  free(entry);
     -+}
     -+
     -+static int writer_dump_object_index(struct writer *w) {
     -+  struct write_record_arg closure = {.w = w};
     -+  struct common_prefix_arg common = {};
     -+  if (w->obj_index_tree != NULL) {
     -+    infix_walk(w->obj_index_tree, &update_common, &common);
     -+  }
     -+  w->stats.object_id_len = common.max + 1;
     -+
     -+  writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
     -+
     -+  if (w->obj_index_tree != NULL) {
     -+    infix_walk(w->obj_index_tree, &write_object_record, &closure);
     -+  }
     -+
     -+  if (closure.err < 0) {
     -+    return closure.err;
     -+  }
     -+  return writer_finish_section(w);
     -+}
     -+
     -+int writer_finish_public_section(struct writer *w) {
     -+  byte typ = 0;
     -+  int err = 0;
     -+
     -+  if (w->block_writer == NULL) {
     -+    return 0;
     -+  }
     -+
     -+  typ = block_writer_type(w->block_writer);
     -+  err = writer_finish_section(w);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+  if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
     -+      w->stats.ref_stats.index_blocks > 0) {
     -+    err = writer_dump_object_index(w);
     -+    if (err < 0) {
     -+      return err;
     -+    }
     -+  }
     -+
     -+  if (w->obj_index_tree != NULL) {
     -+    infix_walk(w->obj_index_tree, &object_record_free, NULL);
     -+    tree_free(w->obj_index_tree);
     -+    w->obj_index_tree = NULL;
     -+  }
     -+
     -+  w->block_writer = NULL;
     -+  return 0;
     -+}
     -+
     -+int writer_close(struct writer *w) {
     -+  byte footer[68];
     -+  byte *p = footer;
     -+
     -+  writer_finish_public_section(w);
     -+
     -+  writer_write_header(w, footer);
     -+  p += 24;
     -+  put_u64(p, w->stats.ref_stats.index_offset);
     -+  p += 8;
     -+  put_u64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
     -+  p += 8;
     -+  put_u64(p, w->stats.obj_stats.index_offset);
     -+  p += 8;
     -+
     -+  put_u64(p, w->stats.log_stats.offset);
     -+  p += 8;
     -+  put_u64(p, w->stats.log_stats.index_offset);
     -+  p += 8;
     -+
     -+  put_u32(p, crc32(0, footer, p - footer));
     -+  p += 4;
     -+  w->pending_padding = 0;
     -+
     -+  {
     -+    int n = padded_write(w, footer, sizeof(footer), 0);
     -+    if (n < 0) {
     -+      return n;
     -+    }
     -+  }
     -+
     -+  // free up memory.
     -+  block_writer_clear(&w->block_writer_data);
     -+  writer_clear_index(w);
     -+  free(slice_yield(&w->last_key));
     -+  return 0;
     -+}
     -+
     -+void writer_clear_index(struct writer *w) {
     -+  for (int i = 0; i < w->index_len; i++) {
     -+    free(slice_yield(&w->index[i].last_key));
     -+  }
     -+
     -+  free(w->index);
     -+  w->index = NULL;
     -+  w->index_len = 0;
     -+  w->index_cap = 0;
     ++static void object_record_free(void *void_arg, void *key)
     ++{
     ++	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
     ++
     ++	free(entry->offsets);
     ++	entry->offsets = NULL;
     ++	free(slice_yield(&entry->hash));
     ++	free(entry);
     ++}
     ++
     ++static int writer_dump_object_index(struct writer *w)
     ++{
     ++	struct write_record_arg closure = { .w = w };
     ++	struct common_prefix_arg common = {};
     ++	if (w->obj_index_tree != NULL) {
     ++		infix_walk(w->obj_index_tree, &update_common, &common);
     ++	}
     ++	w->stats.object_id_len = common.max + 1;
     ++
     ++	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
     ++
     ++	if (w->obj_index_tree != NULL) {
     ++		infix_walk(w->obj_index_tree, &write_object_record, &closure);
     ++	}
     ++
     ++	if (closure.err < 0) {
     ++		return closure.err;
     ++	}
     ++	return writer_finish_section(w);
     ++}
     ++
     ++int writer_finish_public_section(struct writer *w)
     ++{
     ++	byte typ = 0;
     ++	int err = 0;
     ++
     ++	if (w->block_writer == NULL) {
     ++		return 0;
     ++	}
     ++
     ++	typ = block_writer_type(w->block_writer);
     ++	err = writer_finish_section(w);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
     ++	    w->stats.ref_stats.index_blocks > 0) {
     ++		err = writer_dump_object_index(w);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++	}
     ++
     ++	if (w->obj_index_tree != NULL) {
     ++		infix_walk(w->obj_index_tree, &object_record_free, NULL);
     ++		tree_free(w->obj_index_tree);
     ++		w->obj_index_tree = NULL;
     ++	}
     ++
     ++	w->block_writer = NULL;
     ++	return 0;
     ++}
     ++
     ++int writer_close(struct writer *w)
     ++{
     ++	byte footer[68];
     ++	byte *p = footer;
     ++
     ++	writer_finish_public_section(w);
     ++
     ++	writer_write_header(w, footer);
     ++	p += 24;
     ++	put_u64(p, w->stats.ref_stats.index_offset);
     ++	p += 8;
     ++	put_u64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
     ++	p += 8;
     ++	put_u64(p, w->stats.obj_stats.index_offset);
     ++	p += 8;
     ++
     ++	put_u64(p, w->stats.log_stats.offset);
     ++	p += 8;
     ++	put_u64(p, w->stats.log_stats.index_offset);
     ++	p += 8;
     ++
     ++	put_u32(p, crc32(0, footer, p - footer));
     ++	p += 4;
     ++	w->pending_padding = 0;
     ++
     ++	{
     ++		int n = padded_write(w, footer, sizeof(footer), 0);
     ++		if (n < 0) {
     ++			return n;
     ++		}
     ++	}
     ++
     ++	/* free up memory. */
     ++	block_writer_clear(&w->block_writer_data);
     ++	writer_clear_index(w);
     ++	free(slice_yield(&w->last_key));
     ++	return 0;
     ++}
     ++
     ++void writer_clear_index(struct writer *w)
     ++{
     ++	int i = 0;
     ++	for (i = 0; i < w->index_len; i++) {
     ++		free(slice_yield(&w->index[i].last_key));
     ++	}
     ++
     ++	free(w->index);
     ++	w->index = NULL;
     ++	w->index_len = 0;
     ++	w->index_cap = 0;
      +}
      +
      +const int debug = 0;
      +
     -+static int writer_flush_nonempty_block(struct writer *w) {
     -+  byte typ = block_writer_type(w->block_writer);
     -+  struct block_stats *bstats = writer_block_stats(w, typ);
     -+  uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
     -+  int raw_bytes = block_writer_finish(w->block_writer);
     -+  int padding = 0;
     -+  int err = 0;
     -+  if (raw_bytes < 0) {
     -+    return raw_bytes;
     -+  }
     -+
     -+  if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
     -+    padding = w->opts.block_size - raw_bytes;
     -+  }
     -+
     -+  if (block_typ_off > 0) {
     -+    bstats->offset = block_typ_off;
     -+  }
     -+
     -+  bstats->entries += w->block_writer->entries;
     -+  bstats->restarts += w->block_writer->restart_len;
     -+  bstats->blocks++;
     -+  w->stats.blocks++;
     -+
     -+  if (debug) {
     -+    fprintf(stderr, "block %c off %ld sz %d (%d)\n", typ, w->next, raw_bytes,
     -+            get_u24(w->block + w->block_writer->header_off + 1));
     -+  }
     -+
     -+  if (w->next == 0) {
     -+    writer_write_header(w, w->block);
     -+  }
     -+
     -+  err = padded_write(w, w->block, raw_bytes, padding);
     -+  if (err < 0) {
     -+    return err;
     -+  }
     -+
     -+  if (w->index_cap == w->index_len) {
     -+    w->index_cap = 2 * w->index_cap + 1;
     -+    w->index = realloc(w->index, sizeof(struct index_record) * w->index_cap);
     -+  }
     -+
     -+  {
     -+    struct index_record ir = {
     -+        .offset = w->next,
     -+    };
     -+    slice_copy(&ir.last_key, w->block_writer->last_key);
     -+    w->index[w->index_len] = ir;
     -+  }
     -+
     -+  w->index_len++;
     -+  w->next += padding + raw_bytes;
     -+  block_writer_reset(&w->block_writer_data);
     -+  w->block_writer = NULL;
     -+  return 0;
     -+}
     -+
     -+int writer_flush_block(struct writer *w) {
     -+  if (w->block_writer == NULL) {
     -+    return 0;
     -+  }
     -+  if (w->block_writer->entries == 0) {
     -+    return 0;
     -+  }
     -+  return writer_flush_nonempty_block(w);
     -+}
     -+
     -+struct stats *writer_stats(struct writer *w) {
     -+  return &w->stats;
     ++static int writer_flush_nonempty_block(struct writer *w)
     ++{
     ++	byte typ = block_writer_type(w->block_writer);
     ++	struct block_stats *bstats = writer_block_stats(w, typ);
     ++	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
     ++	int raw_bytes = block_writer_finish(w->block_writer);
     ++	int padding = 0;
     ++	int err = 0;
     ++	if (raw_bytes < 0) {
     ++		return raw_bytes;
     ++	}
     ++
     ++	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
     ++		padding = w->opts.block_size - raw_bytes;
     ++	}
     ++
     ++	if (block_typ_off > 0) {
     ++		bstats->offset = block_typ_off;
     ++	}
     ++
     ++	bstats->entries += w->block_writer->entries;
     ++	bstats->restarts += w->block_writer->restart_len;
     ++	bstats->blocks++;
     ++	w->stats.blocks++;
     ++
     ++	if (debug) {
     ++		fprintf(stderr, "block %c off %" PRIuMAX " sz %d (%d)\n", typ,
     ++			w->next, raw_bytes,
     ++			get_u24(w->block + w->block_writer->header_off + 1));
     ++	}
     ++
     ++	if (w->next == 0) {
     ++		writer_write_header(w, w->block);
     ++	}
     ++
     ++	err = padded_write(w, w->block, raw_bytes, padding);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	if (w->index_cap == w->index_len) {
     ++		w->index_cap = 2 * w->index_cap + 1;
     ++		w->index = realloc(w->index,
     ++				   sizeof(struct index_record) * w->index_cap);
     ++	}
     ++
     ++	{
     ++		struct index_record ir = {
     ++			.offset = w->next,
     ++		};
     ++		slice_copy(&ir.last_key, w->block_writer->last_key);
     ++		w->index[w->index_len] = ir;
     ++	}
     ++
     ++	w->index_len++;
     ++	w->next += padding + raw_bytes;
     ++	block_writer_reset(&w->block_writer_data);
     ++	w->block_writer = NULL;
     ++	return 0;
     ++}
     ++
     ++int writer_flush_block(struct writer *w)
     ++{
     ++	if (w->block_writer == NULL) {
     ++		return 0;
     ++	}
     ++	if (w->block_writer->entries == 0) {
     ++		return 0;
     ++	}
     ++	return writer_flush_nonempty_block(w);
     ++}
     ++
     ++struct stats *writer_stats(struct writer *w)
     ++{
     ++	return &w->stats;
      +}
      
       diff --git a/reftable/writer.h b/reftable/writer.h
     @@ -7896,27 +8161,27 @@
      +#include "tree.h"
      +
      +struct writer {
     -+  int (*write)(void *, byte *, int);
     -+  void *write_arg;
     -+  int pending_padding;
     -+  int hash_size;
     -+  struct slice last_key;
     -+
     -+  uint64_t next;
     -+  uint64_t min_update_index, max_update_index;
     -+  struct write_options opts;
     -+
     -+  byte *block;
     -+  struct block_writer *block_writer;
     -+  struct block_writer block_writer_data;
     -+  struct index_record *index;
     -+  int index_len;
     -+  int index_cap;
     -+
     -+  // tree for use with tsearch
     -+  struct tree_node *obj_index_tree;
     -+
     -+  struct stats stats;
     ++	int (*write)(void *, byte *, int);
     ++	void *write_arg;
     ++	int pending_padding;
     ++	int hash_size;
     ++	struct slice last_key;
     ++
     ++	uint64_t next;
     ++	uint64_t min_update_index, max_update_index;
     ++	struct write_options opts;
     ++
     ++	byte *block;
     ++	struct block_writer *block_writer;
     ++	struct block_writer block_writer_data;
     ++	struct index_record *index;
     ++	int index_len;
     ++	int index_cap;
     ++
     ++	/* tree for use with tsearch */
     ++	struct tree_node *obj_index_tree;
     ++
     ++	struct stats stats;
      +};
      +
      +int writer_flush_block(struct writer *w);
 5:  a615afa0a8 ! 5:  721201269d Reftable support for git-core
     @@ -2,33 +2,39 @@
      
          Reftable support for git-core
      
     -    $ ~/vc/git/git init
     -    warning: templates not found in /usr/local/google/home/hanwen/share/git-core/templates
     -    Initialized empty Git repository in /tmp/qz/.git/
     -    $ echo q > a
     -    $ ~/vc/git/git add a
     -    $ ~/vc/git/git commit -mx
     -    fatal: not a git repository (or any of the parent directories): .git
     -    [master (root-commit) 373d969] x
     -     1 file changed, 1 insertion(+)
     -     create mode 100644 a
     -    $ ~/vc/git/git show-ref
     -    373d96972fca9b63595740bba3898a762778ba20 HEAD
     -    373d96972fca9b63595740bba3898a762778ba20 refs/heads/master
     -    $ ls -l .git/reftable/
     -    total 12
     -    -rw------- 1 hanwen primarygroup  126 Jan 23 20:08 000000000001-000000000001.ref
     -    -rw------- 1 hanwen primarygroup 4277 Jan 23 20:08 000000000002-000000000002.ref
     -    $ go run ~/vc/reftable/cmd/dump.go  -table /tmp/qz/.git/reftable/000000000002-000000000002.ref
     -    ** DEBUG **
     -    name /tmp/qz/.git/reftable/000000000002-000000000002.ref, sz 4209: 'r' reftable.readerOffsets{Present:true, Offset:0x0, IndexOffset:0x0}, 'o' reftable.readerOffsets{Present:false, Offset:0x0, IndexOffset:0x0} 'g' reftable.readerOffsets{Present:true, Offset:0x1000, IndexOffset:0x0}
     -    ** REFS **
     -    reftable.RefRecord{RefName:"refs/heads/master", UpdateIndex:0x2, Value:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, TargetValue:[]uint8(nil), Target:""}
     -    ** LOGS **
     -    reftable.LogRecord{RefName:"HEAD", UpdateIndex:0x2, New:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, Old:[]uint8{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Name:"Han-Wen Nienhuys", Email:"hanwen@google.com", Time:0x5e29ef27, TZOffset:100, Message:"commit (initial): x\n"}
     +    For background, see the previous commit introducing the library.
      
          TODO:
     -     * resolve the design problem with reflog expiry.
     +
     +     * Resolve the design problem with reflog expiry.
     +     * Resolve spots marked with XXX
     +
     +    Example use:
     +
     +      $ ~/vc/git/git init
     +      warning: templates not found in /usr/local/google/home/hanwen/share/git-core/templates
     +      Initialized empty Git repository in /tmp/qz/.git/
     +      $ echo q > a
     +      $ ~/vc/git/git add a
     +      $ ~/vc/git/git commit -mx
     +      fatal: not a git repository (or any of the parent directories): .git
     +      [master (root-commit) 373d969] x
     +       1 file changed, 1 insertion(+)
     +       create mode 100644 a
     +      $ ~/vc/git/git show-ref
     +      373d96972fca9b63595740bba3898a762778ba20 HEAD
     +      373d96972fca9b63595740bba3898a762778ba20 refs/heads/master
     +      $ ls -l .git/reftable/
     +      total 12
     +      -rw------- 1 hanwen primarygroup  126 Jan 23 20:08 000000000001-000000000001.ref
     +      -rw------- 1 hanwen primarygroup 4277 Jan 23 20:08 000000000002-000000000002.ref
     +      $ go run ~/vc/reftable/cmd/dump.go  -table /tmp/qz/.git/reftable/000000000002-000000000002.ref
     +      ** DEBUG **
     +      name /tmp/qz/.git/reftable/000000000002-000000000002.ref, sz 4209: 'r' reftable.readerOffsets{Present:true, Offset:0x0, IndexOffset:0x0}, 'o' reftable.readerOffsets{Present:false, Offset:0x0, IndexOffset:0x0} 'g' reftable.readerOffsets{Present:true, Offset:0x1000, IndexOffset:0x0}
     +      ** REFS **
     +      reftable.RefRecord{RefName:"refs/heads/master", UpdateIndex:0x2, Value:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, TargetValue:[]uint8(nil), Target:""}
     +      ** LOGS **
     +      reftable.LogRecord{RefName:"HEAD", UpdateIndex:0x2, New:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, Old:[]uint8{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Name:"Han-Wen Nienhuys", Email:"hanwen@google.com", Time:0x5e29ef27, TZOffset:100, Message:"commit (initial): x\n"}
      
          Change-Id: I225ee6317b7911edf9aa95f43299f6c7c4511914
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
     @@ -120,8 +126,8 @@ $^
      -	struct ref_storage_be *be = find_ref_storage_backend(be_name);
      +	struct strbuf refs_path = STRBUF_INIT;
      +
     -+        // XXX this should probably come from a git config setting and not
     -+        // default to reftable.
     ++        /* XXX this should probably come from a git config setting and not
     ++           default to reftable. */
      +	const char *be_name = "reftable";
      +	struct ref_storage_be *be;
       	struct ref_store *refs;
     @@ -146,7 +152,7 @@ $^
       				     each_reflog_ent_fn fn,
       				     void *cb_data);
      +
     -+// XXX which ordering are these? Newest or oldest first?
     ++/* XXX which ordering are these? Newest or oldest first? */
       int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
       int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
       
     @@ -190,40 +196,45 @@ $^
      +	struct stack *stack;
      +};
      +
     -+static void clear_log_record(struct log_record* log) {
     -+        log->old_hash = NULL;
     -+        log->new_hash = NULL;
     -+        log->message = NULL;
     -+        log->ref_name = NULL;
     -+        log_record_clear(log);
     -+}
     -+
     -+static void fill_log_record(struct log_record* log) {
     -+        const char* info = git_committer_info(0);
     -+        struct ident_split split = {};
     -+        int result = split_ident_line(&split, info, strlen(info));
     -+        int sign = 1;
     -+        assert(0==result);
     -+
     -+        log_record_clear(log);
     -+        log->name = xstrndup(split.name_begin, split.name_end - split.name_begin);
     -+        log->email = xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
     -+        log->time = atol(split.date_begin);
     -+        if (*split.tz_begin == '-') {
     -+                sign = -1;
     -+                split.tz_begin ++;
     -+        }
     -+        if (*split.tz_begin == '+') {
     -+                sign = 1;
     -+                split.tz_begin ++;
     -+        }
     -+
     -+        log->tz_offset = sign * atoi(split.tz_begin);
     ++static void clear_log_record(struct log_record *log)
     ++{
     ++	log->old_hash = NULL;
     ++	log->new_hash = NULL;
     ++	log->message = NULL;
     ++	log->ref_name = NULL;
     ++	log_record_clear(log);
     ++}
     ++
     ++static void fill_log_record(struct log_record *log)
     ++{
     ++	const char *info = git_committer_info(0);
     ++	struct ident_split split = {};
     ++	int result = split_ident_line(&split, info, strlen(info));
     ++	int sign = 1;
     ++	assert(0 == result);
     ++
     ++	log_record_clear(log);
     ++	log->name =
     ++		xstrndup(split.name_begin, split.name_end - split.name_begin);
     ++	log->email =
     ++		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
     ++	log->time = atol(split.date_begin);
     ++	if (*split.tz_begin == '-') {
     ++		sign = -1;
     ++		split.tz_begin++;
     ++	}
     ++	if (*split.tz_begin == '+') {
     ++		sign = 1;
     ++		split.tz_begin++;
     ++	}
     ++
     ++	log->tz_offset = sign * atoi(split.tz_begin);
      +}
      +
      +static struct ref_store *reftable_ref_store_create(const char *path,
     -+                                                   unsigned int store_flags) {
     -+  	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
     ++						   unsigned int store_flags)
     ++{
     ++	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
      +	struct ref_store *ref_store = (struct ref_store *)refs;
      +	struct write_options cfg = {
      +		.block_size = 4096,
     @@ -240,17 +251,19 @@ $^
      +	refs->reftable_dir = xstrdup(sb.buf);
      +	strbuf_release(&sb);
      +
     -+	refs->err = new_stack(&refs->stack, refs->reftable_dir, refs->table_list_file, cfg);
     ++	refs->err = new_stack(&refs->stack, refs->reftable_dir,
     ++			      refs->table_list_file, cfg);
      +
      +	return ref_store;
      +}
      +
      +static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
      +{
     -+	struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
      +	FILE *f = fopen(refs->table_list_file, "a");
      +	if (f == NULL) {
     -+	  return -1;
     ++		return -1;
      +	}
      +	fclose(f);
      +
     @@ -258,56 +271,58 @@ $^
      +	return 0;
      +}
      +
     -+
      +struct reftable_iterator {
      +	struct ref_iterator base;
      +	struct iterator iter;
      +	struct ref_record ref;
      +	struct object_id oid;
     -+        struct ref_store *ref_store;
     ++	struct ref_store *ref_store;
      +	char *prefix;
      +};
      +
      +static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
      +{
     -+        while (1) {
     -+                struct reftable_iterator *ri = (struct reftable_iterator*) ref_iterator;
     -+                int err = iterator_next_ref(ri->iter, &ri->ref);
     -+                if (err > 0) {
     -+                        return ITER_DONE;
     -+                }
     -+                if (err < 0) {
     -+                        return ITER_ERROR;
     -+                }
     -+
     -+                ri->base.refname = ri->ref.ref_name;
     -+                if (ri->prefix != NULL && 0 != strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
     -+                        return ITER_DONE;
     -+                }
     -+
     -+                ri->base.flags = 0;
     -+                if (ri->ref.value != NULL) {
     -+                        memcpy(ri->oid.hash, ri->ref.value, GIT_SHA1_RAWSZ);
     -+                } else if (ri->ref.target != NULL) {
     -+                        int out_flags = 0;
     -+                        const char *resolved =
     -+                                refs_resolve_ref_unsafe(ri->ref_store, ri->ref.ref_name, RESOLVE_REF_READING,
     -+                                                        &ri->oid, &out_flags);
     -+                        ri->base.flags = out_flags;
     -+                        if (resolved == NULL && !(ri->base.flags & DO_FOR_EACH_INCLUDE_BROKEN)
     -+                            && (ri->base.flags & REF_ISBROKEN)) {
     -+                                continue;
     -+                        }
     -+                }
     -+                ri->base.oid  = &ri->oid;
     -+                return ITER_OK;
     -+        }
     ++	while (1) {
     ++		struct reftable_iterator *ri =
     ++			(struct reftable_iterator *)ref_iterator;
     ++		int err = iterator_next_ref(ri->iter, &ri->ref);
     ++		if (err > 0) {
     ++			return ITER_DONE;
     ++		}
     ++		if (err < 0) {
     ++			return ITER_ERROR;
     ++		}
     ++
     ++		ri->base.refname = ri->ref.ref_name;
     ++		if (ri->prefix != NULL &&
     ++		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
     ++			return ITER_DONE;
     ++		}
     ++
     ++		ri->base.flags = 0;
     ++		if (ri->ref.value != NULL) {
     ++			memcpy(ri->oid.hash, ri->ref.value, GIT_SHA1_RAWSZ);
     ++		} else if (ri->ref.target != NULL) {
     ++			int out_flags = 0;
     ++			const char *resolved = refs_resolve_ref_unsafe(
     ++				ri->ref_store, ri->ref.ref_name,
     ++				RESOLVE_REF_READING, &ri->oid, &out_flags);
     ++			ri->base.flags = out_flags;
     ++			if (resolved == NULL &&
     ++			    !(ri->base.flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
     ++			    (ri->base.flags & REF_ISBROKEN)) {
     ++				continue;
     ++			}
     ++		}
     ++		ri->base.oid = &ri->oid;
     ++		return ITER_OK;
     ++	}
      +}
      +
      +static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
      +				      struct object_id *peeled)
      +{
     -+	struct reftable_iterator *ri = (struct reftable_iterator*) ref_iterator;
     ++	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
      +	if (ri->ref.target_value != NULL) {
      +		memcpy(peeled->hash, ri->ref.target_value, GIT_SHA1_RAWSZ);
      +		return 0;
     @@ -318,40 +333,40 @@ $^
      +
      +static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
      +{
     -+	struct reftable_iterator *ri = (struct reftable_iterator*) ref_iterator;
     ++	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
      +	ref_record_clear(&ri->ref);
      +	iterator_destroy(&ri->iter);
      +	return 0;
      +}
      +
      +static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
     -+	reftable_ref_iterator_advance,
     -+	reftable_ref_iterator_peel,
     ++	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
      +	reftable_ref_iterator_abort
      +};
      +
     -+static struct ref_iterator *reftable_ref_iterator_begin(
     -+		struct ref_store *ref_store,
     -+		const char *prefix, unsigned int flags)
     ++static struct ref_iterator *
     ++reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
     ++			    unsigned int flags)
      +{
     -+	struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
      +	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
      +	struct merged_table *mt = NULL;
      +	int err = 0;
      +	if (refs->err) {
     -+		// XXX ?
     ++		/* how to propagate errors? */
      +		return NULL;
      +	}
      +
      +	mt = stack_merged_table(refs->stack);
      +
     -+	// XXX something with flags?
     ++	/* XXX something with flags? */
      +	err = merged_table_seek_ref(mt, &ri->iter, prefix);
     -+	// XXX what to do with err?
     ++	/* XXX what to do with err? */
      +	assert(err == 0);
      +	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
      +	ri->base.oid = &ri->oid;
     -+        ri->ref_store = ref_store;
     ++	ri->ref_store = ref_store;
      +	return &ri->base;
      +}
      +
     @@ -363,92 +378,100 @@ $^
      +}
      +
      +static int reftable_transaction_abort(struct ref_store *ref_store,
     -+				    struct ref_transaction *transaction,
     -+				    struct strbuf *err)
     ++				      struct ref_transaction *transaction,
     ++				      struct strbuf *err)
      +{
     -+	struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
      +	(void)refs;
      +	return 0;
      +}
      +
     -+
     -+static int ref_update_cmp(const void *a, const void *b) {
     -+	return strcmp(((struct ref_update*)a)->refname, ((struct ref_update*)b)->refname);
     ++static int ref_update_cmp(const void *a, const void *b)
     ++{
     ++	return strcmp(((struct ref_update *)a)->refname,
     ++		      ((struct ref_update *)b)->refname);
      +}
      +
     -+static int reftable_check_old_oid(struct ref_store *refs, const char *refname, struct object_id *want_oid) {
     -+        struct object_id out_oid = {};
     -+        int out_flags = 0;
     -+        const char *resolved =
     -+                refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING,
     -+                                        &out_oid, &out_flags);
     -+        if (is_null_oid(want_oid) != (resolved == NULL)) {
     -+                return LOCK_ERROR;
     -+        }
     ++static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
     ++				  struct object_id *want_oid)
     ++{
     ++	struct object_id out_oid = {};
     ++	int out_flags = 0;
     ++	const char *resolved = refs_resolve_ref_unsafe(
     ++		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
     ++	if (is_null_oid(want_oid) != (resolved == NULL)) {
     ++		return LOCK_ERROR;
     ++	}
      +
     -+        if (resolved != NULL && !oideq(&out_oid, want_oid)) {
     -+                return LOCK_ERROR;
     -+        }
     ++	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
     ++		return LOCK_ERROR;
     ++	}
      +
     -+        return 0;
     ++	return 0;
      +}
      +
     -+static int write_transaction_table(struct writer *writer, void *arg) {
     ++static int write_transaction_table(struct writer *writer, void *arg)
     ++{
      +	struct ref_transaction *transaction = (struct ref_transaction *)arg;
     -+	struct reftable_ref_store *refs
     -+		= (struct reftable_ref_store*)transaction->ref_store;
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)transaction->ref_store;
      +	uint64_t ts = stack_next_update_index(refs->stack);
      +	int err = 0;
     -+	// XXX - are we allowed to mutate the input data?
     -+	qsort(transaction->updates, transaction->nr, sizeof(struct ref_update*),
     -+	      ref_update_cmp);
     ++	/* XXX - are we allowed to mutate the input data? */
     ++	qsort(transaction->updates, transaction->nr,
     ++	      sizeof(struct ref_update *), ref_update_cmp);
      +	writer_set_limits(writer, ts, ts);
      +
     -+	for (int i = 0; i <  transaction->nr; i++) {
     -+		struct ref_update * u = transaction->updates[i];
     ++	for (int i = 0; i < transaction->nr; i++) {
     ++		struct ref_update *u = transaction->updates[i];
      +		if (u->flags & REF_HAVE_OLD) {
     -+			err = reftable_check_old_oid(transaction->ref_store, u->refname, &u->old_oid);
     ++			err = reftable_check_old_oid(transaction->ref_store,
     ++						     u->refname, &u->old_oid);
      +			if (err < 0) {
      +				goto exit;
      +			}
      +		}
      +	}
      +
     -+	for (int i = 0; i <  transaction->nr; i++) {
     -+		struct ref_update * u = transaction->updates[i];
     ++	for (int i = 0; i < transaction->nr; i++) {
     ++		struct ref_update *u = transaction->updates[i];
      +		if (u->flags & REF_HAVE_NEW) {
     -+                        struct object_id out_oid = {};
     -+                        int out_flags = 0;
     -+                        // XXX who owns the memory here?
     -+                        const char *resolved
     -+                                = refs_resolve_ref_unsafe(transaction->ref_store, u->refname, 0, &out_oid, &out_flags);
     -+                        struct ref_record ref = {};
     -+                        ref.ref_name = (char*)(resolved ? resolved : u->refname);
     -+                        ref.value = u->new_oid.hash;
     -+                        ref.update_index = ts;
     -+                        err = writer_add_ref(writer, &ref);
     -+                        if (err < 0) {
     -+                                goto exit;
     -+                        }
     ++			struct object_id out_oid = {};
     ++			int out_flags = 0;
     ++			/* XXX who owns the memory here? */
     ++			const char *resolved = refs_resolve_ref_unsafe(
     ++				transaction->ref_store, u->refname, 0, &out_oid,
     ++				&out_flags);
     ++			struct ref_record ref = {};
     ++			ref.ref_name =
     ++				(char *)(resolved ? resolved : u->refname);
     ++			ref.value = u->new_oid.hash;
     ++			ref.update_index = ts;
     ++			err = writer_add_ref(writer, &ref);
     ++			if (err < 0) {
     ++				goto exit;
     ++			}
      +		}
      +	}
      +
     ++	for (int i = 0; i < transaction->nr; i++) {
     ++		struct ref_update *u = transaction->updates[i];
     ++		struct log_record log = {};
     ++		fill_log_record(&log);
      +
     -+	for (int i = 0; i <  transaction->nr; i++) {
     -+		struct ref_update * u = transaction->updates[i];
     -+                struct log_record log = {};
     -+                fill_log_record(&log);
     ++		log.ref_name = (char *)u->refname;
     ++		log.old_hash = u->old_oid.hash;
     ++		log.new_hash = u->new_oid.hash;
     ++		log.update_index = ts;
     ++		log.message = u->msg;
      +
     -+                log.ref_name = (char*)u->refname;
     -+                log.old_hash = u->old_oid.hash;
     -+                log.new_hash = u->new_oid.hash;
     -+                log.update_index = ts;
     -+                log.message = u->msg;
     -+
     -+                err = writer_add_log(writer, &log);
     -+                clear_log_record(&log);
     -+                if (err < 0) { return err; }
     -+        }
     ++		err = writer_add_log(writer, &log);
     ++		clear_log_record(&log);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++	}
      +exit:
      +	return err;
      +}
     @@ -457,89 +480,91 @@ $^
      +				       struct ref_transaction *transaction,
      +				       struct strbuf *errmsg)
      +{
     -+	struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
     -+	int err = stack_add(refs->stack,
     -+		  &write_transaction_table,
     -+		  transaction);
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
     ++	int err = stack_add(refs->stack, &write_transaction_table, transaction);
      +	if (err < 0) {
     -+		strbuf_addf(errmsg, "reftable: transaction failure %s", error_str(err));
     ++		strbuf_addf(errmsg, "reftable: transaction failure %s",
     ++			    error_str(err));
      +		return -1;
     -+ 	}
     ++	}
      +
      +	return 0;
      +}
      +
     -+
      +static int reftable_transaction_finish(struct ref_store *ref_store,
     -+				     struct ref_transaction *transaction,
     -+				     struct strbuf *err)
     ++				       struct ref_transaction *transaction,
     ++				       struct strbuf *err)
      +{
     -+        return reftable_transaction_commit(ref_store, transaction, err);
     ++	return reftable_transaction_commit(ref_store, transaction, err);
      +}
      +
     -+
     -+
      +struct write_delete_refs_arg {
      +	struct stack *stack;
      +	struct string_list *refnames;
      +	const char *logmsg;
     -+        unsigned int flags;
     ++	unsigned int flags;
      +};
      +
     -+static int write_delete_refs_table(struct writer *writer, void *argv) {
     -+	struct write_delete_refs_arg *arg = (struct write_delete_refs_arg*)argv;
     ++static int write_delete_refs_table(struct writer *writer, void *argv)
     ++{
     ++	struct write_delete_refs_arg *arg =
     ++		(struct write_delete_refs_arg *)argv;
      +	uint64_t ts = stack_next_update_index(arg->stack);
      +	int err = 0;
      +
     -+        writer_set_limits(writer, ts, ts);
     -+        for (int i = 0; i < arg->refnames->nr; i++) {
     -+                struct ref_record ref = {
     -+                        .ref_name = (char*) arg->refnames->items[i].string,
     -+                        .update_index = ts,
     -+                };
     -+                err = writer_add_ref(writer, &ref);
     -+                if (err < 0) {
     -+                        return err;
     -+                }
     -+        }
     -+
     -+        for (int i = 0; i < arg->refnames->nr; i++) {
     -+                struct log_record log = {};
     -+                fill_log_record(&log);
     -+                log.message = xstrdup(arg->logmsg);
     -+                log.new_hash = NULL;
     -+
     -+                // XXX should lookup old oid.
     -+                log.old_hash = NULL;
     -+                log.update_index = ts;
     -+                log.ref_name = (char*) arg->refnames->items[i].string;
     -+
     -+                err = writer_add_log(writer, &log);
     -+                clear_log_record(&log);
     -+                if (err < 0) {
     -+                        return err;
     -+                }
     -+        }
     ++	writer_set_limits(writer, ts, ts);
     ++	for (int i = 0; i < arg->refnames->nr; i++) {
     ++		struct ref_record ref = {
     ++			.ref_name = (char *)arg->refnames->items[i].string,
     ++			.update_index = ts,
     ++		};
     ++		err = writer_add_ref(writer, &ref);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++	}
     ++
     ++	for (int i = 0; i < arg->refnames->nr; i++) {
     ++		struct log_record log = {};
     ++		fill_log_record(&log);
     ++		log.message = xstrdup(arg->logmsg);
     ++		log.new_hash = NULL;
     ++
     ++		/* XXX should lookup old oid. */
     ++		log.old_hash = NULL;
     ++		log.update_index = ts;
     ++		log.ref_name = (char *)arg->refnames->items[i].string;
     ++
     ++		err = writer_add_log(writer, &log);
     ++		clear_log_record(&log);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++	}
      +	return 0;
      +}
      +
      +static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
     -+                                struct string_list *refnames, unsigned int flags)
     ++				struct string_list *refnames,
     ++				unsigned int flags)
      +{
     -+	struct reftable_ref_store *refs = (struct reftable_ref_store*)ref_store;
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
      +	struct write_delete_refs_arg arg = {
      +		.stack = refs->stack,
      +		.refnames = refnames,
      +		.logmsg = msg,
     -+                .flags = flags,
     ++		.flags = flags,
      +	};
      +	return stack_add(refs->stack, &write_delete_refs_table, &arg);
      +}
      +
      +static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
      +{
     -+	struct reftable_ref_store *refs = (struct reftable_ref_store*)ref_store;
     -+        // XXX reflog expiry.
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
     ++	/* XXX reflog expiry. */
      +	return stack_compact_all(refs->stack, NULL);
      +}
      +
     @@ -550,14 +575,16 @@ $^
      +	const char *logmsg;
      +};
      +
     -+static int write_create_symref_table(struct writer *writer, void *arg) {
     -+	struct write_create_symref_arg *create = (struct write_create_symref_arg*)arg;
     ++static int write_create_symref_table(struct writer *writer, void *arg)
     ++{
     ++	struct write_create_symref_arg *create =
     ++		(struct write_create_symref_arg *)arg;
      +	uint64_t ts = stack_next_update_index(create->stack);
      +	int err = 0;
      +
      +	struct ref_record ref = {
     -+		.ref_name = (char*) create->refname,
     -+		.target = (char*) create->target,
     ++		.ref_name = (char *)create->refname,
     ++		.target = (char *)create->target,
      +		.update_index = ts,
      +	};
      +	writer_set_limits(writer, ts, ts);
     @@ -566,7 +593,7 @@ $^
      +		return err;
      +	}
      +
     -+	// XXX reflog?
     ++	/* XXX reflog? */
      +
      +	return 0;
      +}
     @@ -575,17 +602,15 @@ $^
      +				  const char *refname, const char *target,
      +				  const char *logmsg)
      +{
     -+	struct reftable_ref_store *refs = (struct reftable_ref_store*)ref_store;
     -+	struct write_create_symref_arg arg = {
     -+		.stack = refs->stack,
     -+		.refname = refname,
     -+		.target = target,
     -+		.logmsg = logmsg
     -+	};
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
     ++	struct write_create_symref_arg arg = { .stack = refs->stack,
     ++					       .refname = refname,
     ++					       .target = target,
     ++					       .logmsg = logmsg };
      +	return stack_add(refs->stack, &write_create_symref_table, &arg);
      +}
      +
     -+
      +struct write_rename_arg {
      +	struct stack *stack;
      +	const char *oldname;
     @@ -593,8 +618,9 @@ $^
      +	const char *logmsg;
      +};
      +
     -+static int write_rename_table(struct writer *writer, void *argv) {
     -+	struct write_rename_arg *arg = (struct write_rename_arg*)argv;
     ++static int write_rename_table(struct writer *writer, void *argv)
     ++{
     ++	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
      +	uint64_t ts = stack_next_update_index(arg->stack);
      +	struct ref_record ref = {};
      +	int err = stack_read_ref(arg->stack, arg->oldname, &ref);
     @@ -602,15 +628,15 @@ $^
      +		goto exit;
      +	}
      +
     -+	// XXX should check that dest doesn't exist?
     ++	/* XXX should check that dest doesn't exist? */
      +	free(ref.ref_name);
      +	ref.ref_name = strdup(arg->newname);
      +	writer_set_limits(writer, ts, ts);
      +	ref.update_index = ts;
      +
      +	{
     -+		struct ref_record todo[2]  = {};
     -+		todo[0].ref_name =  (char *) arg->oldname;
     ++		struct ref_record todo[2] = {};
     ++		todo[0].ref_name = (char *)arg->oldname;
      +		todo[0].update_index = ts;
      +		todo[1] = ref;
      +		todo[1].update_index = ts;
     @@ -621,47 +647,47 @@ $^
      +		}
      +	}
      +
     -+        if (ref.value != NULL) {
     -+                struct log_record todo[2] = {};
     -+                fill_log_record(&todo[0]);
     -+                fill_log_record(&todo[1]);
     ++	if (ref.value != NULL) {
     ++		struct log_record todo[2] = {};
     ++		fill_log_record(&todo[0]);
     ++		fill_log_record(&todo[1]);
      +
     -+                todo[0].ref_name = (char *) arg->oldname;
     -+                todo[0].update_index = ts;
     -+                todo[0].message = (char*)arg->logmsg;
     -+                todo[0].old_hash = ref.value;
     -+                todo[0].new_hash = NULL;
     ++		todo[0].ref_name = (char *)arg->oldname;
     ++		todo[0].update_index = ts;
     ++		todo[0].message = (char *)arg->logmsg;
     ++		todo[0].old_hash = ref.value;
     ++		todo[0].new_hash = NULL;
      +
     -+                todo[1].ref_name = (char *) arg->newname;
     -+                todo[1].update_index = ts;
     -+                todo[1].old_hash = NULL;
     -+                todo[1].new_hash = ref.value;
     -+                todo[1].message = (char*) arg->logmsg;
     ++		todo[1].ref_name = (char *)arg->newname;
     ++		todo[1].update_index = ts;
     ++		todo[1].old_hash = NULL;
     ++		todo[1].new_hash = ref.value;
     ++		todo[1].message = (char *)arg->logmsg;
      +
      +		err = writer_add_logs(writer, todo, 2);
      +
     -+                clear_log_record(&todo[0]);
     -+                clear_log_record(&todo[1]);
     ++		clear_log_record(&todo[0]);
     ++		clear_log_record(&todo[1]);
      +
      +		if (err < 0) {
      +			goto exit;
      +		}
      +
     -+        } else {
     -+                // symrefs?
     -+        }
     ++	} else {
     ++		/* XXX symrefs? */
     ++	}
      +
      +exit:
      +	ref_record_clear(&ref);
      +	return err;
      +}
      +
     -+
      +static int reftable_rename_ref(struct ref_store *ref_store,
     -+			    const char *oldrefname, const char *newrefname,
     -+			    const char *logmsg)
     ++			       const char *oldrefname, const char *newrefname,
     ++			       const char *logmsg)
      +{
     -+	struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
      +	struct write_rename_arg arg = {
      +		.stack = refs->stack,
      +		.oldname = oldrefname,
     @@ -672,237 +698,249 @@ $^
      +}
      +
      +static int reftable_copy_ref(struct ref_store *ref_store,
     -+			   const char *oldrefname, const char *newrefname,
     -+			   const char *logmsg)
     ++			     const char *oldrefname, const char *newrefname,
     ++			     const char *logmsg)
      +{
      +	BUG("reftable reference store does not support copying references");
      +}
      +
      +struct reftable_reflog_ref_iterator {
     -+        struct ref_iterator base;
     -+        struct iterator iter;
     -+        struct log_record log;
     -+        struct object_id oid;
     -+        char *last_name;
     ++	struct ref_iterator base;
     ++	struct iterator iter;
     ++	struct log_record log;
     ++	struct object_id oid;
     ++	char *last_name;
      +};
      +
     -+static int reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
     ++static int
     ++reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
      +{
     -+        struct reftable_reflog_ref_iterator *ri = (struct reftable_reflog_ref_iterator*) ref_iterator;
     ++	struct reftable_reflog_ref_iterator *ri =
     ++		(struct reftable_reflog_ref_iterator *)ref_iterator;
      +
     -+        while (1) {
     -+                int err = iterator_next_log(ri->iter, &ri->log);
     -+                if (err > 0) {
     -+                        return ITER_DONE;
     -+                }
     -+                if (err < 0) {
     -+                        return ITER_ERROR;
     -+                }
     ++	while (1) {
     ++		int err = iterator_next_log(ri->iter, &ri->log);
     ++		if (err > 0) {
     ++			return ITER_DONE;
     ++		}
     ++		if (err < 0) {
     ++			return ITER_ERROR;
     ++		}
      +
     -+                ri->base.refname = ri->log.ref_name;
     -+                if (ri->last_name != NULL && 0 == strcmp(ri->log.ref_name, ri->last_name)) {
     -+                        continue;
     -+                }
     ++		ri->base.refname = ri->log.ref_name;
     ++		if (ri->last_name != NULL &&
     ++		    !strcmp(ri->log.ref_name, ri->last_name)) {
     ++			continue;
     ++		}
      +
     -+                free(ri->last_name);
     -+                ri->last_name = xstrdup(ri->log.ref_name);
     -+                // XXX const?
     -+                memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ);
     -+                return ITER_OK;
     -+        }
     ++		free(ri->last_name);
     ++		ri->last_name = xstrdup(ri->log.ref_name);
     ++		/* XXX const? */
     ++		memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ);
     ++		return ITER_OK;
     ++	}
      +}
      +
      +static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
     -+                                             struct object_id *peeled)
     ++					     struct object_id *peeled)
      +{
     -+        BUG("not supported.");
     ++	BUG("not supported.");
      +	return -1;
      +}
      +
      +static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
      +{
     -+	struct reftable_reflog_ref_iterator *ri = (struct reftable_reflog_ref_iterator*) ref_iterator;
     ++	struct reftable_reflog_ref_iterator *ri =
     ++		(struct reftable_reflog_ref_iterator *)ref_iterator;
      +	log_record_clear(&ri->log);
      +	iterator_destroy(&ri->iter);
      +	return 0;
      +}
      +
      +static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
     -+	reftable_reflog_ref_iterator_advance,
     -+	reftable_reflog_ref_iterator_peel,
     ++	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
      +	reftable_reflog_ref_iterator_abort
      +};
      +
     -+static struct ref_iterator *reftable_reflog_iterator_begin(struct ref_store *ref_store)
     ++static struct ref_iterator *
     ++reftable_reflog_iterator_begin(struct ref_store *ref_store)
      +{
     -+        struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
     -+        struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
     ++	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
      +
      +	struct merged_table *mt = stack_merged_table(refs->stack);
     -+        int err = merged_table_seek_log(mt, &ri->iter, "");
     -+        if (err < 0) {
     -+                free(ri);
     -+                return NULL;
     -+        }
     ++	int err = merged_table_seek_log(mt, &ri->iter, "");
     ++	if (err < 0) {
     ++		free(ri);
     ++		return NULL;
     ++	}
      +
     -+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable, 1);
     ++	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
     ++			       1);
      +	ri->base.oid = &ri->oid;
      +
      +	return empty_ref_iterator_begin();
      +}
      +
     -+static int reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
     -+                                        const char *refname,
     -+                                        each_reflog_ent_fn fn, void *cb_data)
     -+{
     -+        struct iterator it = { };
     -+        struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
     -+        struct merged_table *mt = stack_merged_table(refs->stack);
     -+        int err = merged_table_seek_log(mt, &it, refname);
     -+        struct log_record log = {};
     -+
     -+        while (err == 0)  {
     -+                err = iterator_next_log(it, &log);
     -+                if (err != 0) {
     -+                        break;
     -+                }
     -+
     -+                if (0 != strcmp(log.ref_name, refname)) {
     -+                        break;
     -+                }
     -+
     -+                {
     -+                        struct object_id old_oid = {};
     -+                        struct object_id new_oid = {};
     -+
     -+                        memcpy(&old_oid.hash, log.old_hash, GIT_SHA1_RAWSZ);
     -+                        memcpy(&new_oid.hash, log.new_hash, GIT_SHA1_RAWSZ);
     -+
     -+                        // XXX committer = email? name?
     -+                        if (fn(&old_oid, &new_oid, log.name, log.time, log.tz_offset, log.message, cb_data)) {
     -+                                err = -1;
     -+                                break;
     -+                        }
     -+                }
     -+        }
     -+
     -+        log_record_clear(&log);
     -+        iterator_destroy(&it);
     -+        if (err > 0) {
     -+                err = 0;
     -+        }
     ++static int
     ++reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
     ++					  const char *refname,
     ++					  each_reflog_ent_fn fn, void *cb_data)
     ++{
     ++	struct iterator it = {};
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
     ++	struct merged_table *mt = stack_merged_table(refs->stack);
     ++	int err = merged_table_seek_log(mt, &it, refname);
     ++	struct log_record log = {};
     ++
     ++	while (err == 0) {
     ++		err = iterator_next_log(it, &log);
     ++		if (err != 0) {
     ++			break;
     ++		}
     ++
     ++		if (strcmp(log.ref_name, refname)) {
     ++			break;
     ++		}
     ++
     ++		{
     ++			struct object_id old_oid = {};
     ++			struct object_id new_oid = {};
     ++
     ++			memcpy(&old_oid.hash, log.old_hash, GIT_SHA1_RAWSZ);
     ++			memcpy(&new_oid.hash, log.new_hash, GIT_SHA1_RAWSZ);
     ++
     ++			/* XXX committer = email? name? */
     ++			if (fn(&old_oid, &new_oid, log.name, log.time,
     ++			       log.tz_offset, log.message, cb_data)) {
     ++				err = -1;
     ++				break;
     ++			}
     ++		}
     ++	}
     ++
     ++	log_record_clear(&log);
     ++	iterator_destroy(&it);
     ++	if (err > 0) {
     ++		err = 0;
     ++	}
      +	return err;
      +}
      +
     -+static int reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
     -+                                                     const char *refname,
     -+                                                     each_reflog_ent_fn fn,
     -+                                                     void *cb_data)
     -+{
     -+        struct iterator it = { };
     -+        struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
     -+        struct merged_table *mt = stack_merged_table(refs->stack);
     -+        int err = merged_table_seek_log(mt, &it, refname);
     -+
     -+        struct log_record *logs = NULL;
     -+        int cap = 0;
     -+        int len = 0;
     -+
     -+        printf("oldest first\n");
     -+        while (err == 0)  {
     -+                struct log_record log = {};
     -+                err = iterator_next_log(it, &log);
     -+                if (err != 0) {
     -+                        break;
     -+                }
     -+
     -+                if (0 != strcmp(log.ref_name, refname)) {
     -+                        break;
     -+                }
     -+
     -+                if (len == cap) {
     -+                        cap  =2*cap + 1;
     -+                        logs = realloc(logs, cap * sizeof(*logs));
     -+                }
     -+
     -+                logs[len++] = log;
     -+        }
     -+
     -+        for (int i = len; i--; ) {
     -+                struct log_record *log = &logs[i];
     -+                struct object_id old_oid = {};
     -+                struct object_id new_oid = {};
     -+
     -+                memcpy(&old_oid.hash, log->old_hash, GIT_SHA1_RAWSZ);
     -+                memcpy(&new_oid.hash, log->new_hash, GIT_SHA1_RAWSZ);
     -+
     -+                // XXX committer = email? name?
     -+                if (!fn(&old_oid, &new_oid, log->name, log->time, log->tz_offset, log->message, cb_data)) {
     -+                        err = -1;
     -+                        break;
     -+                }
     -+        }
     -+
     -+        for (int i = 0; i < len; i++) {
     -+                log_record_clear(&logs[i]);
     -+        }
     -+        free(logs);
     -+
     -+        iterator_destroy(&it);
     -+        if (err > 0) {
     -+                err = 0;
     -+        }
     ++static int
     ++reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
     ++					  const char *refname,
     ++					  each_reflog_ent_fn fn, void *cb_data)
     ++{
     ++	struct iterator it = {};
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
     ++	struct merged_table *mt = stack_merged_table(refs->stack);
     ++	int err = merged_table_seek_log(mt, &it, refname);
     ++
     ++	struct log_record *logs = NULL;
     ++	int cap = 0;
     ++	int len = 0;
     ++
     ++	printf("oldest first\n");
     ++	while (err == 0) {
     ++		struct log_record log = {};
     ++		err = iterator_next_log(it, &log);
     ++		if (err != 0) {
     ++			break;
     ++		}
     ++
     ++		if (strcmp(log.ref_name, refname)) {
     ++			break;
     ++		}
     ++
     ++		if (len == cap) {
     ++			cap = 2 * cap + 1;
     ++			logs = realloc(logs, cap * sizeof(*logs));
     ++		}
     ++
     ++		logs[len++] = log;
     ++	}
     ++
     ++	for (int i = len; i--;) {
     ++		struct log_record *log = &logs[i];
     ++		struct object_id old_oid = {};
     ++		struct object_id new_oid = {};
     ++
     ++		memcpy(&old_oid.hash, log->old_hash, GIT_SHA1_RAWSZ);
     ++		memcpy(&new_oid.hash, log->new_hash, GIT_SHA1_RAWSZ);
     ++
     ++		/* XXX committer = email? name? */
     ++		if (!fn(&old_oid, &new_oid, log->name, log->time,
     ++			log->tz_offset, log->message, cb_data)) {
     ++			err = -1;
     ++			break;
     ++		}
     ++	}
     ++
     ++	for (int i = 0; i < len; i++) {
     ++		log_record_clear(&logs[i]);
     ++	}
     ++	free(logs);
     ++
     ++	iterator_destroy(&it);
     ++	if (err > 0) {
     ++		err = 0;
     ++	}
      +	return err;
      +}
      +
      +static int reftable_reflog_exists(struct ref_store *ref_store,
     -+                                  const char *refname)
     ++				  const char *refname)
      +{
     -+        // always exists.
     ++	/* always exists. */
      +	return 1;
      +}
      +
      +static int reftable_create_reflog(struct ref_store *ref_store,
     -+                                  const char *refname, int force_create,
     -+                                  struct strbuf *err)
     ++				  const char *refname, int force_create,
     ++				  struct strbuf *err)
      +{
     -+        return 0;
     ++	return 0;
      +}
      +
      +static int reftable_delete_reflog(struct ref_store *ref_store,
     -+                                  const char *refname)
     ++				  const char *refname)
      +{
      +	return 0;
      +}
      +
      +static int reftable_reflog_expire(struct ref_store *ref_store,
     -+				const char *refname, const struct object_id *oid,
     -+				unsigned int flags,
     -+				reflog_expiry_prepare_fn prepare_fn,
     -+				reflog_expiry_should_prune_fn should_prune_fn,
     -+				reflog_expiry_cleanup_fn cleanup_fn,
     -+				void *policy_cb_data)
     -+{
     -+        /*
     -+          XXX
     -+
     -+          This doesn't fit with the reftable API. If we are expiring for space
     -+          reasons, the expiry should be combined with a compaction, and there
     -+          should be a policy that can be called for all refnames, not for a
     -+          single ref name.
     -+
     -+          If this is for cleaning up individual entries, we'll have to write
     -+          extra data to create tombstones.
     -+         */
     ++				  const char *refname,
     ++				  const struct object_id *oid,
     ++				  unsigned int flags,
     ++				  reflog_expiry_prepare_fn prepare_fn,
     ++				  reflog_expiry_should_prune_fn should_prune_fn,
     ++				  reflog_expiry_cleanup_fn cleanup_fn,
     ++				  void *policy_cb_data)
     ++{
     ++	/*
     ++	  XXX
     ++
     ++	  This doesn't fit with the reftable API. If we are expiring for space
     ++	  reasons, the expiry should be combined with a compaction, and there
     ++	  should be a policy that can be called for all refnames, not for a
     ++	  single ref name.
     ++
     ++	  If this is for cleaning up individual entries, we'll have to write
     ++	  extra data to create tombstones.
     ++	 */
      +	return 0;
      +}
      +
     -+
      +static int reftable_read_raw_ref(struct ref_store *ref_store,
      +				 const char *refname, struct object_id *oid,
      +				 struct strbuf *referent, unsigned int *type)
      +{
     -+	struct reftable_ref_store *refs = (struct reftable_ref_store*) ref_store;
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
      +	struct ref_record ref = {};
      +	int err = stack_read_ref(refs->stack, refname, &ref);
      +	if (err) {
     @@ -912,7 +950,7 @@ $^
      +		strbuf_reset(referent);
      +		strbuf_addstr(referent, ref.target);
      +		*type |= REF_ISSYMREF;
     -+ 	} else  {
     ++	} else {
      +		memcpy(oid->hash, ref.value, GIT_SHA1_RAWSZ);
      +	}
      +

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v2 1/5] setup.c: enable repo detection for reftable
  2020-01-27 14:22 ` [PATCH v2 " Han-Wen Nienhuys via GitGitGadget
@ 2020-01-27 14:22   ` Han-Wen Nienhuys via GitGitGadget
  2020-01-27 14:22   ` [PATCH v2 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-01-27 14:22 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* only check R_OK for .git/refs.

  In the reftable format, tables are stored under $GITDIR/reftable/ and
  the list of tables is managed in a file $GITDIR/refs. Generally, this
  file does not have the +x bit set

* allow missing HEAD if there is a reftable/

Change-Id: I5d22317a15a84c8529aa503ae357a4afba247fe9
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 setup.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/setup.c b/setup.c
index e2a479a64f..e7c7dca701 100644
--- a/setup.c
+++ b/setup.c
@@ -304,22 +304,23 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
  *  - either an objects/ directory _or_ the proper
  *    GIT_OBJECT_DIRECTORY environment variable
  *  - a refs/ directory
- *  - either a HEAD symlink or a HEAD file that is formatted as
- *    a proper "ref:", or a regular file HEAD that has a properly
- *    formatted sha1 object name.
+ *  - a reftable/ director or, either a HEAD ref.
+ *    The HEAD ref should be a HEAD symlink or a HEAD file that is formatted as
+ *    a proper "ref:", or a regular file HEAD that has a properly formatted sha1
+ *    object name.
  */
 int is_git_directory(const char *suspect)
 {
 	struct strbuf path = STRBUF_INIT;
 	int ret = 0;
+        int have_head = 0;
 	size_t len;
 
 	/* Check worktree-related signatures */
 	strbuf_addstr(&path, suspect);
 	strbuf_complete(&path, '/');
 	strbuf_addstr(&path, "HEAD");
-	if (validate_headref(path.buf))
-		goto done;
+	have_head = !validate_headref(path.buf);
 
 	strbuf_reset(&path);
 	get_common_dir(&path, suspect);
@@ -339,9 +340,16 @@ int is_git_directory(const char *suspect)
 
 	strbuf_setlen(&path, len);
 	strbuf_addstr(&path, "/refs");
-	if (access(path.buf, X_OK))
+	if (access(path.buf, R_OK))
 		goto done;
 
+        if (!have_head) {
+                strbuf_setlen(&path, len);
+                strbuf_addstr(&path, "/reftable");
+                if (access(path.buf, X_OK))
+                        goto done;
+        }
+
 	ret = 1;
 done:
 	strbuf_release(&path);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v2 2/5] create .git/refs in files-backend.c
  2020-01-27 14:22 ` [PATCH v2 " Han-Wen Nienhuys via GitGitGadget
  2020-01-27 14:22   ` [PATCH v2 1/5] setup.c: enable repo detection for reftable Han-Wen Nienhuys via GitGitGadget
@ 2020-01-27 14:22   ` Han-Wen Nienhuys via GitGitGadget
  2020-01-27 22:28     ` Junio C Hamano
  2020-01-27 14:22   ` [PATCH v2 3/5] Document how ref iterators and symrefs interact Han-Wen Nienhuys via GitGitGadget
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-01-27 14:22 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This prepares for supporting the reftable format, which creates a file
in that place.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: I2fc47c89f5ec605734007ceff90321c02474aa92
---
 builtin/init-db.c    | 2 --
 refs/files-backend.c | 4 ++++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 944ec77fe1..45bdea0589 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
 	 */
-	safe_create_dir(git_path("refs"), 1);
-	adjust_shared_perm(git_path("refs"));
 
 	if (refs_init_db(&err))
 		die("failed to set up refs db: %s", err.buf);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0ea66a28b6..f49b6f2ab6 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3158,6 +3158,10 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
 	struct strbuf sb = STRBUF_INIT;
 
+	files_ref_path(refs, &sb, "refs");
+	safe_create_dir(sb.buf, 1);
+ 	// XXX adjust_shared_perm ?
+
 	/*
 	 * Create .git/refs/{heads,tags}
 	 */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v2 3/5] Document how ref iterators and symrefs interact
  2020-01-27 14:22 ` [PATCH v2 " Han-Wen Nienhuys via GitGitGadget
  2020-01-27 14:22   ` [PATCH v2 1/5] setup.c: enable repo detection for reftable Han-Wen Nienhuys via GitGitGadget
  2020-01-27 14:22   ` [PATCH v2 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-01-27 14:22   ` Han-Wen Nienhuys via GitGitGadget
  2020-01-27 22:53     ` Junio C Hamano
  2020-01-27 14:22   ` [PATCH v2 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-01-27 14:22 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Change-Id: Ie3ee63c52254c000ef712986246ca28f312b4301
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb..fc18b12340 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -269,6 +269,9 @@ int refs_rename_ref_available(struct ref_store *refs,
  * to the next entry, ref_iterator_advance() aborts the iteration,
  * frees the ref_iterator, and returns ITER_ERROR.
  *
+ * Ref iterators cannot return symref targets, so symbolic refs must be
+ * dereferenced during the iteration.
+ *
  * The reference currently being looked at can be peeled by calling
  * ref_iterator_peel(). This function is often faster than peel_ref(),
  * so it should be preferred when iterating over references.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v2 4/5] Add reftable library
  2020-01-27 14:22 ` [PATCH v2 " Han-Wen Nienhuys via GitGitGadget
                     ` (2 preceding siblings ...)
  2020-01-27 14:22   ` [PATCH v2 3/5] Document how ref iterators and symrefs interact Han-Wen Nienhuys via GitGitGadget
@ 2020-01-27 14:22   ` Han-Wen Nienhuys via GitGitGadget
  2020-01-27 14:22   ` [PATCH v2 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
  2020-02-04 20:27   ` [PATCH v3 0/6] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  5 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-01-27 14:22 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a new format for storing the ref database. It provides the
following benefits:

 * Simple and fast atomic ref transactions, including multiple refs and reflogs.
 * Compact storage of ref data.
 * Fast look ups of ref data.
 * Case-sensitive ref names on Windows/OSX, regardless of file system
 * Eliminates file/directory conflicts in ref names

Further context and motivation can be found in background reading:

* Spec: https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md

* Original discussion on JGit-dev:  https://www.eclipse.org/lists/jgit-dev/msg03389.html

* First design discussion on git@vger: https://public-inbox.org/git/CAJo=hJtTp2eA3z9wW9cHo-nA7kK40vVThqh6inXpbCcqfdMP9g@mail.gmail.com/

* Last design discussion on git@vger: https://public-inbox.org/git/CAJo=hJsZcAM9sipdVr7TMD-FD2V2W6_pvMQ791EGCDsDkQ033w@mail.gmail.com/

* First attempt at implementation: https://public-inbox.org/git/CAP8UFD0PPZSjBnxCA7ez91vBuatcHKQ+JUWvTD1iHcXzPBjPBg@mail.gmail.com/

* libgit2 support issue: https://github.com/libgit2/libgit2/issues

* GitLab support issue: https://gitlab.com/gitlab-org/git/issues/6

* go-git support issue: https://github.com/src-d/go-git/issues/1059

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Change-Id: Id396ff42be8b42b9e11f194a32e2f95b8250c109
---
 reftable/LICENSE          |   31 ++
 reftable/README.md        |   17 +
 reftable/VERSION          |    5 +
 reftable/basics.c         |  196 +++++++
 reftable/basics.h         |   38 ++
 reftable/block.c          |  401 ++++++++++++++
 reftable/block.h          |   71 +++
 reftable/block_test.c     |  151 +++++
 reftable/blocksource.h    |   20 +
 reftable/bytes.c          |    0
 reftable/config.h         |    1 +
 reftable/constants.h      |   27 +
 reftable/dump.c           |   97 ++++
 reftable/file.c           |   97 ++++
 reftable/iter.c           |  230 ++++++++
 reftable/iter.h           |   56 ++
 reftable/merged.c         |  288 ++++++++++
 reftable/merged.h         |   34 ++
 reftable/merged_test.c    |  258 +++++++++
 reftable/pq.c             |  124 +++++
 reftable/pq.h             |   34 ++
 reftable/reader.c         |  710 ++++++++++++++++++++++++
 reftable/reader.h         |   52 ++
 reftable/record.c         | 1110 +++++++++++++++++++++++++++++++++++++
 reftable/record.h         |   79 +++
 reftable/record_test.c    |  332 +++++++++++
 reftable/reftable.h       |  399 +++++++++++++
 reftable/reftable_test.c  |  481 ++++++++++++++++
 reftable/slice.c          |  199 +++++++
 reftable/slice.h          |   39 ++
 reftable/slice_test.c     |   38 ++
 reftable/stack.c          |  985 ++++++++++++++++++++++++++++++++
 reftable/stack.h          |   40 ++
 reftable/stack_test.c     |  281 ++++++++++
 reftable/system.h         |   36 ++
 reftable/test_framework.c |   67 +++
 reftable/test_framework.h |   64 +++
 reftable/tree.c           |   66 +++
 reftable/tree.h           |   24 +
 reftable/tree_test.c      |   61 ++
 reftable/writer.c         |  624 +++++++++++++++++++++
 reftable/writer.h         |   46 ++
 42 files changed, 7909 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/slice_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 0000000000..402e0f9356
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 0000000000..1c52d55833
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,17 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   ((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
+    git clone https://github.com/google/reftable reftable-repo) && \
+   cp reftable-repo/c/*.[ch] reftable/ && \
+   cp reftable-repo/LICENSE reftable/ &&
+   git --git-dir reftable-repo/.git show --no-patch origin/master \
+    > reftable/VERSION && \
+   echo '/* empty */' > reftable/config.h
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 0000000000..627f9c49f6
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,5 @@
+commit c616d53b88657c3a5fe4d2e7243a48effc34c626
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Mon Jan 27 15:05:43 2020 +0100
+
+    C: ban // comments
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 0000000000..791dcc867a
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,196 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_u24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)((i)&0xff);
+}
+
+uint32_t get_u24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_u32(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 24) & 0xff);
+	out[1] = (byte)((i >> 16) & 0xff);
+	out[2] = (byte)((i >> 8) & 0xff);
+	out[3] = (byte)((i)&0xff);
+}
+
+uint32_t get_u32(byte *in)
+{
+	return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
+	       (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
+}
+
+void put_u64(byte *out, uint64_t v)
+{
+	int i = 0;
+	for (i = sizeof(uint64_t); i--;) {
+		out[i] = (byte)(v & 0xff);
+		v >>= 8;
+	}
+}
+
+uint64_t get_u64(byte *out)
+{
+	uint64_t v = 0;
+	int i = 0;
+	for (i = 0; i < sizeof(uint64_t); i++) {
+		v = (v << 8) | (byte)(out[i] & 0xff);
+	}
+	return v;
+}
+
+void put_u16(byte *out, uint16_t i)
+{
+	out[0] = (byte)((i >> 8) & 0xff);
+	out[1] = (byte)((i)&0xff);
+}
+
+uint16_t get_u16(byte *in)
+{
+	return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
+}
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(int sz, int (*f)(int k, void *args), void *args)
+{
+	int lo = 0;
+	int hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		int mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		free(*p);
+		p++;
+	}
+	free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	for (char **p = names; *p; p++) {
+		len++;
+	}
+	return len;
+}
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = realloc(names,
+						names_cap * sizeof(char *));
+			}
+			names[names_len++] = strdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *error_str(int err)
+{
+	switch (err) {
+	case IO_ERROR:
+		return "I/O error";
+	case FORMAT_ERROR:
+		return "FORMAT_ERROR";
+	case NOT_EXIST_ERROR:
+		return "NOT_EXIST_ERROR";
+	case LOCK_ERROR:
+		return "LOCK_ERROR";
+	case API_ERROR:
+		return "API_ERROR";
+	case ZLIB_ERROR:
+		return "ZLIB_ERROR";
+	case -1:
+		return "general error";
+	default:
+		return "unknown error code";
+	}
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 0000000000..e2115bb879
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,38 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+#define ARRAYSIZE(a) sizeof(a) / sizeof(a[0])
+
+void put_u24(byte *out, uint32_t i);
+uint32_t get_u24(byte *in);
+
+uint64_t get_u64(byte *in);
+void put_u64(byte *out, uint64_t i);
+
+void put_u32(byte *out, uint32_t i);
+uint32_t get_u32(byte *in);
+
+void put_u16(byte *out, uint16_t i);
+uint16_t get_u16(byte *in);
+int binsearch(int sz, int (*f)(int k, void *args), void *args);
+
+void free_names(char **a);
+void parse_names(char *buf, int size, char ***namesp);
+int names_equal(char **a, char **b);
+int names_length(char **names);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 0000000000..9fe4f8a080
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,401 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "blocksource.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct record rec)
+{
+	struct slice empty = {};
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = {};
+	int n = 0;
+
+	record_key(rec, &key);
+	n = encode_key(&restart, out, last, key, record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	out.buf += n;
+	out.len -= n;
+
+	n = record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+
+	out.buf += n;
+	out.len -= n;
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	free(slice_yield(&key));
+	return 0;
+
+err:
+	free(slice_yield(&key));
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_u24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_u16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_u24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = {};
+		uLongf dest_len = 0, src_len = 0;
+		slice_resize(&compressed, w->next - block_header_skip);
+
+		dest_len = compressed.len;
+		src_len = w->next - block_header_skip;
+
+		if (Z_OK != compress2(compressed.buf, &dest_len,
+				      w->buf + block_header_skip, src_len, 9)) {
+			free(slice_yield(&compressed));
+			return ZLIB_ERROR;
+		}
+		memcpy(w->buf + block_header_skip, compressed.buf, dest_len);
+		w->next = dest_len + block_header_skip;
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_u24(block->data + header_off + 1);
+
+	if (!is_block_type(typ)) {
+		return FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = {};
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip;
+		uLongf src_len = block->len - block_header_skip;
+
+		slice_resize(&uncompressed, sz);
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		if (Z_OK !=
+		    uncompress2(uncompressed.buf + block_header_skip, &dst_len,
+				block->data + block_header_skip, &src_len)) {
+			free(slice_yield(&uncompressed));
+			return ZLIB_ERROR;
+		}
+
+		block_source_return_block(block->source, block);
+		block->data = uncompressed.buf;
+		block->len = dst_len; /* XXX: 4 bytes missing? */
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+                   padded (data followed by '\0') or the next block is
+                   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_u16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_u24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	struct slice key;
+	struct block_reader *r;
+	int error;
+};
+
+static int restart_key_less(int idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = {};
+	struct slice last_key = {};
+	byte unused_extra;
+	int n = decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_compare(a->key, rkey);
+		free(slice_yield(&rkey));
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct record rec)
+{
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	{
+		struct slice in = {
+			.buf = it->br->block.data + it->next_off,
+			.len = it->br->block_len - it->next_off,
+		};
+		struct slice start = in;
+		struct slice key = {};
+		byte extra;
+		int n = decode_key(&key, &extra, it->last_key, in);
+		if (n < 0) {
+			return -1;
+		}
+
+		in.buf += n;
+		in.len -= n;
+		n = record_decode(rec, key, extra, in, it->br->hash_size);
+		if (n < 0) {
+			return -1;
+		}
+		in.buf += n;
+		in.len -= n;
+
+		slice_copy(&it->last_key, key);
+		it->next_off += start.len - in.len;
+		free(slice_yield(&key));
+		return 0;
+	}
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = {};
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	free(slice_yield(&it->last_key));
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		return -1;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	{
+		struct record rec = new_record(block_reader_type(br));
+		struct slice key = {};
+		int result = 0;
+		int err = 0;
+		struct block_iter next = {};
+		while (true) {
+			block_iter_copy_from(&next, it);
+
+			err = block_iter_next(&next, rec);
+			if (err < 0) {
+				result = -1;
+				goto exit;
+			}
+
+			record_key(rec, &key);
+			if (err > 0 || slice_compare(key, want) >= 0) {
+				result = 0;
+				goto exit;
+			}
+
+			block_iter_copy_from(it, &next);
+		}
+
+	exit:
+		free(slice_yield(&key));
+		free(slice_yield(&next.last_key));
+		record_clear(rec);
+		free(record_yield(&rec));
+
+		return result;
+	}
+}
+
+void block_writer_reset(struct block_writer *bw)
+{
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	free(bw->restarts);
+	bw->restarts = NULL;
+	free(slice_yield(&bw->last_key));
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 0000000000..bb42588111
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,71 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+	uint32_t header_off;
+	int restart_interval;
+	int hash_size;
+
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+	struct slice last_key;
+	int entries;
+};
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+byte block_writer_type(struct block_writer *bw);
+int block_writer_add(struct block_writer *w, struct record rec);
+int block_writer_finish(struct block_writer *w);
+void block_writer_reset(struct block_writer *bw);
+void block_writer_clear(struct block_writer *bw);
+
+struct block_reader {
+	uint32_t header_off;
+	struct block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint32_t full_block_size;
+	uint16_t restart_count;
+};
+
+struct block_iter {
+	struct block_reader *br;
+	struct slice last_key;
+	uint32_t next_off;
+};
+
+int block_reader_init(struct block_reader *br, struct block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+byte block_reader_type(struct block_reader *r);
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+int block_iter_next(struct block_iter *it, struct record rec);
+int block_iter_seek(struct block_iter *it, struct slice want);
+void block_iter_close(struct block_iter *it);
+
+#endif
diff --git a/reftable/block_test.c b/reftable/block_test.c
new file mode 100644
index 0000000000..694333f80f
--- /dev/null
+++ b/reftable/block_test.c
@@ -0,0 +1,151 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+
+struct binsearch_args {
+	int key;
+	int *arr;
+};
+
+static int binsearch_func(int i, void *void_args)
+{
+	struct binsearch_args *args = (struct binsearch_args *)void_args;
+
+	return args->key < args->arr[i];
+}
+
+void test_binsearch()
+{
+	int arr[] = { 2, 4, 6, 8, 10 };
+	int sz = ARRAYSIZE(arr);
+	struct binsearch_args args = {
+		.arr = arr,
+	};
+
+	int i = 0;
+	for (i = 1; i < 11; i++) {
+		args.key = i;
+		int res = binsearch(sz, &binsearch_func, &args);
+
+		if (res < sz) {
+			assert(args.key < arr[res]);
+			if (res > 0) {
+				assert(args.key >= arr[res - 1]);
+			}
+		} else {
+			assert(args.key == 10 || args.key == 11);
+		}
+	}
+}
+
+void test_block_read_write()
+{
+	const int header_off = 21; /* random */
+	const int N = 30;
+	char *names[N];
+	const int block_size = 1024;
+	struct block block = {};
+	block.data = calloc(block_size, 1);
+	block.len = block_size;
+
+	struct block_writer bw = {};
+	block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size,
+			  header_off, SHA1_SIZE);
+	struct ref_record ref = {};
+	struct record rec = {};
+	record_from_ref(&rec, &ref);
+
+	int i = 0;
+	for (i = 0; i < N; i++) {
+		char name[100];
+		snprintf(name, sizeof(name), "branch%02d", i);
+
+		byte hash[SHA1_SIZE];
+		memset(hash, i, sizeof(hash));
+
+		ref.ref_name = name;
+		ref.value = hash;
+		names[i] = strdup(name);
+		int n = block_writer_add(&bw, rec);
+		ref.ref_name = NULL;
+		ref.value = NULL;
+		assert(n == 0);
+	}
+
+	int n = block_writer_finish(&bw);
+	assert(n > 0);
+
+	block_writer_clear(&bw);
+
+	struct block_reader br = {};
+	block_reader_init(&br, &block, header_off, block_size, SHA1_SIZE);
+
+	struct block_iter it = {};
+	block_reader_start(&br, &it);
+
+	int j = 0;
+	while (true) {
+		int r = block_iter_next(&it, rec);
+		assert(r >= 0);
+		if (r > 0) {
+			break;
+		}
+		assert_streq(names[j], ref.ref_name);
+		j++;
+	}
+
+	record_clear(rec);
+	block_iter_close(&it);
+
+	struct slice want = {};
+	for (i = 0; i < N; i++) {
+		slice_set_string(&want, names[i]);
+
+		struct block_iter it = {};
+		int n = block_reader_seek(&br, &it, want);
+		assert(n == 0);
+
+		n = block_iter_next(&it, rec);
+		assert(n == 0);
+
+		assert_streq(names[i], ref.ref_name);
+
+		want.len--;
+		n = block_reader_seek(&br, &it, want);
+		assert(n == 0);
+
+		n = block_iter_next(&it, rec);
+		assert(n == 0);
+		assert_streq(names[10 * (i / 10)], ref.ref_name);
+
+		block_iter_close(&it);
+	}
+
+	record_clear(rec);
+	free(block.data);
+	free(slice_yield(&want));
+	for (i = 0; i < N; i++) {
+		free(names[i]);
+	}
+}
+
+int main()
+{
+	add_test_case("binsearch", &test_binsearch);
+	add_test_case("block_read_write", &test_block_read_write);
+	test_main();
+}
diff --git a/reftable/blocksource.h b/reftable/blocksource.h
new file mode 100644
index 0000000000..f3ad3a4c22
--- /dev/null
+++ b/reftable/blocksource.h
@@ -0,0 +1,20 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCKSOURCE_H
+#define BLOCKSOURCE_H
+
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source source);
+
+#endif
diff --git a/reftable/bytes.c b/reftable/bytes.c
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/reftable/config.h b/reftable/config.h
new file mode 100644
index 0000000000..40a8c178f1
--- /dev/null
+++ b/reftable/config.h
@@ -0,0 +1 @@
+/* empty */
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 0000000000..cd35704610
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,27 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+#define VERSION 1
+#define HEADER_SIZE 24
+#define FOOTER_SIZE 68
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 0000000000..acabe18fbe
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "reftable.h"
+
+static int dump_table(const char *tablename)
+{
+	struct block_source src = {};
+	int err = block_source_from_file(&src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	struct reader *r = NULL;
+	err = new_reader(&r, src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct iterator it = {};
+		err = reader_seek_ref(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+
+		struct ref_record ref = {};
+		while (1) {
+			err = iterator_next_ref(it, &ref);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			ref_record_print(&ref, 20);
+		}
+		iterator_destroy(&it);
+		ref_record_clear(&ref);
+	}
+
+	{
+		struct iterator it = {};
+		err = reader_seek_log(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+		struct log_record log = {};
+		while (1) {
+			err = iterator_next_log(it, &log);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			log_record_print(&log, 20);
+		}
+		iterator_destroy(&it);
+		log_record_clear(&log);
+	}
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int opt;
+	const char *table = NULL;
+	while ((opt = getopt(argc, argv, "t:")) != -1) {
+		switch (opt) {
+		case 't':
+			table = strdup(optarg);
+			break;
+		case '?':
+			printf("usage: %s [-table tablefile]\n", argv[0]);
+			return 2;
+			break;
+		}
+	}
+
+	if (table != NULL) {
+		int err = dump_table(table);
+		if (err < 0) {
+			fprintf(stderr, "%s: %s: %s\n", argv[0], table,
+				error_str(err));
+			return 1;
+		}
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 0000000000..b2ea90bf94
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	free(b);
+}
+
+static int file_read_block(void *v, struct block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int block_source_from_file(struct block_source *bs, const char *name)
+{
+	struct stat st = {};
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	{
+		struct file_block_source *p =
+			calloc(sizeof(struct file_block_source), 1);
+		p->size = st.st_size;
+		p->fd = fd;
+
+		bs->ops = &file_vtable;
+		bs->arg = p;
+	}
+	return 0;
+}
+
+int fd_writer(void *arg, byte *data, int sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 0000000000..e306a5d465
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,230 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct iterator it)
+{
+	return it.ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct record rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct iterator *it)
+{
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct iterator it, struct record rec)
+{
+	return it.ops->next(it.iter_arg, rec);
+}
+
+void iterator_destroy(struct iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	free(it->iter_arg);
+	it->iter_arg = NULL;
+}
+
+int iterator_next_ref(struct iterator it, struct ref_record *ref)
+{
+	struct record rec = {};
+	record_from_ref(&rec, ref);
+	return iterator_next(it, rec);
+}
+
+int iterator_next_log(struct iterator it, struct log_record *log)
+{
+	struct record rec = {};
+	record_from_log(&rec, log);
+	return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	free(slice_yield(&fri->oid));
+	iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct ref_record *ref = (struct ref_record *)rec.data;
+
+	while (true) {
+		int err = iterator_next_ref(fri->it, ref);
+		if (err != 0) {
+			return err;
+		}
+
+		if (fri->double_check) {
+			struct iterator it = {};
+
+			int err = reader_seek_ref(fri->r, &it, ref->ref_name);
+			if (err == 0) {
+				err = iterator_next_ref(it, ref);
+			}
+
+			iterator_destroy(&it);
+
+			if (err < 0) {
+				return err;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+}
+
+struct iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reader_return_block(it->r, &it->block_reader.block);
+	free(slice_yield(&it->oid));
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reader_return_block(it->r, &it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct ref_record *ref = (struct ref_record *)rec.data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reader *r, byte *oid, int oid_len,
+			       uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		calloc(sizeof(struct indexed_table_ref_iter), 1);
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 0000000000..f497f2a27e
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,56 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct iterator_vtable {
+	int (*next)(void *iter_arg, struct record rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct iterator *it);
+int iterator_next(struct iterator it, struct record rec);
+bool iterator_is_null(struct iterator it);
+
+struct filtering_ref_iterator {
+	struct reader *r;
+	struct slice oid;
+	bool double_check;
+	struct iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *,
+					  struct filtering_ref_iterator *);
+
+struct indexed_table_ref_iter {
+	struct reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reader *r, byte *oid, int oid_len,
+			       uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 0000000000..7e32bb5d8a
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,288 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct record rec = new_record(mi->typ);
+		int err = iterator_next(mi->stack[i], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			iterator_destroy(&mi->stack[i]);
+			record_clear(rec);
+			free(record_yield(&rec));
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		iterator_destroy(&mi->stack[i]);
+	}
+	free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, int idx)
+{
+	if (iterator_is_null(mi->stack[idx])) {
+		return 0;
+	}
+
+	{
+		struct record rec = new_record(mi->typ);
+		struct pq_entry e = {
+			.rec = rec,
+			.index = idx,
+		};
+		int err = iterator_next(mi->stack[idx], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			iterator_destroy(&mi->stack[idx]);
+			record_clear(rec);
+			free(record_yield(&rec));
+			return 0;
+		}
+
+		merged_iter_pqueue_add(&mi->pq, e);
+	}
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec)
+{
+	struct slice entry_key = {};
+	struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
+	int err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+	record_key(entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = {};
+		int err = 0, cmp = 0;
+
+		record_key(top.rec, &k);
+
+		cmp = slice_compare(k, entry_key);
+		free(slice_yield(&k));
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		record_clear(top.rec);
+		free(record_yield(&top.rec));
+	}
+
+	record_copy_from(rec, entry.rec, mi->hash_size);
+	record_clear(entry.rec);
+	free(record_yield(&entry.rec));
+	free(slice_yield(&entry_key));
+	return 0;
+}
+
+static int merged_iter_next_void(void *p, struct record rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct iterator *it,
+				      struct merged_iter *mi)
+{
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n)
+{
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reader *r = stack[i];
+		if (i > 0 && last_max >= reader_min_update_index(r)) {
+			return FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reader_min_update_index(r);
+		}
+
+		last_max = reader_max_update_index(r);
+	}
+
+	{
+		struct merged_table m = {
+			.stack = stack,
+			.stack_len = n,
+			.min = first_min,
+			.max = last_max,
+			.hash_size = SHA1_SIZE,
+		};
+
+		*dest = calloc(sizeof(struct merged_table), 1);
+		**dest = m;
+	}
+	return 0;
+}
+
+void merged_table_close(struct merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reader_free(mt->stack[i]);
+	}
+	free(mt->stack);
+	mt->stack = NULL;
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct merged_table *mt)
+{
+	free(mt->stack);
+	mt->stack = NULL;
+	mt->stack_len = 0;
+}
+
+void merged_table_free(struct merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	free(mt);
+}
+
+uint64_t merged_max_update_index(struct merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t merged_min_update_index(struct merged_table *mt)
+{
+	return mt->min;
+}
+
+static int merged_table_seek_record(struct merged_table *mt,
+				    struct iterator *it, struct record rec)
+{
+	struct iterator *iters = calloc(sizeof(struct iterator), mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = record_type(rec),
+		.hash_size = mt->hash_size,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			iterator_destroy(&iters[i]);
+		}
+		free(iters);
+		return err;
+	}
+
+	merged.stack_len = n, err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p = malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+			  const char *name)
+{
+	struct ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = {};
+	record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+			     const char *name, uint64_t update_index)
+{
+	struct log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = {};
+	record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+			  const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 0000000000..b8d3572e26
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct merged_table {
+	struct reader **stack;
+	int stack_len;
+	int hash_size;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct iterator *stack;
+	int hash_size;
+	int stack_len;
+	byte typ;
+	struct merged_iter_pqueue pq;
+} merged_iter;
+
+void merged_table_clear(struct merged_table *mt);
+
+#endif
diff --git a/reftable/merged_test.c b/reftable/merged_test.c
new file mode 100644
index 0000000000..2a46c5ffbd
--- /dev/null
+++ b/reftable/merged_test.c
@@ -0,0 +1,258 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "pq.h"
+#include "reader.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+
+void test_pq(void)
+{
+	char *names[54] = {};
+	int N = ARRAYSIZE(names) - 1;
+
+	int i = 0;
+	for (i = 0; i < N; i++) {
+		char name[100];
+		snprintf(name, sizeof(name), "%02d", i);
+		names[i] = strdup(name);
+	}
+
+	struct merged_iter_pqueue pq = {};
+
+	i = 1;
+	do {
+		struct record rec = new_record(BLOCK_TYPE_REF);
+		record_as_ref(rec)->ref_name = names[i];
+
+		struct pq_entry e = {
+			.rec = rec,
+		};
+		merged_iter_pqueue_add(&pq, e);
+		merged_iter_pqueue_check(pq);
+		i = (i * 7) % N;
+	} while (i != 1);
+
+	const char *last = NULL;
+	while (!merged_iter_pqueue_is_empty(pq)) {
+		struct pq_entry e = merged_iter_pqueue_remove(&pq);
+		merged_iter_pqueue_check(pq);
+		struct ref_record *ref = record_as_ref(e.rec);
+
+		if (last != NULL) {
+			assert(strcmp(last, ref->ref_name) < 0);
+		}
+		last = ref->ref_name;
+		ref->ref_name = NULL;
+		free(ref);
+	}
+
+	for (i = 0; i < N; i++) {
+		free(names[i]);
+	}
+
+	merged_iter_pqueue_clear(&pq);
+}
+
+void write_test_table(struct slice *buf, struct ref_record refs[], int n)
+{
+	int min = 0xffffffff;
+	int max = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		uint64_t ui = refs[i].update_index;
+		if (ui > max) {
+			max = ui;
+		}
+		if (ui < min) {
+			min = ui;
+		}
+	}
+
+	struct write_options opts = {
+		.block_size = 256,
+	};
+
+	struct writer *w = new_writer(&slice_write_void, buf, &opts);
+	writer_set_limits(w, min, max);
+
+	for (i = 0; i < n; i++) {
+		uint64_t before = refs[i].update_index;
+		int n = writer_add_ref(w, &refs[i]);
+		assert(n == 0);
+		assert(before == refs[i].update_index);
+	}
+
+	int err = writer_close(w);
+	assert(err == 0);
+
+	writer_free(w);
+	w = NULL;
+}
+
+static struct merged_table *merged_table_from_records(struct ref_record **refs,
+						      int *sizes,
+						      struct slice *buf, int n)
+{
+	struct block_source *source = calloc(n, sizeof(*source));
+	struct reader **rd = calloc(n, sizeof(*rd));
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		write_test_table(&buf[i], refs[i], sizes[i]);
+		block_source_from_slice(&source[i], &buf[i]);
+
+		int err = new_reader(&rd[i], source[i], "name");
+		assert(err == 0);
+	}
+
+	struct merged_table *mt = NULL;
+	int err = new_merged_table(&mt, rd, n);
+	assert(err == 0);
+	return mt;
+}
+
+void test_merged_between(void)
+{
+	byte hash1[SHA1_SIZE];
+	byte hash2[SHA1_SIZE];
+
+	set_test_hash(hash1, 1);
+	set_test_hash(hash2, 2);
+	struct ref_record r1[] = { {
+		.ref_name = "b",
+		.update_index = 1,
+		.value = hash1,
+	} };
+	struct ref_record r2[] = { {
+		.ref_name = "a",
+		.update_index = 2,
+	} };
+
+	struct ref_record *refs[] = { r1, r2 };
+	int sizes[] = { 1, 1 };
+	struct slice bufs[2] = {};
+	struct merged_table *mt =
+		merged_table_from_records(refs, sizes, bufs, 2);
+
+	struct iterator it = {};
+	int err = merged_table_seek_ref(mt, &it, "a");
+	assert(err == 0);
+
+	struct ref_record ref = {};
+	err = iterator_next_ref(it, &ref);
+	assert_err(err);
+	assert(ref.update_index == 2);
+}
+
+void test_merged(void)
+{
+	byte hash1[SHA1_SIZE];
+	byte hash2[SHA1_SIZE];
+
+	set_test_hash(hash1, 1);
+	set_test_hash(hash2, 2);
+	struct ref_record r1[] = { {
+					   .ref_name = "a",
+					   .update_index = 1,
+					   .value = hash1,
+				   },
+				   {
+					   .ref_name = "b",
+					   .update_index = 1,
+					   .value = hash1,
+				   },
+				   {
+					   .ref_name = "c",
+					   .update_index = 1,
+					   .value = hash1,
+				   } };
+	struct ref_record r2[] = { {
+		.ref_name = "a",
+		.update_index = 2,
+	} };
+	struct ref_record r3[] = {
+		{
+			.ref_name = "c",
+			.update_index = 3,
+			.value = hash2,
+		},
+		{
+			.ref_name = "d",
+			.update_index = 3,
+			.value = hash1,
+		},
+	};
+
+	struct ref_record *refs[] = { r1, r2, r3 };
+	int sizes[3] = { 3, 1, 2 };
+	struct slice bufs[3] = {};
+
+	struct merged_table *mt =
+		merged_table_from_records(refs, sizes, bufs, 3);
+
+	struct iterator it = {};
+	int err = merged_table_seek_ref(mt, &it, "a");
+	assert(err == 0);
+
+	struct ref_record *out = NULL;
+	int len = 0;
+	int cap = 0;
+	while (len < 100) { /* cap loops/recursion. */
+		struct ref_record ref = {};
+		int err = iterator_next_ref(it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			out = realloc(out, sizeof(struct ref_record) * cap);
+		}
+		out[len++] = ref;
+	}
+	iterator_destroy(&it);
+
+	struct ref_record want[] = {
+		r2[0],
+		r1[1],
+		r3[0],
+		r3[1],
+	};
+	assert(ARRAYSIZE(want) == len);
+	int i = 0;
+	for (i = 0; i < len; i++) {
+		assert(ref_record_equal(&want[i], &out[i], SHA1_SIZE));
+	}
+	for (i = 0; i < len; i++) {
+		ref_record_clear(&out[i]);
+	}
+	free(out);
+
+	for (i = 0; i < 3; i++) {
+		free(slice_yield(&bufs[i]));
+	}
+	merged_table_close(mt);
+	merged_table_free(mt);
+}
+
+/* XXX test refs_for(oid) */
+
+int main()
+{
+	add_test_case("test_merged_between", &test_merged_between);
+	add_test_case("test_pq", &test_pq);
+	add_test_case("test_merged", &test_merged);
+	test_main();
+}
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 0000000000..62cb75c5c9
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,124 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = {};
+	struct slice bk = {};
+	int cmp = 0;
+	record_key(a.rec, &ak);
+	record_key(b.rec, &bk);
+
+	cmp = slice_compare(ak, bk);
+
+	free(slice_yield(&ak));
+	free(slice_yield(&bk));
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		{
+			struct pq_entry tmp = pq->heap[min];
+			pq->heap[min] = pq->heap[i];
+			pq->heap[i] = tmp;
+		}
+
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = realloc(pq->heap, pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		{
+			struct pq_entry tmp = pq->heap[j];
+			pq->heap[j] = pq->heap[i];
+			pq->heap[i] = tmp;
+		}
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		record_clear(pq->heap[i].rec);
+		free(record_yield(&pq->heap[i].rec));
+	}
+	free(pq->heap);
+	pq->heap = NULL;
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 0000000000..5f7018979d
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	struct record rec;
+	int index;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 0000000000..754114b58b
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,710 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct block_source source)
+{
+	return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size)
+{
+	int result = source.ops->read_block(source.arg, dest, off, size);
+	dest->source = source;
+	return result;
+}
+
+void block_source_return_block(struct block_source source, struct block *blockp)
+{
+	source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reader_offsets *reader_offsets_for(struct reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reader *r, struct block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(r->source, dest, off, sz);
+}
+
+void reader_return_block(struct reader *r, struct block *p)
+{
+	block_source_return_block(r->source, p);
+}
+
+const char *reader_name(struct reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	int err = 0;
+	if (memcmp(f, "REFT", 4)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, HEADER_SIZE)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+
+	{
+		byte version = *f++;
+		if (version != 1) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	r->block_size = get_u24(f);
+
+	f += 3;
+	r->min_update_index = get_u64(f);
+	f += 8;
+	r->max_update_index = get_u64(f);
+	f += 8;
+
+	r->ref_offsets.index_offset = get_u64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_u64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_u64(f);
+	f += 8;
+	r->log_offsets.offset = get_u64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_u64(f);
+	f += 8;
+
+	{
+		uint32_t computed_crc = crc32(0, footer, f - footer);
+		uint32_t file_crc = get_u32(f);
+		f += 4;
+		if (computed_crc != file_crc) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	{
+		byte first_block_typ = header[HEADER_SIZE];
+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+		r->ref_offsets.offset = 0;
+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+					  r->log_offsets.offset > 0);
+		r->obj_offsets.present = r->obj_offsets.offset > 0;
+	}
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reader *r, struct block_source source, const char *name)
+{
+	struct block footer = {};
+	struct block header = {};
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reader));
+	r->size = block_source_size(source) - FOOTER_SIZE;
+	r->source = source;
+	r->name = strdup(name);
+	r->hash_size = SHA1_SIZE;
+
+	err = block_source_read_block(source, &footer, r->size, FOOTER_SIZE);
+	if (err != FOOTER_SIZE) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	/* Need +1 to read type of first block. */
+	err = reader_get_block(r, &header, 0, HEADER_SIZE + 1);
+	if (err != HEADER_SIZE + 1) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	block_source_return_block(r->source, &footer);
+	block_source_return_block(r->source, &header);
+	return err;
+}
+
+struct table_iter {
+	struct reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+		((struct ref_record *)rec.data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reader_return_block(ti->r, &ti->bi.br->block);
+	free(ti->bi.br);
+	ti->bi.br = NULL;
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += 24;
+	}
+
+	*typ = data[0];
+	if (is_block_type(*typ)) {
+		result = get_u24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct block block = {};
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : HEADER_SIZE;
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reader_return_block(r, &block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reader_return_block(r, &block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 r->hash_size);
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = {};
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct block_reader *brp = malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec)
+{
+	if (record_type(rec) != ti->typ) {
+		return API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = {};
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct record rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct iterator *it, struct table_iter *ti)
+{
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reader *r, struct table_iter *ti,
+				uint64_t off, byte typ)
+{
+	struct block_reader br = {};
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reader *r, struct table_iter *ti, byte typ,
+			bool index)
+{
+	struct reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reader *r, struct table_iter *ti,
+			      struct record want)
+{
+	struct record rec = new_record(record_type(want));
+	struct slice want_key = {};
+	struct slice got_key = {};
+	struct table_iter next = {};
+	int err = -1;
+	record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_compare(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	record_clear(rec);
+	free(record_yield(&rec));
+	free(slice_yield(&want_key));
+	free(slice_yield(&got_key));
+	return err;
+}
+
+static int reader_seek_indexed(struct reader *r, struct iterator *it,
+			       struct record rec)
+{
+	struct index_record want_index = {};
+	struct record want_index_rec = {};
+	struct index_record index_result = {};
+	struct record index_result_rec = {};
+	struct table_iter index_iter = {};
+	struct table_iter next = {};
+	int err = 0;
+
+	record_key(rec, &want_index.last_key);
+	record_from_index(&want_index_rec, &want_index);
+	record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			calloc(sizeof(struct table_iter), 1);
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	record_clear(want_index_rec);
+	record_clear(index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reader *r, struct iterator *it,
+				struct record rec)
+{
+	struct reader_offsets *offs = reader_offsets_for(r, record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = {};
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p = malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reader *r, struct iterator *it, struct record rec)
+{
+	byte typ = record_type(rec);
+
+	struct reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name)
+{
+	struct ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = {};
+	record_from_ref(&rec, &ref);
+	return reader_seek(r, it, rec);
+}
+
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+		       uint64_t update_index)
+{
+	struct log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = {};
+	record_from_log(&rec, &log);
+	return reader_seek(r, it, rec);
+}
+
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reader *r)
+{
+	block_source_close(&r->source);
+	free(r->name);
+	r->name = NULL;
+}
+
+int new_reader(struct reader **p, struct block_source src, char const *name)
+{
+	struct reader *rd = calloc(sizeof(struct reader), 1);
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		free(rd);
+	}
+	return err;
+}
+
+void reader_free(struct reader *r)
+{
+	reader_close(r);
+	free(r);
+}
+
+static int reader_refs_for_indexed(struct reader *r, struct iterator *it,
+				   byte *oid)
+{
+	struct obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct record want_rec = {};
+	struct iterator oit = {};
+	struct obj_record got = {};
+	struct record got_rec = {};
+	int err = 0;
+
+	record_from_obj(&want_rec, &want);
+
+	err = reader_seek(r, &oit, want_rec);
+	if (err != 0) {
+		return err;
+	}
+
+	record_from_obj(&got_rec, &got);
+	err = iterator_next(oit, got_rec);
+	iterator_destroy(&oit);
+	if (err < 0) {
+		return err;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	{
+		struct indexed_table_ref_iter *itr = NULL;
+		err = new_indexed_table_ref_iter(&itr, r, oid, r->hash_size,
+						 got.offsets, got.offset_len);
+		if (err < 0) {
+			record_clear(got_rec);
+			return err;
+		}
+		got.offsets = NULL;
+		record_clear(got_rec);
+
+		iterator_from_indexed_table_ref_iter(it, itr);
+	}
+
+	return 0;
+}
+
+static int reader_refs_for_unindexed(struct reader *r, struct iterator *it,
+				     byte *oid, int oid_len)
+{
+	struct table_iter *ti = calloc(sizeof(struct table_iter), 1);
+	struct filtering_ref_iterator *filter = NULL;
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		free(ti);
+		return err;
+	}
+
+	filter = calloc(sizeof(struct filtering_ref_iterator), 1);
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	filter->r = r;
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
+		    int oid_len)
+{
+	if (r->obj_offsets.present) {
+		return reader_refs_for_indexed(r, it, oid);
+	}
+	return reader_refs_for_unindexed(r, it, oid, oid_len);
+}
+
+uint64_t reader_max_update_index(struct reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reader_min_update_index(struct reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 0000000000..599a90028e
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,52 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source *source);
+
+struct reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+struct reader {
+	struct block_source source;
+	char *name;
+	int hash_size;
+	uint64_t size;
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	int object_id_len;
+
+	struct reader_offsets ref_offsets;
+	struct reader_offsets obj_offsets;
+	struct reader_offsets log_offsets;
+};
+
+int init_reader(struct reader *r, struct block_source source, const char *name);
+int reader_seek(struct reader *r, struct iterator *it, struct record rec);
+void reader_close(struct reader *r);
+const char *reader_name(struct reader *r);
+void reader_return_block(struct reader *r, struct block *p);
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 0000000000..657e0e0d91
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1110 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = {};
+	int i = 9;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	{
+		int n = sizeof(buf) - i - 1;
+		if (dest.len < n) {
+			return -1;
+		}
+		memcpy(dest.buf, &buf[i + 1], n);
+		return n;
+	}
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	in.buf += tsize;
+	in.len -= tsize;
+
+	return start_len - in.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	dest.buf += n;
+	dest.len -= n;
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	dest.buf += n;
+	dest.len -= n;
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	dest.buf += suffix_len;
+	dest.len -= suffix_len;
+
+	return start.len - dest.len;
+}
+
+static byte ref_record_type(void)
+{
+	return BLOCK_TYPE_REF;
+}
+
+static void ref_record_key(const void *r, struct slice *dest)
+{
+	const struct ref_record *rec = (const struct ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void ref_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct ref_record *ref = (struct ref_record *)rec;
+	struct ref_record *src = (struct ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+           fields. */
+	ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = strdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = strdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void ref_record_print(struct ref_record *ref, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = {};
+
+	printf("ref{%s(%" PRIdMAX ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size);
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size);
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void ref_record_clear_void(void *rec)
+{
+	ref_record_clear((struct ref_record *)rec);
+}
+
+void ref_record_clear(struct ref_record *ref)
+{
+	free(ref->ref_name);
+	free(ref->target);
+	free(ref->target_value);
+	free(ref->value);
+	memset(ref, 0, sizeof(struct ref_record));
+}
+
+static byte ref_record_val_type(const void *rec)
+{
+	const struct ref_record *r = (const struct ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	s.buf += l;
+	s.len -= l;
+
+	return start.len - s.len;
+}
+
+static int ref_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	const struct ref_record *r = (const struct ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		s.buf += hash_size;
+		s.len -= hash_size;
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		s.buf += hash_size;
+		s.len -= hash_size;
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		s.buf += n;
+		s.len -= n;
+	}
+
+	return start.len - s.len;
+}
+
+static int ref_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct ref_record *r = (struct ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	in.buf += n;
+	in.len -= n;
+
+	r->ref_name = realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		in.buf += hash_size;
+		in.len -= hash_size;
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		in.buf += hash_size;
+		in.len -= hash_size;
+		break;
+	case 3: {
+		struct slice dest = {};
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		in.buf += n;
+		in.len -= n;
+		seen_target = true;
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		free(r->target);
+		r->target = NULL;
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		free(r->target_value);
+		r->target_value = NULL;
+	}
+	if (!seen_value && r->value != NULL) {
+		free(r->value);
+		r->value = NULL;
+	}
+
+	return start.len - in.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	in.buf += suffix_len;
+	in.len -= suffix_len;
+
+	return start_len - in.len;
+}
+
+struct record_vtable ref_record_vtable = {
+	.key = &ref_record_key,
+	.type = &ref_record_type,
+	.copy_from = &ref_record_copy_from,
+	.val_type = &ref_record_val_type,
+	.encode = &ref_record_encode,
+	.decode = &ref_record_decode,
+	.clear = &ref_record_clear_void,
+};
+
+static byte obj_record_type(void)
+{
+	return BLOCK_TYPE_OBJ;
+}
+
+static void obj_record_key(const void *r, struct slice *dest)
+{
+	const struct obj_record *rec = (const struct obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	const struct obj_record *src = (const struct obj_record *)src_rec;
+
+	*ref = *src;
+	ref->hash_prefix = malloc(ref->hash_prefix_len);
+	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
+
+	{
+		int olen = ref->offset_len * sizeof(uint64_t);
+		ref->offsets = malloc(olen);
+		memcpy(ref->offsets, src->offsets, olen);
+	}
+}
+
+static void obj_record_clear(void *rec)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	free(ref->hash_prefix);
+	free(ref->offsets);
+	memset(ref, 0, sizeof(struct obj_record));
+}
+
+static byte obj_record_val_type(const void *rec)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		s.buf += n;
+		s.len -= n;
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	{
+		uint64_t last = r->offsets[0];
+		int i = 0;
+		for (i = 1; i < r->offset_len; i++) {
+			int n = put_var_int(s, r->offsets[i] - last);
+			if (n < 0) {
+				return -1;
+			}
+			s.buf += n;
+			s.len -= n;
+			last = r->offsets[i];
+		}
+	}
+	return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct obj_record *r = (struct obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	r->hash_prefix = malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		in.buf += n;
+		in.len -= n;
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+
+	in.buf += n;
+	in.len -= n;
+
+	{
+		uint64_t last = r->offsets[0];
+		int j = 1;
+		while (j < count) {
+			uint64_t delta = 0;
+			int n = get_var_int(&delta, in);
+			if (n < 0) {
+				return n;
+			}
+
+			in.buf += n;
+			in.len -= n;
+
+			last = r->offsets[j] = (delta + last);
+			j++;
+		}
+	}
+	return start.len - in.len;
+}
+
+struct record_vtable obj_record_vtable = {
+	.key = &obj_record_key,
+	.type = &obj_record_type,
+	.copy_from = &obj_record_copy_from,
+	.val_type = &obj_record_val_type,
+	.encode = &obj_record_encode,
+	.decode = &obj_record_decode,
+	.clear = &obj_record_clear,
+};
+
+void log_record_print(struct log_record *log, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = {};
+
+	printf("log{%s(%" PRIdMAX ") %s <%s> %lu %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size);
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size);
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static byte log_record_type(void)
+{
+	return BLOCK_TYPE_LOG;
+}
+
+static void log_record_key(const void *r, struct slice *dest)
+{
+	const struct log_record *rec = (const struct log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_u64(dest->buf + 1 + len, ts);
+}
+
+static void log_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct log_record *dst = (struct log_record *)rec;
+	const struct log_record *src = (const struct log_record *)src_rec;
+
+	*dst = *src;
+	dst->ref_name = strdup(dst->ref_name);
+	dst->email = strdup(dst->email);
+	dst->name = strdup(dst->name);
+	dst->message = strdup(dst->message);
+	if (dst->new_hash != NULL) {
+		dst->new_hash = malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void log_record_clear_void(void *rec)
+{
+	struct log_record *r = (struct log_record *)rec;
+	log_record_clear(r);
+}
+
+void log_record_clear(struct log_record *r)
+{
+	free(r->ref_name);
+	free(r->new_hash);
+	free(r->old_hash);
+	free(r->name);
+	free(r->email);
+	free(r->message);
+	memset(r, 0, sizeof(struct log_record));
+}
+
+static byte log_record_val_type(const void *rec)
+{
+	return 1;
+}
+
+static byte zero[SHA256_SIZE] = {};
+
+static int log_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct log_record *r = (struct log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	s.buf += 2 * hash_size;
+	s.len -= 2 * hash_size;
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_u16(s.buf, r->tz_offset);
+	s.buf += 2;
+	s.len -= 2;
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	return start.len - s.len;
+}
+
+static int log_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct log_record *r = (struct log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = {};
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return FORMAT_ERROR;
+	}
+
+	r->ref_name = realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_u64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (in.len < 2 * hash_size) {
+		return FORMAT_ERROR;
+	}
+
+	r->old_hash = realloc(r->old_hash, hash_size);
+	r->new_hash = realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	in.buf += 2 * hash_size;
+	in.len -= 2 * hash_size;
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->name = realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->email = realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_u16(in.buf);
+	in.buf += 2;
+	in.len -= 2;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->message = realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	return start.len - in.len;
+
+error:
+	free(slice_yield(&dest));
+	return FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool log_record_equal(struct log_record *a, struct log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+struct record_vtable log_record_vtable = {
+	.key = &log_record_key,
+	.type = &log_record_type,
+	.copy_from = &log_record_copy_from,
+	.val_type = &log_record_val_type,
+	.encode = &log_record_encode,
+	.decode = &log_record_decode,
+	.clear = &log_record_clear_void,
+};
+
+struct record new_record(byte typ)
+{
+	struct record rec;
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct ref_record *r = calloc(1, sizeof(struct ref_record));
+		record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct obj_record *r = calloc(1, sizeof(struct obj_record));
+		record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct log_record *r = calloc(1, sizeof(struct log_record));
+		record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct index_record *r = calloc(1, sizeof(struct index_record));
+		record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+static byte index_record_type(void)
+{
+	return BLOCK_TYPE_INDEX;
+}
+
+static void index_record_key(const void *r, struct slice *dest)
+{
+	struct index_record *rec = (struct index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+				   int hash_size)
+{
+	struct index_record *dst = (struct index_record *)rec;
+	struct index_record *src = (struct index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec)
+{
+	struct index_record *idx = (struct index_record *)rec;
+	free(slice_yield(&idx->last_key));
+}
+
+static byte index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int index_record_encode(const void *rec, struct slice out, int hash_size)
+{
+	const struct index_record *r = (const struct index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	out.buf += n;
+	out.len -= n;
+
+	return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+			       struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct index_record *r = (struct index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	in.buf += n;
+	in.len -= n;
+	return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+	.key = &index_record_key,
+	.type = &index_record_type,
+	.copy_from = &index_record_copy_from,
+	.val_type = &index_record_val_type,
+	.encode = &index_record_encode,
+	.decode = &index_record_decode,
+	.clear = &index_record_clear,
+};
+
+void record_key(struct record rec, struct slice *dest)
+{
+	rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec)
+{
+	return rec.ops->type();
+}
+
+int record_encode(struct record rec, struct slice dest, int hash_size)
+{
+	return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size)
+{
+	assert(src.ops->type() == rec.ops->type());
+
+	rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec)
+{
+	return rec.ops->val_type(rec.data);
+}
+
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size)
+{
+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec)
+{
+	return rec.ops->clear(rec.data);
+}
+
+void record_from_ref(struct record *rec, struct ref_record *ref_rec)
+{
+	rec->data = ref_rec;
+	rec->ops = &ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
+{
+	rec->data = obj_rec;
+	rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec)
+{
+	rec->data = index_rec;
+	rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct log_record *log_rec)
+{
+	rec->data = log_rec;
+	rec->ops = &log_record_vtable;
+}
+
+void *record_yield(struct record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+struct ref_record *record_as_ref(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_REF);
+	return (struct ref_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct ref_record *)a)->ref_name,
+		      ((struct ref_record *)b)->ref_name);
+}
+
+bool ref_record_is_deletion(const struct ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int log_record_compare_key(const void *a, const void *b)
+{
+	struct log_record *la = (struct log_record *)a;
+	struct log_record *lb = (struct log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool log_record_is_deletion(const struct log_record *log)
+{
+	/* XXX */
+	return false;
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 0000000000..dffdd71fc2
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,79 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+struct record_vtable {
+	void (*key)(const void *rec, struct slice *dest);
+	byte (*type)(void);
+	void (*copy_from)(void *rec, const void *src, int hash_size);
+	byte (*val_type)(const void *rec);
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+	void (*clear)(void *rec);
+};
+
+/* record is a generic wrapper for differnt types of records. */
+struct record {
+	void *data;
+	struct record_vtable *ops;
+};
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+int common_prefix_size(struct slice a, struct slice b);
+
+int is_block_type(byte typ);
+struct record new_record(byte typ);
+
+extern struct record_vtable ref_record_vtable;
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra);
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in);
+
+struct index_record {
+	struct slice last_key;
+	uint64_t offset;
+};
+
+struct obj_record {
+	byte *hash_prefix;
+	int hash_prefix_len;
+	uint64_t *offsets;
+	int offset_len;
+};
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size);
+void record_clear(struct record rec);
+void *record_yield(struct record *rec);
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct ref_record *refrec);
+void record_from_log(struct record *rec, struct log_record *logrec);
+struct ref_record *record_as_ref(struct record ref);
+
+/* for qsort. */
+int ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/record_test.c b/reftable/record_test.c
new file mode 100644
index 0000000000..b95ac6aa1a
--- /dev/null
+++ b/reftable/record_test.c
@@ -0,0 +1,332 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "constants.h"
+#include "reftable.h"
+#include "test_framework.h"
+
+void varint_roundtrip()
+{
+	uint64_t inputs[] = { 0,
+			      1,
+			      27,
+			      127,
+			      128,
+			      257,
+			      4096,
+			      ((uint64_t)1 << 63),
+			      ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1 };
+	int i = 0;
+	for (i = 0; i < ARRAYSIZE(inputs); i++) {
+		byte dest[10];
+
+		struct slice out = { .buf = dest, .len = 10, .cap = 10 };
+
+		uint64_t in = inputs[i];
+		int n = put_var_int(out, in);
+		assert(n > 0);
+		out.len = n;
+
+		uint64_t got = 0;
+		n = get_var_int(&got, out);
+		assert(n > 0);
+
+		assert(got == in);
+	}
+}
+
+void test_common_prefix()
+{
+	struct {
+		const char *a, *b;
+		int want;
+	} cases[] = {
+		{ "abc", "ab", 2 },
+		{ "", "abc", 0 },
+		{ "abc", "abd", 2 },
+		{ "abc", "pqr", 0 },
+	};
+
+	int i = 0;
+	for (i = 0; i < ARRAYSIZE(cases); i++) {
+		struct slice a = {};
+		struct slice b = {};
+		slice_set_string(&a, cases[i].a);
+		slice_set_string(&b, cases[i].b);
+
+		int got = common_prefix_size(a, b);
+		assert(got == cases[i].want);
+
+		free(slice_yield(&a));
+		free(slice_yield(&b));
+	}
+}
+
+void set_hash(byte *h, int j)
+{
+	int i = 0;
+	for (i = 0; i < SHA1_SIZE; i++) {
+		h[i] = (j >> i) & 0xff;
+	}
+}
+
+void test_ref_record_roundtrip()
+{
+	int i = 0;
+	for (i = 0; i <= 3; i++) {
+		printf("subtest %d\n", i);
+		struct ref_record in = {};
+		switch (i) {
+		case 0:
+			break;
+		case 1:
+			in.value = malloc(SHA1_SIZE);
+			set_hash(in.value, 1);
+			break;
+		case 2:
+			in.value = malloc(SHA1_SIZE);
+			set_hash(in.value, 1);
+			in.target_value = malloc(SHA1_SIZE);
+			set_hash(in.target_value, 2);
+			break;
+		case 3:
+			in.target = strdup("target");
+			break;
+		}
+		in.ref_name = strdup("refs/heads/master");
+
+		struct record rec = {};
+		record_from_ref(&rec, &in);
+		assert(record_val_type(rec) == i);
+		byte buf[1024];
+		struct slice key = {};
+		record_key(rec, &key);
+		struct slice dest = {
+			.buf = buf,
+			.len = sizeof(buf),
+		};
+		int n = record_encode(rec, dest, SHA1_SIZE);
+		assert(n > 0);
+
+		struct ref_record out = {};
+		struct record rec_out = {};
+		record_from_ref(&rec_out, &out);
+		int m = record_decode(rec_out, key, i, dest, SHA1_SIZE);
+		assert(n == m);
+
+		assert((out.value != NULL) == (in.value != NULL));
+		assert((out.target_value != NULL) == (in.target_value != NULL));
+		assert((out.target != NULL) == (in.target != NULL));
+		free(slice_yield(&key));
+		record_clear(rec_out);
+		ref_record_clear(&in);
+	}
+}
+
+void test_log_record_roundtrip()
+{
+	struct log_record in = {
+		.ref_name = strdup("refs/heads/master"),
+		.old_hash = malloc(SHA1_SIZE),
+		.new_hash = malloc(SHA1_SIZE),
+		.name = strdup("han-wen"),
+		.email = strdup("hanwen@google.com"),
+		.message = strdup("test"),
+		.update_index = 42,
+		.time = 1577123507,
+		.tz_offset = 100,
+	};
+
+	struct record rec = {};
+	record_from_log(&rec, &in);
+
+	struct slice key = {};
+	record_key(rec, &key);
+
+	byte buf[1024];
+	struct slice dest = {
+		.buf = buf,
+		.len = sizeof(buf),
+	};
+
+	int n = record_encode(rec, dest, SHA1_SIZE);
+	assert(n > 0);
+
+	struct log_record out = {};
+	struct record rec_out = {};
+	record_from_log(&rec_out, &out);
+	int valtype = record_val_type(rec);
+	int m = record_decode(rec_out, key, valtype, dest, SHA1_SIZE);
+	assert(n == m);
+
+	assert(log_record_equal(&in, &out, SHA1_SIZE));
+	log_record_clear(&in);
+	free(slice_yield(&key));
+	record_clear(rec_out);
+}
+
+void test_u24_roundtrip()
+{
+	uint32_t in = 0x112233;
+	byte dest[3];
+
+	put_u24(dest, in);
+	uint32_t out = get_u24(dest);
+	assert(in == out);
+}
+
+void test_key_roundtrip()
+{
+	struct slice dest = {}, last_key = {}, key = {}, roundtrip = {};
+
+	slice_resize(&dest, 1024);
+	slice_set_string(&last_key, "refs/heads/master");
+	slice_set_string(&key, "refs/tags/bla");
+
+	bool restart;
+	byte extra = 6;
+	int n = encode_key(&restart, dest, last_key, key, extra);
+	assert(!restart);
+	assert(n > 0);
+
+	byte rt_extra;
+	int m = decode_key(&roundtrip, &rt_extra, last_key, dest);
+	assert(n == m);
+	assert(slice_equal(key, roundtrip));
+	assert(rt_extra == extra);
+
+	free(slice_yield(&last_key));
+	free(slice_yield(&key));
+	free(slice_yield(&dest));
+	free(slice_yield(&roundtrip));
+}
+
+void print_bytes(byte *p, int l)
+{
+	int i = 0;
+	for (i = 0; i < l; i++) {
+		byte c = *p;
+		if (c < 32) {
+			c = '.';
+		}
+		printf("%02x[%c] ", p[i], c);
+	}
+	printf("(%d)\n", l);
+}
+
+void test_obj_record_roundtrip()
+{
+	byte testHash1[SHA1_SIZE] = {};
+	set_hash(testHash1, 1);
+	uint64_t till9[] = { 1, 2, 3, 4, 500, 600, 700, 800, 9000 };
+
+	struct obj_record recs[3] = { {
+					      .hash_prefix = testHash1,
+					      .hash_prefix_len = 5,
+					      .offsets = till9,
+					      .offset_len = 3,
+				      },
+				      {
+					      .hash_prefix = testHash1,
+					      .hash_prefix_len = 5,
+					      .offsets = till9,
+					      .offset_len = 9,
+				      },
+				      {
+					      .hash_prefix = testHash1,
+					      .hash_prefix_len = 5,
+				      }
+
+	};
+	int i = 0;
+	for (i = 0; i < ARRAYSIZE(recs); i++) {
+		printf("subtest %d\n", i);
+		struct obj_record in = recs[i];
+		byte buf[1024];
+		struct record rec = {};
+		record_from_obj(&rec, &in);
+		struct slice key = {};
+		record_key(rec, &key);
+		struct slice dest = {
+			.buf = buf,
+			.len = sizeof(buf),
+		};
+		int n = record_encode(rec, dest, SHA1_SIZE);
+		assert(n > 0);
+		byte extra = record_val_type(rec);
+		struct obj_record out = {};
+		struct record rec_out = {};
+		record_from_obj(&rec_out, &out);
+		int m = record_decode(rec_out, key, extra, dest, SHA1_SIZE);
+		assert(n == m);
+
+		assert(in.hash_prefix_len == out.hash_prefix_len);
+		assert(in.offset_len == out.offset_len);
+
+		assert(!memcmp(in.hash_prefix, out.hash_prefix,
+			       in.hash_prefix_len));
+		assert(0 == memcmp(in.offsets, out.offsets,
+				   sizeof(uint64_t) * in.offset_len));
+		free(slice_yield(&key));
+		record_clear(rec_out);
+	}
+}
+
+void test_index_record_roundtrip()
+{
+	struct index_record in = { .offset = 42 };
+
+	slice_set_string(&in.last_key, "refs/heads/master");
+
+	struct slice key = {};
+	struct record rec = {};
+	record_from_index(&rec, &in);
+	record_key(rec, &key);
+
+	assert(0 == slice_compare(key, in.last_key));
+
+	byte buf[1024];
+	struct slice dest = {
+		.buf = buf,
+		.len = sizeof(buf),
+	};
+	int n = record_encode(rec, dest, SHA1_SIZE);
+	assert(n > 0);
+
+	byte extra = record_val_type(rec);
+	struct index_record out = {};
+	struct record out_rec;
+	record_from_index(&out_rec, &out);
+	int m = record_decode(out_rec, key, extra, dest, SHA1_SIZE);
+	assert(m == n);
+
+	assert(in.offset == out.offset);
+
+	record_clear(out_rec);
+	free(slice_yield(&key));
+	free(slice_yield(&in.last_key));
+}
+
+int main()
+{
+	add_test_case("test_log_record_roundtrip", &test_log_record_roundtrip);
+	add_test_case("test_ref_record_roundtrip", &test_ref_record_roundtrip);
+	add_test_case("varint_roundtrip", &varint_roundtrip);
+	add_test_case("test_key_roundtrip", &test_key_roundtrip);
+	add_test_case("test_common_prefix", &test_common_prefix);
+	add_test_case("test_obj_record_roundtrip", &test_obj_record_roundtrip);
+	add_test_case("test_index_record_roundtrip",
+		      &test_index_record_roundtrip);
+	add_test_case("test_u24_roundtrip", &test_u24_roundtrip);
+	test_main();
+}
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 0000000000..d84cc2004a
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,399 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include "system.h"
+
+typedef uint8_t byte;
+typedef byte bool;
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct block_source {
+	struct block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct block {
+	byte *data;
+	int len;
+	struct block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct block *dest, uint64_t off,
+			  uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int block_source_from_file(struct block_source *block_src, const char *name);
+
+/* write_options sets options for writing a single reftable. */
+struct write_options {
+	/* do not pad out blocks to block size. */
+	bool unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* do not generate a SHA1 => ref index. */
+	bool skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+};
+
+/* ref_record holds a ref database entry target_value */
+struct ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	byte *value; /* SHA1, or NULL. malloced. */
+	byte *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+bool ref_record_is_deletion(const struct ref_record *ref);
+
+/* prints a ref_record onto stdout */
+void ref_record_print(struct ref_record *ref, int hash_size);
+
+/* frees and nulls all pointer values. */
+void ref_record_clear(struct ref_record *ref);
+
+/* returns whether two ref_records are the same */
+bool ref_record_equal(struct ref_record *a, struct ref_record *b,
+		      int hash_size);
+
+/* log_record holds a reflog entry */
+struct log_record {
+	char *ref_name;
+	uint64_t update_index;
+	byte *new_hash;
+	byte *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+bool log_record_is_deletion(const struct log_record *log);
+
+/* frees and nulls all pointer values. */
+void log_record_clear(struct log_record *log);
+
+/* returns whether two records are equal. */
+bool log_record_equal(struct log_record *a, struct log_record *b,
+		      int hash_size);
+
+void log_record_print(struct log_record *log, int hash_size);
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct iterator {
+	struct iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_ref(struct iterator it, struct ref_record *ref);
+
+/* reads the next log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_log(struct iterator it, struct log_record *log);
+
+/* releases resources associated with an iterator. */
+void iterator_destroy(struct iterator *it);
+
+/* block_stats holds statistics for a single block type */
+struct block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct block_stats obj_stats;
+	/* stats for index blocks */
+	struct block_stats idx_stats;
+	/* stats for log blocks */
+	struct block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* different types of errors */
+
+/* Unexpected file system behavior */
+#define IO_ERROR -2
+
+/* Format inconsistency on reading data
+ */
+#define FORMAT_ERROR -3
+
+/* File does not exist. Returned from block_source_from_file(),  because it
+   needs special handling in stack.
+*/
+#define NOT_EXIST_ERROR -4
+
+/* Trying to write out-of-date data. */
+#define LOCK_ERROR -5
+
+/* Misuse of the API:
+   - on writing a record with NULL ref_name.
+   - on writing a ref_record outside the table limits
+   - on writing a ref or log record before the stack's next_update_index
+   - on reading a ref_record from log iterator, or vice versa.
+ */
+#define API_ERROR -6
+
+/* Decompression error */
+#define ZLIB_ERROR -7
+
+const char *error_str(int err);
+
+/* new_writer creates a new writer */
+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
+			  void *writer_arg, struct write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int fd_writer(void *fdp, byte *data, int size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   stack_next_update_index(), or API_ERROR is returned.
+ */
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max);
+
+/* adds a ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   writer_set_limits(), or API_ERROR is returned.
+ */
+int writer_add_ref(struct writer *w, struct ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n);
+
+/* adds a log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int writer_add_log(struct writer *w, struct log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int writer_add_logs(struct writer *w, struct log_record *logs, int n);
+
+/* writer_close finalizes the reftable. The writer is retained so statistics can
+ * be inspected. */
+int writer_close(struct writer *w);
+
+/* writer_stats returns the statistics on the reftable being written. */
+struct stats *writer_stats(struct writer *w);
+
+/* writer_free deallocates memory for the writer */
+void writer_free(struct writer *w);
+
+struct reader;
+
+/* new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp.  The name is used for creating a
+ * stack. Typically, it is the basename of the file.
+ */
+int new_reader(struct reader **pp, struct block_source, const char *name);
+
+/* reader_seek_ref returns an iterator where 'name' would be inserted in the
+   table.
+
+   example:
+
+   struct reader *r = NULL;
+   int err = new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct iterator it = {};
+   err = reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct ref_record ref = {};
+   while (1) {
+     err = iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   iterator_destroy(&it);
+   ref_record_clear(&ref);
+ */
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name);
+
+/* seek to logs for the given name, older than update_index. */
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+		       uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reader_free(struct reader *);
+
+/* return an iterator for the refs pointing to oid */
+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
+		    int oid_len);
+
+/* return the max_update_index for a table */
+uint64_t reader_max_update_index(struct reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reader_min_update_index(struct reader *r);
+
+/* a merged table is implements seeking/iterating over a stack of tables. */
+struct merged_table;
+
+/* new_merged_table creates a new merged table. It takes ownership of the stack
+   array.
+*/
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n);
+
+/* returns an iterator positioned just before 'name' */
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+			  const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+			     const char *name, uint64_t update_index);
+
+/* like merged_table_seek_log_at but look for the newest entry. */
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+			  const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t merged_max_update_index(struct merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t merged_min_update_index(struct merged_table *mt);
+
+/* closes readers for the merged tables */
+void merged_table_close(struct merged_table *mt);
+
+/* releases memory for the merged_table */
+void merged_table_free(struct merged_table *m);
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct stack;
+
+/* open a new reftable stack. The tables will be stored in 'dir', while the list
+   of tables is in 'list_file'. Typically, this should be .git/reftables and
+   .git/refs respectively.
+*/
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+	      struct write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t stack_next_update_index(struct stack *st);
+
+/* add a new table to the stack. The write_table function must call
+   writer_set_limits, add refs and return an error value. */
+int stack_add(struct stack *st,
+	      int (*write_table)(struct writer *wr, void *write_arg),
+	      void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct merged_table *stack_merged_table(struct stack *st);
+
+/* frees all resources associated with the stack. */
+void stack_destroy(struct stack *st);
+
+/* reloads the stack if necessary. */
+int stack_reload(struct stack *st);
+
+/* Policy for expiring reflog entries. */
+struct log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int stack_compact_all(struct stack *st, struct log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int stack_auto_compact(struct stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_ref(struct stack *st, const char *refname,
+		   struct ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_log(struct stack *st, const char *refname,
+		   struct log_record *log);
+
+/* statistics on past compactions. */
+struct compaction_stats {
+	uint64_t bytes;
+	int attempts;
+	int failures;
+};
+
+struct compaction_stats *stack_compaction_stats(struct stack *st);
+
+#endif
diff --git a/reftable/reftable_test.c b/reftable/reftable_test.c
new file mode 100644
index 0000000000..1e89d37d9d
--- /dev/null
+++ b/reftable/reftable_test.c
@@ -0,0 +1,481 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "record.h"
+#include "test_framework.h"
+
+static const int update_index = 5;
+
+void test_buffer(void)
+{
+	struct slice buf = {};
+
+	byte in[] = "hello";
+	slice_write(&buf, in, sizeof(in));
+	struct block_source source;
+	block_source_from_slice(&source, &buf);
+	assert(block_source_size(source) == 6);
+	struct block out = {};
+	int n = block_source_read_block(source, &out, 0, sizeof(in));
+	assert(n == sizeof(in));
+	assert(!memcmp(in, out.data, n));
+	block_source_return_block(source, &out);
+
+	n = block_source_read_block(source, &out, 1, 2);
+	assert(n == 2);
+	assert(!memcmp(out.data, "el", 2));
+
+	block_source_return_block(source, &out);
+	block_source_close(&source);
+	free(slice_yield(&buf));
+}
+
+void write_table(char ***names, struct slice *buf, int N, int block_size)
+{
+	*names = calloc(sizeof(char *), N + 1);
+
+	struct write_options opts = {
+		.block_size = block_size,
+	};
+
+	struct writer *w = new_writer(&slice_write_void, buf, &opts);
+
+	writer_set_limits(w, update_index, update_index);
+	{
+		struct ref_record ref = {};
+		int i = 0;
+		for (i = 0; i < N; i++) {
+			byte hash[SHA1_SIZE];
+			set_test_hash(hash, i);
+
+			char name[100];
+			snprintf(name, sizeof(name), "refs/heads/branch%02d",
+				 i);
+
+			ref.ref_name = name;
+			ref.value = hash;
+			ref.update_index = update_index;
+			(*names)[i] = strdup(name);
+
+			int n = writer_add_ref(w, &ref);
+			assert(n == 0);
+		}
+	}
+	{
+		struct log_record log = {};
+		int i = 0;
+		for (i = 0; i < N; i++) {
+			byte hash[SHA1_SIZE];
+			set_test_hash(hash, i);
+
+			char name[100];
+			snprintf(name, sizeof(name), "refs/heads/branch%02d",
+				 i);
+
+			log.ref_name = name;
+			log.new_hash = hash;
+			log.update_index = update_index;
+			log.message = "message";
+
+			int n = writer_add_log(w, &log);
+			assert(n == 0);
+		}
+	}
+
+	int n = writer_close(w);
+	assert(n == 0);
+
+	struct stats *stats = writer_stats(w);
+	int i = 0;
+	for (i = 0; i < stats->ref_stats.blocks; i++) {
+		int off = i * opts.block_size;
+		if (off == 0) {
+			off = HEADER_SIZE;
+		}
+		assert(buf->buf[off] == 'r');
+	}
+
+	writer_free(w);
+	w = NULL;
+}
+
+void test_log_write_read(void)
+{
+	int N = 2;
+	char **names = calloc(sizeof(char *), N + 1);
+
+	struct write_options opts = {
+		.block_size = 256,
+	};
+
+	struct slice buf = {};
+	struct writer *w = new_writer(&slice_write_void, &buf, &opts);
+
+	writer_set_limits(w, 0, N);
+	{
+		struct ref_record ref = {};
+		int i = 0;
+		for (i = 0; i < N; i++) {
+			char name[256];
+			snprintf(name, sizeof(name), "b%02d%0*d", i, 130, 7);
+			names[i] = strdup(name);
+			puts(name);
+			ref.ref_name = name;
+			ref.update_index = i;
+
+			int err = writer_add_ref(w, &ref);
+			assert_err(err);
+		}
+	}
+
+	{
+		struct log_record log = {};
+		int i = 0;
+		for (i = 0; i < N; i++) {
+			byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
+			set_test_hash(hash1, i);
+			set_test_hash(hash2, i + 1);
+
+			log.ref_name = names[i];
+			log.update_index = i;
+			log.old_hash = hash1;
+			log.new_hash = hash2;
+
+			int err = writer_add_log(w, &log);
+			assert_err(err);
+		}
+	}
+
+	int n = writer_close(w);
+	assert(n == 0);
+
+	struct stats *stats = writer_stats(w);
+	assert(stats->log_stats.blocks > 0);
+	writer_free(w);
+	w = NULL;
+
+	struct block_source source = {};
+	block_source_from_slice(&source, &buf);
+
+	struct reader rd = {};
+	int err = init_reader(&rd, source, "file.log");
+	assert(err == 0);
+
+	{
+		struct iterator it = {};
+		err = reader_seek_ref(&rd, &it, names[N - 1]);
+		assert(err == 0);
+
+		struct ref_record ref = {};
+		err = iterator_next_ref(it, &ref);
+		assert_err(err);
+
+		/* end of iteration. */
+		err = iterator_next_ref(it, &ref);
+		assert(0 < err);
+
+		iterator_destroy(&it);
+		ref_record_clear(&ref);
+	}
+
+	{
+		struct iterator it = {};
+		err = reader_seek_log(&rd, &it, "");
+		assert(err == 0);
+
+		struct log_record log = {};
+		int i = 0;
+		while (true) {
+			int err = iterator_next_log(it, &log);
+			if (err > 0) {
+				break;
+			}
+
+			assert_err(err);
+			assert_streq(names[i], log.ref_name);
+			assert(i == log.update_index);
+			i++;
+		}
+
+		assert(i == N);
+		iterator_destroy(&it);
+	}
+
+	/* cleanup. */
+	free(slice_yield(&buf));
+	free_names(names);
+	reader_close(&rd);
+}
+
+void test_table_read_write_sequential(void)
+{
+	char **names;
+	struct slice buf = {};
+	int N = 50;
+	write_table(&names, &buf, N, 256);
+
+	struct block_source source = {};
+	block_source_from_slice(&source, &buf);
+
+	struct reader rd = {};
+	int err = init_reader(&rd, source, "file.ref");
+	assert(err == 0);
+
+	struct iterator it = {};
+	err = reader_seek_ref(&rd, &it, "");
+	assert(err == 0);
+
+	int j = 0;
+	while (true) {
+		struct ref_record ref = {};
+		int r = iterator_next_ref(it, &ref);
+		assert(r >= 0);
+		if (r > 0) {
+			break;
+		}
+		assert(0 == strcmp(names[j], ref.ref_name));
+		assert(update_index == ref.update_index);
+
+		j++;
+		ref_record_clear(&ref);
+	}
+	assert(j == N);
+	iterator_destroy(&it);
+	free(slice_yield(&buf));
+	free_names(names);
+
+	reader_close(&rd);
+}
+
+void test_table_write_small_table(void)
+{
+	char **names;
+	struct slice buf = {};
+	int N = 1;
+	write_table(&names, &buf, N, 4096);
+	assert(buf.len < 200);
+	free(slice_yield(&buf));
+	free_names(names);
+}
+
+void test_table_read_api(void)
+{
+	char **names;
+	struct slice buf = {};
+	int N = 50;
+	write_table(&names, &buf, N, 256);
+
+	struct reader rd = {};
+	struct block_source source = {};
+	block_source_from_slice(&source, &buf);
+
+	int err = init_reader(&rd, source, "file.ref");
+	assert(err == 0);
+
+	struct iterator it = {};
+	err = reader_seek_ref(&rd, &it, names[0]);
+	assert(err == 0);
+
+	struct log_record log = {};
+	err = iterator_next_log(it, &log);
+	assert(err == API_ERROR);
+
+	free(slice_yield(&buf));
+	int i = 0;
+	for (i = 0; i < N; i++) {
+		free(names[i]);
+	}
+	free(names);
+	reader_close(&rd);
+}
+
+void test_table_read_write_seek(bool index)
+{
+	char **names;
+	struct slice buf = {};
+	int N = 50;
+	write_table(&names, &buf, N, 256);
+
+	struct reader rd = {};
+	struct block_source source = {};
+	block_source_from_slice(&source, &buf);
+
+	int err = init_reader(&rd, source, "file.ref");
+	assert(err == 0);
+
+	if (!index) {
+		rd.ref_offsets.index_offset = 0;
+	}
+
+	int i = 0;
+	for (i = 1; i < N; i++) {
+		struct iterator it = {};
+		int err = reader_seek_ref(&rd, &it, names[i]);
+		assert(err == 0);
+		struct ref_record ref = {};
+		err = iterator_next_ref(it, &ref);
+		assert(err == 0);
+		assert(0 == strcmp(names[i], ref.ref_name));
+		assert(i == ref.value[0]);
+
+		ref_record_clear(&ref);
+		iterator_destroy(&it);
+	}
+
+	free(slice_yield(&buf));
+	for (i = 0; i < N; i++) {
+		free(names[i]);
+	}
+	free(names);
+	reader_close(&rd);
+}
+
+void test_table_read_write_seek_linear(void)
+{
+	test_table_read_write_seek(false);
+}
+
+void test_table_read_write_seek_index(void)
+{
+	test_table_read_write_seek(true);
+}
+
+void test_table_refs_for(bool indexed)
+{
+	int N = 50;
+
+	char **want_names = calloc(sizeof(char *), N + 1);
+
+	int want_names_len = 0;
+	byte want_hash[SHA1_SIZE];
+	set_test_hash(want_hash, 4);
+
+	struct write_options opts = {
+		.block_size = 256,
+	};
+
+	struct slice buf = {};
+	struct writer *w = new_writer(&slice_write_void, &buf, &opts);
+	{
+		struct ref_record ref = {};
+		int i = 0;
+		for (i = 0; i < N; i++) {
+			byte hash[SHA1_SIZE];
+			memset(hash, i, sizeof(hash));
+			char fill[51] = {};
+			memset(fill, 'x', 50);
+			char name[100];
+			/* Put the variable part in the start */
+			snprintf(name, sizeof(name), "br%02d%s", i, fill);
+			name[40] = 0;
+			ref.ref_name = name;
+
+			byte hash1[SHA1_SIZE];
+			byte hash2[SHA1_SIZE];
+
+			set_test_hash(hash1, i / 4);
+			set_test_hash(hash2, 3 + i / 4);
+			ref.value = hash1;
+			ref.target_value = hash2;
+
+			/* 80 bytes / entry, so 3 entries per block. Yields 17 */
+			/* blocks. */
+			int n = writer_add_ref(w, &ref);
+			assert(n == 0);
+
+			if (!memcmp(hash1, want_hash, SHA1_SIZE) ||
+			    !memcmp(hash2, want_hash, SHA1_SIZE)) {
+				want_names[want_names_len++] = strdup(name);
+			}
+		}
+	}
+
+	int n = writer_close(w);
+	assert(n == 0);
+
+	writer_free(w);
+	w = NULL;
+
+	struct reader rd;
+	struct block_source source = {};
+	block_source_from_slice(&source, &buf);
+
+	int err = init_reader(&rd, source, "file.ref");
+	assert(err == 0);
+	if (!indexed) {
+		rd.obj_offsets.present = 0;
+	}
+
+	struct iterator it = {};
+	err = reader_seek_ref(&rd, &it, "");
+	assert(err == 0);
+	iterator_destroy(&it);
+
+	err = reader_refs_for(&rd, &it, want_hash, SHA1_SIZE);
+	assert(err == 0);
+
+	struct ref_record ref = {};
+
+	int j = 0;
+	while (true) {
+		int err = iterator_next_ref(it, &ref);
+		assert(err >= 0);
+		if (err > 0) {
+			break;
+		}
+
+		assert(j < want_names_len);
+		assert(0 == strcmp(ref.ref_name, want_names[j]));
+		j++;
+		ref_record_clear(&ref);
+	}
+	assert(j == want_names_len);
+
+	free(slice_yield(&buf));
+	free_names(want_names);
+	iterator_destroy(&it);
+	reader_close(&rd);
+}
+
+void test_table_refs_for_no_index(void)
+{
+	test_table_refs_for(false);
+}
+
+void test_table_refs_for_obj_index(void)
+{
+	test_table_refs_for(true);
+}
+
+int main()
+{
+	add_test_case("test_log_write_read", test_log_write_read);
+	add_test_case("test_table_write_small_table",
+		      &test_table_write_small_table);
+	add_test_case("test_buffer", &test_buffer);
+	add_test_case("test_table_read_api", &test_table_read_api);
+	add_test_case("test_table_read_write_sequential",
+		      &test_table_read_write_sequential);
+	add_test_case("test_table_read_write_seek_linear",
+		      &test_table_read_write_seek_linear);
+	add_test_case("test_table_read_write_seek_index",
+		      &test_table_read_write_seek_index);
+	add_test_case("test_table_read_write_refs_for_no_index",
+		      &test_table_refs_for_no_index);
+	add_test_case("test_table_read_write_refs_for_obj_index",
+		      &test_table_refs_for_obj_index);
+	test_main();
+}
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 0000000000..efbe625253
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,199 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+byte *slice_yield(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = {};
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_write(struct slice *b, byte *data, int sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_write_void(void *b, byte *data, int sz)
+{
+	return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = calloc(size, 1);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct block_source *bs, struct slice *buf)
+{
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+struct block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 0000000000..f12a6db228
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+struct slice {
+	byte *buf;
+	int len;
+	int cap;
+};
+
+void slice_set_string(struct slice *dest, const char *);
+void slice_append_string(struct slice *dest, const char *);
+char *slice_to_string(struct slice src);
+const char *slice_as_string(struct slice *src);
+bool slice_equal(struct slice a, struct slice b);
+byte *slice_yield(struct slice *s);
+void slice_copy(struct slice *dest, struct slice src);
+void slice_resize(struct slice *s, int l);
+int slice_compare(struct slice a, struct slice b);
+int slice_write(struct slice *b, byte *data, int sz);
+int slice_write_void(void *b, byte *data, int sz);
+void slice_append(struct slice *dest, struct slice add);
+
+struct block_source;
+void block_source_from_slice(struct block_source *bs, struct slice *buf);
+
+struct block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/slice_test.c b/reftable/slice_test.c
new file mode 100644
index 0000000000..40a391383c
--- /dev/null
+++ b/reftable/slice_test.c
@@ -0,0 +1,38 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+
+void test_slice(void)
+{
+	struct slice s = {};
+	slice_set_string(&s, "abc");
+	assert(0 == strcmp("abc", slice_as_string(&s)));
+
+	struct slice t = {};
+	slice_set_string(&t, "pqr");
+
+	slice_append(&s, t);
+	assert(0 == strcmp("abcpqr", slice_as_string(&s)));
+
+	free(slice_yield(&s));
+	free(slice_yield(&t));
+}
+
+int main()
+{
+	add_test_case("test_slice", &test_slice);
+	test_main();
+}
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 0000000000..58b1656364
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,985 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "reftable.h"
+#include "writer.h"
+
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+	      struct write_options config)
+{
+	struct stack *p = calloc(sizeof(struct stack), 1);
+	int err = 0;
+	*dest = NULL;
+	p->list_file = strdup(list_file);
+	p->reftable_dir = strdup(dir);
+	p->config = config;
+
+	err = stack_reload(p);
+	if (err < 0) {
+		stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fread_lines(FILE *f, char ***namesp)
+{
+	long size = 0;
+	int err = fseek(f, 0, SEEK_END);
+	char *buf = NULL;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	size = ftell(f);
+	if (size < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = fseek(f, 0, SEEK_SET);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	buf = malloc(size + 1);
+	if (fread(buf, 1, size, f) != size) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+exit:
+	free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	FILE *f = fopen(filename, "r");
+	int err = 0;
+	if (f == NULL) {
+		if (errno == ENOENT) {
+			*namesp = calloc(sizeof(char *), 1);
+			return 0;
+		}
+
+		return IO_ERROR;
+	}
+	err = fread_lines(f, namesp);
+	fclose(f);
+	return err;
+}
+
+struct merged_table *stack_merged_table(struct stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void stack_destroy(struct stack *st)
+{
+	if (st->merged == NULL) {
+		return;
+	}
+
+	merged_table_close(st->merged);
+	merged_table_free(st->merged);
+	st->merged = NULL;
+
+	free(st->list_file);
+	st->list_file = NULL;
+	free(st->reftable_dir);
+	st->reftable_dir = NULL;
+	free(st);
+}
+
+static struct reader **stack_copy_readers(struct stack *st, int cur_len)
+{
+	struct reader **cur = calloc(sizeof(struct reader *), cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int stack_reload_once(struct stack *st, char **names, bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reader **new_tables =
+		malloc(sizeof(struct reader *) * names_len);
+	int new_tables_len = 0;
+	struct merged_table *new_merged = NULL;
+
+	struct slice table_path = {};
+
+	while (*names) {
+		struct reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct block_source src = {};
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_append_string(&table_path, "/");
+			slice_append_string(&table_path, name);
+
+			err = block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = new_reader(&rd, src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = new_merged_table(&new_merged, new_tables, new_tables_len);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		merged_table_free(st->merged);
+	}
+	st->merged = new_merged;
+
+	{
+		int i = 0;
+		for (i = 0; i < cur_len; i++) {
+			if (cur[i] != NULL) {
+				reader_close(cur[i]);
+				reader_free(cur[i]);
+			}
+		}
+	}
+exit:
+	free(slice_yield(&table_path));
+	{
+		int i = 0;
+		for (i = 0; i < new_tables_len; i++) {
+			reader_close(new_tables[i]);
+		}
+	}
+	free(new_tables);
+	free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open)
+{
+	struct timeval deadline = {};
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = {};
+		int err = gettimeofday(&now, NULL);
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		err = read_lines(st->list_file, &names_after);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return -1;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 100;
+		usleep(delay);
+	}
+
+	return 0;
+}
+
+int stack_reload(struct stack *st)
+{
+	return stack_reload_maybe_reuse(st, true);
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int stack_add(struct stack *st, int (*write)(struct writer *wr, void *arg),
+	      void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == LOCK_ERROR) {
+			err = stack_reload(st);
+		}
+		return err;
+	}
+
+	return stack_auto_compact(st);
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "%012lx-%012lx", min, max);
+	slice_set_string(dest, buf);
+}
+
+int stack_try_add(struct stack *st,
+		  int (*write_table)(struct writer *wr, void *arg), void *arg)
+{
+	struct slice lock_name = {};
+	struct slice temp_tab_name = {};
+	struct slice tab_name = {};
+	struct slice next_name = {};
+	struct slice table_list = {};
+	struct writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+	int lock_fd = 0;
+	uint64_t next_update_index = 0;
+
+	slice_set_string(&lock_name, st->list_file);
+	slice_append_string(&lock_name, ".lock");
+
+	lock_fd = open(slice_as_string(&lock_name), O_EXCL | O_CREAT | O_WRONLY,
+		       0644);
+	if (lock_fd < 0) {
+		if (errno == EEXIST) {
+			err = LOCK_ERROR;
+			goto exit;
+		}
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = LOCK_ERROR;
+		goto exit;
+	}
+
+	next_update_index = stack_next_update_index(st);
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, next_update_index, next_update_index);
+
+	slice_set_string(&temp_tab_name, st->reftable_dir);
+	slice_append_string(&temp_tab_name, "/");
+	slice_append(&temp_tab_name, next_name);
+	slice_append_string(&temp_tab_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_name));
+	if (tab_fd < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	wr = new_writer(fd_writer, &tab_fd, &st->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	if (wr->min_update_index < next_update_index) {
+		err = API_ERROR;
+		goto exit;
+	}
+
+	{
+		int i = 0;
+		for (i = 0; i < st->merged->stack_len; i++) {
+			slice_append_string(&table_list,
+					    st->merged->stack[i]->name);
+			slice_append_string(&table_list, "\n");
+		}
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_append_string(&next_name, ".ref");
+	slice_append(&table_list, next_name);
+	slice_append_string(&table_list, "\n");
+
+	slice_set_string(&tab_name, st->reftable_dir);
+	slice_append_string(&tab_name, "/");
+	slice_append(&tab_name, next_name);
+
+	err = rename(slice_as_string(&temp_tab_name),
+		     slice_as_string(&tab_name));
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	free(slice_yield(&temp_tab_name));
+
+	err = write(lock_fd, table_list.buf, table_list.len);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = close(lock_fd);
+	lock_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&tab_name));
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&tab_name));
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_reload(st);
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_name));
+	}
+	unlink(slice_as_string(&lock_name));
+
+	if (lock_fd > 0) {
+		close(lock_fd);
+		lock_fd = 0;
+	}
+
+	free(slice_yield(&lock_name));
+	free(slice_yield(&temp_tab_name));
+	free(slice_yield(&tab_name));
+	free(slice_yield(&next_name));
+	free(slice_yield(&table_list));
+	writer_free(wr);
+	return err;
+}
+
+uint64_t stack_next_update_index(struct stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reader_max_update_index(st->merged->stack[sz - 1]) + 1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct log_expiry_config *config)
+{
+	struct slice next_name = {};
+	int tab_fd = -1;
+	struct writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reader_min_update_index(st->merged->stack[first]),
+		    reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_append_string(temp_tab, "/");
+	slice_append(temp_tab, next_name);
+	slice_append_string(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = new_writer(fd_writer, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+	writer_free(wr);
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		free(slice_yield(temp_tab));
+	}
+	free(slice_yield(&next_name));
+	return err;
+}
+
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+			int last, struct log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reader **subtabs =
+		calloc(sizeof(struct reader *), last - first + 1);
+	struct merged_table *mt = NULL;
+	int err = 0;
+	struct iterator it = {};
+	struct ref_record ref = {};
+	struct log_record log = {};
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	writer_set_limits(wr, st->merged->stack[first]->min_update_index,
+			  st->merged->stack[last]->max_update_index);
+
+	err = new_merged_table(&mt, subtabs, subtabs_len);
+	if (err < 0) {
+		free(subtabs);
+		goto exit;
+	}
+
+	err = merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = iterator_next_ref(it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+	}
+
+	err = merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = iterator_next_log(it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		/* XXX collect stats? */
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+	}
+
+exit:
+	iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		merged_table_free(mt);
+	}
+	ref_record_clear(&ref);
+
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct stack *st, int first, int last,
+			       struct log_expiry_config *expiry)
+{
+	struct slice temp_tab_name = {};
+	struct slice new_table_name = {};
+	struct slice lock_file_name = {};
+	struct slice ref_list_contents = {};
+	struct slice new_table_path = {};
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **delete_on_success = calloc(sizeof(char *), compact_count + 1);
+	char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
+	int i = 0;
+	int j = 0;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_append_string(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_name = {};
+		struct slice subtab_lock = {};
+		slice_set_string(&subtab_name, st->reftable_dir);
+		slice_append_string(&subtab_name, "/");
+		slice_append_string(&subtab_name,
+				    reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_name);
+		slice_append_string(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				}
+				err = IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] = (char *)slice_as_string(&subtab_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_append_string(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_append_string(&new_table_path, "/");
+
+	slice_append(&new_table_path, new_table_name);
+
+	err = rename(slice_as_string(&temp_tab_name),
+		     slice_as_string(&new_table_path));
+	if (err < 0) {
+		goto exit;
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	slice_append(&ref_list_contents, new_table_name);
+	slice_append_string(&ref_list_contents, "\n");
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	for (char **p = delete_on_success; *p; p++) {
+		if (strcmp(*p, slice_as_string(&new_table_path))) {
+			unlink(*p);
+		}
+	}
+
+	err = stack_reload_maybe_reuse(st, first < last);
+exit:
+	for (char **p = subtable_locks; *p; p++) {
+		unlink(*p);
+	}
+	free_names(delete_on_success);
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	free(slice_yield(&new_table_name));
+	free(slice_yield(&new_table_path));
+	free(slice_yield(&ref_list_contents));
+	free(slice_yield(&temp_tab_name));
+	free(slice_yield(&lock_file_name));
+	return err;
+}
+
+int stack_compact_all(struct stack *st, struct log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct stack *st, int first, int last,
+				     struct log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	assert(sz > 0);
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = calloc(sizeof(struct segment), n);
+	int next = 0;
+	struct segment cur = {};
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct stack *st)
+{
+	uint64_t *sizes = calloc(sizeof(uint64_t), st->merged->stack_len);
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		/* overhead is 24 + 68 = 92. */
+		sizes[i] = st->merged->stack[i]->size - 91;
+	}
+	return sizes;
+}
+
+int stack_auto_compact(struct stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct compaction_stats *stack_compaction_stats(struct stack *st)
+{
+	return &st->stats;
+}
+
+int stack_read_ref(struct stack *st, const char *refname,
+		   struct ref_record *ref)
+{
+	struct iterator it = {};
+	struct merged_table *mt = stack_merged_table(st);
+	int err = merged_table_seek_ref(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = iterator_next_ref(it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, refname) || ref_record_is_deletion(ref)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	iterator_destroy(&it);
+	return err;
+}
+
+int stack_read_log(struct stack *st, const char *refname,
+		   struct log_record *log)
+{
+	struct iterator it = {};
+	struct merged_table *mt = stack_merged_table(st);
+	int err = merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = iterator_next_log(it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) || log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 0000000000..d5e2c93c29
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,40 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+
+struct stack {
+	char *list_file;
+	char *reftable_dir;
+
+	struct write_options config;
+
+	struct merged_table *merged;
+	struct compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct stack *st,
+		  int (*write_table)(struct writer *wr, void *arg), void *arg);
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+			int last, struct log_expiry_config *config);
+int fastlog2(uint64_t sz);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
new file mode 100644
index 0000000000..d80cb1789a
--- /dev/null
+++ b/reftable/stack_test.c
@@ -0,0 +1,281 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+
+void test_read_file(void)
+{
+	char fn[256] = "/tmp/stack.test_read_file.XXXXXX";
+	int fd = mkstemp(fn);
+	assert(fd > 0);
+
+	char out[1024] = "line1\n\nline2\nline3";
+
+	int n = write(fd, out, strlen(out));
+	assert(n == strlen(out));
+	int err = close(fd);
+	assert(err >= 0);
+
+	char **names = NULL;
+	err = read_lines(fn, &names);
+	assert_err(err);
+
+	char *want[] = { "line1", "line2", "line3" };
+	int i = 0;
+	for (i = 0; names[i] != NULL; i++) {
+		assert(0 == strcmp(want[i], names[i]));
+	}
+	free_names(names);
+	remove(fn);
+}
+
+void test_parse_names(void)
+{
+	char buf[] = "line\n";
+	char **names = NULL;
+	parse_names(buf, strlen(buf), &names);
+
+	assert(NULL != names[0]);
+	assert(0 == strcmp(names[0], "line"));
+	assert(NULL == names[1]);
+	free_names(names);
+}
+
+void test_names_equal(void)
+{
+	char *a[] = { "a", "b", "c", NULL };
+	char *b[] = { "a", "b", "d", NULL };
+	char *c[] = { "a", "b", NULL };
+
+	assert(names_equal(a, a));
+	assert(!names_equal(a, b));
+	assert(!names_equal(a, c));
+}
+
+int write_test_ref(struct writer *wr, void *arg)
+{
+	struct ref_record *ref = arg;
+
+	writer_set_limits(wr, ref->update_index, ref->update_index);
+	int err = writer_add_ref(wr, ref);
+
+	return err;
+}
+
+int write_test_log(struct writer *wr, void *arg)
+{
+	struct log_record *log = arg;
+
+	writer_set_limits(wr, log->update_index, log->update_index);
+	int err = writer_add_log(wr, log);
+
+	return err;
+}
+
+void test_stack_add(void)
+{
+	int i = 0;
+	char dir[256] = "/tmp/stack.test_stack_add.XXXXXX";
+	assert(mkdtemp(dir));
+	printf("%s\n", dir);
+	char fn[256] = "";
+	strcat(fn, dir);
+	strcat(fn, "/refs");
+
+	struct write_options cfg = {};
+	struct stack *st = NULL;
+	int err = new_stack(&st, dir, fn, cfg);
+	assert_err(err);
+
+	struct ref_record refs[2] = {};
+	struct log_record logs[2] = {};
+	int N = ARRAYSIZE(refs);
+	for (i = 0; i < N; i++) {
+		char buf[256];
+		snprintf(buf, sizeof(buf), "branch%02d", i);
+		refs[i].ref_name = strdup(buf);
+		refs[i].value = malloc(SHA1_SIZE);
+		refs[i].update_index = i + 1;
+		set_test_hash(refs[i].value, i);
+
+		logs[i].ref_name = strdup(buf);
+		logs[i].update_index = N + i + 1;
+		logs[i].new_hash = malloc(SHA1_SIZE);
+		logs[i].email = strdup("identity@invalid");
+		set_test_hash(logs[i].new_hash, i);
+	}
+
+	for (i = 0; i < N; i++) {
+		int err = stack_add(st, &write_test_ref, &refs[i]);
+		assert_err(err);
+	}
+
+	for (i = 0; i < N; i++) {
+		int err = stack_add(st, &write_test_log, &logs[i]);
+		assert_err(err);
+	}
+
+	err = stack_compact_all(st, NULL);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		struct ref_record dest = {};
+		int err = stack_read_ref(st, refs[i].ref_name, &dest);
+		assert_err(err);
+		assert(ref_record_equal(&dest, refs + i, SHA1_SIZE));
+		ref_record_clear(&dest);
+	}
+
+	for (i = 0; i < N; i++) {
+		struct log_record dest = {};
+		int err = stack_read_log(st, refs[i].ref_name, &dest);
+		assert_err(err);
+		assert(log_record_equal(&dest, logs + i, SHA1_SIZE));
+		log_record_clear(&dest);
+	}
+
+	/* cleanup */
+	stack_destroy(st);
+	for (i = 0; i < N; i++) {
+		ref_record_clear(&refs[i]);
+		log_record_clear(&logs[i]);
+	}
+}
+
+void test_log2(void)
+{
+	assert(1 == fastlog2(3));
+	assert(2 == fastlog2(4));
+	assert(2 == fastlog2(5));
+}
+
+void test_sizes_to_segments(void)
+{
+	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
+	/* .................0  1  2  3  4  5 */
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAYSIZE(sizes));
+	assert(segs[2].log == 3);
+	assert(segs[2].start == 5);
+	assert(segs[2].end == 6);
+
+	assert(segs[1].log == 2);
+	assert(segs[1].start == 2);
+	assert(segs[1].end == 5);
+	free(segs);
+}
+
+void test_suggest_compaction_segment(void)
+{
+	{
+		uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
+		/* .................0    1    2  3   4  5  6 */
+		struct segment min =
+			suggest_compaction_segment(sizes, ARRAYSIZE(sizes));
+		assert(min.start == 2);
+		assert(min.end == 7);
+	}
+
+	{
+		uint64_t sizes[] = { 64, 32, 16, 8, 4, 2 };
+		struct segment result =
+			suggest_compaction_segment(sizes, ARRAYSIZE(sizes));
+		assert(result.start == result.end);
+	}
+}
+
+void test_reflog_expire(void)
+{
+	char dir[256] = "/tmp/stack.test_reflog_expire.XXXXXX";
+	assert(mkdtemp(dir));
+	printf("%s\n", dir);
+	char fn[256] = "";
+	strcat(fn, dir);
+	strcat(fn, "/refs");
+
+	struct write_options cfg = {};
+	struct stack *st = NULL;
+	int err = new_stack(&st, dir, fn, cfg);
+	assert_err(err);
+
+	struct log_record logs[20] = {};
+	int N = ARRAYSIZE(logs) - 1;
+	int i = 0;
+	for (i = 1; i <= N; i++) {
+		char buf[256];
+		snprintf(buf, sizeof(buf), "branch%02d", i);
+
+		logs[i].ref_name = strdup(buf);
+		logs[i].update_index = i;
+		logs[i].time = i;
+		logs[i].new_hash = malloc(SHA1_SIZE);
+		logs[i].email = strdup("identity@invalid");
+		set_test_hash(logs[i].new_hash, i);
+	}
+
+	for (i = 1; i <= N; i++) {
+		int err = stack_add(st, &write_test_log, &logs[i]);
+		assert_err(err);
+	}
+
+	err = stack_compact_all(st, NULL);
+	assert_err(err);
+
+	struct log_expiry_config expiry = {
+		.time = 10,
+	};
+	err = stack_compact_all(st, &expiry);
+	assert_err(err);
+
+	struct log_record log = {};
+	err = stack_read_log(st, logs[9].ref_name, &log);
+	assert(err == 1);
+
+	err = stack_read_log(st, logs[11].ref_name, &log);
+	assert_err(err);
+
+	expiry.min_update_index = 15;
+	err = stack_compact_all(st, &expiry);
+	assert_err(err);
+
+	err = stack_read_log(st, logs[14].ref_name, &log);
+	assert(err == 1);
+
+	err = stack_read_log(st, logs[16].ref_name, &log);
+	assert_err(err);
+
+	/* cleanup */
+	stack_destroy(st);
+	for (i = 0; i < N; i++) {
+		log_record_clear(&logs[i]);
+	}
+}
+
+int main()
+{
+	add_test_case("test_reflog_expire", test_reflog_expire);
+	add_test_case("test_suggest_compaction_segment",
+		      &test_suggest_compaction_segment);
+	add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
+	add_test_case("test_log2", &test_log2);
+	add_test_case("test_parse_names", &test_parse_names);
+	add_test_case("test_read_file", &test_read_file);
+	add_test_case("test_names_equal", &test_names_equal);
+	add_test_case("test_stack_add", &test_stack_add);
+	test_main();
+}
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 0000000000..97214ae210
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,36 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#include "config.h"
+
+#ifndef REFTABLE_STANDALONE
+#include "git-compat-util.h"
+#else
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#define PRIuMAX "lu"
+#define PRIdMAX "ld"
+#define PRIxMAX "lx"
+
+#endif
+
+#endif
diff --git a/reftable/test_framework.c b/reftable/test_framework.c
new file mode 100644
index 0000000000..b9f633934a
--- /dev/null
+++ b/reftable/test_framework.c
@@ -0,0 +1,67 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "test_framework.h"
+
+#include "system.h"
+
+#include "constants.h"
+
+struct test_case **test_cases;
+int test_case_len;
+int test_case_cap;
+
+struct test_case *new_test_case(const char *name, void (*testfunc)())
+{
+	struct test_case *tc = malloc(sizeof(struct test_case));
+	tc->name = name;
+	tc->testfunc = testfunc;
+	return tc;
+}
+
+struct test_case *add_test_case(const char *name, void (*testfunc)())
+{
+	struct test_case *tc = new_test_case(name, testfunc);
+	if (test_case_len == test_case_cap) {
+		test_case_cap = 2 * test_case_cap + 1;
+		test_cases = realloc(test_cases,
+				     sizeof(struct test_case) * test_case_cap);
+	}
+
+	test_cases[test_case_len++] = tc;
+	return tc;
+}
+
+void test_main()
+{
+	int i = 0;
+	for (i = 0; i < test_case_len; i++) {
+		printf("case %s\n", test_cases[i]->name);
+		test_cases[i]->testfunc();
+	}
+}
+
+void set_test_hash(byte *p, int i)
+{
+	memset(p, (byte)i, SHA1_SIZE);
+}
+
+void print_names(char **a)
+{
+	if (a == NULL || *a == NULL) {
+		puts("[]");
+		return;
+	}
+	puts("[");
+	char **p = a;
+	while (*p) {
+		puts(*p);
+		p++;
+	}
+	puts("]");
+}
diff --git a/reftable/test_framework.h b/reftable/test_framework.h
new file mode 100644
index 0000000000..d3e793c3a6
--- /dev/null
+++ b/reftable/test_framework.h
@@ -0,0 +1,64 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TEST_FRAMEWORK_H
+#define TEST_FRAMEWORK_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#ifdef NDEBUG
+#undef NDEBUG
+#endif
+
+#include "system.h"
+
+#ifdef assert
+#undef assert
+#endif
+
+#define assert_err(c)                                                 \
+	if (c != 0) {                                                 \
+		fflush(stderr);                                       \
+		fflush(stdout);                                       \
+		fprintf(stderr, "%s: %d: error == %d (%s), want 0\n", \
+			__FILE__, __LINE__, c, error_str(c));         \
+		abort();                                              \
+	}
+
+#define assert_streq(a, b)                                               \
+	if (strcmp(a, b)) {                                              \
+		fflush(stderr);                                          \
+		fflush(stdout);                                          \
+		fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, \
+			__LINE__, #a, a, #b, b);                         \
+		abort();                                                 \
+	}
+
+#define assert(c)                                                          \
+	if (!(c)) {                                                        \
+		fflush(stderr);                                            \
+		fflush(stdout);                                            \
+		fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, \
+			__LINE__, #c);                                     \
+		abort();                                                   \
+	}
+
+struct test_case {
+	const char *name;
+	void (*testfunc)();
+};
+
+struct test_case *new_test_case(const char *name, void (*testfunc)());
+struct test_case *add_test_case(const char *name, void (*testfunc)());
+void test_main();
+
+void set_test_hash(byte *p, int i);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 0000000000..9bf7fe531f
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,66 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				calloc(sizeof(struct tree_node), 1);
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	{
+		int res = compare(key, (*rootp)->key);
+		if (res < 0) {
+			return tree_search(key, &(*rootp)->left, compare,
+					   insert);
+		} else if (res > 0) {
+			return tree_search(key, &(*rootp)->right, compare,
+					   insert);
+		}
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 0000000000..86a71715ae
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,24 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/tree_test.c b/reftable/tree_test.c
new file mode 100644
index 0000000000..842985376c
--- /dev/null
+++ b/reftable/tree_test.c
@@ -0,0 +1,61 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+
+static int test_compare(const void *a, const void *b)
+{
+	return a - b;
+}
+
+struct curry {
+	void *last;
+};
+
+void check_increasing(void *arg, void *key)
+{
+	struct curry *c = (struct curry *)arg;
+	if (c->last != NULL) {
+		assert(test_compare(c->last, key) < 0);
+	}
+	c->last = key;
+}
+
+void test_tree()
+{
+	struct tree_node *root = NULL;
+
+	void *values[11] = {};
+	struct tree_node *nodes[11] = {};
+	int i = 1;
+	do {
+		nodes[i] = tree_search(values + i, &root, &test_compare, 1);
+		i = (i * 7) % 11;
+	} while (i != 1);
+
+	for (i = 1; i < ARRAYSIZE(nodes); i++) {
+		assert(values + i == nodes[i]->key);
+		assert(nodes[i] ==
+		       tree_search(values + i, &root, &test_compare, 0));
+	}
+
+	struct curry c = {};
+	infix_walk(root, check_increasing, &c);
+	tree_free(root);
+}
+
+int main()
+{
+	add_test_case("test_tree", &test_tree);
+	test_main();
+}
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 0000000000..cb8e4bbf89
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,624 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct block_stats *writer_block_stats(struct writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct writer *w, byte *data, size_t len, int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = calloc(w->pending_padding, 1);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_write_header(struct writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+	dest[4] = 1; /* version */
+	put_u24(dest + 5, w->opts.block_size);
+	put_u64(dest + 8, w->min_update_index);
+	put_u64(dest + 16, w->max_update_index);
+	return 24;
+}
+
+static void writer_reinit_block_writer(struct writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = HEADER_SIZE;
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start, w->hash_size);
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
+			  void *writer_arg, struct write_options *opts)
+{
+	struct writer *wp = calloc(sizeof(struct writer), 1);
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->hash_size = SHA1_SIZE;
+	wp->block = calloc(opts->block_size, 1);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void writer_free(struct writer *w)
+{
+	free(w->block);
+	free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+			     ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = calloc(sizeof(struct obj_index_tree_node), 1);
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = realloc(key->offsets,
+				       sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct writer *w, struct record rec)
+{
+	int result = -1;
+	struct slice key = {};
+	int err = 0;
+	record_key(rec, &key);
+	if (slice_compare(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	free(slice_yield(&key));
+	return result;
+}
+
+int writer_add_ref(struct writer *w, struct ref_record *ref)
+{
+	struct record rec = {};
+	struct ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return API_ERROR;
+	}
+
+	record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = w->hash_size,
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = w->hash_size,
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	qsort(refs, n, sizeof(struct ref_record), ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int writer_add_log(struct writer *w, struct log_record *log)
+{
+	if (log->ref_name == NULL) {
+		return API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	{
+		struct record rec = {};
+		int err;
+		record_from_log(&rec, log);
+		err = writer_add_record(w, rec);
+		return err;
+	}
+}
+
+int writer_add_logs(struct writer *w, struct log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	qsort(logs, n, sizeof(struct log_record), log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct record rec = {};
+			record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, rec);
+			assert(err == 0);
+		}
+		for (i = 0; i < idx_len; i++) {
+			free(slice_yield(&idx[i].last_key));
+		}
+		free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct block_stats *bstats = writer_block_stats(w, typ);
+		bstats->index_blocks =
+			w->stats.idx_stats.blocks - before_blocks;
+		bstats->index_offset = index_start;
+		bstats->max_index_level = max_level;
+	}
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct record rec = {};
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	free(entry->offsets);
+	entry->offsets = NULL;
+	free(slice_yield(&entry->hash));
+	free(entry);
+}
+
+static int writer_dump_object_index(struct writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = {};
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_close(struct writer *w)
+{
+	byte footer[68];
+	byte *p = footer;
+
+	writer_finish_public_section(w);
+
+	writer_write_header(w, footer);
+	p += 24;
+	put_u64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_u64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_u64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_u64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_u64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_u32(p, crc32(0, footer, p - footer));
+	p += 4;
+	w->pending_padding = 0;
+
+	{
+		int n = padded_write(w, footer, sizeof(footer), 0);
+		if (n < 0) {
+			return n;
+		}
+	}
+
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	free(slice_yield(&w->last_key));
+	return 0;
+}
+
+void writer_clear_index(struct writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		free(slice_yield(&w->index[i].last_key));
+	}
+
+	free(w->index);
+	w->index = NULL;
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct block_stats *bstats = writer_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIuMAX " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_u24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = realloc(w->index,
+				   sizeof(struct index_record) * w->index_cap);
+	}
+
+	{
+		struct index_record ir = {
+			.offset = w->next,
+		};
+		slice_copy(&ir.last_key, w->block_writer->last_key);
+		w->index[w->index_len] = ir;
+	}
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	block_writer_reset(&w->block_writer_data);
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+struct stats *writer_stats(struct writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 0000000000..bd2386474b
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,46 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct writer {
+	int (*write)(void *, byte *, int);
+	void *write_arg;
+	int pending_padding;
+	int hash_size;
+	struct slice last_key;
+
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct write_options opts;
+
+	byte *block;
+	struct block_writer *block_writer;
+	struct block_writer block_writer_data;
+	struct index_record *index;
+	int index_len;
+	int index_cap;
+
+	/* tree for use with tsearch */
+	struct tree_node *obj_index_tree;
+
+	struct stats stats;
+};
+
+int writer_flush_block(struct writer *w);
+void writer_clear_index(struct writer *w);
+int writer_finish_public_section(struct writer *w);
+
+#endif
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v2 5/5] Reftable support for git-core
  2020-01-27 14:22 ` [PATCH v2 " Han-Wen Nienhuys via GitGitGadget
                     ` (3 preceding siblings ...)
  2020-01-27 14:22   ` [PATCH v2 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-01-27 14:22   ` Han-Wen Nienhuys via GitGitGadget
  2020-01-28  7:31     ` Jeff King
  2020-02-04 20:27   ` [PATCH v3 0/6] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  5 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-01-27 14:22 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

TODO:

 * Resolve the design problem with reflog expiry.
 * Resolve spots marked with XXX

Example use:

  $ ~/vc/git/git init
  warning: templates not found in /usr/local/google/home/hanwen/share/git-core/templates
  Initialized empty Git repository in /tmp/qz/.git/
  $ echo q > a
  $ ~/vc/git/git add a
  $ ~/vc/git/git commit -mx
  fatal: not a git repository (or any of the parent directories): .git
  [master (root-commit) 373d969] x
   1 file changed, 1 insertion(+)
   create mode 100644 a
  $ ~/vc/git/git show-ref
  373d96972fca9b63595740bba3898a762778ba20 HEAD
  373d96972fca9b63595740bba3898a762778ba20 refs/heads/master
  $ ls -l .git/reftable/
  total 12
  -rw------- 1 hanwen primarygroup  126 Jan 23 20:08 000000000001-000000000001.ref
  -rw------- 1 hanwen primarygroup 4277 Jan 23 20:08 000000000002-000000000002.ref
  $ go run ~/vc/reftable/cmd/dump.go  -table /tmp/qz/.git/reftable/000000000002-000000000002.ref
  ** DEBUG **
  name /tmp/qz/.git/reftable/000000000002-000000000002.ref, sz 4209: 'r' reftable.readerOffsets{Present:true, Offset:0x0, IndexOffset:0x0}, 'o' reftable.readerOffsets{Present:false, Offset:0x0, IndexOffset:0x0} 'g' reftable.readerOffsets{Present:true, Offset:0x1000, IndexOffset:0x0}
  ** REFS **
  reftable.RefRecord{RefName:"refs/heads/master", UpdateIndex:0x2, Value:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, TargetValue:[]uint8(nil), Target:""}
  ** LOGS **
  reftable.LogRecord{RefName:"HEAD", UpdateIndex:0x2, New:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, Old:[]uint8{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Name:"Han-Wen Nienhuys", Email:"hanwen@google.com", Time:0x5e29ef27, TZOffset:100, Message:"commit (initial): x\n"}

Change-Id: I225ee6317b7911edf9aa95f43299f6c7c4511914
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile                |  23 +-
 refs.c                  |  18 +-
 refs.h                  |   2 +
 refs/refs-internal.h    |   1 +
 refs/reftable-backend.c | 812 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 852 insertions(+), 4 deletions(-)
 create mode 100644 refs/reftable-backend.c

diff --git a/Makefile b/Makefile
index 09f98b777c..4f0bcbfbcb 100644
--- a/Makefile
+++ b/Makefile
@@ -813,6 +813,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += command-list.h
 
@@ -958,6 +959,7 @@ LIB_OBJS += rebase-interactive.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1162,7 +1164,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2343,11 +2345,27 @@ VCSSVN_OBJS += vcs-svn/fast_export.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/bytes.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2484,6 +2502,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
diff --git a/refs.c b/refs.c
index 1ab0bb54d3..26d9637e55 100644
--- a/refs.c
+++ b/refs.c
@@ -20,7 +20,7 @@
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1839,10 +1839,22 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
 static struct ref_store *ref_store_init(const char *gitdir,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct strbuf refs_path = STRBUF_INIT;
+
+        /* XXX this should probably come from a git config setting and not
+           default to reftable. */
+	const char *be_name = "reftable";
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	strbuf_addstr(&refs_path, gitdir);
+	strbuf_addstr(&refs_path, "/refs");
+
+	if (!is_directory(refs_path.buf)) {
+		be_name = "reftable";
+	}
+	strbuf_release(&refs_path);
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
diff --git a/refs.h b/refs.h
index 545029c6d8..99656d2065 100644
--- a/refs.h
+++ b/refs.h
@@ -456,6 +456,8 @@ int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+/* XXX which ordering are these? Newest or oldest first? */
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index fc18b12340..2744aa57f2 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -659,6 +659,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 0000000000..c89884af0c
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,812 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *reftable_dir;
+	char *table_list_file;
+	struct stack *stack;
+};
+
+static void clear_log_record(struct log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	log_record_clear(log);
+}
+
+static void fill_log_record(struct log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = {};
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *reftable_ref_store_create(const char *path,
+						   unsigned int store_flags)
+{
+	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct write_options cfg = {
+		.block_size = 4096,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+
+	strbuf_addf(&sb, "%s/refs", path);
+	refs->table_list_file = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_release(&sb);
+
+	refs->err = new_stack(&refs->stack, refs->reftable_dir,
+			      refs->table_list_file, cfg);
+
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	FILE *f = fopen(refs->table_list_file, "a");
+	if (f == NULL) {
+		return -1;
+	}
+	fclose(f);
+
+	safe_create_dir(refs->reftable_dir, 1);
+	return 0;
+}
+
+struct reftable_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	while (1) {
+		struct reftable_iterator *ri =
+			(struct reftable_iterator *)ref_iterator;
+		int err = iterator_next_ref(ri->iter, &ri->ref);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			return ITER_DONE;
+		}
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			memcpy(ri->oid.hash, ri->ref.value, GIT_SHA1_RAWSZ);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->base.flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+		ri->base.oid = &ri->oid;
+		return ITER_OK;
+	}
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		memcpy(peeled->hash, ri->ref.target_value, GIT_SHA1_RAWSZ);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	ref_record_clear(&ri->ref);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct merged_table *mt = NULL;
+	int err = 0;
+	if (refs->err) {
+		/* how to propagate errors? */
+		return NULL;
+	}
+
+	mt = stack_merged_table(refs->stack);
+
+	/* XXX something with flags? */
+	err = merged_table_seek_ref(mt, &ri->iter, prefix);
+	/* XXX what to do with err? */
+	assert(err == 0);
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->base.oid = &ri->oid;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	(void)refs;
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct ref_update *)a)->refname,
+		      ((struct ref_update *)b)->refname);
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid = {};
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int write_transaction_table(struct writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = stack_next_update_index(refs->stack);
+	int err = 0;
+	/* XXX - are we allowed to mutate the input data? */
+	qsort(transaction->updates, transaction->nr,
+	      sizeof(struct ref_update *), ref_update_cmp);
+	writer_set_limits(writer, ts, ts);
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		if (u->flags & REF_HAVE_NEW) {
+			struct object_id out_oid = {};
+			int out_flags = 0;
+			/* XXX who owns the memory here? */
+			const char *resolved = refs_resolve_ref_unsafe(
+				transaction->ref_store, u->refname, 0, &out_oid,
+				&out_flags);
+			struct ref_record ref = {};
+			ref.ref_name =
+				(char *)(resolved ? resolved : u->refname);
+			ref.value = u->new_oid.hash;
+			ref.update_index = ts;
+			err = writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		struct log_record log = {};
+		fill_log_record(&log);
+
+		log.ref_name = (char *)u->refname;
+		log.old_hash = u->old_oid.hash;
+		log.new_hash = u->new_oid.hash;
+		log.update_index = ts;
+		log.message = u->msg;
+
+		err = writer_add_log(writer, &log);
+		clear_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+exit:
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	int err = stack_add(refs->stack, &write_transaction_table, transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure %s",
+			    error_str(err));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *err)
+{
+	return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+struct write_delete_refs_arg {
+	struct stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	int err = 0;
+
+	writer_set_limits(writer, ts, ts);
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct log_record log = {};
+		fill_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+
+		/* XXX should lookup old oid. */
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		err = writer_add_log(writer, &log);
+		clear_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	return stack_add(refs->stack, &write_delete_refs_table, &arg);
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	/* XXX reflog expiry. */
+	return stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct stack *stack;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = stack_next_update_index(create->stack);
+	int err = 0;
+
+	struct ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	writer_set_limits(writer, ts, ts);
+	err = writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	/* XXX reflog? */
+
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .stack = refs->stack,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	return stack_add(refs->stack, &write_create_symref_table, &arg);
+}
+
+struct write_rename_arg {
+	struct stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	struct ref_record ref = {};
+	int err = stack_read_ref(arg->stack, arg->oldname, &ref);
+	if (err) {
+		goto exit;
+	}
+
+	/* XXX should check that dest doesn't exist? */
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct ref_record todo[2] = {};
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct log_record todo[2] = {};
+		fill_log_record(&todo[0]);
+		fill_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = writer_add_logs(writer, todo, 2);
+
+		clear_log_record(&todo[0]);
+		clear_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	return stack_add(refs->stack, &write_rename_table, &arg);
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = iterator_next_log(ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		/* XXX const? */
+		memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	log_record_clear(&ri->log);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return empty_ref_iterator_begin();
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct iterator it = {};
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &it, refname);
+	struct log_record log = {};
+
+	while (err == 0) {
+		err = iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid = {};
+			struct object_id new_oid = {};
+
+			memcpy(&old_oid.hash, log.old_hash, GIT_SHA1_RAWSZ);
+			memcpy(&new_oid.hash, log.new_hash, GIT_SHA1_RAWSZ);
+
+			/* XXX committer = email? name? */
+			if (fn(&old_oid, &new_oid, log.name, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	log_record_clear(&log);
+	iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct iterator it = {};
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &it, refname);
+
+	struct log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+
+	printf("oldest first\n");
+	while (err == 0) {
+		struct log_record log = {};
+		err = iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (int i = len; i--;) {
+		struct log_record *log = &logs[i];
+		struct object_id old_oid = {};
+		struct object_id new_oid = {};
+
+		memcpy(&old_oid.hash, log->old_hash, GIT_SHA1_RAWSZ);
+		memcpy(&new_oid.hash, log->new_hash, GIT_SHA1_RAWSZ);
+
+		/* XXX committer = email? name? */
+		if (!fn(&old_oid, &new_oid, log->name, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (int i = 0; i < len; i++) {
+		log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  XXX
+
+	  This doesn't fit with the reftable API. If we are expiring for space
+	  reasons, the expiry should be combined with a compaction, and there
+	  should be a policy that can be called for all refnames, not for a
+	  single ref name.
+
+	  If this is for cleaning up individual entries, we'll have to write
+	  extra data to create tombstones.
+	 */
+	return 0;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct ref_record ref = {};
+	int err = stack_read_ref(refs->stack, refname, &ref);
+	if (err) {
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else {
+		memcpy(oid->hash, ref.value, GIT_SHA1_RAWSZ);
+	}
+
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 2/5] create .git/refs in files-backend.c
  2020-01-27 14:22   ` [PATCH v2 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-01-27 22:28     ` Junio C Hamano
  2020-01-28 15:58       ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-01-27 22:28 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
> Subject: Re: [PATCH v2 2/5] create .git/refs in files-backend.c
>
> This prepares for supporting the reftable format, which creates a file
> in that place.

The idea is sound, I think.  We want to let each backend to be
responsible for creating and maintaining what is at .git/refs on the
filesystem.

> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>


> Change-Id: I2fc47c89f5ec605734007ceff90321c02474aa92

Do we need to keep this, which is pretty much private name for the
patch that is not valid for most of the people on the list?

> ---
>  builtin/init-db.c    | 2 --
>  refs/files-backend.c | 4 ++++
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index 944ec77fe1..45bdea0589 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
>  	 * We need to create a "refs" dir in any case so that older
>  	 * versions of git can tell that this is a repository.
>  	 */
> -	safe_create_dir(git_path("refs"), 1);
> -	adjust_shared_perm(git_path("refs"));
>  
>  	if (refs_init_db(&err))
>  		die("failed to set up refs db: %s", err.buf);
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 0ea66a28b6..f49b6f2ab6 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3158,6 +3158,10 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
>  		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
>  	struct strbuf sb = STRBUF_INIT;
>  
> +	files_ref_path(refs, &sb, "refs");
> +	safe_create_dir(sb.buf, 1);
> + 	// XXX adjust_shared_perm ?

I am not sure what's there to wonder about with the question mark.

If this step is meant to be a preparation before we actually allow
a different backend to be used, shouldn't the updated and prepared
code behave identically in externally visible ways?

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 3/5] Document how ref iterators and symrefs interact
  2020-01-27 14:22   ` [PATCH v2 3/5] Document how ref iterators and symrefs interact Han-Wen Nienhuys via GitGitGadget
@ 2020-01-27 22:53     ` Junio C Hamano
  2020-01-28 16:07       ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-01-27 22:53 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
> Subject: Re: [PATCH v2 3/5] Document how ref iterators and symrefs interact

"Subject: refs: document how ...", perhaps?

Also isn't it more like iterators and symrefs do not interact in any
unexpected way, is it?  iterators while enumerating refs when they
see a symref, they do not dereference and give the underlying ref
the symref is pointing at---the underlying ref will be listed when
it comes its turn to be shown as an ordinary ref.  I am not sure
what is there to single out and document...

> Change-Id: Ie3ee63c52254c000ef712986246ca28f312b4301
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  refs/refs-internal.h | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index ff2436c0fb..fc18b12340 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -269,6 +269,9 @@ int refs_rename_ref_available(struct ref_store *refs,
>   * to the next entry, ref_iterator_advance() aborts the iteration,
>   * frees the ref_iterator, and returns ITER_ERROR.
>   *
> + * Ref iterators cannot return symref targets, so symbolic refs must be
> + * dereferenced during the iteration.

What this says is not techincally incorrect.  Iterators do not give
each_ref_fn what underlying ref a symref is pointing at.  But it
also is misleading.  If your callback wants to know what object each
ref is pointing at do not need to do anything extra when it sees a
symref, as name of the object pointed at by the underlying ref is
given to it.  Only callbacks that wants to know the other ref, not
the value of the other ref, needs to dereference when called with a
symref.

But I wonder if we need to even say this.  Isn't it obvious from the
each_ref_fn API that nothing other than the refname, object id, and
what kind of ref it is, will be given to the user of the API, so it
would be natural for a caller that wants to do extra things it needs
to do for symrefs must act when it learns a ref is a symref?  After
all, that is why the flags word is one of the parameters given to an
each_ref_fn.


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-27 14:22   ` [PATCH v2 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-01-28  7:31     ` Jeff King
  2020-01-28 15:36       ` Martin Fick
                         ` (2 more replies)
  0 siblings, 3 replies; 409+ messages in thread
From: Jeff King @ 2020-01-28  7:31 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

On Mon, Jan 27, 2020 at 02:22:24PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:

> [...]

I'll try to add my 2 cents to all of the XXX spots you flagged.

> @@ -1839,10 +1839,22 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
>  static struct ref_store *ref_store_init(const char *gitdir,
>  					unsigned int flags)
>  {
> -	const char *be_name = "files";
> -	struct ref_storage_be *be = find_ref_storage_backend(be_name);
> +	struct strbuf refs_path = STRBUF_INIT;
> +
> +        /* XXX this should probably come from a git config setting and not
> +           default to reftable. */
> +	const char *be_name = "reftable";

I think the scheme here needs to be something like:

  - "struct repository" gets a new "ref_format" field

  - setup.c:check_repo_format() learns about an extensions.refFormat
    config key, which we use to set repo->ref_format

  - init/clone should take a command-line option for the ref format of
    the new repository. Anybody choosing reftables would want to set
    core.repositoryformatversion to "1" and set the extensions.refFormat
    key.

  - likewise, there should be a config setting to choose the default ref
    format. It would obviously stay at "files" for now, but we'd
    eventually perhaps flip the default to "reftable".

Some thoughts on compatibility:

The idea of the config changes is that older versions of Git will either
complain about repositoryformatversion (if very old), or will complain
that they don't know about extensions.refFormat. But the changes you
made in patch 1 indicate that existing versions of Git won't consider it
to be a Git repository at all!

I think this is slightly non-ideal, in that we'd continue walking up the
tree looking for a Git repo. And either:

  - we'd find one, which would be confusing and possibly wrong

  - we wouldn't, in which case the error would be something like "no git
    repository", and not "your git repository is too new"

So it would be really nice if we continued to have a HEAD file (just
with some sentinel value in it) and a refs/ directory, so that existing
versions of Git realize "there's a repository here, but it's too new for
me".

There's a slight downside in that tools which _aren't_ careful about
repositoryformatversion might try to operate on the repository, writing
into refs/ or whatever. But such tools are broken, and I'm not sure we
should be catering to them (besides which, the packed-refs ship sailed
long ago, so they're already potentially dangerous).

> +/* XXX which ordering are these? Newest or oldest first? */
>  int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
>  int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);

The top one is chronological order (i.e., reading the file from start to
finish), and the latter is reverse-chronological (seeking backwards in
the file).

> +static struct ref_iterator *
> +reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
> +			    unsigned int flags)
> +{
> +	struct reftable_ref_store *refs =
> +		(struct reftable_ref_store *)ref_store;
> +	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
> +	struct merged_table *mt = NULL;
> +	int err = 0;
> +	if (refs->err) {
> +		/* how to propagate errors? */
> +		return NULL;
> +	}
> +
> +	mt = stack_merged_table(refs->stack);
> +
> +	/* XXX something with flags? */

I think the flags here are DO_FOR_EACH_*. You might need to consider
INCLUDE_BROKEN here (or it might be dealt with at a layer above).

There's also PER_WORKTREE_ONLY; I think you'd need to check ref_type()
there, like the files backend does.

> +	err = merged_table_seek_ref(mt, &ri->iter, prefix);
> +	/* XXX what to do with err? */

Hmm, I don't think you can return an error from iterator_begin(). It
would probably be OK to record the error in your local struct, and then
later return ITER_ERROR from iterator_advance().

I notice that the more-recent dir_iterator_begin() returns NULL in case
of an error, but looking at the callers of the ref iterators, they don't
seem to be ready to handle NULL.

> +static int write_transaction_table(struct writer *writer, void *arg)
> +{
> +	struct ref_transaction *transaction = (struct ref_transaction *)arg;
> +	struct reftable_ref_store *refs =
> +		(struct reftable_ref_store *)transaction->ref_store;
> +	uint64_t ts = stack_next_update_index(refs->stack);
> +	int err = 0;
> +	/* XXX - are we allowed to mutate the input data? */
> +	qsort(transaction->updates, transaction->nr,
> +	      sizeof(struct ref_update *), ref_update_cmp);

I don't offhand know of anything that would break, but it's probably a
bad idea to do so. If you need a sorted view, can you make an auxiliary
array-of-pointers? Also, the QSORT() macro is a little shorter and has
some extra checks.

> +	for (int i = 0; i < transaction->nr; i++) {
> +		struct ref_update *u = transaction->updates[i];
> +		if (u->flags & REF_HAVE_NEW) {
> +			struct object_id out_oid = {};
> +			int out_flags = 0;
> +			/* XXX who owns the memory here? */
> +			const char *resolved = refs_resolve_ref_unsafe(
> +				transaction->ref_store, u->refname, 0, &out_oid,
> +				&out_flags);

In the "unsafe" version, the memory belongs to a static buffer inside
the function. You shouldn't need to free it.

> +static int
> +reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
> [...]
> +		/* XXX const? */
> +		memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ);

You'd want probably want to use hashcpy() here (or better yet, use
"struct object_id" in ri->log, too, and then use oidcpy()).

But that raises a question: how ready are reftables to handle non-sha1
object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk
format actually has binary sha1s, right? In theory if those all become
the_hash_algo->rawsz, then it might "Just Work" to read and write
slightly larger entries.

> +reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
> +					  const char *refname,
> +					  each_reflog_ent_fn fn, void *cb_data)
> [...]
> +			/* XXX committer = email? name? */
> +			if (fn(&old_oid, &new_oid, log.name, log.time,
> +			       log.tz_offset, log.message, cb_data)) {
> +				err = -1;
> +				break;
> +			}

The committer entry we pass back to each_reflog_ent_fn should be the
full "Human Name <email@host>".

> +static int reftable_reflog_expire(struct ref_store *ref_store,
> +				  const char *refname,
> +				  const struct object_id *oid,
> +				  unsigned int flags,
> +				  reflog_expiry_prepare_fn prepare_fn,
> +				  reflog_expiry_should_prune_fn should_prune_fn,
> +				  reflog_expiry_cleanup_fn cleanup_fn,
> +				  void *policy_cb_data)
> +{
> +	/*
> +	  XXX
> +
> +	  This doesn't fit with the reftable API. If we are expiring for space
> +	  reasons, the expiry should be combined with a compaction, and there
> +	  should be a policy that can be called for all refnames, not for a
> +	  single ref name.
> +
> +	  If this is for cleaning up individual entries, we'll have to write
> +	  extra data to create tombstones.
> +	 */
> +	return 0;
> +}

I agree that we'd generally want to expire and compact all entries at
once. Unfortunately I think this per-ref interface is exposed by
git-reflog. I.e., you can do "git reflog expire refs/heads/foo".

Could we add an extra API call for "expire everything", and then:

  - have refs/files-backend.c implement that by just iterating over all
    of the refs and pruning them one by one

  - have builtin/reflog.c trigger the "expire everything" API when it
    sees "--all"

  - teach refs/reftables-backend.c a crappy version of the per-ref
    expiration, where it just inserts tombstones but doesn't do any
    compaction. It doesn't have to be fast or produce a good output;
    it's just for compatibility.

    Alternatively, I'd be OK with die("sorry, reftables doesn't support
    per-ref reflog expiration") as a first step.

-Peff

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-28  7:31     ` Jeff King
@ 2020-01-28 15:36       ` Martin Fick
  2020-01-29  8:12         ` Jeff King
  2020-01-28 15:56       ` Han-Wen Nienhuys
  2020-02-04 20:22       ` Han-Wen Nienhuys
  2 siblings, 1 reply; 409+ messages in thread
From: Martin Fick @ 2020-01-28 15:36 UTC (permalink / raw)
  To: Jeff King
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Han-Wen Nienhuys

On Tuesday, January 28, 2020 2:31:00 AM MST Jeff King wrote:
> Some thoughts on compatibility:
> 
> The idea of the config changes is that older versions of Git will either
> complain about repositoryformatversion (if very old), or will complain
> that they don't know about extensions.refFormat. But the changes you
> made in patch 1 indicate that existing versions of Git won't consider it
> to be a Git repository at all!
> 
> I think this is slightly non-ideal, in that we'd continue walking up the
> tree looking for a Git repo. And either:
> 
>   - we'd find one, which would be confusing and possibly wrong
> 
>   - we wouldn't, in which case the error would be something like "no git
>     repository", and not "your git repository is too new"
> 
> So it would be really nice if we continued to have a HEAD file (just
> with some sentinel value in it) and a refs/ directory, so that existing
> versions of Git realize "there's a repository here, but it's too new for
> me".
> 
> There's a slight downside in that tools which _aren't_ careful about
> repositoryformatversion might try to operate on the repository, writing
> into refs/ or whatever. But such tools are broken, and I'm not sure we
> should be catering to them (besides which, the packed-refs ship sailed
> long ago, so they're already potentially dangerous).

Could you elaborate on this a bit because it seems on the surface that these 
tools aren't very dangerous, and therefore likely many of them exist?

What are the dangers today of tools that understand/operate on loose and 
packed refs without reading repositoryformatversion?

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-28  7:31     ` Jeff King
  2020-01-28 15:36       ` Martin Fick
@ 2020-01-28 15:56       ` Han-Wen Nienhuys
  2020-01-29 10:47         ` Jeff King
  2020-02-04 20:22       ` Han-Wen Nienhuys
  2 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-01-28 15:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Jan 28, 2020 at 8:31 AM Jeff King <peff@peff.net> wrote:
>
> On Mon, Jan 27, 2020 at 02:22:24PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
>
> > [...]
>
> I'll try to add my 2 cents to all of the XXX spots you flagged.
>

thanks! that was really helpful.

>
> Some thoughts on compatibility:
>
> The idea of the config changes is that older versions of Git will either
> complain about repositoryformatversion (if very old), or will complain
> that they don't know about extensions.refFormat. But the changes you
> made in patch 1 indicate that existing versions of Git won't consider it
> to be a Git repository at all!
>
> I think this is slightly non-ideal, in that we'd continue walking up the
> tree looking for a Git repo. And either:
>
>   - we'd find one, which would be confusing and possibly wrong
>
>   - we wouldn't, in which case the error would be something like "no git
>     repository", and not "your git repository is too new"
>
> So it would be really nice if we continued to have a HEAD file (just
> with some sentinel value in it) and a refs/ directory, so that existing
> versions of Git realize "there's a repository here, but it's too new for
> me".

You have good points.

JGit currently implements what we have here, as this is what's spelled
out in the spec that Shawn posted  back in the day. It's probably
acceptable to this, though, as the reftable support has only landed in
JGit very recently and will probably appear very experimental to
folks.

How would the layout be then? We'll have

  HEAD - dummy file
  reftable/ - the tables
  refs/ - dummy dir

where shall we store the reftable list? maybe in a file called

  reftable-list

If we have both HEAD/refs + (refable/reftable-list), what should we
put there to ensure that no git version actually manages to use the
repository? (what happens if someone deletes the version setting from
the .git/config file)

> > +/* XXX which ordering are these? Newest or oldest first? */
> >  int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
> >  int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
>
> The top one is chronological order (i.e., reading the file from start to
> finish), and the latter is reverse-chronological (seeking backwards in
> the file).

thanks, I put in a commit to clarify this.

> > +     err = merged_table_seek_ref(mt, &ri->iter, prefix);
> > +     /* XXX what to do with err? */
>
> Hmm, I don't think you can return an error from iterator_begin(). It
> would probably be OK to record the error in your local struct, and then
> later return ITER_ERROR from iterator_advance().

good point. I did that.

> > +     /* XXX - are we allowed to mutate the input data? */
> > +     qsort(transaction->updates, transaction->nr,
> > +           sizeof(struct ref_update *), ref_update_cmp);
>
> I don't offhand know of anything that would break, but it's probably a
> bad idea to do so. If you need a sorted view, can you make an auxiliary
> array-of-pointers? Also, the QSORT() macro is a little shorter and has
> some extra checks.

Done.

> In the "unsafe" version, the memory belongs to a static buffer inside
> the function. You shouldn't need to free it.


OK.

> > +static int
> > +reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
> > [...]
> > +             /* XXX const? */
> > +             memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ);
>
> You'd want probably want to use hashcpy() here (or better yet, use
> "struct object_id" in ri->log, too, and then use oidcpy()).

It's more practical in the library to use pointers rather than fixed
size arrays, so it'll be hashcpy.

> But that raises a question: how ready are reftables to handle non-sha1
> object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk
> format actually has binary sha1s, right? In theory if those all become
> the_hash_algo->rawsz, then it might "Just Work" to read and write
> slightly larger entries.

The format fixes the reftable at 20 bytes, and there is not enough
framing information to just write more data. We'll have to encode the
hash size in the version number somehow, eg. we could use the  higher
order bit of the version byte to encode it, for example.

But it needs a new version of the spec. I think it's premature to do
this while v1 of reftable isn't in git-core yet.


> The committer entry we pass back to each_reflog_ent_fn should be the
> full "Human Name <email@host>".

thx, done.

> I agree that we'd generally want to expire and compact all entries at
> once. Unfortunately I think this per-ref interface is exposed by
> git-reflog. I.e., you can do "git reflog expire refs/heads/foo".
>
> Could we add an extra API call for "expire everything", and then:

I took note of your remarks, and added a BUG() for now.

I'll focus on getting the CI on gitgitgadget to build this code first.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 2/5] create .git/refs in files-backend.c
  2020-01-27 22:28     ` Junio C Hamano
@ 2020-01-28 15:58       ` Han-Wen Nienhuys
  2020-01-30  4:19         ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-01-28 15:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Mon, Jan 27, 2020 at 11:28 PM Junio C Hamano <gitster@pobox.com> wrote:

> > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
>
>
> > Change-Id: I2fc47c89f5ec605734007ceff90321c02474aa92
>
> Do we need to keep this, which is pretty much private name for the
> patch that is not valid for most of the people on the list?

No, I can make do with like gitgitgadget.

> > -     safe_create_dir(git_path("refs"), 1);
> > -     adjust_shared_perm(git_path("refs"));
> >
> >       if (refs_init_db(&err))
> >               die("failed to set up refs db: %s", err.buf);
> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 0ea66a28b6..f49b6f2ab6 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -3158,6 +3158,10 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
> >               files_downcast(ref_store, REF_STORE_WRITE, "init_db");
> >       struct strbuf sb = STRBUF_INIT;
> >
> > +     files_ref_path(refs, &sb, "refs");
> > +     safe_create_dir(sb.buf, 1);
> > +     // XXX adjust_shared_perm ?
>
> I am not sure what's there to wonder about with the question mark.

I forgot why I put the XXX, but note that safe_create_dirs runs
adjust_shared_perms implicitly.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 3/5] Document how ref iterators and symrefs interact
  2020-01-27 22:53     ` Junio C Hamano
@ 2020-01-28 16:07       ` Han-Wen Nienhuys
  2020-01-28 19:35         ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-01-28 16:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Mon, Jan 27, 2020 at 11:53 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Han-Wen Nienhuys <hanwen@google.com>
> > Subject: Re: [PATCH v2 3/5] Document how ref iterators and symrefs interact
>
> "Subject: refs: document how ...", perhaps?
>
> Also isn't it more like iterators and symrefs do not interact in any
> unexpected way, is it?  iterators while enumerating refs when they
> see a symref, they do not dereference and give the underlying ref
> the symref is pointing at---the underlying ref will be listed when
> it comes its turn to be shown as an ordinary ref.  I am not sure
> what is there to single out and document...

I mean, you can't return a zero OID and set REF_ISSYMLINK. Instead,
the ref backend has to follow the symlink recursively until it finds a
ref pointing to a OID.

> > Change-Id: Ie3ee63c52254c000ef712986246ca28f312b4301
> > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> > ---
> >  refs/refs-internal.h | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> > index ff2436c0fb..fc18b12340 100644
> > --- a/refs/refs-internal.h
> > +++ b/refs/refs-internal.h
> > @@ -269,6 +269,9 @@ int refs_rename_ref_available(struct ref_store *refs,
> >   * to the next entry, ref_iterator_advance() aborts the iteration,
> >   * frees the ref_iterator, and returns ITER_ERROR.
> >   *
> > + * Ref iterators cannot return symref targets, so symbolic refs must be
> > + * dereferenced during the iteration.
>
> What this says is not techincally incorrect.  Iterators do not give
> each_ref_fn what underlying ref a symref is pointing at.  But it
> also is misleading.  If your callback wants to know what object each
> ref is pointing at do not need to do anything extra when it sees a
> symref, as name of the object pointed at by the underlying ref is
> given to it.  Only callbacks that wants to know the other ref, not
> the value of the other ref, needs to dereference when called with a
> symref.
>
> But I wonder if we need to even say this.  Isn't it obvious from the
> each_ref_fn API that nothing other than the refname, object id, and
> what kind of ref it is, will be given to the user of the API, so it
> would be natural for a caller that wants to do extra things it needs
> to do for symrefs must act when it learns a ref is a symref?  After
> all, that is why the flags word is one of the parameters given to an
> each_ref_fn.

Maybe it is obvious to you, but it was not obvious to me, coming from
JGit working on the RefDatabase. One could imagine that the caller
needs to do something special to handle a symref, and returning a zero
object ID is OK, and this is a natural assumption if you implement a
ref backend, as dereferencing the symref to find out the final OID may
do more work than the caller needs.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 3/5] Document how ref iterators and symrefs interact
  2020-01-28 16:07       ` Han-Wen Nienhuys
@ 2020-01-28 19:35         ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-01-28 19:35 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

Han-Wen Nienhuys <hanwen@google.com> writes:

>> But I wonder if we need to even say this.  Isn't it obvious from the
>> each_ref_fn API that nothing other than the refname, object id, and
>> what kind of ref it is, will be given to the user of the API, so it
>> would be natural for a caller that wants to do extra things it needs
>> to do for symrefs must act when it learns a ref is a symref?  After
>> all, that is why the flags word is one of the parameters given to an
>> each_ref_fn.
>
> Maybe it is obvious to you, but it was not obvious to me, coming from
> JGit working on the RefDatabase.

If you found each_ref_fn API docs lacking, that probably is where
the information should go (i.e. regardless of type of refs, the
object name is filled correctly).  We need to make it obvious to the
developers who needs to use and work on the code with the API
defined and described in refs.h

Thanks.



^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-28 15:36       ` Martin Fick
@ 2020-01-29  8:12         ` Jeff King
  2020-01-29 16:49           ` Martin Fick
  2020-01-29 18:34           ` Junio C Hamano
  0 siblings, 2 replies; 409+ messages in thread
From: Jeff King @ 2020-01-29  8:12 UTC (permalink / raw)
  To: Martin Fick
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Han-Wen Nienhuys

On Tue, Jan 28, 2020 at 08:36:53AM -0700, Martin Fick wrote:

> > There's a slight downside in that tools which _aren't_ careful about
> > repositoryformatversion might try to operate on the repository, writing
> > into refs/ or whatever. But such tools are broken, and I'm not sure we
> > should be catering to them (besides which, the packed-refs ship sailed
> > long ago, so they're already potentially dangerous).
> 
> Could you elaborate on this a bit because it seems on the surface that these 
> tools aren't very dangerous, and therefore likely many of them exist?
> 
> What are the dangers today of tools that understand/operate on loose and 
> packed refs without reading repositoryformatversion?

I was mostly thinking of hacky scripts that tried to touch .git/refs
directly. And there are a few levels of dangerous there:

  - if you're doing "echo $sha1 >.git/refs/heads/master", then you're
    not locking properly. But it would probably work most of the time.

  - if you're properly taking a lock on ".git/refs/heads/master.lock"
    and renaming into place but not looking at packed-refs, then you
    might overwrite somebody else's update which is in the packed file

  - if you're trying to read refs and not reading packed-refs, obviously
    you might miss some values

If you're actually doing the correct locking and packed-refs read (which
"real" implementations like libgit2 do) then no, I don't think that's
dangerous. And I think libgit2 properly complains when it sees a
repositoryformatversion above 0. I don't know offhand about JGit, or any
of the lesser-used ones like dulwich or go-git.

-Peff

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-28 15:56       ` Han-Wen Nienhuys
@ 2020-01-29 10:47         ` Jeff King
  2020-01-29 18:43           ` Junio C Hamano
  2020-02-04 19:06           ` Han-Wen Nienhuys
  0 siblings, 2 replies; 409+ messages in thread
From: Jeff King @ 2020-01-29 10:47 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Jan 28, 2020 at 04:56:26PM +0100, Han-Wen Nienhuys wrote:

> JGit currently implements what we have here, as this is what's spelled
> out in the spec that Shawn posted  back in the day. It's probably
> acceptable to this, though, as the reftable support has only landed in
> JGit very recently and will probably appear very experimental to
> folks.
> 
> How would the layout be then? We'll have
> 
>   HEAD - dummy file
>   reftable/ - the tables
>   refs/ - dummy dir
> 
> where shall we store the reftable list? maybe in a file called
> 
>   reftable-list
> 
> If we have both HEAD/refs + (refable/reftable-list), what should we
> put there to ensure that no git version actually manages to use the
> repository? (what happens if someone deletes the version setting from
> the .git/config file)

Yeah, it would be nice to have something that an older version of Git
would totally choke on, but I'm not sure we have a lot of leeway. What
we put in HEAD has to be syntactically legitimate enough to appease
validate_headref(), so our options are either "ref:
refs/something/bogus" or an object hash that we don't have (e.g.,
0{40}). The former would be preferable because it would (in theory)
prevent us from writing to HEAD, as well.

I wondered what would happen if you put in a syntactically invalid ref,
like "ref: refs/.not/.valid" (leading dots are not allowed in path
components of refnames). It does cause _some_ parts of Git to choke, but
sadly "git update-ref HEAD $sha1" actually writes to .git/refs/.not/.valid.

Even "refs/../../dangerous" doesn't give it pause. Yikes. It seems we're
pretty willing to accept symref destinations without further checking.

Making "refs" a file instead of a directory does work nicely, as any
attempts to read or write would get ENOTDIR. And we can fool
is_git_directory() as long as it's marked executable. That's OK on POSIX
systems, but I'm not sure how it would work on Windows (or maybe it
would work just fine, since we presumably just say "yep, everything is
executable").

So perhaps that's enough, and what we put in HEAD won't matter (since
nobody will be able to write into refs/ anyway).

> > But that raises a question: how ready are reftables to handle non-sha1
> > object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk
> > format actually has binary sha1s, right? In theory if those all become
> > the_hash_algo->rawsz, then it might "Just Work" to read and write
> > slightly larger entries.
> 
> The format fixes the reftable at 20 bytes, and there is not enough
> framing information to just write more data. We'll have to encode the
> hash size in the version number somehow, eg. we could use the  higher
> order bit of the version byte to encode it, for example.
> 
> But it needs a new version of the spec. I think it's premature to do
> this while v1 of reftable isn't in git-core yet.

I don't know that we technically need the reftables file to say how long
the hashes are. The git config will tell us which hash we're using, and
everything else is supposed to follow. So I think it would work OK as
long as you're able to be told by the rest of Git that hashes are N
bytes, and just use that to compute the fixed-size records.

That said, it might make for easier debugging if the reftables file
declares the size it assumes.

-Peff

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-29  8:12         ` Jeff King
@ 2020-01-29 16:49           ` Martin Fick
  2020-01-29 18:40             ` Han-Wen Nienhuys
  2020-01-29 18:34           ` Junio C Hamano
  1 sibling, 1 reply; 409+ messages in thread
From: Martin Fick @ 2020-01-29 16:49 UTC (permalink / raw)
  To: Jeff King
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Han-Wen Nienhuys

On Wednesday, January 29, 2020 3:12:59 AM MST Jeff King wrote:
> On Tue, Jan 28, 2020 at 08:36:53AM -0700, Martin Fick wrote:
> > > There's a slight downside in that tools which _aren't_ careful about
> > > repositoryformatversion might try to operate on the repository, writing
> > > into refs/ or whatever. But such tools are broken, and I'm not sure we
> > > should be catering to them (besides which, the packed-refs ship sailed
> > > long ago, so they're already potentially dangerous).
> > 
> > Could you elaborate on this a bit because it seems on the surface that
> > these tools aren't very dangerous, and therefore likely many of them
> > exist?
> > 
> > What are the dangers today of tools that understand/operate on loose and
> > packed refs without reading repositoryformatversion?
> 
> I was mostly thinking of hacky scripts that tried to touch .git/refs
> directly. And there are a few levels of dangerous there:
> 
>   - if you're doing "echo $sha1 >.git/refs/heads/master", then you're
>     not locking properly. But it would probably work most of the time.
> 
>   - if you're properly taking a lock on ".git/refs/heads/master.lock"
>     and renaming into place but not looking at packed-refs, then you
>     might overwrite somebody else's update which is in the packed file
> 
>   - if you're trying to read refs and not reading packed-refs, obviously
>     you might miss some values
> 
> If you're actually doing the correct locking and packed-refs read (which
> "real" implementations like libgit2 do) then no, I don't think that's
> dangerous. And I think libgit2 properly complains when it sees a
> repositoryformatversion above 0. I don't know offhand about JGit, or any
> of the lesser-used ones like dulwich or go-git.

Today, some of these sound like shortcuts that are very likely taken quite a 
bit by cleanup and other maintenance scripts (not necessarily formal git 
tools), and the impact of these shortcuts is likely low with the current 
model. However, I suspect these tools/scripts could be seriously disruptive if 
we leave the refs dir around when using reftable,

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-29  8:12         ` Jeff King
  2020-01-29 16:49           ` Martin Fick
@ 2020-01-29 18:34           ` Junio C Hamano
  1 sibling, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-01-29 18:34 UTC (permalink / raw)
  To: Jeff King
  Cc: Martin Fick, Han-Wen Nienhuys via GitGitGadget, git,
	Han-Wen Nienhuys, Han-Wen Nienhuys

Jeff King <peff@peff.net> writes:

> I was mostly thinking of hacky scripts that tried to touch .git/refs
> directly. And there are a few levels of dangerous there:
>
>   - if you're doing "echo $sha1 >.git/refs/heads/master", then you're
>     not locking properly. But it would probably work most of the time.
>
>   - if you're properly taking a lock on ".git/refs/heads/master.lock"
>     and renaming into place but not looking at packed-refs, then you
>     might overwrite somebody else's update which is in the packed file
>
>   - if you're trying to read refs and not reading packed-refs, obviously
>     you might miss some values

... but the values you read are the right ones as loose refs overlay
and trump what is in packed-refs.

 - if you're trying to delete a branch by removing a file under
   refs/, that would uncover the corresponding entry in packed-refs,
   if exists---in other words, you may resurrect an older value of a
   branch instead of deleting it.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-29 16:49           ` Martin Fick
@ 2020-01-29 18:40             ` Han-Wen Nienhuys
  2020-01-29 19:47               ` Martin Fick
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-01-29 18:40 UTC (permalink / raw)
  To: Martin Fick
  Cc: Jeff King, Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wed, Jan 29, 2020 at 5:49 PM Martin Fick <mfick@codeaurora.org> wrote:
> > If you're actually doing the correct locking and packed-refs read (which
> > "real" implementations like libgit2 do) then no, I don't think that's
> > dangerous. And I think libgit2 properly complains when it sees a
> > repositoryformatversion above 0. I don't know offhand about JGit, or any
> > of the lesser-used ones like dulwich or go-git.
>
> Today, some of these sound like shortcuts that are very likely taken quite a
> bit by cleanup and other maintenance scripts (not necessarily formal git
> tools), and the impact of these shortcuts is likely low with the current
> model. However, I suspect these tools/scripts could be seriously disruptive if
> we leave the refs dir around when using reftable,

Maybe we can leave the refs dir, but have no heads/ directory inside,
and make the whole thing read-only?


-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-29 10:47         ` Jeff King
@ 2020-01-29 18:43           ` Junio C Hamano
  2020-01-29 18:53             ` Han-Wen Nienhuys
  2020-01-30  7:26             ` Jeff King
  2020-02-04 19:06           ` Han-Wen Nienhuys
  1 sibling, 2 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-01-29 18:43 UTC (permalink / raw)
  To: Jeff King
  Cc: Han-Wen Nienhuys, Han-Wen Nienhuys via GitGitGadget, git,
	Han-Wen Nienhuys

Jeff King <peff@peff.net> writes:

> Making "refs" a file instead of a directory does work nicely, as any
> attempts to read or write would get ENOTDIR. And we can fool
> is_git_directory() as long as it's marked executable. That's OK on POSIX
> systems, but I'm not sure how it would work on Windows (or maybe it
> would work just fine, since we presumably just say "yep, everything is
> executable").
>
> So perhaps that's enough, and what we put in HEAD won't matter (since
> nobody will be able to write into refs/ anyway).

I wonder if it would help to take the "looser repository detection"
code alone and have it in a release, way before the rest of the
reftable topic is ready.  Then by the time a repository created by a
reftable-enabled Git appears on people's disks, all the older
versions of Git that are still in people's hands would at least know
that it is a repository supported by future Git that they themselves
do not know how to handle, stop repository discovery correctly and
refrain from damaging the repository with an extension unknown to
them?


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-29 18:43           ` Junio C Hamano
@ 2020-01-29 18:53             ` Han-Wen Nienhuys
  2020-01-30  7:26             ` Jeff King
  1 sibling, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-01-29 18:53 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wed, Jan 29, 2020 at 7:43 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Jeff King <peff@peff.net> writes:
>
> > Making "refs" a file instead of a directory does work nicely, as any
> > attempts to read or write would get ENOTDIR. And we can fool
> > is_git_directory() as long as it's marked executable. That's OK on POSIX
> > systems, but I'm not sure how it would work on Windows (or maybe it
> > would work just fine, since we presumably just say "yep, everything is
> > executable").
> >
> > So perhaps that's enough, and what we put in HEAD won't matter (since
> > nobody will be able to write into refs/ anyway).
>
> I wonder if it would help to take the "looser repository detection"
> code alone and have it in a release, way before the rest of the
> reftable topic is ready.  Then by the time a repository created by a
> reftable-enabled Git appears on people's disks, all the older
> versions of Git that are still in people's hands would at least know
> that it is a repository supported by future Git that they themselves
> do not know how to handle, stop repository discovery correctly and
> refrain from damaging the repository with an extension unknown to
> them?

Sure. I suspect that the reftable topic will take some time to get fully ready.

However, it would be good to have a definite plan for the layout now,
because JGit (if you tell it to) is already creating repositories with
the layout we're moving away from.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-29 18:40             ` Han-Wen Nienhuys
@ 2020-01-29 19:47               ` Martin Fick
  2020-01-29 19:50                 ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Martin Fick @ 2020-01-29 19:47 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Jeff King, Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wednesday, January 29, 2020 7:40:50 PM MST Han-Wen Nienhuys wrote:
> On Wed, Jan 29, 2020 at 5:49 PM Martin Fick <mfick@codeaurora.org> wrote:
> > > If you're actually doing the correct locking and packed-refs read (which
> > > "real" implementations like libgit2 do) then no, I don't think that's
> > > dangerous. And I think libgit2 properly complains when it sees a
> > > repositoryformatversion above 0. I don't know offhand about JGit, or any
> > > of the lesser-used ones like dulwich or go-git.
> > 
> > Today, some of these sound like shortcuts that are very likely taken quite
> > a bit by cleanup and other maintenance scripts (not necessarily formal
> > git tools), and the impact of these shortcuts is likely low with the
> > current model. However, I suspect these tools/scripts could be seriously
> > disruptive if we leave the refs dir around when using reftable,
> 
> Maybe we can leave the refs dir, but have no heads/ directory inside,
> and make the whole thing read-only?

That might be a good enough safety. I guess the next question would be,  would 
it be OK for reftable to ignore and entries under the refs/ dir if they happen 
to appear there somehow?

-Martin

-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-29 19:47               ` Martin Fick
@ 2020-01-29 19:50                 ` Han-Wen Nienhuys
  2020-01-30  7:21                   ` Jeff King
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-01-29 19:50 UTC (permalink / raw)
  To: Martin Fick
  Cc: Jeff King, Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wed, Jan 29, 2020 at 8:47 PM Martin Fick <mfick@codeaurora.org> wrote:
>
> On Wednesday, January 29, 2020 7:40:50 PM MST Han-Wen Nienhuys wrote:
> > > Today, some of these sound like shortcuts that are very likely taken quite
> > > a bit by cleanup and other maintenance scripts (not necessarily formal
> > > git tools), and the impact of these shortcuts is likely low with the
> > > current model. However, I suspect these tools/scripts could be seriously
> > > disruptive if we leave the refs dir around when using reftable,
> >
> > Maybe we can leave the refs dir, but have no heads/ directory inside,
> > and make the whole thing read-only?
>
> That might be a good enough safety. I guess the next question would be,  would
> it be OK for reftable to ignore and entries under the refs/ dir if they happen
> to appear there somehow?

I propose to ignore refs/ if it is read-only, and fail if it is r/w.
We're not going to look over the files under refs/ . If people
actively try to shoot themselves in the foot, why would we stop them?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 2/5] create .git/refs in files-backend.c
  2020-01-28 15:58       ` Han-Wen Nienhuys
@ 2020-01-30  4:19         ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-01-30  4:19 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

Han-Wen Nienhuys <hanwen@google.com> writes:

>> > +     files_ref_path(refs, &sb, "refs");
>> > +     safe_create_dir(sb.buf, 1);
>> > +     // XXX adjust_shared_perm ?
>
> I forgot why I put the XXX, but note that safe_create_dirs runs
> adjust_shared_perms implicitly.

That piece of information would make an excellent part of the log
message, which is to describe why "questionable" changes are made.

"git init" is designed to be runnable in an existing repository, but
safe_create_dir() would not run adjust_shared_perm() when mkdir()
says EEXIST, and I think that is why the original makes a call to
adjust_shared_perm() after safe_create_dir() returns.

Thanks.



^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-29 19:50                 ` Han-Wen Nienhuys
@ 2020-01-30  7:21                   ` Jeff King
  2020-02-03 16:39                     ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Jeff King @ 2020-01-30  7:21 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Martin Fick, Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wed, Jan 29, 2020 at 08:50:13PM +0100, Han-Wen Nienhuys wrote:

> > That might be a good enough safety. I guess the next question would be,  would
> > it be OK for reftable to ignore and entries under the refs/ dir if they happen
> > to appear there somehow?
> 
> I propose to ignore refs/ if it is read-only, and fail if it is r/w.
> We're not going to look over the files under refs/ . If people
> actively try to shoot themselves in the foot, why would we stop them?

I'm worried that playing games with permissions is going to lead to
confusing outcomes. There are reasons one might want a r/o refs/
directory with the current system (e.g., you could have a repository on
a read-only mount). Or you might have a system which doesn't implement
the full POSIX permissions, and everything appears to be r/w by the
user.

-Peff

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-29 18:43           ` Junio C Hamano
  2020-01-29 18:53             ` Han-Wen Nienhuys
@ 2020-01-30  7:26             ` Jeff King
  1 sibling, 0 replies; 409+ messages in thread
From: Jeff King @ 2020-01-30  7:26 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Han-Wen Nienhuys, Han-Wen Nienhuys via GitGitGadget, git,
	Han-Wen Nienhuys

On Wed, Jan 29, 2020 at 10:43:11AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Making "refs" a file instead of a directory does work nicely, as any
> > attempts to read or write would get ENOTDIR. And we can fool
> > is_git_directory() as long as it's marked executable. That's OK on POSIX
> > systems, but I'm not sure how it would work on Windows (or maybe it
> > would work just fine, since we presumably just say "yep, everything is
> > executable").
> >
> > So perhaps that's enough, and what we put in HEAD won't matter (since
> > nobody will be able to write into refs/ anyway).
> 
> I wonder if it would help to take the "looser repository detection"
> code alone and have it in a release, way before the rest of the
> reftable topic is ready.  Then by the time a repository created by a
> reftable-enabled Git appears on people's disks, all the older
> versions of Git that are still in people's hands would at least know
> that it is a repository supported by future Git that they themselves
> do not know how to handle, stop repository discovery correctly and
> refrain from damaging the repository with an extension unknown to
> them?

That helps, but it just slightly expands the window where we do the
right thing. I.e., in a rollout like this:

  1. Status quo: we only consider it a repo if "refs/" is executable

  2. We introduce a version of Git that can read and write reftables.

  3. We flip reftables support on by default.

Then any version of git after step 2 is fine. It's the ones from step 1
and older we care about. And there's probably some time between steps 2
and 3 to let the new versions percolate, shake out bugs, etc.

Adding a new step 1.5, "looser repository detection" expands the window
of "good" versions by however long it takes to go from step 1.5 to step
2. But I expect that to be much smaller than the window between 2 and 3,
so I'm not sure it makes a meaningful impact.

(Of course people who voluntarily turn on the feature as soon as they
get hold of a step 2 version have no window at all, but presumably early
adopters are OK with the risk).

-Peff

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-30  7:21                   ` Jeff King
@ 2020-02-03 16:39                     ` Han-Wen Nienhuys
  2020-02-03 17:05                       ` Jeff King
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-03 16:39 UTC (permalink / raw)
  To: Jeff King
  Cc: Martin Fick, Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Thu, Jan 30, 2020 at 8:21 AM Jeff King <peff@peff.net> wrote:
>
> On Wed, Jan 29, 2020 at 08:50:13PM +0100, Han-Wen Nienhuys wrote:
>
> > > That might be a good enough safety. I guess the next question would be,  would
> > > it be OK for reftable to ignore and entries under the refs/ dir if they happen
> > > to appear there somehow?
> >
> > I propose to ignore refs/ if it is read-only, and fail if it is r/w.
> > We're not going to look over the files under refs/ . If people
> > actively try to shoot themselves in the foot, why would we stop them?
>
> I'm worried that playing games with permissions is going to lead to
> confusing outcomes. There are reasons one might want a r/o refs/
> directory with the current system (e.g., you could have a repository on
> a read-only mount). Or you might have a system which doesn't implement
> the full POSIX permissions, and everything appears to be r/w by the
> user.

OK, so permissions are out. How about:

  HEAD - convincing enough for old versions to accept
  refs/ - a standard rwx directory
  reftable/ - a normal directory
  reftable-list - the list of tables
  reads/heads  - a file containing "this repo uses reftables. Upgrade
to git vXYZ"

this would prevent people from erroneously creating normal branches.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-02-03 16:39                     ` Han-Wen Nienhuys
@ 2020-02-03 17:05                       ` Jeff King
  2020-02-03 17:09                         ` Han-Wen Nienhuys
  2020-02-04 18:54                         ` Han-Wen Nienhuys
  0 siblings, 2 replies; 409+ messages in thread
From: Jeff King @ 2020-02-03 17:05 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Martin Fick, Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Mon, Feb 03, 2020 at 05:39:55PM +0100, Han-Wen Nienhuys wrote:

> OK, so permissions are out. How about:
> 
>   HEAD - convincing enough for old versions to accept
>   refs/ - a standard rwx directory
>   reftable/ - a normal directory
>   reftable-list - the list of tables
>   reads/heads  - a file containing "this repo uses reftables. Upgrade
> to git vXYZ"
> 
> this would prevent people from erroneously creating normal branches.

That seems reasonable. They could still write "refs/tags/foo", etc, but
presumably branches would be more common.

Trying to write to any ref there (including HEAD, assuming it has a
dummy value in refs/heads/) would yield something like:

  fatal: cannot lock ref 'HEAD': unable to create lock file .git/refs/heads/master.lock; non-directory in the way

Unfortunately reading with an old version would quietly return "no such
ref", at least with old versions of Git (but those maybe aren't as
interesting as alternate implementations, since old versions of Git
would also complain about the extensions marked in the config).

Iterating with "for-each-ref" would say something like:

  $ git for-each-ref
  warning: ignoring broken ref refs/heads
  $ echo $?
  0

which is so-so. Replacing "refs/" with a file with an executable bit
(which is still playing with permissions, but I think in a little bit
less exotic way) behaves similarly, except that for-each-ref does not
even produce a warning. :-/

I dunno. I wish the failure mode were a little more complete. OTOH, I'm
really not that worried about it. The config repo-version stuff is
_supposed_ to be the definitive word on whether Git is willing to open
up a repository. So this is really belt-and-suspenders on top of that.

-Peff

PS I don't know if it's set in stone yet, but if we're changing things
   around, is it an option to put reftable-list inside the reftable
   directory, just to keep the top-level uncluttered?

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-02-03 17:05                       ` Jeff King
@ 2020-02-03 17:09                         ` Han-Wen Nienhuys
  2020-02-04 18:54                         ` Han-Wen Nienhuys
  1 sibling, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-03 17:09 UTC (permalink / raw)
  To: Jeff King
  Cc: Martin Fick, Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Mon, Feb 3, 2020 at 6:05 PM Jeff King <peff@peff.net> wrote:
> PS I don't know if it's set in stone yet, but if we're changing things
>    around, is it an option to put reftable-list inside the reftable
>    directory, just to keep the top-level uncluttered?

If we're moving things around, we can add this little difference too.
The original idea was to create a conflict with the old-style storage,
but putting the list in reftable/ actually makes files -> restable
conversion slightly more atomic.

--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-02-03 17:05                       ` Jeff King
  2020-02-03 17:09                         ` Han-Wen Nienhuys
@ 2020-02-04 18:54                         ` Han-Wen Nienhuys
  2020-02-04 20:06                           ` Jeff King
  1 sibling, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-04 18:54 UTC (permalink / raw)
  To: Jeff King
  Cc: Martin Fick, Han-Wen Nienhuys via GitGitGadget, git,
	Han-Wen Nienhuys, Jonathan Nieder, Jonathan Tan

On Mon, Feb 3, 2020 at 6:05 PM Jeff King <peff@peff.net> wrote:
> >   HEAD - convincing enough for old versions to accept
> >   refs/ - a standard rwx directory
> >   reftable/ - a normal directory
> >   reftable-list - the list of tables
> >   reads/heads  - a file containing "this repo uses reftables. Upgrade
> > to git vXYZ"
> >
> > this would prevent people from erroneously creating normal branches.
>
> That seems reasonable. They could still write "refs/tags/foo", etc, but
> presumably branches would be more common.
>..
>
> PS I don't know if it's set in stone yet, but if we're changing things
>    around, is it an option to put reftable-list inside the reftable
>    directory, just to keep the top-level uncluttered?

I put up https://git.eclipse.org/r/c/157167/ to update the spec inside
JGit. Please review.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-29 10:47         ` Jeff King
  2020-01-29 18:43           ` Junio C Hamano
@ 2020-02-04 19:06           ` Han-Wen Nienhuys
  2020-02-04 19:54             ` Jeff King
  1 sibling, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-04 19:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wed, Jan 29, 2020 at 11:47 AM Jeff King <peff@peff.net> wrote:

> That said, it might make for easier debugging if the reftables file
> declares the size it assumes.

If there is a mismatch in size, the reftable will look completely
corrupted data.  I think this will be a bad experience.


-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-02-04 19:06           ` Han-Wen Nienhuys
@ 2020-02-04 19:54             ` Jeff King
  0 siblings, 0 replies; 409+ messages in thread
From: Jeff King @ 2020-02-04 19:54 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Feb 04, 2020 at 08:06:57PM +0100, Han-Wen Nienhuys wrote:

> On Wed, Jan 29, 2020 at 11:47 AM Jeff King <peff@peff.net> wrote:
> 
> > That said, it might make for easier debugging if the reftables file
> > declares the size it assumes.
> 
> If there is a mismatch in size, the reftable will look completely
> corrupted data.  I think this will be a bad experience.

Yes, but I think it would take a bunch of other failures to get there.
And it's true of all of the other parts of Git, too; the master switch
is a config setting that says "I'm a sha-256 repository".

That said, I'm not at all opposed to a header in the reftables file
saying "I'm a sha-256 reftable". We shouldn't ever see a mismatch there,
but that's what I meant by "easier debugging".

-Peff

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-02-04 18:54                         ` Han-Wen Nienhuys
@ 2020-02-04 20:06                           ` Jeff King
  2020-02-04 20:26                             ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Jeff King @ 2020-02-04 20:06 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Martin Fick, Han-Wen Nienhuys via GitGitGadget, git,
	Han-Wen Nienhuys, Jonathan Nieder, Jonathan Tan

On Tue, Feb 04, 2020 at 07:54:12PM +0100, Han-Wen Nienhuys wrote:

> > PS I don't know if it's set in stone yet, but if we're changing things
> >    around, is it an option to put reftable-list inside the reftable
> >    directory, just to keep the top-level uncluttered?
> 
> I put up https://git.eclipse.org/r/c/157167/ to update the spec inside
> JGit. Please review.

That looks quite reasonable. The one change I'd make is to put
"refs/heads/.invalid" into HEAD (instead of "refs/.invalid"). In my
experiments, existing Git is happy to write into an invalid refname via
"git update-ref HEAD". But by putting it past the non-directory "heads",
such a write would always fail.

It's also supposed to be a requirement that HEAD only puts to
refs/heads/, though our validate_headref() does not enforce that. I
don't know if any other implementations might, though.

I did compare earlier your "make refs/heads a file" versus "make refs/ a
file and just give it an executable bit". And I think the "refs/heads"
solution does behave slightly better in a few cases. But if I understand
correctly, just giving "refs/" an executable bit would mean we retain
compatibility with the existing JGit implementation.

It does end up playing games with permissions, which I warned against,
but I think it might be tolerable:

  1. Unlike read vs write, there's not generally a reason users would
     need to separate read and execute.

  2. Portability-wise, POSIX permissions give us what we need, and
     non-POSIX systems tend to just say everything is executable.

So I could also live with that direction, if switching the JGit behavior
at this point is too much of a pain.

-Peff

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-01-28  7:31     ` Jeff King
  2020-01-28 15:36       ` Martin Fick
  2020-01-28 15:56       ` Han-Wen Nienhuys
@ 2020-02-04 20:22       ` Han-Wen Nienhuys
  2020-02-04 22:13         ` Jeff King
  2 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-04 20:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Jan 28, 2020 at 8:31 AM Jeff King <peff@peff.net> wrote:
>
> >  {
> > -     const char *be_name = "files";
> > -     struct ref_storage_be *be = find_ref_storage_backend(be_name);
> > +     struct strbuf refs_path = STRBUF_INIT;
> > +
> > +        /* XXX this should probably come from a git config setting and not
> > +           default to reftable. */
> > +     const char *be_name = "reftable";
>
> I think the scheme here needs to be something like:
>
>   - "struct repository" gets a new "ref_format" field
>
>   - setup.c:check_repo_format() learns about an extensions.refFormat
>     config key, which we use to set repo->ref_format
>
>   - init/clone should take a command-line option for the ref format of
>     the new repository. Anybody choosing reftables would want to set
>     core.repositoryformatversion to "1" and set the extensions.refFormat
>     key.

I did this, but are you sure this works? Where would the
repo->ref_storage_format get set? I tried adding it to repo_init(),
but this doesn't get called in a normal startup sequence.

Breakpoint 3, ref_store_init (gitdir=0x555555884b70 ".git",
be_name=0x5555557942ca "reftable", flags=15) at refs.c:1841
warning: Source file is more recent than executable.
1841 {
(gdb) up
#1  0x00005555556de2c8 in get_main_ref_store (r=0x555555871d40
<the_repo>) at refs.c:1862
(gdb) p r->ref_storage_format
$1 = 0x0
}


I'm sending the revised patch series now.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-02-04 20:06                           ` Jeff King
@ 2020-02-04 20:26                             ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-04 20:26 UTC (permalink / raw)
  To: Jeff King
  Cc: Martin Fick, Han-Wen Nienhuys via GitGitGadget, git,
	Han-Wen Nienhuys, Jonathan Nieder, Jonathan Tan

On Tue, Feb 4, 2020 at 9:06 PM Jeff King <peff@peff.net> wrote:
>
> On Tue, Feb 04, 2020 at 07:54:12PM +0100, Han-Wen Nienhuys wrote:
>
> > > PS I don't know if it's set in stone yet, but if we're changing things
> > >    around, is it an option to put reftable-list inside the reftable
> > >    directory, just to keep the top-level uncluttered?
> >
> > I put up https://git.eclipse.org/r/c/157167/ to update the spec inside
> > JGit. Please review.
>
> That looks quite reasonable. The one change I'd make is to put
> "refs/heads/.invalid" into HEAD (instead of "refs/.invalid"). In my
> experiments, existing Git is happy to write into an invalid refname via
> "git update-ref HEAD". But by putting it past the non-directory "heads",
> such a write would always fail.

thanks, incorporated.

> It's also supposed to be a requirement that HEAD only puts to
> refs/heads/, though our validate_headref() does not enforce that. I
> don't know if any other implementations might, though.
>
> I did compare earlier your "make refs/heads a file" versus "make refs/ a
> file and just give it an executable bit". And I think the "refs/heads"
> solution does behave slightly better in a few cases. But if I understand
> correctly, just giving "refs/" an executable bit would mean we retain
> compatibility with the existing JGit implementation.

I hadn't looked at the JGit discovery in detail, but it looks like the
currently proposed approach works with JGit. (I haven't adapted JGit
for the new path setup yet.)

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v3 0/6] Reftable support git-core
  2020-01-27 14:22 ` [PATCH v2 " Han-Wen Nienhuys via GitGitGadget
                     ` (4 preceding siblings ...)
  2020-01-27 14:22   ` [PATCH v2 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-02-04 20:27   ` Han-Wen Nienhuys via GitGitGadget
  2020-02-04 20:27     ` [PATCH v3 1/6] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
                       ` (6 more replies)
  5 siblings, 7 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-04 20:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

At this point, I am mainly interested in feedback on the spots marked with
XXX in the Git source code, in particular, how to handle reflog expiry in
this backend.

v2

 * address Jun's nits.
 * address Dscho's portability comments
 * more background in commit messages.

Han-Wen Nienhuys (6):
  refs.h: clarify reflog iteration order
  setup.c: enable repo detection for reftable
  create .git/refs in files-backend.c
  refs: document how ref_iterator_advance_fn should handle symrefs
  Add reftable library
  Reftable support for git-core

 Makefile                |   24 +-
 builtin/init-db.c       |   42 +-
 cache.h                 |    2 +
 refs.c                  |   22 +-
 refs.h                  |    5 +-
 refs/files-backend.c    |    5 +
 refs/refs-internal.h    |    6 +
 refs/reftable-backend.c |  880 +++++++++++++++++++++++++++++++
 reftable/LICENSE        |   31 ++
 reftable/README.md      |   19 +
 reftable/VERSION        |    5 +
 reftable/basics.c       |  196 +++++++
 reftable/basics.h       |   37 ++
 reftable/block.c        |  401 ++++++++++++++
 reftable/block.h        |   71 +++
 reftable/blocksource.h  |   20 +
 reftable/bytes.c        |    0
 reftable/config.h       |    1 +
 reftable/constants.h    |   27 +
 reftable/dump.c         |   97 ++++
 reftable/file.c         |   97 ++++
 reftable/iter.c         |  229 ++++++++
 reftable/iter.h         |   56 ++
 reftable/merged.c       |  286 ++++++++++
 reftable/merged.h       |   34 ++
 reftable/pq.c           |  114 ++++
 reftable/pq.h           |   34 ++
 reftable/reader.c       |  708 +++++++++++++++++++++++++
 reftable/reader.h       |   52 ++
 reftable/record.c       | 1107 +++++++++++++++++++++++++++++++++++++++
 reftable/record.h       |   79 +++
 reftable/reftable.h     |  399 ++++++++++++++
 reftable/slice.c        |  199 +++++++
 reftable/slice.h        |   39 ++
 reftable/stack.c        |  983 ++++++++++++++++++++++++++++++++++
 reftable/stack.h        |   40 ++
 reftable/system.h       |   57 ++
 reftable/tree.c         |   66 +++
 reftable/tree.h         |   24 +
 reftable/writer.c       |  622 ++++++++++++++++++++++
 reftable/writer.h       |   46 ++
 reftable/zlib-compat.c  |   92 ++++
 repository.c            |    4 +
 repository.h            |    3 +
 setup.c                 |   27 +-
 45 files changed, 7255 insertions(+), 33 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c


base-commit: 5b0ca878e008e82f91300091e793427205ce3544
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v2:

 -:  ---------- > 1:  c00403c94d refs.h: clarify reflog iteration order
 1:  174b98f6db ! 2:  57c7342319 setup.c: enable repo detection for reftable
     @@ -10,7 +10,6 @@
      
          * allow missing HEAD if there is a reftable/
      
     -    Change-Id: I5d22317a15a84c8529aa503ae357a4afba247fe9
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
       diff --git a/setup.c b/setup.c
 2:  d7d642dcf6 ! 3:  5b7060cb2f create .git/refs in files-backend.c
     @@ -2,11 +2,10 @@
      
          create .git/refs in files-backend.c
      
     -    This prepares for supporting the reftable format, which creates a file
     -    in that place.
     +    This prepares for supporting the reftable format, which will want
     +    create its own file system layout in .git
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
     -    Change-Id: I2fc47c89f5ec605734007ceff90321c02474aa92
      
       diff --git a/builtin/init-db.c b/builtin/init-db.c
       --- a/builtin/init-db.c
     @@ -30,7 +29,8 @@
       
      +	files_ref_path(refs, &sb, "refs");
      +	safe_create_dir(sb.buf, 1);
     -+ 	// XXX adjust_shared_perm ?
     ++        /* adjust permissions even if directory already exists. */
     ++	adjust_shared_perm(sb.buf);
      +
       	/*
       	 * Create .git/refs/{heads,tags}
 3:  9cf185b51f ! 4:  1b01c735a9 Document how ref iterators and symrefs interact
     @@ -1,20 +1,21 @@
      Author: Han-Wen Nienhuys <hanwen@google.com>
      
     -    Document how ref iterators and symrefs interact
     +    refs: document how ref_iterator_advance_fn should handle symrefs
      
     -    Change-Id: Ie3ee63c52254c000ef712986246ca28f312b4301
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
       diff --git a/refs/refs-internal.h b/refs/refs-internal.h
       --- a/refs/refs-internal.h
       +++ b/refs/refs-internal.h
      @@
     -  * to the next entry, ref_iterator_advance() aborts the iteration,
     -  * frees the ref_iterator, and returns ITER_ERROR.
     -  *
     -+ * Ref iterators cannot return symref targets, so symbolic refs must be
     -+ * dereferenced during the iteration.
     -+ *
     -  * The reference currently being looked at can be peeled by calling
     -  * ref_iterator_peel(). This function is often faster than peel_ref(),
     -  * so it should be preferred when iterating over references.
     + 
     + /* Virtual function declarations for ref_iterators: */
     + 
     ++/*
     ++ * backend-specific implementation of ref_iterator_advance.
     ++ * For symrefs, the function should set REF_ISSYMREF, and it should also dereference
     ++ * the symref to provide the OID referent.
     ++ */
     + typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
     + 
     + typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
 4:  2106ff286b ! 5:  eb0df10068 Add reftable library
     @@ -30,7 +30,6 @@
          * go-git support issue: https://github.com/src-d/go-git/issues/1059
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
     -    Change-Id: Id396ff42be8b42b9e11f194a32e2f95b8250c109
      
       diff --git a/reftable/LICENSE b/reftable/LICENSE
       new file mode 100644
     @@ -88,6 +87,8 @@
      +   git --git-dir reftable-repo/.git show --no-patch origin/master \
      +    > reftable/VERSION && \
      +   echo '/* empty */' > reftable/config.h
     ++   rm reftable/*_test.c reftable/test_framework.*
     ++   git add reftable/*.[ch]
      +
      +Bugfixes should be accompanied by a test and applied to upstream project at
      +https://github.com/google/reftable.
     @@ -97,11 +98,11 @@
       --- /dev/null
       +++ b/reftable/VERSION
      @@
     -+commit c616d53b88657c3a5fe4d2e7243a48effc34c626
     ++commit e54326f73d95bfe8b17f264c400f4c365dbd5e5e
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Mon Jan 27 15:05:43 2020 +0100
     ++Date:   Tue Feb 4 19:48:17 2020 +0100
      +
     -+    C: ban // comments
     ++    C: PRI?MAX use; clang-format.
      
       diff --git a/reftable/basics.c b/reftable/basics.c
       new file mode 100644
     @@ -327,7 +328,6 @@
      +
      +#define true 1
      +#define false 0
     -+#define ARRAYSIZE(a) sizeof(a) / sizeof(a[0])
      +
      +void put_u24(byte *out, uint32_t i);
      +uint32_t get_u24(byte *in);
     @@ -528,9 +528,10 @@
      +		slice_resize(&uncompressed, sz);
      +		memcpy(uncompressed.buf, block->data, block_header_skip);
      +
     -+		if (Z_OK !=
     -+		    uncompress2(uncompressed.buf + block_header_skip, &dst_len,
     -+				block->data + block_header_skip, &src_len)) {
     ++		if (Z_OK != uncompress_return_consumed(
     ++				    uncompressed.buf + block_header_skip,
     ++				    &dst_len, block->data + block_header_skip,
     ++				    &src_len)) {
      +			free(slice_yield(&uncompressed));
      +			return ZLIB_ERROR;
      +		}
     @@ -545,8 +546,8 @@
      +	} else if (sz < full_block_size && sz < block->len &&
      +		   block->data[sz] != 0) {
      +		/* If the block is smaller than the full block size, it is
     -+                   padded (data followed by '\0') or the next block is
     -+                   unaligned. */
     ++		   padded (data followed by '\0') or the next block is
     ++		   unaligned. */
      +		full_block_size = sz;
      +	}
      +
     @@ -750,8 +751,7 @@
      +
      +void block_writer_clear(struct block_writer *bw)
      +{
     -+	free(bw->restarts);
     -+	bw->restarts = NULL;
     ++	FREE_AND_NULL(bw->restarts);
      +	free(slice_yield(&bw->last_key));
      +	/* the block is not owned. */
      +}
     @@ -833,163 +833,6 @@
      +
      +#endif
      
     - diff --git a/reftable/block_test.c b/reftable/block_test.c
     - new file mode 100644
     - --- /dev/null
     - +++ b/reftable/block_test.c
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#include "block.h"
     -+
     -+#include "system.h"
     -+
     -+#include "basics.h"
     -+#include "constants.h"
     -+#include "record.h"
     -+#include "reftable.h"
     -+#include "test_framework.h"
     -+
     -+struct binsearch_args {
     -+	int key;
     -+	int *arr;
     -+};
     -+
     -+static int binsearch_func(int i, void *void_args)
     -+{
     -+	struct binsearch_args *args = (struct binsearch_args *)void_args;
     -+
     -+	return args->key < args->arr[i];
     -+}
     -+
     -+void test_binsearch()
     -+{
     -+	int arr[] = { 2, 4, 6, 8, 10 };
     -+	int sz = ARRAYSIZE(arr);
     -+	struct binsearch_args args = {
     -+		.arr = arr,
     -+	};
     -+
     -+	int i = 0;
     -+	for (i = 1; i < 11; i++) {
     -+		args.key = i;
     -+		int res = binsearch(sz, &binsearch_func, &args);
     -+
     -+		if (res < sz) {
     -+			assert(args.key < arr[res]);
     -+			if (res > 0) {
     -+				assert(args.key >= arr[res - 1]);
     -+			}
     -+		} else {
     -+			assert(args.key == 10 || args.key == 11);
     -+		}
     -+	}
     -+}
     -+
     -+void test_block_read_write()
     -+{
     -+	const int header_off = 21; /* random */
     -+	const int N = 30;
     -+	char *names[N];
     -+	const int block_size = 1024;
     -+	struct block block = {};
     -+	block.data = calloc(block_size, 1);
     -+	block.len = block_size;
     -+
     -+	struct block_writer bw = {};
     -+	block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size,
     -+			  header_off, SHA1_SIZE);
     -+	struct ref_record ref = {};
     -+	struct record rec = {};
     -+	record_from_ref(&rec, &ref);
     -+
     -+	int i = 0;
     -+	for (i = 0; i < N; i++) {
     -+		char name[100];
     -+		snprintf(name, sizeof(name), "branch%02d", i);
     -+
     -+		byte hash[SHA1_SIZE];
     -+		memset(hash, i, sizeof(hash));
     -+
     -+		ref.ref_name = name;
     -+		ref.value = hash;
     -+		names[i] = strdup(name);
     -+		int n = block_writer_add(&bw, rec);
     -+		ref.ref_name = NULL;
     -+		ref.value = NULL;
     -+		assert(n == 0);
     -+	}
     -+
     -+	int n = block_writer_finish(&bw);
     -+	assert(n > 0);
     -+
     -+	block_writer_clear(&bw);
     -+
     -+	struct block_reader br = {};
     -+	block_reader_init(&br, &block, header_off, block_size, SHA1_SIZE);
     -+
     -+	struct block_iter it = {};
     -+	block_reader_start(&br, &it);
     -+
     -+	int j = 0;
     -+	while (true) {
     -+		int r = block_iter_next(&it, rec);
     -+		assert(r >= 0);
     -+		if (r > 0) {
     -+			break;
     -+		}
     -+		assert_streq(names[j], ref.ref_name);
     -+		j++;
     -+	}
     -+
     -+	record_clear(rec);
     -+	block_iter_close(&it);
     -+
     -+	struct slice want = {};
     -+	for (i = 0; i < N; i++) {
     -+		slice_set_string(&want, names[i]);
     -+
     -+		struct block_iter it = {};
     -+		int n = block_reader_seek(&br, &it, want);
     -+		assert(n == 0);
     -+
     -+		n = block_iter_next(&it, rec);
     -+		assert(n == 0);
     -+
     -+		assert_streq(names[i], ref.ref_name);
     -+
     -+		want.len--;
     -+		n = block_reader_seek(&br, &it, want);
     -+		assert(n == 0);
     -+
     -+		n = block_iter_next(&it, rec);
     -+		assert(n == 0);
     -+		assert_streq(names[10 * (i / 10)], ref.ref_name);
     -+
     -+		block_iter_close(&it);
     -+	}
     -+
     -+	record_clear(rec);
     -+	free(block.data);
     -+	free(slice_yield(&want));
     -+	for (i = 0; i < N; i++) {
     -+		free(names[i]);
     -+	}
     -+}
     -+
     -+int main()
     -+{
     -+	add_test_case("binsearch", &test_binsearch);
     -+	add_test_case("block_read_write", &test_block_read_write);
     -+	test_main();
     -+}
     -
       diff --git a/reftable/blocksource.h b/reftable/blocksource.h
       new file mode 100644
       --- /dev/null
     @@ -1324,8 +1167,7 @@
      +	}
      +	it->ops->close(it->iter_arg);
      +	it->ops = NULL;
     -+	free(it->iter_arg);
     -+	it->iter_arg = NULL;
     ++	FREE_AND_NULL(it->iter_arg);
      +}
      +
      +int iterator_next_ref(struct iterator it, struct ref_record *ref)
     @@ -1751,16 +1593,14 @@
      +	for (i = 0; i < mt->stack_len; i++) {
      +		reader_free(mt->stack[i]);
      +	}
     -+	free(mt->stack);
     -+	mt->stack = NULL;
     ++	FREE_AND_NULL(mt->stack);
      +	mt->stack_len = 0;
      +}
      +
      +/* clears the list of subtable, without affecting the readers themselves. */
      +void merged_table_clear(struct merged_table *mt)
      +{
     -+	free(mt->stack);
     -+	mt->stack = NULL;
     ++	FREE_AND_NULL(mt->stack);
      +	mt->stack_len = 0;
      +}
      +
     @@ -1897,270 +1737,6 @@
      +
      +#endif
      
     - diff --git a/reftable/merged_test.c b/reftable/merged_test.c
     - new file mode 100644
     - --- /dev/null
     - +++ b/reftable/merged_test.c
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#include "merged.h"
     -+
     -+#include "system.h"
     -+
     -+#include "basics.h"
     -+#include "block.h"
     -+#include "constants.h"
     -+#include "pq.h"
     -+#include "reader.h"
     -+#include "record.h"
     -+#include "reftable.h"
     -+#include "test_framework.h"
     -+
     -+void test_pq(void)
     -+{
     -+	char *names[54] = {};
     -+	int N = ARRAYSIZE(names) - 1;
     -+
     -+	int i = 0;
     -+	for (i = 0; i < N; i++) {
     -+		char name[100];
     -+		snprintf(name, sizeof(name), "%02d", i);
     -+		names[i] = strdup(name);
     -+	}
     -+
     -+	struct merged_iter_pqueue pq = {};
     -+
     -+	i = 1;
     -+	do {
     -+		struct record rec = new_record(BLOCK_TYPE_REF);
     -+		record_as_ref(rec)->ref_name = names[i];
     -+
     -+		struct pq_entry e = {
     -+			.rec = rec,
     -+		};
     -+		merged_iter_pqueue_add(&pq, e);
     -+		merged_iter_pqueue_check(pq);
     -+		i = (i * 7) % N;
     -+	} while (i != 1);
     -+
     -+	const char *last = NULL;
     -+	while (!merged_iter_pqueue_is_empty(pq)) {
     -+		struct pq_entry e = merged_iter_pqueue_remove(&pq);
     -+		merged_iter_pqueue_check(pq);
     -+		struct ref_record *ref = record_as_ref(e.rec);
     -+
     -+		if (last != NULL) {
     -+			assert(strcmp(last, ref->ref_name) < 0);
     -+		}
     -+		last = ref->ref_name;
     -+		ref->ref_name = NULL;
     -+		free(ref);
     -+	}
     -+
     -+	for (i = 0; i < N; i++) {
     -+		free(names[i]);
     -+	}
     -+
     -+	merged_iter_pqueue_clear(&pq);
     -+}
     -+
     -+void write_test_table(struct slice *buf, struct ref_record refs[], int n)
     -+{
     -+	int min = 0xffffffff;
     -+	int max = 0;
     -+	int i = 0;
     -+	for (i = 0; i < n; i++) {
     -+		uint64_t ui = refs[i].update_index;
     -+		if (ui > max) {
     -+			max = ui;
     -+		}
     -+		if (ui < min) {
     -+			min = ui;
     -+		}
     -+	}
     -+
     -+	struct write_options opts = {
     -+		.block_size = 256,
     -+	};
     -+
     -+	struct writer *w = new_writer(&slice_write_void, buf, &opts);
     -+	writer_set_limits(w, min, max);
     -+
     -+	for (i = 0; i < n; i++) {
     -+		uint64_t before = refs[i].update_index;
     -+		int n = writer_add_ref(w, &refs[i]);
     -+		assert(n == 0);
     -+		assert(before == refs[i].update_index);
     -+	}
     -+
     -+	int err = writer_close(w);
     -+	assert(err == 0);
     -+
     -+	writer_free(w);
     -+	w = NULL;
     -+}
     -+
     -+static struct merged_table *merged_table_from_records(struct ref_record **refs,
     -+						      int *sizes,
     -+						      struct slice *buf, int n)
     -+{
     -+	struct block_source *source = calloc(n, sizeof(*source));
     -+	struct reader **rd = calloc(n, sizeof(*rd));
     -+	int i = 0;
     -+	for (i = 0; i < n; i++) {
     -+		write_test_table(&buf[i], refs[i], sizes[i]);
     -+		block_source_from_slice(&source[i], &buf[i]);
     -+
     -+		int err = new_reader(&rd[i], source[i], "name");
     -+		assert(err == 0);
     -+	}
     -+
     -+	struct merged_table *mt = NULL;
     -+	int err = new_merged_table(&mt, rd, n);
     -+	assert(err == 0);
     -+	return mt;
     -+}
     -+
     -+void test_merged_between(void)
     -+{
     -+	byte hash1[SHA1_SIZE];
     -+	byte hash2[SHA1_SIZE];
     -+
     -+	set_test_hash(hash1, 1);
     -+	set_test_hash(hash2, 2);
     -+	struct ref_record r1[] = { {
     -+		.ref_name = "b",
     -+		.update_index = 1,
     -+		.value = hash1,
     -+	} };
     -+	struct ref_record r2[] = { {
     -+		.ref_name = "a",
     -+		.update_index = 2,
     -+	} };
     -+
     -+	struct ref_record *refs[] = { r1, r2 };
     -+	int sizes[] = { 1, 1 };
     -+	struct slice bufs[2] = {};
     -+	struct merged_table *mt =
     -+		merged_table_from_records(refs, sizes, bufs, 2);
     -+
     -+	struct iterator it = {};
     -+	int err = merged_table_seek_ref(mt, &it, "a");
     -+	assert(err == 0);
     -+
     -+	struct ref_record ref = {};
     -+	err = iterator_next_ref(it, &ref);
     -+	assert_err(err);
     -+	assert(ref.update_index == 2);
     -+}
     -+
     -+void test_merged(void)
     -+{
     -+	byte hash1[SHA1_SIZE];
     -+	byte hash2[SHA1_SIZE];
     -+
     -+	set_test_hash(hash1, 1);
     -+	set_test_hash(hash2, 2);
     -+	struct ref_record r1[] = { {
     -+					   .ref_name = "a",
     -+					   .update_index = 1,
     -+					   .value = hash1,
     -+				   },
     -+				   {
     -+					   .ref_name = "b",
     -+					   .update_index = 1,
     -+					   .value = hash1,
     -+				   },
     -+				   {
     -+					   .ref_name = "c",
     -+					   .update_index = 1,
     -+					   .value = hash1,
     -+				   } };
     -+	struct ref_record r2[] = { {
     -+		.ref_name = "a",
     -+		.update_index = 2,
     -+	} };
     -+	struct ref_record r3[] = {
     -+		{
     -+			.ref_name = "c",
     -+			.update_index = 3,
     -+			.value = hash2,
     -+		},
     -+		{
     -+			.ref_name = "d",
     -+			.update_index = 3,
     -+			.value = hash1,
     -+		},
     -+	};
     -+
     -+	struct ref_record *refs[] = { r1, r2, r3 };
     -+	int sizes[3] = { 3, 1, 2 };
     -+	struct slice bufs[3] = {};
     -+
     -+	struct merged_table *mt =
     -+		merged_table_from_records(refs, sizes, bufs, 3);
     -+
     -+	struct iterator it = {};
     -+	int err = merged_table_seek_ref(mt, &it, "a");
     -+	assert(err == 0);
     -+
     -+	struct ref_record *out = NULL;
     -+	int len = 0;
     -+	int cap = 0;
     -+	while (len < 100) { /* cap loops/recursion. */
     -+		struct ref_record ref = {};
     -+		int err = iterator_next_ref(it, &ref);
     -+		if (err > 0) {
     -+			break;
     -+		}
     -+		if (len == cap) {
     -+			cap = 2 * cap + 1;
     -+			out = realloc(out, sizeof(struct ref_record) * cap);
     -+		}
     -+		out[len++] = ref;
     -+	}
     -+	iterator_destroy(&it);
     -+
     -+	struct ref_record want[] = {
     -+		r2[0],
     -+		r1[1],
     -+		r3[0],
     -+		r3[1],
     -+	};
     -+	assert(ARRAYSIZE(want) == len);
     -+	int i = 0;
     -+	for (i = 0; i < len; i++) {
     -+		assert(ref_record_equal(&want[i], &out[i], SHA1_SIZE));
     -+	}
     -+	for (i = 0; i < len; i++) {
     -+		ref_record_clear(&out[i]);
     -+	}
     -+	free(out);
     -+
     -+	for (i = 0; i < 3; i++) {
     -+		free(slice_yield(&bufs[i]));
     -+	}
     -+	merged_table_close(mt);
     -+	merged_table_free(mt);
     -+}
     -+
     -+/* XXX test refs_for(oid) */
     -+
     -+int main()
     -+{
     -+	add_test_case("test_merged_between", &test_merged_between);
     -+	add_test_case("test_pq", &test_pq);
     -+	add_test_case("test_merged", &test_merged);
     -+	test_main();
     -+}
     -
       diff --git a/reftable/pq.c b/reftable/pq.c
       new file mode 100644
       --- /dev/null
     @@ -2241,12 +1817,7 @@
      +			break;
      +		}
      +
     -+		{
     -+			struct pq_entry tmp = pq->heap[min];
     -+			pq->heap[min] = pq->heap[i];
     -+			pq->heap[i] = tmp;
     -+		}
     -+
     ++		SWAP(pq->heap[i], pq->heap[min]);
      +		i = min;
      +	}
      +
     @@ -2269,11 +1840,7 @@
      +			break;
      +		}
      +
     -+		{
     -+			struct pq_entry tmp = pq->heap[j];
     -+			pq->heap[j] = pq->heap[i];
     -+			pq->heap[i] = tmp;
     -+		}
     ++		SWAP(pq->heap[j], pq->heap[i]);
      +
      +		i = j;
      +	}
     @@ -2286,8 +1853,7 @@
      +		record_clear(pq->heap[i].rec);
      +		free(record_yield(&pq->heap[i].rec));
      +	}
     -+	free(pq->heap);
     -+	pq->heap = NULL;
     ++	FREE_AND_NULL(pq->heap);
      +	pq->len = pq->cap = 0;
      +}
      
     @@ -2561,8 +2127,7 @@
      +		return;
      +	}
      +	reader_return_block(ti->r, &ti->bi.br->block);
     -+	free(ti->bi.br);
     -+	ti->bi.br = NULL;
     ++	FREE_AND_NULL(ti->bi.br);
      +
      +	ti->bi.last_key.len = 0;
      +	ti->bi.next_off = 0;
     @@ -2934,8 +2499,7 @@
      +void reader_close(struct reader *r)
      +{
      +	block_source_close(&r->source);
     -+	free(r->name);
     -+	r->name = NULL;
     ++	FREE_AND_NULL(r->name);
      +}
      +
      +int new_reader(struct reader **p, struct block_source src, char const *name)
     @@ -3271,7 +2835,7 @@
      +	assert(hash_size > 0);
      +
      +	/* This is simple and correct, but we could probably reuse the hash
     -+           fields. */
     ++	   fields. */
      +	ref_record_clear(ref);
      +	if (src->ref_name != NULL) {
      +		ref->ref_name = strdup(src->ref_name);
     @@ -3491,16 +3055,13 @@
      +	}
      +
      +	if (!seen_target && r->target != NULL) {
     -+		free(r->target);
     -+		r->target = NULL;
     ++		FREE_AND_NULL(r->target);
      +	}
      +	if (!seen_target_value && r->target_value != NULL) {
     -+		free(r->target_value);
     -+		r->target_value = NULL;
     ++		FREE_AND_NULL(r->target_value);
      +	}
      +	if (!seen_value && r->value != NULL) {
     -+		free(r->value);
     -+		r->value = NULL;
     ++		FREE_AND_NULL(r->value);
      +	}
      +
      +	return start.len - in.len;
     @@ -3588,8 +3149,8 @@
      +static void obj_record_clear(void *rec)
      +{
      +	struct obj_record *ref = (struct obj_record *)rec;
     -+	free(ref->hash_prefix);
     -+	free(ref->offsets);
     ++	FREE_AND_NULL(ref->hash_prefix);
     ++	FREE_AND_NULL(ref->offsets);
      +	memset(ref, 0, sizeof(struct obj_record));
      +}
      +
     @@ -3713,9 +3274,9 @@
      +{
      +	char hex[SHA256_SIZE + 1] = {};
      +
     -+	printf("log{%s(%" PRIdMAX ") %s <%s> %lu %04d\n", log->ref_name,
     -+	       log->update_index, log->name, log->email, log->time,
     -+	       log->tz_offset);
     ++	printf("log{%s(%" PRIdMAX ") %s <%s> %" PRIuMAX " %04d\n",
     ++	       log->ref_name, log->update_index, log->name, log->email,
     ++	       log->time, log->tz_offset);
      +	hex_format(hex, log->old_hash, hash_size);
      +	printf("%s => ", hex);
      +	hex_format(hex, log->new_hash, hash_size);
     @@ -4306,10 +3867,10 @@
      +
      +#endif
      
     - diff --git a/reftable/record_test.c b/reftable/record_test.c
     + diff --git a/reftable/reftable.h b/reftable/reftable.h
       new file mode 100644
       --- /dev/null
     - +++ b/reftable/record_test.c
     + +++ b/reftable/reftable.h
      @@
      +/*
      +Copyright 2020 Google LLC
     @@ -4319,402 +3880,64 @@
      +https://developers.google.com/open-source/licenses/bsd
      +*/
      +
     -+#include "record.h"
     ++#ifndef REFTABLE_H
     ++#define REFTABLE_H
      +
      +#include "system.h"
      +
     -+#include "basics.h"
     -+#include "constants.h"
     -+#include "reftable.h"
     -+#include "test_framework.h"
     -+
     -+void varint_roundtrip()
     -+{
     -+	uint64_t inputs[] = { 0,
     -+			      1,
     -+			      27,
     -+			      127,
     -+			      128,
     -+			      257,
     -+			      4096,
     -+			      ((uint64_t)1 << 63),
     -+			      ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1 };
     -+	int i = 0;
     -+	for (i = 0; i < ARRAYSIZE(inputs); i++) {
     -+		byte dest[10];
     ++typedef uint8_t byte;
     ++typedef byte bool;
      +
     -+		struct slice out = { .buf = dest, .len = 10, .cap = 10 };
     ++/* block_source is a generic wrapper for a seekable readable file.
     ++   It is generally passed around by value.
     ++ */
     ++struct block_source {
     ++	struct block_source_vtable *ops;
     ++	void *arg;
     ++};
      +
     -+		uint64_t in = inputs[i];
     -+		int n = put_var_int(out, in);
     -+		assert(n > 0);
     -+		out.len = n;
     ++/* a contiguous segment of bytes. It keeps track of its generating block_source
     ++   so it can return itself into the pool.
     ++*/
     ++struct block {
     ++	byte *data;
     ++	int len;
     ++	struct block_source source;
     ++};
      +
     -+		uint64_t got = 0;
     -+		n = get_var_int(&got, out);
     -+		assert(n > 0);
     ++/* block_source_vtable are the operations that make up block_source */
     ++struct block_source_vtable {
     ++	/* returns the size of a block source */
     ++	uint64_t (*size)(void *source);
      +
     -+		assert(got == in);
     -+	}
     -+}
     ++	/* reads a segment from the block source. It is an error to read
     ++	   beyond the end of the block */
     ++	int (*read_block)(void *source, struct block *dest, uint64_t off,
     ++			  uint32_t size);
     ++	/* mark the block as read; may return the data back to malloc */
     ++	void (*return_block)(void *source, struct block *blockp);
      +
     -+void test_common_prefix()
     -+{
     -+	struct {
     -+		const char *a, *b;
     -+		int want;
     -+	} cases[] = {
     -+		{ "abc", "ab", 2 },
     -+		{ "", "abc", 0 },
     -+		{ "abc", "abd", 2 },
     -+		{ "abc", "pqr", 0 },
     -+	};
     ++	/* release all resources associated with the block source */
     ++	void (*close)(void *source);
     ++};
      +
     -+	int i = 0;
     -+	for (i = 0; i < ARRAYSIZE(cases); i++) {
     -+		struct slice a = {};
     -+		struct slice b = {};
     -+		slice_set_string(&a, cases[i].a);
     -+		slice_set_string(&b, cases[i].b);
     ++/* opens a file on the file system as a block_source */
     ++int block_source_from_file(struct block_source *block_src, const char *name);
      +
     -+		int got = common_prefix_size(a, b);
     -+		assert(got == cases[i].want);
     ++/* write_options sets options for writing a single reftable. */
     ++struct write_options {
     ++	/* do not pad out blocks to block size. */
     ++	bool unpadded;
      +
     -+		free(slice_yield(&a));
     -+		free(slice_yield(&b));
     -+	}
     -+}
     ++	/* the blocksize. Should be less than 2^24. */
     ++	uint32_t block_size;
      +
     -+void set_hash(byte *h, int j)
     -+{
     -+	int i = 0;
     -+	for (i = 0; i < SHA1_SIZE; i++) {
     -+		h[i] = (j >> i) & 0xff;
     -+	}
     -+}
     ++	/* do not generate a SHA1 => ref index. */
     ++	bool skip_index_objects;
      +
     -+void test_ref_record_roundtrip()
     -+{
     -+	int i = 0;
     -+	for (i = 0; i <= 3; i++) {
     -+		printf("subtest %d\n", i);
     -+		struct ref_record in = {};
     -+		switch (i) {
     -+		case 0:
     -+			break;
     -+		case 1:
     -+			in.value = malloc(SHA1_SIZE);
     -+			set_hash(in.value, 1);
     -+			break;
     -+		case 2:
     -+			in.value = malloc(SHA1_SIZE);
     -+			set_hash(in.value, 1);
     -+			in.target_value = malloc(SHA1_SIZE);
     -+			set_hash(in.target_value, 2);
     -+			break;
     -+		case 3:
     -+			in.target = strdup("target");
     -+			break;
     -+		}
     -+		in.ref_name = strdup("refs/heads/master");
     -+
     -+		struct record rec = {};
     -+		record_from_ref(&rec, &in);
     -+		assert(record_val_type(rec) == i);
     -+		byte buf[1024];
     -+		struct slice key = {};
     -+		record_key(rec, &key);
     -+		struct slice dest = {
     -+			.buf = buf,
     -+			.len = sizeof(buf),
     -+		};
     -+		int n = record_encode(rec, dest, SHA1_SIZE);
     -+		assert(n > 0);
     -+
     -+		struct ref_record out = {};
     -+		struct record rec_out = {};
     -+		record_from_ref(&rec_out, &out);
     -+		int m = record_decode(rec_out, key, i, dest, SHA1_SIZE);
     -+		assert(n == m);
     -+
     -+		assert((out.value != NULL) == (in.value != NULL));
     -+		assert((out.target_value != NULL) == (in.target_value != NULL));
     -+		assert((out.target != NULL) == (in.target != NULL));
     -+		free(slice_yield(&key));
     -+		record_clear(rec_out);
     -+		ref_record_clear(&in);
     -+	}
     -+}
     -+
     -+void test_log_record_roundtrip()
     -+{
     -+	struct log_record in = {
     -+		.ref_name = strdup("refs/heads/master"),
     -+		.old_hash = malloc(SHA1_SIZE),
     -+		.new_hash = malloc(SHA1_SIZE),
     -+		.name = strdup("han-wen"),
     -+		.email = strdup("hanwen@google.com"),
     -+		.message = strdup("test"),
     -+		.update_index = 42,
     -+		.time = 1577123507,
     -+		.tz_offset = 100,
     -+	};
     -+
     -+	struct record rec = {};
     -+	record_from_log(&rec, &in);
     -+
     -+	struct slice key = {};
     -+	record_key(rec, &key);
     -+
     -+	byte buf[1024];
     -+	struct slice dest = {
     -+		.buf = buf,
     -+		.len = sizeof(buf),
     -+	};
     -+
     -+	int n = record_encode(rec, dest, SHA1_SIZE);
     -+	assert(n > 0);
     -+
     -+	struct log_record out = {};
     -+	struct record rec_out = {};
     -+	record_from_log(&rec_out, &out);
     -+	int valtype = record_val_type(rec);
     -+	int m = record_decode(rec_out, key, valtype, dest, SHA1_SIZE);
     -+	assert(n == m);
     -+
     -+	assert(log_record_equal(&in, &out, SHA1_SIZE));
     -+	log_record_clear(&in);
     -+	free(slice_yield(&key));
     -+	record_clear(rec_out);
     -+}
     -+
     -+void test_u24_roundtrip()
     -+{
     -+	uint32_t in = 0x112233;
     -+	byte dest[3];
     -+
     -+	put_u24(dest, in);
     -+	uint32_t out = get_u24(dest);
     -+	assert(in == out);
     -+}
     -+
     -+void test_key_roundtrip()
     -+{
     -+	struct slice dest = {}, last_key = {}, key = {}, roundtrip = {};
     -+
     -+	slice_resize(&dest, 1024);
     -+	slice_set_string(&last_key, "refs/heads/master");
     -+	slice_set_string(&key, "refs/tags/bla");
     -+
     -+	bool restart;
     -+	byte extra = 6;
     -+	int n = encode_key(&restart, dest, last_key, key, extra);
     -+	assert(!restart);
     -+	assert(n > 0);
     -+
     -+	byte rt_extra;
     -+	int m = decode_key(&roundtrip, &rt_extra, last_key, dest);
     -+	assert(n == m);
     -+	assert(slice_equal(key, roundtrip));
     -+	assert(rt_extra == extra);
     -+
     -+	free(slice_yield(&last_key));
     -+	free(slice_yield(&key));
     -+	free(slice_yield(&dest));
     -+	free(slice_yield(&roundtrip));
     -+}
     -+
     -+void print_bytes(byte *p, int l)
     -+{
     -+	int i = 0;
     -+	for (i = 0; i < l; i++) {
     -+		byte c = *p;
     -+		if (c < 32) {
     -+			c = '.';
     -+		}
     -+		printf("%02x[%c] ", p[i], c);
     -+	}
     -+	printf("(%d)\n", l);
     -+}
     -+
     -+void test_obj_record_roundtrip()
     -+{
     -+	byte testHash1[SHA1_SIZE] = {};
     -+	set_hash(testHash1, 1);
     -+	uint64_t till9[] = { 1, 2, 3, 4, 500, 600, 700, 800, 9000 };
     -+
     -+	struct obj_record recs[3] = { {
     -+					      .hash_prefix = testHash1,
     -+					      .hash_prefix_len = 5,
     -+					      .offsets = till9,
     -+					      .offset_len = 3,
     -+				      },
     -+				      {
     -+					      .hash_prefix = testHash1,
     -+					      .hash_prefix_len = 5,
     -+					      .offsets = till9,
     -+					      .offset_len = 9,
     -+				      },
     -+				      {
     -+					      .hash_prefix = testHash1,
     -+					      .hash_prefix_len = 5,
     -+				      }
     -+
     -+	};
     -+	int i = 0;
     -+	for (i = 0; i < ARRAYSIZE(recs); i++) {
     -+		printf("subtest %d\n", i);
     -+		struct obj_record in = recs[i];
     -+		byte buf[1024];
     -+		struct record rec = {};
     -+		record_from_obj(&rec, &in);
     -+		struct slice key = {};
     -+		record_key(rec, &key);
     -+		struct slice dest = {
     -+			.buf = buf,
     -+			.len = sizeof(buf),
     -+		};
     -+		int n = record_encode(rec, dest, SHA1_SIZE);
     -+		assert(n > 0);
     -+		byte extra = record_val_type(rec);
     -+		struct obj_record out = {};
     -+		struct record rec_out = {};
     -+		record_from_obj(&rec_out, &out);
     -+		int m = record_decode(rec_out, key, extra, dest, SHA1_SIZE);
     -+		assert(n == m);
     -+
     -+		assert(in.hash_prefix_len == out.hash_prefix_len);
     -+		assert(in.offset_len == out.offset_len);
     -+
     -+		assert(!memcmp(in.hash_prefix, out.hash_prefix,
     -+			       in.hash_prefix_len));
     -+		assert(0 == memcmp(in.offsets, out.offsets,
     -+				   sizeof(uint64_t) * in.offset_len));
     -+		free(slice_yield(&key));
     -+		record_clear(rec_out);
     -+	}
     -+}
     -+
     -+void test_index_record_roundtrip()
     -+{
     -+	struct index_record in = { .offset = 42 };
     -+
     -+	slice_set_string(&in.last_key, "refs/heads/master");
     -+
     -+	struct slice key = {};
     -+	struct record rec = {};
     -+	record_from_index(&rec, &in);
     -+	record_key(rec, &key);
     -+
     -+	assert(0 == slice_compare(key, in.last_key));
     -+
     -+	byte buf[1024];
     -+	struct slice dest = {
     -+		.buf = buf,
     -+		.len = sizeof(buf),
     -+	};
     -+	int n = record_encode(rec, dest, SHA1_SIZE);
     -+	assert(n > 0);
     -+
     -+	byte extra = record_val_type(rec);
     -+	struct index_record out = {};
     -+	struct record out_rec;
     -+	record_from_index(&out_rec, &out);
     -+	int m = record_decode(out_rec, key, extra, dest, SHA1_SIZE);
     -+	assert(m == n);
     -+
     -+	assert(in.offset == out.offset);
     -+
     -+	record_clear(out_rec);
     -+	free(slice_yield(&key));
     -+	free(slice_yield(&in.last_key));
     -+}
     -+
     -+int main()
     -+{
     -+	add_test_case("test_log_record_roundtrip", &test_log_record_roundtrip);
     -+	add_test_case("test_ref_record_roundtrip", &test_ref_record_roundtrip);
     -+	add_test_case("varint_roundtrip", &varint_roundtrip);
     -+	add_test_case("test_key_roundtrip", &test_key_roundtrip);
     -+	add_test_case("test_common_prefix", &test_common_prefix);
     -+	add_test_case("test_obj_record_roundtrip", &test_obj_record_roundtrip);
     -+	add_test_case("test_index_record_roundtrip",
     -+		      &test_index_record_roundtrip);
     -+	add_test_case("test_u24_roundtrip", &test_u24_roundtrip);
     -+	test_main();
     -+}
     -
     - diff --git a/reftable/reftable.h b/reftable/reftable.h
     - new file mode 100644
     - --- /dev/null
     - +++ b/reftable/reftable.h
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#ifndef REFTABLE_H
     -+#define REFTABLE_H
     -+
     -+#include "system.h"
     -+
     -+typedef uint8_t byte;
     -+typedef byte bool;
     -+
     -+/* block_source is a generic wrapper for a seekable readable file.
     -+   It is generally passed around by value.
     -+ */
     -+struct block_source {
     -+	struct block_source_vtable *ops;
     -+	void *arg;
     -+};
     -+
     -+/* a contiguous segment of bytes. It keeps track of its generating block_source
     -+   so it can return itself into the pool.
     -+*/
     -+struct block {
     -+	byte *data;
     -+	int len;
     -+	struct block_source source;
     -+};
     -+
     -+/* block_source_vtable are the operations that make up block_source */
     -+struct block_source_vtable {
     -+	/* returns the size of a block source */
     -+	uint64_t (*size)(void *source);
     -+
     -+	/* reads a segment from the block source. It is an error to read
     -+	   beyond the end of the block */
     -+	int (*read_block)(void *source, struct block *dest, uint64_t off,
     -+			  uint32_t size);
     -+	/* mark the block as read; may return the data back to malloc */
     -+	void (*return_block)(void *source, struct block *blockp);
     -+
     -+	/* release all resources associated with the block source */
     -+	void (*close)(void *source);
     -+};
     -+
     -+/* opens a file on the file system as a block_source */
     -+int block_source_from_file(struct block_source *block_src, const char *name);
     -+
     -+/* write_options sets options for writing a single reftable. */
     -+struct write_options {
     -+	/* do not pad out blocks to block size. */
     -+	bool unpadded;
     -+
     -+	/* the blocksize. Should be less than 2^24. */
     -+	uint32_t block_size;
     -+
     -+	/* do not generate a SHA1 => ref index. */
     -+	bool skip_index_objects;
     -+
     -+	/* how often to write complete keys in each block. */
     -+	int restart_interval;
     -+};
     ++	/* how often to write complete keys in each block. */
     ++	int restart_interval;
     ++};
      +
      +/* ref_record holds a ref database entry target_value */
      +struct ref_record {
     @@ -5049,493 +4272,6 @@
      +
      +#endif
      
     - diff --git a/reftable/reftable_test.c b/reftable/reftable_test.c
     - new file mode 100644
     - --- /dev/null
     - +++ b/reftable/reftable_test.c
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#include "reftable.h"
     -+
     -+#include "system.h"
     -+
     -+#include "basics.h"
     -+#include "block.h"
     -+#include "constants.h"
     -+#include "reader.h"
     -+#include "record.h"
     -+#include "test_framework.h"
     -+
     -+static const int update_index = 5;
     -+
     -+void test_buffer(void)
     -+{
     -+	struct slice buf = {};
     -+
     -+	byte in[] = "hello";
     -+	slice_write(&buf, in, sizeof(in));
     -+	struct block_source source;
     -+	block_source_from_slice(&source, &buf);
     -+	assert(block_source_size(source) == 6);
     -+	struct block out = {};
     -+	int n = block_source_read_block(source, &out, 0, sizeof(in));
     -+	assert(n == sizeof(in));
     -+	assert(!memcmp(in, out.data, n));
     -+	block_source_return_block(source, &out);
     -+
     -+	n = block_source_read_block(source, &out, 1, 2);
     -+	assert(n == 2);
     -+	assert(!memcmp(out.data, "el", 2));
     -+
     -+	block_source_return_block(source, &out);
     -+	block_source_close(&source);
     -+	free(slice_yield(&buf));
     -+}
     -+
     -+void write_table(char ***names, struct slice *buf, int N, int block_size)
     -+{
     -+	*names = calloc(sizeof(char *), N + 1);
     -+
     -+	struct write_options opts = {
     -+		.block_size = block_size,
     -+	};
     -+
     -+	struct writer *w = new_writer(&slice_write_void, buf, &opts);
     -+
     -+	writer_set_limits(w, update_index, update_index);
     -+	{
     -+		struct ref_record ref = {};
     -+		int i = 0;
     -+		for (i = 0; i < N; i++) {
     -+			byte hash[SHA1_SIZE];
     -+			set_test_hash(hash, i);
     -+
     -+			char name[100];
     -+			snprintf(name, sizeof(name), "refs/heads/branch%02d",
     -+				 i);
     -+
     -+			ref.ref_name = name;
     -+			ref.value = hash;
     -+			ref.update_index = update_index;
     -+			(*names)[i] = strdup(name);
     -+
     -+			int n = writer_add_ref(w, &ref);
     -+			assert(n == 0);
     -+		}
     -+	}
     -+	{
     -+		struct log_record log = {};
     -+		int i = 0;
     -+		for (i = 0; i < N; i++) {
     -+			byte hash[SHA1_SIZE];
     -+			set_test_hash(hash, i);
     -+
     -+			char name[100];
     -+			snprintf(name, sizeof(name), "refs/heads/branch%02d",
     -+				 i);
     -+
     -+			log.ref_name = name;
     -+			log.new_hash = hash;
     -+			log.update_index = update_index;
     -+			log.message = "message";
     -+
     -+			int n = writer_add_log(w, &log);
     -+			assert(n == 0);
     -+		}
     -+	}
     -+
     -+	int n = writer_close(w);
     -+	assert(n == 0);
     -+
     -+	struct stats *stats = writer_stats(w);
     -+	int i = 0;
     -+	for (i = 0; i < stats->ref_stats.blocks; i++) {
     -+		int off = i * opts.block_size;
     -+		if (off == 0) {
     -+			off = HEADER_SIZE;
     -+		}
     -+		assert(buf->buf[off] == 'r');
     -+	}
     -+
     -+	writer_free(w);
     -+	w = NULL;
     -+}
     -+
     -+void test_log_write_read(void)
     -+{
     -+	int N = 2;
     -+	char **names = calloc(sizeof(char *), N + 1);
     -+
     -+	struct write_options opts = {
     -+		.block_size = 256,
     -+	};
     -+
     -+	struct slice buf = {};
     -+	struct writer *w = new_writer(&slice_write_void, &buf, &opts);
     -+
     -+	writer_set_limits(w, 0, N);
     -+	{
     -+		struct ref_record ref = {};
     -+		int i = 0;
     -+		for (i = 0; i < N; i++) {
     -+			char name[256];
     -+			snprintf(name, sizeof(name), "b%02d%0*d", i, 130, 7);
     -+			names[i] = strdup(name);
     -+			puts(name);
     -+			ref.ref_name = name;
     -+			ref.update_index = i;
     -+
     -+			int err = writer_add_ref(w, &ref);
     -+			assert_err(err);
     -+		}
     -+	}
     -+
     -+	{
     -+		struct log_record log = {};
     -+		int i = 0;
     -+		for (i = 0; i < N; i++) {
     -+			byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
     -+			set_test_hash(hash1, i);
     -+			set_test_hash(hash2, i + 1);
     -+
     -+			log.ref_name = names[i];
     -+			log.update_index = i;
     -+			log.old_hash = hash1;
     -+			log.new_hash = hash2;
     -+
     -+			int err = writer_add_log(w, &log);
     -+			assert_err(err);
     -+		}
     -+	}
     -+
     -+	int n = writer_close(w);
     -+	assert(n == 0);
     -+
     -+	struct stats *stats = writer_stats(w);
     -+	assert(stats->log_stats.blocks > 0);
     -+	writer_free(w);
     -+	w = NULL;
     -+
     -+	struct block_source source = {};
     -+	block_source_from_slice(&source, &buf);
     -+
     -+	struct reader rd = {};
     -+	int err = init_reader(&rd, source, "file.log");
     -+	assert(err == 0);
     -+
     -+	{
     -+		struct iterator it = {};
     -+		err = reader_seek_ref(&rd, &it, names[N - 1]);
     -+		assert(err == 0);
     -+
     -+		struct ref_record ref = {};
     -+		err = iterator_next_ref(it, &ref);
     -+		assert_err(err);
     -+
     -+		/* end of iteration. */
     -+		err = iterator_next_ref(it, &ref);
     -+		assert(0 < err);
     -+
     -+		iterator_destroy(&it);
     -+		ref_record_clear(&ref);
     -+	}
     -+
     -+	{
     -+		struct iterator it = {};
     -+		err = reader_seek_log(&rd, &it, "");
     -+		assert(err == 0);
     -+
     -+		struct log_record log = {};
     -+		int i = 0;
     -+		while (true) {
     -+			int err = iterator_next_log(it, &log);
     -+			if (err > 0) {
     -+				break;
     -+			}
     -+
     -+			assert_err(err);
     -+			assert_streq(names[i], log.ref_name);
     -+			assert(i == log.update_index);
     -+			i++;
     -+		}
     -+
     -+		assert(i == N);
     -+		iterator_destroy(&it);
     -+	}
     -+
     -+	/* cleanup. */
     -+	free(slice_yield(&buf));
     -+	free_names(names);
     -+	reader_close(&rd);
     -+}
     -+
     -+void test_table_read_write_sequential(void)
     -+{
     -+	char **names;
     -+	struct slice buf = {};
     -+	int N = 50;
     -+	write_table(&names, &buf, N, 256);
     -+
     -+	struct block_source source = {};
     -+	block_source_from_slice(&source, &buf);
     -+
     -+	struct reader rd = {};
     -+	int err = init_reader(&rd, source, "file.ref");
     -+	assert(err == 0);
     -+
     -+	struct iterator it = {};
     -+	err = reader_seek_ref(&rd, &it, "");
     -+	assert(err == 0);
     -+
     -+	int j = 0;
     -+	while (true) {
     -+		struct ref_record ref = {};
     -+		int r = iterator_next_ref(it, &ref);
     -+		assert(r >= 0);
     -+		if (r > 0) {
     -+			break;
     -+		}
     -+		assert(0 == strcmp(names[j], ref.ref_name));
     -+		assert(update_index == ref.update_index);
     -+
     -+		j++;
     -+		ref_record_clear(&ref);
     -+	}
     -+	assert(j == N);
     -+	iterator_destroy(&it);
     -+	free(slice_yield(&buf));
     -+	free_names(names);
     -+
     -+	reader_close(&rd);
     -+}
     -+
     -+void test_table_write_small_table(void)
     -+{
     -+	char **names;
     -+	struct slice buf = {};
     -+	int N = 1;
     -+	write_table(&names, &buf, N, 4096);
     -+	assert(buf.len < 200);
     -+	free(slice_yield(&buf));
     -+	free_names(names);
     -+}
     -+
     -+void test_table_read_api(void)
     -+{
     -+	char **names;
     -+	struct slice buf = {};
     -+	int N = 50;
     -+	write_table(&names, &buf, N, 256);
     -+
     -+	struct reader rd = {};
     -+	struct block_source source = {};
     -+	block_source_from_slice(&source, &buf);
     -+
     -+	int err = init_reader(&rd, source, "file.ref");
     -+	assert(err == 0);
     -+
     -+	struct iterator it = {};
     -+	err = reader_seek_ref(&rd, &it, names[0]);
     -+	assert(err == 0);
     -+
     -+	struct log_record log = {};
     -+	err = iterator_next_log(it, &log);
     -+	assert(err == API_ERROR);
     -+
     -+	free(slice_yield(&buf));
     -+	int i = 0;
     -+	for (i = 0; i < N; i++) {
     -+		free(names[i]);
     -+	}
     -+	free(names);
     -+	reader_close(&rd);
     -+}
     -+
     -+void test_table_read_write_seek(bool index)
     -+{
     -+	char **names;
     -+	struct slice buf = {};
     -+	int N = 50;
     -+	write_table(&names, &buf, N, 256);
     -+
     -+	struct reader rd = {};
     -+	struct block_source source = {};
     -+	block_source_from_slice(&source, &buf);
     -+
     -+	int err = init_reader(&rd, source, "file.ref");
     -+	assert(err == 0);
     -+
     -+	if (!index) {
     -+		rd.ref_offsets.index_offset = 0;
     -+	}
     -+
     -+	int i = 0;
     -+	for (i = 1; i < N; i++) {
     -+		struct iterator it = {};
     -+		int err = reader_seek_ref(&rd, &it, names[i]);
     -+		assert(err == 0);
     -+		struct ref_record ref = {};
     -+		err = iterator_next_ref(it, &ref);
     -+		assert(err == 0);
     -+		assert(0 == strcmp(names[i], ref.ref_name));
     -+		assert(i == ref.value[0]);
     -+
     -+		ref_record_clear(&ref);
     -+		iterator_destroy(&it);
     -+	}
     -+
     -+	free(slice_yield(&buf));
     -+	for (i = 0; i < N; i++) {
     -+		free(names[i]);
     -+	}
     -+	free(names);
     -+	reader_close(&rd);
     -+}
     -+
     -+void test_table_read_write_seek_linear(void)
     -+{
     -+	test_table_read_write_seek(false);
     -+}
     -+
     -+void test_table_read_write_seek_index(void)
     -+{
     -+	test_table_read_write_seek(true);
     -+}
     -+
     -+void test_table_refs_for(bool indexed)
     -+{
     -+	int N = 50;
     -+
     -+	char **want_names = calloc(sizeof(char *), N + 1);
     -+
     -+	int want_names_len = 0;
     -+	byte want_hash[SHA1_SIZE];
     -+	set_test_hash(want_hash, 4);
     -+
     -+	struct write_options opts = {
     -+		.block_size = 256,
     -+	};
     -+
     -+	struct slice buf = {};
     -+	struct writer *w = new_writer(&slice_write_void, &buf, &opts);
     -+	{
     -+		struct ref_record ref = {};
     -+		int i = 0;
     -+		for (i = 0; i < N; i++) {
     -+			byte hash[SHA1_SIZE];
     -+			memset(hash, i, sizeof(hash));
     -+			char fill[51] = {};
     -+			memset(fill, 'x', 50);
     -+			char name[100];
     -+			/* Put the variable part in the start */
     -+			snprintf(name, sizeof(name), "br%02d%s", i, fill);
     -+			name[40] = 0;
     -+			ref.ref_name = name;
     -+
     -+			byte hash1[SHA1_SIZE];
     -+			byte hash2[SHA1_SIZE];
     -+
     -+			set_test_hash(hash1, i / 4);
     -+			set_test_hash(hash2, 3 + i / 4);
     -+			ref.value = hash1;
     -+			ref.target_value = hash2;
     -+
     -+			/* 80 bytes / entry, so 3 entries per block. Yields 17 */
     -+			/* blocks. */
     -+			int n = writer_add_ref(w, &ref);
     -+			assert(n == 0);
     -+
     -+			if (!memcmp(hash1, want_hash, SHA1_SIZE) ||
     -+			    !memcmp(hash2, want_hash, SHA1_SIZE)) {
     -+				want_names[want_names_len++] = strdup(name);
     -+			}
     -+		}
     -+	}
     -+
     -+	int n = writer_close(w);
     -+	assert(n == 0);
     -+
     -+	writer_free(w);
     -+	w = NULL;
     -+
     -+	struct reader rd;
     -+	struct block_source source = {};
     -+	block_source_from_slice(&source, &buf);
     -+
     -+	int err = init_reader(&rd, source, "file.ref");
     -+	assert(err == 0);
     -+	if (!indexed) {
     -+		rd.obj_offsets.present = 0;
     -+	}
     -+
     -+	struct iterator it = {};
     -+	err = reader_seek_ref(&rd, &it, "");
     -+	assert(err == 0);
     -+	iterator_destroy(&it);
     -+
     -+	err = reader_refs_for(&rd, &it, want_hash, SHA1_SIZE);
     -+	assert(err == 0);
     -+
     -+	struct ref_record ref = {};
     -+
     -+	int j = 0;
     -+	while (true) {
     -+		int err = iterator_next_ref(it, &ref);
     -+		assert(err >= 0);
     -+		if (err > 0) {
     -+			break;
     -+		}
     -+
     -+		assert(j < want_names_len);
     -+		assert(0 == strcmp(ref.ref_name, want_names[j]));
     -+		j++;
     -+		ref_record_clear(&ref);
     -+	}
     -+	assert(j == want_names_len);
     -+
     -+	free(slice_yield(&buf));
     -+	free_names(want_names);
     -+	iterator_destroy(&it);
     -+	reader_close(&rd);
     -+}
     -+
     -+void test_table_refs_for_no_index(void)
     -+{
     -+	test_table_refs_for(false);
     -+}
     -+
     -+void test_table_refs_for_obj_index(void)
     -+{
     -+	test_table_refs_for(true);
     -+}
     -+
     -+int main()
     -+{
     -+	add_test_case("test_log_write_read", test_log_write_read);
     -+	add_test_case("test_table_write_small_table",
     -+		      &test_table_write_small_table);
     -+	add_test_case("test_buffer", &test_buffer);
     -+	add_test_case("test_table_read_api", &test_table_read_api);
     -+	add_test_case("test_table_read_write_sequential",
     -+		      &test_table_read_write_sequential);
     -+	add_test_case("test_table_read_write_seek_linear",
     -+		      &test_table_read_write_seek_linear);
     -+	add_test_case("test_table_read_write_seek_index",
     -+		      &test_table_read_write_seek_index);
     -+	add_test_case("test_table_read_write_refs_for_no_index",
     -+		      &test_table_refs_for_no_index);
     -+	add_test_case("test_table_read_write_refs_for_obj_index",
     -+		      &test_table_refs_for_obj_index);
     -+	test_main();
     -+}
     -
       diff --git a/reftable/slice.c b/reftable/slice.c
       new file mode 100644
       --- /dev/null
     @@ -5774,61 +4510,17 @@
      +byte *slice_yield(struct slice *s);
      +void slice_copy(struct slice *dest, struct slice src);
      +void slice_resize(struct slice *s, int l);
     -+int slice_compare(struct slice a, struct slice b);
     -+int slice_write(struct slice *b, byte *data, int sz);
     -+int slice_write_void(void *b, byte *data, int sz);
     -+void slice_append(struct slice *dest, struct slice add);
     -+
     -+struct block_source;
     -+void block_source_from_slice(struct block_source *bs, struct slice *buf);
     -+
     -+struct block_source malloc_block_source(void);
     -+
     -+#endif
     -
     - diff --git a/reftable/slice_test.c b/reftable/slice_test.c
     - new file mode 100644
     - --- /dev/null
     - +++ b/reftable/slice_test.c
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#include "slice.h"
     -+
     -+#include "system.h"
     -+
     -+#include "basics.h"
     -+#include "record.h"
     -+#include "reftable.h"
     -+#include "test_framework.h"
     -+
     -+void test_slice(void)
     -+{
     -+	struct slice s = {};
     -+	slice_set_string(&s, "abc");
     -+	assert(0 == strcmp("abc", slice_as_string(&s)));
     -+
     -+	struct slice t = {};
     -+	slice_set_string(&t, "pqr");
     ++int slice_compare(struct slice a, struct slice b);
     ++int slice_write(struct slice *b, byte *data, int sz);
     ++int slice_write_void(void *b, byte *data, int sz);
     ++void slice_append(struct slice *dest, struct slice add);
      +
     -+	slice_append(&s, t);
     -+	assert(0 == strcmp("abcpqr", slice_as_string(&s)));
     ++struct block_source;
     ++void block_source_from_slice(struct block_source *bs, struct slice *buf);
      +
     -+	free(slice_yield(&s));
     -+	free(slice_yield(&t));
     -+}
     ++struct block_source malloc_block_source(void);
      +
     -+int main()
     -+{
     -+	add_test_case("test_slice", &test_slice);
     -+	test_main();
     -+}
     ++#endif
      
       diff --git a/reftable/stack.c b/reftable/stack.c
       new file mode 100644
     @@ -5936,10 +4628,8 @@
      +	merged_table_free(st->merged);
      +	st->merged = NULL;
      +
     -+	free(st->list_file);
     -+	st->list_file = NULL;
     -+	free(st->reftable_dir);
     -+	st->reftable_dir = NULL;
     ++	FREE_AND_NULL(st->list_file);
     ++	FREE_AND_NULL(st->reftable_dir);
      +	free(st);
      +}
      +
     @@ -6170,7 +4860,7 @@
      +static void format_name(struct slice *dest, uint64_t min, uint64_t max)
      +{
      +	char buf[100];
     -+	snprintf(buf, sizeof(buf), "%012lx-%012lx", min, max);
     ++	snprintf(buf, sizeof(buf), "%012" PRIxMAX "-%012" PRIxMAX, min, max);
      +	slice_set_string(dest, buf);
      +}
      +
     @@ -6867,293 +5557,6 @@
      +
      +#endif
      
     - diff --git a/reftable/stack_test.c b/reftable/stack_test.c
     - new file mode 100644
     - --- /dev/null
     - +++ b/reftable/stack_test.c
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#include "stack.h"
     -+
     -+#include "system.h"
     -+
     -+#include "basics.h"
     -+#include "constants.h"
     -+#include "record.h"
     -+#include "reftable.h"
     -+#include "test_framework.h"
     -+
     -+void test_read_file(void)
     -+{
     -+	char fn[256] = "/tmp/stack.test_read_file.XXXXXX";
     -+	int fd = mkstemp(fn);
     -+	assert(fd > 0);
     -+
     -+	char out[1024] = "line1\n\nline2\nline3";
     -+
     -+	int n = write(fd, out, strlen(out));
     -+	assert(n == strlen(out));
     -+	int err = close(fd);
     -+	assert(err >= 0);
     -+
     -+	char **names = NULL;
     -+	err = read_lines(fn, &names);
     -+	assert_err(err);
     -+
     -+	char *want[] = { "line1", "line2", "line3" };
     -+	int i = 0;
     -+	for (i = 0; names[i] != NULL; i++) {
     -+		assert(0 == strcmp(want[i], names[i]));
     -+	}
     -+	free_names(names);
     -+	remove(fn);
     -+}
     -+
     -+void test_parse_names(void)
     -+{
     -+	char buf[] = "line\n";
     -+	char **names = NULL;
     -+	parse_names(buf, strlen(buf), &names);
     -+
     -+	assert(NULL != names[0]);
     -+	assert(0 == strcmp(names[0], "line"));
     -+	assert(NULL == names[1]);
     -+	free_names(names);
     -+}
     -+
     -+void test_names_equal(void)
     -+{
     -+	char *a[] = { "a", "b", "c", NULL };
     -+	char *b[] = { "a", "b", "d", NULL };
     -+	char *c[] = { "a", "b", NULL };
     -+
     -+	assert(names_equal(a, a));
     -+	assert(!names_equal(a, b));
     -+	assert(!names_equal(a, c));
     -+}
     -+
     -+int write_test_ref(struct writer *wr, void *arg)
     -+{
     -+	struct ref_record *ref = arg;
     -+
     -+	writer_set_limits(wr, ref->update_index, ref->update_index);
     -+	int err = writer_add_ref(wr, ref);
     -+
     -+	return err;
     -+}
     -+
     -+int write_test_log(struct writer *wr, void *arg)
     -+{
     -+	struct log_record *log = arg;
     -+
     -+	writer_set_limits(wr, log->update_index, log->update_index);
     -+	int err = writer_add_log(wr, log);
     -+
     -+	return err;
     -+}
     -+
     -+void test_stack_add(void)
     -+{
     -+	int i = 0;
     -+	char dir[256] = "/tmp/stack.test_stack_add.XXXXXX";
     -+	assert(mkdtemp(dir));
     -+	printf("%s\n", dir);
     -+	char fn[256] = "";
     -+	strcat(fn, dir);
     -+	strcat(fn, "/refs");
     -+
     -+	struct write_options cfg = {};
     -+	struct stack *st = NULL;
     -+	int err = new_stack(&st, dir, fn, cfg);
     -+	assert_err(err);
     -+
     -+	struct ref_record refs[2] = {};
     -+	struct log_record logs[2] = {};
     -+	int N = ARRAYSIZE(refs);
     -+	for (i = 0; i < N; i++) {
     -+		char buf[256];
     -+		snprintf(buf, sizeof(buf), "branch%02d", i);
     -+		refs[i].ref_name = strdup(buf);
     -+		refs[i].value = malloc(SHA1_SIZE);
     -+		refs[i].update_index = i + 1;
     -+		set_test_hash(refs[i].value, i);
     -+
     -+		logs[i].ref_name = strdup(buf);
     -+		logs[i].update_index = N + i + 1;
     -+		logs[i].new_hash = malloc(SHA1_SIZE);
     -+		logs[i].email = strdup("identity@invalid");
     -+		set_test_hash(logs[i].new_hash, i);
     -+	}
     -+
     -+	for (i = 0; i < N; i++) {
     -+		int err = stack_add(st, &write_test_ref, &refs[i]);
     -+		assert_err(err);
     -+	}
     -+
     -+	for (i = 0; i < N; i++) {
     -+		int err = stack_add(st, &write_test_log, &logs[i]);
     -+		assert_err(err);
     -+	}
     -+
     -+	err = stack_compact_all(st, NULL);
     -+	assert_err(err);
     -+
     -+	for (i = 0; i < N; i++) {
     -+		struct ref_record dest = {};
     -+		int err = stack_read_ref(st, refs[i].ref_name, &dest);
     -+		assert_err(err);
     -+		assert(ref_record_equal(&dest, refs + i, SHA1_SIZE));
     -+		ref_record_clear(&dest);
     -+	}
     -+
     -+	for (i = 0; i < N; i++) {
     -+		struct log_record dest = {};
     -+		int err = stack_read_log(st, refs[i].ref_name, &dest);
     -+		assert_err(err);
     -+		assert(log_record_equal(&dest, logs + i, SHA1_SIZE));
     -+		log_record_clear(&dest);
     -+	}
     -+
     -+	/* cleanup */
     -+	stack_destroy(st);
     -+	for (i = 0; i < N; i++) {
     -+		ref_record_clear(&refs[i]);
     -+		log_record_clear(&logs[i]);
     -+	}
     -+}
     -+
     -+void test_log2(void)
     -+{
     -+	assert(1 == fastlog2(3));
     -+	assert(2 == fastlog2(4));
     -+	assert(2 == fastlog2(5));
     -+}
     -+
     -+void test_sizes_to_segments(void)
     -+{
     -+	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
     -+	/* .................0  1  2  3  4  5 */
     -+
     -+	int seglen = 0;
     -+	struct segment *segs =
     -+		sizes_to_segments(&seglen, sizes, ARRAYSIZE(sizes));
     -+	assert(segs[2].log == 3);
     -+	assert(segs[2].start == 5);
     -+	assert(segs[2].end == 6);
     -+
     -+	assert(segs[1].log == 2);
     -+	assert(segs[1].start == 2);
     -+	assert(segs[1].end == 5);
     -+	free(segs);
     -+}
     -+
     -+void test_suggest_compaction_segment(void)
     -+{
     -+	{
     -+		uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
     -+		/* .................0    1    2  3   4  5  6 */
     -+		struct segment min =
     -+			suggest_compaction_segment(sizes, ARRAYSIZE(sizes));
     -+		assert(min.start == 2);
     -+		assert(min.end == 7);
     -+	}
     -+
     -+	{
     -+		uint64_t sizes[] = { 64, 32, 16, 8, 4, 2 };
     -+		struct segment result =
     -+			suggest_compaction_segment(sizes, ARRAYSIZE(sizes));
     -+		assert(result.start == result.end);
     -+	}
     -+}
     -+
     -+void test_reflog_expire(void)
     -+{
     -+	char dir[256] = "/tmp/stack.test_reflog_expire.XXXXXX";
     -+	assert(mkdtemp(dir));
     -+	printf("%s\n", dir);
     -+	char fn[256] = "";
     -+	strcat(fn, dir);
     -+	strcat(fn, "/refs");
     -+
     -+	struct write_options cfg = {};
     -+	struct stack *st = NULL;
     -+	int err = new_stack(&st, dir, fn, cfg);
     -+	assert_err(err);
     -+
     -+	struct log_record logs[20] = {};
     -+	int N = ARRAYSIZE(logs) - 1;
     -+	int i = 0;
     -+	for (i = 1; i <= N; i++) {
     -+		char buf[256];
     -+		snprintf(buf, sizeof(buf), "branch%02d", i);
     -+
     -+		logs[i].ref_name = strdup(buf);
     -+		logs[i].update_index = i;
     -+		logs[i].time = i;
     -+		logs[i].new_hash = malloc(SHA1_SIZE);
     -+		logs[i].email = strdup("identity@invalid");
     -+		set_test_hash(logs[i].new_hash, i);
     -+	}
     -+
     -+	for (i = 1; i <= N; i++) {
     -+		int err = stack_add(st, &write_test_log, &logs[i]);
     -+		assert_err(err);
     -+	}
     -+
     -+	err = stack_compact_all(st, NULL);
     -+	assert_err(err);
     -+
     -+	struct log_expiry_config expiry = {
     -+		.time = 10,
     -+	};
     -+	err = stack_compact_all(st, &expiry);
     -+	assert_err(err);
     -+
     -+	struct log_record log = {};
     -+	err = stack_read_log(st, logs[9].ref_name, &log);
     -+	assert(err == 1);
     -+
     -+	err = stack_read_log(st, logs[11].ref_name, &log);
     -+	assert_err(err);
     -+
     -+	expiry.min_update_index = 15;
     -+	err = stack_compact_all(st, &expiry);
     -+	assert_err(err);
     -+
     -+	err = stack_read_log(st, logs[14].ref_name, &log);
     -+	assert(err == 1);
     -+
     -+	err = stack_read_log(st, logs[16].ref_name, &log);
     -+	assert_err(err);
     -+
     -+	/* cleanup */
     -+	stack_destroy(st);
     -+	for (i = 0; i < N; i++) {
     -+		log_record_clear(&logs[i]);
     -+	}
     -+}
     -+
     -+int main()
     -+{
     -+	add_test_case("test_reflog_expire", test_reflog_expire);
     -+	add_test_case("test_suggest_compaction_segment",
     -+		      &test_suggest_compaction_segment);
     -+	add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
     -+	add_test_case("test_log2", &test_log2);
     -+	add_test_case("test_parse_names", &test_parse_names);
     -+	add_test_case("test_read_file", &test_read_file);
     -+	add_test_case("test_names_equal", &test_names_equal);
     -+	add_test_case("test_stack_add", &test_stack_add);
     -+	test_main();
     -+}
     -
       diff --git a/reftable/system.h b/reftable/system.h
       new file mode 100644
       --- /dev/null
     @@ -7173,8 +5576,12 @@
      +#include "config.h"
      +
      +#ifndef REFTABLE_STANDALONE
     ++
      +#include "git-compat-util.h"
     ++#include <zlib.h>
     ++
      +#else
     ++
      +#include <assert.h>
      +#include <errno.h>
      +#include <fcntl.h>
     @@ -7191,151 +5598,25 @@
      +#define PRIuMAX "lu"
      +#define PRIdMAX "ld"
      +#define PRIxMAX "lx"
     -+
     -+#endif
     -+
     -+#endif
     -
     - diff --git a/reftable/test_framework.c b/reftable/test_framework.c
     - new file mode 100644
     - --- /dev/null
     - +++ b/reftable/test_framework.c
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#include "test_framework.h"
     -+
     -+#include "system.h"
     -+
     -+#include "constants.h"
     -+
     -+struct test_case **test_cases;
     -+int test_case_len;
     -+int test_case_cap;
     -+
     -+struct test_case *new_test_case(const char *name, void (*testfunc)())
     -+{
     -+	struct test_case *tc = malloc(sizeof(struct test_case));
     -+	tc->name = name;
     -+	tc->testfunc = testfunc;
     -+	return tc;
     -+}
     -+
     -+struct test_case *add_test_case(const char *name, void (*testfunc)())
     -+{
     -+	struct test_case *tc = new_test_case(name, testfunc);
     -+	if (test_case_len == test_case_cap) {
     -+		test_case_cap = 2 * test_case_cap + 1;
     -+		test_cases = realloc(test_cases,
     -+				     sizeof(struct test_case) * test_case_cap);
     -+	}
     -+
     -+	test_cases[test_case_len++] = tc;
     -+	return tc;
     -+}
     -+
     -+void test_main()
     -+{
     -+	int i = 0;
     -+	for (i = 0; i < test_case_len; i++) {
     -+		printf("case %s\n", test_cases[i]->name);
     -+		test_cases[i]->testfunc();
     -+	}
     -+}
     -+
     -+void set_test_hash(byte *p, int i)
     -+{
     -+	memset(p, (byte)i, SHA1_SIZE);
     -+}
     -+
     -+void print_names(char **a)
     -+{
     -+	if (a == NULL || *a == NULL) {
     -+		puts("[]");
     -+		return;
     -+	}
     -+	puts("[");
     -+	char **p = a;
     -+	while (*p) {
     -+		puts(*p);
     -+		p++;
     -+	}
     -+	puts("]");
     -+}
     -
     - diff --git a/reftable/test_framework.h b/reftable/test_framework.h
     - new file mode 100644
     - --- /dev/null
     - +++ b/reftable/test_framework.h
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#ifndef TEST_FRAMEWORK_H
     -+#define TEST_FRAMEWORK_H
     -+
     -+#include "system.h"
     -+
     -+#include "reftable.h"
     -+
     -+#ifdef NDEBUG
     -+#undef NDEBUG
     -+#endif
     -+
     -+#include "system.h"
     -+
     -+#ifdef assert
     -+#undef assert
     ++#define ARRAY_SIZE(a) sizeof((a)) / sizeof((a)[0])
     ++#define FREE_AND_NULL(x)    \
     ++	do {                \
     ++		free(x);    \
     ++		(x) = NULL; \
     ++	} while (0)
     ++#define QSORT(arr, n, cmp) qsort(arr, n, sizeof(arr[0]), cmp)
     ++#define SWAP(a, b) \
     ++  { \
     ++     char tmp[sizeof(a)]; \
     ++     assert(sizeof(a)==sizeof(b)); \
     ++     memcpy(&tmp[0], &a, sizeof(a)); \
     ++     memcpy(&a, &b, sizeof(a)); \
     ++     memcpy(&b, &tmp[0], sizeof(a)); \
     ++  }
      +#endif
      +
     -+#define assert_err(c)                                                 \
     -+	if (c != 0) {                                                 \
     -+		fflush(stderr);                                       \
     -+		fflush(stdout);                                       \
     -+		fprintf(stderr, "%s: %d: error == %d (%s), want 0\n", \
     -+			__FILE__, __LINE__, c, error_str(c));         \
     -+		abort();                                              \
     -+	}
     -+
     -+#define assert_streq(a, b)                                               \
     -+	if (strcmp(a, b)) {                                              \
     -+		fflush(stderr);                                          \
     -+		fflush(stdout);                                          \
     -+		fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, \
     -+			__LINE__, #a, a, #b, b);                         \
     -+		abort();                                                 \
     -+	}
     -+
     -+#define assert(c)                                                          \
     -+	if (!(c)) {                                                        \
     -+		fflush(stderr);                                            \
     -+		fflush(stdout);                                            \
     -+		fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, \
     -+			__LINE__, #c);                                     \
     -+		abort();                                                   \
     -+	}
     -+
     -+struct test_case {
     -+	const char *name;
     -+	void (*testfunc)();
     -+};
     -+
     -+struct test_case *new_test_case(const char *name, void (*testfunc)());
     -+struct test_case *add_test_case(const char *name, void (*testfunc)());
     -+void test_main();
     -+
     -+void set_test_hash(byte *p, int i);
     ++int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
     ++			       const Bytef *source, uLong *sourceLen);
      +
      +#endif
      
     @@ -7441,73 +5722,6 @@
      +
      +#endif
      
     - diff --git a/reftable/tree_test.c b/reftable/tree_test.c
     - new file mode 100644
     - --- /dev/null
     - +++ b/reftable/tree_test.c
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#include "tree.h"
     -+
     -+#include "basics.h"
     -+#include "record.h"
     -+#include "reftable.h"
     -+#include "test_framework.h"
     -+
     -+static int test_compare(const void *a, const void *b)
     -+{
     -+	return a - b;
     -+}
     -+
     -+struct curry {
     -+	void *last;
     -+};
     -+
     -+void check_increasing(void *arg, void *key)
     -+{
     -+	struct curry *c = (struct curry *)arg;
     -+	if (c->last != NULL) {
     -+		assert(test_compare(c->last, key) < 0);
     -+	}
     -+	c->last = key;
     -+}
     -+
     -+void test_tree()
     -+{
     -+	struct tree_node *root = NULL;
     -+
     -+	void *values[11] = {};
     -+	struct tree_node *nodes[11] = {};
     -+	int i = 1;
     -+	do {
     -+		nodes[i] = tree_search(values + i, &root, &test_compare, 1);
     -+		i = (i * 7) % 11;
     -+	} while (i != 1);
     -+
     -+	for (i = 1; i < ARRAYSIZE(nodes); i++) {
     -+		assert(values + i == nodes[i]->key);
     -+		assert(nodes[i] ==
     -+		       tree_search(values + i, &root, &test_compare, 0));
     -+	}
     -+
     -+	struct curry c = {};
     -+	infix_walk(root, check_increasing, &c);
     -+	tree_free(root);
     -+}
     -+
     -+int main()
     -+{
     -+	add_test_case("test_tree", &test_tree);
     -+	test_main();
     -+}
     -
       diff --git a/reftable/writer.c b/reftable/writer.c
       new file mode 100644
       --- /dev/null
     @@ -7764,7 +5978,7 @@
      +{
      +	int err = 0;
      +	int i = 0;
     -+	qsort(refs, n, sizeof(struct ref_record), ref_record_compare_name);
     ++	QSORT(refs, n, ref_record_compare_name);
      +	for (i = 0; err == 0 && i < n; i++) {
      +		err = writer_add_ref(w, &refs[i]);
      +	}
     @@ -7801,7 +6015,7 @@
      +{
      +	int err = 0;
      +	int i = 0;
     -+	qsort(logs, n, sizeof(struct log_record), log_record_compare_key);
     ++	QSORT(logs, n, log_record_compare_key);
      +	for (i = 0; err == 0 && i < n; i++) {
      +		err = writer_add_log(w, &logs[i]);
      +	}
     @@ -7948,8 +6162,7 @@
      +{
      +	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
      +
     -+	free(entry->offsets);
     -+	entry->offsets = NULL;
     ++	FREE_AND_NULL(entry->offsets);
      +	free(slice_yield(&entry->hash));
      +	free(entry);
      +}
     @@ -8053,8 +6266,7 @@
      +		free(slice_yield(&w->index[i].last_key));
      +	}
      +
     -+	free(w->index);
     -+	w->index = NULL;
     ++	FREE_AND_NULL(w->index);
      +	w->index_len = 0;
      +	w->index_cap = 0;
      +}
     @@ -8189,3 +6401,101 @@
      +int writer_finish_public_section(struct writer *w);
      +
      +#endif
     +
     + diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
     + new file mode 100644
     + --- /dev/null
     + +++ b/reftable/zlib-compat.c
     +@@
     ++/* taken from zlib's uncompr.c
     ++
     ++   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
     ++   Author: Mark Adler <madler@alumni.caltech.edu>
     ++   Date:   Sun Jan 15 09:18:46 2017 -0800
     ++
     ++       zlib 1.2.11
     ++
     ++*/
     ++
     ++/*
     ++ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
     ++ * For conditions of distribution and use, see copyright notice in zlib.h
     ++ */
     ++
     ++#include "system.h"
     ++
     ++/* clang-format off */
     ++
     ++/* ===========================================================================
     ++     Decompresses the source buffer into the destination buffer.  *sourceLen is
     ++   the byte length of the source buffer. Upon entry, *destLen is the total size
     ++   of the destination buffer, which must be large enough to hold the entire
     ++   uncompressed data. (The size of the uncompressed data must have been saved
     ++   previously by the compressor and transmitted to the decompressor by some
     ++   mechanism outside the scope of this compression library.) Upon exit,
     ++   *destLen is the size of the decompressed data and *sourceLen is the number
     ++   of source bytes consumed. Upon return, source + *sourceLen points to the
     ++   first unused input byte.
     ++
     ++     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
     ++   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
     ++   Z_DATA_ERROR if the input data was corrupted, including if the input data is
     ++   an incomplete zlib stream.
     ++*/
     ++int ZEXPORT uncompress_return_consumed (
     ++    Bytef *dest,
     ++    uLongf *destLen,
     ++    const Bytef *source,
     ++    uLong *sourceLen) {
     ++    z_stream stream;
     ++    int err;
     ++    const uInt max = (uInt)-1;
     ++    uLong len, left;
     ++    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
     ++
     ++    len = *sourceLen;
     ++    if (*destLen) {
     ++        left = *destLen;
     ++        *destLen = 0;
     ++    }
     ++    else {
     ++        left = 1;
     ++        dest = buf;
     ++    }
     ++
     ++    stream.next_in = (z_const Bytef *)source;
     ++    stream.avail_in = 0;
     ++    stream.zalloc = (alloc_func)0;
     ++    stream.zfree = (free_func)0;
     ++    stream.opaque = (voidpf)0;
     ++
     ++    err = inflateInit(&stream);
     ++    if (err != Z_OK) return err;
     ++
     ++    stream.next_out = dest;
     ++    stream.avail_out = 0;
     ++
     ++    do {
     ++        if (stream.avail_out == 0) {
     ++            stream.avail_out = left > (uLong)max ? max : (uInt)left;
     ++            left -= stream.avail_out;
     ++        }
     ++        if (stream.avail_in == 0) {
     ++            stream.avail_in = len > (uLong)max ? max : (uInt)len;
     ++            len -= stream.avail_in;
     ++        }
     ++        err = inflate(&stream, Z_NO_FLUSH);
     ++    } while (err == Z_OK);
     ++
     ++    *sourceLen -= len + stream.avail_in;
     ++    if (dest != buf)
     ++        *destLen = stream.total_out;
     ++    else if (stream.total_out && err == Z_BUF_ERROR)
     ++        left = 1;
     ++
     ++    inflateEnd(&stream);
     ++    return err == Z_STREAM_END ? Z_OK :
     ++           err == Z_NEED_DICT ? Z_DATA_ERROR  :
     ++           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
     ++           err;
     ++}
 5:  721201269d ! 6:  0b2a1a81d6 Reftable support for git-core
     @@ -6,12 +6,14 @@
      
          TODO:
      
     +     * Make CI on gitgitgadget compile.
     +     * Redo interaction between repo config, repo storage version
           * Resolve the design problem with reflog expiry.
           * Resolve spots marked with XXX
      
          Example use:
      
     -      $ ~/vc/git/git init
     +      $ ~/vc/git/git init --reftable
            warning: templates not found in /usr/local/google/home/hanwen/share/git-core/templates
            Initialized empty Git repository in /tmp/qz/.git/
            $ echo q > a
     @@ -36,7 +38,6 @@
            ** LOGS **
            reftable.LogRecord{RefName:"HEAD", UpdateIndex:0x2, New:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, Old:[]uint8{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Name:"Han-Wen Nienhuys", Email:"hanwen@google.com", Time:0x5e29ef27, TZOffset:100, Message:"commit (initial): x\n"}
      
     -    Change-Id: I225ee6317b7911edf9aa95f43299f6c7c4511914
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
       diff --git a/Makefile b/Makefile
     @@ -84,6 +85,7 @@
      +REFTABLE_OBJS += reftable/stack.o
      +REFTABLE_OBJS += reftable/tree.o
      +REFTABLE_OBJS += reftable/writer.o
     ++REFTABLE_OBJS += reftable/zlib-compat.o
      +
      +
       TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
     @@ -106,6 +108,114 @@ $^
       
       Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
      
     + diff --git a/builtin/init-db.c b/builtin/init-db.c
     + --- a/builtin/init-db.c
     + +++ b/builtin/init-db.c
     +@@
     + }
     + 
     + static int create_default_files(const char *template_path,
     +-				const char *original_git_dir)
     ++				const char *original_git_dir, int flags)
     + {
     + 	struct stat st1;
     + 	struct strbuf buf = STRBUF_INIT;
     +@@
     + 	 * Create the default symlink from ".git/HEAD" to the "master"
     + 	 * branch, if it does not exist yet.
     + 	 */
     +-	path = git_path_buf(&buf, "HEAD");
     +-	reinit = (!access(path, R_OK)
     +-		  || readlink(path, junk, sizeof(junk)-1) != -1);
     ++	if (flags & INIT_DB_REFTABLE) {
     ++		reinit = 0; /* XXX - how do we recognize a reinit,
     ++			     * and what should we do? */
     ++	} else {
     ++		path = git_path_buf(&buf, "HEAD");
     ++		reinit = (!access(path, R_OK) ||
     ++			  readlink(path, junk, sizeof(junk) - 1) != -1);
     ++	}
     ++
     + 	if (!reinit) {
     + 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
     + 			exit(1);
     + 	}
     + 
     + 	/* This forces creation of new config file */
     +-	xsnprintf(repo_version_string, sizeof(repo_version_string),
     +-		  "%d", GIT_REPO_VERSION);
     ++	xsnprintf(repo_version_string, sizeof(repo_version_string), "%d",
     ++		  flags & INIT_DB_REFTABLE ? GIT_REPO_VERSION_READ :
     ++					     GIT_REPO_VERSION);
     + 	git_config_set("core.repositoryformatversion", repo_version_string);
     + 
     + 	/* Check filemode trustability */
     +@@
     + 	 */
     + 	check_repository_format();
     + 
     +-	reinit = create_default_files(template_dir, original_git_dir);
     ++	reinit = create_default_files(template_dir, original_git_dir, flags);
     + 
     + 	create_object_directory();
     + 
     +@@
     + 		git_config_set("receive.denyNonFastforwards", "true");
     + 	}
     + 
     ++	if (flags & INIT_DB_REFTABLE) {
     ++		git_config_set("extensions.refStorage", "reftable");
     ++	}
     ++
     + 	if (!(flags & INIT_DB_QUIET)) {
     + 		int len = strlen(git_dir);
     + 
     +@@
     + 	const char *template_dir = NULL;
     + 	unsigned int flags = 0;
     + 	const struct option init_db_options[] = {
     +-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
     +-				N_("directory from which templates will be used")),
     ++		OPT_STRING(0, "template", &template_dir,
     ++			   N_("template-directory"),
     ++			   N_("directory from which templates will be used")),
     + 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
     +-				N_("create a bare repository"), 1),
     ++			    N_("create a bare repository"), 1),
     + 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
     +-			N_("permissions"),
     +-			N_("specify that the git repository is to be shared amongst several users"),
     +-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
     ++		  N_("permissions"),
     ++		  N_("specify that the git repository is to be shared amongst several users"),
     ++		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
     + 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
     ++		OPT_BIT(0, "reftable", &flags, N_("use reftable"),
     ++			INIT_DB_REFTABLE),
     + 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
     + 			   N_("separate git dir from working tree")),
     + 		OPT_END()
     +
     + diff --git a/cache.h b/cache.h
     + --- a/cache.h
     + +++ b/cache.h
     +@@
     + 
     + #define INIT_DB_QUIET 0x0001
     + #define INIT_DB_EXIST_OK 0x0002
     ++#define INIT_DB_REFTABLE 0x0004
     + 
     + int init_db(const char *git_dir, const char *real_git_dir,
     + 	    const char *template_dir, unsigned int flags);
     +@@
     + 	int is_bare;
     + 	int hash_algo;
     + 	char *work_tree;
     ++	char *ref_storage;
     + 	struct string_list unknown_extensions;
     + };
     + 
     +
       diff --git a/refs.c b/refs.c
       --- a/refs.c
       +++ b/refs.c
     @@ -119,43 +229,65 @@ $^
       static struct ref_storage_be *find_ref_storage_backend(const char *name)
       {
      @@
     - static struct ref_store *ref_store_init(const char *gitdir,
     +  * Create, record, and return a ref_store instance for the specified
     +  * gitdir.
     +  */
     +-static struct ref_store *ref_store_init(const char *gitdir,
     ++static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
       					unsigned int flags)
       {
      -	const char *be_name = "files";
      -	struct ref_storage_be *be = find_ref_storage_backend(be_name);
     -+	struct strbuf refs_path = STRBUF_INIT;
     -+
     -+        /* XXX this should probably come from a git config setting and not
     -+           default to reftable. */
     -+	const char *be_name = "reftable";
      +	struct ref_storage_be *be;
       	struct ref_store *refs;
       
     -+	strbuf_addstr(&refs_path, gitdir);
     -+	strbuf_addstr(&refs_path, "/refs");
     -+
     -+	if (!is_directory(refs_path.buf)) {
     -+		be_name = "reftable";
     -+	}
     -+	strbuf_release(&refs_path);
      +	be = find_ref_storage_backend(be_name);
       	if (!be)
       		BUG("reference backend %s is unknown", be_name);
       
     -
     - diff --git a/refs.h b/refs.h
     - --- a/refs.h
     - +++ b/refs.h
      @@
     - 				     const char *refname,
     - 				     each_reflog_ent_fn fn,
     - 				     void *cb_data);
     -+
     -+/* XXX which ordering are these? Newest or oldest first? */
     - int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
     - int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
     + 	if (!r->gitdir)
     + 		BUG("attempting to get main_ref_store outside of repository");
       
     +-	r->refs = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
     ++	r->refs = ref_store_init(r->gitdir,
     ++				 /* XXX r->ref_storage_format == NULL. Where
     ++				  * should the config file be parsed out? */
     ++				 r->ref_storage_format ? r->ref_storage_format :
     ++							 "reftable",
     ++				 REF_STORE_ALL_CAPS);
     + 	return r->refs;
     + }
     + 
     +@@
     + 		goto done;
     + 
     + 	/* assume that add_submodule_odb() has been called */
     +-	refs = ref_store_init(submodule_sb.buf,
     ++	refs = ref_store_init(submodule_sb.buf, "files", /* XXX */
     + 			      REF_STORE_READ | REF_STORE_ODB);
     + 	register_ref_store_map(&submodule_ref_stores, "submodule",
     + 			       refs, submodule);
     +@@
     + 
     + struct ref_store *get_worktree_ref_store(const struct worktree *wt)
     + {
     ++	const char *format = "files"; /* XXX */
     + 	struct ref_store *refs;
     + 	const char *id;
     + 
     +@@
     + 
     + 	if (wt->id)
     + 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
     +-				      REF_STORE_ALL_CAPS);
     ++				      format, REF_STORE_ALL_CAPS);
     + 	else
     +-		refs = ref_store_init(get_git_common_dir(),
     ++		refs = ref_store_init(get_git_common_dir(), format,
     + 				      REF_STORE_ALL_CAPS);
     + 
     + 	if (refs)
      
       diff --git a/refs/refs-internal.h b/refs/refs-internal.h
       --- a/refs/refs-internal.h
     @@ -184,6 +316,11 @@ $^
      +
      +#include "../reftable/reftable.h"
      +
     ++/*
     ++  The reftable v1 spec only supports 20-byte binary SHA1s. A new format version
     ++  will be needed to support SHA256.
     ++ */
     ++
      +extern struct ref_storage_be refs_be_reftable;
      +
      +struct reftable_ref_store {
     @@ -244,16 +381,28 @@ $^
      +	base_ref_store_init(ref_store, &refs_be_reftable);
      +	refs->store_flags = store_flags;
      +
     -+	strbuf_addf(&sb, "%s/refs", path);
     -+	refs->table_list_file = xstrdup(sb.buf);
     -+	strbuf_reset(&sb);
      +	strbuf_addf(&sb, "%s/reftable", path);
      +	refs->reftable_dir = xstrdup(sb.buf);
     -+	strbuf_release(&sb);
     ++	strbuf_reset(&sb);
     ++
     ++	strbuf_addf(&sb, "%s/reftable/tables.list", path);
     ++	refs->table_list_file = xstrdup(sb.buf);
     ++	strbuf_reset(&sb);
     ++
     ++	strbuf_addf(&sb, "%s/refs", path);
     ++	safe_create_dir(sb.buf, 1);
     ++	strbuf_reset(&sb);
     ++
     ++	strbuf_addf(&sb, "%s/HEAD", path);
     ++	write_file(sb.buf, "ref: refs/.invalid");
     ++	strbuf_reset(&sb);
     ++
     ++	strbuf_addf(&sb, "%s/refs/heads", path);
     ++	write_file(sb.buf, "this repository uses the reftable format");
      +
      +	refs->err = new_stack(&refs->stack, refs->reftable_dir,
      +			      refs->table_list_file, cfg);
     -+
     ++	strbuf_release(&sb);
      +	return ref_store;
      +}
      +
     @@ -261,13 +410,13 @@ $^
      +{
      +	struct reftable_ref_store *refs =
      +		(struct reftable_ref_store *)ref_store;
     ++
     ++	safe_create_dir(refs->reftable_dir, 1);
      +	FILE *f = fopen(refs->table_list_file, "a");
      +	if (f == NULL) {
      +		return -1;
      +	}
      +	fclose(f);
     -+
     -+	safe_create_dir(refs->reftable_dir, 1);
      +	return 0;
      +}
      +
     @@ -277,31 +426,33 @@ $^
      +	struct ref_record ref;
      +	struct object_id oid;
      +	struct ref_store *ref_store;
     ++	unsigned int flags;
     ++	int err;
      +	char *prefix;
      +};
      +
      +static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
      +{
     -+	while (1) {
     -+		struct reftable_iterator *ri =
     -+			(struct reftable_iterator *)ref_iterator;
     -+		int err = iterator_next_ref(ri->iter, &ri->ref);
     -+		if (err > 0) {
     -+			return ITER_DONE;
     -+		}
     -+		if (err < 0) {
     -+			return ITER_ERROR;
     ++	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
     ++	while (ri->err == 0) {
     ++		ri->err = iterator_next_ref(ri->iter, &ri->ref);
     ++		if (ri->err) {
     ++			break;
      +		}
      +
      +		ri->base.refname = ri->ref.ref_name;
      +		if (ri->prefix != NULL &&
      +		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
     -+			return ITER_DONE;
     ++			ri->err = 1;
     ++			break;
      +		}
     ++		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
     ++		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
     ++			continue;
      +
      +		ri->base.flags = 0;
      +		if (ri->ref.value != NULL) {
     -+			memcpy(ri->oid.hash, ri->ref.value, GIT_SHA1_RAWSZ);
     ++			hashcpy(ri->oid.hash, ri->ref.value);
      +		} else if (ri->ref.target != NULL) {
      +			int out_flags = 0;
      +			const char *resolved = refs_resolve_ref_unsafe(
     @@ -309,14 +460,30 @@ $^
      +				RESOLVE_REF_READING, &ri->oid, &out_flags);
      +			ri->base.flags = out_flags;
      +			if (resolved == NULL &&
     -+			    !(ri->base.flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
     ++			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
      +			    (ri->base.flags & REF_ISBROKEN)) {
      +				continue;
      +			}
      +		}
     ++
      +		ri->base.oid = &ri->oid;
     -+		return ITER_OK;
     ++		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
     ++		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
     ++					    ri->base.flags)) {
     ++			continue;
     ++		}
     ++
     ++		break;
     ++	}
     ++
     ++	if (ri->err > 0) {
     ++		return ITER_DONE;
      +	}
     ++	if (ri->err < 0) {
     ++		return ITER_ERROR;
     ++	}
     ++
     ++	return ITER_OK;
      +}
      +
      +static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
     @@ -324,7 +491,7 @@ $^
      +{
      +	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
      +	if (ri->ref.target_value != NULL) {
     -+		memcpy(peeled->hash, ri->ref.target_value, GIT_SHA1_RAWSZ);
     ++		hashcpy(peeled->hash, ri->ref.target_value);
      +		return 0;
      +	}
      +
     @@ -352,20 +519,15 @@ $^
      +		(struct reftable_ref_store *)ref_store;
      +	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
      +	struct merged_table *mt = NULL;
     -+	int err = 0;
     -+	if (refs->err) {
     -+		/* how to propagate errors? */
     -+		return NULL;
     -+	}
      +
      +	mt = stack_merged_table(refs->stack);
     -+
     -+	/* XXX something with flags? */
     -+	err = merged_table_seek_ref(mt, &ri->iter, prefix);
     -+	/* XXX what to do with err? */
     -+	assert(err == 0);
     ++	ri->err = refs->err;
     ++	if (ri->err == 0) {
     ++		ri->err = merged_table_seek_ref(mt, &ri->iter, prefix);
     ++	}
      +	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
      +	ri->base.oid = &ri->oid;
     ++	ri->flags = flags;
      +	ri->ref_store = ref_store;
      +	return &ri->base;
      +}
     @@ -387,12 +549,6 @@ $^
      +	return 0;
      +}
      +
     -+static int ref_update_cmp(const void *a, const void *b)
     -+{
     -+	return strcmp(((struct ref_update *)a)->refname,
     -+		      ((struct ref_update *)b)->refname);
     -+}
     -+
      +static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
      +				  struct object_id *want_oid)
      +{
     @@ -411,6 +567,12 @@ $^
      +	return 0;
      +}
      +
     ++static int ref_update_cmp(const void *a, const void *b)
     ++{
     ++	return strcmp(((struct ref_update *)a)->refname,
     ++		      ((struct ref_update *)b)->refname);
     ++}
     ++
      +static int write_transaction_table(struct writer *writer, void *arg)
      +{
      +	struct ref_transaction *transaction = (struct ref_transaction *)arg;
     @@ -418,13 +580,14 @@ $^
      +		(struct reftable_ref_store *)transaction->ref_store;
      +	uint64_t ts = stack_next_update_index(refs->stack);
      +	int err = 0;
     -+	/* XXX - are we allowed to mutate the input data? */
     -+	qsort(transaction->updates, transaction->nr,
     -+	      sizeof(struct ref_update *), ref_update_cmp);
     ++	struct ref_update **sorted =
     ++		malloc(transaction->nr * sizeof(struct ref_update *));
     ++	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
     ++	QSORT(sorted, transaction->nr, ref_update_cmp);
      +	writer_set_limits(writer, ts, ts);
      +
      +	for (int i = 0; i < transaction->nr; i++) {
     -+		struct ref_update *u = transaction->updates[i];
     ++		struct ref_update *u = sorted[i];
      +		if (u->flags & REF_HAVE_OLD) {
      +			err = reftable_check_old_oid(transaction->ref_store,
      +						     u->refname, &u->old_oid);
     @@ -435,11 +598,12 @@ $^
      +	}
      +
      +	for (int i = 0; i < transaction->nr; i++) {
     -+		struct ref_update *u = transaction->updates[i];
     ++		struct ref_update *u = sorted[i];
      +		if (u->flags & REF_HAVE_NEW) {
      +			struct object_id out_oid = {};
      +			int out_flags = 0;
     -+			/* XXX who owns the memory here? */
     ++			/* Memory owned by refs_resolve_ref_unsafe, no need to
     ++			 * free(). */
      +			const char *resolved = refs_resolve_ref_unsafe(
      +				transaction->ref_store, u->refname, 0, &out_oid,
      +				&out_flags);
     @@ -456,7 +620,7 @@ $^
      +	}
      +
      +	for (int i = 0; i < transaction->nr; i++) {
     -+		struct ref_update *u = transaction->updates[i];
     ++		struct ref_update *u = sorted[i];
      +		struct log_record log = {};
      +		fill_log_record(&log);
      +
     @@ -473,6 +637,7 @@ $^
      +		}
      +	}
      +exit:
     ++	free(sorted);
      +	return err;
      +}
      +
     @@ -527,16 +692,21 @@ $^
      +
      +	for (int i = 0; i < arg->refnames->nr; i++) {
      +		struct log_record log = {};
     ++		struct ref_record current = {};
      +		fill_log_record(&log);
      +		log.message = xstrdup(arg->logmsg);
      +		log.new_hash = NULL;
     -+
     -+		/* XXX should lookup old oid. */
      +		log.old_hash = NULL;
      +		log.update_index = ts;
      +		log.ref_name = (char *)arg->refnames->items[i].string;
      +
     ++		if (stack_read_ref(arg->stack, log.ref_name, &current) == 0) {
     ++			log.old_hash = current.value;
     ++		}
      +		err = writer_add_log(writer, &log);
     ++		log.old_hash = NULL;
     ++		ref_record_clear(&current);
     ++
      +		clear_log_record(&log);
      +		if (err < 0) {
      +			return err;
     @@ -564,12 +734,11 @@ $^
      +{
      +	struct reftable_ref_store *refs =
      +		(struct reftable_ref_store *)ref_store;
     -+	/* XXX reflog expiry. */
      +	return stack_compact_all(refs->stack, NULL);
      +}
      +
      +struct write_create_symref_arg {
     -+	struct stack *stack;
     ++	struct reftable_ref_store *refs;
      +	const char *refname;
      +	const char *target;
      +	const char *logmsg;
     @@ -579,7 +748,7 @@ $^
      +{
      +	struct write_create_symref_arg *create =
      +		(struct write_create_symref_arg *)arg;
     -+	uint64_t ts = stack_next_update_index(create->stack);
     ++	uint64_t ts = stack_next_update_index(create->refs->stack);
      +	int err = 0;
      +
      +	struct ref_record ref = {
     @@ -593,8 +762,36 @@ $^
      +		return err;
      +	}
      +
     -+	/* XXX reflog? */
     ++	{
     ++		struct log_record log = {};
     ++		struct object_id new_oid = {};
     ++		struct object_id old_oid = {};
     ++		struct ref_record current = {};
     ++		stack_read_ref(create->refs->stack, create->refname, &current);
     ++
     ++		fill_log_record(&log);
     ++		log.ref_name = current.ref_name;
     ++		if (refs_resolve_ref_unsafe(
     ++			    (struct ref_store *)create->refs, create->refname,
     ++			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
     ++			log.old_hash = old_oid.hash;
     ++		}
     ++
     ++		/* XXX should the resolution be done relative or absolute? */
     ++		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
     ++					    create->target, RESOLVE_REF_READING,
     ++					    &new_oid, NULL) != NULL) {
     ++			log.new_hash = new_oid.hash;
     ++		}
      +
     ++		if (log.old_hash != NULL || log.new_hash != NULL) {
     ++			writer_add_log(writer, &log);
     ++		}
     ++		log.ref_name = NULL;
     ++		log.old_hash = NULL;
     ++		log.new_hash = NULL;
     ++		clear_log_record(&log);
     ++	}
      +	return 0;
      +}
      +
     @@ -604,7 +801,7 @@ $^
      +{
      +	struct reftable_ref_store *refs =
      +		(struct reftable_ref_store *)ref_store;
     -+	struct write_create_symref_arg arg = { .stack = refs->stack,
     ++	struct write_create_symref_arg arg = { .refs = refs,
      +					       .refname = refname,
      +					       .target = target,
      +					       .logmsg = logmsg };
     @@ -628,7 +825,11 @@ $^
      +		goto exit;
      +	}
      +
     -+	/* XXX should check that dest doesn't exist? */
     ++	/* XXX do ref renames overwrite the target? */
     ++	if (stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
     ++		goto exit;
     ++	}
     ++
      +	free(ref.ref_name);
      +	ref.ref_name = strdup(arg->newname);
      +	writer_set_limits(writer, ts, ts);
     @@ -638,6 +839,7 @@ $^
      +		struct ref_record todo[2] = {};
      +		todo[0].ref_name = (char *)arg->oldname;
      +		todo[0].update_index = ts;
     ++		/* leave todo[0] empty */
      +		todo[1] = ref;
      +		todo[1].update_index = ts;
      +
     @@ -735,8 +937,7 @@ $^
      +
      +		free(ri->last_name);
      +		ri->last_name = xstrdup(ri->log.ref_name);
     -+		/* XXX const? */
     -+		memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ);
     ++		hashcpy(ri->oid.hash, ri->log.new_hash);
      +		return ITER_OK;
      +	}
      +}
     @@ -808,12 +1009,16 @@ $^
      +		{
      +			struct object_id old_oid = {};
      +			struct object_id new_oid = {};
     ++			const char *full_committer = "";
      +
     -+			memcpy(&old_oid.hash, log.old_hash, GIT_SHA1_RAWSZ);
     -+			memcpy(&new_oid.hash, log.new_hash, GIT_SHA1_RAWSZ);
     ++			hashcpy(old_oid.hash, log.old_hash);
     ++			hashcpy(new_oid.hash, log.new_hash);
      +
     -+			/* XXX committer = email? name? */
     -+			if (fn(&old_oid, &new_oid, log.name, log.time,
     ++			full_committer = fmt_ident(log.name, log.email,
     ++						   WANT_COMMITTER_IDENT,
     ++						   /*date*/ NULL,
     ++						   IDENT_NO_DATE);
     ++			if (fn(&old_oid, &new_oid, full_committer, log.time,
      +			       log.tz_offset, log.message, cb_data)) {
      +				err = -1;
      +				break;
     @@ -844,7 +1049,6 @@ $^
      +	int cap = 0;
      +	int len = 0;
      +
     -+	printf("oldest first\n");
      +	while (err == 0) {
      +		struct log_record log = {};
      +		err = iterator_next_log(it, &log);
     @@ -868,12 +1072,15 @@ $^
      +		struct log_record *log = &logs[i];
      +		struct object_id old_oid = {};
      +		struct object_id new_oid = {};
     ++		const char *full_committer = "";
      +
     -+		memcpy(&old_oid.hash, log->old_hash, GIT_SHA1_RAWSZ);
     -+		memcpy(&new_oid.hash, log->new_hash, GIT_SHA1_RAWSZ);
     ++		hashcpy(old_oid.hash, log->old_hash);
     ++		hashcpy(new_oid.hash, log->new_hash);
      +
     -+		/* XXX committer = email? name? */
     -+		if (!fn(&old_oid, &new_oid, log->name, log->time,
     ++		full_committer = fmt_ident(log->name, log->email,
     ++					   WANT_COMMITTER_IDENT, NULL,
     ++					   IDENT_NO_DATE);
     ++		if (!fn(&old_oid, &new_oid, full_committer, log->time,
      +			log->tz_offset, log->message, cb_data)) {
      +			err = -1;
      +			break;
     @@ -921,18 +1128,11 @@ $^
      +				  reflog_expiry_cleanup_fn cleanup_fn,
      +				  void *policy_cb_data)
      +{
     ++	BUG("per ref reflog expiry is not implemented for reftable backend.");
      +	/*
     -+	  XXX
     -+
     -+	  This doesn't fit with the reftable API. If we are expiring for space
     -+	  reasons, the expiry should be combined with a compaction, and there
     -+	  should be a policy that can be called for all refnames, not for a
     -+	  single ref name.
     -+
     -+	  If this is for cleaning up individual entries, we'll have to write
     -+	  extra data to create tombstones.
     ++	  TODO: should iterate over the reflog, and write tombstones. The
     ++	  tombstones will be removed on compaction.
      +	 */
     -+	return 0;
      +}
      +
      +static int reftable_read_raw_ref(struct ref_store *ref_store,
     @@ -947,13 +1147,13 @@ $^
      +		goto exit;
      +	}
      +	if (ref.target != NULL) {
     ++		/* XXX recurse? */
      +		strbuf_reset(referent);
      +		strbuf_addstr(referent, ref.target);
      +		*type |= REF_ISSYMREF;
      +	} else {
     -+		memcpy(oid->hash, ref.value, GIT_SHA1_RAWSZ);
     ++		hashcpy(oid->hash, ref.value);
      +	}
     -+
      +exit:
      +	ref_record_clear(&ref);
      +	return err;
     @@ -986,3 +1186,58 @@ $^
      +	reftable_delete_reflog,
      +	reftable_reflog_expire
      +};
     +
     + diff --git a/repository.c b/repository.c
     + --- a/repository.c
     + +++ b/repository.c
     +@@
     + 	if (worktree)
     + 		repo_set_worktree(repo, worktree);
     + 
     ++	repo->ref_storage_format = format.ref_storage != NULL ?
     ++					   xstrdup(format.ref_storage) :
     ++					   "files"; /* XXX */
     ++
     + 	clear_repository_format(&format);
     + 	return 0;
     + 
     +
     + diff --git a/repository.h b/repository.h
     + --- a/repository.h
     + +++ b/repository.h
     +@@
     + 	/* The store in which the refs are held. */
     + 	struct ref_store *refs;
     + 
     ++	/* The format to use for the ref database. */
     ++	char *ref_storage_format;
     ++
     + 	/*
     + 	 * Contains path to often used file names.
     + 	 */
     +
     + diff --git a/setup.c b/setup.c
     + --- a/setup.c
     + +++ b/setup.c
     +@@
     + 			if (!value)
     + 				return config_error_nonbool(var);
     + 			data->partial_clone = xstrdup(value);
     +-		} else if (!strcmp(ext, "worktreeconfig"))
     ++		} else if (!strcmp(ext, "worktreeconfig")) {
     + 			data->worktree_config = git_config_bool(var, value);
     +-		else
     ++		} else if (!strcmp(ext, "refStorage")) {
     ++			data->ref_storage = xstrdup(value);
     ++		} else
     + 			string_list_append(&data->unknown_extensions, ext);
     + 	}
     + 
     +@@
     + 	string_list_clear(&format->unknown_extensions, 0);
     + 	free(format->work_tree);
     + 	free(format->partial_clone);
     ++	free(format->ref_storage);
     + 	init_repository_format(format);
     + }
     + 

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v3 1/6] refs.h: clarify reflog iteration order
  2020-02-04 20:27   ` [PATCH v3 0/6] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-02-04 20:27     ` Han-Wen Nienhuys via GitGitGadget
  2020-02-04 20:27     ` [PATCH v3 2/6] setup.c: enable repo detection for reftable Han-Wen Nienhuys via GitGitGadget
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-04 20:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/refs.h b/refs.h
index 545029c6d8..87c9ec921b 100644
--- a/refs.h
+++ b/refs.h
@@ -444,18 +444,21 @@ int delete_refs(const char *msg, struct string_list *refnames,
 int refs_delete_reflog(struct ref_store *refs, const char *refname);
 int delete_reflog(const char *refname);
 
-/* iterate over reflog entries */
+/* Iterate over reflog entries. */
 typedef int each_reflog_ent_fn(
 		struct object_id *old_oid, struct object_id *new_oid,
 		const char *committer, timestamp_t timestamp,
 		int tz, const char *msg, void *cb_data);
 
+/* Iterate in over reflog entries, oldest entry first. */
 int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
 			     each_reflog_ent_fn fn, void *cb_data);
 int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+/* Call a function for each reflog entry, oldest entry first. */
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v3 2/6] setup.c: enable repo detection for reftable
  2020-02-04 20:27   ` [PATCH v3 0/6] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-02-04 20:27     ` [PATCH v3 1/6] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-02-04 20:27     ` Han-Wen Nienhuys via GitGitGadget
  2020-02-04 20:31       ` Han-Wen Nienhuys
  2020-02-04 20:27     ` [PATCH v3 3/6] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
                       ` (4 subsequent siblings)
  6 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-04 20:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* only check R_OK for .git/refs.

  In the reftable format, tables are stored under $GITDIR/reftable/ and
  the list of tables is managed in a file $GITDIR/refs. Generally, this
  file does not have the +x bit set

* allow missing HEAD if there is a reftable/

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 setup.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/setup.c b/setup.c
index e2a479a64f..e7c7dca701 100644
--- a/setup.c
+++ b/setup.c
@@ -304,22 +304,23 @@ int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
  *  - either an objects/ directory _or_ the proper
  *    GIT_OBJECT_DIRECTORY environment variable
  *  - a refs/ directory
- *  - either a HEAD symlink or a HEAD file that is formatted as
- *    a proper "ref:", or a regular file HEAD that has a properly
- *    formatted sha1 object name.
+ *  - a reftable/ director or, either a HEAD ref.
+ *    The HEAD ref should be a HEAD symlink or a HEAD file that is formatted as
+ *    a proper "ref:", or a regular file HEAD that has a properly formatted sha1
+ *    object name.
  */
 int is_git_directory(const char *suspect)
 {
 	struct strbuf path = STRBUF_INIT;
 	int ret = 0;
+        int have_head = 0;
 	size_t len;
 
 	/* Check worktree-related signatures */
 	strbuf_addstr(&path, suspect);
 	strbuf_complete(&path, '/');
 	strbuf_addstr(&path, "HEAD");
-	if (validate_headref(path.buf))
-		goto done;
+	have_head = !validate_headref(path.buf);
 
 	strbuf_reset(&path);
 	get_common_dir(&path, suspect);
@@ -339,9 +340,16 @@ int is_git_directory(const char *suspect)
 
 	strbuf_setlen(&path, len);
 	strbuf_addstr(&path, "/refs");
-	if (access(path.buf, X_OK))
+	if (access(path.buf, R_OK))
 		goto done;
 
+        if (!have_head) {
+                strbuf_setlen(&path, len);
+                strbuf_addstr(&path, "/reftable");
+                if (access(path.buf, X_OK))
+                        goto done;
+        }
+
 	ret = 1;
 done:
 	strbuf_release(&path);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v3 3/6] create .git/refs in files-backend.c
  2020-02-04 20:27   ` [PATCH v3 0/6] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-02-04 20:27     ` [PATCH v3 1/6] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
  2020-02-04 20:27     ` [PATCH v3 2/6] setup.c: enable repo detection for reftable Han-Wen Nienhuys via GitGitGadget
@ 2020-02-04 20:27     ` Han-Wen Nienhuys via GitGitGadget
  2020-02-04 21:29       ` Junio C Hamano
  2020-02-05 11:42       ` SZEDER Gábor
  2020-02-04 20:27     ` [PATCH v3 4/6] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
                       ` (3 subsequent siblings)
  6 siblings, 2 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-04 20:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This prepares for supporting the reftable format, which will want
create its own file system layout in .git

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/init-db.c    | 2 --
 refs/files-backend.c | 5 +++++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 944ec77fe1..45bdea0589 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
 	 */
-	safe_create_dir(git_path("refs"), 1);
-	adjust_shared_perm(git_path("refs"));
 
 	if (refs_init_db(&err))
 		die("failed to set up refs db: %s", err.buf);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0ea66a28b6..0c53b246e8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3158,6 +3158,11 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
 	struct strbuf sb = STRBUF_INIT;
 
+	files_ref_path(refs, &sb, "refs");
+	safe_create_dir(sb.buf, 1);
+        /* adjust permissions even if directory already exists. */
+	adjust_shared_perm(sb.buf);
+
 	/*
 	 * Create .git/refs/{heads,tags}
 	 */
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v3 4/6] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-02-04 20:27   ` [PATCH v3 0/6] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                       ` (2 preceding siblings ...)
  2020-02-04 20:27     ` [PATCH v3 3/6] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-02-04 20:27     ` Han-Wen Nienhuys via GitGitGadget
  2020-02-04 20:27     ` [PATCH v3 5/6] Add reftable library Han-Wen Nienhuys via GitGitGadget
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-04 20:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb..1d7a485220 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
 
 /* Virtual function declarations for ref_iterators: */
 
+/*
+ * backend-specific implementation of ref_iterator_advance.
+ * For symrefs, the function should set REF_ISSYMREF, and it should also dereference
+ * the symref to provide the OID referent.
+ */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v3 5/6] Add reftable library
  2020-02-04 20:27   ` [PATCH v3 0/6] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                       ` (3 preceding siblings ...)
  2020-02-04 20:27     ` [PATCH v3 4/6] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-02-04 20:27     ` Han-Wen Nienhuys via GitGitGadget
  2020-02-04 20:27     ` [PATCH v3 6/6] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
  2020-02-06 22:55     ` [PATCH v4 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  6 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-04 20:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a new format for storing the ref database. It provides the
following benefits:

 * Simple and fast atomic ref transactions, including multiple refs and reflogs.
 * Compact storage of ref data.
 * Fast look ups of ref data.
 * Case-sensitive ref names on Windows/OSX, regardless of file system
 * Eliminates file/directory conflicts in ref names

Further context and motivation can be found in background reading:

* Spec: https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md

* Original discussion on JGit-dev:  https://www.eclipse.org/lists/jgit-dev/msg03389.html

* First design discussion on git@vger: https://public-inbox.org/git/CAJo=hJtTp2eA3z9wW9cHo-nA7kK40vVThqh6inXpbCcqfdMP9g@mail.gmail.com/

* Last design discussion on git@vger: https://public-inbox.org/git/CAJo=hJsZcAM9sipdVr7TMD-FD2V2W6_pvMQ791EGCDsDkQ033w@mail.gmail.com/

* First attempt at implementation: https://public-inbox.org/git/CAP8UFD0PPZSjBnxCA7ez91vBuatcHKQ+JUWvTD1iHcXzPBjPBg@mail.gmail.com/

* libgit2 support issue: https://github.com/libgit2/libgit2/issues

* GitLab support issue: https://gitlab.com/gitlab-org/git/issues/6

* go-git support issue: https://github.com/src-d/go-git/issues/1059

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 ++
 reftable/README.md     |   19 +
 reftable/VERSION       |    5 +
 reftable/basics.c      |  196 +++++++
 reftable/basics.h      |   37 ++
 reftable/block.c       |  401 +++++++++++++++
 reftable/block.h       |   71 +++
 reftable/blocksource.h |   20 +
 reftable/bytes.c       |    0
 reftable/config.h      |    1 +
 reftable/constants.h   |   27 +
 reftable/dump.c        |   97 ++++
 reftable/file.c        |   97 ++++
 reftable/iter.c        |  229 +++++++++
 reftable/iter.h        |   56 ++
 reftable/merged.c      |  286 +++++++++++
 reftable/merged.h      |   34 ++
 reftable/pq.c          |  114 +++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  708 +++++++++++++++++++++++++
 reftable/reader.h      |   52 ++
 reftable/record.c      | 1107 ++++++++++++++++++++++++++++++++++++++++
 reftable/record.h      |   79 +++
 reftable/reftable.h    |  399 +++++++++++++++
 reftable/slice.c       |  199 ++++++++
 reftable/slice.h       |   39 ++
 reftable/stack.c       |  983 +++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   40 ++
 reftable/system.h      |   57 +++
 reftable/tree.c        |   66 +++
 reftable/tree.h        |   24 +
 reftable/writer.c      |  622 ++++++++++++++++++++++
 reftable/writer.h      |   46 ++
 reftable/zlib-compat.c |   92 ++++
 34 files changed, 6268 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 0000000000..402e0f9356
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 0000000000..f527da0380
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,19 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   ((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
+    git clone https://github.com/google/reftable reftable-repo) && \
+   cp reftable-repo/c/*.[ch] reftable/ && \
+   cp reftable-repo/LICENSE reftable/ &&
+   git --git-dir reftable-repo/.git show --no-patch origin/master \
+    > reftable/VERSION && \
+   echo '/* empty */' > reftable/config.h
+   rm reftable/*_test.c reftable/test_framework.*
+   git add reftable/*.[ch]
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 0000000000..94624f583b
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,5 @@
+commit e54326f73d95bfe8b17f264c400f4c365dbd5e5e
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Tue Feb 4 19:48:17 2020 +0100
+
+    C: PRI?MAX use; clang-format.
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 0000000000..791dcc867a
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,196 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_u24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)((i)&0xff);
+}
+
+uint32_t get_u24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_u32(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 24) & 0xff);
+	out[1] = (byte)((i >> 16) & 0xff);
+	out[2] = (byte)((i >> 8) & 0xff);
+	out[3] = (byte)((i)&0xff);
+}
+
+uint32_t get_u32(byte *in)
+{
+	return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
+	       (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
+}
+
+void put_u64(byte *out, uint64_t v)
+{
+	int i = 0;
+	for (i = sizeof(uint64_t); i--;) {
+		out[i] = (byte)(v & 0xff);
+		v >>= 8;
+	}
+}
+
+uint64_t get_u64(byte *out)
+{
+	uint64_t v = 0;
+	int i = 0;
+	for (i = 0; i < sizeof(uint64_t); i++) {
+		v = (v << 8) | (byte)(out[i] & 0xff);
+	}
+	return v;
+}
+
+void put_u16(byte *out, uint16_t i)
+{
+	out[0] = (byte)((i >> 8) & 0xff);
+	out[1] = (byte)((i)&0xff);
+}
+
+uint16_t get_u16(byte *in)
+{
+	return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
+}
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(int sz, int (*f)(int k, void *args), void *args)
+{
+	int lo = 0;
+	int hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		int mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		free(*p);
+		p++;
+	}
+	free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	for (char **p = names; *p; p++) {
+		len++;
+	}
+	return len;
+}
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = realloc(names,
+						names_cap * sizeof(char *));
+			}
+			names[names_len++] = strdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *error_str(int err)
+{
+	switch (err) {
+	case IO_ERROR:
+		return "I/O error";
+	case FORMAT_ERROR:
+		return "FORMAT_ERROR";
+	case NOT_EXIST_ERROR:
+		return "NOT_EXIST_ERROR";
+	case LOCK_ERROR:
+		return "LOCK_ERROR";
+	case API_ERROR:
+		return "API_ERROR";
+	case ZLIB_ERROR:
+		return "ZLIB_ERROR";
+	case -1:
+		return "general error";
+	default:
+		return "unknown error code";
+	}
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 0000000000..0ad368cfd3
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,37 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+void put_u24(byte *out, uint32_t i);
+uint32_t get_u24(byte *in);
+
+uint64_t get_u64(byte *in);
+void put_u64(byte *out, uint64_t i);
+
+void put_u32(byte *out, uint32_t i);
+uint32_t get_u32(byte *in);
+
+void put_u16(byte *out, uint16_t i);
+uint16_t get_u16(byte *in);
+int binsearch(int sz, int (*f)(int k, void *args), void *args);
+
+void free_names(char **a);
+void parse_names(char *buf, int size, char ***namesp);
+int names_equal(char **a, char **b);
+int names_length(char **names);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 0000000000..ffbfee13c3
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,401 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "blocksource.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct record rec)
+{
+	struct slice empty = {};
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = {};
+	int n = 0;
+
+	record_key(rec, &key);
+	n = encode_key(&restart, out, last, key, record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	out.buf += n;
+	out.len -= n;
+
+	n = record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+
+	out.buf += n;
+	out.len -= n;
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	free(slice_yield(&key));
+	return 0;
+
+err:
+	free(slice_yield(&key));
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_u24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_u16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_u24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = {};
+		uLongf dest_len = 0, src_len = 0;
+		slice_resize(&compressed, w->next - block_header_skip);
+
+		dest_len = compressed.len;
+		src_len = w->next - block_header_skip;
+
+		if (Z_OK != compress2(compressed.buf, &dest_len,
+				      w->buf + block_header_skip, src_len, 9)) {
+			free(slice_yield(&compressed));
+			return ZLIB_ERROR;
+		}
+		memcpy(w->buf + block_header_skip, compressed.buf, dest_len);
+		w->next = dest_len + block_header_skip;
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_u24(block->data + header_off + 1);
+
+	if (!is_block_type(typ)) {
+		return FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = {};
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip;
+		uLongf src_len = block->len - block_header_skip;
+
+		slice_resize(&uncompressed, sz);
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			free(slice_yield(&uncompressed));
+			return ZLIB_ERROR;
+		}
+
+		block_source_return_block(block->source, block);
+		block->data = uncompressed.buf;
+		block->len = dst_len; /* XXX: 4 bytes missing? */
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_u16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_u24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	struct slice key;
+	struct block_reader *r;
+	int error;
+};
+
+static int restart_key_less(int idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = {};
+	struct slice last_key = {};
+	byte unused_extra;
+	int n = decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_compare(a->key, rkey);
+		free(slice_yield(&rkey));
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct record rec)
+{
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	{
+		struct slice in = {
+			.buf = it->br->block.data + it->next_off,
+			.len = it->br->block_len - it->next_off,
+		};
+		struct slice start = in;
+		struct slice key = {};
+		byte extra;
+		int n = decode_key(&key, &extra, it->last_key, in);
+		if (n < 0) {
+			return -1;
+		}
+
+		in.buf += n;
+		in.len -= n;
+		n = record_decode(rec, key, extra, in, it->br->hash_size);
+		if (n < 0) {
+			return -1;
+		}
+		in.buf += n;
+		in.len -= n;
+
+		slice_copy(&it->last_key, key);
+		it->next_off += start.len - in.len;
+		free(slice_yield(&key));
+		return 0;
+	}
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = {};
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	free(slice_yield(&it->last_key));
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		return -1;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	{
+		struct record rec = new_record(block_reader_type(br));
+		struct slice key = {};
+		int result = 0;
+		int err = 0;
+		struct block_iter next = {};
+		while (true) {
+			block_iter_copy_from(&next, it);
+
+			err = block_iter_next(&next, rec);
+			if (err < 0) {
+				result = -1;
+				goto exit;
+			}
+
+			record_key(rec, &key);
+			if (err > 0 || slice_compare(key, want) >= 0) {
+				result = 0;
+				goto exit;
+			}
+
+			block_iter_copy_from(it, &next);
+		}
+
+	exit:
+		free(slice_yield(&key));
+		free(slice_yield(&next.last_key));
+		record_clear(rec);
+		free(record_yield(&rec));
+
+		return result;
+	}
+}
+
+void block_writer_reset(struct block_writer *bw)
+{
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	free(slice_yield(&bw->last_key));
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 0000000000..bb42588111
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,71 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+	uint32_t header_off;
+	int restart_interval;
+	int hash_size;
+
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+	struct slice last_key;
+	int entries;
+};
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+byte block_writer_type(struct block_writer *bw);
+int block_writer_add(struct block_writer *w, struct record rec);
+int block_writer_finish(struct block_writer *w);
+void block_writer_reset(struct block_writer *bw);
+void block_writer_clear(struct block_writer *bw);
+
+struct block_reader {
+	uint32_t header_off;
+	struct block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint32_t full_block_size;
+	uint16_t restart_count;
+};
+
+struct block_iter {
+	struct block_reader *br;
+	struct slice last_key;
+	uint32_t next_off;
+};
+
+int block_reader_init(struct block_reader *br, struct block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+byte block_reader_type(struct block_reader *r);
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+int block_iter_next(struct block_iter *it, struct record rec);
+int block_iter_seek(struct block_iter *it, struct slice want);
+void block_iter_close(struct block_iter *it);
+
+#endif
diff --git a/reftable/blocksource.h b/reftable/blocksource.h
new file mode 100644
index 0000000000..f3ad3a4c22
--- /dev/null
+++ b/reftable/blocksource.h
@@ -0,0 +1,20 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCKSOURCE_H
+#define BLOCKSOURCE_H
+
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source source);
+
+#endif
diff --git a/reftable/bytes.c b/reftable/bytes.c
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/reftable/config.h b/reftable/config.h
new file mode 100644
index 0000000000..40a8c178f1
--- /dev/null
+++ b/reftable/config.h
@@ -0,0 +1 @@
+/* empty */
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 0000000000..cd35704610
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,27 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+#define VERSION 1
+#define HEADER_SIZE 24
+#define FOOTER_SIZE 68
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 0000000000..acabe18fbe
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "reftable.h"
+
+static int dump_table(const char *tablename)
+{
+	struct block_source src = {};
+	int err = block_source_from_file(&src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	struct reader *r = NULL;
+	err = new_reader(&r, src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct iterator it = {};
+		err = reader_seek_ref(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+
+		struct ref_record ref = {};
+		while (1) {
+			err = iterator_next_ref(it, &ref);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			ref_record_print(&ref, 20);
+		}
+		iterator_destroy(&it);
+		ref_record_clear(&ref);
+	}
+
+	{
+		struct iterator it = {};
+		err = reader_seek_log(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+		struct log_record log = {};
+		while (1) {
+			err = iterator_next_log(it, &log);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			log_record_print(&log, 20);
+		}
+		iterator_destroy(&it);
+		log_record_clear(&log);
+	}
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int opt;
+	const char *table = NULL;
+	while ((opt = getopt(argc, argv, "t:")) != -1) {
+		switch (opt) {
+		case 't':
+			table = strdup(optarg);
+			break;
+		case '?':
+			printf("usage: %s [-table tablefile]\n", argv[0]);
+			return 2;
+			break;
+		}
+	}
+
+	if (table != NULL) {
+		int err = dump_table(table);
+		if (err < 0) {
+			fprintf(stderr, "%s: %s: %s\n", argv[0], table,
+				error_str(err));
+			return 1;
+		}
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 0000000000..b2ea90bf94
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	free(b);
+}
+
+static int file_read_block(void *v, struct block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int block_source_from_file(struct block_source *bs, const char *name)
+{
+	struct stat st = {};
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	{
+		struct file_block_source *p =
+			calloc(sizeof(struct file_block_source), 1);
+		p->size = st.st_size;
+		p->fd = fd;
+
+		bs->ops = &file_vtable;
+		bs->arg = p;
+	}
+	return 0;
+}
+
+int fd_writer(void *arg, byte *data, int sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 0000000000..c06c891366
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,229 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct iterator it)
+{
+	return it.ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct record rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct iterator *it)
+{
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct iterator it, struct record rec)
+{
+	return it.ops->next(it.iter_arg, rec);
+}
+
+void iterator_destroy(struct iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int iterator_next_ref(struct iterator it, struct ref_record *ref)
+{
+	struct record rec = {};
+	record_from_ref(&rec, ref);
+	return iterator_next(it, rec);
+}
+
+int iterator_next_log(struct iterator it, struct log_record *log)
+{
+	struct record rec = {};
+	record_from_log(&rec, log);
+	return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	free(slice_yield(&fri->oid));
+	iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct ref_record *ref = (struct ref_record *)rec.data;
+
+	while (true) {
+		int err = iterator_next_ref(fri->it, ref);
+		if (err != 0) {
+			return err;
+		}
+
+		if (fri->double_check) {
+			struct iterator it = {};
+
+			int err = reader_seek_ref(fri->r, &it, ref->ref_name);
+			if (err == 0) {
+				err = iterator_next_ref(it, ref);
+			}
+
+			iterator_destroy(&it);
+
+			if (err < 0) {
+				return err;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+}
+
+struct iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reader_return_block(it->r, &it->block_reader.block);
+	free(slice_yield(&it->oid));
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reader_return_block(it->r, &it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct ref_record *ref = (struct ref_record *)rec.data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reader *r, byte *oid, int oid_len,
+			       uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		calloc(sizeof(struct indexed_table_ref_iter), 1);
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 0000000000..f497f2a27e
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,56 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct iterator_vtable {
+	int (*next)(void *iter_arg, struct record rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct iterator *it);
+int iterator_next(struct iterator it, struct record rec);
+bool iterator_is_null(struct iterator it);
+
+struct filtering_ref_iterator {
+	struct reader *r;
+	struct slice oid;
+	bool double_check;
+	struct iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *,
+					  struct filtering_ref_iterator *);
+
+struct indexed_table_ref_iter {
+	struct reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reader *r, byte *oid, int oid_len,
+			       uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 0000000000..7d52ec6232
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,286 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct record rec = new_record(mi->typ);
+		int err = iterator_next(mi->stack[i], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			iterator_destroy(&mi->stack[i]);
+			record_clear(rec);
+			free(record_yield(&rec));
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		iterator_destroy(&mi->stack[i]);
+	}
+	free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, int idx)
+{
+	if (iterator_is_null(mi->stack[idx])) {
+		return 0;
+	}
+
+	{
+		struct record rec = new_record(mi->typ);
+		struct pq_entry e = {
+			.rec = rec,
+			.index = idx,
+		};
+		int err = iterator_next(mi->stack[idx], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			iterator_destroy(&mi->stack[idx]);
+			record_clear(rec);
+			free(record_yield(&rec));
+			return 0;
+		}
+
+		merged_iter_pqueue_add(&mi->pq, e);
+	}
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec)
+{
+	struct slice entry_key = {};
+	struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
+	int err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+	record_key(entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = {};
+		int err = 0, cmp = 0;
+
+		record_key(top.rec, &k);
+
+		cmp = slice_compare(k, entry_key);
+		free(slice_yield(&k));
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		record_clear(top.rec);
+		free(record_yield(&top.rec));
+	}
+
+	record_copy_from(rec, entry.rec, mi->hash_size);
+	record_clear(entry.rec);
+	free(record_yield(&entry.rec));
+	free(slice_yield(&entry_key));
+	return 0;
+}
+
+static int merged_iter_next_void(void *p, struct record rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct iterator *it,
+				      struct merged_iter *mi)
+{
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n)
+{
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reader *r = stack[i];
+		if (i > 0 && last_max >= reader_min_update_index(r)) {
+			return FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reader_min_update_index(r);
+		}
+
+		last_max = reader_max_update_index(r);
+	}
+
+	{
+		struct merged_table m = {
+			.stack = stack,
+			.stack_len = n,
+			.min = first_min,
+			.max = last_max,
+			.hash_size = SHA1_SIZE,
+		};
+
+		*dest = calloc(sizeof(struct merged_table), 1);
+		**dest = m;
+	}
+	return 0;
+}
+
+void merged_table_close(struct merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void merged_table_free(struct merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	free(mt);
+}
+
+uint64_t merged_max_update_index(struct merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t merged_min_update_index(struct merged_table *mt)
+{
+	return mt->min;
+}
+
+static int merged_table_seek_record(struct merged_table *mt,
+				    struct iterator *it, struct record rec)
+{
+	struct iterator *iters = calloc(sizeof(struct iterator), mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = record_type(rec),
+		.hash_size = mt->hash_size,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			iterator_destroy(&iters[i]);
+		}
+		free(iters);
+		return err;
+	}
+
+	merged.stack_len = n, err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p = malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+			  const char *name)
+{
+	struct ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = {};
+	record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+			     const char *name, uint64_t update_index)
+{
+	struct log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = {};
+	record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+			  const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 0000000000..b8d3572e26
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct merged_table {
+	struct reader **stack;
+	int stack_len;
+	int hash_size;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct iterator *stack;
+	int hash_size;
+	int stack_len;
+	byte typ;
+	struct merged_iter_pqueue pq;
+} merged_iter;
+
+void merged_table_clear(struct merged_table *mt);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 0000000000..a1aff7c98c
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,114 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = {};
+	struct slice bk = {};
+	int cmp = 0;
+	record_key(a.rec, &ak);
+	record_key(b.rec, &bk);
+
+	cmp = slice_compare(ak, bk);
+
+	free(slice_yield(&ak));
+	free(slice_yield(&bk));
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = realloc(pq->heap, pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		record_clear(pq->heap[i].rec);
+		free(record_yield(&pq->heap[i].rec));
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 0000000000..5f7018979d
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	struct record rec;
+	int index;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 0000000000..981911f076
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,708 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct block_source source)
+{
+	return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size)
+{
+	int result = source.ops->read_block(source.arg, dest, off, size);
+	dest->source = source;
+	return result;
+}
+
+void block_source_return_block(struct block_source source, struct block *blockp)
+{
+	source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reader_offsets *reader_offsets_for(struct reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reader *r, struct block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(r->source, dest, off, sz);
+}
+
+void reader_return_block(struct reader *r, struct block *p)
+{
+	block_source_return_block(r->source, p);
+}
+
+const char *reader_name(struct reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	int err = 0;
+	if (memcmp(f, "REFT", 4)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, HEADER_SIZE)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+
+	{
+		byte version = *f++;
+		if (version != 1) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	r->block_size = get_u24(f);
+
+	f += 3;
+	r->min_update_index = get_u64(f);
+	f += 8;
+	r->max_update_index = get_u64(f);
+	f += 8;
+
+	r->ref_offsets.index_offset = get_u64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_u64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_u64(f);
+	f += 8;
+	r->log_offsets.offset = get_u64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_u64(f);
+	f += 8;
+
+	{
+		uint32_t computed_crc = crc32(0, footer, f - footer);
+		uint32_t file_crc = get_u32(f);
+		f += 4;
+		if (computed_crc != file_crc) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	{
+		byte first_block_typ = header[HEADER_SIZE];
+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+		r->ref_offsets.offset = 0;
+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+					  r->log_offsets.offset > 0);
+		r->obj_offsets.present = r->obj_offsets.offset > 0;
+	}
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reader *r, struct block_source source, const char *name)
+{
+	struct block footer = {};
+	struct block header = {};
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reader));
+	r->size = block_source_size(source) - FOOTER_SIZE;
+	r->source = source;
+	r->name = strdup(name);
+	r->hash_size = SHA1_SIZE;
+
+	err = block_source_read_block(source, &footer, r->size, FOOTER_SIZE);
+	if (err != FOOTER_SIZE) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	/* Need +1 to read type of first block. */
+	err = reader_get_block(r, &header, 0, HEADER_SIZE + 1);
+	if (err != HEADER_SIZE + 1) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	block_source_return_block(r->source, &footer);
+	block_source_return_block(r->source, &header);
+	return err;
+}
+
+struct table_iter {
+	struct reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+		((struct ref_record *)rec.data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reader_return_block(ti->r, &ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += 24;
+	}
+
+	*typ = data[0];
+	if (is_block_type(*typ)) {
+		result = get_u24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct block block = {};
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : HEADER_SIZE;
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reader_return_block(r, &block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reader_return_block(r, &block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 r->hash_size);
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = {};
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct block_reader *brp = malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec)
+{
+	if (record_type(rec) != ti->typ) {
+		return API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = {};
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct record rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct iterator *it, struct table_iter *ti)
+{
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reader *r, struct table_iter *ti,
+				uint64_t off, byte typ)
+{
+	struct block_reader br = {};
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reader *r, struct table_iter *ti, byte typ,
+			bool index)
+{
+	struct reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reader *r, struct table_iter *ti,
+			      struct record want)
+{
+	struct record rec = new_record(record_type(want));
+	struct slice want_key = {};
+	struct slice got_key = {};
+	struct table_iter next = {};
+	int err = -1;
+	record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_compare(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	record_clear(rec);
+	free(record_yield(&rec));
+	free(slice_yield(&want_key));
+	free(slice_yield(&got_key));
+	return err;
+}
+
+static int reader_seek_indexed(struct reader *r, struct iterator *it,
+			       struct record rec)
+{
+	struct index_record want_index = {};
+	struct record want_index_rec = {};
+	struct index_record index_result = {};
+	struct record index_result_rec = {};
+	struct table_iter index_iter = {};
+	struct table_iter next = {};
+	int err = 0;
+
+	record_key(rec, &want_index.last_key);
+	record_from_index(&want_index_rec, &want_index);
+	record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			calloc(sizeof(struct table_iter), 1);
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	record_clear(want_index_rec);
+	record_clear(index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reader *r, struct iterator *it,
+				struct record rec)
+{
+	struct reader_offsets *offs = reader_offsets_for(r, record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = {};
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p = malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reader *r, struct iterator *it, struct record rec)
+{
+	byte typ = record_type(rec);
+
+	struct reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name)
+{
+	struct ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = {};
+	record_from_ref(&rec, &ref);
+	return reader_seek(r, it, rec);
+}
+
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+		       uint64_t update_index)
+{
+	struct log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = {};
+	record_from_log(&rec, &log);
+	return reader_seek(r, it, rec);
+}
+
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int new_reader(struct reader **p, struct block_source src, char const *name)
+{
+	struct reader *rd = calloc(sizeof(struct reader), 1);
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		free(rd);
+	}
+	return err;
+}
+
+void reader_free(struct reader *r)
+{
+	reader_close(r);
+	free(r);
+}
+
+static int reader_refs_for_indexed(struct reader *r, struct iterator *it,
+				   byte *oid)
+{
+	struct obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct record want_rec = {};
+	struct iterator oit = {};
+	struct obj_record got = {};
+	struct record got_rec = {};
+	int err = 0;
+
+	record_from_obj(&want_rec, &want);
+
+	err = reader_seek(r, &oit, want_rec);
+	if (err != 0) {
+		return err;
+	}
+
+	record_from_obj(&got_rec, &got);
+	err = iterator_next(oit, got_rec);
+	iterator_destroy(&oit);
+	if (err < 0) {
+		return err;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	{
+		struct indexed_table_ref_iter *itr = NULL;
+		err = new_indexed_table_ref_iter(&itr, r, oid, r->hash_size,
+						 got.offsets, got.offset_len);
+		if (err < 0) {
+			record_clear(got_rec);
+			return err;
+		}
+		got.offsets = NULL;
+		record_clear(got_rec);
+
+		iterator_from_indexed_table_ref_iter(it, itr);
+	}
+
+	return 0;
+}
+
+static int reader_refs_for_unindexed(struct reader *r, struct iterator *it,
+				     byte *oid, int oid_len)
+{
+	struct table_iter *ti = calloc(sizeof(struct table_iter), 1);
+	struct filtering_ref_iterator *filter = NULL;
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		free(ti);
+		return err;
+	}
+
+	filter = calloc(sizeof(struct filtering_ref_iterator), 1);
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	filter->r = r;
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
+		    int oid_len)
+{
+	if (r->obj_offsets.present) {
+		return reader_refs_for_indexed(r, it, oid);
+	}
+	return reader_refs_for_unindexed(r, it, oid, oid_len);
+}
+
+uint64_t reader_max_update_index(struct reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reader_min_update_index(struct reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 0000000000..599a90028e
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,52 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source *source);
+
+struct reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+struct reader {
+	struct block_source source;
+	char *name;
+	int hash_size;
+	uint64_t size;
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	int object_id_len;
+
+	struct reader_offsets ref_offsets;
+	struct reader_offsets obj_offsets;
+	struct reader_offsets log_offsets;
+};
+
+int init_reader(struct reader *r, struct block_source source, const char *name);
+int reader_seek(struct reader *r, struct iterator *it, struct record rec);
+void reader_close(struct reader *r);
+const char *reader_name(struct reader *r);
+void reader_return_block(struct reader *r, struct block *p);
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 0000000000..95b6ffcc9f
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1107 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = {};
+	int i = 9;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	{
+		int n = sizeof(buf) - i - 1;
+		if (dest.len < n) {
+			return -1;
+		}
+		memcpy(dest.buf, &buf[i + 1], n);
+		return n;
+	}
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	in.buf += tsize;
+	in.len -= tsize;
+
+	return start_len - in.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	dest.buf += n;
+	dest.len -= n;
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	dest.buf += n;
+	dest.len -= n;
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	dest.buf += suffix_len;
+	dest.len -= suffix_len;
+
+	return start.len - dest.len;
+}
+
+static byte ref_record_type(void)
+{
+	return BLOCK_TYPE_REF;
+}
+
+static void ref_record_key(const void *r, struct slice *dest)
+{
+	const struct ref_record *rec = (const struct ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void ref_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct ref_record *ref = (struct ref_record *)rec;
+	struct ref_record *src = (struct ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = strdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = strdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void ref_record_print(struct ref_record *ref, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = {};
+
+	printf("ref{%s(%" PRIdMAX ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size);
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size);
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void ref_record_clear_void(void *rec)
+{
+	ref_record_clear((struct ref_record *)rec);
+}
+
+void ref_record_clear(struct ref_record *ref)
+{
+	free(ref->ref_name);
+	free(ref->target);
+	free(ref->target_value);
+	free(ref->value);
+	memset(ref, 0, sizeof(struct ref_record));
+}
+
+static byte ref_record_val_type(const void *rec)
+{
+	const struct ref_record *r = (const struct ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	s.buf += l;
+	s.len -= l;
+
+	return start.len - s.len;
+}
+
+static int ref_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	const struct ref_record *r = (const struct ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		s.buf += hash_size;
+		s.len -= hash_size;
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		s.buf += hash_size;
+		s.len -= hash_size;
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		s.buf += n;
+		s.len -= n;
+	}
+
+	return start.len - s.len;
+}
+
+static int ref_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct ref_record *r = (struct ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	in.buf += n;
+	in.len -= n;
+
+	r->ref_name = realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		in.buf += hash_size;
+		in.len -= hash_size;
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		in.buf += hash_size;
+		in.len -= hash_size;
+		break;
+	case 3: {
+		struct slice dest = {};
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		in.buf += n;
+		in.len -= n;
+		seen_target = true;
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	in.buf += suffix_len;
+	in.len -= suffix_len;
+
+	return start_len - in.len;
+}
+
+struct record_vtable ref_record_vtable = {
+	.key = &ref_record_key,
+	.type = &ref_record_type,
+	.copy_from = &ref_record_copy_from,
+	.val_type = &ref_record_val_type,
+	.encode = &ref_record_encode,
+	.decode = &ref_record_decode,
+	.clear = &ref_record_clear_void,
+};
+
+static byte obj_record_type(void)
+{
+	return BLOCK_TYPE_OBJ;
+}
+
+static void obj_record_key(const void *r, struct slice *dest)
+{
+	const struct obj_record *rec = (const struct obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	const struct obj_record *src = (const struct obj_record *)src_rec;
+
+	*ref = *src;
+	ref->hash_prefix = malloc(ref->hash_prefix_len);
+	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
+
+	{
+		int olen = ref->offset_len * sizeof(uint64_t);
+		ref->offsets = malloc(olen);
+		memcpy(ref->offsets, src->offsets, olen);
+	}
+}
+
+static void obj_record_clear(void *rec)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	FREE_AND_NULL(ref->hash_prefix);
+	FREE_AND_NULL(ref->offsets);
+	memset(ref, 0, sizeof(struct obj_record));
+}
+
+static byte obj_record_val_type(const void *rec)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		s.buf += n;
+		s.len -= n;
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	{
+		uint64_t last = r->offsets[0];
+		int i = 0;
+		for (i = 1; i < r->offset_len; i++) {
+			int n = put_var_int(s, r->offsets[i] - last);
+			if (n < 0) {
+				return -1;
+			}
+			s.buf += n;
+			s.len -= n;
+			last = r->offsets[i];
+		}
+	}
+	return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct obj_record *r = (struct obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	r->hash_prefix = malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		in.buf += n;
+		in.len -= n;
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+
+	in.buf += n;
+	in.len -= n;
+
+	{
+		uint64_t last = r->offsets[0];
+		int j = 1;
+		while (j < count) {
+			uint64_t delta = 0;
+			int n = get_var_int(&delta, in);
+			if (n < 0) {
+				return n;
+			}
+
+			in.buf += n;
+			in.len -= n;
+
+			last = r->offsets[j] = (delta + last);
+			j++;
+		}
+	}
+	return start.len - in.len;
+}
+
+struct record_vtable obj_record_vtable = {
+	.key = &obj_record_key,
+	.type = &obj_record_type,
+	.copy_from = &obj_record_copy_from,
+	.val_type = &obj_record_val_type,
+	.encode = &obj_record_encode,
+	.decode = &obj_record_decode,
+	.clear = &obj_record_clear,
+};
+
+void log_record_print(struct log_record *log, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = {};
+
+	printf("log{%s(%" PRIdMAX ") %s <%s> %" PRIuMAX " %04d\n",
+	       log->ref_name, log->update_index, log->name, log->email,
+	       log->time, log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size);
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size);
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static byte log_record_type(void)
+{
+	return BLOCK_TYPE_LOG;
+}
+
+static void log_record_key(const void *r, struct slice *dest)
+{
+	const struct log_record *rec = (const struct log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_u64(dest->buf + 1 + len, ts);
+}
+
+static void log_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct log_record *dst = (struct log_record *)rec;
+	const struct log_record *src = (const struct log_record *)src_rec;
+
+	*dst = *src;
+	dst->ref_name = strdup(dst->ref_name);
+	dst->email = strdup(dst->email);
+	dst->name = strdup(dst->name);
+	dst->message = strdup(dst->message);
+	if (dst->new_hash != NULL) {
+		dst->new_hash = malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void log_record_clear_void(void *rec)
+{
+	struct log_record *r = (struct log_record *)rec;
+	log_record_clear(r);
+}
+
+void log_record_clear(struct log_record *r)
+{
+	free(r->ref_name);
+	free(r->new_hash);
+	free(r->old_hash);
+	free(r->name);
+	free(r->email);
+	free(r->message);
+	memset(r, 0, sizeof(struct log_record));
+}
+
+static byte log_record_val_type(const void *rec)
+{
+	return 1;
+}
+
+static byte zero[SHA256_SIZE] = {};
+
+static int log_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct log_record *r = (struct log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	s.buf += 2 * hash_size;
+	s.len -= 2 * hash_size;
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_u16(s.buf, r->tz_offset);
+	s.buf += 2;
+	s.len -= 2;
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	return start.len - s.len;
+}
+
+static int log_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct log_record *r = (struct log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = {};
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return FORMAT_ERROR;
+	}
+
+	r->ref_name = realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_u64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (in.len < 2 * hash_size) {
+		return FORMAT_ERROR;
+	}
+
+	r->old_hash = realloc(r->old_hash, hash_size);
+	r->new_hash = realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	in.buf += 2 * hash_size;
+	in.len -= 2 * hash_size;
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->name = realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->email = realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_u16(in.buf);
+	in.buf += 2;
+	in.len -= 2;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->message = realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	return start.len - in.len;
+
+error:
+	free(slice_yield(&dest));
+	return FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool log_record_equal(struct log_record *a, struct log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+struct record_vtable log_record_vtable = {
+	.key = &log_record_key,
+	.type = &log_record_type,
+	.copy_from = &log_record_copy_from,
+	.val_type = &log_record_val_type,
+	.encode = &log_record_encode,
+	.decode = &log_record_decode,
+	.clear = &log_record_clear_void,
+};
+
+struct record new_record(byte typ)
+{
+	struct record rec;
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct ref_record *r = calloc(1, sizeof(struct ref_record));
+		record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct obj_record *r = calloc(1, sizeof(struct obj_record));
+		record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct log_record *r = calloc(1, sizeof(struct log_record));
+		record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct index_record *r = calloc(1, sizeof(struct index_record));
+		record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+static byte index_record_type(void)
+{
+	return BLOCK_TYPE_INDEX;
+}
+
+static void index_record_key(const void *r, struct slice *dest)
+{
+	struct index_record *rec = (struct index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+				   int hash_size)
+{
+	struct index_record *dst = (struct index_record *)rec;
+	struct index_record *src = (struct index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec)
+{
+	struct index_record *idx = (struct index_record *)rec;
+	free(slice_yield(&idx->last_key));
+}
+
+static byte index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int index_record_encode(const void *rec, struct slice out, int hash_size)
+{
+	const struct index_record *r = (const struct index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	out.buf += n;
+	out.len -= n;
+
+	return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+			       struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct index_record *r = (struct index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	in.buf += n;
+	in.len -= n;
+	return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+	.key = &index_record_key,
+	.type = &index_record_type,
+	.copy_from = &index_record_copy_from,
+	.val_type = &index_record_val_type,
+	.encode = &index_record_encode,
+	.decode = &index_record_decode,
+	.clear = &index_record_clear,
+};
+
+void record_key(struct record rec, struct slice *dest)
+{
+	rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec)
+{
+	return rec.ops->type();
+}
+
+int record_encode(struct record rec, struct slice dest, int hash_size)
+{
+	return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size)
+{
+	assert(src.ops->type() == rec.ops->type());
+
+	rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec)
+{
+	return rec.ops->val_type(rec.data);
+}
+
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size)
+{
+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec)
+{
+	return rec.ops->clear(rec.data);
+}
+
+void record_from_ref(struct record *rec, struct ref_record *ref_rec)
+{
+	rec->data = ref_rec;
+	rec->ops = &ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
+{
+	rec->data = obj_rec;
+	rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec)
+{
+	rec->data = index_rec;
+	rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct log_record *log_rec)
+{
+	rec->data = log_rec;
+	rec->ops = &log_record_vtable;
+}
+
+void *record_yield(struct record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+struct ref_record *record_as_ref(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_REF);
+	return (struct ref_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct ref_record *)a)->ref_name,
+		      ((struct ref_record *)b)->ref_name);
+}
+
+bool ref_record_is_deletion(const struct ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int log_record_compare_key(const void *a, const void *b)
+{
+	struct log_record *la = (struct log_record *)a;
+	struct log_record *lb = (struct log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool log_record_is_deletion(const struct log_record *log)
+{
+	/* XXX */
+	return false;
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 0000000000..dffdd71fc2
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,79 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+struct record_vtable {
+	void (*key)(const void *rec, struct slice *dest);
+	byte (*type)(void);
+	void (*copy_from)(void *rec, const void *src, int hash_size);
+	byte (*val_type)(const void *rec);
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+	void (*clear)(void *rec);
+};
+
+/* record is a generic wrapper for differnt types of records. */
+struct record {
+	void *data;
+	struct record_vtable *ops;
+};
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+int common_prefix_size(struct slice a, struct slice b);
+
+int is_block_type(byte typ);
+struct record new_record(byte typ);
+
+extern struct record_vtable ref_record_vtable;
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra);
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in);
+
+struct index_record {
+	struct slice last_key;
+	uint64_t offset;
+};
+
+struct obj_record {
+	byte *hash_prefix;
+	int hash_prefix_len;
+	uint64_t *offsets;
+	int offset_len;
+};
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size);
+void record_clear(struct record rec);
+void *record_yield(struct record *rec);
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct ref_record *refrec);
+void record_from_log(struct record *rec, struct log_record *logrec);
+struct ref_record *record_as_ref(struct record ref);
+
+/* for qsort. */
+int ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 0000000000..d84cc2004a
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,399 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include "system.h"
+
+typedef uint8_t byte;
+typedef byte bool;
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct block_source {
+	struct block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct block {
+	byte *data;
+	int len;
+	struct block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct block *dest, uint64_t off,
+			  uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int block_source_from_file(struct block_source *block_src, const char *name);
+
+/* write_options sets options for writing a single reftable. */
+struct write_options {
+	/* do not pad out blocks to block size. */
+	bool unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* do not generate a SHA1 => ref index. */
+	bool skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+};
+
+/* ref_record holds a ref database entry target_value */
+struct ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	byte *value; /* SHA1, or NULL. malloced. */
+	byte *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+bool ref_record_is_deletion(const struct ref_record *ref);
+
+/* prints a ref_record onto stdout */
+void ref_record_print(struct ref_record *ref, int hash_size);
+
+/* frees and nulls all pointer values. */
+void ref_record_clear(struct ref_record *ref);
+
+/* returns whether two ref_records are the same */
+bool ref_record_equal(struct ref_record *a, struct ref_record *b,
+		      int hash_size);
+
+/* log_record holds a reflog entry */
+struct log_record {
+	char *ref_name;
+	uint64_t update_index;
+	byte *new_hash;
+	byte *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+bool log_record_is_deletion(const struct log_record *log);
+
+/* frees and nulls all pointer values. */
+void log_record_clear(struct log_record *log);
+
+/* returns whether two records are equal. */
+bool log_record_equal(struct log_record *a, struct log_record *b,
+		      int hash_size);
+
+void log_record_print(struct log_record *log, int hash_size);
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct iterator {
+	struct iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_ref(struct iterator it, struct ref_record *ref);
+
+/* reads the next log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_log(struct iterator it, struct log_record *log);
+
+/* releases resources associated with an iterator. */
+void iterator_destroy(struct iterator *it);
+
+/* block_stats holds statistics for a single block type */
+struct block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct block_stats obj_stats;
+	/* stats for index blocks */
+	struct block_stats idx_stats;
+	/* stats for log blocks */
+	struct block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* different types of errors */
+
+/* Unexpected file system behavior */
+#define IO_ERROR -2
+
+/* Format inconsistency on reading data
+ */
+#define FORMAT_ERROR -3
+
+/* File does not exist. Returned from block_source_from_file(),  because it
+   needs special handling in stack.
+*/
+#define NOT_EXIST_ERROR -4
+
+/* Trying to write out-of-date data. */
+#define LOCK_ERROR -5
+
+/* Misuse of the API:
+   - on writing a record with NULL ref_name.
+   - on writing a ref_record outside the table limits
+   - on writing a ref or log record before the stack's next_update_index
+   - on reading a ref_record from log iterator, or vice versa.
+ */
+#define API_ERROR -6
+
+/* Decompression error */
+#define ZLIB_ERROR -7
+
+const char *error_str(int err);
+
+/* new_writer creates a new writer */
+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
+			  void *writer_arg, struct write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int fd_writer(void *fdp, byte *data, int size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   stack_next_update_index(), or API_ERROR is returned.
+ */
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max);
+
+/* adds a ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   writer_set_limits(), or API_ERROR is returned.
+ */
+int writer_add_ref(struct writer *w, struct ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n);
+
+/* adds a log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int writer_add_log(struct writer *w, struct log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int writer_add_logs(struct writer *w, struct log_record *logs, int n);
+
+/* writer_close finalizes the reftable. The writer is retained so statistics can
+ * be inspected. */
+int writer_close(struct writer *w);
+
+/* writer_stats returns the statistics on the reftable being written. */
+struct stats *writer_stats(struct writer *w);
+
+/* writer_free deallocates memory for the writer */
+void writer_free(struct writer *w);
+
+struct reader;
+
+/* new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp.  The name is used for creating a
+ * stack. Typically, it is the basename of the file.
+ */
+int new_reader(struct reader **pp, struct block_source, const char *name);
+
+/* reader_seek_ref returns an iterator where 'name' would be inserted in the
+   table.
+
+   example:
+
+   struct reader *r = NULL;
+   int err = new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct iterator it = {};
+   err = reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct ref_record ref = {};
+   while (1) {
+     err = iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   iterator_destroy(&it);
+   ref_record_clear(&ref);
+ */
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name);
+
+/* seek to logs for the given name, older than update_index. */
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+		       uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reader_free(struct reader *);
+
+/* return an iterator for the refs pointing to oid */
+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
+		    int oid_len);
+
+/* return the max_update_index for a table */
+uint64_t reader_max_update_index(struct reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reader_min_update_index(struct reader *r);
+
+/* a merged table is implements seeking/iterating over a stack of tables. */
+struct merged_table;
+
+/* new_merged_table creates a new merged table. It takes ownership of the stack
+   array.
+*/
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n);
+
+/* returns an iterator positioned just before 'name' */
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+			  const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+			     const char *name, uint64_t update_index);
+
+/* like merged_table_seek_log_at but look for the newest entry. */
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+			  const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t merged_max_update_index(struct merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t merged_min_update_index(struct merged_table *mt);
+
+/* closes readers for the merged tables */
+void merged_table_close(struct merged_table *mt);
+
+/* releases memory for the merged_table */
+void merged_table_free(struct merged_table *m);
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct stack;
+
+/* open a new reftable stack. The tables will be stored in 'dir', while the list
+   of tables is in 'list_file'. Typically, this should be .git/reftables and
+   .git/refs respectively.
+*/
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+	      struct write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t stack_next_update_index(struct stack *st);
+
+/* add a new table to the stack. The write_table function must call
+   writer_set_limits, add refs and return an error value. */
+int stack_add(struct stack *st,
+	      int (*write_table)(struct writer *wr, void *write_arg),
+	      void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct merged_table *stack_merged_table(struct stack *st);
+
+/* frees all resources associated with the stack. */
+void stack_destroy(struct stack *st);
+
+/* reloads the stack if necessary. */
+int stack_reload(struct stack *st);
+
+/* Policy for expiring reflog entries. */
+struct log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int stack_compact_all(struct stack *st, struct log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int stack_auto_compact(struct stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_ref(struct stack *st, const char *refname,
+		   struct ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_log(struct stack *st, const char *refname,
+		   struct log_record *log);
+
+/* statistics on past compactions. */
+struct compaction_stats {
+	uint64_t bytes;
+	int attempts;
+	int failures;
+};
+
+struct compaction_stats *stack_compaction_stats(struct stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 0000000000..efbe625253
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,199 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+byte *slice_yield(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = {};
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_write(struct slice *b, byte *data, int sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_write_void(void *b, byte *data, int sz)
+{
+	return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = calloc(size, 1);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct block_source *bs, struct slice *buf)
+{
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+struct block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 0000000000..f12a6db228
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+struct slice {
+	byte *buf;
+	int len;
+	int cap;
+};
+
+void slice_set_string(struct slice *dest, const char *);
+void slice_append_string(struct slice *dest, const char *);
+char *slice_to_string(struct slice src);
+const char *slice_as_string(struct slice *src);
+bool slice_equal(struct slice a, struct slice b);
+byte *slice_yield(struct slice *s);
+void slice_copy(struct slice *dest, struct slice src);
+void slice_resize(struct slice *s, int l);
+int slice_compare(struct slice a, struct slice b);
+int slice_write(struct slice *b, byte *data, int sz);
+int slice_write_void(void *b, byte *data, int sz);
+void slice_append(struct slice *dest, struct slice add);
+
+struct block_source;
+void block_source_from_slice(struct block_source *bs, struct slice *buf);
+
+struct block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 0000000000..dd16b1fe06
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,983 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "reftable.h"
+#include "writer.h"
+
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+	      struct write_options config)
+{
+	struct stack *p = calloc(sizeof(struct stack), 1);
+	int err = 0;
+	*dest = NULL;
+	p->list_file = strdup(list_file);
+	p->reftable_dir = strdup(dir);
+	p->config = config;
+
+	err = stack_reload(p);
+	if (err < 0) {
+		stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fread_lines(FILE *f, char ***namesp)
+{
+	long size = 0;
+	int err = fseek(f, 0, SEEK_END);
+	char *buf = NULL;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	size = ftell(f);
+	if (size < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = fseek(f, 0, SEEK_SET);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	buf = malloc(size + 1);
+	if (fread(buf, 1, size, f) != size) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+exit:
+	free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	FILE *f = fopen(filename, "r");
+	int err = 0;
+	if (f == NULL) {
+		if (errno == ENOENT) {
+			*namesp = calloc(sizeof(char *), 1);
+			return 0;
+		}
+
+		return IO_ERROR;
+	}
+	err = fread_lines(f, namesp);
+	fclose(f);
+	return err;
+}
+
+struct merged_table *stack_merged_table(struct stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void stack_destroy(struct stack *st)
+{
+	if (st->merged == NULL) {
+		return;
+	}
+
+	merged_table_close(st->merged);
+	merged_table_free(st->merged);
+	st->merged = NULL;
+
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	free(st);
+}
+
+static struct reader **stack_copy_readers(struct stack *st, int cur_len)
+{
+	struct reader **cur = calloc(sizeof(struct reader *), cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int stack_reload_once(struct stack *st, char **names, bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reader **new_tables =
+		malloc(sizeof(struct reader *) * names_len);
+	int new_tables_len = 0;
+	struct merged_table *new_merged = NULL;
+
+	struct slice table_path = {};
+
+	while (*names) {
+		struct reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct block_source src = {};
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_append_string(&table_path, "/");
+			slice_append_string(&table_path, name);
+
+			err = block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = new_reader(&rd, src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = new_merged_table(&new_merged, new_tables, new_tables_len);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		merged_table_free(st->merged);
+	}
+	st->merged = new_merged;
+
+	{
+		int i = 0;
+		for (i = 0; i < cur_len; i++) {
+			if (cur[i] != NULL) {
+				reader_close(cur[i]);
+				reader_free(cur[i]);
+			}
+		}
+	}
+exit:
+	free(slice_yield(&table_path));
+	{
+		int i = 0;
+		for (i = 0; i < new_tables_len; i++) {
+			reader_close(new_tables[i]);
+		}
+	}
+	free(new_tables);
+	free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open)
+{
+	struct timeval deadline = {};
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = {};
+		int err = gettimeofday(&now, NULL);
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		err = read_lines(st->list_file, &names_after);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return -1;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 100;
+		usleep(delay);
+	}
+
+	return 0;
+}
+
+int stack_reload(struct stack *st)
+{
+	return stack_reload_maybe_reuse(st, true);
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int stack_add(struct stack *st, int (*write)(struct writer *wr, void *arg),
+	      void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == LOCK_ERROR) {
+			err = stack_reload(st);
+		}
+		return err;
+	}
+
+	return stack_auto_compact(st);
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "%012" PRIxMAX "-%012" PRIxMAX, min, max);
+	slice_set_string(dest, buf);
+}
+
+int stack_try_add(struct stack *st,
+		  int (*write_table)(struct writer *wr, void *arg), void *arg)
+{
+	struct slice lock_name = {};
+	struct slice temp_tab_name = {};
+	struct slice tab_name = {};
+	struct slice next_name = {};
+	struct slice table_list = {};
+	struct writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+	int lock_fd = 0;
+	uint64_t next_update_index = 0;
+
+	slice_set_string(&lock_name, st->list_file);
+	slice_append_string(&lock_name, ".lock");
+
+	lock_fd = open(slice_as_string(&lock_name), O_EXCL | O_CREAT | O_WRONLY,
+		       0644);
+	if (lock_fd < 0) {
+		if (errno == EEXIST) {
+			err = LOCK_ERROR;
+			goto exit;
+		}
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = LOCK_ERROR;
+		goto exit;
+	}
+
+	next_update_index = stack_next_update_index(st);
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, next_update_index, next_update_index);
+
+	slice_set_string(&temp_tab_name, st->reftable_dir);
+	slice_append_string(&temp_tab_name, "/");
+	slice_append(&temp_tab_name, next_name);
+	slice_append_string(&temp_tab_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_name));
+	if (tab_fd < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	wr = new_writer(fd_writer, &tab_fd, &st->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	if (wr->min_update_index < next_update_index) {
+		err = API_ERROR;
+		goto exit;
+	}
+
+	{
+		int i = 0;
+		for (i = 0; i < st->merged->stack_len; i++) {
+			slice_append_string(&table_list,
+					    st->merged->stack[i]->name);
+			slice_append_string(&table_list, "\n");
+		}
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_append_string(&next_name, ".ref");
+	slice_append(&table_list, next_name);
+	slice_append_string(&table_list, "\n");
+
+	slice_set_string(&tab_name, st->reftable_dir);
+	slice_append_string(&tab_name, "/");
+	slice_append(&tab_name, next_name);
+
+	err = rename(slice_as_string(&temp_tab_name),
+		     slice_as_string(&tab_name));
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	free(slice_yield(&temp_tab_name));
+
+	err = write(lock_fd, table_list.buf, table_list.len);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = close(lock_fd);
+	lock_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&tab_name));
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&tab_name));
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_reload(st);
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_name));
+	}
+	unlink(slice_as_string(&lock_name));
+
+	if (lock_fd > 0) {
+		close(lock_fd);
+		lock_fd = 0;
+	}
+
+	free(slice_yield(&lock_name));
+	free(slice_yield(&temp_tab_name));
+	free(slice_yield(&tab_name));
+	free(slice_yield(&next_name));
+	free(slice_yield(&table_list));
+	writer_free(wr);
+	return err;
+}
+
+uint64_t stack_next_update_index(struct stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reader_max_update_index(st->merged->stack[sz - 1]) + 1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct log_expiry_config *config)
+{
+	struct slice next_name = {};
+	int tab_fd = -1;
+	struct writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reader_min_update_index(st->merged->stack[first]),
+		    reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_append_string(temp_tab, "/");
+	slice_append(temp_tab, next_name);
+	slice_append_string(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = new_writer(fd_writer, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+	writer_free(wr);
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		free(slice_yield(temp_tab));
+	}
+	free(slice_yield(&next_name));
+	return err;
+}
+
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+			int last, struct log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reader **subtabs =
+		calloc(sizeof(struct reader *), last - first + 1);
+	struct merged_table *mt = NULL;
+	int err = 0;
+	struct iterator it = {};
+	struct ref_record ref = {};
+	struct log_record log = {};
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	writer_set_limits(wr, st->merged->stack[first]->min_update_index,
+			  st->merged->stack[last]->max_update_index);
+
+	err = new_merged_table(&mt, subtabs, subtabs_len);
+	if (err < 0) {
+		free(subtabs);
+		goto exit;
+	}
+
+	err = merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = iterator_next_ref(it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+	}
+
+	err = merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = iterator_next_log(it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		/* XXX collect stats? */
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+	}
+
+exit:
+	iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		merged_table_free(mt);
+	}
+	ref_record_clear(&ref);
+
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct stack *st, int first, int last,
+			       struct log_expiry_config *expiry)
+{
+	struct slice temp_tab_name = {};
+	struct slice new_table_name = {};
+	struct slice lock_file_name = {};
+	struct slice ref_list_contents = {};
+	struct slice new_table_path = {};
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **delete_on_success = calloc(sizeof(char *), compact_count + 1);
+	char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
+	int i = 0;
+	int j = 0;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_append_string(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_name = {};
+		struct slice subtab_lock = {};
+		slice_set_string(&subtab_name, st->reftable_dir);
+		slice_append_string(&subtab_name, "/");
+		slice_append_string(&subtab_name,
+				    reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_name);
+		slice_append_string(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				}
+				err = IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] = (char *)slice_as_string(&subtab_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_append_string(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_append_string(&new_table_path, "/");
+
+	slice_append(&new_table_path, new_table_name);
+
+	err = rename(slice_as_string(&temp_tab_name),
+		     slice_as_string(&new_table_path));
+	if (err < 0) {
+		goto exit;
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	slice_append(&ref_list_contents, new_table_name);
+	slice_append_string(&ref_list_contents, "\n");
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	for (char **p = delete_on_success; *p; p++) {
+		if (strcmp(*p, slice_as_string(&new_table_path))) {
+			unlink(*p);
+		}
+	}
+
+	err = stack_reload_maybe_reuse(st, first < last);
+exit:
+	for (char **p = subtable_locks; *p; p++) {
+		unlink(*p);
+	}
+	free_names(delete_on_success);
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	free(slice_yield(&new_table_name));
+	free(slice_yield(&new_table_path));
+	free(slice_yield(&ref_list_contents));
+	free(slice_yield(&temp_tab_name));
+	free(slice_yield(&lock_file_name));
+	return err;
+}
+
+int stack_compact_all(struct stack *st, struct log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct stack *st, int first, int last,
+				     struct log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	assert(sz > 0);
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = calloc(sizeof(struct segment), n);
+	int next = 0;
+	struct segment cur = {};
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct stack *st)
+{
+	uint64_t *sizes = calloc(sizeof(uint64_t), st->merged->stack_len);
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		/* overhead is 24 + 68 = 92. */
+		sizes[i] = st->merged->stack[i]->size - 91;
+	}
+	return sizes;
+}
+
+int stack_auto_compact(struct stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct compaction_stats *stack_compaction_stats(struct stack *st)
+{
+	return &st->stats;
+}
+
+int stack_read_ref(struct stack *st, const char *refname,
+		   struct ref_record *ref)
+{
+	struct iterator it = {};
+	struct merged_table *mt = stack_merged_table(st);
+	int err = merged_table_seek_ref(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = iterator_next_ref(it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, refname) || ref_record_is_deletion(ref)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	iterator_destroy(&it);
+	return err;
+}
+
+int stack_read_log(struct stack *st, const char *refname,
+		   struct log_record *log)
+{
+	struct iterator it = {};
+	struct merged_table *mt = stack_merged_table(st);
+	int err = merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = iterator_next_log(it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) || log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 0000000000..d5e2c93c29
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,40 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+
+struct stack {
+	char *list_file;
+	char *reftable_dir;
+
+	struct write_options config;
+
+	struct merged_table *merged;
+	struct compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct stack *st,
+		  int (*write_table)(struct writer *wr, void *arg), void *arg);
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+			int last, struct log_expiry_config *config);
+int fastlog2(uint64_t sz);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 0000000000..5e9f442374
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,57 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#include "config.h"
+
+#ifndef REFTABLE_STANDALONE
+
+#include "git-compat-util.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#define PRIuMAX "lu"
+#define PRIdMAX "ld"
+#define PRIxMAX "lx"
+#define ARRAY_SIZE(a) sizeof((a)) / sizeof((a)[0])
+#define FREE_AND_NULL(x)    \
+	do {                \
+		free(x);    \
+		(x) = NULL; \
+	} while (0)
+#define QSORT(arr, n, cmp) qsort(arr, n, sizeof(arr[0]), cmp)
+#define SWAP(a, b) \
+  { \
+     char tmp[sizeof(a)]; \
+     assert(sizeof(a)==sizeof(b)); \
+     memcpy(&tmp[0], &a, sizeof(a)); \
+     memcpy(&a, &b, sizeof(a)); \
+     memcpy(&b, &tmp[0], sizeof(a)); \
+  }
+#endif
+
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 0000000000..9bf7fe531f
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,66 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				calloc(sizeof(struct tree_node), 1);
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	{
+		int res = compare(key, (*rootp)->key);
+		if (res < 0) {
+			return tree_search(key, &(*rootp)->left, compare,
+					   insert);
+		} else if (res > 0) {
+			return tree_search(key, &(*rootp)->right, compare,
+					   insert);
+		}
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 0000000000..86a71715ae
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,24 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 0000000000..3785bc8a4a
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,622 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct block_stats *writer_block_stats(struct writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct writer *w, byte *data, size_t len, int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = calloc(w->pending_padding, 1);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_write_header(struct writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+	dest[4] = 1; /* version */
+	put_u24(dest + 5, w->opts.block_size);
+	put_u64(dest + 8, w->min_update_index);
+	put_u64(dest + 16, w->max_update_index);
+	return 24;
+}
+
+static void writer_reinit_block_writer(struct writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = HEADER_SIZE;
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start, w->hash_size);
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
+			  void *writer_arg, struct write_options *opts)
+{
+	struct writer *wp = calloc(sizeof(struct writer), 1);
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->hash_size = SHA1_SIZE;
+	wp->block = calloc(opts->block_size, 1);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void writer_free(struct writer *w)
+{
+	free(w->block);
+	free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+			     ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = calloc(sizeof(struct obj_index_tree_node), 1);
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = realloc(key->offsets,
+				       sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct writer *w, struct record rec)
+{
+	int result = -1;
+	struct slice key = {};
+	int err = 0;
+	record_key(rec, &key);
+	if (slice_compare(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	free(slice_yield(&key));
+	return result;
+}
+
+int writer_add_ref(struct writer *w, struct ref_record *ref)
+{
+	struct record rec = {};
+	struct ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return API_ERROR;
+	}
+
+	record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = w->hash_size,
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = w->hash_size,
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int writer_add_log(struct writer *w, struct log_record *log)
+{
+	if (log->ref_name == NULL) {
+		return API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	{
+		struct record rec = {};
+		int err;
+		record_from_log(&rec, log);
+		err = writer_add_record(w, rec);
+		return err;
+	}
+}
+
+int writer_add_logs(struct writer *w, struct log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct record rec = {};
+			record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, rec);
+			assert(err == 0);
+		}
+		for (i = 0; i < idx_len; i++) {
+			free(slice_yield(&idx[i].last_key));
+		}
+		free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct block_stats *bstats = writer_block_stats(w, typ);
+		bstats->index_blocks =
+			w->stats.idx_stats.blocks - before_blocks;
+		bstats->index_offset = index_start;
+		bstats->max_index_level = max_level;
+	}
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct record rec = {};
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	free(slice_yield(&entry->hash));
+	free(entry);
+}
+
+static int writer_dump_object_index(struct writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = {};
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_close(struct writer *w)
+{
+	byte footer[68];
+	byte *p = footer;
+
+	writer_finish_public_section(w);
+
+	writer_write_header(w, footer);
+	p += 24;
+	put_u64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_u64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_u64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_u64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_u64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_u32(p, crc32(0, footer, p - footer));
+	p += 4;
+	w->pending_padding = 0;
+
+	{
+		int n = padded_write(w, footer, sizeof(footer), 0);
+		if (n < 0) {
+			return n;
+		}
+	}
+
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	free(slice_yield(&w->last_key));
+	return 0;
+}
+
+void writer_clear_index(struct writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		free(slice_yield(&w->index[i].last_key));
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct block_stats *bstats = writer_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIuMAX " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_u24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = realloc(w->index,
+				   sizeof(struct index_record) * w->index_cap);
+	}
+
+	{
+		struct index_record ir = {
+			.offset = w->next,
+		};
+		slice_copy(&ir.last_key, w->block_writer->last_key);
+		w->index[w->index_len] = ir;
+	}
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	block_writer_reset(&w->block_writer_data);
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+struct stats *writer_stats(struct writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 0000000000..bd2386474b
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,46 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct writer {
+	int (*write)(void *, byte *, int);
+	void *write_arg;
+	int pending_padding;
+	int hash_size;
+	struct slice last_key;
+
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct write_options opts;
+
+	byte *block;
+	struct block_writer *block_writer;
+	struct block_writer block_writer_data;
+	struct index_record *index;
+	int index_len;
+	int index_cap;
+
+	/* tree for use with tsearch */
+	struct tree_node *obj_index_tree;
+
+	struct stats stats;
+};
+
+int writer_flush_block(struct writer *w);
+void writer_clear_index(struct writer *w);
+int writer_finish_public_section(struct writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 0000000000..3e0b0f24f1
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v3 6/6] Reftable support for git-core
  2020-02-04 20:27   ` [PATCH v3 0/6] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                       ` (4 preceding siblings ...)
  2020-02-04 20:27     ` [PATCH v3 5/6] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-02-04 20:27     ` Han-Wen Nienhuys via GitGitGadget
  2020-02-06 22:55     ` [PATCH v4 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  6 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-04 20:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

TODO:

 * Make CI on gitgitgadget compile.
 * Redo interaction between repo config, repo storage version
 * Resolve the design problem with reflog expiry.
 * Resolve spots marked with XXX

Example use:

  $ ~/vc/git/git init --reftable
  warning: templates not found in /usr/local/google/home/hanwen/share/git-core/templates
  Initialized empty Git repository in /tmp/qz/.git/
  $ echo q > a
  $ ~/vc/git/git add a
  $ ~/vc/git/git commit -mx
  fatal: not a git repository (or any of the parent directories): .git
  [master (root-commit) 373d969] x
   1 file changed, 1 insertion(+)
   create mode 100644 a
  $ ~/vc/git/git show-ref
  373d96972fca9b63595740bba3898a762778ba20 HEAD
  373d96972fca9b63595740bba3898a762778ba20 refs/heads/master
  $ ls -l .git/reftable/
  total 12
  -rw------- 1 hanwen primarygroup  126 Jan 23 20:08 000000000001-000000000001.ref
  -rw------- 1 hanwen primarygroup 4277 Jan 23 20:08 000000000002-000000000002.ref
  $ go run ~/vc/reftable/cmd/dump.go  -table /tmp/qz/.git/reftable/000000000002-000000000002.ref
  ** DEBUG **
  name /tmp/qz/.git/reftable/000000000002-000000000002.ref, sz 4209: 'r' reftable.readerOffsets{Present:true, Offset:0x0, IndexOffset:0x0}, 'o' reftable.readerOffsets{Present:false, Offset:0x0, IndexOffset:0x0} 'g' reftable.readerOffsets{Present:true, Offset:0x1000, IndexOffset:0x0}
  ** REFS **
  reftable.RefRecord{RefName:"refs/heads/master", UpdateIndex:0x2, Value:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, TargetValue:[]uint8(nil), Target:""}
  ** LOGS **
  reftable.LogRecord{RefName:"HEAD", UpdateIndex:0x2, New:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, Old:[]uint8{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Name:"Han-Wen Nienhuys", Email:"hanwen@google.com", Time:0x5e29ef27, TZOffset:100, Message:"commit (initial): x\n"}

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile                |  24 +-
 builtin/init-db.c       |  40 +-
 cache.h                 |   2 +
 refs.c                  |  22 +-
 refs/refs-internal.h    |   1 +
 refs/reftable-backend.c | 880 ++++++++++++++++++++++++++++++++++++++++
 repository.c            |   4 +
 repository.h            |   3 +
 setup.c                 |   7 +-
 9 files changed, 959 insertions(+), 24 deletions(-)
 create mode 100644 refs/reftable-backend.c

diff --git a/Makefile b/Makefile
index 6134104ae6..706bb0a981 100644
--- a/Makefile
+++ b/Makefile
@@ -814,6 +814,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += command-list.h
 
@@ -959,6 +960,7 @@ LIB_OBJS += rebase-interactive.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1163,7 +1165,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2347,11 +2349,28 @@ VCSSVN_OBJS += vcs-svn/fast_export.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/bytes.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2488,6 +2507,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 45bdea0589..dcea74610c 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -177,7 +177,7 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 }
 
 static int create_default_files(const char *template_path,
-				const char *original_git_dir)
+				const char *original_git_dir, int flags)
 {
 	struct stat st1;
 	struct strbuf buf = STRBUF_INIT;
@@ -234,17 +234,24 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
+	if (flags & INIT_DB_REFTABLE) {
+		reinit = 0; /* XXX - how do we recognize a reinit,
+			     * and what should we do? */
+	} else {
+		path = git_path_buf(&buf, "HEAD");
+		reinit = (!access(path, R_OK) ||
+			  readlink(path, junk, sizeof(junk) - 1) != -1);
+	}
+
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
 	}
 
 	/* This forces creation of new config file */
-	xsnprintf(repo_version_string, sizeof(repo_version_string),
-		  "%d", GIT_REPO_VERSION);
+	xsnprintf(repo_version_string, sizeof(repo_version_string), "%d",
+		  flags & INIT_DB_REFTABLE ? GIT_REPO_VERSION_READ :
+					     GIT_REPO_VERSION);
 	git_config_set("core.repositoryformatversion", repo_version_string);
 
 	/* Check filemode trustability */
@@ -378,7 +385,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 */
 	check_repository_format();
 
-	reinit = create_default_files(template_dir, original_git_dir);
+	reinit = create_default_files(template_dir, original_git_dir, flags);
 
 	create_object_directory();
 
@@ -403,6 +410,10 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	if (flags & INIT_DB_REFTABLE) {
+		git_config_set("extensions.refStorage", "reftable");
+	}
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -481,15 +492,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *template_dir = NULL;
 	unsigned int flags = 0;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_BIT(0, "reftable", &flags, N_("use reftable"),
+			INIT_DB_REFTABLE),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_END()
diff --git a/cache.h b/cache.h
index cbfaead23a..929a61e861 100644
--- a/cache.h
+++ b/cache.h
@@ -625,6 +625,7 @@ int path_inside_repo(const char *prefix, const char *path);
 
 #define INIT_DB_QUIET 0x0001
 #define INIT_DB_EXIST_OK 0x0002
+#define INIT_DB_REFTABLE 0x0004
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, unsigned int flags);
@@ -1041,6 +1042,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 1ab0bb54d3..493d3fb673 100644
--- a/refs.c
+++ b/refs.c
@@ -20,7 +20,7 @@
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1836,13 +1836,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1858,7 +1858,12 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs = ref_store_init(r->gitdir,
+				 /* XXX r->ref_storage_format == NULL. Where
+				  * should the config file be parsed out? */
+				 r->ref_storage_format ? r->ref_storage_format :
+							 "reftable",
+				 REF_STORE_ALL_CAPS);
 	return r->refs;
 }
 
@@ -1913,7 +1918,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, "files", /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1927,6 +1932,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = "files"; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1940,9 +1946,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 1d7a485220..d87e9cbdec 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -661,6 +661,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 0000000000..9394389c0a
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,880 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+/*
+  The reftable v1 spec only supports 20-byte binary SHA1s. A new format version
+  will be needed to support SHA256.
+ */
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *reftable_dir;
+	char *table_list_file;
+	struct stack *stack;
+};
+
+static void clear_log_record(struct log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	log_record_clear(log);
+}
+
+static void fill_log_record(struct log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = {};
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *reftable_ref_store_create(const char *path,
+						   unsigned int store_flags)
+{
+	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct write_options cfg = {
+		.block_size = 4096,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/reftable/tables.list", path);
+	refs->table_list_file = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", path);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/HEAD", path);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", path);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	refs->err = new_stack(&refs->stack, refs->reftable_dir,
+			      refs->table_list_file, cfg);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+
+	safe_create_dir(refs->reftable_dir, 1);
+	FILE *f = fopen(refs->table_list_file, "a");
+	if (f == NULL) {
+		return -1;
+	}
+	fclose(f);
+	return 0;
+}
+
+struct reftable_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = iterator_next_ref(ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	ref_record_clear(&ri->ref);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct merged_table *mt = NULL;
+
+	mt = stack_merged_table(refs->stack);
+	ri->err = refs->err;
+	if (ri->err == 0) {
+		ri->err = merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	(void)refs;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid = {};
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct ref_update *)a)->refname,
+		      ((struct ref_update *)b)->refname);
+}
+
+static int write_transaction_table(struct writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = stack_next_update_index(refs->stack);
+	int err = 0;
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	writer_set_limits(writer, ts, ts);
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_NEW) {
+			struct object_id out_oid = {};
+			int out_flags = 0;
+			/* Memory owned by refs_resolve_ref_unsafe, no need to
+			 * free(). */
+			const char *resolved = refs_resolve_ref_unsafe(
+				transaction->ref_store, u->refname, 0, &out_oid,
+				&out_flags);
+			struct ref_record ref = {};
+			ref.ref_name =
+				(char *)(resolved ? resolved : u->refname);
+			ref.value = u->new_oid.hash;
+			ref.update_index = ts;
+			err = writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct log_record log = {};
+		fill_log_record(&log);
+
+		log.ref_name = (char *)u->refname;
+		log.old_hash = u->old_oid.hash;
+		log.new_hash = u->new_oid.hash;
+		log.update_index = ts;
+		log.message = u->msg;
+
+		err = writer_add_log(writer, &log);
+		clear_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+exit:
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	int err = stack_add(refs->stack, &write_transaction_table, transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure %s",
+			    error_str(err));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *err)
+{
+	return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+struct write_delete_refs_arg {
+	struct stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	int err = 0;
+
+	writer_set_limits(writer, ts, ts);
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct log_record log = {};
+		struct ref_record current = {};
+		fill_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (stack_read_ref(arg->stack, log.ref_name, &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		ref_record_clear(&current);
+
+		clear_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	return stack_add(refs->stack, &write_delete_refs_table, &arg);
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	return stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	writer_set_limits(writer, ts, ts);
+	err = writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct log_record log = {};
+		struct object_id new_oid = {};
+		struct object_id old_oid = {};
+		struct ref_record current = {};
+		stack_read_ref(create->refs->stack, create->refname, &current);
+
+		fill_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		/* XXX should the resolution be done relative or absolute? */
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	return stack_add(refs->stack, &write_create_symref_table, &arg);
+}
+
+struct write_rename_arg {
+	struct stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	struct ref_record ref = {};
+	int err = stack_read_ref(arg->stack, arg->oldname, &ref);
+	if (err) {
+		goto exit;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto exit;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct ref_record todo[2] = {};
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct log_record todo[2] = {};
+		fill_log_record(&todo[0]);
+		fill_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = writer_add_logs(writer, todo, 2);
+
+		clear_log_record(&todo[0]);
+		clear_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	return stack_add(refs->stack, &write_rename_table, &arg);
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = iterator_next_log(ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	log_record_clear(&ri->log);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return empty_ref_iterator_begin();
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct iterator it = {};
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &it, refname);
+	struct log_record log = {};
+
+	while (err == 0) {
+		err = iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid = {};
+			struct object_id new_oid = {};
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	log_record_clear(&log);
+	iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct iterator it = {};
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &it, refname);
+
+	struct log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+
+	while (err == 0) {
+		struct log_record log = {};
+		err = iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (int i = len; i--;) {
+		struct log_record *log = &logs[i];
+		struct object_id old_oid = {};
+		struct object_id new_oid = {};
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (int i = 0; i < len; i++) {
+		log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	BUG("per ref reflog expiry is not implemented for reftable backend.");
+	/*
+	  TODO: should iterate over the reflog, and write tombstones. The
+	  tombstones will be removed on compaction.
+	 */
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct ref_record ref = {};
+	int err = stack_read_ref(refs->stack, refname, &ref);
+	if (err) {
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		/* XXX recurse? */
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else {
+		hashcpy(oid->hash, ref.value);
+	}
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
diff --git a/repository.c b/repository.c
index a4174ddb06..aff47302c9 100644
--- a/repository.c
+++ b/repository.c
@@ -174,6 +174,10 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = format.ref_storage != NULL ?
+					   xstrdup(format.ref_storage) :
+					   "files"; /* XXX */
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 040057dea6..198d4aa090 100644
--- a/repository.h
+++ b/repository.h
@@ -70,6 +70,9 @@ struct repository {
 	/* The store in which the refs are held. */
 	struct ref_store *refs;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index e7c7dca701..a339186d83 100644
--- a/setup.c
+++ b/setup.c
@@ -448,9 +448,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refStorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -539,6 +541,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v3 2/6] setup.c: enable repo detection for reftable
  2020-02-04 20:27     ` [PATCH v3 2/6] setup.c: enable repo detection for reftable Han-Wen Nienhuys via GitGitGadget
@ 2020-02-04 20:31       ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-04 20:31 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git

On Tue, Feb 4, 2020 at 9:27 PM Han-Wen Nienhuys via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Han-Wen Nienhuys <hanwen@google.com>
>
> * only check R_OK for .git/refs.

this patch can go now.
-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v3 3/6] create .git/refs in files-backend.c
  2020-02-04 20:27     ` [PATCH v3 3/6] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-02-04 21:29       ` Junio C Hamano
  2020-02-05 11:34         ` Han-Wen Nienhuys
  2020-02-05 11:42       ` SZEDER Gábor
  1 sibling, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-02-04 21:29 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> This prepares for supporting the reftable format, which will want
> create its own file system layout in .git
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  builtin/init-db.c    | 2 --
>  refs/files-backend.c | 5 +++++
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index 944ec77fe1..45bdea0589 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
>  	 * We need to create a "refs" dir in any case so that older
>  	 * versions of git can tell that this is a repository.
>  	 */
> -	safe_create_dir(git_path("refs"), 1);
> -	adjust_shared_perm(git_path("refs"));
>  
>  	if (refs_init_db(&err))
>  		die("failed to set up refs db: %s", err.buf);
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 0ea66a28b6..0c53b246e8 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3158,6 +3158,11 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
>  		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
>  	struct strbuf sb = STRBUF_INIT;
>  
> +	files_ref_path(refs, &sb, "refs");

Have you run "git diff --check" before committing?

> +	safe_create_dir(sb.buf, 1);
> +        /* adjust permissions even if directory already exists. */

whitespace error here, but this comment is really appreciated.  It
wasn't immediately clear why we do this again in the original.

> +	adjust_shared_perm(sb.buf);
> +
>  	/*
>  	 * Create .git/refs/{heads,tags}
>  	 */

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v2 5/5] Reftable support for git-core
  2020-02-04 20:22       ` Han-Wen Nienhuys
@ 2020-02-04 22:13         ` Jeff King
  0 siblings, 0 replies; 409+ messages in thread
From: Jeff King @ 2020-02-04 22:13 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Feb 04, 2020 at 09:22:36PM +0100, Han-Wen Nienhuys wrote:

> >   - init/clone should take a command-line option for the ref format of
> >     the new repository. Anybody choosing reftables would want to set
> >     core.repositoryformatversion to "1" and set the extensions.refFormat
> >     key.
> 
> I did this, but are you sure this works? Where would the
> repo->ref_storage_format get set? I tried adding it to repo_init(),
> but this doesn't get called in a normal startup sequence.
> 
> Breakpoint 3, ref_store_init (gitdir=0x555555884b70 ".git",
> be_name=0x5555557942ca "reftable", flags=15) at refs.c:1841
> warning: Source file is more recent than executable.
> 1841 {
> (gdb) up
> #1  0x00005555556de2c8 in get_main_ref_store (r=0x555555871d40
> <the_repo>) at refs.c:1862
> (gdb) p r->ref_storage_format
> $1 = 0x0
> }

Hmm. Looks like repo_init() is only used for submodules. Which is pretty
horrid, because that setup sequence is basically duplicated between
there and setup_git_directory(), which is the traditional mechanism.
That's something that ought to be cleaned up, but it's definitely
outside the scope of your series.

The patch below was enough to make something like this work:

  git init --reftable repo
  cd repo
  git commit --allow-empty -m foo
  git for-each-ref

I had to make some tweaks to the init code, too:

 - setting the_repository->ref_storage_format based on the --reftable
   flag

 - we can still recognize a reinit by the presence of a HEAD file
   (whether it's a real one or a dummy). And we should call
   create_symref() in either case, either to create HEAD or to create
   the HEAD symlink inside reftables

I also fixed a bug in your first patch, where the creation of refs/
moves into files_init_db(). The strbuf needs reset there, or we just
append to it, and end up with a doubled path (you could see it by
running the same commands above without --reftable, but of course not
with your patch series as-is because it will _always_ choose reftable).

diff --git a/builtin/init-db.c b/builtin/init-db.c
index dcea74610c..ea2a333c4a 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -213,6 +213,8 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	if (flags & INIT_DB_REFTABLE)
+		the_repository->ref_storage_format = xstrdup("reftable");
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -222,6 +224,15 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -234,18 +245,14 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	if (flags & INIT_DB_REFTABLE) {
-		reinit = 0; /* XXX - how do we recognize a reinit,
-			     * and what should we do? */
-	} else {
-		path = git_path_buf(&buf, "HEAD");
-		reinit = (!access(path, R_OK) ||
-			  readlink(path, junk, sizeof(junk) - 1) != -1);
-	}
-
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
+	} else {
+		/*
+		 * XXX should check whether our ref backend matches the
+		 * original one; if not, either die() or convert
+		 */
 	}
 
 	/* This forces creation of new config file */
diff --git a/refs.c b/refs.c
index 493d3fb673..d6ddbbcc67 100644
--- a/refs.c
+++ b/refs.c
@@ -1859,10 +1859,9 @@ struct ref_store *get_main_ref_store(struct repository *r)
 		BUG("attempting to get main_ref_store outside of repository");
 
 	r->refs = ref_store_init(r->gitdir,
-				 /* XXX r->ref_storage_format == NULL. Where
-				  * should the config file be parsed out? */
-				 r->ref_storage_format ? r->ref_storage_format :
-							 "reftable",
+				 r->ref_storage_format ?
+					r->ref_storage_format :
+					"files",
 				 REF_STORE_ALL_CAPS);
 	return r->refs;
 }
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0c53b246e8..6f9efde9ca 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3166,6 +3166,7 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 	/*
 	 * Create .git/refs/{heads,tags}
 	 */
+	strbuf_reset(&sb);
 	files_ref_path(refs, &sb, "refs/heads");
 	safe_create_dir(sb.buf, 1);
 
diff --git a/repository.c b/repository.c
index aff47302c9..ff0988dac8 100644
--- a/repository.c
+++ b/repository.c
@@ -174,9 +174,7 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
-	repo->ref_storage_format = format.ref_storage != NULL ?
-					   xstrdup(format.ref_storage) :
-					   "files"; /* XXX */
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
 
 	clear_repository_format(&format);
 	return 0;
diff --git a/setup.c b/setup.c
index a339186d83..58c5cd3bf0 100644
--- a/setup.c
+++ b/setup.c
@@ -450,7 +450,7 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			data->partial_clone = xstrdup(value);
 		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		} else if (!strcmp(ext, "refStorage")) {
+		} else if (!strcmp(ext, "refstorage")) {
 			data->ref_storage = xstrdup(value);
 		} else
 			string_list_append(&data->unknown_extensions, ext);
@@ -1184,8 +1184,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v3 3/6] create .git/refs in files-backend.c
  2020-02-04 21:29       ` Junio C Hamano
@ 2020-02-05 11:34         ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-05 11:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Feb 4, 2020 at 10:29 PM Junio C Hamano <gitster@pobox.com> wrote:
> > +     files_ref_path(refs, &sb, "refs");
>
> Have you run "git diff --check" before committing?

let me install the clang-format pre-commit hook.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v3 3/6] create .git/refs in files-backend.c
  2020-02-04 20:27     ` [PATCH v3 3/6] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
  2020-02-04 21:29       ` Junio C Hamano
@ 2020-02-05 11:42       ` SZEDER Gábor
  2020-02-05 12:24         ` Jeff King
  1 sibling, 1 reply; 409+ messages in thread
From: SZEDER Gábor @ 2020-02-05 11:42 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

On Tue, Feb 04, 2020 at 08:27:37PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> From: Han-Wen Nienhuys <hanwen@google.com>
> 
> This prepares for supporting the reftable format, which will want
> create its own file system layout in .git

This breaks 'git init', and, consequently, the whole test suite:

  $ ./git init /tmp/foo
  /tmp/foo/.git/refs/tmp/foo/.git/refs/heads: No such file or directory


>  builtin/init-db.c    | 2 --
>  refs/files-backend.c | 5 +++++
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index 944ec77fe1..45bdea0589 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
>  	 * We need to create a "refs" dir in any case so that older
>  	 * versions of git can tell that this is a repository.
>  	 */
> -	safe_create_dir(git_path("refs"), 1);
> -	adjust_shared_perm(git_path("refs"));
>  
>  	if (refs_init_db(&err))
>  		die("failed to set up refs db: %s", err.buf);
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 0ea66a28b6..0c53b246e8 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3158,6 +3158,11 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
>  		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
>  	struct strbuf sb = STRBUF_INIT;
>  
> +	files_ref_path(refs, &sb, "refs");
> +	safe_create_dir(sb.buf, 1);
> +        /* adjust permissions even if directory already exists. */
> +	adjust_shared_perm(sb.buf);
> +
>  	/*
>  	 * Create .git/refs/{heads,tags}
>  	 */
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v3 3/6] create .git/refs in files-backend.c
  2020-02-05 11:42       ` SZEDER Gábor
@ 2020-02-05 12:24         ` Jeff King
  0 siblings, 0 replies; 409+ messages in thread
From: Jeff King @ 2020-02-05 12:24 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Han-Wen Nienhuys

On Wed, Feb 05, 2020 at 12:42:10PM +0100, SZEDER Gábor wrote:

> On Tue, Feb 04, 2020 at 08:27:37PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> > From: Han-Wen Nienhuys <hanwen@google.com>
> > 
> > This prepares for supporting the reftable format, which will want
> > create its own file system layout in .git
> 
> This breaks 'git init', and, consequently, the whole test suite:
> 
>   $ ./git init /tmp/foo
>   /tmp/foo/.git/refs/tmp/foo/.git/refs/heads: No such file or directory

Yeah, this is one of the fixes in the patch I sent earlier in the
thread. The issue is here:

> > @@ -3158,6 +3158,11 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
> >  		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
> >  	struct strbuf sb = STRBUF_INIT;
> >  
> > +	files_ref_path(refs, &sb, "refs");
> > +	safe_create_dir(sb.buf, 1);
> > +        /* adjust permissions even if directory already exists. */
> > +	adjust_shared_perm(sb.buf);
> > +
> >  	/*
> >  	 * Create .git/refs/{heads,tags}
> >  	 */

Right after this context, we call files_ref_path() with "sb" again.
There needs to be a strbuf_reset() beforehand.

-Peff

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v4 0/5] Reftable support git-core
  2020-02-04 20:27   ` [PATCH v3 0/6] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                       ` (5 preceding siblings ...)
  2020-02-04 20:27     ` [PATCH v3 6/6] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-02-06 22:55     ` Han-Wen Nienhuys via GitGitGadget
  2020-02-06 22:55       ` [PATCH v4 1/5] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
                         ` (6 more replies)
  6 siblings, 7 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-06 22:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

At this point, I am mainly interested in feedback on the spots marked with
XXX in the Git source code.

v3

 * passes gitgitgadget CI.

Han-Wen Nienhuys (5):
  refs.h: clarify reflog iteration order
  create .git/refs in files-backend.c
  refs: document how ref_iterator_advance_fn should handle symrefs
  Add reftable library
  Reftable support for git-core

 Makefile                |   24 +-
 builtin/init-db.c       |   49 +-
 cache.h                 |    2 +
 refs.c                  |   20 +-
 refs.h                  |    5 +-
 refs/files-backend.c    |    6 +
 refs/refs-internal.h    |    6 +
 refs/reftable-backend.c |  880 +++++++++++++++++++++++++++++++
 reftable/LICENSE        |   31 ++
 reftable/README.md      |   19 +
 reftable/VERSION        |    5 +
 reftable/basics.c       |  196 +++++++
 reftable/basics.h       |   37 ++
 reftable/block.c        |  401 ++++++++++++++
 reftable/block.h        |   71 +++
 reftable/blocksource.h  |   20 +
 reftable/bytes.c        |    0
 reftable/config.h       |    1 +
 reftable/constants.h    |   27 +
 reftable/dump.c         |   97 ++++
 reftable/file.c         |   97 ++++
 reftable/iter.c         |  229 ++++++++
 reftable/iter.h         |   56 ++
 reftable/merged.c       |  286 ++++++++++
 reftable/merged.h       |   34 ++
 reftable/pq.c           |  114 ++++
 reftable/pq.h           |   34 ++
 reftable/reader.c       |  708 +++++++++++++++++++++++++
 reftable/reader.h       |   52 ++
 reftable/record.c       | 1107 +++++++++++++++++++++++++++++++++++++++
 reftable/record.h       |   79 +++
 reftable/reftable.h     |  399 ++++++++++++++
 reftable/slice.c        |  199 +++++++
 reftable/slice.h        |   39 ++
 reftable/stack.c        |  983 ++++++++++++++++++++++++++++++++++
 reftable/stack.h        |   40 ++
 reftable/system.h       |   55 ++
 reftable/tree.c         |   66 +++
 reftable/tree.h         |   24 +
 reftable/writer.c       |  623 ++++++++++++++++++++++
 reftable/writer.h       |   46 ++
 reftable/zlib-compat.c  |   92 ++++
 repository.c            |    2 +
 repository.h            |    3 +
 setup.c                 |   12 +-
 45 files changed, 7248 insertions(+), 28 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c


base-commit: 5b0ca878e008e82f91300091e793427205ce3544
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v3:

 1:  c00403c94d = 1:  c00403c94d refs.h: clarify reflog iteration order
 2:  57c7342319 < -:  ---------- setup.c: enable repo detection for reftable
 3:  5b7060cb2f ! 2:  4d6da9bc47 create .git/refs in files-backend.c
     @@ -29,9 +29,13 @@
       
      +	files_ref_path(refs, &sb, "refs");
      +	safe_create_dir(sb.buf, 1);
     -+        /* adjust permissions even if directory already exists. */
     ++	/* adjust permissions even if directory already exists. */
      +	adjust_shared_perm(sb.buf);
      +
       	/*
       	 * Create .git/refs/{heads,tags}
       	 */
     ++	strbuf_reset(&sb);
     + 	files_ref_path(refs, &sb, "refs/heads");
     + 	safe_create_dir(sb.buf, 1);
     + 
 4:  1b01c735a9 = 3:  fbdcdccc88 refs: document how ref_iterator_advance_fn should handle symrefs
 5:  eb0df10068 ! 4:  02d2ca8b87 Add reftable library
     @@ -98,11 +98,11 @@
       --- /dev/null
       +++ b/reftable/VERSION
      @@
     -+commit e54326f73d95bfe8b17f264c400f4c365dbd5e5e
     ++commit e7c3fc3099d9999bc8d895f84027b0e36348d5e6
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Tue Feb 4 19:48:17 2020 +0100
     ++Date:   Thu Feb 6 20:17:40 2020 +0100
      +
     -+    C: PRI?MAX use; clang-format.
     ++    C: use inttypes.h header definitions
      
       diff --git a/reftable/basics.c b/reftable/basics.c
       new file mode 100644
     @@ -2882,7 +2882,7 @@
      +{
      +	char hex[SHA256_SIZE + 1] = {};
      +
     -+	printf("ref{%s(%" PRIdMAX ") ", ref->ref_name, ref->update_index);
     ++	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
      +	if (ref->value != NULL) {
      +		hex_format(hex, ref->value, hash_size);
      +		printf("%s", hex);
     @@ -3274,9 +3274,9 @@
      +{
      +	char hex[SHA256_SIZE + 1] = {};
      +
     -+	printf("log{%s(%" PRIdMAX ") %s <%s> %" PRIuMAX " %04d\n",
     -+	       log->ref_name, log->update_index, log->name, log->email,
     -+	       log->time, log->tz_offset);
     ++	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
     ++	       log->update_index, log->name, log->email, log->time,
     ++	       log->tz_offset);
      +	hex_format(hex, log->old_hash, hash_size);
      +	printf("%s => ", hex);
      +	hex_format(hex, log->new_hash, hash_size);
     @@ -4860,7 +4860,7 @@
      +static void format_name(struct slice *dest, uint64_t min, uint64_t max)
      +{
      +	char buf[100];
     -+	snprintf(buf, sizeof(buf), "%012" PRIxMAX "-%012" PRIxMAX, min, max);
     ++	snprintf(buf, sizeof(buf), "%012" PRIx64 "-%012" PRIx64, min, max);
      +	slice_set_string(dest, buf);
      +}
      +
     @@ -5585,6 +5585,7 @@
      +#include <assert.h>
      +#include <errno.h>
      +#include <fcntl.h>
     ++#include <inttypes.h>
      +#include <stdint.h>
      +#include <stdio.h>
      +#include <stdlib.h>
     @@ -5595,9 +5596,6 @@
      +#include <unistd.h>
      +#include <zlib.h>
      +
     -+#define PRIuMAX "lu"
     -+#define PRIdMAX "ld"
     -+#define PRIxMAX "lx"
      +#define ARRAY_SIZE(a) sizeof((a)) / sizeof((a)[0])
      +#define FREE_AND_NULL(x)    \
      +	do {                \
     @@ -5605,14 +5603,14 @@
      +		(x) = NULL; \
      +	} while (0)
      +#define QSORT(arr, n, cmp) qsort(arr, n, sizeof(arr[0]), cmp)
     -+#define SWAP(a, b) \
     -+  { \
     -+     char tmp[sizeof(a)]; \
     -+     assert(sizeof(a)==sizeof(b)); \
     -+     memcpy(&tmp[0], &a, sizeof(a)); \
     -+     memcpy(&a, &b, sizeof(a)); \
     -+     memcpy(&b, &tmp[0], sizeof(a)); \
     -+  }
     ++#define SWAP(a, b)                              \
     ++	{                                       \
     ++		char tmp[sizeof(a)];            \
     ++		assert(sizeof(a) == sizeof(b)); \
     ++		memcpy(&tmp[0], &a, sizeof(a)); \
     ++		memcpy(&a, &b, sizeof(a));      \
     ++		memcpy(&b, &tmp[0], sizeof(a)); \
     ++	}
      +#endif
      +
      +int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
     @@ -5758,6 +5756,7 @@
      +		return &w->stats.log_stats;
      +	}
      +	assert(false);
     ++	return NULL;
      +}
      +
      +/* write data, queuing the padding for the next write. Returns negative for
     @@ -6299,7 +6298,7 @@
      +	w->stats.blocks++;
      +
      +	if (debug) {
     -+		fprintf(stderr, "block %c off %" PRIuMAX " sz %d (%d)\n", typ,
     ++		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
      +			w->next, raw_bytes,
      +			get_u24(w->block + w->block_writer->header_off + 1));
      +	}
 6:  0b2a1a81d6 ! 5:  2786a6bf61 Reftable support for git-core
     @@ -39,6 +39,7 @@
            reftable.LogRecord{RefName:"HEAD", UpdateIndex:0x2, New:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, Old:[]uint8{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Name:"Han-Wen Nienhuys", Email:"hanwen@google.com", Time:0x5e29ef27, TZOffset:100, Message:"commit (initial): x\n"}
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
     +    Co-authored-by: Jeff King <peff@peff.net>
      
       diff --git a/Makefile b/Makefile
       --- a/Makefile
     @@ -120,6 +121,31 @@ $^
       {
       	struct stat st1;
       	struct strbuf buf = STRBUF_INIT;
     +@@
     + 	is_bare_repository_cfg = init_is_bare_repository;
     + 	if (init_shared_repository != -1)
     + 		set_shared_repository(init_shared_repository);
     ++	if (flags & INIT_DB_REFTABLE)
     ++		the_repository->ref_storage_format = xstrdup("reftable");
     + 
     + 	/*
     + 	 * We would have created the above under user's umask -- under
     +@@
     + 		adjust_shared_perm(get_git_dir());
     + 	}
     + 
     ++	/*
     ++	 * Check to see if .git/HEAD exists; this must happen before
     ++	 * initializing the ref db, because we want to see if there is an
     ++	 * existing HEAD.
     ++	 */
     ++	path = git_path_buf(&buf, "HEAD");
     ++	reinit = (!access(path, R_OK) ||
     ++		  readlink(path, junk, sizeof(junk) - 1) != -1);
     ++
     + 	/*
     + 	 * We need to create a "refs" dir in any case so that older
     + 	 * versions of git can tell that this is a repository.
      @@
       	 * Create the default symlink from ".git/HEAD" to the "master"
       	 * branch, if it does not exist yet.
     @@ -127,18 +153,14 @@ $^
      -	path = git_path_buf(&buf, "HEAD");
      -	reinit = (!access(path, R_OK)
      -		  || readlink(path, junk, sizeof(junk)-1) != -1);
     -+	if (flags & INIT_DB_REFTABLE) {
     -+		reinit = 0; /* XXX - how do we recognize a reinit,
     -+			     * and what should we do? */
     -+	} else {
     -+		path = git_path_buf(&buf, "HEAD");
     -+		reinit = (!access(path, R_OK) ||
     -+			  readlink(path, junk, sizeof(junk) - 1) != -1);
     -+	}
     -+
       	if (!reinit) {
       		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
       			exit(1);
     ++	} else {
     ++		/*
     ++		 * XXX should check whether our ref backend matches the
     ++		 * original one; if not, either die() or convert
     ++		 */
       	}
       
       	/* This forces creation of new config file */
     @@ -251,10 +273,8 @@ $^
       
      -	r->refs = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
      +	r->refs = ref_store_init(r->gitdir,
     -+				 /* XXX r->ref_storage_format == NULL. Where
     -+				  * should the config file be parsed out? */
      +				 r->ref_storage_format ? r->ref_storage_format :
     -+							 "reftable",
     ++							 "files",
      +				 REF_STORE_ALL_CAPS);
       	return r->refs;
       }
     @@ -410,13 +430,14 @@ $^
      +{
      +	struct reftable_ref_store *refs =
      +		(struct reftable_ref_store *)ref_store;
     -+
     ++	FILE *file;
      +	safe_create_dir(refs->reftable_dir, 1);
     -+	FILE *f = fopen(refs->table_list_file, "a");
     -+	if (f == NULL) {
     ++
     ++	file = fopen(refs->table_list_file, "a");
     ++	if (file == NULL) {
      +		return -1;
      +	}
     -+	fclose(f);
     ++	fclose(file);
      +	return 0;
      +}
      +
     @@ -777,7 +798,6 @@ $^
      +			log.old_hash = old_oid.hash;
      +		}
      +
     -+		/* XXX should the resolution be done relative or absolute? */
      +		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
      +					    create->target, RESOLVE_REF_READING,
      +					    &new_oid, NULL) != NULL) {
     @@ -1194,9 +1214,7 @@ $^
       	if (worktree)
       		repo_set_worktree(repo, worktree);
       
     -+	repo->ref_storage_format = format.ref_storage != NULL ?
     -+					   xstrdup(format.ref_storage) :
     -+					   "files"; /* XXX */
     ++	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
      +
       	clear_repository_format(&format);
       	return 0;
     @@ -1227,7 +1245,7 @@ $^
      +		} else if (!strcmp(ext, "worktreeconfig")) {
       			data->worktree_config = git_config_bool(var, value);
      -		else
     -+		} else if (!strcmp(ext, "refStorage")) {
     ++		} else if (!strcmp(ext, "refstorage")) {
      +			data->ref_storage = xstrdup(value);
      +		} else
       			string_list_append(&data->unknown_extensions, ext);
     @@ -1241,3 +1259,16 @@ $^
       	init_repository_format(format);
       }
       
     +@@
     + 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
     + 			setup_git_env(gitdir);
     + 		}
     +-		if (startup_info->have_repository)
     ++		if (startup_info->have_repository) {
     + 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
     ++			the_repository->ref_storage_format =
     ++				xstrdup_or_null(repo_fmt.ref_storage);
     ++		}
     + 	}
     + 
     + 	strbuf_release(&dir);

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v4 1/5] refs.h: clarify reflog iteration order
  2020-02-06 22:55     ` [PATCH v4 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-02-06 22:55       ` Han-Wen Nienhuys via GitGitGadget
  2020-02-06 22:55       ` [PATCH v4 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-06 22:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/refs.h b/refs.h
index 545029c6d8..87c9ec921b 100644
--- a/refs.h
+++ b/refs.h
@@ -444,18 +444,21 @@ int delete_refs(const char *msg, struct string_list *refnames,
 int refs_delete_reflog(struct ref_store *refs, const char *refname);
 int delete_reflog(const char *refname);
 
-/* iterate over reflog entries */
+/* Iterate over reflog entries. */
 typedef int each_reflog_ent_fn(
 		struct object_id *old_oid, struct object_id *new_oid,
 		const char *committer, timestamp_t timestamp,
 		int tz, const char *msg, void *cb_data);
 
+/* Iterate in over reflog entries, oldest entry first. */
 int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
 			     each_reflog_ent_fn fn, void *cb_data);
 int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+/* Call a function for each reflog entry, oldest entry first. */
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v4 2/5] create .git/refs in files-backend.c
  2020-02-06 22:55     ` [PATCH v4 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-02-06 22:55       ` [PATCH v4 1/5] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-02-06 22:55       ` Han-Wen Nienhuys via GitGitGadget
  2020-02-06 22:55       ` [PATCH v4 3/5] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-06 22:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This prepares for supporting the reftable format, which will want
create its own file system layout in .git

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/init-db.c    | 2 --
 refs/files-backend.c | 6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 944ec77fe1..45bdea0589 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
 	 */
-	safe_create_dir(git_path("refs"), 1);
-	adjust_shared_perm(git_path("refs"));
 
 	if (refs_init_db(&err))
 		die("failed to set up refs db: %s", err.buf);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0ea66a28b6..1d362e99fb 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3158,9 +3158,15 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
 	struct strbuf sb = STRBUF_INIT;
 
+	files_ref_path(refs, &sb, "refs");
+	safe_create_dir(sb.buf, 1);
+	/* adjust permissions even if directory already exists. */
+	adjust_shared_perm(sb.buf);
+
 	/*
 	 * Create .git/refs/{heads,tags}
 	 */
+	strbuf_reset(&sb);
 	files_ref_path(refs, &sb, "refs/heads");
 	safe_create_dir(sb.buf, 1);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v4 3/5] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-02-06 22:55     ` [PATCH v4 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-02-06 22:55       ` [PATCH v4 1/5] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
  2020-02-06 22:55       ` [PATCH v4 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-02-06 22:55       ` Han-Wen Nienhuys via GitGitGadget
  2020-02-06 22:55       ` [PATCH v4 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-06 22:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb..1d7a485220 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
 
 /* Virtual function declarations for ref_iterators: */
 
+/*
+ * backend-specific implementation of ref_iterator_advance.
+ * For symrefs, the function should set REF_ISSYMREF, and it should also dereference
+ * the symref to provide the OID referent.
+ */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v4 4/5] Add reftable library
  2020-02-06 22:55     ` [PATCH v4 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                         ` (2 preceding siblings ...)
  2020-02-06 22:55       ` [PATCH v4 3/5] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-02-06 22:55       ` Han-Wen Nienhuys via GitGitGadget
  2020-02-06 23:07         ` Junio C Hamano
  2020-02-07  0:16         ` brian m. carlson
  2020-02-06 22:55       ` [PATCH v4 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                         ` (2 subsequent siblings)
  6 siblings, 2 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-06 22:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a new format for storing the ref database. It provides the
following benefits:

 * Simple and fast atomic ref transactions, including multiple refs and reflogs.
 * Compact storage of ref data.
 * Fast look ups of ref data.
 * Case-sensitive ref names on Windows/OSX, regardless of file system
 * Eliminates file/directory conflicts in ref names

Further context and motivation can be found in background reading:

* Spec: https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md

* Original discussion on JGit-dev:  https://www.eclipse.org/lists/jgit-dev/msg03389.html

* First design discussion on git@vger: https://public-inbox.org/git/CAJo=hJtTp2eA3z9wW9cHo-nA7kK40vVThqh6inXpbCcqfdMP9g@mail.gmail.com/

* Last design discussion on git@vger: https://public-inbox.org/git/CAJo=hJsZcAM9sipdVr7TMD-FD2V2W6_pvMQ791EGCDsDkQ033w@mail.gmail.com/

* First attempt at implementation: https://public-inbox.org/git/CAP8UFD0PPZSjBnxCA7ez91vBuatcHKQ+JUWvTD1iHcXzPBjPBg@mail.gmail.com/

* libgit2 support issue: https://github.com/libgit2/libgit2/issues

* GitLab support issue: https://gitlab.com/gitlab-org/git/issues/6

* go-git support issue: https://github.com/src-d/go-git/issues/1059

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 ++
 reftable/README.md     |   19 +
 reftable/VERSION       |    5 +
 reftable/basics.c      |  196 +++++++
 reftable/basics.h      |   37 ++
 reftable/block.c       |  401 +++++++++++++++
 reftable/block.h       |   71 +++
 reftable/blocksource.h |   20 +
 reftable/bytes.c       |    0
 reftable/config.h      |    1 +
 reftable/constants.h   |   27 +
 reftable/dump.c        |   97 ++++
 reftable/file.c        |   97 ++++
 reftable/iter.c        |  229 +++++++++
 reftable/iter.h        |   56 ++
 reftable/merged.c      |  286 +++++++++++
 reftable/merged.h      |   34 ++
 reftable/pq.c          |  114 +++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  708 +++++++++++++++++++++++++
 reftable/reader.h      |   52 ++
 reftable/record.c      | 1107 ++++++++++++++++++++++++++++++++++++++++
 reftable/record.h      |   79 +++
 reftable/reftable.h    |  399 +++++++++++++++
 reftable/slice.c       |  199 ++++++++
 reftable/slice.h       |   39 ++
 reftable/stack.c       |  983 +++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   40 ++
 reftable/system.h      |   55 ++
 reftable/tree.c        |   66 +++
 reftable/tree.h        |   24 +
 reftable/writer.c      |  623 ++++++++++++++++++++++
 reftable/writer.h      |   46 ++
 reftable/zlib-compat.c |   92 ++++
 34 files changed, 6267 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 0000000000..402e0f9356
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 0000000000..f527da0380
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,19 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   ((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
+    git clone https://github.com/google/reftable reftable-repo) && \
+   cp reftable-repo/c/*.[ch] reftable/ && \
+   cp reftable-repo/LICENSE reftable/ &&
+   git --git-dir reftable-repo/.git show --no-patch origin/master \
+    > reftable/VERSION && \
+   echo '/* empty */' > reftable/config.h
+   rm reftable/*_test.c reftable/test_framework.*
+   git add reftable/*.[ch]
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 0000000000..5d505c6c0a
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,5 @@
+commit e7c3fc3099d9999bc8d895f84027b0e36348d5e6
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Thu Feb 6 20:17:40 2020 +0100
+
+    C: use inttypes.h header definitions
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 0000000000..791dcc867a
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,196 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_u24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)((i)&0xff);
+}
+
+uint32_t get_u24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_u32(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 24) & 0xff);
+	out[1] = (byte)((i >> 16) & 0xff);
+	out[2] = (byte)((i >> 8) & 0xff);
+	out[3] = (byte)((i)&0xff);
+}
+
+uint32_t get_u32(byte *in)
+{
+	return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
+	       (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
+}
+
+void put_u64(byte *out, uint64_t v)
+{
+	int i = 0;
+	for (i = sizeof(uint64_t); i--;) {
+		out[i] = (byte)(v & 0xff);
+		v >>= 8;
+	}
+}
+
+uint64_t get_u64(byte *out)
+{
+	uint64_t v = 0;
+	int i = 0;
+	for (i = 0; i < sizeof(uint64_t); i++) {
+		v = (v << 8) | (byte)(out[i] & 0xff);
+	}
+	return v;
+}
+
+void put_u16(byte *out, uint16_t i)
+{
+	out[0] = (byte)((i >> 8) & 0xff);
+	out[1] = (byte)((i)&0xff);
+}
+
+uint16_t get_u16(byte *in)
+{
+	return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
+}
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(int sz, int (*f)(int k, void *args), void *args)
+{
+	int lo = 0;
+	int hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		int mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		free(*p);
+		p++;
+	}
+	free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	for (char **p = names; *p; p++) {
+		len++;
+	}
+	return len;
+}
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = realloc(names,
+						names_cap * sizeof(char *));
+			}
+			names[names_len++] = strdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *error_str(int err)
+{
+	switch (err) {
+	case IO_ERROR:
+		return "I/O error";
+	case FORMAT_ERROR:
+		return "FORMAT_ERROR";
+	case NOT_EXIST_ERROR:
+		return "NOT_EXIST_ERROR";
+	case LOCK_ERROR:
+		return "LOCK_ERROR";
+	case API_ERROR:
+		return "API_ERROR";
+	case ZLIB_ERROR:
+		return "ZLIB_ERROR";
+	case -1:
+		return "general error";
+	default:
+		return "unknown error code";
+	}
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 0000000000..0ad368cfd3
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,37 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+void put_u24(byte *out, uint32_t i);
+uint32_t get_u24(byte *in);
+
+uint64_t get_u64(byte *in);
+void put_u64(byte *out, uint64_t i);
+
+void put_u32(byte *out, uint32_t i);
+uint32_t get_u32(byte *in);
+
+void put_u16(byte *out, uint16_t i);
+uint16_t get_u16(byte *in);
+int binsearch(int sz, int (*f)(int k, void *args), void *args);
+
+void free_names(char **a);
+void parse_names(char *buf, int size, char ***namesp);
+int names_equal(char **a, char **b);
+int names_length(char **names);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 0000000000..ffbfee13c3
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,401 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "blocksource.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct record rec)
+{
+	struct slice empty = {};
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = {};
+	int n = 0;
+
+	record_key(rec, &key);
+	n = encode_key(&restart, out, last, key, record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	out.buf += n;
+	out.len -= n;
+
+	n = record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+
+	out.buf += n;
+	out.len -= n;
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	free(slice_yield(&key));
+	return 0;
+
+err:
+	free(slice_yield(&key));
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_u24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_u16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_u24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = {};
+		uLongf dest_len = 0, src_len = 0;
+		slice_resize(&compressed, w->next - block_header_skip);
+
+		dest_len = compressed.len;
+		src_len = w->next - block_header_skip;
+
+		if (Z_OK != compress2(compressed.buf, &dest_len,
+				      w->buf + block_header_skip, src_len, 9)) {
+			free(slice_yield(&compressed));
+			return ZLIB_ERROR;
+		}
+		memcpy(w->buf + block_header_skip, compressed.buf, dest_len);
+		w->next = dest_len + block_header_skip;
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_u24(block->data + header_off + 1);
+
+	if (!is_block_type(typ)) {
+		return FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = {};
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip;
+		uLongf src_len = block->len - block_header_skip;
+
+		slice_resize(&uncompressed, sz);
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			free(slice_yield(&uncompressed));
+			return ZLIB_ERROR;
+		}
+
+		block_source_return_block(block->source, block);
+		block->data = uncompressed.buf;
+		block->len = dst_len; /* XXX: 4 bytes missing? */
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_u16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_u24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	struct slice key;
+	struct block_reader *r;
+	int error;
+};
+
+static int restart_key_less(int idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = {};
+	struct slice last_key = {};
+	byte unused_extra;
+	int n = decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_compare(a->key, rkey);
+		free(slice_yield(&rkey));
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct record rec)
+{
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	{
+		struct slice in = {
+			.buf = it->br->block.data + it->next_off,
+			.len = it->br->block_len - it->next_off,
+		};
+		struct slice start = in;
+		struct slice key = {};
+		byte extra;
+		int n = decode_key(&key, &extra, it->last_key, in);
+		if (n < 0) {
+			return -1;
+		}
+
+		in.buf += n;
+		in.len -= n;
+		n = record_decode(rec, key, extra, in, it->br->hash_size);
+		if (n < 0) {
+			return -1;
+		}
+		in.buf += n;
+		in.len -= n;
+
+		slice_copy(&it->last_key, key);
+		it->next_off += start.len - in.len;
+		free(slice_yield(&key));
+		return 0;
+	}
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = {};
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	free(slice_yield(&it->last_key));
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		return -1;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	{
+		struct record rec = new_record(block_reader_type(br));
+		struct slice key = {};
+		int result = 0;
+		int err = 0;
+		struct block_iter next = {};
+		while (true) {
+			block_iter_copy_from(&next, it);
+
+			err = block_iter_next(&next, rec);
+			if (err < 0) {
+				result = -1;
+				goto exit;
+			}
+
+			record_key(rec, &key);
+			if (err > 0 || slice_compare(key, want) >= 0) {
+				result = 0;
+				goto exit;
+			}
+
+			block_iter_copy_from(it, &next);
+		}
+
+	exit:
+		free(slice_yield(&key));
+		free(slice_yield(&next.last_key));
+		record_clear(rec);
+		free(record_yield(&rec));
+
+		return result;
+	}
+}
+
+void block_writer_reset(struct block_writer *bw)
+{
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	free(slice_yield(&bw->last_key));
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 0000000000..bb42588111
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,71 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+	uint32_t header_off;
+	int restart_interval;
+	int hash_size;
+
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+	struct slice last_key;
+	int entries;
+};
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+byte block_writer_type(struct block_writer *bw);
+int block_writer_add(struct block_writer *w, struct record rec);
+int block_writer_finish(struct block_writer *w);
+void block_writer_reset(struct block_writer *bw);
+void block_writer_clear(struct block_writer *bw);
+
+struct block_reader {
+	uint32_t header_off;
+	struct block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint32_t full_block_size;
+	uint16_t restart_count;
+};
+
+struct block_iter {
+	struct block_reader *br;
+	struct slice last_key;
+	uint32_t next_off;
+};
+
+int block_reader_init(struct block_reader *br, struct block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+byte block_reader_type(struct block_reader *r);
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+int block_iter_next(struct block_iter *it, struct record rec);
+int block_iter_seek(struct block_iter *it, struct slice want);
+void block_iter_close(struct block_iter *it);
+
+#endif
diff --git a/reftable/blocksource.h b/reftable/blocksource.h
new file mode 100644
index 0000000000..f3ad3a4c22
--- /dev/null
+++ b/reftable/blocksource.h
@@ -0,0 +1,20 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCKSOURCE_H
+#define BLOCKSOURCE_H
+
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source source);
+
+#endif
diff --git a/reftable/bytes.c b/reftable/bytes.c
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/reftable/config.h b/reftable/config.h
new file mode 100644
index 0000000000..40a8c178f1
--- /dev/null
+++ b/reftable/config.h
@@ -0,0 +1 @@
+/* empty */
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 0000000000..cd35704610
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,27 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+#define VERSION 1
+#define HEADER_SIZE 24
+#define FOOTER_SIZE 68
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 0000000000..acabe18fbe
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "reftable.h"
+
+static int dump_table(const char *tablename)
+{
+	struct block_source src = {};
+	int err = block_source_from_file(&src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	struct reader *r = NULL;
+	err = new_reader(&r, src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct iterator it = {};
+		err = reader_seek_ref(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+
+		struct ref_record ref = {};
+		while (1) {
+			err = iterator_next_ref(it, &ref);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			ref_record_print(&ref, 20);
+		}
+		iterator_destroy(&it);
+		ref_record_clear(&ref);
+	}
+
+	{
+		struct iterator it = {};
+		err = reader_seek_log(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+		struct log_record log = {};
+		while (1) {
+			err = iterator_next_log(it, &log);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			log_record_print(&log, 20);
+		}
+		iterator_destroy(&it);
+		log_record_clear(&log);
+	}
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int opt;
+	const char *table = NULL;
+	while ((opt = getopt(argc, argv, "t:")) != -1) {
+		switch (opt) {
+		case 't':
+			table = strdup(optarg);
+			break;
+		case '?':
+			printf("usage: %s [-table tablefile]\n", argv[0]);
+			return 2;
+			break;
+		}
+	}
+
+	if (table != NULL) {
+		int err = dump_table(table);
+		if (err < 0) {
+			fprintf(stderr, "%s: %s: %s\n", argv[0], table,
+				error_str(err));
+			return 1;
+		}
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 0000000000..b2ea90bf94
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	free(b);
+}
+
+static int file_read_block(void *v, struct block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int block_source_from_file(struct block_source *bs, const char *name)
+{
+	struct stat st = {};
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	{
+		struct file_block_source *p =
+			calloc(sizeof(struct file_block_source), 1);
+		p->size = st.st_size;
+		p->fd = fd;
+
+		bs->ops = &file_vtable;
+		bs->arg = p;
+	}
+	return 0;
+}
+
+int fd_writer(void *arg, byte *data, int sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 0000000000..c06c891366
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,229 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct iterator it)
+{
+	return it.ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct record rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct iterator *it)
+{
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct iterator it, struct record rec)
+{
+	return it.ops->next(it.iter_arg, rec);
+}
+
+void iterator_destroy(struct iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int iterator_next_ref(struct iterator it, struct ref_record *ref)
+{
+	struct record rec = {};
+	record_from_ref(&rec, ref);
+	return iterator_next(it, rec);
+}
+
+int iterator_next_log(struct iterator it, struct log_record *log)
+{
+	struct record rec = {};
+	record_from_log(&rec, log);
+	return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	free(slice_yield(&fri->oid));
+	iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct ref_record *ref = (struct ref_record *)rec.data;
+
+	while (true) {
+		int err = iterator_next_ref(fri->it, ref);
+		if (err != 0) {
+			return err;
+		}
+
+		if (fri->double_check) {
+			struct iterator it = {};
+
+			int err = reader_seek_ref(fri->r, &it, ref->ref_name);
+			if (err == 0) {
+				err = iterator_next_ref(it, ref);
+			}
+
+			iterator_destroy(&it);
+
+			if (err < 0) {
+				return err;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+}
+
+struct iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reader_return_block(it->r, &it->block_reader.block);
+	free(slice_yield(&it->oid));
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reader_return_block(it->r, &it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct ref_record *ref = (struct ref_record *)rec.data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reader *r, byte *oid, int oid_len,
+			       uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		calloc(sizeof(struct indexed_table_ref_iter), 1);
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 0000000000..f497f2a27e
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,56 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct iterator_vtable {
+	int (*next)(void *iter_arg, struct record rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct iterator *it);
+int iterator_next(struct iterator it, struct record rec);
+bool iterator_is_null(struct iterator it);
+
+struct filtering_ref_iterator {
+	struct reader *r;
+	struct slice oid;
+	bool double_check;
+	struct iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *,
+					  struct filtering_ref_iterator *);
+
+struct indexed_table_ref_iter {
+	struct reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reader *r, byte *oid, int oid_len,
+			       uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 0000000000..7d52ec6232
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,286 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct record rec = new_record(mi->typ);
+		int err = iterator_next(mi->stack[i], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			iterator_destroy(&mi->stack[i]);
+			record_clear(rec);
+			free(record_yield(&rec));
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		iterator_destroy(&mi->stack[i]);
+	}
+	free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, int idx)
+{
+	if (iterator_is_null(mi->stack[idx])) {
+		return 0;
+	}
+
+	{
+		struct record rec = new_record(mi->typ);
+		struct pq_entry e = {
+			.rec = rec,
+			.index = idx,
+		};
+		int err = iterator_next(mi->stack[idx], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			iterator_destroy(&mi->stack[idx]);
+			record_clear(rec);
+			free(record_yield(&rec));
+			return 0;
+		}
+
+		merged_iter_pqueue_add(&mi->pq, e);
+	}
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec)
+{
+	struct slice entry_key = {};
+	struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
+	int err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+	record_key(entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = {};
+		int err = 0, cmp = 0;
+
+		record_key(top.rec, &k);
+
+		cmp = slice_compare(k, entry_key);
+		free(slice_yield(&k));
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		record_clear(top.rec);
+		free(record_yield(&top.rec));
+	}
+
+	record_copy_from(rec, entry.rec, mi->hash_size);
+	record_clear(entry.rec);
+	free(record_yield(&entry.rec));
+	free(slice_yield(&entry_key));
+	return 0;
+}
+
+static int merged_iter_next_void(void *p, struct record rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct iterator *it,
+				      struct merged_iter *mi)
+{
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n)
+{
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reader *r = stack[i];
+		if (i > 0 && last_max >= reader_min_update_index(r)) {
+			return FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reader_min_update_index(r);
+		}
+
+		last_max = reader_max_update_index(r);
+	}
+
+	{
+		struct merged_table m = {
+			.stack = stack,
+			.stack_len = n,
+			.min = first_min,
+			.max = last_max,
+			.hash_size = SHA1_SIZE,
+		};
+
+		*dest = calloc(sizeof(struct merged_table), 1);
+		**dest = m;
+	}
+	return 0;
+}
+
+void merged_table_close(struct merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void merged_table_free(struct merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	free(mt);
+}
+
+uint64_t merged_max_update_index(struct merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t merged_min_update_index(struct merged_table *mt)
+{
+	return mt->min;
+}
+
+static int merged_table_seek_record(struct merged_table *mt,
+				    struct iterator *it, struct record rec)
+{
+	struct iterator *iters = calloc(sizeof(struct iterator), mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = record_type(rec),
+		.hash_size = mt->hash_size,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			iterator_destroy(&iters[i]);
+		}
+		free(iters);
+		return err;
+	}
+
+	merged.stack_len = n, err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p = malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+			  const char *name)
+{
+	struct ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = {};
+	record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+			     const char *name, uint64_t update_index)
+{
+	struct log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = {};
+	record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+			  const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 0000000000..b8d3572e26
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct merged_table {
+	struct reader **stack;
+	int stack_len;
+	int hash_size;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct iterator *stack;
+	int hash_size;
+	int stack_len;
+	byte typ;
+	struct merged_iter_pqueue pq;
+} merged_iter;
+
+void merged_table_clear(struct merged_table *mt);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 0000000000..a1aff7c98c
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,114 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = {};
+	struct slice bk = {};
+	int cmp = 0;
+	record_key(a.rec, &ak);
+	record_key(b.rec, &bk);
+
+	cmp = slice_compare(ak, bk);
+
+	free(slice_yield(&ak));
+	free(slice_yield(&bk));
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = realloc(pq->heap, pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		record_clear(pq->heap[i].rec);
+		free(record_yield(&pq->heap[i].rec));
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 0000000000..5f7018979d
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	struct record rec;
+	int index;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 0000000000..981911f076
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,708 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct block_source source)
+{
+	return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size)
+{
+	int result = source.ops->read_block(source.arg, dest, off, size);
+	dest->source = source;
+	return result;
+}
+
+void block_source_return_block(struct block_source source, struct block *blockp)
+{
+	source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reader_offsets *reader_offsets_for(struct reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reader *r, struct block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(r->source, dest, off, sz);
+}
+
+void reader_return_block(struct reader *r, struct block *p)
+{
+	block_source_return_block(r->source, p);
+}
+
+const char *reader_name(struct reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	int err = 0;
+	if (memcmp(f, "REFT", 4)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, HEADER_SIZE)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+
+	{
+		byte version = *f++;
+		if (version != 1) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	r->block_size = get_u24(f);
+
+	f += 3;
+	r->min_update_index = get_u64(f);
+	f += 8;
+	r->max_update_index = get_u64(f);
+	f += 8;
+
+	r->ref_offsets.index_offset = get_u64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_u64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_u64(f);
+	f += 8;
+	r->log_offsets.offset = get_u64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_u64(f);
+	f += 8;
+
+	{
+		uint32_t computed_crc = crc32(0, footer, f - footer);
+		uint32_t file_crc = get_u32(f);
+		f += 4;
+		if (computed_crc != file_crc) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	{
+		byte first_block_typ = header[HEADER_SIZE];
+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+		r->ref_offsets.offset = 0;
+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+					  r->log_offsets.offset > 0);
+		r->obj_offsets.present = r->obj_offsets.offset > 0;
+	}
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reader *r, struct block_source source, const char *name)
+{
+	struct block footer = {};
+	struct block header = {};
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reader));
+	r->size = block_source_size(source) - FOOTER_SIZE;
+	r->source = source;
+	r->name = strdup(name);
+	r->hash_size = SHA1_SIZE;
+
+	err = block_source_read_block(source, &footer, r->size, FOOTER_SIZE);
+	if (err != FOOTER_SIZE) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	/* Need +1 to read type of first block. */
+	err = reader_get_block(r, &header, 0, HEADER_SIZE + 1);
+	if (err != HEADER_SIZE + 1) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	block_source_return_block(r->source, &footer);
+	block_source_return_block(r->source, &header);
+	return err;
+}
+
+struct table_iter {
+	struct reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+		((struct ref_record *)rec.data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reader_return_block(ti->r, &ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += 24;
+	}
+
+	*typ = data[0];
+	if (is_block_type(*typ)) {
+		result = get_u24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct block block = {};
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : HEADER_SIZE;
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reader_return_block(r, &block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reader_return_block(r, &block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 r->hash_size);
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = {};
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct block_reader *brp = malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec)
+{
+	if (record_type(rec) != ti->typ) {
+		return API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = {};
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct record rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct iterator *it, struct table_iter *ti)
+{
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reader *r, struct table_iter *ti,
+				uint64_t off, byte typ)
+{
+	struct block_reader br = {};
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reader *r, struct table_iter *ti, byte typ,
+			bool index)
+{
+	struct reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reader *r, struct table_iter *ti,
+			      struct record want)
+{
+	struct record rec = new_record(record_type(want));
+	struct slice want_key = {};
+	struct slice got_key = {};
+	struct table_iter next = {};
+	int err = -1;
+	record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_compare(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	record_clear(rec);
+	free(record_yield(&rec));
+	free(slice_yield(&want_key));
+	free(slice_yield(&got_key));
+	return err;
+}
+
+static int reader_seek_indexed(struct reader *r, struct iterator *it,
+			       struct record rec)
+{
+	struct index_record want_index = {};
+	struct record want_index_rec = {};
+	struct index_record index_result = {};
+	struct record index_result_rec = {};
+	struct table_iter index_iter = {};
+	struct table_iter next = {};
+	int err = 0;
+
+	record_key(rec, &want_index.last_key);
+	record_from_index(&want_index_rec, &want_index);
+	record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			calloc(sizeof(struct table_iter), 1);
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	record_clear(want_index_rec);
+	record_clear(index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reader *r, struct iterator *it,
+				struct record rec)
+{
+	struct reader_offsets *offs = reader_offsets_for(r, record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = {};
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p = malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reader *r, struct iterator *it, struct record rec)
+{
+	byte typ = record_type(rec);
+
+	struct reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name)
+{
+	struct ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = {};
+	record_from_ref(&rec, &ref);
+	return reader_seek(r, it, rec);
+}
+
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+		       uint64_t update_index)
+{
+	struct log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = {};
+	record_from_log(&rec, &log);
+	return reader_seek(r, it, rec);
+}
+
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int new_reader(struct reader **p, struct block_source src, char const *name)
+{
+	struct reader *rd = calloc(sizeof(struct reader), 1);
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		free(rd);
+	}
+	return err;
+}
+
+void reader_free(struct reader *r)
+{
+	reader_close(r);
+	free(r);
+}
+
+static int reader_refs_for_indexed(struct reader *r, struct iterator *it,
+				   byte *oid)
+{
+	struct obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct record want_rec = {};
+	struct iterator oit = {};
+	struct obj_record got = {};
+	struct record got_rec = {};
+	int err = 0;
+
+	record_from_obj(&want_rec, &want);
+
+	err = reader_seek(r, &oit, want_rec);
+	if (err != 0) {
+		return err;
+	}
+
+	record_from_obj(&got_rec, &got);
+	err = iterator_next(oit, got_rec);
+	iterator_destroy(&oit);
+	if (err < 0) {
+		return err;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	{
+		struct indexed_table_ref_iter *itr = NULL;
+		err = new_indexed_table_ref_iter(&itr, r, oid, r->hash_size,
+						 got.offsets, got.offset_len);
+		if (err < 0) {
+			record_clear(got_rec);
+			return err;
+		}
+		got.offsets = NULL;
+		record_clear(got_rec);
+
+		iterator_from_indexed_table_ref_iter(it, itr);
+	}
+
+	return 0;
+}
+
+static int reader_refs_for_unindexed(struct reader *r, struct iterator *it,
+				     byte *oid, int oid_len)
+{
+	struct table_iter *ti = calloc(sizeof(struct table_iter), 1);
+	struct filtering_ref_iterator *filter = NULL;
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		free(ti);
+		return err;
+	}
+
+	filter = calloc(sizeof(struct filtering_ref_iterator), 1);
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	filter->r = r;
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
+		    int oid_len)
+{
+	if (r->obj_offsets.present) {
+		return reader_refs_for_indexed(r, it, oid);
+	}
+	return reader_refs_for_unindexed(r, it, oid, oid_len);
+}
+
+uint64_t reader_max_update_index(struct reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reader_min_update_index(struct reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 0000000000..599a90028e
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,52 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source *source);
+
+struct reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+struct reader {
+	struct block_source source;
+	char *name;
+	int hash_size;
+	uint64_t size;
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	int object_id_len;
+
+	struct reader_offsets ref_offsets;
+	struct reader_offsets obj_offsets;
+	struct reader_offsets log_offsets;
+};
+
+int init_reader(struct reader *r, struct block_source source, const char *name);
+int reader_seek(struct reader *r, struct iterator *it, struct record rec);
+void reader_close(struct reader *r);
+const char *reader_name(struct reader *r);
+void reader_return_block(struct reader *r, struct block *p);
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 0000000000..a006f9019d
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1107 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = {};
+	int i = 9;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	{
+		int n = sizeof(buf) - i - 1;
+		if (dest.len < n) {
+			return -1;
+		}
+		memcpy(dest.buf, &buf[i + 1], n);
+		return n;
+	}
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	in.buf += tsize;
+	in.len -= tsize;
+
+	return start_len - in.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	dest.buf += n;
+	dest.len -= n;
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	dest.buf += n;
+	dest.len -= n;
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	dest.buf += suffix_len;
+	dest.len -= suffix_len;
+
+	return start.len - dest.len;
+}
+
+static byte ref_record_type(void)
+{
+	return BLOCK_TYPE_REF;
+}
+
+static void ref_record_key(const void *r, struct slice *dest)
+{
+	const struct ref_record *rec = (const struct ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void ref_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct ref_record *ref = (struct ref_record *)rec;
+	struct ref_record *src = (struct ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = strdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = strdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void ref_record_print(struct ref_record *ref, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = {};
+
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size);
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size);
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void ref_record_clear_void(void *rec)
+{
+	ref_record_clear((struct ref_record *)rec);
+}
+
+void ref_record_clear(struct ref_record *ref)
+{
+	free(ref->ref_name);
+	free(ref->target);
+	free(ref->target_value);
+	free(ref->value);
+	memset(ref, 0, sizeof(struct ref_record));
+}
+
+static byte ref_record_val_type(const void *rec)
+{
+	const struct ref_record *r = (const struct ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	s.buf += l;
+	s.len -= l;
+
+	return start.len - s.len;
+}
+
+static int ref_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	const struct ref_record *r = (const struct ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		s.buf += hash_size;
+		s.len -= hash_size;
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		s.buf += hash_size;
+		s.len -= hash_size;
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		s.buf += n;
+		s.len -= n;
+	}
+
+	return start.len - s.len;
+}
+
+static int ref_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct ref_record *r = (struct ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	in.buf += n;
+	in.len -= n;
+
+	r->ref_name = realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		in.buf += hash_size;
+		in.len -= hash_size;
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		in.buf += hash_size;
+		in.len -= hash_size;
+		break;
+	case 3: {
+		struct slice dest = {};
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		in.buf += n;
+		in.len -= n;
+		seen_target = true;
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	in.buf += suffix_len;
+	in.len -= suffix_len;
+
+	return start_len - in.len;
+}
+
+struct record_vtable ref_record_vtable = {
+	.key = &ref_record_key,
+	.type = &ref_record_type,
+	.copy_from = &ref_record_copy_from,
+	.val_type = &ref_record_val_type,
+	.encode = &ref_record_encode,
+	.decode = &ref_record_decode,
+	.clear = &ref_record_clear_void,
+};
+
+static byte obj_record_type(void)
+{
+	return BLOCK_TYPE_OBJ;
+}
+
+static void obj_record_key(const void *r, struct slice *dest)
+{
+	const struct obj_record *rec = (const struct obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	const struct obj_record *src = (const struct obj_record *)src_rec;
+
+	*ref = *src;
+	ref->hash_prefix = malloc(ref->hash_prefix_len);
+	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
+
+	{
+		int olen = ref->offset_len * sizeof(uint64_t);
+		ref->offsets = malloc(olen);
+		memcpy(ref->offsets, src->offsets, olen);
+	}
+}
+
+static void obj_record_clear(void *rec)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	FREE_AND_NULL(ref->hash_prefix);
+	FREE_AND_NULL(ref->offsets);
+	memset(ref, 0, sizeof(struct obj_record));
+}
+
+static byte obj_record_val_type(const void *rec)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		s.buf += n;
+		s.len -= n;
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	{
+		uint64_t last = r->offsets[0];
+		int i = 0;
+		for (i = 1; i < r->offset_len; i++) {
+			int n = put_var_int(s, r->offsets[i] - last);
+			if (n < 0) {
+				return -1;
+			}
+			s.buf += n;
+			s.len -= n;
+			last = r->offsets[i];
+		}
+	}
+	return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct obj_record *r = (struct obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	r->hash_prefix = malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		in.buf += n;
+		in.len -= n;
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+
+	in.buf += n;
+	in.len -= n;
+
+	{
+		uint64_t last = r->offsets[0];
+		int j = 1;
+		while (j < count) {
+			uint64_t delta = 0;
+			int n = get_var_int(&delta, in);
+			if (n < 0) {
+				return n;
+			}
+
+			in.buf += n;
+			in.len -= n;
+
+			last = r->offsets[j] = (delta + last);
+			j++;
+		}
+	}
+	return start.len - in.len;
+}
+
+struct record_vtable obj_record_vtable = {
+	.key = &obj_record_key,
+	.type = &obj_record_type,
+	.copy_from = &obj_record_copy_from,
+	.val_type = &obj_record_val_type,
+	.encode = &obj_record_encode,
+	.decode = &obj_record_decode,
+	.clear = &obj_record_clear,
+};
+
+void log_record_print(struct log_record *log, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = {};
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size);
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size);
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static byte log_record_type(void)
+{
+	return BLOCK_TYPE_LOG;
+}
+
+static void log_record_key(const void *r, struct slice *dest)
+{
+	const struct log_record *rec = (const struct log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_u64(dest->buf + 1 + len, ts);
+}
+
+static void log_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct log_record *dst = (struct log_record *)rec;
+	const struct log_record *src = (const struct log_record *)src_rec;
+
+	*dst = *src;
+	dst->ref_name = strdup(dst->ref_name);
+	dst->email = strdup(dst->email);
+	dst->name = strdup(dst->name);
+	dst->message = strdup(dst->message);
+	if (dst->new_hash != NULL) {
+		dst->new_hash = malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void log_record_clear_void(void *rec)
+{
+	struct log_record *r = (struct log_record *)rec;
+	log_record_clear(r);
+}
+
+void log_record_clear(struct log_record *r)
+{
+	free(r->ref_name);
+	free(r->new_hash);
+	free(r->old_hash);
+	free(r->name);
+	free(r->email);
+	free(r->message);
+	memset(r, 0, sizeof(struct log_record));
+}
+
+static byte log_record_val_type(const void *rec)
+{
+	return 1;
+}
+
+static byte zero[SHA256_SIZE] = {};
+
+static int log_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct log_record *r = (struct log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	s.buf += 2 * hash_size;
+	s.len -= 2 * hash_size;
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_u16(s.buf, r->tz_offset);
+	s.buf += 2;
+	s.len -= 2;
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	return start.len - s.len;
+}
+
+static int log_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct log_record *r = (struct log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = {};
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return FORMAT_ERROR;
+	}
+
+	r->ref_name = realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_u64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (in.len < 2 * hash_size) {
+		return FORMAT_ERROR;
+	}
+
+	r->old_hash = realloc(r->old_hash, hash_size);
+	r->new_hash = realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	in.buf += 2 * hash_size;
+	in.len -= 2 * hash_size;
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->name = realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->email = realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_u16(in.buf);
+	in.buf += 2;
+	in.len -= 2;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->message = realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	return start.len - in.len;
+
+error:
+	free(slice_yield(&dest));
+	return FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool log_record_equal(struct log_record *a, struct log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+struct record_vtable log_record_vtable = {
+	.key = &log_record_key,
+	.type = &log_record_type,
+	.copy_from = &log_record_copy_from,
+	.val_type = &log_record_val_type,
+	.encode = &log_record_encode,
+	.decode = &log_record_decode,
+	.clear = &log_record_clear_void,
+};
+
+struct record new_record(byte typ)
+{
+	struct record rec;
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct ref_record *r = calloc(1, sizeof(struct ref_record));
+		record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct obj_record *r = calloc(1, sizeof(struct obj_record));
+		record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct log_record *r = calloc(1, sizeof(struct log_record));
+		record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct index_record *r = calloc(1, sizeof(struct index_record));
+		record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+static byte index_record_type(void)
+{
+	return BLOCK_TYPE_INDEX;
+}
+
+static void index_record_key(const void *r, struct slice *dest)
+{
+	struct index_record *rec = (struct index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+				   int hash_size)
+{
+	struct index_record *dst = (struct index_record *)rec;
+	struct index_record *src = (struct index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec)
+{
+	struct index_record *idx = (struct index_record *)rec;
+	free(slice_yield(&idx->last_key));
+}
+
+static byte index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int index_record_encode(const void *rec, struct slice out, int hash_size)
+{
+	const struct index_record *r = (const struct index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	out.buf += n;
+	out.len -= n;
+
+	return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+			       struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct index_record *r = (struct index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	in.buf += n;
+	in.len -= n;
+	return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+	.key = &index_record_key,
+	.type = &index_record_type,
+	.copy_from = &index_record_copy_from,
+	.val_type = &index_record_val_type,
+	.encode = &index_record_encode,
+	.decode = &index_record_decode,
+	.clear = &index_record_clear,
+};
+
+void record_key(struct record rec, struct slice *dest)
+{
+	rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec)
+{
+	return rec.ops->type();
+}
+
+int record_encode(struct record rec, struct slice dest, int hash_size)
+{
+	return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size)
+{
+	assert(src.ops->type() == rec.ops->type());
+
+	rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec)
+{
+	return rec.ops->val_type(rec.data);
+}
+
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size)
+{
+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec)
+{
+	return rec.ops->clear(rec.data);
+}
+
+void record_from_ref(struct record *rec, struct ref_record *ref_rec)
+{
+	rec->data = ref_rec;
+	rec->ops = &ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
+{
+	rec->data = obj_rec;
+	rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec)
+{
+	rec->data = index_rec;
+	rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct log_record *log_rec)
+{
+	rec->data = log_rec;
+	rec->ops = &log_record_vtable;
+}
+
+void *record_yield(struct record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+struct ref_record *record_as_ref(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_REF);
+	return (struct ref_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct ref_record *)a)->ref_name,
+		      ((struct ref_record *)b)->ref_name);
+}
+
+bool ref_record_is_deletion(const struct ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int log_record_compare_key(const void *a, const void *b)
+{
+	struct log_record *la = (struct log_record *)a;
+	struct log_record *lb = (struct log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool log_record_is_deletion(const struct log_record *log)
+{
+	/* XXX */
+	return false;
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 0000000000..dffdd71fc2
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,79 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+struct record_vtable {
+	void (*key)(const void *rec, struct slice *dest);
+	byte (*type)(void);
+	void (*copy_from)(void *rec, const void *src, int hash_size);
+	byte (*val_type)(const void *rec);
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+	void (*clear)(void *rec);
+};
+
+/* record is a generic wrapper for differnt types of records. */
+struct record {
+	void *data;
+	struct record_vtable *ops;
+};
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+int common_prefix_size(struct slice a, struct slice b);
+
+int is_block_type(byte typ);
+struct record new_record(byte typ);
+
+extern struct record_vtable ref_record_vtable;
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra);
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in);
+
+struct index_record {
+	struct slice last_key;
+	uint64_t offset;
+};
+
+struct obj_record {
+	byte *hash_prefix;
+	int hash_prefix_len;
+	uint64_t *offsets;
+	int offset_len;
+};
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size);
+void record_clear(struct record rec);
+void *record_yield(struct record *rec);
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct ref_record *refrec);
+void record_from_log(struct record *rec, struct log_record *logrec);
+struct ref_record *record_as_ref(struct record ref);
+
+/* for qsort. */
+int ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 0000000000..d84cc2004a
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,399 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include "system.h"
+
+typedef uint8_t byte;
+typedef byte bool;
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct block_source {
+	struct block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct block {
+	byte *data;
+	int len;
+	struct block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct block *dest, uint64_t off,
+			  uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int block_source_from_file(struct block_source *block_src, const char *name);
+
+/* write_options sets options for writing a single reftable. */
+struct write_options {
+	/* do not pad out blocks to block size. */
+	bool unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* do not generate a SHA1 => ref index. */
+	bool skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+};
+
+/* ref_record holds a ref database entry target_value */
+struct ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	byte *value; /* SHA1, or NULL. malloced. */
+	byte *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+bool ref_record_is_deletion(const struct ref_record *ref);
+
+/* prints a ref_record onto stdout */
+void ref_record_print(struct ref_record *ref, int hash_size);
+
+/* frees and nulls all pointer values. */
+void ref_record_clear(struct ref_record *ref);
+
+/* returns whether two ref_records are the same */
+bool ref_record_equal(struct ref_record *a, struct ref_record *b,
+		      int hash_size);
+
+/* log_record holds a reflog entry */
+struct log_record {
+	char *ref_name;
+	uint64_t update_index;
+	byte *new_hash;
+	byte *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+bool log_record_is_deletion(const struct log_record *log);
+
+/* frees and nulls all pointer values. */
+void log_record_clear(struct log_record *log);
+
+/* returns whether two records are equal. */
+bool log_record_equal(struct log_record *a, struct log_record *b,
+		      int hash_size);
+
+void log_record_print(struct log_record *log, int hash_size);
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct iterator {
+	struct iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_ref(struct iterator it, struct ref_record *ref);
+
+/* reads the next log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_log(struct iterator it, struct log_record *log);
+
+/* releases resources associated with an iterator. */
+void iterator_destroy(struct iterator *it);
+
+/* block_stats holds statistics for a single block type */
+struct block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct block_stats obj_stats;
+	/* stats for index blocks */
+	struct block_stats idx_stats;
+	/* stats for log blocks */
+	struct block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* different types of errors */
+
+/* Unexpected file system behavior */
+#define IO_ERROR -2
+
+/* Format inconsistency on reading data
+ */
+#define FORMAT_ERROR -3
+
+/* File does not exist. Returned from block_source_from_file(),  because it
+   needs special handling in stack.
+*/
+#define NOT_EXIST_ERROR -4
+
+/* Trying to write out-of-date data. */
+#define LOCK_ERROR -5
+
+/* Misuse of the API:
+   - on writing a record with NULL ref_name.
+   - on writing a ref_record outside the table limits
+   - on writing a ref or log record before the stack's next_update_index
+   - on reading a ref_record from log iterator, or vice versa.
+ */
+#define API_ERROR -6
+
+/* Decompression error */
+#define ZLIB_ERROR -7
+
+const char *error_str(int err);
+
+/* new_writer creates a new writer */
+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
+			  void *writer_arg, struct write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int fd_writer(void *fdp, byte *data, int size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   stack_next_update_index(), or API_ERROR is returned.
+ */
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max);
+
+/* adds a ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   writer_set_limits(), or API_ERROR is returned.
+ */
+int writer_add_ref(struct writer *w, struct ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n);
+
+/* adds a log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int writer_add_log(struct writer *w, struct log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int writer_add_logs(struct writer *w, struct log_record *logs, int n);
+
+/* writer_close finalizes the reftable. The writer is retained so statistics can
+ * be inspected. */
+int writer_close(struct writer *w);
+
+/* writer_stats returns the statistics on the reftable being written. */
+struct stats *writer_stats(struct writer *w);
+
+/* writer_free deallocates memory for the writer */
+void writer_free(struct writer *w);
+
+struct reader;
+
+/* new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp.  The name is used for creating a
+ * stack. Typically, it is the basename of the file.
+ */
+int new_reader(struct reader **pp, struct block_source, const char *name);
+
+/* reader_seek_ref returns an iterator where 'name' would be inserted in the
+   table.
+
+   example:
+
+   struct reader *r = NULL;
+   int err = new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct iterator it = {};
+   err = reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct ref_record ref = {};
+   while (1) {
+     err = iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   iterator_destroy(&it);
+   ref_record_clear(&ref);
+ */
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name);
+
+/* seek to logs for the given name, older than update_index. */
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+		       uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reader_free(struct reader *);
+
+/* return an iterator for the refs pointing to oid */
+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
+		    int oid_len);
+
+/* return the max_update_index for a table */
+uint64_t reader_max_update_index(struct reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reader_min_update_index(struct reader *r);
+
+/* a merged table is implements seeking/iterating over a stack of tables. */
+struct merged_table;
+
+/* new_merged_table creates a new merged table. It takes ownership of the stack
+   array.
+*/
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n);
+
+/* returns an iterator positioned just before 'name' */
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+			  const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+			     const char *name, uint64_t update_index);
+
+/* like merged_table_seek_log_at but look for the newest entry. */
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+			  const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t merged_max_update_index(struct merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t merged_min_update_index(struct merged_table *mt);
+
+/* closes readers for the merged tables */
+void merged_table_close(struct merged_table *mt);
+
+/* releases memory for the merged_table */
+void merged_table_free(struct merged_table *m);
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct stack;
+
+/* open a new reftable stack. The tables will be stored in 'dir', while the list
+   of tables is in 'list_file'. Typically, this should be .git/reftables and
+   .git/refs respectively.
+*/
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+	      struct write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t stack_next_update_index(struct stack *st);
+
+/* add a new table to the stack. The write_table function must call
+   writer_set_limits, add refs and return an error value. */
+int stack_add(struct stack *st,
+	      int (*write_table)(struct writer *wr, void *write_arg),
+	      void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct merged_table *stack_merged_table(struct stack *st);
+
+/* frees all resources associated with the stack. */
+void stack_destroy(struct stack *st);
+
+/* reloads the stack if necessary. */
+int stack_reload(struct stack *st);
+
+/* Policy for expiring reflog entries. */
+struct log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int stack_compact_all(struct stack *st, struct log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int stack_auto_compact(struct stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_ref(struct stack *st, const char *refname,
+		   struct ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_log(struct stack *st, const char *refname,
+		   struct log_record *log);
+
+/* statistics on past compactions. */
+struct compaction_stats {
+	uint64_t bytes;
+	int attempts;
+	int failures;
+};
+
+struct compaction_stats *stack_compaction_stats(struct stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 0000000000..efbe625253
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,199 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+byte *slice_yield(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = {};
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_write(struct slice *b, byte *data, int sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_write_void(void *b, byte *data, int sz)
+{
+	return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = calloc(size, 1);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct block_source *bs, struct slice *buf)
+{
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+struct block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 0000000000..f12a6db228
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+struct slice {
+	byte *buf;
+	int len;
+	int cap;
+};
+
+void slice_set_string(struct slice *dest, const char *);
+void slice_append_string(struct slice *dest, const char *);
+char *slice_to_string(struct slice src);
+const char *slice_as_string(struct slice *src);
+bool slice_equal(struct slice a, struct slice b);
+byte *slice_yield(struct slice *s);
+void slice_copy(struct slice *dest, struct slice src);
+void slice_resize(struct slice *s, int l);
+int slice_compare(struct slice a, struct slice b);
+int slice_write(struct slice *b, byte *data, int sz);
+int slice_write_void(void *b, byte *data, int sz);
+void slice_append(struct slice *dest, struct slice add);
+
+struct block_source;
+void block_source_from_slice(struct block_source *bs, struct slice *buf);
+
+struct block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 0000000000..303eed2f37
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,983 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "reftable.h"
+#include "writer.h"
+
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+	      struct write_options config)
+{
+	struct stack *p = calloc(sizeof(struct stack), 1);
+	int err = 0;
+	*dest = NULL;
+	p->list_file = strdup(list_file);
+	p->reftable_dir = strdup(dir);
+	p->config = config;
+
+	err = stack_reload(p);
+	if (err < 0) {
+		stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fread_lines(FILE *f, char ***namesp)
+{
+	long size = 0;
+	int err = fseek(f, 0, SEEK_END);
+	char *buf = NULL;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	size = ftell(f);
+	if (size < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = fseek(f, 0, SEEK_SET);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	buf = malloc(size + 1);
+	if (fread(buf, 1, size, f) != size) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+exit:
+	free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	FILE *f = fopen(filename, "r");
+	int err = 0;
+	if (f == NULL) {
+		if (errno == ENOENT) {
+			*namesp = calloc(sizeof(char *), 1);
+			return 0;
+		}
+
+		return IO_ERROR;
+	}
+	err = fread_lines(f, namesp);
+	fclose(f);
+	return err;
+}
+
+struct merged_table *stack_merged_table(struct stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void stack_destroy(struct stack *st)
+{
+	if (st->merged == NULL) {
+		return;
+	}
+
+	merged_table_close(st->merged);
+	merged_table_free(st->merged);
+	st->merged = NULL;
+
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	free(st);
+}
+
+static struct reader **stack_copy_readers(struct stack *st, int cur_len)
+{
+	struct reader **cur = calloc(sizeof(struct reader *), cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int stack_reload_once(struct stack *st, char **names, bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reader **new_tables =
+		malloc(sizeof(struct reader *) * names_len);
+	int new_tables_len = 0;
+	struct merged_table *new_merged = NULL;
+
+	struct slice table_path = {};
+
+	while (*names) {
+		struct reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct block_source src = {};
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_append_string(&table_path, "/");
+			slice_append_string(&table_path, name);
+
+			err = block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = new_reader(&rd, src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = new_merged_table(&new_merged, new_tables, new_tables_len);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		merged_table_free(st->merged);
+	}
+	st->merged = new_merged;
+
+	{
+		int i = 0;
+		for (i = 0; i < cur_len; i++) {
+			if (cur[i] != NULL) {
+				reader_close(cur[i]);
+				reader_free(cur[i]);
+			}
+		}
+	}
+exit:
+	free(slice_yield(&table_path));
+	{
+		int i = 0;
+		for (i = 0; i < new_tables_len; i++) {
+			reader_close(new_tables[i]);
+		}
+	}
+	free(new_tables);
+	free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open)
+{
+	struct timeval deadline = {};
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = {};
+		int err = gettimeofday(&now, NULL);
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		err = read_lines(st->list_file, &names_after);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return -1;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 100;
+		usleep(delay);
+	}
+
+	return 0;
+}
+
+int stack_reload(struct stack *st)
+{
+	return stack_reload_maybe_reuse(st, true);
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int stack_add(struct stack *st, int (*write)(struct writer *wr, void *arg),
+	      void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == LOCK_ERROR) {
+			err = stack_reload(st);
+		}
+		return err;
+	}
+
+	return stack_auto_compact(st);
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "%012" PRIx64 "-%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+int stack_try_add(struct stack *st,
+		  int (*write_table)(struct writer *wr, void *arg), void *arg)
+{
+	struct slice lock_name = {};
+	struct slice temp_tab_name = {};
+	struct slice tab_name = {};
+	struct slice next_name = {};
+	struct slice table_list = {};
+	struct writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+	int lock_fd = 0;
+	uint64_t next_update_index = 0;
+
+	slice_set_string(&lock_name, st->list_file);
+	slice_append_string(&lock_name, ".lock");
+
+	lock_fd = open(slice_as_string(&lock_name), O_EXCL | O_CREAT | O_WRONLY,
+		       0644);
+	if (lock_fd < 0) {
+		if (errno == EEXIST) {
+			err = LOCK_ERROR;
+			goto exit;
+		}
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = LOCK_ERROR;
+		goto exit;
+	}
+
+	next_update_index = stack_next_update_index(st);
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, next_update_index, next_update_index);
+
+	slice_set_string(&temp_tab_name, st->reftable_dir);
+	slice_append_string(&temp_tab_name, "/");
+	slice_append(&temp_tab_name, next_name);
+	slice_append_string(&temp_tab_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_name));
+	if (tab_fd < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	wr = new_writer(fd_writer, &tab_fd, &st->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	if (wr->min_update_index < next_update_index) {
+		err = API_ERROR;
+		goto exit;
+	}
+
+	{
+		int i = 0;
+		for (i = 0; i < st->merged->stack_len; i++) {
+			slice_append_string(&table_list,
+					    st->merged->stack[i]->name);
+			slice_append_string(&table_list, "\n");
+		}
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_append_string(&next_name, ".ref");
+	slice_append(&table_list, next_name);
+	slice_append_string(&table_list, "\n");
+
+	slice_set_string(&tab_name, st->reftable_dir);
+	slice_append_string(&tab_name, "/");
+	slice_append(&tab_name, next_name);
+
+	err = rename(slice_as_string(&temp_tab_name),
+		     slice_as_string(&tab_name));
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	free(slice_yield(&temp_tab_name));
+
+	err = write(lock_fd, table_list.buf, table_list.len);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = close(lock_fd);
+	lock_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&tab_name));
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&tab_name));
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_reload(st);
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_name));
+	}
+	unlink(slice_as_string(&lock_name));
+
+	if (lock_fd > 0) {
+		close(lock_fd);
+		lock_fd = 0;
+	}
+
+	free(slice_yield(&lock_name));
+	free(slice_yield(&temp_tab_name));
+	free(slice_yield(&tab_name));
+	free(slice_yield(&next_name));
+	free(slice_yield(&table_list));
+	writer_free(wr);
+	return err;
+}
+
+uint64_t stack_next_update_index(struct stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reader_max_update_index(st->merged->stack[sz - 1]) + 1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct log_expiry_config *config)
+{
+	struct slice next_name = {};
+	int tab_fd = -1;
+	struct writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reader_min_update_index(st->merged->stack[first]),
+		    reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_append_string(temp_tab, "/");
+	slice_append(temp_tab, next_name);
+	slice_append_string(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = new_writer(fd_writer, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+	writer_free(wr);
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		free(slice_yield(temp_tab));
+	}
+	free(slice_yield(&next_name));
+	return err;
+}
+
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+			int last, struct log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reader **subtabs =
+		calloc(sizeof(struct reader *), last - first + 1);
+	struct merged_table *mt = NULL;
+	int err = 0;
+	struct iterator it = {};
+	struct ref_record ref = {};
+	struct log_record log = {};
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	writer_set_limits(wr, st->merged->stack[first]->min_update_index,
+			  st->merged->stack[last]->max_update_index);
+
+	err = new_merged_table(&mt, subtabs, subtabs_len);
+	if (err < 0) {
+		free(subtabs);
+		goto exit;
+	}
+
+	err = merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = iterator_next_ref(it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+	}
+
+	err = merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = iterator_next_log(it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		/* XXX collect stats? */
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+	}
+
+exit:
+	iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		merged_table_free(mt);
+	}
+	ref_record_clear(&ref);
+
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct stack *st, int first, int last,
+			       struct log_expiry_config *expiry)
+{
+	struct slice temp_tab_name = {};
+	struct slice new_table_name = {};
+	struct slice lock_file_name = {};
+	struct slice ref_list_contents = {};
+	struct slice new_table_path = {};
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **delete_on_success = calloc(sizeof(char *), compact_count + 1);
+	char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
+	int i = 0;
+	int j = 0;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_append_string(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_name = {};
+		struct slice subtab_lock = {};
+		slice_set_string(&subtab_name, st->reftable_dir);
+		slice_append_string(&subtab_name, "/");
+		slice_append_string(&subtab_name,
+				    reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_name);
+		slice_append_string(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				}
+				err = IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] = (char *)slice_as_string(&subtab_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_append_string(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_append_string(&new_table_path, "/");
+
+	slice_append(&new_table_path, new_table_name);
+
+	err = rename(slice_as_string(&temp_tab_name),
+		     slice_as_string(&new_table_path));
+	if (err < 0) {
+		goto exit;
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	slice_append(&ref_list_contents, new_table_name);
+	slice_append_string(&ref_list_contents, "\n");
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	for (char **p = delete_on_success; *p; p++) {
+		if (strcmp(*p, slice_as_string(&new_table_path))) {
+			unlink(*p);
+		}
+	}
+
+	err = stack_reload_maybe_reuse(st, first < last);
+exit:
+	for (char **p = subtable_locks; *p; p++) {
+		unlink(*p);
+	}
+	free_names(delete_on_success);
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	free(slice_yield(&new_table_name));
+	free(slice_yield(&new_table_path));
+	free(slice_yield(&ref_list_contents));
+	free(slice_yield(&temp_tab_name));
+	free(slice_yield(&lock_file_name));
+	return err;
+}
+
+int stack_compact_all(struct stack *st, struct log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct stack *st, int first, int last,
+				     struct log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	assert(sz > 0);
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = calloc(sizeof(struct segment), n);
+	int next = 0;
+	struct segment cur = {};
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct stack *st)
+{
+	uint64_t *sizes = calloc(sizeof(uint64_t), st->merged->stack_len);
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		/* overhead is 24 + 68 = 92. */
+		sizes[i] = st->merged->stack[i]->size - 91;
+	}
+	return sizes;
+}
+
+int stack_auto_compact(struct stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct compaction_stats *stack_compaction_stats(struct stack *st)
+{
+	return &st->stats;
+}
+
+int stack_read_ref(struct stack *st, const char *refname,
+		   struct ref_record *ref)
+{
+	struct iterator it = {};
+	struct merged_table *mt = stack_merged_table(st);
+	int err = merged_table_seek_ref(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = iterator_next_ref(it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, refname) || ref_record_is_deletion(ref)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	iterator_destroy(&it);
+	return err;
+}
+
+int stack_read_log(struct stack *st, const char *refname,
+		   struct log_record *log)
+{
+	struct iterator it = {};
+	struct merged_table *mt = stack_merged_table(st);
+	int err = merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = iterator_next_log(it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) || log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 0000000000..d5e2c93c29
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,40 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+
+struct stack {
+	char *list_file;
+	char *reftable_dir;
+
+	struct write_options config;
+
+	struct merged_table *merged;
+	struct compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct stack *st,
+		  int (*write_table)(struct writer *wr, void *arg), void *arg);
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+			int last, struct log_expiry_config *config);
+int fastlog2(uint64_t sz);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 0000000000..ee7a6e1d70
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,55 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#include "config.h"
+
+#ifndef REFTABLE_STANDALONE
+
+#include "git-compat-util.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#define ARRAY_SIZE(a) sizeof((a)) / sizeof((a)[0])
+#define FREE_AND_NULL(x)    \
+	do {                \
+		free(x);    \
+		(x) = NULL; \
+	} while (0)
+#define QSORT(arr, n, cmp) qsort(arr, n, sizeof(arr[0]), cmp)
+#define SWAP(a, b)                              \
+	{                                       \
+		char tmp[sizeof(a)];            \
+		assert(sizeof(a) == sizeof(b)); \
+		memcpy(&tmp[0], &a, sizeof(a)); \
+		memcpy(&a, &b, sizeof(a));      \
+		memcpy(&b, &tmp[0], sizeof(a)); \
+	}
+#endif
+
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 0000000000..9bf7fe531f
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,66 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				calloc(sizeof(struct tree_node), 1);
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	{
+		int res = compare(key, (*rootp)->key);
+		if (res < 0) {
+			return tree_search(key, &(*rootp)->left, compare,
+					   insert);
+		} else if (res > 0) {
+			return tree_search(key, &(*rootp)->right, compare,
+					   insert);
+		}
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 0000000000..86a71715ae
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,24 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 0000000000..3247481df3
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,623 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct block_stats *writer_block_stats(struct writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct writer *w, byte *data, size_t len, int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = calloc(w->pending_padding, 1);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_write_header(struct writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+	dest[4] = 1; /* version */
+	put_u24(dest + 5, w->opts.block_size);
+	put_u64(dest + 8, w->min_update_index);
+	put_u64(dest + 16, w->max_update_index);
+	return 24;
+}
+
+static void writer_reinit_block_writer(struct writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = HEADER_SIZE;
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start, w->hash_size);
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
+			  void *writer_arg, struct write_options *opts)
+{
+	struct writer *wp = calloc(sizeof(struct writer), 1);
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->hash_size = SHA1_SIZE;
+	wp->block = calloc(opts->block_size, 1);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void writer_free(struct writer *w)
+{
+	free(w->block);
+	free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+			     ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = calloc(sizeof(struct obj_index_tree_node), 1);
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = realloc(key->offsets,
+				       sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct writer *w, struct record rec)
+{
+	int result = -1;
+	struct slice key = {};
+	int err = 0;
+	record_key(rec, &key);
+	if (slice_compare(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	free(slice_yield(&key));
+	return result;
+}
+
+int writer_add_ref(struct writer *w, struct ref_record *ref)
+{
+	struct record rec = {};
+	struct ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return API_ERROR;
+	}
+
+	record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = w->hash_size,
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = w->hash_size,
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int writer_add_log(struct writer *w, struct log_record *log)
+{
+	if (log->ref_name == NULL) {
+		return API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	{
+		struct record rec = {};
+		int err;
+		record_from_log(&rec, log);
+		err = writer_add_record(w, rec);
+		return err;
+	}
+}
+
+int writer_add_logs(struct writer *w, struct log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct record rec = {};
+			record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, rec);
+			assert(err == 0);
+		}
+		for (i = 0; i < idx_len; i++) {
+			free(slice_yield(&idx[i].last_key));
+		}
+		free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct block_stats *bstats = writer_block_stats(w, typ);
+		bstats->index_blocks =
+			w->stats.idx_stats.blocks - before_blocks;
+		bstats->index_offset = index_start;
+		bstats->max_index_level = max_level;
+	}
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct record rec = {};
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	free(slice_yield(&entry->hash));
+	free(entry);
+}
+
+static int writer_dump_object_index(struct writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = {};
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_close(struct writer *w)
+{
+	byte footer[68];
+	byte *p = footer;
+
+	writer_finish_public_section(w);
+
+	writer_write_header(w, footer);
+	p += 24;
+	put_u64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_u64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_u64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_u64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_u64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_u32(p, crc32(0, footer, p - footer));
+	p += 4;
+	w->pending_padding = 0;
+
+	{
+		int n = padded_write(w, footer, sizeof(footer), 0);
+		if (n < 0) {
+			return n;
+		}
+	}
+
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	free(slice_yield(&w->last_key));
+	return 0;
+}
+
+void writer_clear_index(struct writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		free(slice_yield(&w->index[i].last_key));
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct block_stats *bstats = writer_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_u24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = realloc(w->index,
+				   sizeof(struct index_record) * w->index_cap);
+	}
+
+	{
+		struct index_record ir = {
+			.offset = w->next,
+		};
+		slice_copy(&ir.last_key, w->block_writer->last_key);
+		w->index[w->index_len] = ir;
+	}
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	block_writer_reset(&w->block_writer_data);
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+struct stats *writer_stats(struct writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 0000000000..bd2386474b
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,46 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct writer {
+	int (*write)(void *, byte *, int);
+	void *write_arg;
+	int pending_padding;
+	int hash_size;
+	struct slice last_key;
+
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct write_options opts;
+
+	byte *block;
+	struct block_writer *block_writer;
+	struct block_writer block_writer_data;
+	struct index_record *index;
+	int index_len;
+	int index_cap;
+
+	/* tree for use with tsearch */
+	struct tree_node *obj_index_tree;
+
+	struct stats stats;
+};
+
+int writer_flush_block(struct writer *w);
+void writer_clear_index(struct writer *w);
+int writer_finish_public_section(struct writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 0000000000..3e0b0f24f1
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v4 5/5] Reftable support for git-core
  2020-02-06 22:55     ` [PATCH v4 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                         ` (3 preceding siblings ...)
  2020-02-06 22:55       ` [PATCH v4 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-02-06 22:55       ` Han-Wen Nienhuys via GitGitGadget
  2020-02-06 23:49         ` brian m. carlson
  2020-02-06 23:31       ` [PATCH v4 0/5] Reftable support git-core brian m. carlson
  2020-02-10 14:14       ` [PATCH v5 " Han-Wen Nienhuys via GitGitGadget
  6 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-06 22:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

TODO:

 * Make CI on gitgitgadget compile.
 * Redo interaction between repo config, repo storage version
 * Resolve the design problem with reflog expiry.
 * Resolve spots marked with XXX

Example use:

  $ ~/vc/git/git init --reftable
  warning: templates not found in /usr/local/google/home/hanwen/share/git-core/templates
  Initialized empty Git repository in /tmp/qz/.git/
  $ echo q > a
  $ ~/vc/git/git add a
  $ ~/vc/git/git commit -mx
  fatal: not a git repository (or any of the parent directories): .git
  [master (root-commit) 373d969] x
   1 file changed, 1 insertion(+)
   create mode 100644 a
  $ ~/vc/git/git show-ref
  373d96972fca9b63595740bba3898a762778ba20 HEAD
  373d96972fca9b63595740bba3898a762778ba20 refs/heads/master
  $ ls -l .git/reftable/
  total 12
  -rw------- 1 hanwen primarygroup  126 Jan 23 20:08 000000000001-000000000001.ref
  -rw------- 1 hanwen primarygroup 4277 Jan 23 20:08 000000000002-000000000002.ref
  $ go run ~/vc/reftable/cmd/dump.go  -table /tmp/qz/.git/reftable/000000000002-000000000002.ref
  ** DEBUG **
  name /tmp/qz/.git/reftable/000000000002-000000000002.ref, sz 4209: 'r' reftable.readerOffsets{Present:true, Offset:0x0, IndexOffset:0x0}, 'o' reftable.readerOffsets{Present:false, Offset:0x0, IndexOffset:0x0} 'g' reftable.readerOffsets{Present:true, Offset:0x1000, IndexOffset:0x0}
  ** REFS **
  reftable.RefRecord{RefName:"refs/heads/master", UpdateIndex:0x2, Value:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, TargetValue:[]uint8(nil), Target:""}
  ** LOGS **
  reftable.LogRecord{RefName:"HEAD", UpdateIndex:0x2, New:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, Old:[]uint8{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Name:"Han-Wen Nienhuys", Email:"hanwen@google.com", Time:0x5e29ef27, TZOffset:100, Message:"commit (initial): x\n"}

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Co-authored-by: Jeff King <peff@peff.net>
---
 Makefile                |  24 +-
 builtin/init-db.c       |  47 ++-
 cache.h                 |   2 +
 refs.c                  |  20 +-
 refs/refs-internal.h    |   1 +
 refs/reftable-backend.c | 880 ++++++++++++++++++++++++++++++++++++++++
 repository.c            |   2 +
 repository.h            |   3 +
 setup.c                 |  12 +-
 9 files changed, 966 insertions(+), 25 deletions(-)
 create mode 100644 refs/reftable-backend.c

diff --git a/Makefile b/Makefile
index 6134104ae6..706bb0a981 100644
--- a/Makefile
+++ b/Makefile
@@ -814,6 +814,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += command-list.h
 
@@ -959,6 +960,7 @@ LIB_OBJS += rebase-interactive.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1163,7 +1165,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2347,11 +2349,28 @@ VCSSVN_OBJS += vcs-svn/fast_export.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/bytes.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2488,6 +2507,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 45bdea0589..ea2a333c4a 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -177,7 +177,7 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 }
 
 static int create_default_files(const char *template_path,
-				const char *original_git_dir)
+				const char *original_git_dir, int flags)
 {
 	struct stat st1;
 	struct strbuf buf = STRBUF_INIT;
@@ -213,6 +213,8 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	if (flags & INIT_DB_REFTABLE)
+		the_repository->ref_storage_format = xstrdup("reftable");
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -222,6 +224,15 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -234,17 +245,20 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
+	} else {
+		/*
+		 * XXX should check whether our ref backend matches the
+		 * original one; if not, either die() or convert
+		 */
 	}
 
 	/* This forces creation of new config file */
-	xsnprintf(repo_version_string, sizeof(repo_version_string),
-		  "%d", GIT_REPO_VERSION);
+	xsnprintf(repo_version_string, sizeof(repo_version_string), "%d",
+		  flags & INIT_DB_REFTABLE ? GIT_REPO_VERSION_READ :
+					     GIT_REPO_VERSION);
 	git_config_set("core.repositoryformatversion", repo_version_string);
 
 	/* Check filemode trustability */
@@ -378,7 +392,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 */
 	check_repository_format();
 
-	reinit = create_default_files(template_dir, original_git_dir);
+	reinit = create_default_files(template_dir, original_git_dir, flags);
 
 	create_object_directory();
 
@@ -403,6 +417,10 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	if (flags & INIT_DB_REFTABLE) {
+		git_config_set("extensions.refStorage", "reftable");
+	}
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -481,15 +499,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *template_dir = NULL;
 	unsigned int flags = 0;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_BIT(0, "reftable", &flags, N_("use reftable"),
+			INIT_DB_REFTABLE),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_END()
diff --git a/cache.h b/cache.h
index cbfaead23a..929a61e861 100644
--- a/cache.h
+++ b/cache.h
@@ -625,6 +625,7 @@ int path_inside_repo(const char *prefix, const char *path);
 
 #define INIT_DB_QUIET 0x0001
 #define INIT_DB_EXIST_OK 0x0002
+#define INIT_DB_REFTABLE 0x0004
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, unsigned int flags);
@@ -1041,6 +1042,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 1ab0bb54d3..e4e5af8ea0 100644
--- a/refs.c
+++ b/refs.c
@@ -20,7 +20,7 @@
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1836,13 +1836,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1858,7 +1858,10 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs = ref_store_init(r->gitdir,
+				 r->ref_storage_format ? r->ref_storage_format :
+							 "files",
+				 REF_STORE_ALL_CAPS);
 	return r->refs;
 }
 
@@ -1913,7 +1916,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, "files", /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1927,6 +1930,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = "files"; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1940,9 +1944,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 1d7a485220..d87e9cbdec 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -661,6 +661,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 0000000000..e793ee362e
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,880 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+/*
+  The reftable v1 spec only supports 20-byte binary SHA1s. A new format version
+  will be needed to support SHA256.
+ */
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *reftable_dir;
+	char *table_list_file;
+	struct stack *stack;
+};
+
+static void clear_log_record(struct log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	log_record_clear(log);
+}
+
+static void fill_log_record(struct log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = {};
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *reftable_ref_store_create(const char *path,
+						   unsigned int store_flags)
+{
+	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct write_options cfg = {
+		.block_size = 4096,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/reftable/tables.list", path);
+	refs->table_list_file = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", path);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/HEAD", path);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", path);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	refs->err = new_stack(&refs->stack, refs->reftable_dir,
+			      refs->table_list_file, cfg);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	FILE *file;
+	safe_create_dir(refs->reftable_dir, 1);
+
+	file = fopen(refs->table_list_file, "a");
+	if (file == NULL) {
+		return -1;
+	}
+	fclose(file);
+	return 0;
+}
+
+struct reftable_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = iterator_next_ref(ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	ref_record_clear(&ri->ref);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct merged_table *mt = NULL;
+
+	mt = stack_merged_table(refs->stack);
+	ri->err = refs->err;
+	if (ri->err == 0) {
+		ri->err = merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	(void)refs;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid = {};
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct ref_update *)a)->refname,
+		      ((struct ref_update *)b)->refname);
+}
+
+static int write_transaction_table(struct writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = stack_next_update_index(refs->stack);
+	int err = 0;
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	writer_set_limits(writer, ts, ts);
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_NEW) {
+			struct object_id out_oid = {};
+			int out_flags = 0;
+			/* Memory owned by refs_resolve_ref_unsafe, no need to
+			 * free(). */
+			const char *resolved = refs_resolve_ref_unsafe(
+				transaction->ref_store, u->refname, 0, &out_oid,
+				&out_flags);
+			struct ref_record ref = {};
+			ref.ref_name =
+				(char *)(resolved ? resolved : u->refname);
+			ref.value = u->new_oid.hash;
+			ref.update_index = ts;
+			err = writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct log_record log = {};
+		fill_log_record(&log);
+
+		log.ref_name = (char *)u->refname;
+		log.old_hash = u->old_oid.hash;
+		log.new_hash = u->new_oid.hash;
+		log.update_index = ts;
+		log.message = u->msg;
+
+		err = writer_add_log(writer, &log);
+		clear_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+exit:
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	int err = stack_add(refs->stack, &write_transaction_table, transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure %s",
+			    error_str(err));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *err)
+{
+	return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+struct write_delete_refs_arg {
+	struct stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	int err = 0;
+
+	writer_set_limits(writer, ts, ts);
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct log_record log = {};
+		struct ref_record current = {};
+		fill_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (stack_read_ref(arg->stack, log.ref_name, &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		ref_record_clear(&current);
+
+		clear_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	return stack_add(refs->stack, &write_delete_refs_table, &arg);
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	return stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	writer_set_limits(writer, ts, ts);
+	err = writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct log_record log = {};
+		struct object_id new_oid = {};
+		struct object_id old_oid = {};
+		struct ref_record current = {};
+		stack_read_ref(create->refs->stack, create->refname, &current);
+
+		fill_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	return stack_add(refs->stack, &write_create_symref_table, &arg);
+}
+
+struct write_rename_arg {
+	struct stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	struct ref_record ref = {};
+	int err = stack_read_ref(arg->stack, arg->oldname, &ref);
+	if (err) {
+		goto exit;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto exit;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct ref_record todo[2] = {};
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct log_record todo[2] = {};
+		fill_log_record(&todo[0]);
+		fill_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = writer_add_logs(writer, todo, 2);
+
+		clear_log_record(&todo[0]);
+		clear_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	return stack_add(refs->stack, &write_rename_table, &arg);
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = iterator_next_log(ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	log_record_clear(&ri->log);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return empty_ref_iterator_begin();
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct iterator it = {};
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &it, refname);
+	struct log_record log = {};
+
+	while (err == 0) {
+		err = iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid = {};
+			struct object_id new_oid = {};
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	log_record_clear(&log);
+	iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct iterator it = {};
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &it, refname);
+
+	struct log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+
+	while (err == 0) {
+		struct log_record log = {};
+		err = iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (int i = len; i--;) {
+		struct log_record *log = &logs[i];
+		struct object_id old_oid = {};
+		struct object_id new_oid = {};
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (int i = 0; i < len; i++) {
+		log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	BUG("per ref reflog expiry is not implemented for reftable backend.");
+	/*
+	  TODO: should iterate over the reflog, and write tombstones. The
+	  tombstones will be removed on compaction.
+	 */
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct ref_record ref = {};
+	int err = stack_read_ref(refs->stack, refname, &ref);
+	if (err) {
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		/* XXX recurse? */
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else {
+		hashcpy(oid->hash, ref.value);
+	}
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
diff --git a/repository.c b/repository.c
index a4174ddb06..ff0988dac8 100644
--- a/repository.c
+++ b/repository.c
@@ -174,6 +174,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 040057dea6..198d4aa090 100644
--- a/repository.h
+++ b/repository.h
@@ -70,6 +70,9 @@ struct repository {
 	/* The store in which the refs are held. */
 	struct ref_store *refs;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index e2a479a64f..e2e1583eee 100644
--- a/setup.c
+++ b/setup.c
@@ -440,9 +440,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -531,6 +533,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1173,8 +1176,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 4/5] Add reftable library
  2020-02-06 22:55       ` [PATCH v4 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-02-06 23:07         ` Junio C Hamano
  2020-02-07  0:16         ` brian m. carlson
  1 sibling, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-02-06 23:07 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +static struct block_stats *writer_block_stats(struct writer *w, byte typ)
> +{
> +	switch (typ) {
> +	case 'r':
> +		return &w->stats.ref_stats;
> +	case 'o':
> +		return &w->stats.obj_stats;
> +	case 'i':
> +		return &w->stats.idx_stats;
> +	case 'g':
> +		return &w->stats.log_stats;
> +	}
> +	assert(false);
> +	return NULL;
> +}

As assert() turns into nothing, this is not a particularly good way
to document that "if control reaches here, that means we found a
programming error".  

We intead would use BUG("message") in our codebase, which is marked
as NORETURN (so "return NULL" after it would be a dead code).

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 0/5] Reftable support git-core
  2020-02-06 22:55     ` [PATCH v4 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                         ` (4 preceding siblings ...)
  2020-02-06 22:55       ` [PATCH v4 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-02-06 23:31       ` brian m. carlson
  2020-02-10 14:14       ` [PATCH v5 " Han-Wen Nienhuys via GitGitGadget
  6 siblings, 0 replies; 409+ messages in thread
From: brian m. carlson @ 2020-02-06 23:31 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 632 bytes --]

On 2020-02-06 at 22:55:51, Han-Wen Nienhuys via GitGitGadget wrote:
> This adds the reftable library, and hooks it up as a ref backend.
> 
> At this point, I am mainly interested in feedback on the spots marked with
> XXX in the Git source code.
> 
> v3
> 
>  * passes gitgitgadget CI.

It's interesting you should say that, because one thing I'm missing here
is some tests.  Since SHA-256 support is rapidly coming to Git, I'd like
to make sure that this series works with SHA-256 repositories, but I
can't do that if there are no tests.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 5/5] Reftable support for git-core
  2020-02-06 22:55       ` [PATCH v4 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-02-06 23:49         ` brian m. carlson
  2020-02-10 13:18           ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: brian m. carlson @ 2020-02-06 23:49 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 4557 bytes --]

On 2020-02-06 at 22:55:56, Han-Wen Nienhuys via GitGitGadget wrote:
> @@ -403,6 +417,10 @@ int init_db(const char *git_dir, const char *real_git_dir,
>  		git_config_set("receive.denyNonFastforwards", "true");
>  	}
>  
> +	if (flags & INIT_DB_REFTABLE) {
> +		git_config_set("extensions.refStorage", "reftable");
> +	}

This does seem like the best way to do this.  Can we get an addition to
Documentation/technical/repository-version.txt that documents this
extension and the values it takes?  I presume "reftable" and "files" are
options, but it would be nice it folks didn't have to dig through the
code to find that out.

One side note: we typically omit the braces for a single-line if.

> +
>  	if (!(flags & INIT_DB_QUIET)) {
>  		int len = strlen(git_dir);
>  
> @@ -481,15 +499,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
>  	const char *template_dir = NULL;
>  	unsigned int flags = 0;
>  	const struct option init_db_options[] = {
> -		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
> -				N_("directory from which templates will be used")),
> +		OPT_STRING(0, "template", &template_dir,
> +			   N_("template-directory"),
> +			   N_("directory from which templates will be used")),
>  		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
> -				N_("create a bare repository"), 1),
> +			    N_("create a bare repository"), 1),
>  		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
> -			N_("permissions"),
> -			N_("specify that the git repository is to be shared amongst several users"),
> -			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
> +		  N_("permissions"),
> +		  N_("specify that the git repository is to be shared amongst several users"),
> +		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
>  		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
> +		OPT_BIT(0, "reftable", &flags, N_("use reftable"),
> +			INIT_DB_REFTABLE),

I wonder if this might be better as --refs-backend={files|reftable}.  If
reftable becomes the default at some point in the future, it would be
easier to let people explicitly specify that they want a non-reftable
version.  If we learn support for a hypothetical reftable v2 in the
future, maybe we'd want to let folks specify that instead.

>  		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
>  			   N_("separate git dir from working tree")),
>  		OPT_END()
> diff --git a/cache.h b/cache.h
> index cbfaead23a..929a61e861 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -625,6 +625,7 @@ int path_inside_repo(const char *prefix, const char *path);
>  
>  #define INIT_DB_QUIET 0x0001
>  #define INIT_DB_EXIST_OK 0x0002
> +#define INIT_DB_REFTABLE 0x0004
>  
>  int init_db(const char *git_dir, const char *real_git_dir,
>  	    const char *template_dir, unsigned int flags);
> @@ -1041,6 +1042,7 @@ struct repository_format {
>  	int is_bare;
>  	int hash_algo;
>  	char *work_tree;
> +	char *ref_storage;
>  	struct string_list unknown_extensions;
>  };
>  
> diff --git a/refs.c b/refs.c
> index 1ab0bb54d3..e4e5af8ea0 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -20,7 +20,7 @@
>  /*
>   * List of all available backends
>   */
> -static struct ref_storage_be *refs_backends = &refs_be_files;
> +static struct ref_storage_be *refs_backends = &refs_be_reftable;

I'm not sure why we're making this change.  It doesn't look like the
default is changing.

> @@ -1913,7 +1916,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
>  		goto done;
>  
>  	/* assume that add_submodule_odb() has been called */
> -	refs = ref_store_init(submodule_sb.buf,
> +	refs = ref_store_init(submodule_sb.buf, "files", /* XXX */
>  			      REF_STORE_READ | REF_STORE_ODB);
>  	register_ref_store_map(&submodule_ref_stores, "submodule",
>  			       refs, submodule);
> @@ -1927,6 +1930,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
>  
>  struct ref_store *get_worktree_ref_store(const struct worktree *wt)
>  {
> +	const char *format = "files"; /* XXX */

If the question is whether we still want to default to "files" as the
default, then yes, I think we do.  Suddenly changing things would be a
problem if for some reason someone needed to downgrade Git versions.

Since we have two instances of "files" above, it might be better to
create a #define like DEFAULT_REF_BACKEND or some such.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 4/5] Add reftable library
  2020-02-06 22:55       ` [PATCH v4 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
  2020-02-06 23:07         ` Junio C Hamano
@ 2020-02-07  0:16         ` brian m. carlson
  2020-02-10 13:16           ` Han-Wen Nienhuys
  1 sibling, 1 reply; 409+ messages in thread
From: brian m. carlson @ 2020-02-07  0:16 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 3621 bytes --]

On 2020-02-06 at 22:55:55, Han-Wen Nienhuys via GitGitGadget wrote:
> diff --git a/reftable/README.md b/reftable/README.md
> new file mode 100644
> index 0000000000..f527da0380
> --- /dev/null
> +++ b/reftable/README.md
> @@ -0,0 +1,19 @@
> +
> +The source code in this directory comes from https://github.com/google/reftable.
> +
> +The VERSION file keeps track of the current version of the reftable library.
> +
> +To update the library, do:
> +
> +   ((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
> +    git clone https://github.com/google/reftable reftable-repo) && \
> +   cp reftable-repo/c/*.[ch] reftable/ && \
> +   cp reftable-repo/LICENSE reftable/ &&
> +   git --git-dir reftable-repo/.git show --no-patch origin/master \
> +    > reftable/VERSION && \
> +   echo '/* empty */' > reftable/config.h
> +   rm reftable/*_test.c reftable/test_framework.*
> +   git add reftable/*.[ch]
> +
> +Bugfixes should be accompanied by a test and applied to upstream project at
> +https://github.com/google/reftable.

I think this particular statement may be problematic.  The upstream
project requires a CLA, which is a non-starter for many folks.  I don't
think we can expect folks who are working on Git to necessarily send
patches upstream with that condition.

For example, if I end up needing to patch this series to work with
SHA-256, I'm happy for Google to have my changes under the BSD License,
but I won't be signing a CLA.

> +#define SHA1_SIZE 20
> +#define SHA256_SIZE 32

I'd rather we didn't hard-code these values here.  If you need a size
suitable for allocation, we have GIT_MAX_RAWSZ and GIT_MAX_HEXSZ.  Those
are guaranteed to be the right size for any future hash.

> +	{
> +		struct merged_table m = {
> +			.stack = stack,
> +			.stack_len = n,
> +			.min = first_min,
> +			.max = last_max,
> +			.hash_size = SHA1_SIZE,

This is going to cause problems with SHA-256.  We'd want to write this
as the_hash_algo->rawsz, since it can change at runtime.

> +static byte zero[SHA256_SIZE] = {};

It would be better to refer to null_oid here.

> +#ifndef REFTABLE_H
> +#define REFTABLE_H
> +
> +#include "system.h"
> +
> +typedef uint8_t byte;

I think we typically prefer writing "unsigned char" or "uint8_t" instead
of "byte".

> +typedef byte bool;

I suspect this is going to cause a whole host of sadness if somebody
ever includes stdbool.h, or if that header ever gets included by an
OS-specific header, since bool is a #define constant for the built-in
type _Bool.

We typically use int for booleans, but I'm not opposed to using
stdbool.h for it on those systems that we know have it (which, in 2020,
is probably all of them).

> +struct writer *new_writer(int (*writer_func)(void *, byte *, int),
> +			  void *writer_arg, struct write_options *opts)
> +{
> +	struct writer *wp = calloc(sizeof(struct writer), 1);
> +	options_set_defaults(opts);
> +	if (opts->block_size >= (1 << 24)) {
> +		/* TODO - error return? */
> +		abort();
> +	}
> +	wp->hash_size = SHA1_SIZE;

Another SHA-256 problem.

Once this series has good test coverage, I'm happy to send a patch that
squashes in SHA-256 support if you don't want to implement it yourself
and you CC me.  If you do want to implement it yourself, you can grab
the transition-stage-4 branch from https://github.com/bk2204/git.git,
rebase your series on top of it, and run the testsuite with
GIT_TEST_DEFAULT_HASH=sha256 to see what breaks.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 4/5] Add reftable library
  2020-02-07  0:16         ` brian m. carlson
@ 2020-02-10 13:16           ` Han-Wen Nienhuys
  2020-02-11  0:05             ` brian m. carlson
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-10 13:16 UTC (permalink / raw)
  To: brian m. carlson, Han-Wen Nienhuys via GitGitGadget, git,
	Han-Wen Nienhuys, Han-Wen Nienhuys

On Fri, Feb 7, 2020 at 1:16 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> > +
> > +Bugfixes should be accompanied by a test and applied to upstream project at
> > +https://github.com/google/reftable.
>
> I think this particular statement may be problematic.  The upstream
> project requires a CLA, which is a non-starter for many folks.  I don't
> think we can expect folks who are working on Git to necessarily send
> patches upstream with that condition.
>
> For example, if I end up needing to patch this series to work with
> SHA-256, I'm happy for Google to have my changes under the BSD License,
> but I won't be signing a CLA.

The CLA requirement is something our Open Source lawyers insist on,
and it's not for me to renege on. AFAIU, the CLA is essentially a
legally sound formulation of "I'm happy for Google to have my changes
under the BSD License"

> > +#define SHA1_SIZE 20
> > +#define SHA256_SIZE 32
>
> I'd rather we didn't hard-code these values here.  If you need a size
> suitable for allocation, we have GIT_MAX_RAWSZ and GIT_MAX_HEXSZ.  Those
> are guaranteed to be the right size for any future hash.

this code is structured as a standalone library so the same code can
be integrated into libgit2 unchanged, and can be unit-tested without
needing complex scaffolding.

As a result some data (eg. SHA1_SIZE) is duplicate. It also comes with
its own strbuf library.

> > +     {
> > +             struct merged_table m = {
> > +                     .stack = stack,
> > +                     .stack_len = n,
> > +                     .min = first_min,
> > +                     .max = last_max,
> > +                     .hash_size = SHA1_SIZE,
>
> This is going to cause problems with SHA-256.  We'd want to write this
> as the_hash_algo->rawsz, since it can change at runtime.

As this is within the library, the hash size should be injected
somewhere. I don't see huge problems here (the hash size could be part
of the 'struct write_options'), but until we have properly defined how
reftable will mark the on-disk files as SHA-256, it will only supports
SHA1. I intend to work set the file format in motion, once this SHA1
version is accepted into git-core.

There was one question you might be able to answer, though. Can
repositories exist in dual hash configuration, i.e. where the ref
database must store both SHA1 and SHA256 hashes?

> > +static byte zero[SHA256_SIZE] = {};
>
> It would be better to refer to null_oid here.

See above.

>
> > +#ifndef REFTABLE_H
> > +#define REFTABLE_H
> > +
> > +#include "system.h"
> > +
> > +typedef uint8_t byte;
>
> I think we typically prefer writing "unsigned char" or "uint8_t" instead
> of "byte".
>
> > +typedef byte bool;
>
> I suspect this is going to cause a whole host of sadness if somebody
> ever includes stdbool.h, or if that header ever gets included by an
> OS-specific header, since bool is a #define constant for the built-in
> type _Bool.
>
> We typically use int for booleans, but I'm not opposed to using
> stdbool.h for it on those systems that we know have it (which, in 2020,
> is probably all of them).

fixed in https://github.com/google/reftable/commit/a0c794609f2ce8114f014f7260093d4ffdde0d5d

> > +struct writer *new_writer(int (*writer_func)(void *, byte *, int),
> > +                       void *writer_arg, struct write_options *opts)
> > +{
> > +     struct writer *wp = calloc(sizeof(struct writer), 1);
> > +     options_set_defaults(opts);
> > +     if (opts->block_size >= (1 << 24)) {
> > +             /* TODO - error return? */
> > +             abort();
> > +     }
> > +     wp->hash_size = SHA1_SIZE;
>
> Another SHA-256 problem.
>
> Once this series has good test coverage, I'm happy to send a patch that
> squashes in SHA-256 support if you don't want to implement it yourself

There is really nothing to implement. The C library will accept hashes
of arbitrary size (hence the hash_size field you commented on before),
but it just needs a tiny bit of plumbing for the library user to
specify the wanted hash size, and a change to the documented format to
store it in the file format header.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 5/5] Reftable support for git-core
  2020-02-06 23:49         ` brian m. carlson
@ 2020-02-10 13:18           ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-10 13:18 UTC (permalink / raw)
  To: brian m. carlson, Han-Wen Nienhuys via GitGitGadget, git,
	Han-Wen Nienhuys, Han-Wen Nienhuys

On Fri, Feb 7, 2020 at 12:49 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2020-02-06 at 22:55:56, Han-Wen Nienhuys via GitGitGadget wrote:
> > @@ -403,6 +417,10 @@ int init_db(const char *git_dir, const char *real_git_dir,
> >               git_config_set("receive.denyNonFastforwards", "true");
> >       }
> >
> > +     if (flags & INIT_DB_REFTABLE) {
> > +             git_config_set("extensions.refStorage", "reftable");
> > +     }
>
> This does seem like the best way to do this.  Can we get an addition to
> Documentation/technical/repository-version.txt that documents this
> extension and the values it takes?  I presume "reftable" and "files" are
> options, but it would be nice it folks didn't have to dig through the
> code to find that out.

done.

> I wonder if this might be better as --refs-backend={files|reftable}.  If
> reftable becomes the default at some point in the future, it would be
> easier to let people explicitly specify that they want a non-reftable
> version.  If we learn support for a hypothetical reftable v2 in the
> future, maybe we'd want to let folks specify that instead.

done.

> > -static struct ref_storage_be *refs_backends = &refs_be_files;
> > +static struct ref_storage_be *refs_backends = &refs_be_reftable;
>
> I'm not sure why we're making this change.  It doesn't look like the
> default is changing.

This is adding the reftable backend to the ref_storage_be linked list.

>
> > @@ -1913,7 +1916,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
> >               goto done;
> >
> >       /* assume that add_submodule_odb() has been called */
> > -     refs = ref_store_init(submodule_sb.buf,
> > +     refs = ref_store_init(submodule_sb.buf, "files", /* XXX */
> >                             REF_STORE_READ | REF_STORE_ODB);
> >       register_ref_store_map(&submodule_ref_stores, "submodule",
> >                              refs, submodule);
> > @@ -1927,6 +1930,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
> >
> >  struct ref_store *get_worktree_ref_store(const struct worktree *wt)
> >  {
> > +     const char *format = "files"; /* XXX */
>
> If the question is whether we still want to default to "files" as the
> default, then yes, I think we do.  Suddenly changing things would be a
> problem if for some reason someone needed to downgrade Git versions.
>
> Since we have two instances of "files" above, it might be better to
> create a #define like DEFAULT_REF_BACKEND or some such.

I did this. I stuck it in refs.h which is probably the wrong place.
Suggestions welcome.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v5 0/5] Reftable support git-core
  2020-02-06 22:55     ` [PATCH v4 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                         ` (5 preceding siblings ...)
  2020-02-06 23:31       ` [PATCH v4 0/5] Reftable support git-core brian m. carlson
@ 2020-02-10 14:14       ` Han-Wen Nienhuys via GitGitGadget
  2020-02-10 14:14         ` [PATCH v5 1/5] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
                           ` (5 more replies)
  6 siblings, 6 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-10 14:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

At this point, I am mainly interested in feedback on the spots marked with
XXX in the Git source code.

v3

 * passes gitgitgadget CI.

Han-Wen Nienhuys (5):
  refs.h: clarify reflog iteration order
  create .git/refs in files-backend.c
  refs: document how ref_iterator_advance_fn should handle symrefs
  Add reftable library
  Reftable support for git-core

 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   24 +-
 builtin/clone.c                               |    4 +-
 builtin/init-db.c                             |   57 +-
 cache.h                                       |    4 +-
 refs.c                                        |   20 +-
 refs.h                                        |    8 +-
 refs/files-backend.c                          |    6 +
 refs/refs-internal.h                          |    6 +
 refs/reftable-backend.c                       |  880 +++++++++++++
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   19 +
 reftable/VERSION                              |    5 +
 reftable/basics.c                             |  196 +++
 reftable/basics.h                             |   37 +
 reftable/block.c                              |  401 ++++++
 reftable/block.h                              |   71 ++
 reftable/blocksource.h                        |   20 +
 reftable/bytes.c                              |    0
 reftable/config.h                             |    1 +
 reftable/constants.h                          |   27 +
 reftable/dump.c                               |   97 ++
 reftable/file.c                               |   97 ++
 reftable/iter.c                               |  229 ++++
 reftable/iter.h                               |   56 +
 reftable/merged.c                             |  286 +++++
 reftable/merged.h                             |   34 +
 reftable/pq.c                                 |  114 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  708 +++++++++++
 reftable/reader.h                             |   52 +
 reftable/record.c                             | 1107 +++++++++++++++++
 reftable/record.h                             |   79 ++
 reftable/reftable.h                           |  394 ++++++
 reftable/slice.c                              |  199 +++
 reftable/slice.h                              |   39 +
 reftable/stack.c                              |  983 +++++++++++++++
 reftable/stack.h                              |   40 +
 reftable/system.h                             |   58 +
 reftable/tree.c                               |   66 +
 reftable/tree.h                               |   24 +
 reftable/writer.c                             |  623 ++++++++++
 reftable/writer.h                             |   46 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 47 files changed, 7266 insertions(+), 32 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c


base-commit: 5b0ca878e008e82f91300091e793427205ce3544
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v5
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v5
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v4:

 1:  c00403c94d = 1:  c00403c94d refs.h: clarify reflog iteration order
 2:  4d6da9bc47 = 2:  4d6da9bc47 create .git/refs in files-backend.c
 3:  fbdcdccc88 = 3:  fbdcdccc88 refs: document how ref_iterator_advance_fn should handle symrefs
 4:  02d2ca8b87 ! 4:  546b82fe79 Add reftable library
     @@ -98,11 +98,11 @@
       --- /dev/null
       +++ b/reftable/VERSION
      @@
     -+commit e7c3fc3099d9999bc8d895f84027b0e36348d5e6
     ++commit 6115b50fdb9bc662be39b05f5589bc109282ae7f
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Thu Feb 6 20:17:40 2020 +0100
     ++Date:   Mon Feb 10 13:59:52 2020 +0100
      +
     -+    C: use inttypes.h header definitions
     ++    README: add a note about the Java implementation
      
       diff --git a/reftable/basics.c b/reftable/basics.c
       new file mode 100644
     @@ -3885,9 +3885,6 @@
      +
      +#include "system.h"
      +
     -+typedef uint8_t byte;
     -+typedef byte bool;
     -+
      +/* block_source is a generic wrapper for a seekable readable file.
      +   It is generally passed around by value.
      + */
     @@ -3900,7 +3897,7 @@
      +   so it can return itself into the pool.
      +*/
      +struct block {
     -+	byte *data;
     ++	uint8_t *data;
      +	int len;
      +	struct block_source source;
      +};
     @@ -3926,14 +3923,14 @@
      +
      +/* write_options sets options for writing a single reftable. */
      +struct write_options {
     -+	/* do not pad out blocks to block size. */
     -+	bool unpadded;
     ++	/* boolean: do not pad out blocks to block size. */
     ++	int unpadded;
      +
      +	/* the blocksize. Should be less than 2^24. */
      +	uint32_t block_size;
      +
     -+	/* do not generate a SHA1 => ref index. */
     -+	bool skip_index_objects;
     ++	/* boolean: do not generate a SHA1 => ref index. */
     ++	int skip_index_objects;
      +
      +	/* how often to write complete keys in each block. */
      +	int restart_interval;
     @@ -3944,13 +3941,13 @@
      +	char *ref_name; /* Name of the ref, malloced. */
      +	uint64_t update_index; /* Logical timestamp at which this value is
      +				  written */
     -+	byte *value; /* SHA1, or NULL. malloced. */
     -+	byte *target_value; /* peeled annotated tag, or NULL. malloced. */
     ++	uint8_t *value; /* SHA1, or NULL. malloced. */
     ++	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
      +	char *target; /* symref, or NULL. malloced. */
      +};
      +
      +/* returns whether 'ref' represents a deletion */
     -+bool ref_record_is_deletion(const struct ref_record *ref);
     ++int ref_record_is_deletion(const struct ref_record *ref);
      +
      +/* prints a ref_record onto stdout */
      +void ref_record_print(struct ref_record *ref, int hash_size);
     @@ -3959,15 +3956,14 @@
      +void ref_record_clear(struct ref_record *ref);
      +
      +/* returns whether two ref_records are the same */
     -+bool ref_record_equal(struct ref_record *a, struct ref_record *b,
     -+		      int hash_size);
     ++int ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size);
      +
      +/* log_record holds a reflog entry */
      +struct log_record {
      +	char *ref_name;
      +	uint64_t update_index;
     -+	byte *new_hash;
     -+	byte *old_hash;
     ++	uint8_t *new_hash;
     ++	uint8_t *old_hash;
      +	char *name;
      +	char *email;
      +	uint64_t time;
     @@ -3976,14 +3972,13 @@
      +};
      +
      +/* returns whether 'ref' represents the deletion of a log record. */
     -+bool log_record_is_deletion(const struct log_record *log);
     ++int log_record_is_deletion(const struct log_record *log);
      +
      +/* frees and nulls all pointer values. */
      +void log_record_clear(struct log_record *log);
      +
      +/* returns whether two records are equal. */
     -+bool log_record_equal(struct log_record *a, struct log_record *b,
     -+		      int hash_size);
     ++int log_record_equal(struct log_record *a, struct log_record *b, int hash_size);
      +
      +void log_record_print(struct log_record *log, int hash_size);
      +
     @@ -4076,11 +4071,11 @@
      +const char *error_str(int err);
      +
      +/* new_writer creates a new writer */
     -+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
     ++struct writer *new_writer(int (*writer_func)(void *, uint8_t *, int),
      +			  void *writer_arg, struct write_options *opts);
      +
      +/* write to a file descriptor. fdp should be an int* pointing to the fd. */
     -+int fd_writer(void *fdp, byte *data, int size);
     ++int fd_writer(void *fdp, uint8_t *data, int size);
      +
      +/* Set the range of update indices for the records we will add.  When
      +   writing a table into a stack, the min should be at least
     @@ -4163,7 +4158,7 @@
      +void reader_free(struct reader *);
      +
      +/* return an iterator for the refs pointing to oid */
     -+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
     ++int reader_refs_for(struct reader *r, struct iterator *it, uint8_t *oid,
      +		    int oid_len);
      +
      +/* return the max_update_index for a table */
     @@ -5613,6 +5608,9 @@
      +	}
      +#endif
      +
     ++typedef uint8_t byte;
     ++typedef int bool;
     ++
      +int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
      +			       const Bytef *source, uLong *sourceLen);
      +
 5:  2786a6bf61 ! 5:  702fb89871 Reftable support for git-core
     @@ -41,6 +41,21 @@
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
          Co-authored-by: Jeff King <peff@peff.net>
      
     + diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
     + --- a/Documentation/technical/repository-version.txt
     + +++ b/Documentation/technical/repository-version.txt
     +@@
     + multiple working directory mode, "config" file is shared while
     + "config.worktree" is per-working directory (i.e., it's in
     + GIT_COMMON_DIR/worktrees/<id>/config.worktree)
     ++
     ++==== `refStorage`
     ++
     ++Specifies the file format for the ref database. Values are `files`
     ++(for the traditional packed + loose ref format) and `reftable` for the
     ++binary reftable format. See https://github.com/google/reftable for
     ++more information.
     +
       diff --git a/Makefile b/Makefile
       --- a/Makefile
       +++ b/Makefile
     @@ -109,6 +124,21 @@ $^
       
       Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
      
     + diff --git a/builtin/clone.c b/builtin/clone.c
     + --- a/builtin/clone.c
     + +++ b/builtin/clone.c
     +@@
     + 		}
     + 	}
     + 
     +-	init_db(git_dir, real_git_dir, option_template, INIT_DB_QUIET);
     ++	init_db(git_dir, real_git_dir, option_template,
     ++		DEFAULT_REF_STORAGE, /* XXX */
     ++		INIT_DB_QUIET);
     + 
     + 	if (real_git_dir)
     + 		git_dir = real_git_dir;
     +
       diff --git a/builtin/init-db.c b/builtin/init-db.c
       --- a/builtin/init-db.c
       +++ b/builtin/init-db.c
     @@ -117,7 +147,8 @@ $^
       
       static int create_default_files(const char *template_path,
      -				const char *original_git_dir)
     -+				const char *original_git_dir, int flags)
     ++				const char *original_git_dir,
     ++				const char *ref_storage_format, int flags)
       {
       	struct stat st1;
       	struct strbuf buf = STRBUF_INIT;
     @@ -125,8 +156,7 @@ $^
       	is_bare_repository_cfg = init_is_bare_repository;
       	if (init_shared_repository != -1)
       		set_shared_repository(init_shared_repository);
     -+	if (flags & INIT_DB_REFTABLE)
     -+		the_repository->ref_storage_format = xstrdup("reftable");
     ++	the_repository->ref_storage_format = xstrdup(ref_storage_format);
       
       	/*
       	 * We would have created the above under user's umask -- under
     @@ -167,17 +197,29 @@ $^
      -	xsnprintf(repo_version_string, sizeof(repo_version_string),
      -		  "%d", GIT_REPO_VERSION);
      +	xsnprintf(repo_version_string, sizeof(repo_version_string), "%d",
     -+		  flags & INIT_DB_REFTABLE ? GIT_REPO_VERSION_READ :
     -+					     GIT_REPO_VERSION);
     ++		  !strcmp(ref_storage_format, "reftable") ?
     ++			  GIT_REPO_VERSION_READ :
     ++			  GIT_REPO_VERSION);
       	git_config_set("core.repositoryformatversion", repo_version_string);
       
       	/* Check filemode trustability */
     +@@
     + }
     + 
     + int init_db(const char *git_dir, const char *real_git_dir,
     +-	    const char *template_dir, unsigned int flags)
     ++	    const char *template_dir, const char *ref_storage_format,
     ++	    unsigned int flags)
     + {
     + 	int reinit;
     + 	int exist_ok = flags & INIT_DB_EXIST_OK;
      @@
       	 */
       	check_repository_format();
       
      -	reinit = create_default_files(template_dir, original_git_dir);
     -+	reinit = create_default_files(template_dir, original_git_dir, flags);
     ++	reinit = create_default_files(template_dir, original_git_dir,
     ++				      ref_storage_format, flags);
       
       	create_object_directory();
       
     @@ -185,14 +227,18 @@ $^
       		git_config_set("receive.denyNonFastforwards", "true");
       	}
       
     -+	if (flags & INIT_DB_REFTABLE) {
     -+		git_config_set("extensions.refStorage", "reftable");
     -+	}
     ++	git_config_set("extensions.refStorage", ref_storage_format);
      +
       	if (!(flags & INIT_DB_QUIET)) {
       		int len = strlen(git_dir);
       
      @@
     + int cmd_init_db(int argc, const char **argv, const char *prefix)
     + {
     + 	const char *git_dir;
     ++	const char *ref_storage_format = DEFAULT_REF_STORAGE;
     + 	const char *real_git_dir = NULL;
     + 	const char *work_tree;
       	const char *template_dir = NULL;
       	unsigned int flags = 0;
       	const struct option init_db_options[] = {
     @@ -212,23 +258,38 @@ $^
      +		  N_("specify that the git repository is to be shared amongst several users"),
      +		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
       		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
     -+		OPT_BIT(0, "reftable", &flags, N_("use reftable"),
     -+			INIT_DB_REFTABLE),
     ++		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
     ++			   N_("the ref storage format to use")),
       		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
       			   N_("separate git dir from working tree")),
       		OPT_END()
     +@@
     + 	}
     + 
     + 	UNLEAK(real_git_dir);
     ++	UNLEAK(ref_storage_format);
     + 	UNLEAK(git_dir);
     + 	UNLEAK(work_tree);
     + 
     + 	flags |= INIT_DB_EXIST_OK;
     +-	return init_db(git_dir, real_git_dir, template_dir, flags);
     ++	return init_db(git_dir, real_git_dir, template_dir, ref_storage_format,
     ++		       flags);
     + }
      
       diff --git a/cache.h b/cache.h
       --- a/cache.h
       +++ b/cache.h
      @@
     - 
     - #define INIT_DB_QUIET 0x0001
       #define INIT_DB_EXIST_OK 0x0002
     -+#define INIT_DB_REFTABLE 0x0004
       
       int init_db(const char *git_dir, const char *real_git_dir,
     - 	    const char *template_dir, unsigned int flags);
     +-	    const char *template_dir, unsigned int flags);
     ++	    const char *template_dir, const char *ref_storage_format,
     ++	    unsigned int flags);
     + 
     + void sanitize_stdfds(void);
     + int daemonize(void);
      @@
       	int is_bare;
       	int hash_algo;
     @@ -274,7 +335,7 @@ $^
      -	r->refs = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
      +	r->refs = ref_store_init(r->gitdir,
      +				 r->ref_storage_format ? r->ref_storage_format :
     -+							 "files",
     ++							 DEFAULT_REF_STORAGE,
      +				 REF_STORE_ALL_CAPS);
       	return r->refs;
       }
     @@ -284,7 +345,7 @@ $^
       
       	/* assume that add_submodule_odb() has been called */
      -	refs = ref_store_init(submodule_sb.buf,
     -+	refs = ref_store_init(submodule_sb.buf, "files", /* XXX */
     ++	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
       			      REF_STORE_READ | REF_STORE_ODB);
       	register_ref_store_map(&submodule_ref_stores, "submodule",
       			       refs, submodule);
     @@ -292,7 +353,7 @@ $^
       
       struct ref_store *get_worktree_ref_store(const struct worktree *wt)
       {
     -+	const char *format = "files"; /* XXX */
     ++	const char *format = DEFAULT_REF_STORAGE; /* XXX */
       	struct ref_store *refs;
       	const char *id;
       
     @@ -309,6 +370,20 @@ $^
       
       	if (refs)
      
     + diff --git a/refs.h b/refs.h
     + --- a/refs.h
     + +++ b/refs.h
     +@@
     + struct string_list_item;
     + struct worktree;
     + 
     ++/* XXX where should this be? */
     ++#define DEFAULT_REF_STORAGE "files"
     ++
     + /*
     +  * Resolve a reference, recursively following symbolic refererences.
     +  *
     +
       diff --git a/refs/refs-internal.h b/refs/refs-internal.h
       --- a/refs/refs-internal.h
       +++ b/refs/refs-internal.h

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v5 1/5] refs.h: clarify reflog iteration order
  2020-02-10 14:14       ` [PATCH v5 " Han-Wen Nienhuys via GitGitGadget
@ 2020-02-10 14:14         ` Han-Wen Nienhuys via GitGitGadget
  2020-02-10 14:14         ` [PATCH v5 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-10 14:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/refs.h b/refs.h
index 545029c6d8..87c9ec921b 100644
--- a/refs.h
+++ b/refs.h
@@ -444,18 +444,21 @@ int delete_refs(const char *msg, struct string_list *refnames,
 int refs_delete_reflog(struct ref_store *refs, const char *refname);
 int delete_reflog(const char *refname);
 
-/* iterate over reflog entries */
+/* Iterate over reflog entries. */
 typedef int each_reflog_ent_fn(
 		struct object_id *old_oid, struct object_id *new_oid,
 		const char *committer, timestamp_t timestamp,
 		int tz, const char *msg, void *cb_data);
 
+/* Iterate in over reflog entries, oldest entry first. */
 int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
 			     each_reflog_ent_fn fn, void *cb_data);
 int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+/* Call a function for each reflog entry, oldest entry first. */
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v5 2/5] create .git/refs in files-backend.c
  2020-02-10 14:14       ` [PATCH v5 " Han-Wen Nienhuys via GitGitGadget
  2020-02-10 14:14         ` [PATCH v5 1/5] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-02-10 14:14         ` Han-Wen Nienhuys via GitGitGadget
  2020-02-10 14:14         ` [PATCH v5 3/5] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-10 14:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This prepares for supporting the reftable format, which will want
create its own file system layout in .git

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/init-db.c    | 2 --
 refs/files-backend.c | 6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 944ec77fe1..45bdea0589 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
 	 */
-	safe_create_dir(git_path("refs"), 1);
-	adjust_shared_perm(git_path("refs"));
 
 	if (refs_init_db(&err))
 		die("failed to set up refs db: %s", err.buf);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0ea66a28b6..1d362e99fb 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3158,9 +3158,15 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
 	struct strbuf sb = STRBUF_INIT;
 
+	files_ref_path(refs, &sb, "refs");
+	safe_create_dir(sb.buf, 1);
+	/* adjust permissions even if directory already exists. */
+	adjust_shared_perm(sb.buf);
+
 	/*
 	 * Create .git/refs/{heads,tags}
 	 */
+	strbuf_reset(&sb);
 	files_ref_path(refs, &sb, "refs/heads");
 	safe_create_dir(sb.buf, 1);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v5 3/5] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-02-10 14:14       ` [PATCH v5 " Han-Wen Nienhuys via GitGitGadget
  2020-02-10 14:14         ` [PATCH v5 1/5] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
  2020-02-10 14:14         ` [PATCH v5 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-02-10 14:14         ` Han-Wen Nienhuys via GitGitGadget
  2020-02-10 14:14         ` [PATCH v5 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
                           ` (2 subsequent siblings)
  5 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-10 14:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb..1d7a485220 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
 
 /* Virtual function declarations for ref_iterators: */
 
+/*
+ * backend-specific implementation of ref_iterator_advance.
+ * For symrefs, the function should set REF_ISSYMREF, and it should also dereference
+ * the symref to provide the OID referent.
+ */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v5 4/5] Add reftable library
  2020-02-10 14:14       ` [PATCH v5 " Han-Wen Nienhuys via GitGitGadget
                           ` (2 preceding siblings ...)
  2020-02-10 14:14         ` [PATCH v5 3/5] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-02-10 14:14         ` Han-Wen Nienhuys via GitGitGadget
  2020-02-10 14:14         ` [PATCH v5 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
  2020-02-18  8:43         ` [PATCH v6 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  5 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-10 14:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a new format for storing the ref database. It provides the
following benefits:

 * Simple and fast atomic ref transactions, including multiple refs and reflogs.
 * Compact storage of ref data.
 * Fast look ups of ref data.
 * Case-sensitive ref names on Windows/OSX, regardless of file system
 * Eliminates file/directory conflicts in ref names

Further context and motivation can be found in background reading:

* Spec: https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md

* Original discussion on JGit-dev:  https://www.eclipse.org/lists/jgit-dev/msg03389.html

* First design discussion on git@vger: https://public-inbox.org/git/CAJo=hJtTp2eA3z9wW9cHo-nA7kK40vVThqh6inXpbCcqfdMP9g@mail.gmail.com/

* Last design discussion on git@vger: https://public-inbox.org/git/CAJo=hJsZcAM9sipdVr7TMD-FD2V2W6_pvMQ791EGCDsDkQ033w@mail.gmail.com/

* First attempt at implementation: https://public-inbox.org/git/CAP8UFD0PPZSjBnxCA7ez91vBuatcHKQ+JUWvTD1iHcXzPBjPBg@mail.gmail.com/

* libgit2 support issue: https://github.com/libgit2/libgit2/issues

* GitLab support issue: https://gitlab.com/gitlab-org/git/issues/6

* go-git support issue: https://github.com/src-d/go-git/issues/1059

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 ++
 reftable/README.md     |   19 +
 reftable/VERSION       |    5 +
 reftable/basics.c      |  196 +++++++
 reftable/basics.h      |   37 ++
 reftable/block.c       |  401 +++++++++++++++
 reftable/block.h       |   71 +++
 reftable/blocksource.h |   20 +
 reftable/bytes.c       |    0
 reftable/config.h      |    1 +
 reftable/constants.h   |   27 +
 reftable/dump.c        |   97 ++++
 reftable/file.c        |   97 ++++
 reftable/iter.c        |  229 +++++++++
 reftable/iter.h        |   56 ++
 reftable/merged.c      |  286 +++++++++++
 reftable/merged.h      |   34 ++
 reftable/pq.c          |  114 +++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  708 +++++++++++++++++++++++++
 reftable/reader.h      |   52 ++
 reftable/record.c      | 1107 ++++++++++++++++++++++++++++++++++++++++
 reftable/record.h      |   79 +++
 reftable/reftable.h    |  394 ++++++++++++++
 reftable/slice.c       |  199 ++++++++
 reftable/slice.h       |   39 ++
 reftable/stack.c       |  983 +++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   40 ++
 reftable/system.h      |   58 +++
 reftable/tree.c        |   66 +++
 reftable/tree.h        |   24 +
 reftable/writer.c      |  623 ++++++++++++++++++++++
 reftable/writer.h      |   46 ++
 reftable/zlib-compat.c |   92 ++++
 34 files changed, 6265 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 0000000000..402e0f9356
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 0000000000..f527da0380
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,19 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   ((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
+    git clone https://github.com/google/reftable reftable-repo) && \
+   cp reftable-repo/c/*.[ch] reftable/ && \
+   cp reftable-repo/LICENSE reftable/ &&
+   git --git-dir reftable-repo/.git show --no-patch origin/master \
+    > reftable/VERSION && \
+   echo '/* empty */' > reftable/config.h
+   rm reftable/*_test.c reftable/test_framework.*
+   git add reftable/*.[ch]
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 0000000000..67a4bc8042
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,5 @@
+commit 6115b50fdb9bc662be39b05f5589bc109282ae7f
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Mon Feb 10 13:59:52 2020 +0100
+
+    README: add a note about the Java implementation
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 0000000000..791dcc867a
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,196 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_u24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)((i)&0xff);
+}
+
+uint32_t get_u24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_u32(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 24) & 0xff);
+	out[1] = (byte)((i >> 16) & 0xff);
+	out[2] = (byte)((i >> 8) & 0xff);
+	out[3] = (byte)((i)&0xff);
+}
+
+uint32_t get_u32(byte *in)
+{
+	return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
+	       (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
+}
+
+void put_u64(byte *out, uint64_t v)
+{
+	int i = 0;
+	for (i = sizeof(uint64_t); i--;) {
+		out[i] = (byte)(v & 0xff);
+		v >>= 8;
+	}
+}
+
+uint64_t get_u64(byte *out)
+{
+	uint64_t v = 0;
+	int i = 0;
+	for (i = 0; i < sizeof(uint64_t); i++) {
+		v = (v << 8) | (byte)(out[i] & 0xff);
+	}
+	return v;
+}
+
+void put_u16(byte *out, uint16_t i)
+{
+	out[0] = (byte)((i >> 8) & 0xff);
+	out[1] = (byte)((i)&0xff);
+}
+
+uint16_t get_u16(byte *in)
+{
+	return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
+}
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(int sz, int (*f)(int k, void *args), void *args)
+{
+	int lo = 0;
+	int hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		int mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		free(*p);
+		p++;
+	}
+	free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	for (char **p = names; *p; p++) {
+		len++;
+	}
+	return len;
+}
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = realloc(names,
+						names_cap * sizeof(char *));
+			}
+			names[names_len++] = strdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *error_str(int err)
+{
+	switch (err) {
+	case IO_ERROR:
+		return "I/O error";
+	case FORMAT_ERROR:
+		return "FORMAT_ERROR";
+	case NOT_EXIST_ERROR:
+		return "NOT_EXIST_ERROR";
+	case LOCK_ERROR:
+		return "LOCK_ERROR";
+	case API_ERROR:
+		return "API_ERROR";
+	case ZLIB_ERROR:
+		return "ZLIB_ERROR";
+	case -1:
+		return "general error";
+	default:
+		return "unknown error code";
+	}
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 0000000000..0ad368cfd3
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,37 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+void put_u24(byte *out, uint32_t i);
+uint32_t get_u24(byte *in);
+
+uint64_t get_u64(byte *in);
+void put_u64(byte *out, uint64_t i);
+
+void put_u32(byte *out, uint32_t i);
+uint32_t get_u32(byte *in);
+
+void put_u16(byte *out, uint16_t i);
+uint16_t get_u16(byte *in);
+int binsearch(int sz, int (*f)(int k, void *args), void *args);
+
+void free_names(char **a);
+void parse_names(char *buf, int size, char ***namesp);
+int names_equal(char **a, char **b);
+int names_length(char **names);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 0000000000..ffbfee13c3
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,401 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "blocksource.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct record rec)
+{
+	struct slice empty = {};
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = {};
+	int n = 0;
+
+	record_key(rec, &key);
+	n = encode_key(&restart, out, last, key, record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	out.buf += n;
+	out.len -= n;
+
+	n = record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+
+	out.buf += n;
+	out.len -= n;
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	free(slice_yield(&key));
+	return 0;
+
+err:
+	free(slice_yield(&key));
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_u24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_u16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_u24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = {};
+		uLongf dest_len = 0, src_len = 0;
+		slice_resize(&compressed, w->next - block_header_skip);
+
+		dest_len = compressed.len;
+		src_len = w->next - block_header_skip;
+
+		if (Z_OK != compress2(compressed.buf, &dest_len,
+				      w->buf + block_header_skip, src_len, 9)) {
+			free(slice_yield(&compressed));
+			return ZLIB_ERROR;
+		}
+		memcpy(w->buf + block_header_skip, compressed.buf, dest_len);
+		w->next = dest_len + block_header_skip;
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_u24(block->data + header_off + 1);
+
+	if (!is_block_type(typ)) {
+		return FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = {};
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip;
+		uLongf src_len = block->len - block_header_skip;
+
+		slice_resize(&uncompressed, sz);
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			free(slice_yield(&uncompressed));
+			return ZLIB_ERROR;
+		}
+
+		block_source_return_block(block->source, block);
+		block->data = uncompressed.buf;
+		block->len = dst_len; /* XXX: 4 bytes missing? */
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_u16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_u24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	struct slice key;
+	struct block_reader *r;
+	int error;
+};
+
+static int restart_key_less(int idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = {};
+	struct slice last_key = {};
+	byte unused_extra;
+	int n = decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_compare(a->key, rkey);
+		free(slice_yield(&rkey));
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct record rec)
+{
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	{
+		struct slice in = {
+			.buf = it->br->block.data + it->next_off,
+			.len = it->br->block_len - it->next_off,
+		};
+		struct slice start = in;
+		struct slice key = {};
+		byte extra;
+		int n = decode_key(&key, &extra, it->last_key, in);
+		if (n < 0) {
+			return -1;
+		}
+
+		in.buf += n;
+		in.len -= n;
+		n = record_decode(rec, key, extra, in, it->br->hash_size);
+		if (n < 0) {
+			return -1;
+		}
+		in.buf += n;
+		in.len -= n;
+
+		slice_copy(&it->last_key, key);
+		it->next_off += start.len - in.len;
+		free(slice_yield(&key));
+		return 0;
+	}
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = {};
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	free(slice_yield(&it->last_key));
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		return -1;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	{
+		struct record rec = new_record(block_reader_type(br));
+		struct slice key = {};
+		int result = 0;
+		int err = 0;
+		struct block_iter next = {};
+		while (true) {
+			block_iter_copy_from(&next, it);
+
+			err = block_iter_next(&next, rec);
+			if (err < 0) {
+				result = -1;
+				goto exit;
+			}
+
+			record_key(rec, &key);
+			if (err > 0 || slice_compare(key, want) >= 0) {
+				result = 0;
+				goto exit;
+			}
+
+			block_iter_copy_from(it, &next);
+		}
+
+	exit:
+		free(slice_yield(&key));
+		free(slice_yield(&next.last_key));
+		record_clear(rec);
+		free(record_yield(&rec));
+
+		return result;
+	}
+}
+
+void block_writer_reset(struct block_writer *bw)
+{
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	free(slice_yield(&bw->last_key));
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 0000000000..bb42588111
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,71 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+	uint32_t header_off;
+	int restart_interval;
+	int hash_size;
+
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+	struct slice last_key;
+	int entries;
+};
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+byte block_writer_type(struct block_writer *bw);
+int block_writer_add(struct block_writer *w, struct record rec);
+int block_writer_finish(struct block_writer *w);
+void block_writer_reset(struct block_writer *bw);
+void block_writer_clear(struct block_writer *bw);
+
+struct block_reader {
+	uint32_t header_off;
+	struct block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint32_t full_block_size;
+	uint16_t restart_count;
+};
+
+struct block_iter {
+	struct block_reader *br;
+	struct slice last_key;
+	uint32_t next_off;
+};
+
+int block_reader_init(struct block_reader *br, struct block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+byte block_reader_type(struct block_reader *r);
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+int block_iter_next(struct block_iter *it, struct record rec);
+int block_iter_seek(struct block_iter *it, struct slice want);
+void block_iter_close(struct block_iter *it);
+
+#endif
diff --git a/reftable/blocksource.h b/reftable/blocksource.h
new file mode 100644
index 0000000000..f3ad3a4c22
--- /dev/null
+++ b/reftable/blocksource.h
@@ -0,0 +1,20 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCKSOURCE_H
+#define BLOCKSOURCE_H
+
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source source);
+
+#endif
diff --git a/reftable/bytes.c b/reftable/bytes.c
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/reftable/config.h b/reftable/config.h
new file mode 100644
index 0000000000..40a8c178f1
--- /dev/null
+++ b/reftable/config.h
@@ -0,0 +1 @@
+/* empty */
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 0000000000..cd35704610
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,27 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+#define VERSION 1
+#define HEADER_SIZE 24
+#define FOOTER_SIZE 68
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 0000000000..acabe18fbe
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "reftable.h"
+
+static int dump_table(const char *tablename)
+{
+	struct block_source src = {};
+	int err = block_source_from_file(&src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	struct reader *r = NULL;
+	err = new_reader(&r, src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct iterator it = {};
+		err = reader_seek_ref(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+
+		struct ref_record ref = {};
+		while (1) {
+			err = iterator_next_ref(it, &ref);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			ref_record_print(&ref, 20);
+		}
+		iterator_destroy(&it);
+		ref_record_clear(&ref);
+	}
+
+	{
+		struct iterator it = {};
+		err = reader_seek_log(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+		struct log_record log = {};
+		while (1) {
+			err = iterator_next_log(it, &log);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			log_record_print(&log, 20);
+		}
+		iterator_destroy(&it);
+		log_record_clear(&log);
+	}
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int opt;
+	const char *table = NULL;
+	while ((opt = getopt(argc, argv, "t:")) != -1) {
+		switch (opt) {
+		case 't':
+			table = strdup(optarg);
+			break;
+		case '?':
+			printf("usage: %s [-table tablefile]\n", argv[0]);
+			return 2;
+			break;
+		}
+	}
+
+	if (table != NULL) {
+		int err = dump_table(table);
+		if (err < 0) {
+			fprintf(stderr, "%s: %s: %s\n", argv[0], table,
+				error_str(err));
+			return 1;
+		}
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 0000000000..b2ea90bf94
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	free(b);
+}
+
+static int file_read_block(void *v, struct block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int block_source_from_file(struct block_source *bs, const char *name)
+{
+	struct stat st = {};
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	{
+		struct file_block_source *p =
+			calloc(sizeof(struct file_block_source), 1);
+		p->size = st.st_size;
+		p->fd = fd;
+
+		bs->ops = &file_vtable;
+		bs->arg = p;
+	}
+	return 0;
+}
+
+int fd_writer(void *arg, byte *data, int sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 0000000000..c06c891366
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,229 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct iterator it)
+{
+	return it.ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct record rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct iterator *it)
+{
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct iterator it, struct record rec)
+{
+	return it.ops->next(it.iter_arg, rec);
+}
+
+void iterator_destroy(struct iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int iterator_next_ref(struct iterator it, struct ref_record *ref)
+{
+	struct record rec = {};
+	record_from_ref(&rec, ref);
+	return iterator_next(it, rec);
+}
+
+int iterator_next_log(struct iterator it, struct log_record *log)
+{
+	struct record rec = {};
+	record_from_log(&rec, log);
+	return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	free(slice_yield(&fri->oid));
+	iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct ref_record *ref = (struct ref_record *)rec.data;
+
+	while (true) {
+		int err = iterator_next_ref(fri->it, ref);
+		if (err != 0) {
+			return err;
+		}
+
+		if (fri->double_check) {
+			struct iterator it = {};
+
+			int err = reader_seek_ref(fri->r, &it, ref->ref_name);
+			if (err == 0) {
+				err = iterator_next_ref(it, ref);
+			}
+
+			iterator_destroy(&it);
+
+			if (err < 0) {
+				return err;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+}
+
+struct iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reader_return_block(it->r, &it->block_reader.block);
+	free(slice_yield(&it->oid));
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reader_return_block(it->r, &it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct ref_record *ref = (struct ref_record *)rec.data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reader *r, byte *oid, int oid_len,
+			       uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		calloc(sizeof(struct indexed_table_ref_iter), 1);
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 0000000000..f497f2a27e
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,56 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct iterator_vtable {
+	int (*next)(void *iter_arg, struct record rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct iterator *it);
+int iterator_next(struct iterator it, struct record rec);
+bool iterator_is_null(struct iterator it);
+
+struct filtering_ref_iterator {
+	struct reader *r;
+	struct slice oid;
+	bool double_check;
+	struct iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *,
+					  struct filtering_ref_iterator *);
+
+struct indexed_table_ref_iter {
+	struct reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reader *r, byte *oid, int oid_len,
+			       uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 0000000000..7d52ec6232
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,286 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct record rec = new_record(mi->typ);
+		int err = iterator_next(mi->stack[i], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			iterator_destroy(&mi->stack[i]);
+			record_clear(rec);
+			free(record_yield(&rec));
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		iterator_destroy(&mi->stack[i]);
+	}
+	free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, int idx)
+{
+	if (iterator_is_null(mi->stack[idx])) {
+		return 0;
+	}
+
+	{
+		struct record rec = new_record(mi->typ);
+		struct pq_entry e = {
+			.rec = rec,
+			.index = idx,
+		};
+		int err = iterator_next(mi->stack[idx], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			iterator_destroy(&mi->stack[idx]);
+			record_clear(rec);
+			free(record_yield(&rec));
+			return 0;
+		}
+
+		merged_iter_pqueue_add(&mi->pq, e);
+	}
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec)
+{
+	struct slice entry_key = {};
+	struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
+	int err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+	record_key(entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = {};
+		int err = 0, cmp = 0;
+
+		record_key(top.rec, &k);
+
+		cmp = slice_compare(k, entry_key);
+		free(slice_yield(&k));
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		record_clear(top.rec);
+		free(record_yield(&top.rec));
+	}
+
+	record_copy_from(rec, entry.rec, mi->hash_size);
+	record_clear(entry.rec);
+	free(record_yield(&entry.rec));
+	free(slice_yield(&entry_key));
+	return 0;
+}
+
+static int merged_iter_next_void(void *p, struct record rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct iterator *it,
+				      struct merged_iter *mi)
+{
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n)
+{
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reader *r = stack[i];
+		if (i > 0 && last_max >= reader_min_update_index(r)) {
+			return FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reader_min_update_index(r);
+		}
+
+		last_max = reader_max_update_index(r);
+	}
+
+	{
+		struct merged_table m = {
+			.stack = stack,
+			.stack_len = n,
+			.min = first_min,
+			.max = last_max,
+			.hash_size = SHA1_SIZE,
+		};
+
+		*dest = calloc(sizeof(struct merged_table), 1);
+		**dest = m;
+	}
+	return 0;
+}
+
+void merged_table_close(struct merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void merged_table_free(struct merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	free(mt);
+}
+
+uint64_t merged_max_update_index(struct merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t merged_min_update_index(struct merged_table *mt)
+{
+	return mt->min;
+}
+
+static int merged_table_seek_record(struct merged_table *mt,
+				    struct iterator *it, struct record rec)
+{
+	struct iterator *iters = calloc(sizeof(struct iterator), mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = record_type(rec),
+		.hash_size = mt->hash_size,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			iterator_destroy(&iters[i]);
+		}
+		free(iters);
+		return err;
+	}
+
+	merged.stack_len = n, err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p = malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+			  const char *name)
+{
+	struct ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = {};
+	record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+			     const char *name, uint64_t update_index)
+{
+	struct log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = {};
+	record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+			  const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 0000000000..b8d3572e26
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct merged_table {
+	struct reader **stack;
+	int stack_len;
+	int hash_size;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct iterator *stack;
+	int hash_size;
+	int stack_len;
+	byte typ;
+	struct merged_iter_pqueue pq;
+} merged_iter;
+
+void merged_table_clear(struct merged_table *mt);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 0000000000..a1aff7c98c
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,114 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = {};
+	struct slice bk = {};
+	int cmp = 0;
+	record_key(a.rec, &ak);
+	record_key(b.rec, &bk);
+
+	cmp = slice_compare(ak, bk);
+
+	free(slice_yield(&ak));
+	free(slice_yield(&bk));
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = realloc(pq->heap, pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		record_clear(pq->heap[i].rec);
+		free(record_yield(&pq->heap[i].rec));
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 0000000000..5f7018979d
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	struct record rec;
+	int index;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 0000000000..981911f076
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,708 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct block_source source)
+{
+	return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size)
+{
+	int result = source.ops->read_block(source.arg, dest, off, size);
+	dest->source = source;
+	return result;
+}
+
+void block_source_return_block(struct block_source source, struct block *blockp)
+{
+	source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reader_offsets *reader_offsets_for(struct reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reader *r, struct block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(r->source, dest, off, sz);
+}
+
+void reader_return_block(struct reader *r, struct block *p)
+{
+	block_source_return_block(r->source, p);
+}
+
+const char *reader_name(struct reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	int err = 0;
+	if (memcmp(f, "REFT", 4)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, HEADER_SIZE)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+
+	{
+		byte version = *f++;
+		if (version != 1) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	r->block_size = get_u24(f);
+
+	f += 3;
+	r->min_update_index = get_u64(f);
+	f += 8;
+	r->max_update_index = get_u64(f);
+	f += 8;
+
+	r->ref_offsets.index_offset = get_u64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_u64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_u64(f);
+	f += 8;
+	r->log_offsets.offset = get_u64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_u64(f);
+	f += 8;
+
+	{
+		uint32_t computed_crc = crc32(0, footer, f - footer);
+		uint32_t file_crc = get_u32(f);
+		f += 4;
+		if (computed_crc != file_crc) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	{
+		byte first_block_typ = header[HEADER_SIZE];
+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+		r->ref_offsets.offset = 0;
+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+					  r->log_offsets.offset > 0);
+		r->obj_offsets.present = r->obj_offsets.offset > 0;
+	}
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reader *r, struct block_source source, const char *name)
+{
+	struct block footer = {};
+	struct block header = {};
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reader));
+	r->size = block_source_size(source) - FOOTER_SIZE;
+	r->source = source;
+	r->name = strdup(name);
+	r->hash_size = SHA1_SIZE;
+
+	err = block_source_read_block(source, &footer, r->size, FOOTER_SIZE);
+	if (err != FOOTER_SIZE) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	/* Need +1 to read type of first block. */
+	err = reader_get_block(r, &header, 0, HEADER_SIZE + 1);
+	if (err != HEADER_SIZE + 1) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	block_source_return_block(r->source, &footer);
+	block_source_return_block(r->source, &header);
+	return err;
+}
+
+struct table_iter {
+	struct reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+		((struct ref_record *)rec.data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reader_return_block(ti->r, &ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += 24;
+	}
+
+	*typ = data[0];
+	if (is_block_type(*typ)) {
+		result = get_u24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct block block = {};
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : HEADER_SIZE;
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reader_return_block(r, &block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reader_return_block(r, &block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 r->hash_size);
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = {};
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct block_reader *brp = malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec)
+{
+	if (record_type(rec) != ti->typ) {
+		return API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = {};
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct record rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct iterator *it, struct table_iter *ti)
+{
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reader *r, struct table_iter *ti,
+				uint64_t off, byte typ)
+{
+	struct block_reader br = {};
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reader *r, struct table_iter *ti, byte typ,
+			bool index)
+{
+	struct reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reader *r, struct table_iter *ti,
+			      struct record want)
+{
+	struct record rec = new_record(record_type(want));
+	struct slice want_key = {};
+	struct slice got_key = {};
+	struct table_iter next = {};
+	int err = -1;
+	record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_compare(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	record_clear(rec);
+	free(record_yield(&rec));
+	free(slice_yield(&want_key));
+	free(slice_yield(&got_key));
+	return err;
+}
+
+static int reader_seek_indexed(struct reader *r, struct iterator *it,
+			       struct record rec)
+{
+	struct index_record want_index = {};
+	struct record want_index_rec = {};
+	struct index_record index_result = {};
+	struct record index_result_rec = {};
+	struct table_iter index_iter = {};
+	struct table_iter next = {};
+	int err = 0;
+
+	record_key(rec, &want_index.last_key);
+	record_from_index(&want_index_rec, &want_index);
+	record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			calloc(sizeof(struct table_iter), 1);
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	record_clear(want_index_rec);
+	record_clear(index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reader *r, struct iterator *it,
+				struct record rec)
+{
+	struct reader_offsets *offs = reader_offsets_for(r, record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = {};
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p = malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reader *r, struct iterator *it, struct record rec)
+{
+	byte typ = record_type(rec);
+
+	struct reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name)
+{
+	struct ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = {};
+	record_from_ref(&rec, &ref);
+	return reader_seek(r, it, rec);
+}
+
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+		       uint64_t update_index)
+{
+	struct log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = {};
+	record_from_log(&rec, &log);
+	return reader_seek(r, it, rec);
+}
+
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int new_reader(struct reader **p, struct block_source src, char const *name)
+{
+	struct reader *rd = calloc(sizeof(struct reader), 1);
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		free(rd);
+	}
+	return err;
+}
+
+void reader_free(struct reader *r)
+{
+	reader_close(r);
+	free(r);
+}
+
+static int reader_refs_for_indexed(struct reader *r, struct iterator *it,
+				   byte *oid)
+{
+	struct obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct record want_rec = {};
+	struct iterator oit = {};
+	struct obj_record got = {};
+	struct record got_rec = {};
+	int err = 0;
+
+	record_from_obj(&want_rec, &want);
+
+	err = reader_seek(r, &oit, want_rec);
+	if (err != 0) {
+		return err;
+	}
+
+	record_from_obj(&got_rec, &got);
+	err = iterator_next(oit, got_rec);
+	iterator_destroy(&oit);
+	if (err < 0) {
+		return err;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	{
+		struct indexed_table_ref_iter *itr = NULL;
+		err = new_indexed_table_ref_iter(&itr, r, oid, r->hash_size,
+						 got.offsets, got.offset_len);
+		if (err < 0) {
+			record_clear(got_rec);
+			return err;
+		}
+		got.offsets = NULL;
+		record_clear(got_rec);
+
+		iterator_from_indexed_table_ref_iter(it, itr);
+	}
+
+	return 0;
+}
+
+static int reader_refs_for_unindexed(struct reader *r, struct iterator *it,
+				     byte *oid, int oid_len)
+{
+	struct table_iter *ti = calloc(sizeof(struct table_iter), 1);
+	struct filtering_ref_iterator *filter = NULL;
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		free(ti);
+		return err;
+	}
+
+	filter = calloc(sizeof(struct filtering_ref_iterator), 1);
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	filter->r = r;
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
+		    int oid_len)
+{
+	if (r->obj_offsets.present) {
+		return reader_refs_for_indexed(r, it, oid);
+	}
+	return reader_refs_for_unindexed(r, it, oid, oid_len);
+}
+
+uint64_t reader_max_update_index(struct reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reader_min_update_index(struct reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 0000000000..599a90028e
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,52 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source *source);
+
+struct reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+struct reader {
+	struct block_source source;
+	char *name;
+	int hash_size;
+	uint64_t size;
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	int object_id_len;
+
+	struct reader_offsets ref_offsets;
+	struct reader_offsets obj_offsets;
+	struct reader_offsets log_offsets;
+};
+
+int init_reader(struct reader *r, struct block_source source, const char *name);
+int reader_seek(struct reader *r, struct iterator *it, struct record rec);
+void reader_close(struct reader *r);
+const char *reader_name(struct reader *r);
+void reader_return_block(struct reader *r, struct block *p);
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 0000000000..a006f9019d
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1107 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = {};
+	int i = 9;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	{
+		int n = sizeof(buf) - i - 1;
+		if (dest.len < n) {
+			return -1;
+		}
+		memcpy(dest.buf, &buf[i + 1], n);
+		return n;
+	}
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	in.buf += tsize;
+	in.len -= tsize;
+
+	return start_len - in.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	dest.buf += n;
+	dest.len -= n;
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	dest.buf += n;
+	dest.len -= n;
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	dest.buf += suffix_len;
+	dest.len -= suffix_len;
+
+	return start.len - dest.len;
+}
+
+static byte ref_record_type(void)
+{
+	return BLOCK_TYPE_REF;
+}
+
+static void ref_record_key(const void *r, struct slice *dest)
+{
+	const struct ref_record *rec = (const struct ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void ref_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct ref_record *ref = (struct ref_record *)rec;
+	struct ref_record *src = (struct ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = strdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = strdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void ref_record_print(struct ref_record *ref, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = {};
+
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size);
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size);
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void ref_record_clear_void(void *rec)
+{
+	ref_record_clear((struct ref_record *)rec);
+}
+
+void ref_record_clear(struct ref_record *ref)
+{
+	free(ref->ref_name);
+	free(ref->target);
+	free(ref->target_value);
+	free(ref->value);
+	memset(ref, 0, sizeof(struct ref_record));
+}
+
+static byte ref_record_val_type(const void *rec)
+{
+	const struct ref_record *r = (const struct ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	s.buf += l;
+	s.len -= l;
+
+	return start.len - s.len;
+}
+
+static int ref_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	const struct ref_record *r = (const struct ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		s.buf += hash_size;
+		s.len -= hash_size;
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		s.buf += hash_size;
+		s.len -= hash_size;
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		s.buf += n;
+		s.len -= n;
+	}
+
+	return start.len - s.len;
+}
+
+static int ref_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct ref_record *r = (struct ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	in.buf += n;
+	in.len -= n;
+
+	r->ref_name = realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		in.buf += hash_size;
+		in.len -= hash_size;
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		in.buf += hash_size;
+		in.len -= hash_size;
+		break;
+	case 3: {
+		struct slice dest = {};
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		in.buf += n;
+		in.len -= n;
+		seen_target = true;
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	in.buf += suffix_len;
+	in.len -= suffix_len;
+
+	return start_len - in.len;
+}
+
+struct record_vtable ref_record_vtable = {
+	.key = &ref_record_key,
+	.type = &ref_record_type,
+	.copy_from = &ref_record_copy_from,
+	.val_type = &ref_record_val_type,
+	.encode = &ref_record_encode,
+	.decode = &ref_record_decode,
+	.clear = &ref_record_clear_void,
+};
+
+static byte obj_record_type(void)
+{
+	return BLOCK_TYPE_OBJ;
+}
+
+static void obj_record_key(const void *r, struct slice *dest)
+{
+	const struct obj_record *rec = (const struct obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	const struct obj_record *src = (const struct obj_record *)src_rec;
+
+	*ref = *src;
+	ref->hash_prefix = malloc(ref->hash_prefix_len);
+	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
+
+	{
+		int olen = ref->offset_len * sizeof(uint64_t);
+		ref->offsets = malloc(olen);
+		memcpy(ref->offsets, src->offsets, olen);
+	}
+}
+
+static void obj_record_clear(void *rec)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	FREE_AND_NULL(ref->hash_prefix);
+	FREE_AND_NULL(ref->offsets);
+	memset(ref, 0, sizeof(struct obj_record));
+}
+
+static byte obj_record_val_type(const void *rec)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		s.buf += n;
+		s.len -= n;
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	{
+		uint64_t last = r->offsets[0];
+		int i = 0;
+		for (i = 1; i < r->offset_len; i++) {
+			int n = put_var_int(s, r->offsets[i] - last);
+			if (n < 0) {
+				return -1;
+			}
+			s.buf += n;
+			s.len -= n;
+			last = r->offsets[i];
+		}
+	}
+	return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct obj_record *r = (struct obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	r->hash_prefix = malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		in.buf += n;
+		in.len -= n;
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+
+	in.buf += n;
+	in.len -= n;
+
+	{
+		uint64_t last = r->offsets[0];
+		int j = 1;
+		while (j < count) {
+			uint64_t delta = 0;
+			int n = get_var_int(&delta, in);
+			if (n < 0) {
+				return n;
+			}
+
+			in.buf += n;
+			in.len -= n;
+
+			last = r->offsets[j] = (delta + last);
+			j++;
+		}
+	}
+	return start.len - in.len;
+}
+
+struct record_vtable obj_record_vtable = {
+	.key = &obj_record_key,
+	.type = &obj_record_type,
+	.copy_from = &obj_record_copy_from,
+	.val_type = &obj_record_val_type,
+	.encode = &obj_record_encode,
+	.decode = &obj_record_decode,
+	.clear = &obj_record_clear,
+};
+
+void log_record_print(struct log_record *log, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = {};
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size);
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size);
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static byte log_record_type(void)
+{
+	return BLOCK_TYPE_LOG;
+}
+
+static void log_record_key(const void *r, struct slice *dest)
+{
+	const struct log_record *rec = (const struct log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_u64(dest->buf + 1 + len, ts);
+}
+
+static void log_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct log_record *dst = (struct log_record *)rec;
+	const struct log_record *src = (const struct log_record *)src_rec;
+
+	*dst = *src;
+	dst->ref_name = strdup(dst->ref_name);
+	dst->email = strdup(dst->email);
+	dst->name = strdup(dst->name);
+	dst->message = strdup(dst->message);
+	if (dst->new_hash != NULL) {
+		dst->new_hash = malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void log_record_clear_void(void *rec)
+{
+	struct log_record *r = (struct log_record *)rec;
+	log_record_clear(r);
+}
+
+void log_record_clear(struct log_record *r)
+{
+	free(r->ref_name);
+	free(r->new_hash);
+	free(r->old_hash);
+	free(r->name);
+	free(r->email);
+	free(r->message);
+	memset(r, 0, sizeof(struct log_record));
+}
+
+static byte log_record_val_type(const void *rec)
+{
+	return 1;
+}
+
+static byte zero[SHA256_SIZE] = {};
+
+static int log_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct log_record *r = (struct log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	s.buf += 2 * hash_size;
+	s.len -= 2 * hash_size;
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_u16(s.buf, r->tz_offset);
+	s.buf += 2;
+	s.len -= 2;
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	return start.len - s.len;
+}
+
+static int log_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct log_record *r = (struct log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = {};
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return FORMAT_ERROR;
+	}
+
+	r->ref_name = realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_u64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (in.len < 2 * hash_size) {
+		return FORMAT_ERROR;
+	}
+
+	r->old_hash = realloc(r->old_hash, hash_size);
+	r->new_hash = realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	in.buf += 2 * hash_size;
+	in.len -= 2 * hash_size;
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->name = realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->email = realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_u16(in.buf);
+	in.buf += 2;
+	in.len -= 2;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->message = realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	return start.len - in.len;
+
+error:
+	free(slice_yield(&dest));
+	return FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool log_record_equal(struct log_record *a, struct log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+struct record_vtable log_record_vtable = {
+	.key = &log_record_key,
+	.type = &log_record_type,
+	.copy_from = &log_record_copy_from,
+	.val_type = &log_record_val_type,
+	.encode = &log_record_encode,
+	.decode = &log_record_decode,
+	.clear = &log_record_clear_void,
+};
+
+struct record new_record(byte typ)
+{
+	struct record rec;
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct ref_record *r = calloc(1, sizeof(struct ref_record));
+		record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct obj_record *r = calloc(1, sizeof(struct obj_record));
+		record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct log_record *r = calloc(1, sizeof(struct log_record));
+		record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct index_record *r = calloc(1, sizeof(struct index_record));
+		record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+static byte index_record_type(void)
+{
+	return BLOCK_TYPE_INDEX;
+}
+
+static void index_record_key(const void *r, struct slice *dest)
+{
+	struct index_record *rec = (struct index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+				   int hash_size)
+{
+	struct index_record *dst = (struct index_record *)rec;
+	struct index_record *src = (struct index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec)
+{
+	struct index_record *idx = (struct index_record *)rec;
+	free(slice_yield(&idx->last_key));
+}
+
+static byte index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int index_record_encode(const void *rec, struct slice out, int hash_size)
+{
+	const struct index_record *r = (const struct index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	out.buf += n;
+	out.len -= n;
+
+	return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+			       struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct index_record *r = (struct index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	in.buf += n;
+	in.len -= n;
+	return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+	.key = &index_record_key,
+	.type = &index_record_type,
+	.copy_from = &index_record_copy_from,
+	.val_type = &index_record_val_type,
+	.encode = &index_record_encode,
+	.decode = &index_record_decode,
+	.clear = &index_record_clear,
+};
+
+void record_key(struct record rec, struct slice *dest)
+{
+	rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec)
+{
+	return rec.ops->type();
+}
+
+int record_encode(struct record rec, struct slice dest, int hash_size)
+{
+	return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size)
+{
+	assert(src.ops->type() == rec.ops->type());
+
+	rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec)
+{
+	return rec.ops->val_type(rec.data);
+}
+
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size)
+{
+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec)
+{
+	return rec.ops->clear(rec.data);
+}
+
+void record_from_ref(struct record *rec, struct ref_record *ref_rec)
+{
+	rec->data = ref_rec;
+	rec->ops = &ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
+{
+	rec->data = obj_rec;
+	rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec)
+{
+	rec->data = index_rec;
+	rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct log_record *log_rec)
+{
+	rec->data = log_rec;
+	rec->ops = &log_record_vtable;
+}
+
+void *record_yield(struct record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+struct ref_record *record_as_ref(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_REF);
+	return (struct ref_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct ref_record *)a)->ref_name,
+		      ((struct ref_record *)b)->ref_name);
+}
+
+bool ref_record_is_deletion(const struct ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int log_record_compare_key(const void *a, const void *b)
+{
+	struct log_record *la = (struct log_record *)a;
+	struct log_record *lb = (struct log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool log_record_is_deletion(const struct log_record *log)
+{
+	/* XXX */
+	return false;
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 0000000000..dffdd71fc2
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,79 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+struct record_vtable {
+	void (*key)(const void *rec, struct slice *dest);
+	byte (*type)(void);
+	void (*copy_from)(void *rec, const void *src, int hash_size);
+	byte (*val_type)(const void *rec);
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+	void (*clear)(void *rec);
+};
+
+/* record is a generic wrapper for differnt types of records. */
+struct record {
+	void *data;
+	struct record_vtable *ops;
+};
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+int common_prefix_size(struct slice a, struct slice b);
+
+int is_block_type(byte typ);
+struct record new_record(byte typ);
+
+extern struct record_vtable ref_record_vtable;
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra);
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in);
+
+struct index_record {
+	struct slice last_key;
+	uint64_t offset;
+};
+
+struct obj_record {
+	byte *hash_prefix;
+	int hash_prefix_len;
+	uint64_t *offsets;
+	int offset_len;
+};
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size);
+void record_clear(struct record rec);
+void *record_yield(struct record *rec);
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct ref_record *refrec);
+void record_from_log(struct record *rec, struct log_record *logrec);
+struct ref_record *record_as_ref(struct record ref);
+
+/* for qsort. */
+int ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 0000000000..7021bb333f
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,394 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include "system.h"
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct block_source {
+	struct block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct block {
+	uint8_t *data;
+	int len;
+	struct block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct block *dest, uint64_t off,
+			  uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int block_source_from_file(struct block_source *block_src, const char *name);
+
+/* write_options sets options for writing a single reftable. */
+struct write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+};
+
+/* ref_record holds a ref database entry target_value */
+struct ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int ref_record_is_deletion(const struct ref_record *ref);
+
+/* prints a ref_record onto stdout */
+void ref_record_print(struct ref_record *ref, int hash_size);
+
+/* frees and nulls all pointer values. */
+void ref_record_clear(struct ref_record *ref);
+
+/* returns whether two ref_records are the same */
+int ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size);
+
+/* log_record holds a reflog entry */
+struct log_record {
+	char *ref_name;
+	uint64_t update_index;
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int log_record_is_deletion(const struct log_record *log);
+
+/* frees and nulls all pointer values. */
+void log_record_clear(struct log_record *log);
+
+/* returns whether two records are equal. */
+int log_record_equal(struct log_record *a, struct log_record *b, int hash_size);
+
+void log_record_print(struct log_record *log, int hash_size);
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct iterator {
+	struct iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_ref(struct iterator it, struct ref_record *ref);
+
+/* reads the next log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_log(struct iterator it, struct log_record *log);
+
+/* releases resources associated with an iterator. */
+void iterator_destroy(struct iterator *it);
+
+/* block_stats holds statistics for a single block type */
+struct block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct block_stats obj_stats;
+	/* stats for index blocks */
+	struct block_stats idx_stats;
+	/* stats for log blocks */
+	struct block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* different types of errors */
+
+/* Unexpected file system behavior */
+#define IO_ERROR -2
+
+/* Format inconsistency on reading data
+ */
+#define FORMAT_ERROR -3
+
+/* File does not exist. Returned from block_source_from_file(),  because it
+   needs special handling in stack.
+*/
+#define NOT_EXIST_ERROR -4
+
+/* Trying to write out-of-date data. */
+#define LOCK_ERROR -5
+
+/* Misuse of the API:
+   - on writing a record with NULL ref_name.
+   - on writing a ref_record outside the table limits
+   - on writing a ref or log record before the stack's next_update_index
+   - on reading a ref_record from log iterator, or vice versa.
+ */
+#define API_ERROR -6
+
+/* Decompression error */
+#define ZLIB_ERROR -7
+
+const char *error_str(int err);
+
+/* new_writer creates a new writer */
+struct writer *new_writer(int (*writer_func)(void *, uint8_t *, int),
+			  void *writer_arg, struct write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int fd_writer(void *fdp, uint8_t *data, int size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   stack_next_update_index(), or API_ERROR is returned.
+ */
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max);
+
+/* adds a ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   writer_set_limits(), or API_ERROR is returned.
+ */
+int writer_add_ref(struct writer *w, struct ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n);
+
+/* adds a log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int writer_add_log(struct writer *w, struct log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int writer_add_logs(struct writer *w, struct log_record *logs, int n);
+
+/* writer_close finalizes the reftable. The writer is retained so statistics can
+ * be inspected. */
+int writer_close(struct writer *w);
+
+/* writer_stats returns the statistics on the reftable being written. */
+struct stats *writer_stats(struct writer *w);
+
+/* writer_free deallocates memory for the writer */
+void writer_free(struct writer *w);
+
+struct reader;
+
+/* new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp.  The name is used for creating a
+ * stack. Typically, it is the basename of the file.
+ */
+int new_reader(struct reader **pp, struct block_source, const char *name);
+
+/* reader_seek_ref returns an iterator where 'name' would be inserted in the
+   table.
+
+   example:
+
+   struct reader *r = NULL;
+   int err = new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct iterator it = {};
+   err = reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct ref_record ref = {};
+   while (1) {
+     err = iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   iterator_destroy(&it);
+   ref_record_clear(&ref);
+ */
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name);
+
+/* seek to logs for the given name, older than update_index. */
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+		       uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reader_free(struct reader *);
+
+/* return an iterator for the refs pointing to oid */
+int reader_refs_for(struct reader *r, struct iterator *it, uint8_t *oid,
+		    int oid_len);
+
+/* return the max_update_index for a table */
+uint64_t reader_max_update_index(struct reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reader_min_update_index(struct reader *r);
+
+/* a merged table is implements seeking/iterating over a stack of tables. */
+struct merged_table;
+
+/* new_merged_table creates a new merged table. It takes ownership of the stack
+   array.
+*/
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n);
+
+/* returns an iterator positioned just before 'name' */
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+			  const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+			     const char *name, uint64_t update_index);
+
+/* like merged_table_seek_log_at but look for the newest entry. */
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+			  const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t merged_max_update_index(struct merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t merged_min_update_index(struct merged_table *mt);
+
+/* closes readers for the merged tables */
+void merged_table_close(struct merged_table *mt);
+
+/* releases memory for the merged_table */
+void merged_table_free(struct merged_table *m);
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct stack;
+
+/* open a new reftable stack. The tables will be stored in 'dir', while the list
+   of tables is in 'list_file'. Typically, this should be .git/reftables and
+   .git/refs respectively.
+*/
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+	      struct write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t stack_next_update_index(struct stack *st);
+
+/* add a new table to the stack. The write_table function must call
+   writer_set_limits, add refs and return an error value. */
+int stack_add(struct stack *st,
+	      int (*write_table)(struct writer *wr, void *write_arg),
+	      void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct merged_table *stack_merged_table(struct stack *st);
+
+/* frees all resources associated with the stack. */
+void stack_destroy(struct stack *st);
+
+/* reloads the stack if necessary. */
+int stack_reload(struct stack *st);
+
+/* Policy for expiring reflog entries. */
+struct log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int stack_compact_all(struct stack *st, struct log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int stack_auto_compact(struct stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_ref(struct stack *st, const char *refname,
+		   struct ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_log(struct stack *st, const char *refname,
+		   struct log_record *log);
+
+/* statistics on past compactions. */
+struct compaction_stats {
+	uint64_t bytes;
+	int attempts;
+	int failures;
+};
+
+struct compaction_stats *stack_compaction_stats(struct stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 0000000000..efbe625253
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,199 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+byte *slice_yield(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = {};
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_write(struct slice *b, byte *data, int sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_write_void(void *b, byte *data, int sz)
+{
+	return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = calloc(size, 1);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct block_source *bs, struct slice *buf)
+{
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+struct block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 0000000000..f12a6db228
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+struct slice {
+	byte *buf;
+	int len;
+	int cap;
+};
+
+void slice_set_string(struct slice *dest, const char *);
+void slice_append_string(struct slice *dest, const char *);
+char *slice_to_string(struct slice src);
+const char *slice_as_string(struct slice *src);
+bool slice_equal(struct slice a, struct slice b);
+byte *slice_yield(struct slice *s);
+void slice_copy(struct slice *dest, struct slice src);
+void slice_resize(struct slice *s, int l);
+int slice_compare(struct slice a, struct slice b);
+int slice_write(struct slice *b, byte *data, int sz);
+int slice_write_void(void *b, byte *data, int sz);
+void slice_append(struct slice *dest, struct slice add);
+
+struct block_source;
+void block_source_from_slice(struct block_source *bs, struct slice *buf);
+
+struct block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 0000000000..303eed2f37
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,983 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "reftable.h"
+#include "writer.h"
+
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+	      struct write_options config)
+{
+	struct stack *p = calloc(sizeof(struct stack), 1);
+	int err = 0;
+	*dest = NULL;
+	p->list_file = strdup(list_file);
+	p->reftable_dir = strdup(dir);
+	p->config = config;
+
+	err = stack_reload(p);
+	if (err < 0) {
+		stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fread_lines(FILE *f, char ***namesp)
+{
+	long size = 0;
+	int err = fseek(f, 0, SEEK_END);
+	char *buf = NULL;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	size = ftell(f);
+	if (size < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = fseek(f, 0, SEEK_SET);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	buf = malloc(size + 1);
+	if (fread(buf, 1, size, f) != size) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+exit:
+	free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	FILE *f = fopen(filename, "r");
+	int err = 0;
+	if (f == NULL) {
+		if (errno == ENOENT) {
+			*namesp = calloc(sizeof(char *), 1);
+			return 0;
+		}
+
+		return IO_ERROR;
+	}
+	err = fread_lines(f, namesp);
+	fclose(f);
+	return err;
+}
+
+struct merged_table *stack_merged_table(struct stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void stack_destroy(struct stack *st)
+{
+	if (st->merged == NULL) {
+		return;
+	}
+
+	merged_table_close(st->merged);
+	merged_table_free(st->merged);
+	st->merged = NULL;
+
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	free(st);
+}
+
+static struct reader **stack_copy_readers(struct stack *st, int cur_len)
+{
+	struct reader **cur = calloc(sizeof(struct reader *), cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int stack_reload_once(struct stack *st, char **names, bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reader **new_tables =
+		malloc(sizeof(struct reader *) * names_len);
+	int new_tables_len = 0;
+	struct merged_table *new_merged = NULL;
+
+	struct slice table_path = {};
+
+	while (*names) {
+		struct reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct block_source src = {};
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_append_string(&table_path, "/");
+			slice_append_string(&table_path, name);
+
+			err = block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = new_reader(&rd, src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = new_merged_table(&new_merged, new_tables, new_tables_len);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		merged_table_free(st->merged);
+	}
+	st->merged = new_merged;
+
+	{
+		int i = 0;
+		for (i = 0; i < cur_len; i++) {
+			if (cur[i] != NULL) {
+				reader_close(cur[i]);
+				reader_free(cur[i]);
+			}
+		}
+	}
+exit:
+	free(slice_yield(&table_path));
+	{
+		int i = 0;
+		for (i = 0; i < new_tables_len; i++) {
+			reader_close(new_tables[i]);
+		}
+	}
+	free(new_tables);
+	free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open)
+{
+	struct timeval deadline = {};
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = {};
+		int err = gettimeofday(&now, NULL);
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		err = read_lines(st->list_file, &names_after);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return -1;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 100;
+		usleep(delay);
+	}
+
+	return 0;
+}
+
+int stack_reload(struct stack *st)
+{
+	return stack_reload_maybe_reuse(st, true);
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int stack_add(struct stack *st, int (*write)(struct writer *wr, void *arg),
+	      void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == LOCK_ERROR) {
+			err = stack_reload(st);
+		}
+		return err;
+	}
+
+	return stack_auto_compact(st);
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "%012" PRIx64 "-%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+int stack_try_add(struct stack *st,
+		  int (*write_table)(struct writer *wr, void *arg), void *arg)
+{
+	struct slice lock_name = {};
+	struct slice temp_tab_name = {};
+	struct slice tab_name = {};
+	struct slice next_name = {};
+	struct slice table_list = {};
+	struct writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+	int lock_fd = 0;
+	uint64_t next_update_index = 0;
+
+	slice_set_string(&lock_name, st->list_file);
+	slice_append_string(&lock_name, ".lock");
+
+	lock_fd = open(slice_as_string(&lock_name), O_EXCL | O_CREAT | O_WRONLY,
+		       0644);
+	if (lock_fd < 0) {
+		if (errno == EEXIST) {
+			err = LOCK_ERROR;
+			goto exit;
+		}
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = LOCK_ERROR;
+		goto exit;
+	}
+
+	next_update_index = stack_next_update_index(st);
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, next_update_index, next_update_index);
+
+	slice_set_string(&temp_tab_name, st->reftable_dir);
+	slice_append_string(&temp_tab_name, "/");
+	slice_append(&temp_tab_name, next_name);
+	slice_append_string(&temp_tab_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_name));
+	if (tab_fd < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	wr = new_writer(fd_writer, &tab_fd, &st->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	if (wr->min_update_index < next_update_index) {
+		err = API_ERROR;
+		goto exit;
+	}
+
+	{
+		int i = 0;
+		for (i = 0; i < st->merged->stack_len; i++) {
+			slice_append_string(&table_list,
+					    st->merged->stack[i]->name);
+			slice_append_string(&table_list, "\n");
+		}
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_append_string(&next_name, ".ref");
+	slice_append(&table_list, next_name);
+	slice_append_string(&table_list, "\n");
+
+	slice_set_string(&tab_name, st->reftable_dir);
+	slice_append_string(&tab_name, "/");
+	slice_append(&tab_name, next_name);
+
+	err = rename(slice_as_string(&temp_tab_name),
+		     slice_as_string(&tab_name));
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	free(slice_yield(&temp_tab_name));
+
+	err = write(lock_fd, table_list.buf, table_list.len);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = close(lock_fd);
+	lock_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&tab_name));
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&tab_name));
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_reload(st);
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_name));
+	}
+	unlink(slice_as_string(&lock_name));
+
+	if (lock_fd > 0) {
+		close(lock_fd);
+		lock_fd = 0;
+	}
+
+	free(slice_yield(&lock_name));
+	free(slice_yield(&temp_tab_name));
+	free(slice_yield(&tab_name));
+	free(slice_yield(&next_name));
+	free(slice_yield(&table_list));
+	writer_free(wr);
+	return err;
+}
+
+uint64_t stack_next_update_index(struct stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reader_max_update_index(st->merged->stack[sz - 1]) + 1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct log_expiry_config *config)
+{
+	struct slice next_name = {};
+	int tab_fd = -1;
+	struct writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reader_min_update_index(st->merged->stack[first]),
+		    reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_append_string(temp_tab, "/");
+	slice_append(temp_tab, next_name);
+	slice_append_string(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = new_writer(fd_writer, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+	writer_free(wr);
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		free(slice_yield(temp_tab));
+	}
+	free(slice_yield(&next_name));
+	return err;
+}
+
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+			int last, struct log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reader **subtabs =
+		calloc(sizeof(struct reader *), last - first + 1);
+	struct merged_table *mt = NULL;
+	int err = 0;
+	struct iterator it = {};
+	struct ref_record ref = {};
+	struct log_record log = {};
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	writer_set_limits(wr, st->merged->stack[first]->min_update_index,
+			  st->merged->stack[last]->max_update_index);
+
+	err = new_merged_table(&mt, subtabs, subtabs_len);
+	if (err < 0) {
+		free(subtabs);
+		goto exit;
+	}
+
+	err = merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = iterator_next_ref(it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+	}
+
+	err = merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = iterator_next_log(it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		/* XXX collect stats? */
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+	}
+
+exit:
+	iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		merged_table_free(mt);
+	}
+	ref_record_clear(&ref);
+
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct stack *st, int first, int last,
+			       struct log_expiry_config *expiry)
+{
+	struct slice temp_tab_name = {};
+	struct slice new_table_name = {};
+	struct slice lock_file_name = {};
+	struct slice ref_list_contents = {};
+	struct slice new_table_path = {};
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **delete_on_success = calloc(sizeof(char *), compact_count + 1);
+	char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
+	int i = 0;
+	int j = 0;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_append_string(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_name = {};
+		struct slice subtab_lock = {};
+		slice_set_string(&subtab_name, st->reftable_dir);
+		slice_append_string(&subtab_name, "/");
+		slice_append_string(&subtab_name,
+				    reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_name);
+		slice_append_string(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				}
+				err = IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] = (char *)slice_as_string(&subtab_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_append_string(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_append_string(&new_table_path, "/");
+
+	slice_append(&new_table_path, new_table_name);
+
+	err = rename(slice_as_string(&temp_tab_name),
+		     slice_as_string(&new_table_path));
+	if (err < 0) {
+		goto exit;
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	slice_append(&ref_list_contents, new_table_name);
+	slice_append_string(&ref_list_contents, "\n");
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	for (char **p = delete_on_success; *p; p++) {
+		if (strcmp(*p, slice_as_string(&new_table_path))) {
+			unlink(*p);
+		}
+	}
+
+	err = stack_reload_maybe_reuse(st, first < last);
+exit:
+	for (char **p = subtable_locks; *p; p++) {
+		unlink(*p);
+	}
+	free_names(delete_on_success);
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	free(slice_yield(&new_table_name));
+	free(slice_yield(&new_table_path));
+	free(slice_yield(&ref_list_contents));
+	free(slice_yield(&temp_tab_name));
+	free(slice_yield(&lock_file_name));
+	return err;
+}
+
+int stack_compact_all(struct stack *st, struct log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct stack *st, int first, int last,
+				     struct log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	assert(sz > 0);
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = calloc(sizeof(struct segment), n);
+	int next = 0;
+	struct segment cur = {};
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct stack *st)
+{
+	uint64_t *sizes = calloc(sizeof(uint64_t), st->merged->stack_len);
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		/* overhead is 24 + 68 = 92. */
+		sizes[i] = st->merged->stack[i]->size - 91;
+	}
+	return sizes;
+}
+
+int stack_auto_compact(struct stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct compaction_stats *stack_compaction_stats(struct stack *st)
+{
+	return &st->stats;
+}
+
+int stack_read_ref(struct stack *st, const char *refname,
+		   struct ref_record *ref)
+{
+	struct iterator it = {};
+	struct merged_table *mt = stack_merged_table(st);
+	int err = merged_table_seek_ref(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = iterator_next_ref(it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, refname) || ref_record_is_deletion(ref)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	iterator_destroy(&it);
+	return err;
+}
+
+int stack_read_log(struct stack *st, const char *refname,
+		   struct log_record *log)
+{
+	struct iterator it = {};
+	struct merged_table *mt = stack_merged_table(st);
+	int err = merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = iterator_next_log(it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) || log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 0000000000..d5e2c93c29
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,40 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+
+struct stack {
+	char *list_file;
+	char *reftable_dir;
+
+	struct write_options config;
+
+	struct merged_table *merged;
+	struct compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct stack *st,
+		  int (*write_table)(struct writer *wr, void *arg), void *arg);
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+			int last, struct log_expiry_config *config);
+int fastlog2(uint64_t sz);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 0000000000..e27c80e3ca
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,58 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#include "config.h"
+
+#ifndef REFTABLE_STANDALONE
+
+#include "git-compat-util.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#define ARRAY_SIZE(a) sizeof((a)) / sizeof((a)[0])
+#define FREE_AND_NULL(x)    \
+	do {                \
+		free(x);    \
+		(x) = NULL; \
+	} while (0)
+#define QSORT(arr, n, cmp) qsort(arr, n, sizeof(arr[0]), cmp)
+#define SWAP(a, b)                              \
+	{                                       \
+		char tmp[sizeof(a)];            \
+		assert(sizeof(a) == sizeof(b)); \
+		memcpy(&tmp[0], &a, sizeof(a)); \
+		memcpy(&a, &b, sizeof(a));      \
+		memcpy(&b, &tmp[0], sizeof(a)); \
+	}
+#endif
+
+typedef uint8_t byte;
+typedef int bool;
+
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 0000000000..9bf7fe531f
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,66 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				calloc(sizeof(struct tree_node), 1);
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	{
+		int res = compare(key, (*rootp)->key);
+		if (res < 0) {
+			return tree_search(key, &(*rootp)->left, compare,
+					   insert);
+		} else if (res > 0) {
+			return tree_search(key, &(*rootp)->right, compare,
+					   insert);
+		}
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 0000000000..86a71715ae
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,24 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 0000000000..3247481df3
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,623 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct block_stats *writer_block_stats(struct writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct writer *w, byte *data, size_t len, int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = calloc(w->pending_padding, 1);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_write_header(struct writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+	dest[4] = 1; /* version */
+	put_u24(dest + 5, w->opts.block_size);
+	put_u64(dest + 8, w->min_update_index);
+	put_u64(dest + 16, w->max_update_index);
+	return 24;
+}
+
+static void writer_reinit_block_writer(struct writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = HEADER_SIZE;
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start, w->hash_size);
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
+			  void *writer_arg, struct write_options *opts)
+{
+	struct writer *wp = calloc(sizeof(struct writer), 1);
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->hash_size = SHA1_SIZE;
+	wp->block = calloc(opts->block_size, 1);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void writer_free(struct writer *w)
+{
+	free(w->block);
+	free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+			     ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = calloc(sizeof(struct obj_index_tree_node), 1);
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = realloc(key->offsets,
+				       sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct writer *w, struct record rec)
+{
+	int result = -1;
+	struct slice key = {};
+	int err = 0;
+	record_key(rec, &key);
+	if (slice_compare(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	free(slice_yield(&key));
+	return result;
+}
+
+int writer_add_ref(struct writer *w, struct ref_record *ref)
+{
+	struct record rec = {};
+	struct ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return API_ERROR;
+	}
+
+	record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = w->hash_size,
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = w->hash_size,
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int writer_add_log(struct writer *w, struct log_record *log)
+{
+	if (log->ref_name == NULL) {
+		return API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	{
+		struct record rec = {};
+		int err;
+		record_from_log(&rec, log);
+		err = writer_add_record(w, rec);
+		return err;
+	}
+}
+
+int writer_add_logs(struct writer *w, struct log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct record rec = {};
+			record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, rec);
+			assert(err == 0);
+		}
+		for (i = 0; i < idx_len; i++) {
+			free(slice_yield(&idx[i].last_key));
+		}
+		free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct block_stats *bstats = writer_block_stats(w, typ);
+		bstats->index_blocks =
+			w->stats.idx_stats.blocks - before_blocks;
+		bstats->index_offset = index_start;
+		bstats->max_index_level = max_level;
+	}
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct record rec = {};
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	free(slice_yield(&entry->hash));
+	free(entry);
+}
+
+static int writer_dump_object_index(struct writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = {};
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_close(struct writer *w)
+{
+	byte footer[68];
+	byte *p = footer;
+
+	writer_finish_public_section(w);
+
+	writer_write_header(w, footer);
+	p += 24;
+	put_u64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_u64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_u64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_u64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_u64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_u32(p, crc32(0, footer, p - footer));
+	p += 4;
+	w->pending_padding = 0;
+
+	{
+		int n = padded_write(w, footer, sizeof(footer), 0);
+		if (n < 0) {
+			return n;
+		}
+	}
+
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	free(slice_yield(&w->last_key));
+	return 0;
+}
+
+void writer_clear_index(struct writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		free(slice_yield(&w->index[i].last_key));
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct block_stats *bstats = writer_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_u24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = realloc(w->index,
+				   sizeof(struct index_record) * w->index_cap);
+	}
+
+	{
+		struct index_record ir = {
+			.offset = w->next,
+		};
+		slice_copy(&ir.last_key, w->block_writer->last_key);
+		w->index[w->index_len] = ir;
+	}
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	block_writer_reset(&w->block_writer_data);
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+struct stats *writer_stats(struct writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 0000000000..bd2386474b
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,46 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct writer {
+	int (*write)(void *, byte *, int);
+	void *write_arg;
+	int pending_padding;
+	int hash_size;
+	struct slice last_key;
+
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct write_options opts;
+
+	byte *block;
+	struct block_writer *block_writer;
+	struct block_writer block_writer_data;
+	struct index_record *index;
+	int index_len;
+	int index_cap;
+
+	/* tree for use with tsearch */
+	struct tree_node *obj_index_tree;
+
+	struct stats stats;
+};
+
+int writer_flush_block(struct writer *w);
+void writer_clear_index(struct writer *w);
+int writer_finish_public_section(struct writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 0000000000..3e0b0f24f1
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v5 5/5] Reftable support for git-core
  2020-02-10 14:14       ` [PATCH v5 " Han-Wen Nienhuys via GitGitGadget
                           ` (3 preceding siblings ...)
  2020-02-10 14:14         ` [PATCH v5 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-02-10 14:14         ` Han-Wen Nienhuys via GitGitGadget
  2020-02-18  8:43         ` [PATCH v6 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  5 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-10 14:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

TODO:

 * Make CI on gitgitgadget compile.
 * Redo interaction between repo config, repo storage version
 * Resolve the design problem with reflog expiry.
 * Resolve spots marked with XXX

Example use:

  $ ~/vc/git/git init --reftable
  warning: templates not found in /usr/local/google/home/hanwen/share/git-core/templates
  Initialized empty Git repository in /tmp/qz/.git/
  $ echo q > a
  $ ~/vc/git/git add a
  $ ~/vc/git/git commit -mx
  fatal: not a git repository (or any of the parent directories): .git
  [master (root-commit) 373d969] x
   1 file changed, 1 insertion(+)
   create mode 100644 a
  $ ~/vc/git/git show-ref
  373d96972fca9b63595740bba3898a762778ba20 HEAD
  373d96972fca9b63595740bba3898a762778ba20 refs/heads/master
  $ ls -l .git/reftable/
  total 12
  -rw------- 1 hanwen primarygroup  126 Jan 23 20:08 000000000001-000000000001.ref
  -rw------- 1 hanwen primarygroup 4277 Jan 23 20:08 000000000002-000000000002.ref
  $ go run ~/vc/reftable/cmd/dump.go  -table /tmp/qz/.git/reftable/000000000002-000000000002.ref
  ** DEBUG **
  name /tmp/qz/.git/reftable/000000000002-000000000002.ref, sz 4209: 'r' reftable.readerOffsets{Present:true, Offset:0x0, IndexOffset:0x0}, 'o' reftable.readerOffsets{Present:false, Offset:0x0, IndexOffset:0x0} 'g' reftable.readerOffsets{Present:true, Offset:0x1000, IndexOffset:0x0}
  ** REFS **
  reftable.RefRecord{RefName:"refs/heads/master", UpdateIndex:0x2, Value:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, TargetValue:[]uint8(nil), Target:""}
  ** LOGS **
  reftable.LogRecord{RefName:"HEAD", UpdateIndex:0x2, New:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, Old:[]uint8{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Name:"Han-Wen Nienhuys", Email:"hanwen@google.com", Time:0x5e29ef27, TZOffset:100, Message:"commit (initial): x\n"}

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |   7 +
 Makefile                                      |  24 +-
 builtin/clone.c                               |   4 +-
 builtin/init-db.c                             |  55 +-
 cache.h                                       |   4 +-
 refs.c                                        |  20 +-
 refs.h                                        |   3 +
 refs/refs-internal.h                          |   1 +
 refs/reftable-backend.c                       | 880 ++++++++++++++++++
 repository.c                                  |   2 +
 repository.h                                  |   3 +
 setup.c                                       |  12 +-
 12 files changed, 986 insertions(+), 29 deletions(-)
 create mode 100644 refs/reftable-backend.c

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ff..7257623583 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 6134104ae6..706bb0a981 100644
--- a/Makefile
+++ b/Makefile
@@ -814,6 +814,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += command-list.h
 
@@ -959,6 +960,7 @@ LIB_OBJS += rebase-interactive.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1163,7 +1165,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2347,11 +2349,28 @@ VCSSVN_OBJS += vcs-svn/fast_export.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/bytes.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2488,6 +2507,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
diff --git a/builtin/clone.c b/builtin/clone.c
index 0fc89ae2b9..092485c883 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1096,7 +1096,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template,
+		DEFAULT_REF_STORAGE, /* XXX */
+		INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 45bdea0589..d51a99ed77 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -177,7 +177,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 }
 
 static int create_default_files(const char *template_path,
-				const char *original_git_dir)
+				const char *original_git_dir,
+				const char *ref_storage_format, int flags)
 {
 	struct stat st1;
 	struct strbuf buf = STRBUF_INIT;
@@ -213,6 +214,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(ref_storage_format);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -222,6 +224,15 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -234,17 +245,21 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
+	} else {
+		/*
+		 * XXX should check whether our ref backend matches the
+		 * original one; if not, either die() or convert
+		 */
 	}
 
 	/* This forces creation of new config file */
-	xsnprintf(repo_version_string, sizeof(repo_version_string),
-		  "%d", GIT_REPO_VERSION);
+	xsnprintf(repo_version_string, sizeof(repo_version_string), "%d",
+		  !strcmp(ref_storage_format, "reftable") ?
+			  GIT_REPO_VERSION_READ :
+			  GIT_REPO_VERSION);
 	git_config_set("core.repositoryformatversion", repo_version_string);
 
 	/* Check filemode trustability */
@@ -339,7 +354,8 @@ static void separate_git_dir(const char *git_dir, const char *git_link)
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, unsigned int flags)
+	    const char *template_dir, const char *ref_storage_format,
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -378,7 +394,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 */
 	check_repository_format();
 
-	reinit = create_default_files(template_dir, original_git_dir);
+	reinit = create_default_files(template_dir, original_git_dir,
+				      ref_storage_format, flags);
 
 	create_object_directory();
 
@@ -403,6 +420,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -476,20 +495,24 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = DEFAULT_REF_STORAGE;
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
 	unsigned int flags = 0;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_END()
@@ -591,9 +614,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, flags);
+	return init_db(git_dir, real_git_dir, template_dir, ref_storage_format,
+		       flags);
 }
diff --git a/cache.h b/cache.h
index cbfaead23a..781f929cb5 100644
--- a/cache.h
+++ b/cache.h
@@ -627,7 +627,8 @@ int path_inside_repo(const char *prefix, const char *path);
 #define INIT_DB_EXIST_OK 0x0002
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, unsigned int flags);
+	    const char *template_dir, const char *ref_storage_format,
+	    unsigned int flags);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1041,6 +1042,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 1ab0bb54d3..6530219762 100644
--- a/refs.c
+++ b/refs.c
@@ -20,7 +20,7 @@
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1836,13 +1836,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1858,7 +1858,10 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs = ref_store_init(r->gitdir,
+				 r->ref_storage_format ? r->ref_storage_format :
+							 DEFAULT_REF_STORAGE,
+				 REF_STORE_ALL_CAPS);
 	return r->refs;
 }
 
@@ -1913,7 +1916,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1927,6 +1930,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1940,9 +1944,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index 87c9ec921b..2b5985ad59 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* XXX where should this be? */
+#define DEFAULT_REF_STORAGE "files"
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 1d7a485220..d87e9cbdec 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -661,6 +661,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 0000000000..e793ee362e
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,880 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+/*
+  The reftable v1 spec only supports 20-byte binary SHA1s. A new format version
+  will be needed to support SHA256.
+ */
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *reftable_dir;
+	char *table_list_file;
+	struct stack *stack;
+};
+
+static void clear_log_record(struct log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	log_record_clear(log);
+}
+
+static void fill_log_record(struct log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = {};
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *reftable_ref_store_create(const char *path,
+						   unsigned int store_flags)
+{
+	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct write_options cfg = {
+		.block_size = 4096,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/reftable/tables.list", path);
+	refs->table_list_file = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", path);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/HEAD", path);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", path);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	refs->err = new_stack(&refs->stack, refs->reftable_dir,
+			      refs->table_list_file, cfg);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	FILE *file;
+	safe_create_dir(refs->reftable_dir, 1);
+
+	file = fopen(refs->table_list_file, "a");
+	if (file == NULL) {
+		return -1;
+	}
+	fclose(file);
+	return 0;
+}
+
+struct reftable_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = iterator_next_ref(ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	ref_record_clear(&ri->ref);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct merged_table *mt = NULL;
+
+	mt = stack_merged_table(refs->stack);
+	ri->err = refs->err;
+	if (ri->err == 0) {
+		ri->err = merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	(void)refs;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid = {};
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct ref_update *)a)->refname,
+		      ((struct ref_update *)b)->refname);
+}
+
+static int write_transaction_table(struct writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = stack_next_update_index(refs->stack);
+	int err = 0;
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	writer_set_limits(writer, ts, ts);
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_NEW) {
+			struct object_id out_oid = {};
+			int out_flags = 0;
+			/* Memory owned by refs_resolve_ref_unsafe, no need to
+			 * free(). */
+			const char *resolved = refs_resolve_ref_unsafe(
+				transaction->ref_store, u->refname, 0, &out_oid,
+				&out_flags);
+			struct ref_record ref = {};
+			ref.ref_name =
+				(char *)(resolved ? resolved : u->refname);
+			ref.value = u->new_oid.hash;
+			ref.update_index = ts;
+			err = writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct log_record log = {};
+		fill_log_record(&log);
+
+		log.ref_name = (char *)u->refname;
+		log.old_hash = u->old_oid.hash;
+		log.new_hash = u->new_oid.hash;
+		log.update_index = ts;
+		log.message = u->msg;
+
+		err = writer_add_log(writer, &log);
+		clear_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+exit:
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	int err = stack_add(refs->stack, &write_transaction_table, transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure %s",
+			    error_str(err));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *err)
+{
+	return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+struct write_delete_refs_arg {
+	struct stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	int err = 0;
+
+	writer_set_limits(writer, ts, ts);
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct log_record log = {};
+		struct ref_record current = {};
+		fill_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (stack_read_ref(arg->stack, log.ref_name, &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		ref_record_clear(&current);
+
+		clear_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	return stack_add(refs->stack, &write_delete_refs_table, &arg);
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	return stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	writer_set_limits(writer, ts, ts);
+	err = writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct log_record log = {};
+		struct object_id new_oid = {};
+		struct object_id old_oid = {};
+		struct ref_record current = {};
+		stack_read_ref(create->refs->stack, create->refname, &current);
+
+		fill_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	return stack_add(refs->stack, &write_create_symref_table, &arg);
+}
+
+struct write_rename_arg {
+	struct stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	struct ref_record ref = {};
+	int err = stack_read_ref(arg->stack, arg->oldname, &ref);
+	if (err) {
+		goto exit;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto exit;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct ref_record todo[2] = {};
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct log_record todo[2] = {};
+		fill_log_record(&todo[0]);
+		fill_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = writer_add_logs(writer, todo, 2);
+
+		clear_log_record(&todo[0]);
+		clear_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	return stack_add(refs->stack, &write_rename_table, &arg);
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = iterator_next_log(ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	log_record_clear(&ri->log);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return empty_ref_iterator_begin();
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct iterator it = {};
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &it, refname);
+	struct log_record log = {};
+
+	while (err == 0) {
+		err = iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid = {};
+			struct object_id new_oid = {};
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	log_record_clear(&log);
+	iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct iterator it = {};
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &it, refname);
+
+	struct log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+
+	while (err == 0) {
+		struct log_record log = {};
+		err = iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (int i = len; i--;) {
+		struct log_record *log = &logs[i];
+		struct object_id old_oid = {};
+		struct object_id new_oid = {};
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (int i = 0; i < len; i++) {
+		log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	BUG("per ref reflog expiry is not implemented for reftable backend.");
+	/*
+	  TODO: should iterate over the reflog, and write tombstones. The
+	  tombstones will be removed on compaction.
+	 */
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct ref_record ref = {};
+	int err = stack_read_ref(refs->stack, refname, &ref);
+	if (err) {
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		/* XXX recurse? */
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else {
+		hashcpy(oid->hash, ref.value);
+	}
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
diff --git a/repository.c b/repository.c
index a4174ddb06..ff0988dac8 100644
--- a/repository.c
+++ b/repository.c
@@ -174,6 +174,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 040057dea6..198d4aa090 100644
--- a/repository.h
+++ b/repository.h
@@ -70,6 +70,9 @@ struct repository {
 	/* The store in which the refs are held. */
 	struct ref_store *refs;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index e2a479a64f..e2e1583eee 100644
--- a/setup.c
+++ b/setup.c
@@ -440,9 +440,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -531,6 +533,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1173,8 +1176,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 4/5] Add reftable library
  2020-02-10 13:16           ` Han-Wen Nienhuys
@ 2020-02-11  0:05             ` brian m. carlson
  2020-02-11 14:20               ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: brian m. carlson @ 2020-02-11  0:05 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 2364 bytes --]

On 2020-02-10 at 13:16:26, Han-Wen Nienhuys wrote:
> On Fri, Feb 7, 2020 at 1:16 AM brian m. carlson
> <sandals@crustytoothpaste.net> wrote:
> > > +     {
> > > +             struct merged_table m = {
> > > +                     .stack = stack,
> > > +                     .stack_len = n,
> > > +                     .min = first_min,
> > > +                     .max = last_max,
> > > +                     .hash_size = SHA1_SIZE,
> >
> > This is going to cause problems with SHA-256.  We'd want to write this
> > as the_hash_algo->rawsz, since it can change at runtime.
> 
> As this is within the library, the hash size should be injected
> somewhere. I don't see huge problems here (the hash size could be part
> of the 'struct write_options'), but until we have properly defined how
> reftable will mark the on-disk files as SHA-256, it will only supports
> SHA1. I intend to work set the file format in motion, once this SHA1
> version is accepted into git-core.

It's not required to mark the on-disk files.  A repository's storage is
either entirely SHA-1 or entirely SHA-256, except for the translation
tables.  The entire marking of the repository is in the config file, and
everything in the .git directory is assumed to be in that format.

I expect full stage 4 SHA-256 support to land in the next couple of
months.  Currently, the entire codebase is ready and is hash agnostic
except for a small number of remaining tests, and having reftable
support land without support for SHA-256 will be a significant
regression to that state and delay future series.  So when you have a
series with tests, I'll send a patch that can be added to the series to
add SHA-256 support so we don't have that happen.

> There was one question you might be able to answer, though. Can
> repositories exist in dual hash configuration, i.e. where the ref
> database must store both SHA1 and SHA256 hashes?

No.  Refs are always stored in the native format with the rest of the
repository.  When we gain support for using both SHA-1 and SHA-256 to
refer to objects in the same repository, there will be either pack index
v3 or a loose object lookup table which has both and is used to
translate.  We won't store both values in the ref database at once.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 4/5] Add reftable library
  2020-02-11  0:05             ` brian m. carlson
@ 2020-02-11 14:20               ` Han-Wen Nienhuys
  2020-02-11 16:31                 ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-11 14:20 UTC (permalink / raw)
  To: brian m. carlson, Han-Wen Nienhuys,
	Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Feb 11, 2020 at 1:05 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
> > As this is within the library, the hash size should be injected
> > somewhere. I don't see huge problems here (the hash size could be part
> > of the 'struct write_options'), but until we have properly defined how
> > reftable will mark the on-disk files as SHA-256, it will only supports
> > SHA1. I intend to work set the file format in motion, once this SHA1
> > version is accepted into git-core.
>
> It's not required to mark the on-disk files.  A repository's storage is
> either entirely SHA-1 or entirely SHA-256, except for the translation
> tables.  The entire marking of the repository is in the config file, and
> everything in the .git directory is assumed to be in that format.

I disagree. I think it should be possible to determine if a file in
reftable format is well-formed or not. Since parsing requires knowing
the hash size, we have to encode it somewhere.

I've uploaded https://git.eclipse.org/r/c/157501/ that proposes a way
to encode the hash size. I  look forward to feedback.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 4/5] Add reftable library
  2020-02-11 14:20               ` Han-Wen Nienhuys
@ 2020-02-11 16:31                 ` Junio C Hamano
  2020-02-11 16:40                   ` Han-Wen Nienhuys
  2020-02-11 16:46                   ` Han-Wen Nienhuys
  0 siblings, 2 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-02-11 16:31 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: brian m. carlson, Han-Wen Nienhuys via GitGitGadget, git,
	Han-Wen Nienhuys

Han-Wen Nienhuys <hanwen@google.com> writes:

> I've uploaded https://git.eclipse.org/r/c/157501/ that proposes a way
> to encode the hash size. I  look forward to feedback.

Development and design discussion happens here.  Please do not
require people to go click at an external website---once people
start doing so, we'd have to go around 47 different places to piece
one discussion together, which is just crazy.

Here is what I saw:

    A 24-byte header appears at the beginning of the file:

        'REFT'
        uint8( format_version )
        uint24( block_size )
        uint64( min_update_index )
        uint64( max_update_index )

    The `format_version` is a byte, and it indicates both the version of the on-disk
    format, as well as the size of the hash. The hash size is indicated in the MSB
    of the `format_version`. For the SHA1 hash, `format_version & 0x80 == 0` and all
    hash values are 20 bytes. For SHA256, `format_version & 0x80 == 1`, and all hash
    values are 32 bytes. Future hash functions may be added by using more bits at
    the right.

    The file format version can be extract as `format_version & 0x7f`. Currently,
    only version 1 is defined.

If you cast in stone that "& 0x7f is the way to extract the
version", then you cannot promise that you may steal more bits at
the right of MSB to support more hash functions, as you've reserved
the rightmost 7 bits already for the version number with 0x7f and
there are only 8 bits in your byte.

It seems that you are trying to make the format too dense?  Is it
too much a waste to use a separate word or a byte for hash?  Or
perhaps declare that format version 1 uses SHA-1, format version 2
uses SHA-256, etc. (in other words, do we want to support both SHA-1
and SHA-256 when we are at format version 7)?

Thanks.






^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 4/5] Add reftable library
  2020-02-11 16:31                 ` Junio C Hamano
@ 2020-02-11 16:40                   ` Han-Wen Nienhuys
  2020-02-11 23:40                     ` brian m. carlson
  2020-02-11 16:46                   ` Han-Wen Nienhuys
  1 sibling, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-11 16:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Han-Wen Nienhuys via GitGitGadget, git,
	Han-Wen Nienhuys

On Tue, Feb 11, 2020 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
> Here is what I saw:
>
>     A 24-byte header appears at the beginning of the file:
>
>         'REFT'
>         uint8( format_version )
>         uint24( block_size )
>         uint64( min_update_index )
>         uint64( max_update_index )
>
>     The `format_version` is a byte, and it indicates both the version of the on-disk
>     format, as well as the size of the hash. The hash size is indicated in the MSB
>     of the `format_version`. For the SHA1 hash, `format_version & 0x80 == 0` and all
>     hash values are 20 bytes. For SHA256, `format_version & 0x80 == 1`, and all hash
>     values are 32 bytes. Future hash functions may be added by using more bits at
>     the right.
>
>     The file format version can be extract as `format_version & 0x7f`. Currently,
>     only version 1 is defined.
>
> If you cast in stone that "& 0x7f is the way to extract the
> version", then you cannot promise that you may steal more bits at
> the right of MSB to support more hash functions, as you've reserved
> the rightmost 7 bits already for the version number with 0x7f and
> there are only 8 bits in your byte.
>
> It seems that you are trying to make the format too dense?  Is it
> too much a waste to use a separate word or a byte for hash?  Or
> perhaps declare that format version 1 uses SHA-1, format version 2
> uses SHA-256, etc. (in other words, do we want to support both SHA-1
> and SHA-256 when we are at format version 7)?

I can see a future where we have a different format that allows for
more metadata so we can encode the hash size separately. But maybe
that can be for format v3 and up.

Let's do format v2 = format v1 but with 32-byte hashes.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 4/5] Add reftable library
  2020-02-11 16:31                 ` Junio C Hamano
  2020-02-11 16:40                   ` Han-Wen Nienhuys
@ 2020-02-11 16:46                   ` Han-Wen Nienhuys
  2020-02-20 17:20                     ` Jonathan Nieder
  1 sibling, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-11 16:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: brian m. carlson, Han-Wen Nienhuys via GitGitGadget, git,
	Han-Wen Nienhuys

On Tue, Feb 11, 2020 at 5:32 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Han-Wen Nienhuys <hanwen@google.com> writes:
>
> > I've uploaded https://git.eclipse.org/r/c/157501/ that proposes a way
> > to encode the hash size. I  look forward to feedback.
>
> Development and design discussion happens here.  Please do not
> require people to go click at an external website---once people
> start doing so, we'd have to go around 47 different places to piece
> one discussion together, which is just crazy.

The ground truth for the format is the spec. The spec is part of the
JGit repository, so it seems reasonable for it to be reviewed there.
Do you want to move it somewhere else?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 4/5] Add reftable library
  2020-02-11 16:40                   ` Han-Wen Nienhuys
@ 2020-02-11 23:40                     ` brian m. carlson
  2020-02-18  9:25                       ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: brian m. carlson @ 2020-02-11 23:40 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Junio C Hamano, Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

[-- Attachment #1: Type: text/plain, Size: 556 bytes --]

On 2020-02-11 at 16:40:56, Han-Wen Nienhuys wrote:
> I can see a future where we have a different format that allows for
> more metadata so we can encode the hash size separately. But maybe
> that can be for format v3 and up.
> 
> Let's do format v2 = format v1 but with 32-byte hashes.

The way we've preferred to do this in Git is to write a four-byte hash
identifier into the file, but we have legacy concerns here, so I think
switching to v2 for SHA-256 is fine.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 868 bytes --]

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v6 0/5] Reftable support git-core
  2020-02-10 14:14       ` [PATCH v5 " Han-Wen Nienhuys via GitGitGadget
                           ` (4 preceding siblings ...)
  2020-02-10 14:14         ` [PATCH v5 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-02-18  8:43         ` Han-Wen Nienhuys via GitGitGadget
  2020-02-18  8:43           ` [PATCH v6 1/5] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
                             ` (7 more replies)
  5 siblings, 8 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-18  8:43 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

Feedback wanted:

 * spots marked with XXX in the Git source code.
 * how to write a unittest script? From the command-line, it works; inside
   the test framework, it says: 

fatal: could not open '/usr/local/google/home/hanwen/vc/git/t/trash directory.t0031-reftable/.git/refs/heads' for writing: Is a directory

ie. the files backend already initialized.

 * what is a good test strategy? Could we have a CI flavor where we flip the
   default to reftable, and see how it fares?

v6

 * implement reflog expiry.

v7

 * support SHA256 as version 2 of the format.

Han-Wen Nienhuys (5):
  refs.h: clarify reflog iteration order
  create .git/refs in files-backend.c
  refs: document how ref_iterator_advance_fn should handle symrefs
  Add reftable library
  Reftable support for git-core

 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   24 +-
 builtin/clone.c                               |    4 +-
 builtin/init-db.c                             |   57 +-
 cache.h                                       |    4 +-
 refs.c                                        |   20 +-
 refs.h                                        |    8 +-
 refs/files-backend.c                          |    6 +
 refs/refs-internal.h                          |    6 +
 refs/reftable-backend.c                       |  971 +++++++++++++++
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   19 +
 reftable/VERSION                              |   11 +
 reftable/basics.c                             |  196 +++
 reftable/basics.h                             |   37 +
 reftable/block.c                              |  413 ++++++
 reftable/block.h                              |   71 ++
 reftable/blocksource.h                        |   20 +
 reftable/bytes.c                              |    0
 reftable/config.h                             |    1 +
 reftable/constants.h                          |   25 +
 reftable/dump.c                               |   97 ++
 reftable/file.c                               |   97 ++
 reftable/iter.c                               |  229 ++++
 reftable/iter.h                               |   56 +
 reftable/merged.c                             |  290 +++++
 reftable/merged.h                             |   34 +
 reftable/pq.c                                 |  114 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  718 +++++++++++
 reftable/reader.h                             |   52 +
 reftable/record.c                             | 1107 +++++++++++++++++
 reftable/record.h                             |   79 ++
 reftable/reftable.h                           |  405 ++++++
 reftable/slice.c                              |  199 +++
 reftable/slice.h                              |   39 +
 reftable/stack.c                              |  984 +++++++++++++++
 reftable/stack.h                              |   40 +
 reftable/system.h                             |   59 +
 reftable/tree.c                               |   66 +
 reftable/tree.h                               |   24 +
 reftable/writer.c                             |  630 ++++++++++
 reftable/writer.h                             |   46 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 47 files changed, 7407 insertions(+), 32 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c


base-commit: e68e29171cc2d6968902e0654b5687fbe1ccb903
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v6
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v6
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v5:

 1:  c00403c94de = 1:  e4b0773becd refs.h: clarify reflog iteration order
 2:  4d6da9bc473 = 2:  c25b6c601dc create .git/refs in files-backend.c
 3:  fbdcdccc885 = 3:  e132e0f4e00 refs: document how ref_iterator_advance_fn should handle symrefs
 4:  546b82fe798 ! 4:  fe29a9db399 Add reftable library
     @@ -86,7 +86,7 @@
      +   cp reftable-repo/LICENSE reftable/ &&
      +   git --git-dir reftable-repo/.git show --no-patch origin/master \
      +    > reftable/VERSION && \
     -+   echo '/* empty */' > reftable/config.h
     ++   sed -i~ 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|' reftable/system.h
      +   rm reftable/*_test.c reftable/test_framework.*
      +   git add reftable/*.[ch]
      +
     @@ -98,11 +98,17 @@
       --- /dev/null
       +++ b/reftable/VERSION
      @@
     -+commit 6115b50fdb9bc662be39b05f5589bc109282ae7f
     ++commit 42aa04f301a682197f0698867b9771b78ce4fdaf
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Mon Feb 10 13:59:52 2020 +0100
     -+
     -+    README: add a note about the Java implementation
     ++Date:   Mon Feb 17 16:05:22 2020 +0100
     ++
     ++    C: simplify system and config header handling
     ++    
     ++    reftable.h only depends on <stdint.h>
     ++    
     ++    system.h is only included from .c files. The git-core specific code is
     ++    guarded by a #if REFTABLE_IN_GITCORE, which we'll edit with a sed
     ++    script on the git-core side.
      
       diff --git a/reftable/basics.c b/reftable/basics.c
       new file mode 100644
     @@ -485,19 +491,31 @@
      +	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
      +		int block_header_skip = 4 + w->header_off;
      +		struct slice compressed = {};
     -+		uLongf dest_len = 0, src_len = 0;
     -+		slice_resize(&compressed, w->next - block_header_skip);
     ++		int zresult = 0;
     ++		uLongf src_len = w->next - block_header_skip;
     ++		slice_resize(&compressed, src_len);
      +
     -+		dest_len = compressed.len;
     -+		src_len = w->next - block_header_skip;
     ++		while (1) {
     ++			uLongf dest_len = compressed.len;
      +
     -+		if (Z_OK != compress2(compressed.buf, &dest_len,
     -+				      w->buf + block_header_skip, src_len, 9)) {
     -+			free(slice_yield(&compressed));
     -+			return ZLIB_ERROR;
     ++			zresult = compress2(compressed.buf, &dest_len,
     ++					    w->buf + block_header_skip, src_len,
     ++					    9);
     ++			if (zresult == Z_BUF_ERROR) {
     ++				slice_resize(&compressed, 2 * compressed.len);
     ++				continue;
     ++			}
     ++
     ++			if (Z_OK != zresult) {
     ++				free(slice_yield(&compressed));
     ++				return ZLIB_ERROR;
     ++			}
     ++
     ++			memcpy(w->buf + block_header_skip, compressed.buf,
     ++			       dest_len);
     ++			w->next = dest_len + block_header_skip;
     ++			break;
      +		}
     -+		memcpy(w->buf + block_header_skip, compressed.buf, dest_len);
     -+		w->next = dest_len + block_header_skip;
      +	}
      +	return w->next;
      +}
     @@ -885,8 +903,6 @@
      +#ifndef CONSTANTS_H
      +#define CONSTANTS_H
      +
     -+#define SHA1_SIZE 20
     -+#define SHA256_SIZE 32
      +#define VERSION 1
      +#define HEADER_SIZE 24
      +#define FOOTER_SIZE 68
     @@ -1555,13 +1571,17 @@
      +	it->ops = &merged_iter_vtable;
      +}
      +
     -+int new_merged_table(struct merged_table **dest, struct reader **stack, int n)
     ++int new_merged_table(struct merged_table **dest, struct reader **stack, int n,
     ++		     int hash_size)
      +{
      +	uint64_t last_max = 0;
      +	uint64_t first_min = 0;
      +	int i = 0;
      +	for (i = 0; i < n; i++) {
      +		struct reader *r = stack[i];
     ++		if (reader_hash_size(r) != hash_size) {
     ++			return FORMAT_ERROR;
     ++		}
      +		if (i > 0 && last_max >= reader_min_update_index(r)) {
      +			return FORMAT_ERROR;
      +		}
     @@ -1578,7 +1598,7 @@
      +			.stack_len = n,
      +			.min = first_min,
      +			.max = last_max,
     -+			.hash_size = SHA1_SIZE,
     ++			.hash_size = hash_size,
      +		};
      +
      +		*dest = calloc(sizeof(struct merged_table), 1);
     @@ -1985,6 +2005,11 @@
      +	block_source_return_block(r->source, p);
      +}
      +
     ++int reader_hash_size(struct reader *r)
     ++{
     ++	return r->hash_size;
     ++}
     ++
      +const char *reader_name(struct reader *r)
      +{
      +	return r->name;
     @@ -2005,8 +2030,13 @@
      +		goto exit;
      +	}
      +
     ++	r->hash_size = SHA1_SIZE;
      +	{
      +		byte version = *f++;
     ++		if (version == 2) {
     ++			r->hash_size = SHA256_SIZE;
     ++			version = 1;
     ++		}
      +		if (version != 1) {
      +			err = FORMAT_ERROR;
      +			goto exit;
     @@ -2070,7 +2100,7 @@
      +	r->size = block_source_size(source) - FOOTER_SIZE;
      +	r->source = source;
      +	r->name = strdup(name);
     -+	r->hash_size = SHA1_SIZE;
     ++	r->hash_size = 0;
      +
      +	err = block_source_read_block(source, &footer, r->size, FOOTER_SIZE);
      +	if (err != FOOTER_SIZE) {
     @@ -3883,7 +3913,7 @@
      +#ifndef REFTABLE_H
      +#define REFTABLE_H
      +
     -+#include "system.h"
     ++#include <stdint.h>
      +
      +/* block_source is a generic wrapper for a seekable readable file.
      +   It is generally passed around by value.
     @@ -3934,6 +3964,10 @@
      +
      +	/* how often to write complete keys in each block. */
      +	int restart_interval;
     ++
     ++	/* width of the hash. Should be 20 for SHA1 or 32 for SHA256. Defaults
     ++	 * to SHA1 if unset */
     ++	int hash_size;
      +};
      +
      +/* ref_record holds a ref database entry target_value */
     @@ -4147,6 +4181,9 @@
      + */
      +int reader_seek_ref(struct reader *r, struct iterator *it, const char *name);
      +
     ++/* returns the hash size used in this table. */
     ++int reader_hash_size(struct reader *r);
     ++
      +/* seek to logs for the given name, older than update_index. */
      +int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
      +		       uint64_t update_index);
     @@ -4173,7 +4210,11 @@
      +/* new_merged_table creates a new merged table. It takes ownership of the stack
      +   array.
      +*/
     -+int new_merged_table(struct merged_table **dest, struct reader **stack, int n);
     ++int new_merged_table(struct merged_table **dest, struct reader **stack, int n,
     ++		     int hash_size);
     ++
     ++/* returns the hash size used in this merged table. */
     ++int merged_hash_size(struct merged_table *mt);
      +
      +/* returns an iterator positioned just before 'name' */
      +int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
     @@ -4688,7 +4729,8 @@
      +	}
      +
      +	/* success! */
     -+	err = new_merged_table(&new_merged, new_tables, new_tables_len);
     ++	err = new_merged_table(&new_merged, new_tables, new_tables_len,
     ++			       st->config.hash_size);
      +	if (err < 0) {
      +		goto exit;
      +	}
     @@ -5084,7 +5126,7 @@
      +	writer_set_limits(wr, st->merged->stack[first]->min_update_index,
      +			  st->merged->stack[last]->max_update_index);
      +
     -+	err = new_merged_table(&mt, subtabs, subtabs_len);
     ++	err = new_merged_table(&mt, subtabs, subtabs_len, st->config.hash_size);
      +	if (err < 0) {
      +		free(subtabs);
      +		goto exit;
     @@ -5568,9 +5610,7 @@
      +#ifndef SYSTEM_H
      +#define SYSTEM_H
      +
     -+#include "config.h"
     -+
     -+#ifndef REFTABLE_STANDALONE
     ++#if 1 /* REFTABLE_IN_GITCORE */
      +
      +#include "git-compat-util.h"
      +#include <zlib.h>
     @@ -5606,7 +5646,7 @@
      +		memcpy(&a, &b, sizeof(a));      \
      +		memcpy(&b, &tmp[0], sizeof(a)); \
      +	}
     -+#endif
     ++#endif /* REFTABLE_IN_GITCORE */
      +
      +typedef uint8_t byte;
      +typedef int bool;
     @@ -5614,6 +5654,9 @@
      +int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
      +			       const Bytef *source, uLong *sourceLen);
      +
     ++#define SHA1_SIZE 20
     ++#define SHA256_SIZE 32
     ++
      +#endif
      
       diff --git a/reftable/tree.c b/reftable/tree.c
     @@ -5788,6 +5831,9 @@
      +		opts->restart_interval = 16;
      +	}
      +
     ++	if (opts->hash_size == 0) {
     ++		opts->hash_size = SHA1_SIZE;
     ++	}
      +	if (opts->block_size == 0) {
      +		opts->block_size = DEFAULT_BLOCK_SIZE;
      +	}
     @@ -5796,7 +5842,8 @@
      +static int writer_write_header(struct writer *w, byte *dest)
      +{
      +	memcpy((char *)dest, "REFT", 4);
     -+	dest[4] = 1; /* version */
     ++	dest[4] = (w->hash_size == SHA1_SIZE) ? 1 : 2; /* version */
     ++
      +	put_u24(dest + 5, w->opts.block_size);
      +	put_u64(dest + 8, w->min_update_index);
      +	put_u64(dest + 16, w->max_update_index);
     @@ -5825,7 +5872,7 @@
      +		/* TODO - error return? */
      +		abort();
      +	}
     -+	wp->hash_size = SHA1_SIZE;
     ++	wp->hash_size = opts->hash_size;
      +	wp->block = calloc(opts->block_size, 1);
      +	wp->write = writer_func;
      +	wp->write_arg = writer_arg;
     @@ -6222,7 +6269,10 @@
      +	byte footer[68];
      +	byte *p = footer;
      +
     -+	writer_finish_public_section(w);
     ++	int err = writer_finish_public_section(w);
     ++	if (err != 0) {
     ++		return err;
     ++	}
      +
      +	writer_write_header(w, footer);
      +	p += 24;
 5:  702fb89871e ! 5:  8d95ab52f75 Reftable support for git-core
     @@ -6,10 +6,8 @@
      
          TODO:
      
     -     * Make CI on gitgitgadget compile.
     -     * Redo interaction between repo config, repo storage version
     -     * Resolve the design problem with reflog expiry.
           * Resolve spots marked with XXX
     +     * Test strategy?
      
          Example use:
      
     @@ -411,11 +409,6 @@ $^
      +
      +#include "../reftable/reftable.h"
      +
     -+/*
     -+  The reftable v1 spec only supports 20-byte binary SHA1s. A new format version
     -+  will be needed to support SHA256.
     -+ */
     -+
      +extern struct ref_storage_be refs_be_reftable;
      +
      +struct reftable_ref_store {
     @@ -470,6 +463,7 @@ $^
      +	struct ref_store *ref_store = (struct ref_store *)refs;
      +	struct write_options cfg = {
      +		.block_size = 4096,
     ++		.hash_size = the_hash_algo->rawsz,
      +	};
      +	struct strbuf sb = STRBUF_INIT;
      +
     @@ -676,6 +670,7 @@ $^
      +		(struct reftable_ref_store *)transaction->ref_store;
      +	uint64_t ts = stack_next_update_index(refs->stack);
      +	int err = 0;
     ++	struct log_record *logs = calloc(transaction->nr, sizeof(*logs));
      +	struct ref_update **sorted =
      +		malloc(transaction->nr * sizeof(struct ref_update *));
      +	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
     @@ -695,6 +690,14 @@ $^
      +
      +	for (int i = 0; i < transaction->nr; i++) {
      +		struct ref_update *u = sorted[i];
     ++		struct log_record *log = &logs[i];
     ++		fill_log_record(log);
     ++		log->ref_name = (char *)u->refname;
     ++		log->old_hash = u->old_oid.hash;
     ++		log->new_hash = u->new_oid.hash;
     ++		log->update_index = ts;
     ++		log->message = u->msg;
     ++
      +		if (u->flags & REF_HAVE_NEW) {
      +			struct object_id out_oid = {};
      +			int out_flags = 0;
     @@ -706,6 +709,7 @@ $^
      +			struct ref_record ref = {};
      +			ref.ref_name =
      +				(char *)(resolved ? resolved : u->refname);
     ++			log->ref_name = ref.ref_name;
      +			ref.value = u->new_oid.hash;
      +			ref.update_index = ts;
      +			err = writer_add_ref(writer, &ref);
     @@ -716,23 +720,15 @@ $^
      +	}
      +
      +	for (int i = 0; i < transaction->nr; i++) {
     -+		struct ref_update *u = sorted[i];
     -+		struct log_record log = {};
     -+		fill_log_record(&log);
     -+
     -+		log.ref_name = (char *)u->refname;
     -+		log.old_hash = u->old_oid.hash;
     -+		log.new_hash = u->new_oid.hash;
     -+		log.update_index = ts;
     -+		log.message = u->msg;
     -+
     -+		err = writer_add_log(writer, &log);
     -+		clear_log_record(&log);
     ++		err = writer_add_log(writer, &logs[i]);
     ++		clear_log_record(&logs[i]);
      +		if (err < 0) {
     -+			return err;
     ++			goto exit;
      +		}
      +	}
     ++
      +exit:
     ++	free(logs);
      +	free(sorted);
      +	return err;
      +}
     @@ -1076,7 +1072,7 @@ $^
      +			       1);
      +	ri->base.oid = &ri->oid;
      +
     -+	return empty_ref_iterator_begin();
     ++	return (struct ref_iterator *)ri;
      +}
      +
      +static int
     @@ -1214,6 +1210,53 @@ $^
      +	return 0;
      +}
      +
     ++struct reflog_expiry_arg {
     ++	struct reftable_ref_store *refs;
     ++	struct log_record *tombstones;
     ++	int len;
     ++	int cap;
     ++};
     ++
     ++static void clear_log_tombstones(struct reflog_expiry_arg *arg)
     ++{
     ++	int i = 0;
     ++	for (; i < arg->len; i++) {
     ++		log_record_clear(&arg->tombstones[i]);
     ++	}
     ++
     ++	FREE_AND_NULL(arg->tombstones);
     ++}
     ++
     ++static void add_log_tombstone(struct reflog_expiry_arg *arg,
     ++			      const char *refname, uint64_t ts)
     ++{
     ++	struct log_record tombstone = {
     ++		.ref_name = xstrdup(refname),
     ++		.update_index = ts,
     ++	};
     ++	if (arg->len == arg->cap) {
     ++		arg->cap = 2 * arg->cap + 1;
     ++		arg->tombstones =
     ++			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
     ++	}
     ++	arg->tombstones[arg->len++] = tombstone;
     ++}
     ++
     ++static int write_reflog_expiry_table(struct writer *writer, void *argv)
     ++{
     ++	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
     ++	uint64_t ts = stack_next_update_index(arg->refs->stack);
     ++	int i = 0;
     ++	writer_set_limits(writer, ts, ts);
     ++	for (i = 0; i < arg->len; i++) {
     ++		int err = writer_add_log(writer, &arg->tombstones[i]);
     ++		if (err) {
     ++			return err;
     ++		}
     ++	}
     ++	return 0;
     ++}
     ++
      +static int reftable_reflog_expire(struct ref_store *ref_store,
      +				  const char *refname,
      +				  const struct object_id *oid,
     @@ -1223,11 +1266,57 @@ $^
      +				  reflog_expiry_cleanup_fn cleanup_fn,
      +				  void *policy_cb_data)
      +{
     -+	BUG("per ref reflog expiry is not implemented for reftable backend.");
      +	/*
     -+	  TODO: should iterate over the reflog, and write tombstones. The
     -+	  tombstones will be removed on compaction.
     -+	 */
     ++	  For log expiry, we write tombstones in place of the expired entries,
     ++	  This means that the entries are still retrievable by delving into the
     ++	  stack, and expiring entries paradoxically takes extra memory.
     ++
     ++	  This memory is only reclaimed when some operation issues a
     ++	  reftable_pack_refs(), which will compact the entire stack and get rid
     ++	  of deletion entries.
     ++
     ++	  It would be better if the refs backend supported an API that sets a
     ++	  criterion for all refs, passing the criterion to pack_refs().
     ++	*/
     ++	struct reftable_ref_store *refs =
     ++		(struct reftable_ref_store *)ref_store;
     ++	struct merged_table *mt = stack_merged_table(refs->stack);
     ++	struct reflog_expiry_arg arg = {
     ++		.refs = refs,
     ++	};
     ++	struct log_record log = {};
     ++	struct iterator it = {};
     ++	int err = merged_table_seek_log(mt, &it, refname);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	while (1) {
     ++		struct object_id ooid = {};
     ++		struct object_id noid = {};
     ++
     ++		int err = iterator_next_log(it, &log);
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++
     ++		if (err > 0 || strcmp(log.ref_name, refname)) {
     ++			break;
     ++		}
     ++		hashcpy(ooid.hash, log.old_hash);
     ++		hashcpy(noid.hash, log.new_hash);
     ++
     ++		if (should_prune_fn(&ooid, &noid, log.email,
     ++				    (timestamp_t)log.time, log.tz_offset,
     ++				    log.message, policy_cb_data)) {
     ++			add_log_tombstone(&arg, refname, log.update_index);
     ++		}
     ++	}
     ++	log_record_clear(&log);
     ++	iterator_destroy(&it);
     ++	err = stack_add(refs->stack, &write_reflog_expiry_table, &arg);
     ++	clear_log_tombstones(&arg);
     ++	return err;
      +}
      +
      +static int reftable_read_raw_ref(struct ref_store *ref_store,

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v6 1/5] refs.h: clarify reflog iteration order
  2020-02-18  8:43         ` [PATCH v6 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-02-18  8:43           ` Han-Wen Nienhuys via GitGitGadget
  2020-02-18  8:43           ` [PATCH v6 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
                             ` (6 subsequent siblings)
  7 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-18  8:43 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/refs.h b/refs.h
index 545029c6d80..87c9ec921b9 100644
--- a/refs.h
+++ b/refs.h
@@ -444,18 +444,21 @@ int delete_refs(const char *msg, struct string_list *refnames,
 int refs_delete_reflog(struct ref_store *refs, const char *refname);
 int delete_reflog(const char *refname);
 
-/* iterate over reflog entries */
+/* Iterate over reflog entries. */
 typedef int each_reflog_ent_fn(
 		struct object_id *old_oid, struct object_id *new_oid,
 		const char *committer, timestamp_t timestamp,
 		int tz, const char *msg, void *cb_data);
 
+/* Iterate in over reflog entries, oldest entry first. */
 int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
 			     each_reflog_ent_fn fn, void *cb_data);
 int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+/* Call a function for each reflog entry, oldest entry first. */
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v6 2/5] create .git/refs in files-backend.c
  2020-02-18  8:43         ` [PATCH v6 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-02-18  8:43           ` [PATCH v6 1/5] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-02-18  8:43           ` Han-Wen Nienhuys via GitGitGadget
  2020-02-18  8:43           ` [PATCH v6 3/5] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
                             ` (5 subsequent siblings)
  7 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-18  8:43 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This prepares for supporting the reftable format, which will want
create its own file system layout in .git

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/init-db.c    | 2 --
 refs/files-backend.c | 6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 944ec77fe10..45bdea05890 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
 	 */
-	safe_create_dir(git_path("refs"), 1);
-	adjust_shared_perm(git_path("refs"));
 
 	if (refs_init_db(&err))
 		die("failed to set up refs db: %s", err.buf);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 561c33ac8a9..ab7899a9c77 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3157,9 +3157,15 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
 	struct strbuf sb = STRBUF_INIT;
 
+	files_ref_path(refs, &sb, "refs");
+	safe_create_dir(sb.buf, 1);
+	/* adjust permissions even if directory already exists. */
+	adjust_shared_perm(sb.buf);
+
 	/*
 	 * Create .git/refs/{heads,tags}
 	 */
+	strbuf_reset(&sb);
 	files_ref_path(refs, &sb, "refs/heads");
 	safe_create_dir(sb.buf, 1);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v6 3/5] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-02-18  8:43         ` [PATCH v6 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-02-18  8:43           ` [PATCH v6 1/5] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
  2020-02-18  8:43           ` [PATCH v6 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-02-18  8:43           ` Han-Wen Nienhuys via GitGitGadget
  2020-02-18  8:43           ` [PATCH v6 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
                             ` (4 subsequent siblings)
  7 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-18  8:43 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb7..1d7a4852209 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
 
 /* Virtual function declarations for ref_iterators: */
 
+/*
+ * backend-specific implementation of ref_iterator_advance.
+ * For symrefs, the function should set REF_ISSYMREF, and it should also dereference
+ * the symref to provide the OID referent.
+ */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v6 4/5] Add reftable library
  2020-02-18  8:43         ` [PATCH v6 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                             ` (2 preceding siblings ...)
  2020-02-18  8:43           ` [PATCH v6 3/5] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-02-18  8:43           ` Han-Wen Nienhuys via GitGitGadget
  2020-02-18 21:11             ` Junio C Hamano
  2020-02-18  8:43           ` [PATCH v6 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                             ` (3 subsequent siblings)
  7 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-18  8:43 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a new format for storing the ref database. It provides the
following benefits:

 * Simple and fast atomic ref transactions, including multiple refs and reflogs.
 * Compact storage of ref data.
 * Fast look ups of ref data.
 * Case-sensitive ref names on Windows/OSX, regardless of file system
 * Eliminates file/directory conflicts in ref names

Further context and motivation can be found in background reading:

* Spec: https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md

* Original discussion on JGit-dev:  https://www.eclipse.org/lists/jgit-dev/msg03389.html

* First design discussion on git@vger: https://public-inbox.org/git/CAJo=hJtTp2eA3z9wW9cHo-nA7kK40vVThqh6inXpbCcqfdMP9g@mail.gmail.com/

* Last design discussion on git@vger: https://public-inbox.org/git/CAJo=hJsZcAM9sipdVr7TMD-FD2V2W6_pvMQ791EGCDsDkQ033w@mail.gmail.com/

* First attempt at implementation: https://public-inbox.org/git/CAP8UFD0PPZSjBnxCA7ez91vBuatcHKQ+JUWvTD1iHcXzPBjPBg@mail.gmail.com/

* libgit2 support issue: https://github.com/libgit2/libgit2/issues

* GitLab support issue: https://gitlab.com/gitlab-org/git/issues/6

* go-git support issue: https://github.com/src-d/go-git/issues/1059

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 ++
 reftable/README.md     |   19 +
 reftable/VERSION       |   11 +
 reftable/basics.c      |  196 +++++++
 reftable/basics.h      |   37 ++
 reftable/block.c       |  413 +++++++++++++++
 reftable/block.h       |   71 +++
 reftable/blocksource.h |   20 +
 reftable/bytes.c       |    0
 reftable/config.h      |    1 +
 reftable/constants.h   |   25 +
 reftable/dump.c        |   97 ++++
 reftable/file.c        |   97 ++++
 reftable/iter.c        |  229 +++++++++
 reftable/iter.h        |   56 ++
 reftable/merged.c      |  290 +++++++++++
 reftable/merged.h      |   34 ++
 reftable/pq.c          |  114 +++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  718 ++++++++++++++++++++++++++
 reftable/reader.h      |   52 ++
 reftable/record.c      | 1107 ++++++++++++++++++++++++++++++++++++++++
 reftable/record.h      |   79 +++
 reftable/reftable.h    |  405 +++++++++++++++
 reftable/slice.c       |  199 ++++++++
 reftable/slice.h       |   39 ++
 reftable/stack.c       |  984 +++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   40 ++
 reftable/system.h      |   59 +++
 reftable/tree.c        |   66 +++
 reftable/tree.h        |   24 +
 reftable/writer.c      |  630 +++++++++++++++++++++++
 reftable/writer.h      |   46 ++
 reftable/zlib-compat.c |   92 ++++
 34 files changed, 6315 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..333b6ba02b4
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,19 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   ((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
+    git clone https://github.com/google/reftable reftable-repo) && \
+   cp reftable-repo/c/*.[ch] reftable/ && \
+   cp reftable-repo/LICENSE reftable/ &&
+   git --git-dir reftable-repo/.git show --no-patch origin/master \
+    > reftable/VERSION && \
+   sed -i~ 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|' reftable/system.h
+   rm reftable/*_test.c reftable/test_framework.*
+   git add reftable/*.[ch]
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..7a71a3b1d4e
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,11 @@
+commit 42aa04f301a682197f0698867b9771b78ce4fdaf
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Mon Feb 17 16:05:22 2020 +0100
+
+    C: simplify system and config header handling
+    
+    reftable.h only depends on <stdint.h>
+    
+    system.h is only included from .c files. The git-core specific code is
+    guarded by a #if REFTABLE_IN_GITCORE, which we'll edit with a sed
+    script on the git-core side.
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..791dcc867ad
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,196 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_u24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)((i)&0xff);
+}
+
+uint32_t get_u24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_u32(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 24) & 0xff);
+	out[1] = (byte)((i >> 16) & 0xff);
+	out[2] = (byte)((i >> 8) & 0xff);
+	out[3] = (byte)((i)&0xff);
+}
+
+uint32_t get_u32(byte *in)
+{
+	return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
+	       (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
+}
+
+void put_u64(byte *out, uint64_t v)
+{
+	int i = 0;
+	for (i = sizeof(uint64_t); i--;) {
+		out[i] = (byte)(v & 0xff);
+		v >>= 8;
+	}
+}
+
+uint64_t get_u64(byte *out)
+{
+	uint64_t v = 0;
+	int i = 0;
+	for (i = 0; i < sizeof(uint64_t); i++) {
+		v = (v << 8) | (byte)(out[i] & 0xff);
+	}
+	return v;
+}
+
+void put_u16(byte *out, uint16_t i)
+{
+	out[0] = (byte)((i >> 8) & 0xff);
+	out[1] = (byte)((i)&0xff);
+}
+
+uint16_t get_u16(byte *in)
+{
+	return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
+}
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(int sz, int (*f)(int k, void *args), void *args)
+{
+	int lo = 0;
+	int hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		int mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		free(*p);
+		p++;
+	}
+	free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	for (char **p = names; *p; p++) {
+		len++;
+	}
+	return len;
+}
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = realloc(names,
+						names_cap * sizeof(char *));
+			}
+			names[names_len++] = strdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *error_str(int err)
+{
+	switch (err) {
+	case IO_ERROR:
+		return "I/O error";
+	case FORMAT_ERROR:
+		return "FORMAT_ERROR";
+	case NOT_EXIST_ERROR:
+		return "NOT_EXIST_ERROR";
+	case LOCK_ERROR:
+		return "LOCK_ERROR";
+	case API_ERROR:
+		return "API_ERROR";
+	case ZLIB_ERROR:
+		return "ZLIB_ERROR";
+	case -1:
+		return "general error";
+	default:
+		return "unknown error code";
+	}
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..0ad368cfd3d
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,37 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+void put_u24(byte *out, uint32_t i);
+uint32_t get_u24(byte *in);
+
+uint64_t get_u64(byte *in);
+void put_u64(byte *out, uint64_t i);
+
+void put_u32(byte *out, uint32_t i);
+uint32_t get_u32(byte *in);
+
+void put_u16(byte *out, uint16_t i);
+uint16_t get_u16(byte *in);
+int binsearch(int sz, int (*f)(int k, void *args), void *args);
+
+void free_names(char **a);
+void parse_names(char *buf, int size, char ***namesp);
+int names_equal(char **a, char **b);
+int names_length(char **names);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..2ad1bea0ede
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,413 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "blocksource.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct record rec)
+{
+	struct slice empty = {};
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = {};
+	int n = 0;
+
+	record_key(rec, &key);
+	n = encode_key(&restart, out, last, key, record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	out.buf += n;
+	out.len -= n;
+
+	n = record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+
+	out.buf += n;
+	out.len -= n;
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	free(slice_yield(&key));
+	return 0;
+
+err:
+	free(slice_yield(&key));
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_u24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_u16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_u24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = {};
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		slice_resize(&compressed, src_len);
+
+		while (1) {
+			uLongf dest_len = compressed.len;
+
+			zresult = compress2(compressed.buf, &dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				slice_resize(&compressed, 2 * compressed.len);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				free(slice_yield(&compressed));
+				return ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed.buf,
+			       dest_len);
+			w->next = dest_len + block_header_skip;
+			break;
+		}
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_u24(block->data + header_off + 1);
+
+	if (!is_block_type(typ)) {
+		return FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = {};
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip;
+		uLongf src_len = block->len - block_header_skip;
+
+		slice_resize(&uncompressed, sz);
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			free(slice_yield(&uncompressed));
+			return ZLIB_ERROR;
+		}
+
+		block_source_return_block(block->source, block);
+		block->data = uncompressed.buf;
+		block->len = dst_len; /* XXX: 4 bytes missing? */
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_u16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_u24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	struct slice key;
+	struct block_reader *r;
+	int error;
+};
+
+static int restart_key_less(int idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = {};
+	struct slice last_key = {};
+	byte unused_extra;
+	int n = decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_compare(a->key, rkey);
+		free(slice_yield(&rkey));
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct record rec)
+{
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	{
+		struct slice in = {
+			.buf = it->br->block.data + it->next_off,
+			.len = it->br->block_len - it->next_off,
+		};
+		struct slice start = in;
+		struct slice key = {};
+		byte extra;
+		int n = decode_key(&key, &extra, it->last_key, in);
+		if (n < 0) {
+			return -1;
+		}
+
+		in.buf += n;
+		in.len -= n;
+		n = record_decode(rec, key, extra, in, it->br->hash_size);
+		if (n < 0) {
+			return -1;
+		}
+		in.buf += n;
+		in.len -= n;
+
+		slice_copy(&it->last_key, key);
+		it->next_off += start.len - in.len;
+		free(slice_yield(&key));
+		return 0;
+	}
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = {};
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	free(slice_yield(&it->last_key));
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		return -1;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	{
+		struct record rec = new_record(block_reader_type(br));
+		struct slice key = {};
+		int result = 0;
+		int err = 0;
+		struct block_iter next = {};
+		while (true) {
+			block_iter_copy_from(&next, it);
+
+			err = block_iter_next(&next, rec);
+			if (err < 0) {
+				result = -1;
+				goto exit;
+			}
+
+			record_key(rec, &key);
+			if (err > 0 || slice_compare(key, want) >= 0) {
+				result = 0;
+				goto exit;
+			}
+
+			block_iter_copy_from(it, &next);
+		}
+
+	exit:
+		free(slice_yield(&key));
+		free(slice_yield(&next.last_key));
+		record_clear(rec);
+		free(record_yield(&rec));
+
+		return result;
+	}
+}
+
+void block_writer_reset(struct block_writer *bw)
+{
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	free(slice_yield(&bw->last_key));
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..bb42588111f
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,71 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+	uint32_t header_off;
+	int restart_interval;
+	int hash_size;
+
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+	struct slice last_key;
+	int entries;
+};
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+byte block_writer_type(struct block_writer *bw);
+int block_writer_add(struct block_writer *w, struct record rec);
+int block_writer_finish(struct block_writer *w);
+void block_writer_reset(struct block_writer *bw);
+void block_writer_clear(struct block_writer *bw);
+
+struct block_reader {
+	uint32_t header_off;
+	struct block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint32_t full_block_size;
+	uint16_t restart_count;
+};
+
+struct block_iter {
+	struct block_reader *br;
+	struct slice last_key;
+	uint32_t next_off;
+};
+
+int block_reader_init(struct block_reader *br, struct block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+byte block_reader_type(struct block_reader *r);
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+int block_iter_next(struct block_iter *it, struct record rec);
+int block_iter_seek(struct block_iter *it, struct slice want);
+void block_iter_close(struct block_iter *it);
+
+#endif
diff --git a/reftable/blocksource.h b/reftable/blocksource.h
new file mode 100644
index 00000000000..f3ad3a4c229
--- /dev/null
+++ b/reftable/blocksource.h
@@ -0,0 +1,20 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCKSOURCE_H
+#define BLOCKSOURCE_H
+
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source source);
+
+#endif
diff --git a/reftable/bytes.c b/reftable/bytes.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/reftable/config.h b/reftable/config.h
new file mode 100644
index 00000000000..40a8c178f10
--- /dev/null
+++ b/reftable/config.h
@@ -0,0 +1 @@
+/* empty */
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..85399fa4758
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,25 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define VERSION 1
+#define HEADER_SIZE 24
+#define FOOTER_SIZE 68
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 00000000000..acabe18fbed
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "reftable.h"
+
+static int dump_table(const char *tablename)
+{
+	struct block_source src = {};
+	int err = block_source_from_file(&src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	struct reader *r = NULL;
+	err = new_reader(&r, src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct iterator it = {};
+		err = reader_seek_ref(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+
+		struct ref_record ref = {};
+		while (1) {
+			err = iterator_next_ref(it, &ref);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			ref_record_print(&ref, 20);
+		}
+		iterator_destroy(&it);
+		ref_record_clear(&ref);
+	}
+
+	{
+		struct iterator it = {};
+		err = reader_seek_log(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+		struct log_record log = {};
+		while (1) {
+			err = iterator_next_log(it, &log);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			log_record_print(&log, 20);
+		}
+		iterator_destroy(&it);
+		log_record_clear(&log);
+	}
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int opt;
+	const char *table = NULL;
+	while ((opt = getopt(argc, argv, "t:")) != -1) {
+		switch (opt) {
+		case 't':
+			table = strdup(optarg);
+			break;
+		case '?':
+			printf("usage: %s [-table tablefile]\n", argv[0]);
+			return 2;
+			break;
+		}
+	}
+
+	if (table != NULL) {
+		int err = dump_table(table);
+		if (err < 0) {
+			fprintf(stderr, "%s: %s: %s\n", argv[0], table,
+				error_str(err));
+			return 1;
+		}
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..b2ea90bf941
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	free(b);
+}
+
+static int file_read_block(void *v, struct block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int block_source_from_file(struct block_source *bs, const char *name)
+{
+	struct stat st = {};
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	{
+		struct file_block_source *p =
+			calloc(sizeof(struct file_block_source), 1);
+		p->size = st.st_size;
+		p->fd = fd;
+
+		bs->ops = &file_vtable;
+		bs->arg = p;
+	}
+	return 0;
+}
+
+int fd_writer(void *arg, byte *data, int sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..c06c8913662
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,229 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct iterator it)
+{
+	return it.ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct record rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct iterator *it)
+{
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct iterator it, struct record rec)
+{
+	return it.ops->next(it.iter_arg, rec);
+}
+
+void iterator_destroy(struct iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int iterator_next_ref(struct iterator it, struct ref_record *ref)
+{
+	struct record rec = {};
+	record_from_ref(&rec, ref);
+	return iterator_next(it, rec);
+}
+
+int iterator_next_log(struct iterator it, struct log_record *log)
+{
+	struct record rec = {};
+	record_from_log(&rec, log);
+	return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	free(slice_yield(&fri->oid));
+	iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct ref_record *ref = (struct ref_record *)rec.data;
+
+	while (true) {
+		int err = iterator_next_ref(fri->it, ref);
+		if (err != 0) {
+			return err;
+		}
+
+		if (fri->double_check) {
+			struct iterator it = {};
+
+			int err = reader_seek_ref(fri->r, &it, ref->ref_name);
+			if (err == 0) {
+				err = iterator_next_ref(it, ref);
+			}
+
+			iterator_destroy(&it);
+
+			if (err < 0) {
+				return err;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+}
+
+struct iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reader_return_block(it->r, &it->block_reader.block);
+	free(slice_yield(&it->oid));
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reader_return_block(it->r, &it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct ref_record *ref = (struct ref_record *)rec.data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reader *r, byte *oid, int oid_len,
+			       uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		calloc(sizeof(struct indexed_table_ref_iter), 1);
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..f497f2a27eb
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,56 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct iterator_vtable {
+	int (*next)(void *iter_arg, struct record rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct iterator *it);
+int iterator_next(struct iterator it, struct record rec);
+bool iterator_is_null(struct iterator it);
+
+struct filtering_ref_iterator {
+	struct reader *r;
+	struct slice oid;
+	bool double_check;
+	struct iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *,
+					  struct filtering_ref_iterator *);
+
+struct indexed_table_ref_iter {
+	struct reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reader *r, byte *oid, int oid_len,
+			       uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..36ea9da9220
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,290 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct record rec = new_record(mi->typ);
+		int err = iterator_next(mi->stack[i], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			iterator_destroy(&mi->stack[i]);
+			record_clear(rec);
+			free(record_yield(&rec));
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		iterator_destroy(&mi->stack[i]);
+	}
+	free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, int idx)
+{
+	if (iterator_is_null(mi->stack[idx])) {
+		return 0;
+	}
+
+	{
+		struct record rec = new_record(mi->typ);
+		struct pq_entry e = {
+			.rec = rec,
+			.index = idx,
+		};
+		int err = iterator_next(mi->stack[idx], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			iterator_destroy(&mi->stack[idx]);
+			record_clear(rec);
+			free(record_yield(&rec));
+			return 0;
+		}
+
+		merged_iter_pqueue_add(&mi->pq, e);
+	}
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec)
+{
+	struct slice entry_key = {};
+	struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
+	int err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+	record_key(entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = {};
+		int err = 0, cmp = 0;
+
+		record_key(top.rec, &k);
+
+		cmp = slice_compare(k, entry_key);
+		free(slice_yield(&k));
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		record_clear(top.rec);
+		free(record_yield(&top.rec));
+	}
+
+	record_copy_from(rec, entry.rec, mi->hash_size);
+	record_clear(entry.rec);
+	free(record_yield(&entry.rec));
+	free(slice_yield(&entry_key));
+	return 0;
+}
+
+static int merged_iter_next_void(void *p, struct record rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct iterator *it,
+				      struct merged_iter *mi)
+{
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n,
+		     int hash_size)
+{
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reader *r = stack[i];
+		if (reader_hash_size(r) != hash_size) {
+			return FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reader_min_update_index(r)) {
+			return FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reader_min_update_index(r);
+		}
+
+		last_max = reader_max_update_index(r);
+	}
+
+	{
+		struct merged_table m = {
+			.stack = stack,
+			.stack_len = n,
+			.min = first_min,
+			.max = last_max,
+			.hash_size = hash_size,
+		};
+
+		*dest = calloc(sizeof(struct merged_table), 1);
+		**dest = m;
+	}
+	return 0;
+}
+
+void merged_table_close(struct merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void merged_table_free(struct merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	free(mt);
+}
+
+uint64_t merged_max_update_index(struct merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t merged_min_update_index(struct merged_table *mt)
+{
+	return mt->min;
+}
+
+static int merged_table_seek_record(struct merged_table *mt,
+				    struct iterator *it, struct record rec)
+{
+	struct iterator *iters = calloc(sizeof(struct iterator), mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = record_type(rec),
+		.hash_size = mt->hash_size,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			iterator_destroy(&iters[i]);
+		}
+		free(iters);
+		return err;
+	}
+
+	merged.stack_len = n, err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p = malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+			  const char *name)
+{
+	struct ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = {};
+	record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+			     const char *name, uint64_t update_index)
+{
+	struct log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = {};
+	record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+			  const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..b8d3572e26e
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct merged_table {
+	struct reader **stack;
+	int stack_len;
+	int hash_size;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct iterator *stack;
+	int hash_size;
+	int stack_len;
+	byte typ;
+	struct merged_iter_pqueue pq;
+} merged_iter;
+
+void merged_table_clear(struct merged_table *mt);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..a1aff7c98c3
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,114 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = {};
+	struct slice bk = {};
+	int cmp = 0;
+	record_key(a.rec, &ak);
+	record_key(b.rec, &bk);
+
+	cmp = slice_compare(ak, bk);
+
+	free(slice_yield(&ak));
+	free(slice_yield(&bk));
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = realloc(pq->heap, pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		record_clear(pq->heap[i].rec);
+		free(record_yield(&pq->heap[i].rec));
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..5f7018979dd
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	struct record rec;
+	int index;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..f9a52992ab6
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,718 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct block_source source)
+{
+	return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size)
+{
+	int result = source.ops->read_block(source.arg, dest, off, size);
+	dest->source = source;
+	return result;
+}
+
+void block_source_return_block(struct block_source source, struct block *blockp)
+{
+	source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reader_offsets *reader_offsets_for(struct reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reader *r, struct block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(r->source, dest, off, sz);
+}
+
+void reader_return_block(struct reader *r, struct block *p)
+{
+	block_source_return_block(r->source, p);
+}
+
+int reader_hash_size(struct reader *r)
+{
+	return r->hash_size;
+}
+
+const char *reader_name(struct reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	int err = 0;
+	if (memcmp(f, "REFT", 4)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, HEADER_SIZE)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+
+	r->hash_size = SHA1_SIZE;
+	{
+		byte version = *f++;
+		if (version == 2) {
+			r->hash_size = SHA256_SIZE;
+			version = 1;
+		}
+		if (version != 1) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	r->block_size = get_u24(f);
+
+	f += 3;
+	r->min_update_index = get_u64(f);
+	f += 8;
+	r->max_update_index = get_u64(f);
+	f += 8;
+
+	r->ref_offsets.index_offset = get_u64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_u64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_u64(f);
+	f += 8;
+	r->log_offsets.offset = get_u64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_u64(f);
+	f += 8;
+
+	{
+		uint32_t computed_crc = crc32(0, footer, f - footer);
+		uint32_t file_crc = get_u32(f);
+		f += 4;
+		if (computed_crc != file_crc) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	{
+		byte first_block_typ = header[HEADER_SIZE];
+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+		r->ref_offsets.offset = 0;
+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+					  r->log_offsets.offset > 0);
+		r->obj_offsets.present = r->obj_offsets.offset > 0;
+	}
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reader *r, struct block_source source, const char *name)
+{
+	struct block footer = {};
+	struct block header = {};
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reader));
+	r->size = block_source_size(source) - FOOTER_SIZE;
+	r->source = source;
+	r->name = strdup(name);
+	r->hash_size = 0;
+
+	err = block_source_read_block(source, &footer, r->size, FOOTER_SIZE);
+	if (err != FOOTER_SIZE) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	/* Need +1 to read type of first block. */
+	err = reader_get_block(r, &header, 0, HEADER_SIZE + 1);
+	if (err != HEADER_SIZE + 1) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	block_source_return_block(r->source, &footer);
+	block_source_return_block(r->source, &header);
+	return err;
+}
+
+struct table_iter {
+	struct reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+		((struct ref_record *)rec.data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reader_return_block(ti->r, &ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += 24;
+	}
+
+	*typ = data[0];
+	if (is_block_type(*typ)) {
+		result = get_u24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct block block = {};
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : HEADER_SIZE;
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reader_return_block(r, &block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reader_return_block(r, &block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 r->hash_size);
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = {};
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct block_reader *brp = malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec)
+{
+	if (record_type(rec) != ti->typ) {
+		return API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = {};
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct record rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct iterator *it, struct table_iter *ti)
+{
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reader *r, struct table_iter *ti,
+				uint64_t off, byte typ)
+{
+	struct block_reader br = {};
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reader *r, struct table_iter *ti, byte typ,
+			bool index)
+{
+	struct reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reader *r, struct table_iter *ti,
+			      struct record want)
+{
+	struct record rec = new_record(record_type(want));
+	struct slice want_key = {};
+	struct slice got_key = {};
+	struct table_iter next = {};
+	int err = -1;
+	record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_compare(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	record_clear(rec);
+	free(record_yield(&rec));
+	free(slice_yield(&want_key));
+	free(slice_yield(&got_key));
+	return err;
+}
+
+static int reader_seek_indexed(struct reader *r, struct iterator *it,
+			       struct record rec)
+{
+	struct index_record want_index = {};
+	struct record want_index_rec = {};
+	struct index_record index_result = {};
+	struct record index_result_rec = {};
+	struct table_iter index_iter = {};
+	struct table_iter next = {};
+	int err = 0;
+
+	record_key(rec, &want_index.last_key);
+	record_from_index(&want_index_rec, &want_index);
+	record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			calloc(sizeof(struct table_iter), 1);
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	record_clear(want_index_rec);
+	record_clear(index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reader *r, struct iterator *it,
+				struct record rec)
+{
+	struct reader_offsets *offs = reader_offsets_for(r, record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = {};
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p = malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reader *r, struct iterator *it, struct record rec)
+{
+	byte typ = record_type(rec);
+
+	struct reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name)
+{
+	struct ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = {};
+	record_from_ref(&rec, &ref);
+	return reader_seek(r, it, rec);
+}
+
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+		       uint64_t update_index)
+{
+	struct log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = {};
+	record_from_log(&rec, &log);
+	return reader_seek(r, it, rec);
+}
+
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int new_reader(struct reader **p, struct block_source src, char const *name)
+{
+	struct reader *rd = calloc(sizeof(struct reader), 1);
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		free(rd);
+	}
+	return err;
+}
+
+void reader_free(struct reader *r)
+{
+	reader_close(r);
+	free(r);
+}
+
+static int reader_refs_for_indexed(struct reader *r, struct iterator *it,
+				   byte *oid)
+{
+	struct obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct record want_rec = {};
+	struct iterator oit = {};
+	struct obj_record got = {};
+	struct record got_rec = {};
+	int err = 0;
+
+	record_from_obj(&want_rec, &want);
+
+	err = reader_seek(r, &oit, want_rec);
+	if (err != 0) {
+		return err;
+	}
+
+	record_from_obj(&got_rec, &got);
+	err = iterator_next(oit, got_rec);
+	iterator_destroy(&oit);
+	if (err < 0) {
+		return err;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	{
+		struct indexed_table_ref_iter *itr = NULL;
+		err = new_indexed_table_ref_iter(&itr, r, oid, r->hash_size,
+						 got.offsets, got.offset_len);
+		if (err < 0) {
+			record_clear(got_rec);
+			return err;
+		}
+		got.offsets = NULL;
+		record_clear(got_rec);
+
+		iterator_from_indexed_table_ref_iter(it, itr);
+	}
+
+	return 0;
+}
+
+static int reader_refs_for_unindexed(struct reader *r, struct iterator *it,
+				     byte *oid, int oid_len)
+{
+	struct table_iter *ti = calloc(sizeof(struct table_iter), 1);
+	struct filtering_ref_iterator *filter = NULL;
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		free(ti);
+		return err;
+	}
+
+	filter = calloc(sizeof(struct filtering_ref_iterator), 1);
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	filter->r = r;
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
+		    int oid_len)
+{
+	if (r->obj_offsets.present) {
+		return reader_refs_for_indexed(r, it, oid);
+	}
+	return reader_refs_for_unindexed(r, it, oid, oid_len);
+}
+
+uint64_t reader_max_update_index(struct reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reader_min_update_index(struct reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..599a90028e2
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,52 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source *source);
+
+struct reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+struct reader {
+	struct block_source source;
+	char *name;
+	int hash_size;
+	uint64_t size;
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	int object_id_len;
+
+	struct reader_offsets ref_offsets;
+	struct reader_offsets obj_offsets;
+	struct reader_offsets log_offsets;
+};
+
+int init_reader(struct reader *r, struct block_source source, const char *name);
+int reader_seek(struct reader *r, struct iterator *it, struct record rec);
+void reader_close(struct reader *r);
+const char *reader_name(struct reader *r);
+void reader_return_block(struct reader *r, struct block *p);
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..a006f9019d8
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1107 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = {};
+	int i = 9;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	{
+		int n = sizeof(buf) - i - 1;
+		if (dest.len < n) {
+			return -1;
+		}
+		memcpy(dest.buf, &buf[i + 1], n);
+		return n;
+	}
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	in.buf += tsize;
+	in.len -= tsize;
+
+	return start_len - in.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	dest.buf += n;
+	dest.len -= n;
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	dest.buf += n;
+	dest.len -= n;
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	dest.buf += suffix_len;
+	dest.len -= suffix_len;
+
+	return start.len - dest.len;
+}
+
+static byte ref_record_type(void)
+{
+	return BLOCK_TYPE_REF;
+}
+
+static void ref_record_key(const void *r, struct slice *dest)
+{
+	const struct ref_record *rec = (const struct ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void ref_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct ref_record *ref = (struct ref_record *)rec;
+	struct ref_record *src = (struct ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = strdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = strdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void ref_record_print(struct ref_record *ref, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = {};
+
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size);
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size);
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void ref_record_clear_void(void *rec)
+{
+	ref_record_clear((struct ref_record *)rec);
+}
+
+void ref_record_clear(struct ref_record *ref)
+{
+	free(ref->ref_name);
+	free(ref->target);
+	free(ref->target_value);
+	free(ref->value);
+	memset(ref, 0, sizeof(struct ref_record));
+}
+
+static byte ref_record_val_type(const void *rec)
+{
+	const struct ref_record *r = (const struct ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	s.buf += l;
+	s.len -= l;
+
+	return start.len - s.len;
+}
+
+static int ref_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	const struct ref_record *r = (const struct ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		s.buf += hash_size;
+		s.len -= hash_size;
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		s.buf += hash_size;
+		s.len -= hash_size;
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		s.buf += n;
+		s.len -= n;
+	}
+
+	return start.len - s.len;
+}
+
+static int ref_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct ref_record *r = (struct ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	in.buf += n;
+	in.len -= n;
+
+	r->ref_name = realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		in.buf += hash_size;
+		in.len -= hash_size;
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		in.buf += hash_size;
+		in.len -= hash_size;
+		break;
+	case 3: {
+		struct slice dest = {};
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		in.buf += n;
+		in.len -= n;
+		seen_target = true;
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	in.buf += suffix_len;
+	in.len -= suffix_len;
+
+	return start_len - in.len;
+}
+
+struct record_vtable ref_record_vtable = {
+	.key = &ref_record_key,
+	.type = &ref_record_type,
+	.copy_from = &ref_record_copy_from,
+	.val_type = &ref_record_val_type,
+	.encode = &ref_record_encode,
+	.decode = &ref_record_decode,
+	.clear = &ref_record_clear_void,
+};
+
+static byte obj_record_type(void)
+{
+	return BLOCK_TYPE_OBJ;
+}
+
+static void obj_record_key(const void *r, struct slice *dest)
+{
+	const struct obj_record *rec = (const struct obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	const struct obj_record *src = (const struct obj_record *)src_rec;
+
+	*ref = *src;
+	ref->hash_prefix = malloc(ref->hash_prefix_len);
+	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
+
+	{
+		int olen = ref->offset_len * sizeof(uint64_t);
+		ref->offsets = malloc(olen);
+		memcpy(ref->offsets, src->offsets, olen);
+	}
+}
+
+static void obj_record_clear(void *rec)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	FREE_AND_NULL(ref->hash_prefix);
+	FREE_AND_NULL(ref->offsets);
+	memset(ref, 0, sizeof(struct obj_record));
+}
+
+static byte obj_record_val_type(const void *rec)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		s.buf += n;
+		s.len -= n;
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	{
+		uint64_t last = r->offsets[0];
+		int i = 0;
+		for (i = 1; i < r->offset_len; i++) {
+			int n = put_var_int(s, r->offsets[i] - last);
+			if (n < 0) {
+				return -1;
+			}
+			s.buf += n;
+			s.len -= n;
+			last = r->offsets[i];
+		}
+	}
+	return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct obj_record *r = (struct obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	r->hash_prefix = malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		in.buf += n;
+		in.len -= n;
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+
+	in.buf += n;
+	in.len -= n;
+
+	{
+		uint64_t last = r->offsets[0];
+		int j = 1;
+		while (j < count) {
+			uint64_t delta = 0;
+			int n = get_var_int(&delta, in);
+			if (n < 0) {
+				return n;
+			}
+
+			in.buf += n;
+			in.len -= n;
+
+			last = r->offsets[j] = (delta + last);
+			j++;
+		}
+	}
+	return start.len - in.len;
+}
+
+struct record_vtable obj_record_vtable = {
+	.key = &obj_record_key,
+	.type = &obj_record_type,
+	.copy_from = &obj_record_copy_from,
+	.val_type = &obj_record_val_type,
+	.encode = &obj_record_encode,
+	.decode = &obj_record_decode,
+	.clear = &obj_record_clear,
+};
+
+void log_record_print(struct log_record *log, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = {};
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size);
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size);
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static byte log_record_type(void)
+{
+	return BLOCK_TYPE_LOG;
+}
+
+static void log_record_key(const void *r, struct slice *dest)
+{
+	const struct log_record *rec = (const struct log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_u64(dest->buf + 1 + len, ts);
+}
+
+static void log_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct log_record *dst = (struct log_record *)rec;
+	const struct log_record *src = (const struct log_record *)src_rec;
+
+	*dst = *src;
+	dst->ref_name = strdup(dst->ref_name);
+	dst->email = strdup(dst->email);
+	dst->name = strdup(dst->name);
+	dst->message = strdup(dst->message);
+	if (dst->new_hash != NULL) {
+		dst->new_hash = malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void log_record_clear_void(void *rec)
+{
+	struct log_record *r = (struct log_record *)rec;
+	log_record_clear(r);
+}
+
+void log_record_clear(struct log_record *r)
+{
+	free(r->ref_name);
+	free(r->new_hash);
+	free(r->old_hash);
+	free(r->name);
+	free(r->email);
+	free(r->message);
+	memset(r, 0, sizeof(struct log_record));
+}
+
+static byte log_record_val_type(const void *rec)
+{
+	return 1;
+}
+
+static byte zero[SHA256_SIZE] = {};
+
+static int log_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct log_record *r = (struct log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	s.buf += 2 * hash_size;
+	s.len -= 2 * hash_size;
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_u16(s.buf, r->tz_offset);
+	s.buf += 2;
+	s.len -= 2;
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	return start.len - s.len;
+}
+
+static int log_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct log_record *r = (struct log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = {};
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return FORMAT_ERROR;
+	}
+
+	r->ref_name = realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_u64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (in.len < 2 * hash_size) {
+		return FORMAT_ERROR;
+	}
+
+	r->old_hash = realloc(r->old_hash, hash_size);
+	r->new_hash = realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	in.buf += 2 * hash_size;
+	in.len -= 2 * hash_size;
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->name = realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->email = realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_u16(in.buf);
+	in.buf += 2;
+	in.len -= 2;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->message = realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	return start.len - in.len;
+
+error:
+	free(slice_yield(&dest));
+	return FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool log_record_equal(struct log_record *a, struct log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+struct record_vtable log_record_vtable = {
+	.key = &log_record_key,
+	.type = &log_record_type,
+	.copy_from = &log_record_copy_from,
+	.val_type = &log_record_val_type,
+	.encode = &log_record_encode,
+	.decode = &log_record_decode,
+	.clear = &log_record_clear_void,
+};
+
+struct record new_record(byte typ)
+{
+	struct record rec;
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct ref_record *r = calloc(1, sizeof(struct ref_record));
+		record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct obj_record *r = calloc(1, sizeof(struct obj_record));
+		record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct log_record *r = calloc(1, sizeof(struct log_record));
+		record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct index_record *r = calloc(1, sizeof(struct index_record));
+		record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+static byte index_record_type(void)
+{
+	return BLOCK_TYPE_INDEX;
+}
+
+static void index_record_key(const void *r, struct slice *dest)
+{
+	struct index_record *rec = (struct index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+				   int hash_size)
+{
+	struct index_record *dst = (struct index_record *)rec;
+	struct index_record *src = (struct index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec)
+{
+	struct index_record *idx = (struct index_record *)rec;
+	free(slice_yield(&idx->last_key));
+}
+
+static byte index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int index_record_encode(const void *rec, struct slice out, int hash_size)
+{
+	const struct index_record *r = (const struct index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	out.buf += n;
+	out.len -= n;
+
+	return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+			       struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct index_record *r = (struct index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	in.buf += n;
+	in.len -= n;
+	return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+	.key = &index_record_key,
+	.type = &index_record_type,
+	.copy_from = &index_record_copy_from,
+	.val_type = &index_record_val_type,
+	.encode = &index_record_encode,
+	.decode = &index_record_decode,
+	.clear = &index_record_clear,
+};
+
+void record_key(struct record rec, struct slice *dest)
+{
+	rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec)
+{
+	return rec.ops->type();
+}
+
+int record_encode(struct record rec, struct slice dest, int hash_size)
+{
+	return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size)
+{
+	assert(src.ops->type() == rec.ops->type());
+
+	rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec)
+{
+	return rec.ops->val_type(rec.data);
+}
+
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size)
+{
+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec)
+{
+	return rec.ops->clear(rec.data);
+}
+
+void record_from_ref(struct record *rec, struct ref_record *ref_rec)
+{
+	rec->data = ref_rec;
+	rec->ops = &ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
+{
+	rec->data = obj_rec;
+	rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec)
+{
+	rec->data = index_rec;
+	rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct log_record *log_rec)
+{
+	rec->data = log_rec;
+	rec->ops = &log_record_vtable;
+}
+
+void *record_yield(struct record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+struct ref_record *record_as_ref(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_REF);
+	return (struct ref_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct ref_record *)a)->ref_name,
+		      ((struct ref_record *)b)->ref_name);
+}
+
+bool ref_record_is_deletion(const struct ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int log_record_compare_key(const void *a, const void *b)
+{
+	struct log_record *la = (struct log_record *)a;
+	struct log_record *lb = (struct log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool log_record_is_deletion(const struct log_record *log)
+{
+	/* XXX */
+	return false;
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..dffdd71fc28
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,79 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+struct record_vtable {
+	void (*key)(const void *rec, struct slice *dest);
+	byte (*type)(void);
+	void (*copy_from)(void *rec, const void *src, int hash_size);
+	byte (*val_type)(const void *rec);
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+	void (*clear)(void *rec);
+};
+
+/* record is a generic wrapper for differnt types of records. */
+struct record {
+	void *data;
+	struct record_vtable *ops;
+};
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+int common_prefix_size(struct slice a, struct slice b);
+
+int is_block_type(byte typ);
+struct record new_record(byte typ);
+
+extern struct record_vtable ref_record_vtable;
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra);
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in);
+
+struct index_record {
+	struct slice last_key;
+	uint64_t offset;
+};
+
+struct obj_record {
+	byte *hash_prefix;
+	int hash_prefix_len;
+	uint64_t *offsets;
+	int offset_len;
+};
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size);
+void record_clear(struct record rec);
+void *record_yield(struct record *rec);
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct ref_record *refrec);
+void record_from_log(struct record *rec, struct log_record *logrec);
+struct ref_record *record_as_ref(struct record ref);
+
+/* for qsort. */
+int ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..e66647bebcd
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,405 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct block_source {
+	struct block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct block {
+	uint8_t *data;
+	int len;
+	struct block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct block *dest, uint64_t off,
+			  uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int block_source_from_file(struct block_source *block_src, const char *name);
+
+/* write_options sets options for writing a single reftable. */
+struct write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* width of the hash. Should be 20 for SHA1 or 32 for SHA256. Defaults
+	 * to SHA1 if unset */
+	int hash_size;
+};
+
+/* ref_record holds a ref database entry target_value */
+struct ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int ref_record_is_deletion(const struct ref_record *ref);
+
+/* prints a ref_record onto stdout */
+void ref_record_print(struct ref_record *ref, int hash_size);
+
+/* frees and nulls all pointer values. */
+void ref_record_clear(struct ref_record *ref);
+
+/* returns whether two ref_records are the same */
+int ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size);
+
+/* log_record holds a reflog entry */
+struct log_record {
+	char *ref_name;
+	uint64_t update_index;
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int log_record_is_deletion(const struct log_record *log);
+
+/* frees and nulls all pointer values. */
+void log_record_clear(struct log_record *log);
+
+/* returns whether two records are equal. */
+int log_record_equal(struct log_record *a, struct log_record *b, int hash_size);
+
+void log_record_print(struct log_record *log, int hash_size);
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct iterator {
+	struct iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_ref(struct iterator it, struct ref_record *ref);
+
+/* reads the next log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_log(struct iterator it, struct log_record *log);
+
+/* releases resources associated with an iterator. */
+void iterator_destroy(struct iterator *it);
+
+/* block_stats holds statistics for a single block type */
+struct block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct block_stats obj_stats;
+	/* stats for index blocks */
+	struct block_stats idx_stats;
+	/* stats for log blocks */
+	struct block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* different types of errors */
+
+/* Unexpected file system behavior */
+#define IO_ERROR -2
+
+/* Format inconsistency on reading data
+ */
+#define FORMAT_ERROR -3
+
+/* File does not exist. Returned from block_source_from_file(),  because it
+   needs special handling in stack.
+*/
+#define NOT_EXIST_ERROR -4
+
+/* Trying to write out-of-date data. */
+#define LOCK_ERROR -5
+
+/* Misuse of the API:
+   - on writing a record with NULL ref_name.
+   - on writing a ref_record outside the table limits
+   - on writing a ref or log record before the stack's next_update_index
+   - on reading a ref_record from log iterator, or vice versa.
+ */
+#define API_ERROR -6
+
+/* Decompression error */
+#define ZLIB_ERROR -7
+
+const char *error_str(int err);
+
+/* new_writer creates a new writer */
+struct writer *new_writer(int (*writer_func)(void *, uint8_t *, int),
+			  void *writer_arg, struct write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int fd_writer(void *fdp, uint8_t *data, int size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   stack_next_update_index(), or API_ERROR is returned.
+ */
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max);
+
+/* adds a ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   writer_set_limits(), or API_ERROR is returned.
+ */
+int writer_add_ref(struct writer *w, struct ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n);
+
+/* adds a log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int writer_add_log(struct writer *w, struct log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int writer_add_logs(struct writer *w, struct log_record *logs, int n);
+
+/* writer_close finalizes the reftable. The writer is retained so statistics can
+ * be inspected. */
+int writer_close(struct writer *w);
+
+/* writer_stats returns the statistics on the reftable being written. */
+struct stats *writer_stats(struct writer *w);
+
+/* writer_free deallocates memory for the writer */
+void writer_free(struct writer *w);
+
+struct reader;
+
+/* new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp.  The name is used for creating a
+ * stack. Typically, it is the basename of the file.
+ */
+int new_reader(struct reader **pp, struct block_source, const char *name);
+
+/* reader_seek_ref returns an iterator where 'name' would be inserted in the
+   table.
+
+   example:
+
+   struct reader *r = NULL;
+   int err = new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct iterator it = {};
+   err = reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct ref_record ref = {};
+   while (1) {
+     err = iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   iterator_destroy(&it);
+   ref_record_clear(&ref);
+ */
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name);
+
+/* returns the hash size used in this table. */
+int reader_hash_size(struct reader *r);
+
+/* seek to logs for the given name, older than update_index. */
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+		       uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reader_free(struct reader *);
+
+/* return an iterator for the refs pointing to oid */
+int reader_refs_for(struct reader *r, struct iterator *it, uint8_t *oid,
+		    int oid_len);
+
+/* return the max_update_index for a table */
+uint64_t reader_max_update_index(struct reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reader_min_update_index(struct reader *r);
+
+/* a merged table is implements seeking/iterating over a stack of tables. */
+struct merged_table;
+
+/* new_merged_table creates a new merged table. It takes ownership of the stack
+   array.
+*/
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n,
+		     int hash_size);
+
+/* returns the hash size used in this merged table. */
+int merged_hash_size(struct merged_table *mt);
+
+/* returns an iterator positioned just before 'name' */
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+			  const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+			     const char *name, uint64_t update_index);
+
+/* like merged_table_seek_log_at but look for the newest entry. */
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+			  const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t merged_max_update_index(struct merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t merged_min_update_index(struct merged_table *mt);
+
+/* closes readers for the merged tables */
+void merged_table_close(struct merged_table *mt);
+
+/* releases memory for the merged_table */
+void merged_table_free(struct merged_table *m);
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct stack;
+
+/* open a new reftable stack. The tables will be stored in 'dir', while the list
+   of tables is in 'list_file'. Typically, this should be .git/reftables and
+   .git/refs respectively.
+*/
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+	      struct write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t stack_next_update_index(struct stack *st);
+
+/* add a new table to the stack. The write_table function must call
+   writer_set_limits, add refs and return an error value. */
+int stack_add(struct stack *st,
+	      int (*write_table)(struct writer *wr, void *write_arg),
+	      void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct merged_table *stack_merged_table(struct stack *st);
+
+/* frees all resources associated with the stack. */
+void stack_destroy(struct stack *st);
+
+/* reloads the stack if necessary. */
+int stack_reload(struct stack *st);
+
+/* Policy for expiring reflog entries. */
+struct log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int stack_compact_all(struct stack *st, struct log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int stack_auto_compact(struct stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_ref(struct stack *st, const char *refname,
+		   struct ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_log(struct stack *st, const char *refname,
+		   struct log_record *log);
+
+/* statistics on past compactions. */
+struct compaction_stats {
+	uint64_t bytes;
+	int attempts;
+	int failures;
+};
+
+struct compaction_stats *stack_compaction_stats(struct stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 00000000000..efbe625253f
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,199 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+byte *slice_yield(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = {};
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_write(struct slice *b, byte *data, int sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_write_void(void *b, byte *data, int sz)
+{
+	return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = calloc(size, 1);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct block_source *bs, struct slice *buf)
+{
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+struct block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 00000000000..f12a6db228c
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+struct slice {
+	byte *buf;
+	int len;
+	int cap;
+};
+
+void slice_set_string(struct slice *dest, const char *);
+void slice_append_string(struct slice *dest, const char *);
+char *slice_to_string(struct slice src);
+const char *slice_as_string(struct slice *src);
+bool slice_equal(struct slice a, struct slice b);
+byte *slice_yield(struct slice *s);
+void slice_copy(struct slice *dest, struct slice src);
+void slice_resize(struct slice *s, int l);
+int slice_compare(struct slice a, struct slice b);
+int slice_write(struct slice *b, byte *data, int sz);
+int slice_write_void(void *b, byte *data, int sz);
+void slice_append(struct slice *dest, struct slice add);
+
+struct block_source;
+void block_source_from_slice(struct block_source *bs, struct slice *buf);
+
+struct block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..6a5aebf2ecd
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,984 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "reftable.h"
+#include "writer.h"
+
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+	      struct write_options config)
+{
+	struct stack *p = calloc(sizeof(struct stack), 1);
+	int err = 0;
+	*dest = NULL;
+	p->list_file = strdup(list_file);
+	p->reftable_dir = strdup(dir);
+	p->config = config;
+
+	err = stack_reload(p);
+	if (err < 0) {
+		stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fread_lines(FILE *f, char ***namesp)
+{
+	long size = 0;
+	int err = fseek(f, 0, SEEK_END);
+	char *buf = NULL;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	size = ftell(f);
+	if (size < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = fseek(f, 0, SEEK_SET);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	buf = malloc(size + 1);
+	if (fread(buf, 1, size, f) != size) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+exit:
+	free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	FILE *f = fopen(filename, "r");
+	int err = 0;
+	if (f == NULL) {
+		if (errno == ENOENT) {
+			*namesp = calloc(sizeof(char *), 1);
+			return 0;
+		}
+
+		return IO_ERROR;
+	}
+	err = fread_lines(f, namesp);
+	fclose(f);
+	return err;
+}
+
+struct merged_table *stack_merged_table(struct stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void stack_destroy(struct stack *st)
+{
+	if (st->merged == NULL) {
+		return;
+	}
+
+	merged_table_close(st->merged);
+	merged_table_free(st->merged);
+	st->merged = NULL;
+
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	free(st);
+}
+
+static struct reader **stack_copy_readers(struct stack *st, int cur_len)
+{
+	struct reader **cur = calloc(sizeof(struct reader *), cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int stack_reload_once(struct stack *st, char **names, bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reader **new_tables =
+		malloc(sizeof(struct reader *) * names_len);
+	int new_tables_len = 0;
+	struct merged_table *new_merged = NULL;
+
+	struct slice table_path = {};
+
+	while (*names) {
+		struct reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct block_source src = {};
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_append_string(&table_path, "/");
+			slice_append_string(&table_path, name);
+
+			err = block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = new_reader(&rd, src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = new_merged_table(&new_merged, new_tables, new_tables_len,
+			       st->config.hash_size);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		merged_table_free(st->merged);
+	}
+	st->merged = new_merged;
+
+	{
+		int i = 0;
+		for (i = 0; i < cur_len; i++) {
+			if (cur[i] != NULL) {
+				reader_close(cur[i]);
+				reader_free(cur[i]);
+			}
+		}
+	}
+exit:
+	free(slice_yield(&table_path));
+	{
+		int i = 0;
+		for (i = 0; i < new_tables_len; i++) {
+			reader_close(new_tables[i]);
+		}
+	}
+	free(new_tables);
+	free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open)
+{
+	struct timeval deadline = {};
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = {};
+		int err = gettimeofday(&now, NULL);
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		err = read_lines(st->list_file, &names_after);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return -1;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 100;
+		usleep(delay);
+	}
+
+	return 0;
+}
+
+int stack_reload(struct stack *st)
+{
+	return stack_reload_maybe_reuse(st, true);
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int stack_add(struct stack *st, int (*write)(struct writer *wr, void *arg),
+	      void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == LOCK_ERROR) {
+			err = stack_reload(st);
+		}
+		return err;
+	}
+
+	return stack_auto_compact(st);
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "%012" PRIx64 "-%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+int stack_try_add(struct stack *st,
+		  int (*write_table)(struct writer *wr, void *arg), void *arg)
+{
+	struct slice lock_name = {};
+	struct slice temp_tab_name = {};
+	struct slice tab_name = {};
+	struct slice next_name = {};
+	struct slice table_list = {};
+	struct writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+	int lock_fd = 0;
+	uint64_t next_update_index = 0;
+
+	slice_set_string(&lock_name, st->list_file);
+	slice_append_string(&lock_name, ".lock");
+
+	lock_fd = open(slice_as_string(&lock_name), O_EXCL | O_CREAT | O_WRONLY,
+		       0644);
+	if (lock_fd < 0) {
+		if (errno == EEXIST) {
+			err = LOCK_ERROR;
+			goto exit;
+		}
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = LOCK_ERROR;
+		goto exit;
+	}
+
+	next_update_index = stack_next_update_index(st);
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, next_update_index, next_update_index);
+
+	slice_set_string(&temp_tab_name, st->reftable_dir);
+	slice_append_string(&temp_tab_name, "/");
+	slice_append(&temp_tab_name, next_name);
+	slice_append_string(&temp_tab_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_name));
+	if (tab_fd < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	wr = new_writer(fd_writer, &tab_fd, &st->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	if (wr->min_update_index < next_update_index) {
+		err = API_ERROR;
+		goto exit;
+	}
+
+	{
+		int i = 0;
+		for (i = 0; i < st->merged->stack_len; i++) {
+			slice_append_string(&table_list,
+					    st->merged->stack[i]->name);
+			slice_append_string(&table_list, "\n");
+		}
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_append_string(&next_name, ".ref");
+	slice_append(&table_list, next_name);
+	slice_append_string(&table_list, "\n");
+
+	slice_set_string(&tab_name, st->reftable_dir);
+	slice_append_string(&tab_name, "/");
+	slice_append(&tab_name, next_name);
+
+	err = rename(slice_as_string(&temp_tab_name),
+		     slice_as_string(&tab_name));
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	free(slice_yield(&temp_tab_name));
+
+	err = write(lock_fd, table_list.buf, table_list.len);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = close(lock_fd);
+	lock_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&tab_name));
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&tab_name));
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_reload(st);
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_name));
+	}
+	unlink(slice_as_string(&lock_name));
+
+	if (lock_fd > 0) {
+		close(lock_fd);
+		lock_fd = 0;
+	}
+
+	free(slice_yield(&lock_name));
+	free(slice_yield(&temp_tab_name));
+	free(slice_yield(&tab_name));
+	free(slice_yield(&next_name));
+	free(slice_yield(&table_list));
+	writer_free(wr);
+	return err;
+}
+
+uint64_t stack_next_update_index(struct stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reader_max_update_index(st->merged->stack[sz - 1]) + 1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct log_expiry_config *config)
+{
+	struct slice next_name = {};
+	int tab_fd = -1;
+	struct writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reader_min_update_index(st->merged->stack[first]),
+		    reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_append_string(temp_tab, "/");
+	slice_append(temp_tab, next_name);
+	slice_append_string(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = new_writer(fd_writer, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+	writer_free(wr);
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		free(slice_yield(temp_tab));
+	}
+	free(slice_yield(&next_name));
+	return err;
+}
+
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+			int last, struct log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reader **subtabs =
+		calloc(sizeof(struct reader *), last - first + 1);
+	struct merged_table *mt = NULL;
+	int err = 0;
+	struct iterator it = {};
+	struct ref_record ref = {};
+	struct log_record log = {};
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	writer_set_limits(wr, st->merged->stack[first]->min_update_index,
+			  st->merged->stack[last]->max_update_index);
+
+	err = new_merged_table(&mt, subtabs, subtabs_len, st->config.hash_size);
+	if (err < 0) {
+		free(subtabs);
+		goto exit;
+	}
+
+	err = merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = iterator_next_ref(it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+	}
+
+	err = merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = iterator_next_log(it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		/* XXX collect stats? */
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+	}
+
+exit:
+	iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		merged_table_free(mt);
+	}
+	ref_record_clear(&ref);
+
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct stack *st, int first, int last,
+			       struct log_expiry_config *expiry)
+{
+	struct slice temp_tab_name = {};
+	struct slice new_table_name = {};
+	struct slice lock_file_name = {};
+	struct slice ref_list_contents = {};
+	struct slice new_table_path = {};
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **delete_on_success = calloc(sizeof(char *), compact_count + 1);
+	char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
+	int i = 0;
+	int j = 0;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_append_string(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_name = {};
+		struct slice subtab_lock = {};
+		slice_set_string(&subtab_name, st->reftable_dir);
+		slice_append_string(&subtab_name, "/");
+		slice_append_string(&subtab_name,
+				    reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_name);
+		slice_append_string(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				}
+				err = IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] = (char *)slice_as_string(&subtab_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_append_string(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_append_string(&new_table_path, "/");
+
+	slice_append(&new_table_path, new_table_name);
+
+	err = rename(slice_as_string(&temp_tab_name),
+		     slice_as_string(&new_table_path));
+	if (err < 0) {
+		goto exit;
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	slice_append(&ref_list_contents, new_table_name);
+	slice_append_string(&ref_list_contents, "\n");
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	for (char **p = delete_on_success; *p; p++) {
+		if (strcmp(*p, slice_as_string(&new_table_path))) {
+			unlink(*p);
+		}
+	}
+
+	err = stack_reload_maybe_reuse(st, first < last);
+exit:
+	for (char **p = subtable_locks; *p; p++) {
+		unlink(*p);
+	}
+	free_names(delete_on_success);
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	free(slice_yield(&new_table_name));
+	free(slice_yield(&new_table_path));
+	free(slice_yield(&ref_list_contents));
+	free(slice_yield(&temp_tab_name));
+	free(slice_yield(&lock_file_name));
+	return err;
+}
+
+int stack_compact_all(struct stack *st, struct log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct stack *st, int first, int last,
+				     struct log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	assert(sz > 0);
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = calloc(sizeof(struct segment), n);
+	int next = 0;
+	struct segment cur = {};
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct stack *st)
+{
+	uint64_t *sizes = calloc(sizeof(uint64_t), st->merged->stack_len);
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		/* overhead is 24 + 68 = 92. */
+		sizes[i] = st->merged->stack[i]->size - 91;
+	}
+	return sizes;
+}
+
+int stack_auto_compact(struct stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct compaction_stats *stack_compaction_stats(struct stack *st)
+{
+	return &st->stats;
+}
+
+int stack_read_ref(struct stack *st, const char *refname,
+		   struct ref_record *ref)
+{
+	struct iterator it = {};
+	struct merged_table *mt = stack_merged_table(st);
+	int err = merged_table_seek_ref(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = iterator_next_ref(it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, refname) || ref_record_is_deletion(ref)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	iterator_destroy(&it);
+	return err;
+}
+
+int stack_read_log(struct stack *st, const char *refname,
+		   struct log_record *log)
+{
+	struct iterator it = {};
+	struct merged_table *mt = stack_merged_table(st);
+	int err = merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = iterator_next_log(it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) || log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..d5e2c93c293
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,40 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+
+struct stack {
+	char *list_file;
+	char *reftable_dir;
+
+	struct write_options config;
+
+	struct merged_table *merged;
+	struct compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct stack *st,
+		  int (*write_table)(struct writer *wr, void *arg), void *arg);
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+			int last, struct log_expiry_config *config);
+int fastlog2(uint64_t sz);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..1ab80f6a100
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,59 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#if 1 /* REFTABLE_IN_GITCORE */
+
+#include "git-compat-util.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#define ARRAY_SIZE(a) sizeof((a)) / sizeof((a)[0])
+#define FREE_AND_NULL(x)    \
+	do {                \
+		free(x);    \
+		(x) = NULL; \
+	} while (0)
+#define QSORT(arr, n, cmp) qsort(arr, n, sizeof(arr[0]), cmp)
+#define SWAP(a, b)                              \
+	{                                       \
+		char tmp[sizeof(a)];            \
+		assert(sizeof(a) == sizeof(b)); \
+		memcpy(&tmp[0], &a, sizeof(a)); \
+		memcpy(&a, &b, sizeof(a));      \
+		memcpy(&b, &tmp[0], sizeof(a)); \
+	}
+#endif /* REFTABLE_IN_GITCORE */
+
+typedef uint8_t byte;
+typedef int bool;
+
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..9bf7fe531ff
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,66 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				calloc(sizeof(struct tree_node), 1);
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	{
+		int res = compare(key, (*rootp)->key);
+		if (res < 0) {
+			return tree_search(key, &(*rootp)->left, compare,
+					   insert);
+		} else if (res > 0) {
+			return tree_search(key, &(*rootp)->right, compare,
+					   insert);
+		}
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..86a71715aee
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,24 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..5a61288663d
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,630 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct block_stats *writer_block_stats(struct writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct writer *w, byte *data, size_t len, int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = calloc(w->pending_padding, 1);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_size == 0) {
+		opts->hash_size = SHA1_SIZE;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_write_header(struct writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+	dest[4] = (w->hash_size == SHA1_SIZE) ? 1 : 2; /* version */
+
+	put_u24(dest + 5, w->opts.block_size);
+	put_u64(dest + 8, w->min_update_index);
+	put_u64(dest + 16, w->max_update_index);
+	return 24;
+}
+
+static void writer_reinit_block_writer(struct writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = HEADER_SIZE;
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start, w->hash_size);
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
+			  void *writer_arg, struct write_options *opts)
+{
+	struct writer *wp = calloc(sizeof(struct writer), 1);
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->hash_size = opts->hash_size;
+	wp->block = calloc(opts->block_size, 1);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void writer_free(struct writer *w)
+{
+	free(w->block);
+	free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+			     ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = calloc(sizeof(struct obj_index_tree_node), 1);
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = realloc(key->offsets,
+				       sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct writer *w, struct record rec)
+{
+	int result = -1;
+	struct slice key = {};
+	int err = 0;
+	record_key(rec, &key);
+	if (slice_compare(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	free(slice_yield(&key));
+	return result;
+}
+
+int writer_add_ref(struct writer *w, struct ref_record *ref)
+{
+	struct record rec = {};
+	struct ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return API_ERROR;
+	}
+
+	record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = w->hash_size,
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = w->hash_size,
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int writer_add_log(struct writer *w, struct log_record *log)
+{
+	if (log->ref_name == NULL) {
+		return API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	{
+		struct record rec = {};
+		int err;
+		record_from_log(&rec, log);
+		err = writer_add_record(w, rec);
+		return err;
+	}
+}
+
+int writer_add_logs(struct writer *w, struct log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct record rec = {};
+			record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, rec);
+			assert(err == 0);
+		}
+		for (i = 0; i < idx_len; i++) {
+			free(slice_yield(&idx[i].last_key));
+		}
+		free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct block_stats *bstats = writer_block_stats(w, typ);
+		bstats->index_blocks =
+			w->stats.idx_stats.blocks - before_blocks;
+		bstats->index_offset = index_start;
+		bstats->max_index_level = max_level;
+	}
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct record rec = {};
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	free(slice_yield(&entry->hash));
+	free(entry);
+}
+
+static int writer_dump_object_index(struct writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = {};
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_close(struct writer *w)
+{
+	byte footer[68];
+	byte *p = footer;
+
+	int err = writer_finish_public_section(w);
+	if (err != 0) {
+		return err;
+	}
+
+	writer_write_header(w, footer);
+	p += 24;
+	put_u64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_u64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_u64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_u64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_u64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_u32(p, crc32(0, footer, p - footer));
+	p += 4;
+	w->pending_padding = 0;
+
+	{
+		int n = padded_write(w, footer, sizeof(footer), 0);
+		if (n < 0) {
+			return n;
+		}
+	}
+
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	free(slice_yield(&w->last_key));
+	return 0;
+}
+
+void writer_clear_index(struct writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		free(slice_yield(&w->index[i].last_key));
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct block_stats *bstats = writer_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_u24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = realloc(w->index,
+				   sizeof(struct index_record) * w->index_cap);
+	}
+
+	{
+		struct index_record ir = {
+			.offset = w->next,
+		};
+		slice_copy(&ir.last_key, w->block_writer->last_key);
+		w->index[w->index_len] = ir;
+	}
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	block_writer_reset(&w->block_writer_data);
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+struct stats *writer_stats(struct writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..bd2386474b2
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,46 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct writer {
+	int (*write)(void *, byte *, int);
+	void *write_arg;
+	int pending_padding;
+	int hash_size;
+	struct slice last_key;
+
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct write_options opts;
+
+	byte *block;
+	struct block_writer *block_writer;
+	struct block_writer block_writer_data;
+	struct index_record *index;
+	int index_len;
+	int index_cap;
+
+	/* tree for use with tsearch */
+	struct tree_node *obj_index_tree;
+
+	struct stats stats;
+};
+
+int writer_flush_block(struct writer *w);
+void writer_clear_index(struct writer *w);
+int writer_finish_public_section(struct writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v6 5/5] Reftable support for git-core
  2020-02-18  8:43         ` [PATCH v6 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                             ` (3 preceding siblings ...)
  2020-02-18  8:43           ` [PATCH v6 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-02-18  8:43           ` Han-Wen Nienhuys via GitGitGadget
  2020-02-18 21:05           ` [PATCH v6 0/5] Reftable support git-core Junio C Hamano
                             ` (2 subsequent siblings)
  7 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-18  8:43 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

TODO:

 * Resolve spots marked with XXX
 * Test strategy?

Example use:

  $ ~/vc/git/git init --reftable
  warning: templates not found in /usr/local/google/home/hanwen/share/git-core/templates
  Initialized empty Git repository in /tmp/qz/.git/
  $ echo q > a
  $ ~/vc/git/git add a
  $ ~/vc/git/git commit -mx
  fatal: not a git repository (or any of the parent directories): .git
  [master (root-commit) 373d969] x
   1 file changed, 1 insertion(+)
   create mode 100644 a
  $ ~/vc/git/git show-ref
  373d96972fca9b63595740bba3898a762778ba20 HEAD
  373d96972fca9b63595740bba3898a762778ba20 refs/heads/master
  $ ls -l .git/reftable/
  total 12
  -rw------- 1 hanwen primarygroup  126 Jan 23 20:08 000000000001-000000000001.ref
  -rw------- 1 hanwen primarygroup 4277 Jan 23 20:08 000000000002-000000000002.ref
  $ go run ~/vc/reftable/cmd/dump.go  -table /tmp/qz/.git/reftable/000000000002-000000000002.ref
  ** DEBUG **
  name /tmp/qz/.git/reftable/000000000002-000000000002.ref, sz 4209: 'r' reftable.readerOffsets{Present:true, Offset:0x0, IndexOffset:0x0}, 'o' reftable.readerOffsets{Present:false, Offset:0x0, IndexOffset:0x0} 'g' reftable.readerOffsets{Present:true, Offset:0x1000, IndexOffset:0x0}
  ** REFS **
  reftable.RefRecord{RefName:"refs/heads/master", UpdateIndex:0x2, Value:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, TargetValue:[]uint8(nil), Target:""}
  ** LOGS **
  reftable.LogRecord{RefName:"HEAD", UpdateIndex:0x2, New:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, Old:[]uint8{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Name:"Han-Wen Nienhuys", Email:"hanwen@google.com", Time:0x5e29ef27, TZOffset:100, Message:"commit (initial): x\n"}

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |   7 +
 Makefile                                      |  24 +-
 builtin/clone.c                               |   4 +-
 builtin/init-db.c                             |  55 +-
 cache.h                                       |   4 +-
 refs.c                                        |  20 +-
 refs.h                                        |   3 +
 refs/refs-internal.h                          |   1 +
 refs/reftable-backend.c                       | 971 ++++++++++++++++++
 repository.c                                  |   2 +
 repository.h                                  |   3 +
 setup.c                                       |  12 +-
 12 files changed, 1077 insertions(+), 29 deletions(-)
 create mode 100644 refs/reftable-backend.c

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 6134104ae65..706bb0a9814 100644
--- a/Makefile
+++ b/Makefile
@@ -814,6 +814,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += command-list.h
 
@@ -959,6 +960,7 @@ LIB_OBJS += rebase-interactive.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1163,7 +1165,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2347,11 +2349,28 @@ VCSSVN_OBJS += vcs-svn/fast_export.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/bytes.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2488,6 +2507,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
diff --git a/builtin/clone.c b/builtin/clone.c
index 4f6150c55c3..4bcea0c18da 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1097,7 +1097,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template,
+		DEFAULT_REF_STORAGE, /* XXX */
+		INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 45bdea05890..d51a99ed77e 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -177,7 +177,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 }
 
 static int create_default_files(const char *template_path,
-				const char *original_git_dir)
+				const char *original_git_dir,
+				const char *ref_storage_format, int flags)
 {
 	struct stat st1;
 	struct strbuf buf = STRBUF_INIT;
@@ -213,6 +214,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(ref_storage_format);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -222,6 +224,15 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -234,17 +245,21 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
+	} else {
+		/*
+		 * XXX should check whether our ref backend matches the
+		 * original one; if not, either die() or convert
+		 */
 	}
 
 	/* This forces creation of new config file */
-	xsnprintf(repo_version_string, sizeof(repo_version_string),
-		  "%d", GIT_REPO_VERSION);
+	xsnprintf(repo_version_string, sizeof(repo_version_string), "%d",
+		  !strcmp(ref_storage_format, "reftable") ?
+			  GIT_REPO_VERSION_READ :
+			  GIT_REPO_VERSION);
 	git_config_set("core.repositoryformatversion", repo_version_string);
 
 	/* Check filemode trustability */
@@ -339,7 +354,8 @@ static void separate_git_dir(const char *git_dir, const char *git_link)
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, unsigned int flags)
+	    const char *template_dir, const char *ref_storage_format,
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -378,7 +394,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 */
 	check_repository_format();
 
-	reinit = create_default_files(template_dir, original_git_dir);
+	reinit = create_default_files(template_dir, original_git_dir,
+				      ref_storage_format, flags);
 
 	create_object_directory();
 
@@ -403,6 +420,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -476,20 +495,24 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = DEFAULT_REF_STORAGE;
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
 	unsigned int flags = 0;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_END()
@@ -591,9 +614,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, flags);
+	return init_db(git_dir, real_git_dir, template_dir, ref_storage_format,
+		       flags);
 }
diff --git a/cache.h b/cache.h
index 37c899b53f7..4d905e2d565 100644
--- a/cache.h
+++ b/cache.h
@@ -627,7 +627,8 @@ int path_inside_repo(const char *prefix, const char *path);
 #define INIT_DB_EXIST_OK 0x0002
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, unsigned int flags);
+	    const char *template_dir, const char *ref_storage_format,
+	    unsigned int flags);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1041,6 +1042,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 1ab0bb54d3d..6530219762f 100644
--- a/refs.c
+++ b/refs.c
@@ -20,7 +20,7 @@
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1836,13 +1836,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1858,7 +1858,10 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs = ref_store_init(r->gitdir,
+				 r->ref_storage_format ? r->ref_storage_format :
+							 DEFAULT_REF_STORAGE,
+				 REF_STORE_ALL_CAPS);
 	return r->refs;
 }
 
@@ -1913,7 +1916,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1927,6 +1930,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1940,9 +1944,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index 87c9ec921b9..2b5985ad593 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* XXX where should this be? */
+#define DEFAULT_REF_STORAGE "files"
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 1d7a4852209..d87e9cbdec7 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -661,6 +661,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..02ba4fae964
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,971 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *reftable_dir;
+	char *table_list_file;
+	struct stack *stack;
+};
+
+static void clear_log_record(struct log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	log_record_clear(log);
+}
+
+static void fill_log_record(struct log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = {};
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *reftable_ref_store_create(const char *path,
+						   unsigned int store_flags)
+{
+	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct write_options cfg = {
+		.block_size = 4096,
+		.hash_size = the_hash_algo->rawsz,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/reftable/tables.list", path);
+	refs->table_list_file = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", path);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/HEAD", path);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", path);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	refs->err = new_stack(&refs->stack, refs->reftable_dir,
+			      refs->table_list_file, cfg);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	FILE *file;
+	safe_create_dir(refs->reftable_dir, 1);
+
+	file = fopen(refs->table_list_file, "a");
+	if (file == NULL) {
+		return -1;
+	}
+	fclose(file);
+	return 0;
+}
+
+struct reftable_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = iterator_next_ref(ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	ref_record_clear(&ri->ref);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct merged_table *mt = NULL;
+
+	mt = stack_merged_table(refs->stack);
+	ri->err = refs->err;
+	if (ri->err == 0) {
+		ri->err = merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	(void)refs;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid = {};
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct ref_update *)a)->refname,
+		      ((struct ref_update *)b)->refname);
+}
+
+static int write_transaction_table(struct writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = stack_next_update_index(refs->stack);
+	int err = 0;
+	struct log_record *logs = calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	writer_set_limits(writer, ts, ts);
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct log_record *log = &logs[i];
+		fill_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct object_id out_oid = {};
+			int out_flags = 0;
+			/* Memory owned by refs_resolve_ref_unsafe, no need to
+			 * free(). */
+			const char *resolved = refs_resolve_ref_unsafe(
+				transaction->ref_store, u->refname, 0, &out_oid,
+				&out_flags);
+			struct ref_record ref = {};
+			ref.ref_name =
+				(char *)(resolved ? resolved : u->refname);
+			log->ref_name = ref.ref_name;
+			ref.value = u->new_oid.hash;
+			ref.update_index = ts;
+			err = writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		err = writer_add_log(writer, &logs[i]);
+		clear_log_record(&logs[i]);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+exit:
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	int err = stack_add(refs->stack, &write_transaction_table, transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure %s",
+			    error_str(err));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *err)
+{
+	return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+struct write_delete_refs_arg {
+	struct stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	int err = 0;
+
+	writer_set_limits(writer, ts, ts);
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct log_record log = {};
+		struct ref_record current = {};
+		fill_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (stack_read_ref(arg->stack, log.ref_name, &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		ref_record_clear(&current);
+
+		clear_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	return stack_add(refs->stack, &write_delete_refs_table, &arg);
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	return stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	writer_set_limits(writer, ts, ts);
+	err = writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct log_record log = {};
+		struct object_id new_oid = {};
+		struct object_id old_oid = {};
+		struct ref_record current = {};
+		stack_read_ref(create->refs->stack, create->refname, &current);
+
+		fill_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	return stack_add(refs->stack, &write_create_symref_table, &arg);
+}
+
+struct write_rename_arg {
+	struct stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	struct ref_record ref = {};
+	int err = stack_read_ref(arg->stack, arg->oldname, &ref);
+	if (err) {
+		goto exit;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto exit;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct ref_record todo[2] = {};
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct log_record todo[2] = {};
+		fill_log_record(&todo[0]);
+		fill_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = writer_add_logs(writer, todo, 2);
+
+		clear_log_record(&todo[0]);
+		clear_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	return stack_add(refs->stack, &write_rename_table, &arg);
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = iterator_next_log(ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	log_record_clear(&ri->log);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct iterator it = {};
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &it, refname);
+	struct log_record log = {};
+
+	while (err == 0) {
+		err = iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid = {};
+			struct object_id new_oid = {};
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	log_record_clear(&log);
+	iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct iterator it = {};
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &it, refname);
+
+	struct log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+
+	while (err == 0) {
+		struct log_record log = {};
+		err = iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (int i = len; i--;) {
+		struct log_record *log = &logs[i];
+		struct object_id old_oid = {};
+		struct object_id new_oid = {};
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (int i = 0; i < len; i++) {
+		log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct reftable_ref_store *refs;
+	struct log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct log_record log = {};
+	struct iterator it = {};
+	int err = merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		struct object_id ooid = {};
+		struct object_id noid = {};
+
+		int err = iterator_next_log(it, &log);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	log_record_clear(&log);
+	iterator_destroy(&it);
+	err = stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct ref_record ref = {};
+	int err = stack_read_ref(refs->stack, refname, &ref);
+	if (err) {
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		/* XXX recurse? */
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else {
+		hashcpy(oid->hash, ref.value);
+	}
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
diff --git a/repository.c b/repository.c
index a4174ddb062..ff0988dac84 100644
--- a/repository.c
+++ b/repository.c
@@ -174,6 +174,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 040057dea6f..198d4aa0907 100644
--- a/repository.h
+++ b/repository.h
@@ -70,6 +70,9 @@ struct repository {
 	/* The store in which the refs are held. */
 	struct ref_store *refs;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 12228c0d9c1..01bf5580180 100644
--- a/setup.c
+++ b/setup.c
@@ -457,9 +457,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -548,6 +550,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1190,8 +1193,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 4/5] Add reftable library
  2020-02-11 23:40                     ` brian m. carlson
@ 2020-02-18  9:25                       ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-18  9:25 UTC (permalink / raw)
  To: brian m. carlson, Han-Wen Nienhuys, Junio C Hamano,
	Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wed, Feb 12, 2020 at 12:40 AM brian m. carlson
<sandals@crustytoothpaste.net> wrote:
>
> On 2020-02-11 at 16:40:56, Han-Wen Nienhuys wrote:
> > I can see a future where we have a different format that allows for
> > more metadata so we can encode the hash size separately. But maybe
> > that can be for format v3 and up.
> >
> > Let's do format v2 = format v1 but with 32-byte hashes.
>
> The way we've preferred to do this in Git is to write a four-byte hash
> identifier into the file, but we have legacy concerns here, so I think
> switching to v2 for SHA-256 is fine.
> --
> brian m. carlson: Houston, Texas, US
> OpenPGP: https://keybase.io/bk2204



I've implemented this in the last version of the series.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-18  8:43         ` [PATCH v6 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                             ` (4 preceding siblings ...)
  2020-02-18  8:43           ` [PATCH v6 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-02-18 21:05           ` Junio C Hamano
  2020-02-19 16:59             ` Han-Wen Nienhuys
  2020-02-21  6:40           ` Jonathan Nieder
  2020-02-26  8:49           ` [PATCH v7 0/6] " Han-Wen Nienhuys via GitGitGadget
  7 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-02-18 21:05 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys

Here is what I got from "git am --whitespace=fix" on these five patches.
Does it match your test application?

Applying: refs.h: clarify reflog iteration order
Applying: create .git/refs in files-backend.c
Applying: refs: document how ref_iterator_advance_fn should handle symrefs
.git/rebase-apply/patch:145: trailing whitespace.
    
.git/rebase-apply/patch:147: trailing whitespace.
    
.git/rebase-apply/patch:6544: indent with spaces.
        left = *destLen;
.git/rebase-apply/patch:6545: indent with spaces.
        *destLen = 0;
.git/rebase-apply/patch:6548: indent with spaces.
        left = 1;
warning: squelched 15 whitespace errors
warning: 20 lines applied after fixing whitespace errors.
Applying: Add reftable library
Applying: Reftable support for git-core

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 4/5] Add reftable library
  2020-02-18  8:43           ` [PATCH v6 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-02-18 21:11             ` Junio C Hamano
  2020-02-19  6:55               ` Jeff King
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-02-18 21:11 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +void put_u64(byte *out, uint64_t v)
> +{
> +	int i = 0;
> +	for (i = sizeof(uint64_t); i--;) {
> +		out[i] = (byte)(v & 0xff);
> +		v >>= 8;
> +	}
> +}

This looks OK, but ...

> +int names_length(char **names)
> +{
> +	int len = 0;
> +	for (char **p = names; *p; p++) {
> +		len++;
> +	}

... this will break the build with some compiler options (see for
example https://travis-ci.org/git/git/jobs/651322628#L679).  You
may also want to lose {braces} around a single-statement block.

The incremental change between v5 and v6, especially the reflog
expiration bit, looked sensible.

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 4/5] Add reftable library
  2020-02-18 21:11             ` Junio C Hamano
@ 2020-02-19  6:55               ` Jeff King
  2020-02-19 17:00                 ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Jeff King @ 2020-02-19  6:55 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Han-Wen Nienhuys

On Tue, Feb 18, 2020 at 01:11:28PM -0800, Junio C Hamano wrote:

> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > +void put_u64(byte *out, uint64_t v)
> > +{
> > +	int i = 0;
> > +	for (i = sizeof(uint64_t); i--;) {
> > +		out[i] = (byte)(v & 0xff);
> > +		v >>= 8;
> > +	}
> > +}
> 
> This looks OK, but ...

There's a useless initialization of "i" there, I think?

It also seems weird to decrement "i" in the loop condition and leave the
final field of the for-loop empty. Would:

  int i = sizeof(uint64_t);
  while (i--) {
     ...
  }

be more clear? But...

We also have put_be64() already (it's coded without a loop, though I'd
imagine just about any compiler would unroll the loop above).

I wonder how possible it would be to use the Git names for these utility
functions in the reftable library, and then push the common ones out
into a compat file within the reftable library. Then we could toss out
the compat file when importing the library to Git, but it could still
compile standalone.

-Peff

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-18 21:05           ` [PATCH v6 0/5] Reftable support git-core Junio C Hamano
@ 2020-02-19 16:59             ` Han-Wen Nienhuys
  2020-02-19 17:02               ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-19 16:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git

On Tue, Feb 18, 2020 at 10:05 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Here is what I got from "git am --whitespace=fix" on these five patches.
> Does it match your test application?
>
> Applying: refs.h: clarify reflog iteration order
> Applying: create .git/refs in files-backend.c
> Applying: refs: document how ref_iterator_advance_fn should handle symrefs
> .git/rebase-apply/patch:145: trailing whitespace.
>
> .git/rebase-apply/patch:147: trailing whitespace.
>
> .git/rebase-apply/patch:6544: indent with spaces.
>         left = *destLen;
> .git/rebase-apply/patch:6545: indent with spaces.
>         *destLen = 0;
> .git/rebase-apply/patch:6548: indent with spaces.
>         left = 1;
> warning: squelched 15 whitespace errors


your checker is tripping over code imported from zlib. I added a /*
clang-format off */ comment to avoid reformatting this code. What do
you suggest?


-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 4/5] Add reftable library
  2020-02-19  6:55               ` Jeff King
@ 2020-02-19 17:00                 ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-19 17:00 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wed, Feb 19, 2020 at 7:55 AM Jeff King <peff@peff.net> wrote:
> I wonder how possible it would be to use the Git names for these utility
> functions in the reftable library, and then push the common ones out
> into a compat file within the reftable library. Then we could toss out
> the compat file when importing the library to Git, but it could still
> compile standalone.

I did this for {get,put}_u{64,32} and xstrdup.  For some reason, git
is missing put_u16.

Let me know if there are other functions I should look at.



-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-19 16:59             ` Han-Wen Nienhuys
@ 2020-02-19 17:02               ` Junio C Hamano
  2020-02-19 17:21                 ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-02-19 17:02 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git

Han-Wen Nienhuys <hanwenn@gmail.com> writes:

> On Tue, Feb 18, 2020 at 10:05 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Here is what I got from "git am --whitespace=fix" on these five patches.
>> Does it match your test application?
>>
>> Applying: refs.h: clarify reflog iteration order
>> Applying: create .git/refs in files-backend.c
>> Applying: refs: document how ref_iterator_advance_fn should handle symrefs
>> .git/rebase-apply/patch:145: trailing whitespace.
>>
>> .git/rebase-apply/patch:147: trailing whitespace.
>>
>> .git/rebase-apply/patch:6544: indent with spaces.
>>         left = *destLen;
>> .git/rebase-apply/patch:6545: indent with spaces.
>>         *destLen = 0;
>> .git/rebase-apply/patch:6548: indent with spaces.
>>         left = 1;
>> warning: squelched 15 whitespace errors
>
>
> your checker is tripping over code imported from zlib. I added a /*
> clang-format off */ comment to avoid reformatting this code. What do
> you suggest?

Use zlib from the system instead?

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-19 17:02               ` Junio C Hamano
@ 2020-02-19 17:21                 ` Han-Wen Nienhuys
  2020-02-19 18:10                   ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-19 17:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git

On Wed, Feb 19, 2020 at 6:02 PM Junio C Hamano <gitster@pobox.com> wrote:

> > your checker is tripping over code imported from zlib. I added a /*
> > clang-format off */ comment to avoid reformatting this code. What do
> > you suggest?
>
> Use zlib from the system instead?

uncompress2 is a 2016 addition to zlib. It doesn't pass on
gitgitgadget's CI, because it is using an older version of zlib.

-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-19 17:21                 ` Han-Wen Nienhuys
@ 2020-02-19 18:10                   ` Junio C Hamano
  2020-02-19 19:14                     ` Han-Wen Nienhuys
  2020-02-20 11:19                     ` Jeff King
  0 siblings, 2 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-02-19 18:10 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git

Han-Wen Nienhuys <hanwenn@gmail.com> writes:

> On Wed, Feb 19, 2020 at 6:02 PM Junio C Hamano <gitster@pobox.com> wrote:
>
>> > your checker is tripping over code imported from zlib. I added a /*
>> > clang-format off */ comment to avoid reformatting this code. What do
>> > you suggest?
>>
>> Use zlib from the system instead?
>
> uncompress2 is a 2016 addition to zlib. It doesn't pass on
> gitgitgadget's CI, because it is using an older version of zlib.

Ahh.

It is OK (and indeed you're right that you cannot avoid it) to ship
a reasonably new snapshot as a fallback in such a case, but it still
is far more preferrable to avoid linking with the fallback snapshot
copy when a working one is available on the system, especially for a
widely used and established library like zlib, because we have one
less thing to keep up-to-date with the security patches made to the
upstream.

In any case, if we cannot avoid shipping a borrowed code that we'd
rather not touch or fix-up, covering them with local .gitattributes
file, disabling selected whitespace checks, may be one way out.
IIUC, our default settings for the whitespace check is in the
top-level .gitattributes file

    * whitespace=!indent,trail,space
    *.[ch] whitespace=indent,trail,space diff=cpp
    *.sh whitespace=indent,trail,space eol=lf

A new local .gitattrubutes file in retable/ directory to loosen the
set of checks.  The borrowed code triggers indent-with-non-tab and
trailing-space at least, which needs to be disabled locally, I
think.

Thanks.

 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 0000000000..f44451a379
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-19 18:10                   ` Junio C Hamano
@ 2020-02-19 19:14                     ` Han-Wen Nienhuys
  2020-02-19 20:09                       ` Junio C Hamano
  2020-02-20 11:19                     ` Jeff King
  1 sibling, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-19 19:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git

On Wed, Feb 19, 2020 at 7:10 PM Junio C Hamano <gitster@pobox.com> wrote:
> It is OK (and indeed you're right that you cannot avoid it) to ship
> a reasonably new snapshot as a fallback in such a case, but it still
> is far more preferrable to avoid linking with the fallback snapshot
> copy when a working one is available on the system, especially for a
> widely used and established library like zlib, because we have one
> less thing to keep up-to-date with the security patches made to the
> upstream.

what is the procedure for doing this? Do I need to write autoconf
macros (oh no!) ?

I've added a comment for now in system.h

>  reftable/.gitattributes | 1 +
>  1 file changed, 1 insertion(+)

thanks, I've put this in.
-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-19 19:14                     ` Han-Wen Nienhuys
@ 2020-02-19 20:09                       ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-02-19 20:09 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git

Han-Wen Nienhuys <hanwenn@gmail.com> writes:

> On Wed, Feb 19, 2020 at 7:10 PM Junio C Hamano <gitster@pobox.com> wrote:
>> It is OK (and indeed you're right that you cannot avoid it) to ship
>> a reasonably new snapshot as a fallback in such a case, but it still
>> is far more preferrable to avoid linking with the fallback snapshot
>> copy when a working one is available on the system, especially for a
>> widely used and established library like zlib, because we have one
>> less thing to keep up-to-date with the security patches made to the
>> upstream.
>
> what is the procedure for doing this? Do I need to write autoconf
> macros (oh no!) ?

We treat the Makefile macros listed at the top of the file as
authoritative set of knobs, so 

    # Define ZLIB_LACKS_UNCOMPRESS2 if your zlib is so old to lack the
    # function uncompress2().

with conditional compilation/linkage based on the macro would be
minimally sufficient.  Setting it automatically in ./configure is
nice-to-have but is optional.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-19 18:10                   ` Junio C Hamano
  2020-02-19 19:14                     ` Han-Wen Nienhuys
@ 2020-02-20 11:19                     ` Jeff King
  1 sibling, 0 replies; 409+ messages in thread
From: Jeff King @ 2020-02-20 11:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys via GitGitGadget, git

On Wed, Feb 19, 2020 at 10:10:29AM -0800, Junio C Hamano wrote:

> Han-Wen Nienhuys <hanwenn@gmail.com> writes:
> 
> > On Wed, Feb 19, 2020 at 6:02 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> >> > your checker is tripping over code imported from zlib. I added a /*
> >> > clang-format off */ comment to avoid reformatting this code. What do
> >> > you suggest?
> >>
> >> Use zlib from the system instead?
> >
> > uncompress2 is a 2016 addition to zlib. It doesn't pass on
> > gitgitgadget's CI, because it is using an older version of zlib.
> 
> Ahh.
> 
> It is OK (and indeed you're right that you cannot avoid it) to ship
> a reasonably new snapshot as a fallback in such a case, but it still
> is far more preferrable to avoid linking with the fallback snapshot
> copy when a working one is available on the system, especially for a
> widely used and established library like zlib, because we have one
> less thing to keep up-to-date with the security patches made to the
> upstream.

If this were substantial code, I'd agree. But this is really just a thin
wrapper around the usual loop zlib inflate() loop that we already do in
a dozen places in our code (e.g., unpack_loose_rest(), which even does
the "did we consume all the bytes" check that uncompress2 allows).

I'm not sure it is worth the mental energy of adding a Makefile knob to
use the zlib version if we can just open-code it ourselves (or provide
our own custom helper that we always use).

And if it _is_ worth taking the zlib version because we're concerned
that it may contain or gain bugfixes[1], then possibly we should
be using it in lots more places (though probably not everywhere, as we
do sometimes need the streaming behavior of the loop).

[1] I'll admit we've hit some subtleties with that loop in the past,
    especially around corrupted or bogus inputs, but I think we've
    shaken them all out these days.

-Peff

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v4 4/5] Add reftable library
  2020-02-11 16:46                   ` Han-Wen Nienhuys
@ 2020-02-20 17:20                     ` Jonathan Nieder
  0 siblings, 0 replies; 409+ messages in thread
From: Jonathan Nieder @ 2020-02-20 17:20 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Junio C Hamano, brian m. carlson,
	Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

Han-Wen Nienhuys wrote:

> The ground truth for the format is the spec. The spec is part of the
> JGit repository, so it seems reasonable for it to be reviewed there.
> Do you want to move it somewhere else?

Let's stick a copy in Documentation/technical.  I can send a patch to
Git to do that today, and we can update the doc in JGit to point to
Git's version as the canonical one.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-18  8:43         ` [PATCH v6 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                             ` (5 preceding siblings ...)
  2020-02-18 21:05           ` [PATCH v6 0/5] Reftable support git-core Junio C Hamano
@ 2020-02-21  6:40           ` Jonathan Nieder
  2020-02-26 17:16             ` Han-Wen Nienhuys
       [not found]             ` <CAFQ2z_NQn9O3kFmHk8Cr31FY66ToU4bUdE=asHUfN++zBG+SPw@mail.gmail.com>
  2020-02-26  8:49           ` [PATCH v7 0/6] " Han-Wen Nienhuys via GitGitGadget
  7 siblings, 2 replies; 409+ messages in thread
From: Jonathan Nieder @ 2020-02-21  6:40 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: git, brian m. carlson, Junio C Hamano, Han-Wen Nienhuys via GitGitGadget

Hi,

Han-Wen Nienhuys wrote:

> This adds the reftable library, and hooks it up as a ref backend.

As promised, here's a patch to include the reftable spec in git.git.
Please include this in the next iteration of this patch series (or if
you prefer for it to land separately, that's also fine with me).

[...]
>  * support SHA256 as version 2 of the format.

I'd prefer if we error out for now when someone tries to use reftable
when the_hash_algo != sha1.  That would buy a bit more time to pin
down the details of version 2 (e.g., should it only add an object
format field, or object format and oid size?).

The current reftable spec says

	Future

	Longer hashes

	Version will bump (e.g. 2) to indicate value uses a different
	object id length other than 20. The length could be stored in
	an expanded file header, or hardcoded as part of the version.

or in other words doesn't define that aspect of the format yet.

This is also a reason to get the reftable spec in git.git, since it
would make it easier to have the discussion in one place. :)  Thanks
for your explanations so far at https://git.eclipse.org/r/c/157501/.

Sincerely,
Jonathan

-- >8 --
Subject: reftable: file format documentation

Shawn Pearce explains:

Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:

- Near constant time lookup for any single reference, even when the
  repository is cold and not in process or kernel cache.
- Near constant time verification a SHA-1 is referred to by at least
  one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.

This file format spec was originally written in July, 2017 by Shawn
Pearce.  Some refinements since then were made by Shawn and by Han-Wen
Nienhuys based on experiences implementing and experimenting with the
format.  (All of this was in the context of our work at Google and
Google is happy to contribute the result to the Git project.)

Imported from JGit[1]'s current version (c217d33ff,
"Documentation/technical/reftable: improve repo layout", 2020-02-04)
of Documentation/technical/reftable.md and converted to asciidoc by
running

  pandoc -t asciidoc -f markdown reftable.md >reftable.txt

using pandoc 2.2.1.  The result required the following additional
minor changes:

- removed the [TOC] directive to add a table of contents, since
  asciidoc does not support it
- replaced git-scm.com/docs links with linkgit: directives that link
  to other pages within Git's documentation

[1] https://eclipse.googlesource.com/jgit/jgit

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/Makefile               |    1 +
 Documentation/technical/reftable.txt | 1067 ++++++++++++++++++++++++++
 2 files changed, 1068 insertions(+)
 create mode 100644 Documentation/technical/reftable.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 8fe829cc1b..3aab9b8d61 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -92,6 +92,7 @@ TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
+TECH_DOCS += technical/reftable
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/signature-format
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
new file mode 100644
index 0000000000..9fa4657d9f
--- /dev/null
+++ b/Documentation/technical/reftable.txt
@@ -0,0 +1,1067 @@
+reftable
+--------
+
+Overview
+~~~~~~~~
+
+Problem statement
+^^^^^^^^^^^^^^^^^
+
+Some repositories contain a lot of references (e.g. android at 866k,
+rails at 31k). The existing packed-refs format takes up a lot of space
+(e.g. 62M), and does not scale with additional references. Lookup of a
+single reference requires linearly scanning the file.
+
+Atomic pushes modifying multiple references require copying the entire
+packed-refs file, which can be a considerable amount of data moved
+(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
+
+Repositories with many loose references occupy a large number of disk
+blocks from the local file system, as each reference is its own file
+storing 41 bytes (and another file for the corresponding reflog). This
+negatively affects the number of inodes available when a large number of
+repositories are stored on the same filesystem. Readers can be penalized
+due to the larger number of syscalls required to traverse and read the
+`$GIT_DIR/refs` directory.
+
+Objectives
+^^^^^^^^^^
+
+* Near constant time lookup for any single reference, even when the
+repository is cold and not in process or kernel cache.
+* Near constant time verification if a SHA-1 is referred to by at least
+one reference (for allow-tip-sha1-in-want).
+* Efficient lookup of an entire namespace, such as `refs/tags/`.
+* Support atomic push with `O(size_of_update)` operations.
+* Combine reflog storage with ref storage for small transactions.
+* Separate reflog storage for base refs and historical logs.
+
+Description
+^^^^^^^^^^^
+
+A reftable file is a portable binary file format customized for
+reference storage. References are sorted, enabling linear scans, binary
+search lookup, and range scans.
+
+Storage in the file is organized into variable sized blocks. Prefix
+compression is used within a single block to reduce disk space. Block
+size and alignment is tunable by the writer.
+
+Performance
+^^^^^^^^^^^
+
+Space used, packed-refs vs. reftable:
+
+[cols=",>,>,>,>,>",options="header",]
+|===============================================================
+|repository |packed-refs |reftable |% original |avg ref |avg obj
+|android |62.2 M |36.1 M |58.0% |33 bytes |5 bytes
+|rails |1.8 M |1.1 M |57.7% |29 bytes |4 bytes
+|git |78.7 K |48.1 K |61.0% |50 bytes |4 bytes
+|git (heads) |332 b |269 b |81.0% |33 bytes |0 bytes
+|===============================================================
+
+Scan (read 866k refs), by reference name lookup (single ref from 866k
+refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):
+
+[cols=",>,>,>,>",options="header",]
+|=========================================================
+|format |cache |scan |by name |by SHA-1
+|packed-refs |cold |402 ms |409,660.1 usec |412,535.8 usec
+|packed-refs |hot | |6,844.6 usec |20,110.1 usec
+|reftable |cold |112 ms |33.9 usec |323.2 usec
+|reftable |hot | |20.2 usec |320.8 usec
+|=========================================================
+
+Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:
+
+[cols=",>,>",options="header",]
+|================================
+|format |size |avg entry
+|$GIT_DIR/logs |173 M |1209 bytes
+|reftable |5 M |37 bytes
+|================================
+
+Details
+~~~~~~~
+
+Peeling
+^^^^^^^
+
+References stored in a reftable are peeled, a record for an annotated
+(or signed) tag records both the tag object, and the object it refers
+to.
+
+Reference name encoding
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Reference names are an uninterpreted sequence of bytes that must pass
+linkgit:git-check-ref-format[1] as a valid reference name.
+
+Key unicity
+^^^^^^^^^^^
+
+Each entry must have a unique key; repeated keys are disallowed.
+
+Network byte order
+^^^^^^^^^^^^^^^^^^
+
+All multi-byte, fixed width fields are in network byte order.
+
+Ordering
+^^^^^^^^
+
+Blocks are lexicographically ordered by their first reference.
+
+Directory/file conflicts
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format accepts both `refs/heads/foo` and
+`refs/heads/foo/bar` as distinct references.
+
+This property is useful for retaining log records in reftable, but may
+confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
+references. Users of reftable may choose to continue to reject `foo` and
+`foo/bar` type conflicts to prevent problems for peers.
+
+File format
+~~~~~~~~~~~
+
+Structure
+^^^^^^^^^
+
+A reftable file has the following high-level structure:
+
+....
+first_block {
+  header
+  first_ref_block
+}
+ref_block*
+ref_index*
+obj_block*
+obj_index*
+log_block*
+log_index*
+footer
+....
+
+A log-only file omits the `ref_block`, `ref_index`, `obj_block` and
+`obj_index` sections, containing only the file header and log block:
+
+....
+first_block {
+  header
+}
+log_block*
+log_index*
+footer
+....
+
+in a log-only file the first log block immediately follows the file
+header, without padding to block alignment.
+
+Block size
+^^^^^^^^^^
+
+The file’s block size is arbitrarily determined by the writer, and does
+not have to be a power of 2. The block size must be larger than the
+longest reference name or log entry used in the repository, as
+references cannot span blocks.
+
+Powers of two that are friendly to the virtual memory system or
+filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
+yield better compression, with a possible increased cost incurred by
+readers during access.
+
+The largest block size is `16777215` bytes (15.99 MiB).
+
+Block alignment
+^^^^^^^^^^^^^^^
+
+Writers may choose to align blocks at multiples of the block size by
+including `padding` filled with NUL bytes at the end of a block to round
+out to the chosen alignment. When alignment is used, writers must
+specify the alignment with the file header’s `block_size` field.
+
+Block alignment is not required by the file format. Unaligned files must
+set `block_size = 0` in the file header, and omit `padding`. Unaligned
+files with more than one ref block must include the link:#Ref-index[ref
+index] to support fast lookup. Readers must be able to read both aligned
+and non-aligned files.
+
+Very small files (e.g. 1 only ref block) may omit `padding` and the ref
+index to reduce total file size.
+
+Header
+^^^^^^
+
+A 24-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+....
+
+Aligned files must specify `block_size` to configure readers with the
+expected block alignment. Unaligned files must set `block_size = 0`.
+
+The `min_update_index` and `max_update_index` describe bounds for the
+`update_index` field of all log records in this file. When reftables are
+used in a stack for link:#Update-transactions[transactions], these
+fields can order the files such that the prior file’s
+`max_update_index + 1` is the next file’s `min_update_index`.
+
+First ref block
+^^^^^^^^^^^^^^^
+
+The first ref block shares the same block as the file header, and is 24
+bytes smaller than all other blocks in the file. The first block
+immediately begins after the file header, at position 24.
+
+If the first block is a log block (a log-only file), its block header
+begins immediately at position 24.
+
+Ref block format
+^^^^^^^^^^^^^^^^
+
+A ref block is written as:
+
+....
+'r'
+uint24( block_len )
+ref_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
+encodes the number of bytes in the block up to, but not including the
+optional `padding`. This is always less than or equal to the file’s
+block size. In the first ref block, `block_len` includes 24 bytes for
+the file header.
+
+The 2-byte `restart_count` stores the number of entries in the
+`restart_offset` list, which must not be empty. Readers can use
+`restart_count` to binary search between restarts before starting a
+linear scan.
+
+Exactly `restart_count` 3-byte `restart_offset` values precedes the
+`restart_count`. Offsets are relative to the start of the block and
+refer to the first byte of any `ref_record` whose name has not been
+prefix compressed. Entries in the `restart_offset` list must be sorted,
+ascending. Readers can start linear scans from any of these records.
+
+A variable number of `ref_record` fill the middle of the block,
+describing reference names and values. The format is described below.
+
+As the first ref block shares the first file block with the file header,
+all `restart_offset` in the first block are relative to the start of the
+file (position 0), and include the file header. This forces the first
+`restart_offset` to be `28`.
+
+ref record
+++++++++++
+
+A `ref_record` describes a single reference, storing both the name and
+its value(s). Records are formatted as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | value_type )
+suffix
+varint( update_index_delta )
+value?
+....
+
+The `prefix_length` field specifies how many leading bytes of the prior
+reference record’s name should be copied to obtain this reference’s
+name. This must be 0 for the first reference in any block, and also must
+be 0 for any `ref_record` whose offset is listed in the `restart_offset`
+table at the end of the block.
+
+Recovering a reference name from any `ref_record` is a simple concat:
+
+....
+this_name = prior_name[0..prefix_length] + suffix
+....
+
+The `suffix_length` value provides the number of bytes available in
+`suffix` to copy from `suffix` to complete the reference name.
+
+The `update_index` that last modified the reference can be obtained by
+adding `update_index_delta` to the `min_update_index` from the file
+header: `min_update_index + update_index_delta`.
+
+The `value` follows. Its format is determined by `value_type`, one of
+the following:
+
+* `0x0`: deletion; no value data (see transactions, below)
+* `0x1`: one 20-byte object id; value of the ref
+* `0x2`: two 20-byte object ids; value of the ref, peeled target
+* `0x3`: symbolic reference: `varint( target_len ) target`
+
+Symbolic references use `0x3`, followed by the complete name of the
+reference target. No compression is applied to the target name.
+
+Types `0x4..0x7` are reserved for future use.
+
+Ref index
+^^^^^^^^^
+
+The ref index stores the name of the last reference from every ref block
+in the file, enabling reduced disk seeks for lookups. Any reference can
+be found by searching the index, identifying the containing block, and
+searching within that block.
+
+The index may be organized into a multi-level index, where the 1st level
+index block points to additional ref index blocks (2nd level), which may
+in turn point to either additional index blocks (e.g. 3rd level) or ref
+blocks (leaf level). Disk reads required to access a ref go up with
+higher index levels. Multi-level indexes may be required to ensure no
+single index block exceeds the file format’s max block size of
+`16777215` bytes (15.99 MiB). To acheive constant O(1) disk seeks for
+lookups the index must be a single level, which is permitted to exceed
+the file’s configured block size, but not the format’s max block size of
+15.99 MiB.
+
+If present, the ref index block(s) appears after the last ref block.
+
+If there are at least 4 ref blocks, a ref index block should be written
+to improve lookup times. Cold reads using the index require 2 disk reads
+(read index, read block), and binary searching < 4 blocks also requires
+<= 2 reads. Omitting the index block from smaller files saves space.
+
+If the file is unaligned and contains more than one ref block, the ref
+index must be written.
+
+Index block format:
+
+....
+'i'
+uint24( block_len )
+index_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+The index blocks begin with `block_type = 'i'` and a 3-byte `block_len`
+which encodes the number of bytes in the block, up to but not including
+the optional `padding`.
+
+The `restart_offset` and `restart_count` fields are identical in format,
+meaning and usage as in ref blocks.
+
+To reduce the number of reads required for random access in very large
+files the index block may be larger than other blocks. However, readers
+must hold the entire index in memory to benefit from this, so it’s a
+time-space tradeoff in both file size and reader memory.
+
+Increasing the file’s block size decreases the index size. Alternatively
+a multi-level index may be used, keeping index blocks within the file’s
+block size, but increasing the number of blocks that need to be
+accessed.
+
+index record
+++++++++++++
+
+An index record describes the last entry in another block. Index records
+are written as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | 0 )
+suffix
+varint( block_position )
+....
+
+Index records use prefix compression exactly like `ref_record`.
+
+Index records store `block_position` after the suffix, specifying the
+absolute position in bytes (from the start of the file) of the block
+that ends with this reference. Readers can seek to `block_position` to
+begin reading the block header.
+
+Readers must examine the block header at `block_position` to determine
+if the next block is another level index block, or the leaf-level ref
+block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the ref index must first read the footer (below) to
+obtain `ref_index_position`. If not present, the position will be 0. The
+`ref_index_position` is for the 1st level root of the ref index.
+
+Obj block format
+^^^^^^^^^^^^^^^^
+
+Object blocks are optional. Writers may choose to omit object blocks,
+especially if readers will not use the SHA-1 to ref mapping.
+
+Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
+ref blocks containing references pointing to that object directly, or as
+the peeled value of an annotated tag. Like ref blocks, object blocks use
+the file’s standard block size. The abbrevation length is available in
+the footer as `obj_id_len`.
+
+To save space in small files, object blocks may be omitted if the ref
+index is not present, as brute force search will only need to read a few
+ref blocks. When missing, readers should brute force a linear search of
+all references to lookup by SHA-1.
+
+An object block is written as:
+
+....
+'o'
+uint24( block_len )
+obj_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Fields are identical to ref block. Binary search using the restart table
+works the same as in reference blocks.
+
+Because object identifiers are abbreviated by writers to the shortest
+unique abbreviation within the reftable, obj key lengths are variable
+between 2 and 20 bytes. Readers must compare only for common prefix
+match within an obj block or obj index.
+
+obj record
+++++++++++
+
+An `obj_record` describes a single object abbreviation, and the blocks
+containing references using that unique abbreviation:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | cnt_3 )
+suffix
+varint( cnt_large )?
+varint( position_delta )*
+....
+
+Like in reference blocks, abbreviations are prefix compressed within an
+obj block. On large reftables with many unique objects, higher block
+sizes (64k), and higher restart interval (128), a `prefix_length` of 2
+or 3 and `suffix_length` of 3 may be common in obj records (unique
+abbreviation of 5-6 raw bytes, 10-12 hex digits).
+
+Each record contains `position_count` number of positions for matching
+ref blocks. For 1-7 positions the count is stored in `cnt_3`. When
+`cnt_3 = 0` the actual count follows in a varint, `cnt_large`.
+
+The use of `cnt_3` bets most objects are pointed to by only a single
+reference, some may be pointed to by a couple of references, and very
+few (if any) are pointed to by more than 7 references.
+
+A special case exists when `cnt_3 = 0` and `cnt_large = 0`: there are no
+`position_delta`, but at least one reference starts with this
+abbreviation. A reader that needs exact reference names must scan all
+references to find which specific references have the desired object.
+Writers should use this format when the `position_delta` list would have
+overflowed the file’s block size due to a high number of references
+pointing to the same object.
+
+The first `position_delta` is the position from the start of the file.
+Additional `position_delta` entries are sorted ascending and relative to
+the prior entry, e.g. a reader would perform:
+
+....
+pos = position_delta[0]
+prior = pos
+for (j = 1; j < position_count; j++) {
+  pos = prior + position_delta[j]
+  prior = pos
+}
+....
+
+With a position in hand, a reader must linearly scan the ref block,
+starting from the first `ref_record`, testing each reference’s SHA-1s
+(for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
+SHA-1 within a single ref block is not supported by the reftable format.
+Smaller block sizes reduce the number of candidates this step must
+consider.
+
+Obj index
+^^^^^^^^^
+
+The obj index stores the abbreviation from the last entry for every obj
+block in the file, enabling reduced disk seeks for all lookups. It is
+formatted exactly the same as the ref index, but refers to obj blocks.
+
+The obj index should be present if obj blocks are present, as obj blocks
+should only be written in larger files.
+
+Readers loading the obj index must first read the footer (below) to
+obtain `obj_index_position`. If not present, the position will be 0.
+
+Log block format
+^^^^^^^^^^^^^^^^
+
+Unlike ref and obj blocks, log blocks are always unaligned.
+
+Log blocks are variable in size, and do not match the `block_size`
+specified in the file header or footer. Writers should choose an
+appropriate buffer size to prepare a log block for deflation, such as
+`2 * block_size`.
+
+A log block is written as:
+
+....
+'g'
+uint24( block_len )
+zlib_deflate {
+  log_record+
+  uint24( restart_offset )+
+  uint16( restart_count )
+}
+....
+
+Log blocks look similar to ref blocks, except `block_type = 'g'`.
+
+The 4-byte block header is followed by the deflated block contents using
+zlib deflate. The `block_len` in the header is the inflated size
+(including 4-byte block header), and should be used by readers to
+preallocate the inflation output buffer. A log block’s `block_len` may
+exceed the file’s block size.
+
+Offsets within the log block (e.g. `restart_offset`) still include the
+4-byte header. Readers may prefer prefixing the inflation output buffer
+with the 4-byte header.
+
+Within the deflate container, a variable number of `log_record` describe
+reference changes. The log record format is described below. See ref
+block format (above) for a description of `restart_offset` and
+`restart_count`.
+
+Because log blocks have no alignment or padding between blocks, readers
+must keep track of the bytes consumed by the inflater to know where the
+next log block begins.
+
+log record
+++++++++++
+
+Log record keys are structured as:
+
+....
+ref_name '\0' reverse_int64( update_index )
+....
+
+where `update_index` is the unique transaction identifier. The
+`update_index` field must be unique within the scope of a `ref_name`.
+See the update transactions section below for further details.
+
+The `reverse_int64` function inverses the value so lexographical
+ordering the network byte order encoding sorts the more recent records
+with higher `update_index` values first:
+
+....
+reverse_int64(int64 t) {
+  return 0xffffffffffffffff - t;
+}
+....
+
+Log records have a similar starting structure to ref and index records,
+utilizing the same prefix compression scheme applied to the log record
+key described above.
+
+....
+    varint( prefix_length )
+    varint( (suffix_length << 3) | log_type )
+    suffix
+    log_data {
+      old_id
+      new_id
+      varint( name_length    )  name
+      varint( email_length   )  email
+      varint( time_seconds )
+      sint16( tz_offset )
+      varint( message_length )  message
+    }?
+....
+
+Log record entries use `log_type` to indicate what follows:
+
+* `0x0`: deletion; no log data.
+* `0x1`: standard git reflog data using `log_data` above.
+
+The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
+entry from the reflog of `refs/stash` in a transaction file (below),
+without needing to rewrite larger files. Readers reading a stack of
+reflogs must treat this as a deletion.
+
+For `log_type = 0x1`, the `log_data` section follows
+linkgit:git-update-ref[1] logging and includes:
+
+* two 20-byte SHA-1s (old id, new id)
+* varint string of committer’s name
+* varint string of committer’s email
+* varint time in seconds since epoch (Jan 1, 1970)
+* 2-byte timezone offset in minutes (signed)
+* varint string of message
+
+`tz_offset` is the absolute number of minutes from GMT the committer was
+at the time of the update. For example `GMT-0800` is encoded in reftable
+as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
+
+The committer email does not contain `<` or `>`, it’s the value normally
+found between the `<>` in a git commit object header.
+
+The `message_length` may be 0, in which case there was no message
+supplied for the update.
+
+Contrary to traditional reflog (which is a file), renames are encoded as
+a combination of ref deletion and ref creation.
+
+Reading the log
++++++++++++++++
+
+Readers accessing the log must first read the footer (below) to
+determine the `log_position`. The first block of the log begins at
+`log_position` bytes since the start of the file. The `log_position` is
+not block aligned.
+
+Importing logs
+++++++++++++++
+
+When importing from `$GIT_DIR/logs` writers should globally order all
+log records roughly by timestamp while preserving file order, and assign
+unique, increasing `update_index` values for each log line. Newer log
+records get higher `update_index` values.
+
+Although an import may write only a single reftable file, the reftable
+file must span many unique `update_index`, as each log line requires its
+own `update_index` to preserve semantics.
+
+Log index
+^^^^^^^^^
+
+The log index stores the log key
+(`refname \0 reverse_int64(update_index)`) for the last log record of
+every log block in the file, supporting bounded-time lookup.
+
+A log index block must be written if 2 or more log blocks are written to
+the file. If present, the log index appears after the last log block.
+There is no padding used to align the log index to block alignment.
+
+Log index format is identical to ref index, except the keys are 9 bytes
+longer to include `'\0'` and the 8-byte `reverse_int64(update_index)`.
+Records use `block_position` to refer to the start of a log block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the log index must first read the footer (below) to
+obtain `log_index_position`. If not present, the position will be 0.
+
+Footer
+^^^^^^
+
+After the last block of the file, a file footer is written. It begins
+like the file header, but is extended with additional data.
+
+A 68-byte footer appears at the end:
+
+....
+    'REFT'
+    uint8( version_number = 1 )
+    uint24( block_size )
+    uint64( min_update_index )
+    uint64( max_update_index )
+
+    uint64( ref_index_position )
+    uint64( (obj_position << 5) | obj_id_len )
+    uint64( obj_index_position )
+
+    uint64( log_position )
+    uint64( log_index_position )
+
+    uint32( CRC-32 of above )
+....
+
+If a section is missing (e.g. ref index) the corresponding position
+field (e.g. `ref_index_position`) will be 0.
+
+* `obj_position`: byte position for the first obj block.
+* `obj_id_len`: number of bytes used to abbreviate object identifiers in
+obj blocks.
+* `log_position`: byte position for the first log block.
+* `ref_index_position`: byte position for the start of the ref index.
+* `obj_index_position`: byte position for the start of the obj index.
+* `log_index_position`: byte position for the start of the log index.
+
+Reading the footer
+++++++++++++++++++
+
+Readers must seek to `file_length - 68` to access the footer. A trusted
+external source (such as `stat(2)`) is necessary to obtain
+`file_length`. When reading the footer, readers must verify:
+
+* 4-byte magic is correct
+* 1-byte version number is recognized
+* 4-byte CRC-32 matches the other 64 bytes (including magic, and
+version)
+
+Once verified, the other fields of the footer can be accessed.
+
+Varint encoding
+^^^^^^^^^^^^^^^
+
+Varint encoding is identical to the ofs-delta encoding method used
+within pack files.
+
+Decoder works such as:
+
+....
+val = buf[ptr] & 0x7f
+while (buf[ptr] & 0x80) {
+  ptr++
+  val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
+}
+....
+
+Binary search
+^^^^^^^^^^^^^
+
+Binary search within a block is supported by the `restart_offset` fields
+at the end of the block. Readers can binary search through the restart
+table to locate between which two restart points the sought reference or
+key should appear.
+
+Each record identified by a `restart_offset` stores the complete key in
+the `suffix` field of the record, making the compare operation during
+binary search straightforward.
+
+Once a restart point lexicographically before the sought reference has
+been identified, readers can linearly scan through the following record
+entries to locate the sought record, terminating if the current record
+sorts after (and therefore the sought key is not present).
+
+Restart point selection
++++++++++++++++++++++++
+
+Writers determine the restart points at file creation. The process is
+arbitrary, but every 16 or 64 records is recommended. Every 16 may be
+more suitable for smaller block sizes (4k or 8k), every 64 for larger
+block sizes (64k).
+
+More frequent restart points reduces prefix compression and increases
+space consumed by the restart table, both of which increase file size.
+
+Less frequent restart points makes prefix compression more effective,
+decreasing overall file size, with increased penalities for readers
+walking through more records after the binary search step.
+
+A maximum of `65535` restart points per block is supported.
+
+Considerations
+~~~~~~~~~~~~~~
+
+Lightweight refs dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format assumes the vast majority of references are single
+SHA-1 valued with common prefixes, such as Gerrit Code Review’s
+`refs/changes/` namespace, GitHub’s `refs/pulls/` namespace, or many
+lightweight tags in the `refs/tags/` namespace.
+
+Annotated tags storing the peeled object cost an additional 20 bytes per
+reference.
+
+Low overhead
+^^^^^^^^^^^^
+
+A reftable with very few references (e.g. git.git with 5 heads) is 269
+bytes for reftable, vs. 332 bytes for packed-refs. This supports
+reftable scaling down for transaction logs (below).
+
+Block size
+^^^^^^^^^^
+
+For a Gerrit Code Review type repository with many change refs, larger
+block sizes (64 KiB) and less frequent restart points (every 64) yield
+better compression due to more references within the block compressing
+against the prior reference.
+
+Larger block sizes reduce the index size, as the reftable will require
+fewer blocks to store the same number of references.
+
+Minimal disk seeks
+^^^^^^^^^^^^^^^^^^
+
+Assuming the index block has been loaded into memory, binary searching
+for any single reference requires exactly 1 disk seek to load the
+containing block.
+
+Scans and lookups dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Scanning all references and lookup by name (or namespace such as
+`refs/heads/`) are the most common activities performed on repositories.
+SHA-1s are stored directly with references to optimize this use case.
+
+Logs are infrequently read
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are infrequently accessed, but can be large. Deflating log blocks
+saves disk space, with some increased penalty at read time.
+
+Logs are stored in an isolated section from refs, reducing the burden on
+reference readers that want to ignore logs. Further, historical logs can
+be isolated into log-only files.
+
+Logs are read backwards
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are frequently accessed backwards (most recent N records for master
+to answer `master@{4}`), so log records are grouped by reference, and
+sorted descending by update index.
+
+Repository format
+~~~~~~~~~~~~~~~~~
+
+Version 1
+^^^^^^^^^
+
+A repository must set its `$GIT_DIR/config` to configure reftable:
+
+....
+[core]
+    repositoryformatversion = 1
+[extensions]
+    refStorage = reftable
+....
+
+Layout
+^^^^^^
+
+A collection of reftable files are stored in the `$GIT_DIR/reftable/`
+directory:
+
+....
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+where reftable files are named by a unique name such as produced by the
+function `${min_update_index}-${max_update_index}.ref`.
+
+Log-only files use the `.log` extension, while ref-only and mixed ref
+and log files use `.ref`. extension.
+
+The stack ordering file is `$GIT_DIR/reftable/tables.list` and lists the
+current files, one per line, in order, from oldest (base) to newest
+(most recent):
+
+....
+$ cat .git/reftable/tables.list
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+Readers must read `$GIT_DIR/reftable/tables.list` to determine which
+files are relevant right now, and search through the stack in reverse
+order (last reftable is examined first).
+
+Reftable files not listed in `tables.list` may be new (and about to be
+added to the stack by the active writer), or ancient and ready to be
+pruned.
+
+Backward compatibility
+^^^^^^^^^^^^^^^^^^^^^^
+
+Older clients should continue to recognize the directory as a git
+repository so they don’t look for an enclosing repository in parent
+directories. To this end, a reftable-enabled repository must contain the
+following dummy files
+
+* `.git/HEAD`, a regular file containing `ref: refs/heads/.invalid`.
+* `.git/refs/`, a directory
+* `.git/refs/heads`, a regular file
+
+Readers
+^^^^^^^
+
+Readers can obtain a consistent snapshot of the reference space by
+following:
+
+1.  Open and read the `tables.list` file.
+2.  Open each of the reftable files that it mentions.
+3.  If any of the files is missing, goto 1.
+4.  Read from the now-open files as long as necessary.
+
+Update transactions
+^^^^^^^^^^^^^^^^^^^
+
+Although reftables are immutable, mutations are supported by writing a
+new reftable and atomically appending it to the stack:
+
+1.  Acquire `tables.list.lock`.
+2.  Read `tables.list` to determine current reftables.
+3.  Select `update_index` to be most recent file’s
+`max_update_index + 1`.
+4.  Prepare temp reftable `tmp_XXXXXX`, including log entries.
+5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
+6.  Copy `tables.list` to `tables.list.lock`, appending file from (5).
+7.  Rename `tables.list.lock` to `tables.list`.
+
+During step 4 the new file’s `min_update_index` and `max_update_index`
+are both set to the `update_index` selected by step 3. All log records
+for the transaction use the same `update_index` in their keys. This
+enables later correlation of which references were updated by the same
+transaction.
+
+Because a single `tables.list.lock` file is used to manage locking, the
+repository is single-threaded for writers. Writers may have to busy-spin
+(with backoff) around creating `tables.list.lock`, for up to an
+acceptable wait period, aborting if the repository is too busy to
+mutate. Application servers wrapped around repositories (e.g. Gerrit
+Code Review) can layer their own lock/wait queue to improve fairness to
+writers.
+
+Reference deletions
+^^^^^^^^^^^^^^^^^^^
+
+Deletion of any reference can be explicitly stored by setting the `type`
+to `0x0` and omitting the `value` field of the `ref_record`. This serves
+as a tombstone, overriding any assertions about the existence of the
+reference from earlier files in the stack.
+
+Compaction
+^^^^^^^^^^
+
+A partial stack of reftables can be compacted by merging references
+using a straightforward merge join across reftables, selecting the most
+recent value for output, and omitting deleted references that do not
+appear in remaining, lower reftables.
+
+A compacted reftable should set its `min_update_index` to the smallest
+of the input files’ `min_update_index`, and its `max_update_index`
+likewise to the largest input `max_update_index`.
+
+For sake of illustration, assume the stack currently consists of
+reftable files (from oldest to newest): A, B, C, and D. The compactor is
+going to compact B and C, leaving A and D alone.
+
+1.  Obtain lock `tables.list.lock` and read the `tables.list` file.
+2.  Obtain locks `B.lock` and `C.lock`. Ownership of these locks
+prevents other processes from trying to compact these files.
+3.  Release `tables.list.lock`.
+4.  Compact `B` and `C` into a temp file
+`${min_update_index}-${max_update_index}_XXXXXX`.
+5.  Reacquire lock `tables.list.lock`.
+6.  Verify that `B` and `C` are still in the stack, in that order. This
+should always be the case, assuming that other processes are adhering to
+the locking protocol.
+7.  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
+`${min_update_index}-${max_update_index}.ref`.
+8.  Write the new stack to `tables.list.lock`, replacing `B` and `C`
+with the file from (4).
+9.  Rename `tables.list.lock` to `tables.list`.
+10. Delete `B` and `C`, perhaps after a short sleep to avoid forcing
+readers to backtrack.
+
+This strategy permits compactions to proceed independently of updates.
+
+Each reftable (compacted or not) is uniquely identified by its name, so
+open reftables can be cached by their name.
+
+Alternatives considered
+~~~~~~~~~~~~~~~~~~~~~~~
+
+bzip packed-refs
+^^^^^^^^^^^^^^^^
+
+`bzip2` can significantly shrink a large packed-refs file (e.g. 62 MiB
+compresses to 23 MiB, 37%). However the bzip format does not support
+random access to a single reference. Readers must inflate and discard
+while performing a linear scan.
+
+Breaking packed-refs into chunks (individually compressing each chunk)
+would reduce the amount of data a reader must inflate, but still leaves
+the problem of indexing chunks to support readers efficiently locating
+the correct chunk.
+
+Given the compression achieved by reftable’s encoding, it does not seem
+necessary to add the complexity of bzip/gzip/zlib.
+
+Michael Haggerty’s alternate format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Michael Haggerty proposed
+https://public-inbox.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@mail.gmail.com/[an
+alternate] format to reftable on the Git mailing list. This format uses
+smaller chunks, without the restart table, and avoids block alignment
+with padding. Reflog entries immediately follow each ref, and are thus
+interleaved between refs.
+
+Performance testing indicates reftable is faster for lookups (51%
+faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
+larger file (+ ~3.2%, 28.3M vs 29.2M):
+
+[cols=">,>,>,>",options="header",]
+|=====================================
+|format |size |seek cold |seek hot
+|mh-alt |28.3 M |23.4 usec |11.2 usec
+|reftable |29.2 M |19.9 usec |5.4 usec
+|=====================================
+
+JGit Ketch RefTree
+^^^^^^^^^^^^^^^^^^
+
+https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg03073.html[JGit Ketch]
+proposed
+https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/[RefTree],
+an encoding of references inside Git tree objects stored as part of the
+repository’s object database.
+
+The RefTree format adds additional load on the object database storage
+layer (more loose objects, more objects in packs), and relies heavily on
+the packer’s delta compression to save space. Namespaces which are flat
+(e.g. thousands of tags in refs/tags) initially create very large loose
+objects, and so RefTree does not address the problem of copying many
+references to modify a handful.
+
+Flat namespaces are not efficiently searchable in RefTree, as tree
+objects in canonical formatting cannot be binary searched. This fails
+the need to handle a large number of references in a single namespace,
+such as GitHub’s `refs/pulls`, or a project with many tags.
+
+LMDB
+^^^^
+
+David Turner proposed
+https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/[using
+LMDB], as LMDB is lightweight (64k of runtime code) and GPL-compatible
+license.
+
+A downside of LMDB is its reliance on a single C implementation. This
+makes embedding inside JGit (a popular reimplemenation of Git)
+difficult, and hoisting onto virtual storage (for JGit DFS) virtually
+impossible.
+
+A common format that can be supported by all major Git implementations
+(git-core, JGit, libgit2) is strongly preferred.
+
+Future
+~~~~~~
+
+Longer hashes
+^^^^^^^^^^^^^
+
+Version will bump (e.g. 2) to indicate `value` uses a different object
+id length other than 20. The length could be stored in an expanded file
+header, or hardcoded as part of the version.
-- 
2.25.1.481.gfbce0eb801


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v7 0/6] Reftable support git-core
  2020-02-18  8:43         ` [PATCH v6 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                             ` (6 preceding siblings ...)
  2020-02-21  6:40           ` Jonathan Nieder
@ 2020-02-26  8:49           ` Han-Wen Nienhuys via GitGitGadget
  2020-02-26  8:49             ` [PATCH v7 1/6] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
                               ` (8 more replies)
  7 siblings, 9 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-26  8:49 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

Feedback wanted:

 * spots marked with XXX in the Git source code.
   
   
 * what is a good test strategy? Could we have a CI flavor where we flip the
   default to reftable, and see how it fares?
   
   

v7

 * support SHA256 as version 2 of the format.

v8

 * propagate errors to git.
 * discard empty tables in the stack. 
 * one very basic test (t0031.sh)

v9

 * Added spec as technical doc.
 * Use 4-byte hash ID as API. Storage format for SHA256 is still pending
   discussion.

Han-Wen Nienhuys (5):
  refs.h: clarify reflog iteration order
  create .git/refs in files-backend.c
  refs: document how ref_iterator_advance_fn should handle symrefs
  Add reftable library
  Reftable support for git-core

Jonathan Nieder (1):
  reftable: file format documentation

 Documentation/Makefile                        |    1 +
 Documentation/technical/reftable.txt          | 1067 ++++++++++++++++
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   24 +-
 builtin/clone.c                               |    4 +-
 builtin/init-db.c                             |   57 +-
 cache.h                                       |    4 +-
 refs.c                                        |   20 +-
 refs.h                                        |    8 +-
 refs/files-backend.c                          |    6 +
 refs/refs-internal.h                          |    6 +
 refs/reftable-backend.c                       | 1015 +++++++++++++++
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   11 +
 reftable/VERSION                              |    5 +
 reftable/basics.c                             |  160 +++
 reftable/basics.h                             |   30 +
 reftable/block.c                              |  413 ++++++
 reftable/block.h                              |   71 ++
 reftable/blocksource.h                        |   20 +
 reftable/bytes.c                              |    0
 reftable/config.h                             |    1 +
 reftable/constants.h                          |   25 +
 reftable/dump.c                               |   97 ++
 reftable/file.c                               |   97 ++
 reftable/iter.c                               |  229 ++++
 reftable/iter.h                               |   56 +
 reftable/merged.c                             |  290 +++++
 reftable/merged.h                             |   34 +
 reftable/pq.c                                 |  114 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  720 +++++++++++
 reftable/reader.h                             |   52 +
 reftable/record.c                             | 1119 +++++++++++++++++
 reftable/record.h                             |   79 ++
 reftable/reftable.h                           |  409 ++++++
 reftable/slice.c                              |  199 +++
 reftable/slice.h                              |   39 +
 reftable/stack.c                              | 1007 +++++++++++++++
 reftable/stack.h                              |   40 +
 reftable/system.h                             |   53 +
 reftable/tree.c                               |   66 +
 reftable/tree.h                               |   24 +
 reftable/update.sh                            |   13 +
 reftable/writer.c                             |  637 ++++++++++
 reftable/writer.h                             |   45 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |   31 +
 51 files changed, 8547 insertions(+), 32 deletions(-)
 create mode 100644 Documentation/technical/reftable.txt
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100755 t/t0031-reftable.sh


base-commit: 51ebf55b9309824346a6589c9f3b130c6f371b8f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v7
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v7
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v6:

 1:  e4b0773becd = 1:  b1e44bc431e refs.h: clarify reflog iteration order
 2:  c25b6c601dc = 2:  b68488a095e create .git/refs in files-backend.c
 3:  e132e0f4e00 ! 3:  da538ef7421 refs: document how ref_iterator_advance_fn should handle symrefs
     @@ -13,8 +13,8 @@
       
      +/*
      + * backend-specific implementation of ref_iterator_advance.
     -+ * For symrefs, the function should set REF_ISSYMREF, and it should also dereference
     -+ * the symref to provide the OID referent.
     ++ * For symrefs, the function should set REF_ISSYMREF, and it should also
     ++ * dereference the symref to provide the OID referent.
      + */
       typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
       
 -:  ----------- > 4:  aae26814983 reftable: file format documentation
 4:  fe29a9db399 ! 5:  30ed43a4fdb Add reftable library
     @@ -2,19 +2,11 @@
      
          Add reftable library
      
     -    Reftable is a new format for storing the ref database. It provides the
     -    following benefits:
     -
     -     * Simple and fast atomic ref transactions, including multiple refs and reflogs.
     -     * Compact storage of ref data.
     -     * Fast look ups of ref data.
     -     * Case-sensitive ref names on Windows/OSX, regardless of file system
     -     * Eliminates file/directory conflicts in ref names
     +    Reftable is a format for storing the ref database. Its rationale and
     +    specification is in the preceding commit.
      
          Further context and motivation can be found in background reading:
      
     -    * Spec: https://github.com/eclipse/jgit/blob/master/Documentation/technical/reftable.md
     -
          * Original discussion on JGit-dev:  https://www.eclipse.org/lists/jgit-dev/msg03389.html
      
          * First design discussion on git@vger: https://public-inbox.org/git/CAJo=hJtTp2eA3z9wW9cHo-nA7kK40vVThqh6inXpbCcqfdMP9g@mail.gmail.com/
     @@ -80,15 +72,7 @@
      +
      +To update the library, do:
      +
     -+   ((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
     -+    git clone https://github.com/google/reftable reftable-repo) && \
     -+   cp reftable-repo/c/*.[ch] reftable/ && \
     -+   cp reftable-repo/LICENSE reftable/ &&
     -+   git --git-dir reftable-repo/.git show --no-patch origin/master \
     -+    > reftable/VERSION && \
     -+   sed -i~ 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|' reftable/system.h
     -+   rm reftable/*_test.c reftable/test_framework.*
     -+   git add reftable/*.[ch]
     ++   sh reftable/update.sh
      +
      +Bugfixes should be accompanied by a test and applied to upstream project at
      +https://github.com/google/reftable.
     @@ -98,17 +82,11 @@
       --- /dev/null
       +++ b/reftable/VERSION
      @@
     -+commit 42aa04f301a682197f0698867b9771b78ce4fdaf
     ++commit ccce3b3eb763e79b23a3b4677d65ecceb4155ba3
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Mon Feb 17 16:05:22 2020 +0100
     -+
     -+    C: simplify system and config header handling
     -+    
     -+    reftable.h only depends on <stdint.h>
     -+    
     -+    system.h is only included from .c files. The git-core specific code is
     -+    guarded by a #if REFTABLE_IN_GITCORE, which we'll edit with a sed
     -+    script on the git-core side.
     ++Date:   Tue Feb 25 20:42:29 2020 +0100
     ++
     ++    C: rephrase the hash ID API in terms of a uint32_t
      
       diff --git a/reftable/basics.c b/reftable/basics.c
       new file mode 100644
     @@ -127,61 +105,23 @@
      +
      +#include "system.h"
      +
     -+void put_u24(byte *out, uint32_t i)
     ++void put_be24(byte *out, uint32_t i)
      +{
      +	out[0] = (byte)((i >> 16) & 0xff);
      +	out[1] = (byte)((i >> 8) & 0xff);
      +	out[2] = (byte)((i)&0xff);
      +}
      +
     -+uint32_t get_u24(byte *in)
     ++uint32_t get_be24(byte *in)
      +{
      +	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
      +	       (uint32_t)(in[2]);
      +}
      +
     -+void put_u32(byte *out, uint32_t i)
     ++void put_be16(uint8_t *out, uint16_t i)
      +{
     -+	out[0] = (byte)((i >> 24) & 0xff);
     -+	out[1] = (byte)((i >> 16) & 0xff);
     -+	out[2] = (byte)((i >> 8) & 0xff);
     -+	out[3] = (byte)((i)&0xff);
     -+}
     -+
     -+uint32_t get_u32(byte *in)
     -+{
     -+	return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
     -+	       (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
     -+}
     -+
     -+void put_u64(byte *out, uint64_t v)
     -+{
     -+	int i = 0;
     -+	for (i = sizeof(uint64_t); i--;) {
     -+		out[i] = (byte)(v & 0xff);
     -+		v >>= 8;
     -+	}
     -+}
     -+
     -+uint64_t get_u64(byte *out)
     -+{
     -+	uint64_t v = 0;
     -+	int i = 0;
     -+	for (i = 0; i < sizeof(uint64_t); i++) {
     -+		v = (v << 8) | (byte)(out[i] & 0xff);
     -+	}
     -+	return v;
     -+}
     -+
     -+void put_u16(byte *out, uint16_t i)
     -+{
     -+	out[0] = (byte)((i >> 8) & 0xff);
     -+	out[1] = (byte)((i)&0xff);
     -+}
     -+
     -+uint16_t get_u16(byte *in)
     -+{
     -+	return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
     ++	out[0] = (uint8_t)((i >> 8) & 0xff);
     ++	out[1] = (uint8_t)((i)&0xff);
      +}
      +
      +/*
     @@ -234,7 +174,9 @@
      +int names_length(char **names)
      +{
      +	int len = 0;
     -+	for (char **p = names; *p; p++) {
     ++	char **p = names;
     ++	while (*p) {
     ++		p++;
      +		len++;
      +	}
      +	return len;
     @@ -262,7 +204,7 @@
      +				names = realloc(names,
      +						names_cap * sizeof(char *));
      +			}
     -+			names[names_len++] = strdup(p);
     ++			names[names_len++] = xstrdup(p);
      +		}
      +		p = next + 1;
      +	}
     @@ -335,17 +277,10 @@
      +#define true 1
      +#define false 0
      +
     -+void put_u24(byte *out, uint32_t i);
     -+uint32_t get_u24(byte *in);
     -+
     -+uint64_t get_u64(byte *in);
     -+void put_u64(byte *out, uint64_t i);
     ++void put_be24(byte *out, uint32_t i);
     ++uint32_t get_be24(byte *in);
     ++void put_be16(uint8_t *out, uint16_t i);
      +
     -+void put_u32(byte *out, uint32_t i);
     -+uint32_t get_u32(byte *in);
     -+
     -+void put_u16(byte *out, uint16_t i);
     -+uint16_t get_u16(byte *in);
      +int binsearch(int sz, int (*f)(int k, void *args), void *args);
      +
      +void free_names(char **a);
     @@ -403,7 +338,7 @@
      +   success */
      +int block_writer_add(struct block_writer *w, struct record rec)
      +{
     -+	struct slice empty = {};
     ++	struct slice empty = { 0 };
      +	struct slice last = w->entries % w->restart_interval == 0 ? empty :
      +								    w->last_key;
      +	struct slice out = {
     @@ -414,7 +349,7 @@
      +	struct slice start = out;
      +
      +	bool restart = false;
     -+	struct slice key = {};
     ++	struct slice key = { 0 };
      +	int n = 0;
      +
      +	record_key(rec, &key);
     @@ -480,17 +415,17 @@
      +{
      +	int i = 0;
      +	for (i = 0; i < w->restart_len; i++) {
     -+		put_u24(w->buf + w->next, w->restarts[i]);
     ++		put_be24(w->buf + w->next, w->restarts[i]);
      +		w->next += 3;
      +	}
      +
     -+	put_u16(w->buf + w->next, w->restart_len);
     ++	put_be16(w->buf + w->next, w->restart_len);
      +	w->next += 2;
     -+	put_u24(w->buf + 1 + w->header_off, w->next);
     ++	put_be24(w->buf + 1 + w->header_off, w->next);
      +
      +	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
      +		int block_header_skip = 4 + w->header_off;
     -+		struct slice compressed = {};
     ++		struct slice compressed = { 0 };
      +		int zresult = 0;
      +		uLongf src_len = w->next - block_header_skip;
      +		slice_resize(&compressed, src_len);
     @@ -531,14 +466,14 @@
      +{
      +	uint32_t full_block_size = table_block_size;
      +	byte typ = block->data[header_off];
     -+	uint32_t sz = get_u24(block->data + header_off + 1);
     ++	uint32_t sz = get_be24(block->data + header_off + 1);
      +
      +	if (!is_block_type(typ)) {
      +		return FORMAT_ERROR;
      +	}
      +
      +	if (typ == BLOCK_TYPE_LOG) {
     -+		struct slice uncompressed = {};
     ++		struct slice uncompressed = { 0 };
      +		int block_header_skip = 4 + header_off;
      +		uLongf dst_len = sz - block_header_skip;
      +		uLongf src_len = block->len - block_header_skip;
     @@ -570,7 +505,7 @@
      +	}
      +
      +	{
     -+		uint16_t restart_count = get_u16(block->data + sz - 2);
     ++		uint16_t restart_count = get_be16(block->data + sz - 2);
      +		uint32_t restart_start = sz - 2 - 3 * restart_count;
      +
      +		byte *restart_bytes = block->data + restart_start;
     @@ -593,7 +528,7 @@
      +
      +static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
      +{
     -+	return get_u24(br->restart_bytes + 3 * i);
     ++	return get_be24(br->restart_bytes + 3 * i);
      +}
      +
      +void block_reader_start(struct block_reader *br, struct block_iter *it)
     @@ -604,9 +539,9 @@
      +}
      +
      +struct restart_find_args {
     ++	int error;
      +	struct slice key;
      +	struct block_reader *r;
     -+	int error;
      +};
      +
      +static int restart_key_less(int idx, void *args)
     @@ -620,8 +555,8 @@
      +
      +	/* the restart key is verbatim in the block, so this could avoid the
      +	   alloc for decoding the key */
     -+	struct slice rkey = {};
     -+	struct slice last_key = {};
     ++	struct slice rkey = { 0 };
     ++	struct slice last_key = { 0 };
      +	byte unused_extra;
      +	int n = decode_key(&rkey, &unused_extra, last_key, in);
      +	if (n < 0) {
     @@ -656,7 +591,7 @@
      +			.len = it->br->block_len - it->next_off,
      +		};
      +		struct slice start = in;
     -+		struct slice key = {};
     ++		struct slice key = { 0 };
      +		byte extra;
      +		int n = decode_key(&key, &extra, it->last_key, in);
      +		if (n < 0) {
     @@ -681,7 +616,7 @@
      +
      +int block_reader_first_key(struct block_reader *br, struct slice *key)
      +{
     -+	struct slice empty = {};
     ++	struct slice empty = { 0 };
      +	int off = br->header_off + 4;
      +	struct slice in = {
      +		.buf = br->block.data + off,
     @@ -729,10 +664,10 @@
      +
      +	{
      +		struct record rec = new_record(block_reader_type(br));
     -+		struct slice key = {};
     ++		struct slice key = { 0 };
      +		int result = 0;
      +		int err = 0;
     -+		struct block_iter next = {};
     ++		struct block_iter next = { 0 };
      +		while (true) {
      +			block_iter_copy_from(&next, it);
      +
     @@ -830,9 +765,9 @@
      +};
      +
      +struct block_iter {
     ++	uint32_t next_off;
      +	struct block_reader *br;
      +	struct slice last_key;
     -+	uint32_t next_off;
      +};
      +
      +int block_reader_init(struct block_reader *br, struct block *bl,
     @@ -937,7 +872,7 @@
      +
      +static int dump_table(const char *tablename)
      +{
     -+	struct block_source src = {};
     ++	struct block_source src = { 0 };
      +	int err = block_source_from_file(&src, tablename);
      +	if (err < 0) {
      +		return err;
     @@ -950,13 +885,13 @@
      +	}
      +
      +	{
     -+		struct iterator it = {};
     ++		struct iterator it = { 0 };
      +		err = reader_seek_ref(r, &it, "");
      +		if (err < 0) {
      +			return err;
      +		}
      +
     -+		struct ref_record ref = {};
     ++		struct ref_record ref = { 0 };
      +		while (1) {
      +			err = iterator_next_ref(it, &ref);
      +			if (err > 0) {
     @@ -972,12 +907,12 @@
      +	}
      +
      +	{
     -+		struct iterator it = {};
     ++		struct iterator it = { 0 };
      +		err = reader_seek_log(r, &it, "");
      +		if (err < 0) {
      +			return err;
      +		}
     -+		struct log_record log = {};
     ++		struct log_record log = { 0 };
      +		while (1) {
      +			err = iterator_next_log(it, &log);
      +			if (err > 0) {
     @@ -1001,7 +936,7 @@
      +	while ((opt = getopt(argc, argv, "t:")) != -1) {
      +		switch (opt) {
      +		case 't':
     -+			table = strdup(optarg);
     ++			table = xstrdup(optarg);
      +			break;
      +		case '?':
      +			printf("usage: %s [-table tablefile]\n", argv[0]);
     @@ -1091,7 +1026,7 @@
      +
      +int block_source_from_file(struct block_source *bs, const char *name)
      +{
     -+	struct stat st = {};
     ++	struct stat st = { 0 };
      +	int err = 0;
      +	int fd = open(name, O_RDONLY);
      +	if (fd < 0) {
     @@ -1188,14 +1123,14 @@
      +
      +int iterator_next_ref(struct iterator it, struct ref_record *ref)
      +{
     -+	struct record rec = {};
     ++	struct record rec = { 0 };
      +	record_from_ref(&rec, ref);
      +	return iterator_next(it, rec);
      +}
      +
      +int iterator_next_log(struct iterator it, struct log_record *log)
      +{
     -+	struct record rec = {};
     ++	struct record rec = { 0 };
      +	record_from_log(&rec, log);
      +	return iterator_next(it, rec);
      +}
     @@ -1221,7 +1156,7 @@
      +		}
      +
      +		if (fri->double_check) {
     -+			struct iterator it = {};
     ++			struct iterator it = { 0 };
      +
      +			int err = reader_seek_ref(fri->r, &it, ref->ref_name);
      +			if (err == 0) {
     @@ -1389,9 +1324,9 @@
      +bool iterator_is_null(struct iterator it);
      +
      +struct filtering_ref_iterator {
     ++	bool double_check;
      +	struct reader *r;
      +	struct slice oid;
     -+	bool double_check;
      +	struct iterator it;
      +};
      +
     @@ -1511,7 +1446,7 @@
      +
      +static int merged_iter_next(struct merged_iter *mi, struct record rec)
      +{
     -+	struct slice entry_key = {};
     ++	struct slice entry_key = { 0 };
      +	struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
      +	int err = merged_iter_advance_subiter(mi, entry.index);
      +	if (err < 0) {
     @@ -1521,7 +1456,7 @@
      +	record_key(entry.rec, &entry_key);
      +	while (!merged_iter_pqueue_is_empty(mi->pq)) {
      +		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
     -+		struct slice k = {};
     ++		struct slice k = { 0 };
      +		int err = 0, cmp = 0;
      +
      +		record_key(top.rec, &k);
     @@ -1542,7 +1477,7 @@
      +		free(record_yield(&top.rec));
      +	}
      +
     -+	record_copy_from(rec, entry.rec, mi->hash_size);
     ++	record_copy_from(rec, entry.rec, hash_size(mi->hash_id));
      +	record_clear(entry.rec);
      +	free(record_yield(&entry.rec));
      +	free(slice_yield(&entry_key));
     @@ -1572,14 +1507,14 @@
      +}
      +
      +int new_merged_table(struct merged_table **dest, struct reader **stack, int n,
     -+		     int hash_size)
     ++		     uint32_t hash_id)
      +{
      +	uint64_t last_max = 0;
      +	uint64_t first_min = 0;
      +	int i = 0;
      +	for (i = 0; i < n; i++) {
      +		struct reader *r = stack[i];
     -+		if (reader_hash_size(r) != hash_size) {
     ++		if (r->hash_id != hash_id) {
      +			return FORMAT_ERROR;
      +		}
      +		if (i > 0 && last_max >= reader_min_update_index(r)) {
     @@ -1598,7 +1533,7 @@
      +			.stack_len = n,
      +			.min = first_min,
      +			.max = last_max,
     -+			.hash_size = hash_size,
     ++			.hash_id = hash_id,
      +		};
      +
      +		*dest = calloc(sizeof(struct merged_table), 1);
     @@ -1650,7 +1585,7 @@
      +	struct merged_iter merged = {
      +		.stack = iters,
      +		.typ = record_type(rec),
     -+		.hash_size = mt->hash_size,
     ++		.hash_id = mt->hash_id,
      +	};
      +	int n = 0;
      +	int err = 0;
     @@ -1693,7 +1628,7 @@
      +	struct ref_record ref = {
      +		.ref_name = (char *)name,
      +	};
     -+	struct record rec = {};
     ++	struct record rec = { 0 };
      +	record_from_ref(&rec, &ref);
      +	return merged_table_seek_record(mt, it, rec);
      +}
     @@ -1705,7 +1640,7 @@
      +		.ref_name = (char *)name,
      +		.update_index = update_index,
      +	};
     -+	struct record rec = {};
     ++	struct record rec = { 0 };
      +	record_from_log(&rec, &log);
      +	return merged_table_seek_record(mt, it, rec);
      +}
     @@ -1739,7 +1674,7 @@
      +struct merged_table {
      +	struct reader **stack;
      +	int stack_len;
     -+	int hash_size;
     ++	uint32_t hash_id;
      +
      +	uint64_t min;
      +	uint64_t max;
     @@ -1747,7 +1682,7 @@
      +
      +struct merged_iter {
      +	struct iterator *stack;
     -+	int hash_size;
     ++	uint32_t hash_id;
      +	int stack_len;
      +	byte typ;
      +	struct merged_iter_pqueue pq;
     @@ -1776,8 +1711,8 @@
      +
      +int pq_less(struct pq_entry a, struct pq_entry b)
      +{
     -+	struct slice ak = {};
     -+	struct slice bk = {};
     ++	struct slice ak = { 0 };
     ++	struct slice bk = { 0 };
      +	int cmp = 0;
      +	record_key(a.rec, &ak);
      +	record_key(b.rec, &bk);
     @@ -1896,8 +1831,8 @@
      +#include "record.h"
      +
      +struct pq_entry {
     -+	struct record rec;
      +	int index;
     ++	struct record rec;
      +};
      +
      +int pq_less(struct pq_entry a, struct pq_entry b);
     @@ -2005,9 +1940,9 @@
      +	block_source_return_block(r->source, p);
      +}
      +
     -+int reader_hash_size(struct reader *r)
     ++uint32_t reader_hash_id(struct reader *r)
      +{
     -+	return r->hash_size;
     ++	return r->hash_id;
      +}
      +
      +const char *reader_name(struct reader *r)
     @@ -2030,11 +1965,12 @@
      +		goto exit;
      +	}
      +
     -+	r->hash_size = SHA1_SIZE;
     ++	r->hash_id = SHA1_ID;
      +	{
      +		byte version = *f++;
      +		if (version == 2) {
     -+			r->hash_size = SHA256_SIZE;
     ++			/* DO NOT SUBMIT.  Not yet in the standard. */
     ++			r->hash_id = SHA256_ID;
      +			version = 1;
      +		}
      +		if (version != 1) {
     @@ -2043,33 +1979,33 @@
      +		}
      +	}
      +
     -+	r->block_size = get_u24(f);
     ++	r->block_size = get_be24(f);
      +
      +	f += 3;
     -+	r->min_update_index = get_u64(f);
     ++	r->min_update_index = get_be64(f);
      +	f += 8;
     -+	r->max_update_index = get_u64(f);
     ++	r->max_update_index = get_be64(f);
      +	f += 8;
      +
     -+	r->ref_offsets.index_offset = get_u64(f);
     ++	r->ref_offsets.index_offset = get_be64(f);
      +	f += 8;
      +
     -+	r->obj_offsets.offset = get_u64(f);
     ++	r->obj_offsets.offset = get_be64(f);
      +	f += 8;
      +
      +	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
      +	r->obj_offsets.offset >>= 5;
      +
     -+	r->obj_offsets.index_offset = get_u64(f);
     ++	r->obj_offsets.index_offset = get_be64(f);
      +	f += 8;
     -+	r->log_offsets.offset = get_u64(f);
     ++	r->log_offsets.offset = get_be64(f);
      +	f += 8;
     -+	r->log_offsets.index_offset = get_u64(f);
     ++	r->log_offsets.index_offset = get_be64(f);
      +	f += 8;
      +
      +	{
      +		uint32_t computed_crc = crc32(0, footer, f - footer);
     -+		uint32_t file_crc = get_u32(f);
     ++		uint32_t file_crc = get_be32(f);
      +		f += 4;
      +		if (computed_crc != file_crc) {
      +			err = FORMAT_ERROR;
     @@ -2092,15 +2028,15 @@
      +
      +int init_reader(struct reader *r, struct block_source source, const char *name)
      +{
     -+	struct block footer = {};
     -+	struct block header = {};
     ++	struct block footer = { 0 };
     ++	struct block header = { 0 };
      +	int err = 0;
      +
      +	memset(r, 0, sizeof(struct reader));
      +	r->size = block_source_size(source) - FOOTER_SIZE;
      +	r->source = source;
     -+	r->name = strdup(name);
     -+	r->hash_size = 0;
     ++	r->name = xstrdup(name);
     ++	r->hash_id = 0;
      +
      +	err = block_source_read_block(source, &footer, r->size, FOOTER_SIZE);
      +	if (err != FOOTER_SIZE) {
     @@ -2173,7 +2109,7 @@
      +
      +	*typ = data[0];
      +	if (is_block_type(*typ)) {
     -+		result = get_u24(data + 1);
     ++		result = get_be24(data + 1);
      +	}
      +	return result;
      +}
     @@ -2183,7 +2119,7 @@
      +{
      +	int32_t guess_block_size = r->block_size ? r->block_size :
      +						   DEFAULT_BLOCK_SIZE;
     -+	struct block block = {};
     ++	struct block block = { 0 };
      +	byte block_typ = 0;
      +	int err = 0;
      +	uint32_t header_off = next_off ? 0 : HEADER_SIZE;
     @@ -2217,14 +2153,14 @@
      +	}
      +
      +	return block_reader_init(br, &block, header_off, r->block_size,
     -+				 r->hash_size);
     ++				 hash_size(r->hash_id));
      +}
      +
      +static int table_iter_next_block(struct table_iter *dest,
      +				 struct table_iter *src)
      +{
      +	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
     -+	struct block_reader br = {};
     ++	struct block_reader br = { 0 };
      +	int err = 0;
      +
      +	dest->r = src->r;
     @@ -2257,7 +2193,7 @@
      +	}
      +
      +	while (true) {
     -+		struct table_iter next = {};
     ++		struct table_iter next = { 0 };
      +		int err = 0;
      +		if (ti->finished) {
      +			return 1;
     @@ -2307,7 +2243,7 @@
      +static int reader_table_iter_at(struct reader *r, struct table_iter *ti,
      +				uint64_t off, byte typ)
      +{
     -+	struct block_reader br = {};
     ++	struct block_reader br = { 0 };
      +	struct block_reader *brp = NULL;
      +
      +	int err = reader_init_block_reader(r, &br, off, typ);
     @@ -2344,9 +2280,9 @@
      +			      struct record want)
      +{
      +	struct record rec = new_record(record_type(want));
     -+	struct slice want_key = {};
     -+	struct slice got_key = {};
     -+	struct table_iter next = {};
     ++	struct slice want_key = { 0 };
     ++	struct slice got_key = { 0 };
     ++	struct table_iter next = { 0 };
      +	int err = -1;
      +	record_key(want, &want_key);
      +
     @@ -2394,12 +2330,12 @@
      +static int reader_seek_indexed(struct reader *r, struct iterator *it,
      +			       struct record rec)
      +{
     -+	struct index_record want_index = {};
     -+	struct record want_index_rec = {};
     -+	struct index_record index_result = {};
     -+	struct record index_result_rec = {};
     -+	struct table_iter index_iter = {};
     -+	struct table_iter next = {};
     ++	struct index_record want_index = { 0 };
     ++	struct record want_index_rec = { 0 };
     ++	struct index_record index_result = { 0 };
     ++	struct record index_result_rec = { 0 };
     ++	struct table_iter index_iter = { 0 };
     ++	struct table_iter next = { 0 };
      +	int err = 0;
      +
      +	record_key(rec, &want_index.last_key);
     @@ -2461,7 +2397,7 @@
      +{
      +	struct reader_offsets *offs = reader_offsets_for(r, record_type(rec));
      +	uint64_t idx = offs->index_offset;
     -+	struct table_iter ti = {};
     ++	struct table_iter ti = { 0 };
      +	int err = 0;
      +	if (idx > 0) {
      +		return reader_seek_indexed(r, it, rec);
     @@ -2503,7 +2439,7 @@
      +	struct ref_record ref = {
      +		.ref_name = (char *)name,
      +	};
     -+	struct record rec = {};
     ++	struct record rec = { 0 };
      +	record_from_ref(&rec, &ref);
      +	return reader_seek(r, it, rec);
      +}
     @@ -2515,7 +2451,7 @@
      +		.ref_name = (char *)name,
      +		.update_index = update_index,
      +	};
     -+	struct record rec = {};
     ++	struct record rec = { 0 };
      +	record_from_log(&rec, &log);
      +	return reader_seek(r, it, rec);
      +}
     @@ -2557,10 +2493,10 @@
      +		.hash_prefix = oid,
      +		.hash_prefix_len = r->object_id_len,
      +	};
     -+	struct record want_rec = {};
     -+	struct iterator oit = {};
     -+	struct obj_record got = {};
     -+	struct record got_rec = {};
     ++	struct record want_rec = { 0 };
     ++	struct iterator oit = { 0 };
     ++	struct obj_record got = { 0 };
     ++	struct record got_rec = { 0 };
      +	int err = 0;
      +
      +	record_from_obj(&want_rec, &want);
     @@ -2585,7 +2521,8 @@
      +
      +	{
      +		struct indexed_table_ref_iter *itr = NULL;
     -+		err = new_indexed_table_ref_iter(&itr, r, oid, r->hash_size,
     ++		err = new_indexed_table_ref_iter(&itr, r, oid,
     ++						 hash_size(r->hash_id),
      +						 got.offsets, got.offset_len);
      +		if (err < 0) {
      +			record_clear(got_rec);
     @@ -2675,9 +2612,9 @@
      +};
      +
      +struct reader {
     -+	struct block_source source;
      +	char *name;
     -+	int hash_size;
     ++	struct block_source source;
     ++	uint32_t hash_id;
      +	uint64_t size;
      +	uint32_t block_size;
      +	uint64_t min_update_index;
     @@ -2755,7 +2692,7 @@
      +
      +int put_var_int(struct slice dest, uint64_t val)
      +{
     -+	byte buf[10] = {};
     ++	byte buf[10] = { 0 };
      +	int i = 9;
      +	buf[i] = (byte)(val & 0x7f);
      +	i--;
     @@ -2868,11 +2805,11 @@
      +	   fields. */
      +	ref_record_clear(ref);
      +	if (src->ref_name != NULL) {
     -+		ref->ref_name = strdup(src->ref_name);
     ++		ref->ref_name = xstrdup(src->ref_name);
      +	}
      +
      +	if (src->target != NULL) {
     -+		ref->target = strdup(src->target);
     ++		ref->target = xstrdup(src->target);
      +	}
      +
      +	if (src->target_value != NULL) {
     @@ -2910,7 +2847,7 @@
      +
      +void ref_record_print(struct ref_record *ref, int hash_size)
      +{
     -+	char hex[SHA256_SIZE + 1] = {};
     ++	char hex[SHA256_SIZE + 1] = { 0 };
      +
      +	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
      +	if (ref->value != NULL) {
     @@ -3066,7 +3003,7 @@
      +		in.len -= hash_size;
      +		break;
      +	case 3: {
     -+		struct slice dest = {};
     ++		struct slice dest = { 0 };
      +		int n = decode_string(&dest, in);
      +		if (n < 0) {
      +			return -1;
     @@ -3302,7 +3239,7 @@
      +
      +void log_record_print(struct log_record *log, int hash_size)
      +{
     -+	char hex[SHA256_SIZE + 1] = {};
     ++	char hex[SHA256_SIZE + 1] = { 0 };
      +
      +	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
      +	       log->update_index, log->name, log->email, log->time,
     @@ -3326,7 +3263,7 @@
      +	slice_resize(dest, len + 9);
      +	memcpy(dest->buf, rec->ref_name, len + 1);
      +	ts = (~ts) - rec->update_index;
     -+	put_u64(dest->buf + 1 + len, ts);
     ++	put_be64(dest->buf + 1 + len, ts);
      +}
      +
      +static void log_record_copy_from(void *rec, const void *src_rec, int hash_size)
     @@ -3335,10 +3272,10 @@
      +	const struct log_record *src = (const struct log_record *)src_rec;
      +
      +	*dst = *src;
     -+	dst->ref_name = strdup(dst->ref_name);
     -+	dst->email = strdup(dst->email);
     -+	dst->name = strdup(dst->name);
     -+	dst->message = strdup(dst->message);
     ++	dst->ref_name = xstrdup(dst->ref_name);
     ++	dst->email = xstrdup(dst->email);
     ++	dst->name = xstrdup(dst->name);
     ++	dst->message = xstrdup(dst->message);
      +	if (dst->new_hash != NULL) {
      +		dst->new_hash = malloc(hash_size);
      +		memcpy(dst->new_hash, src->new_hash, hash_size);
     @@ -3371,7 +3308,7 @@
      +	return 1;
      +}
      +
     -+static byte zero[SHA256_SIZE] = {};
     ++static byte zero[SHA256_SIZE] = { 0 };
      +
      +static int log_record_encode(const void *rec, struct slice s, int hash_size)
      +{
     @@ -3421,7 +3358,7 @@
      +		return -1;
      +	}
      +
     -+	put_u16(s.buf, r->tz_offset);
     ++	put_be16(s.buf, r->tz_offset);
      +	s.buf += 2;
      +	s.len -= 2;
      +
     @@ -3442,7 +3379,7 @@
      +	struct log_record *r = (struct log_record *)rec;
      +	uint64_t max = 0;
      +	uint64_t ts = 0;
     -+	struct slice dest = {};
     ++	struct slice dest = { 0 };
      +	int n;
      +
      +	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
     @@ -3451,7 +3388,7 @@
      +
      +	r->ref_name = realloc(r->ref_name, key.len - 8);
      +	memcpy(r->ref_name, key.buf, key.len - 8);
     -+	ts = get_u64(key.buf + key.len - 8);
     ++	ts = get_be64(key.buf + key.len - 8);
      +
      +	r->update_index = (~max) - ts;
      +
     @@ -3503,7 +3440,7 @@
      +		goto error;
      +	}
      +
     -+	r->tz_offset = get_u16(in.buf);
     ++	r->tz_offset = get_be16(in.buf);
      +	in.buf += 2;
      +	in.len -= 2;
      +
     @@ -3810,6 +3747,18 @@
      +{
      +	/* XXX */
      +	return false;
     ++}
     ++
     ++int hash_size(uint32_t id)
     ++{
     ++	switch (id) {
     ++	case 0:
     ++	case SHA1_ID:
     ++		return SHA1_SIZE;
     ++	case SHA256_ID:
     ++		return SHA256_SIZE;
     ++	}
     ++	abort();
      +}
      
       diff --git a/reftable/record.h b/reftable/record.h
     @@ -3842,7 +3791,7 @@
      +	void (*clear)(void *rec);
      +};
      +
     -+/* record is a generic wrapper for differnt types of records. */
     ++/* record is a generic wrapper for different types of records. */
      +struct record {
      +	void *data;
      +	struct record_vtable *ops;
     @@ -3863,8 +3812,8 @@
      +	       struct slice in);
      +
      +struct index_record {
     -+	struct slice last_key;
      +	uint64_t offset;
     ++	struct slice last_key;
      +};
      +
      +struct obj_record {
     @@ -3965,9 +3914,10 @@
      +	/* how often to write complete keys in each block. */
      +	int restart_interval;
      +
     -+	/* width of the hash. Should be 20 for SHA1 or 32 for SHA256. Defaults
     -+	 * to SHA1 if unset */
     -+	int hash_size;
     ++	/* 4-byte identifier ("sha1", "s256") of the hash.
     ++	 * Defaults to SHA1 if unset
     ++	 */
     ++	uint32_t hash_id;
      +};
      +
      +/* ref_record holds a ref database entry target_value */
     @@ -4102,6 +4052,9 @@
      +/* Decompression error */
      +#define ZLIB_ERROR -7
      +
     ++/* Wrote a table without blocks. */
     ++#define EMPTY_TABLE_ERROR -8
     ++
      +const char *error_str(int err);
      +
      +/* new_writer creates a new writer */
     @@ -4162,10 +4115,10 @@
      +   struct reader *r = NULL;
      +   int err = new_reader(&r, src, "filename");
      +   if (err < 0) { ... }
     -+   struct iterator it = {};
     ++   struct iterator it  = {0};
      +   err = reader_seek_ref(r, &it, "refs/heads/master");
      +   if (err < 0) { ... }
     -+   struct ref_record ref = {};
     ++   struct ref_record ref  = {0};
      +   while (1) {
      +     err = iterator_next_ref(it, &ref);
      +     if (err > 0) {
     @@ -4181,8 +4134,8 @@
      + */
      +int reader_seek_ref(struct reader *r, struct iterator *it, const char *name);
      +
     -+/* returns the hash size used in this table. */
     -+int reader_hash_size(struct reader *r);
     ++/* returns the hash ID used in this table. */
     ++uint32_t reader_hash_id(struct reader *r);
      +
      +/* seek to logs for the given name, older than update_index. */
      +int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
     @@ -4211,10 +4164,10 @@
      +   array.
      +*/
      +int new_merged_table(struct merged_table **dest, struct reader **stack, int n,
     -+		     int hash_size);
     ++		     uint32_t hash_id);
      +
     -+/* returns the hash size used in this merged table. */
     -+int merged_hash_size(struct merged_table *mt);
     ++/* returns the hash id used in this merged table. */
     ++uint32_t merged_hash_id(struct merged_table *mt);
      +
      +/* returns an iterator positioned just before 'name' */
      +int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
     @@ -4403,7 +4356,7 @@
      +/* return a newly malloced string for this slice */
      +char *slice_to_string(struct slice in)
      +{
     -+	struct slice s = {};
     ++	struct slice s = { 0 };
      +	slice_resize(&s, in.len + 1);
      +	s.buf[in.len] = 0;
      +	memcpy(s.buf, in.buf, in.len);
     @@ -4585,8 +4538,8 @@
      +	struct stack *p = calloc(sizeof(struct stack), 1);
      +	int err = 0;
      +	*dest = NULL;
     -+	p->list_file = strdup(list_file);
     -+	p->reftable_dir = strdup(dir);
     ++	p->list_file = xstrdup(list_file);
     ++	p->reftable_dir = xstrdup(dir);
      +	p->config = config;
      +
      +	err = stack_reload(p);
     @@ -4690,7 +4643,7 @@
      +	int new_tables_len = 0;
      +	struct merged_table *new_merged = NULL;
      +
     -+	struct slice table_path = {};
     ++	struct slice table_path = { 0 };
      +
      +	while (*names) {
      +		struct reader *rd = NULL;
     @@ -4708,7 +4661,7 @@
      +		}
      +
      +		if (rd == NULL) {
     -+			struct block_source src = {};
     ++			struct block_source src = { 0 };
      +			slice_set_string(&table_path, st->reftable_dir);
      +			slice_append_string(&table_path, "/");
      +			slice_append_string(&table_path, name);
     @@ -4730,7 +4683,7 @@
      +
      +	/* success! */
      +	err = new_merged_table(&new_merged, new_tables, new_tables_len,
     -+			       st->config.hash_size);
     ++			       st->config.hash_id);
      +	if (err < 0) {
      +		goto exit;
      +	}
     @@ -4780,7 +4733,7 @@
      +
      +static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open)
      +{
     -+	struct timeval deadline = {};
     ++	struct timeval deadline = { 0 };
      +	int err = gettimeofday(&deadline, NULL);
      +	int64_t delay = 0;
      +	int tries = 0;
     @@ -4792,7 +4745,7 @@
      +	while (true) {
      +		char **names = NULL;
      +		char **names_after = NULL;
     -+		struct timeval now = {};
     ++		struct timeval now = { 0 };
      +		int err = gettimeofday(&now, NULL);
      +		if (err < 0) {
      +			return err;
     @@ -4904,11 +4857,11 @@
      +int stack_try_add(struct stack *st,
      +		  int (*write_table)(struct writer *wr, void *arg), void *arg)
      +{
     -+	struct slice lock_name = {};
     -+	struct slice temp_tab_name = {};
     -+	struct slice tab_name = {};
     -+	struct slice next_name = {};
     -+	struct slice table_list = {};
     ++	struct slice lock_name = { 0 };
     ++	struct slice temp_tab_name = { 0 };
     ++	struct slice tab_name = { 0 };
     ++	struct slice next_name = { 0 };
     ++	struct slice table_list = { 0 };
      +	struct writer *wr = NULL;
      +	int err = 0;
      +	int tab_fd = 0;
     @@ -4962,6 +4915,10 @@
      +	}
      +
      +	err = writer_close(wr);
     ++	if (err == EMPTY_TABLE_ERROR) {
     ++		err = 0;
     ++		goto exit;
     ++	}
      +	if (err < 0) {
      +		goto exit;
      +	}
     @@ -5062,7 +5019,7 @@
      +				struct slice *temp_tab,
      +				struct log_expiry_config *config)
      +{
     -+	struct slice next_name = {};
     ++	struct slice next_name = { 0 };
      +	int tab_fd = -1;
      +	struct writer *wr = NULL;
      +	int err = 0;
     @@ -5113,9 +5070,9 @@
      +		calloc(sizeof(struct reader *), last - first + 1);
      +	struct merged_table *mt = NULL;
      +	int err = 0;
     -+	struct iterator it = {};
     -+	struct ref_record ref = {};
     -+	struct log_record log = {};
     ++	struct iterator it = { 0 };
     ++	struct ref_record ref = { 0 };
     ++	struct log_record log = { 0 };
      +
      +	int i = 0, j = 0;
      +	for (i = first, j = 0; i <= last; i++) {
     @@ -5126,7 +5083,7 @@
      +	writer_set_limits(wr, st->merged->stack[first]->min_update_index,
      +			  st->merged->stack[last]->max_update_index);
      +
     -+	err = new_merged_table(&mt, subtabs, subtabs_len, st->config.hash_size);
     ++	err = new_merged_table(&mt, subtabs, subtabs_len, st->config.hash_id);
      +	if (err < 0) {
      +		free(subtabs);
      +		goto exit;
     @@ -5207,11 +5164,11 @@
      +static int stack_compact_range(struct stack *st, int first, int last,
      +			       struct log_expiry_config *expiry)
      +{
     -+	struct slice temp_tab_name = {};
     -+	struct slice new_table_name = {};
     -+	struct slice lock_file_name = {};
     -+	struct slice ref_list_contents = {};
     -+	struct slice new_table_path = {};
     ++	struct slice temp_tab_name = { 0 };
     ++	struct slice new_table_name = { 0 };
     ++	struct slice lock_file_name = { 0 };
     ++	struct slice ref_list_contents = { 0 };
     ++	struct slice new_table_path = { 0 };
      +	int err = 0;
      +	bool have_lock = false;
      +	int lock_file_fd = 0;
     @@ -5220,6 +5177,7 @@
      +	char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
      +	int i = 0;
      +	int j = 0;
     ++	bool is_empty_table = false;
      +
      +	if (first > last || (expiry == NULL && first == last)) {
      +		err = 0;
     @@ -5248,8 +5206,8 @@
      +	}
      +
      +	for (i = first, j = 0; i <= last; i++) {
     -+		struct slice subtab_name = {};
     -+		struct slice subtab_lock = {};
     ++		struct slice subtab_name = { 0 };
     ++		struct slice subtab_lock = { 0 };
      +		slice_set_string(&subtab_name, st->reftable_dir);
      +		slice_append_string(&subtab_name, "/");
      +		slice_append_string(&subtab_name,
     @@ -5288,6 +5246,12 @@
      +	have_lock = false;
      +
      +	err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
     ++	/* Compaction + tombstones can create an empty table out of non-empty
     ++	 * tables. */
     ++	is_empty_table = (err == EMPTY_TABLE_ERROR);
     ++	if (is_empty_table) {
     ++		err = 0;
     ++	}
      +	if (err < 0) {
      +		goto exit;
      +	}
     @@ -5313,10 +5277,12 @@
      +
      +	slice_append(&new_table_path, new_table_name);
      +
     -+	err = rename(slice_as_string(&temp_tab_name),
     -+		     slice_as_string(&new_table_path));
     -+	if (err < 0) {
     -+		goto exit;
     ++	if (!is_empty_table) {
     ++		err = rename(slice_as_string(&temp_tab_name),
     ++			     slice_as_string(&new_table_path));
     ++		if (err < 0) {
     ++			goto exit;
     ++		}
      +	}
      +
      +	for (i = 0; i < first; i++) {
     @@ -5324,8 +5290,10 @@
      +				    st->merged->stack[i]->name);
      +		slice_append_string(&ref_list_contents, "\n");
      +	}
     -+	slice_append(&ref_list_contents, new_table_name);
     -+	slice_append_string(&ref_list_contents, "\n");
     ++	if (!is_empty_table) {
     ++		slice_append(&ref_list_contents, new_table_name);
     ++		slice_append_string(&ref_list_contents, "\n");
     ++	}
      +	for (i = last + 1; i < st->merged->stack_len; i++) {
      +		slice_append_string(&ref_list_contents,
      +				    st->merged->stack[i]->name);
     @@ -5351,18 +5319,26 @@
      +	}
      +	have_lock = false;
      +
     -+	for (char **p = delete_on_success; *p; p++) {
     -+		if (strcmp(*p, slice_as_string(&new_table_path))) {
     -+			unlink(*p);
     ++	{
     ++		char **p = delete_on_success;
     ++		while (*p) {
     ++			if (strcmp(*p, slice_as_string(&new_table_path))) {
     ++				unlink(*p);
     ++			}
     ++			p++;
      +		}
      +	}
      +
      +	err = stack_reload_maybe_reuse(st, first < last);
      +exit:
     -+	for (char **p = subtable_locks; *p; p++) {
     -+		unlink(*p);
     -+	}
      +	free_names(delete_on_success);
     ++	{
     ++		char **p = subtable_locks;
     ++		while (*p) {
     ++			unlink(*p);
     ++			p++;
     ++		}
     ++	}
      +	free_names(subtable_locks);
      +	if (lock_file_fd > 0) {
      +		close(lock_file_fd);
     @@ -5413,7 +5389,7 @@
      +{
      +	struct segment *segs = calloc(sizeof(struct segment), n);
      +	int next = 0;
     -+	struct segment cur = {};
     ++	struct segment cur = { 0 };
      +	int i = 0;
      +	for (i = 0; i < n; i++) {
      +		int log = fastlog2(sizes[i]);
     @@ -5501,7 +5477,7 @@
      +int stack_read_ref(struct stack *st, const char *refname,
      +		   struct ref_record *ref)
      +{
     -+	struct iterator it = {};
     ++	struct iterator it = { 0 };
      +	struct merged_table *mt = stack_merged_table(st);
      +	int err = merged_table_seek_ref(mt, &it, refname);
      +	if (err) {
     @@ -5526,7 +5502,7 @@
      +int stack_read_log(struct stack *st, const char *refname,
      +		   struct log_record *log)
      +{
     -+	struct iterator it = {};
     ++	struct iterator it = { 0 };
      +	struct merged_table *mt = stack_merged_table(st);
      +	int err = merged_table_seek_log(mt, &it, refname);
      +	if (err) {
     @@ -5631,31 +5607,25 @@
      +#include <unistd.h>
      +#include <zlib.h>
      +
     -+#define ARRAY_SIZE(a) sizeof((a)) / sizeof((a)[0])
     -+#define FREE_AND_NULL(x)    \
     -+	do {                \
     -+		free(x);    \
     -+		(x) = NULL; \
     -+	} while (0)
     -+#define QSORT(arr, n, cmp) qsort(arr, n, sizeof(arr[0]), cmp)
     -+#define SWAP(a, b)                              \
     -+	{                                       \
     -+		char tmp[sizeof(a)];            \
     -+		assert(sizeof(a) == sizeof(b)); \
     -+		memcpy(&tmp[0], &a, sizeof(a)); \
     -+		memcpy(&a, &b, sizeof(a));      \
     -+		memcpy(&b, &tmp[0], sizeof(a)); \
     -+	}
     ++#include "compat.h"
     ++
      +#endif /* REFTABLE_IN_GITCORE */
      +
     ++#define SHA1_ID 0x73686131
     ++#define SHA256_ID 0x73323536
     ++#define SHA1_SIZE 20
     ++#define SHA256_SIZE 32
     ++
      +typedef uint8_t byte;
      +typedef int bool;
      +
     ++/* This is uncompress2, which is only available in zlib as of 2017.
     ++ *
     ++ * TODO: in git-core, this should fallback to uncompress2 if it is available.
     ++ */
      +int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
      +			       const Bytef *source, uLong *sourceLen);
     -+
     -+#define SHA1_SIZE 20
     -+#define SHA256_SIZE 32
     ++int hash_size(uint32_t id);
      +
      +#endif
      
     @@ -5761,6 +5731,25 @@
      +
      +#endif
      
     + diff --git a/reftable/update.sh b/reftable/update.sh
     + new file mode 100644
     + --- /dev/null
     + +++ b/reftable/update.sh
     +@@
     ++#!/bin/sh
     ++
     ++set -eux
     ++
     ++((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
     ++git clone https://github.com/google/reftable reftable-repo) && \
     ++cp reftable-repo/c/*.[ch] reftable/ && \
     ++cp reftable-repo/LICENSE reftable/ &&
     ++git --git-dir reftable-repo/.git show --no-patch origin/master \
     ++> reftable/VERSION && \
     ++sed -i~ 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|' reftable/system.h
     ++rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
     ++git add reftable/*.[ch]
     +
       diff --git a/reftable/writer.c b/reftable/writer.c
       new file mode 100644
       --- /dev/null
     @@ -5831,8 +5820,8 @@
      +		opts->restart_interval = 16;
      +	}
      +
     -+	if (opts->hash_size == 0) {
     -+		opts->hash_size = SHA1_SIZE;
     ++	if (opts->hash_id == 0) {
     ++		opts->hash_id = SHA1_ID;
      +	}
      +	if (opts->block_size == 0) {
      +		opts->block_size = DEFAULT_BLOCK_SIZE;
     @@ -5842,11 +5831,14 @@
      +static int writer_write_header(struct writer *w, byte *dest)
      +{
      +	memcpy((char *)dest, "REFT", 4);
     -+	dest[4] = (w->hash_size == SHA1_SIZE) ? 1 : 2; /* version */
      +
     -+	put_u24(dest + 5, w->opts.block_size);
     -+	put_u64(dest + 8, w->min_update_index);
     -+	put_u64(dest + 16, w->max_update_index);
     ++	/* DO NOT SUBMIT.  This has not been encoded in the standard yet. */
     ++	dest[4] = (hash_size(w->opts.hash_id) == SHA1_SIZE) ? 1 : 2; /* version
     ++								      */
     ++
     ++	put_be24(dest + 5, w->opts.block_size);
     ++	put_be64(dest + 8, w->min_update_index);
     ++	put_be64(dest + 16, w->max_update_index);
      +	return 24;
      +}
      +
     @@ -5858,7 +5850,8 @@
      +	}
      +
      +	block_writer_init(&w->block_writer_data, typ, w->block,
     -+			  w->opts.block_size, block_start, w->hash_size);
     ++			  w->opts.block_size, block_start,
     ++			  hash_size(w->opts.hash_id));
      +	w->block_writer = &w->block_writer_data;
      +	w->block_writer->restart_interval = w->opts.restart_interval;
      +}
     @@ -5872,7 +5865,6 @@
      +		/* TODO - error return? */
      +		abort();
      +	}
     -+	wp->hash_size = opts->hash_size;
      +	wp->block = calloc(opts->block_size, 1);
      +	wp->write = writer_func;
      +	wp->write_arg = writer_arg;
     @@ -5941,7 +5933,7 @@
      +static int writer_add_record(struct writer *w, struct record rec)
      +{
      +	int result = -1;
     -+	struct slice key = {};
     ++	struct slice key = { 0 };
      +	int err = 0;
      +	record_key(rec, &key);
      +	if (slice_compare(w->last_key, key) >= 0) {
     @@ -5981,7 +5973,7 @@
      +
      +int writer_add_ref(struct writer *w, struct ref_record *ref)
      +{
     -+	struct record rec = {};
     ++	struct record rec = { 0 };
      +	struct ref_record copy = *ref;
      +	int err = 0;
      +
     @@ -6003,7 +5995,7 @@
      +	if (!w->opts.skip_index_objects && ref->value != NULL) {
      +		struct slice h = {
      +			.buf = ref->value,
     -+			.len = w->hash_size,
     ++			.len = hash_size(w->opts.hash_id),
      +		};
      +
      +		writer_index_hash(w, h);
     @@ -6011,7 +6003,7 @@
      +	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
      +		struct slice h = {
      +			.buf = ref->target_value,
     -+			.len = w->hash_size,
     ++			.len = hash_size(w->opts.hash_id),
      +		};
      +		writer_index_hash(w, h);
      +	}
     @@ -6047,7 +6039,7 @@
      +	w->pending_padding = 0;
      +
      +	{
     -+		struct record rec = {};
     ++		struct record rec = { 0 };
      +		int err;
      +		record_from_log(&rec, log);
      +		err = writer_add_record(w, rec);
     @@ -6094,7 +6086,7 @@
      +		w->index_len = 0;
      +		w->index_cap = 0;
      +		for (i = 0; i < idx_len; i++) {
     -+			struct record rec = {};
     ++			struct record rec = { 0 };
      +			record_from_index(&rec, idx + i);
      +			if (block_writer_add(w->block_writer, rec) == 0) {
      +				continue;
     @@ -6172,7 +6164,7 @@
      +		.offsets = entry->offsets,
      +		.offset_len = entry->offset_len,
      +	};
     -+	struct record rec = {};
     ++	struct record rec = { 0 };
      +	if (arg->err < 0) {
      +		goto exit;
      +	}
     @@ -6214,7 +6206,7 @@
      +static int writer_dump_object_index(struct writer *w)
      +{
      +	struct write_record_arg closure = { .w = w };
     -+	struct common_prefix_arg common = {};
     ++	struct common_prefix_arg common = { 0 };
      +	if (w->obj_index_tree != NULL) {
      +		infix_walk(w->obj_index_tree, &update_common, &common);
      +	}
     @@ -6271,39 +6263,43 @@
      +
      +	int err = writer_finish_public_section(w);
      +	if (err != 0) {
     -+		return err;
     ++		goto exit;
      +	}
      +
      +	writer_write_header(w, footer);
      +	p += 24;
     -+	put_u64(p, w->stats.ref_stats.index_offset);
     ++	put_be64(p, w->stats.ref_stats.index_offset);
      +	p += 8;
     -+	put_u64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
     ++	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
      +	p += 8;
     -+	put_u64(p, w->stats.obj_stats.index_offset);
     ++	put_be64(p, w->stats.obj_stats.index_offset);
      +	p += 8;
      +
     -+	put_u64(p, w->stats.log_stats.offset);
     ++	put_be64(p, w->stats.log_stats.offset);
      +	p += 8;
     -+	put_u64(p, w->stats.log_stats.index_offset);
     ++	put_be64(p, w->stats.log_stats.index_offset);
      +	p += 8;
      +
     -+	put_u32(p, crc32(0, footer, p - footer));
     ++	put_be32(p, crc32(0, footer, p - footer));
      +	p += 4;
      +	w->pending_padding = 0;
      +
     -+	{
     -+		int n = padded_write(w, footer, sizeof(footer), 0);
     -+		if (n < 0) {
     -+			return n;
     -+		}
     ++	err = padded_write(w, footer, sizeof(footer), 0);
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	if (w->stats.log_stats.entries + w->stats.ref_stats.entries == 0) {
     ++		err = EMPTY_TABLE_ERROR;
     ++		goto exit;
      +	}
      +
     ++exit:
      +	/* free up memory. */
      +	block_writer_clear(&w->block_writer_data);
      +	writer_clear_index(w);
      +	free(slice_yield(&w->last_key));
     -+	return 0;
     ++	return err;
      +}
      +
      +void writer_clear_index(struct writer *w)
     @@ -6348,7 +6344,7 @@
      +	if (debug) {
      +		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
      +			w->next, raw_bytes,
     -+			get_u24(w->block + w->block_writer->header_off + 1));
     ++			get_be24(w->block + w->block_writer->header_off + 1));
      +	}
      +
      +	if (w->next == 0) {
     @@ -6423,7 +6419,6 @@
      +	int (*write)(void *, byte *, int);
      +	void *write_arg;
      +	int pending_padding;
     -+	int hash_size;
      +	struct slice last_key;
      +
      +	uint64_t next;
 5:  8d95ab52f75 ! 6:  a622d8066c7 Reftable support for git-core
     @@ -9,32 +9,7 @@
           * Resolve spots marked with XXX
           * Test strategy?
      
     -    Example use:
     -
     -      $ ~/vc/git/git init --reftable
     -      warning: templates not found in /usr/local/google/home/hanwen/share/git-core/templates
     -      Initialized empty Git repository in /tmp/qz/.git/
     -      $ echo q > a
     -      $ ~/vc/git/git add a
     -      $ ~/vc/git/git commit -mx
     -      fatal: not a git repository (or any of the parent directories): .git
     -      [master (root-commit) 373d969] x
     -       1 file changed, 1 insertion(+)
     -       create mode 100644 a
     -      $ ~/vc/git/git show-ref
     -      373d96972fca9b63595740bba3898a762778ba20 HEAD
     -      373d96972fca9b63595740bba3898a762778ba20 refs/heads/master
     -      $ ls -l .git/reftable/
     -      total 12
     -      -rw------- 1 hanwen primarygroup  126 Jan 23 20:08 000000000001-000000000001.ref
     -      -rw------- 1 hanwen primarygroup 4277 Jan 23 20:08 000000000002-000000000002.ref
     -      $ go run ~/vc/reftable/cmd/dump.go  -table /tmp/qz/.git/reftable/000000000002-000000000002.ref
     -      ** DEBUG **
     -      name /tmp/qz/.git/reftable/000000000002-000000000002.ref, sz 4209: 'r' reftable.readerOffsets{Present:true, Offset:0x0, IndexOffset:0x0}, 'o' reftable.readerOffsets{Present:false, Offset:0x0, IndexOffset:0x0} 'g' reftable.readerOffsets{Present:true, Offset:0x1000, IndexOffset:0x0}
     -      ** REFS **
     -      reftable.RefRecord{RefName:"refs/heads/master", UpdateIndex:0x2, Value:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, TargetValue:[]uint8(nil), Target:""}
     -      ** LOGS **
     -      reftable.LogRecord{RefName:"HEAD", UpdateIndex:0x2, New:[]uint8{0x37, 0x3d, 0x96, 0x97, 0x2f, 0xca, 0x9b, 0x63, 0x59, 0x57, 0x40, 0xbb, 0xa3, 0x89, 0x8a, 0x76, 0x27, 0x78, 0xba, 0x20}, Old:[]uint8{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, Name:"Han-Wen Nienhuys", Email:"hanwen@google.com", Time:0x5e29ef27, TZOffset:100, Message:"commit (initial): x\n"}
     +    Example use: see t/t0031-reftable.sh
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
          Co-authored-by: Jeff King <peff@peff.net>
     @@ -463,7 +438,7 @@ $^
      +	struct ref_store *ref_store = (struct ref_store *)refs;
      +	struct write_options cfg = {
      +		.block_size = 4096,
     -+		.hash_size = the_hash_algo->rawsz,
     ++		.hash_id = the_hash_algo->format_id,
      +	};
      +	struct strbuf sb = STRBUF_INIT;
      +
     @@ -610,11 +585,13 @@ $^
      +	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
      +	struct merged_table *mt = NULL;
      +
     -+	mt = stack_merged_table(refs->stack);
     -+	ri->err = refs->err;
     -+	if (ri->err == 0) {
     ++	if (refs->err < 0) {
     ++		ri->err = refs->err;
     ++	} else {
     ++		mt = stack_merged_table(refs->stack);
      +		ri->err = merged_table_seek_ref(mt, &ri->iter, prefix);
      +	}
     ++
      +	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
      +	ri->base.oid = &ri->oid;
      +	ri->flags = flags;
     @@ -739,7 +716,12 @@ $^
      +{
      +	struct reftable_ref_store *refs =
      +		(struct reftable_ref_store *)ref_store;
     -+	int err = stack_add(refs->stack, &write_transaction_table, transaction);
     ++	int err = 0;
     ++	if (refs->err < 0) {
     ++		return refs->err;
     ++	}
     ++
     ++	err = stack_add(refs->stack, &write_transaction_table, transaction);
      +	if (err < 0) {
      +		strbuf_addf(errmsg, "reftable: transaction failure %s",
      +			    error_str(err));
     @@ -819,6 +801,10 @@ $^
      +		.logmsg = msg,
      +		.flags = flags,
      +	};
     ++	if (refs->err < 0) {
     ++		return refs->err;
     ++	}
     ++
      +	return stack_add(refs->stack, &write_delete_refs_table, &arg);
      +}
      +
     @@ -826,6 +812,9 @@ $^
      +{
      +	struct reftable_ref_store *refs =
      +		(struct reftable_ref_store *)ref_store;
     ++	if (refs->err < 0) {
     ++		return refs->err;
     ++	}
      +	return stack_compact_all(refs->stack, NULL);
      +}
      +
     @@ -896,6 +885,9 @@ $^
      +					       .refname = refname,
      +					       .target = target,
      +					       .logmsg = logmsg };
     ++	if (refs->err < 0) {
     ++		return refs->err;
     ++	}
      +	return stack_add(refs->stack, &write_create_symref_table, &arg);
      +}
      +
     @@ -912,6 +904,7 @@ $^
      +	uint64_t ts = stack_next_update_index(arg->stack);
      +	struct ref_record ref = {};
      +	int err = stack_read_ref(arg->stack, arg->oldname, &ref);
     ++
      +	if (err) {
      +		goto exit;
      +	}
     @@ -987,6 +980,10 @@ $^
      +		.newname = newrefname,
      +		.logmsg = logmsg,
      +	};
     ++	if (refs->err < 0) {
     ++		return refs->err;
     ++	}
     ++
      +	return stack_add(refs->stack, &write_rename_table, &arg);
      +}
      +
     @@ -1083,10 +1080,16 @@ $^
      +	struct iterator it = {};
      +	struct reftable_ref_store *refs =
      +		(struct reftable_ref_store *)ref_store;
     -+	struct merged_table *mt = stack_merged_table(refs->stack);
     -+	int err = merged_table_seek_log(mt, &it, refname);
     ++	struct merged_table *mt = NULL;
     ++	int err = 0;
      +	struct log_record log = {};
      +
     ++	if (refs->err < 0) {
     ++		return refs->err;
     ++	}
     ++
     ++	mt = stack_merged_table(refs->stack);
     ++	err = merged_table_seek_log(mt, &it, refname);
      +	while (err == 0) {
      +		err = iterator_next_log(it, &log);
      +		if (err != 0) {
     @@ -1133,12 +1136,17 @@ $^
      +	struct iterator it = {};
      +	struct reftable_ref_store *refs =
      +		(struct reftable_ref_store *)ref_store;
     -+	struct merged_table *mt = stack_merged_table(refs->stack);
     -+	int err = merged_table_seek_log(mt, &it, refname);
     -+
     ++	struct merged_table *mt = NULL;
      +	struct log_record *logs = NULL;
      +	int cap = 0;
      +	int len = 0;
     ++	int err = 0;
     ++
     ++	if (refs->err < 0) {
     ++		return refs->err;
     ++	}
     ++	mt = stack_merged_table(refs->stack);
     ++	err = merged_table_seek_log(mt, &it, refname);
      +
      +	while (err == 0) {
      +		struct log_record log = {};
     @@ -1280,13 +1288,19 @@ $^
      +	*/
      +	struct reftable_ref_store *refs =
      +		(struct reftable_ref_store *)ref_store;
     -+	struct merged_table *mt = stack_merged_table(refs->stack);
     ++	struct merged_table *mt = NULL;
      +	struct reflog_expiry_arg arg = {
      +		.refs = refs,
      +	};
      +	struct log_record log = {};
      +	struct iterator it = {};
     -+	int err = merged_table_seek_log(mt, &it, refname);
     ++	int err = 0;
     ++	if (refs->err < 0) {
     ++		return refs->err;
     ++	}
     ++
     ++	mt = stack_merged_table(refs->stack);
     ++	err = merged_table_seek_log(mt, &it, refname);
      +	if (err < 0) {
      +		return err;
      +	}
     @@ -1326,7 +1340,12 @@ $^
      +	struct reftable_ref_store *refs =
      +		(struct reftable_ref_store *)ref_store;
      +	struct ref_record ref = {};
     -+	int err = stack_read_ref(refs->stack, refname, &ref);
     ++	int err = 0;
     ++	if (refs->err < 0) {
     ++		return refs->err;
     ++	}
     ++
     ++	err = stack_read_ref(refs->stack, refname, &ref);
      +	if (err) {
      +		goto exit;
      +	}
     @@ -1436,3 +1455,40 @@ $^
       	}
       
       	strbuf_release(&dir);
     +
     + diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
     + new file mode 100755
     + --- /dev/null
     + +++ b/t/t0031-reftable.sh
     +@@
     ++#!/bin/sh
     ++#
     ++# Copyright (c) 2020 Google LLC
     ++#
     ++
     ++test_description='reftable basics'
     ++
     ++. ./test-lib.sh
     ++
     ++test_expect_success 'basic operation of reftable storage' '
     ++        git init --ref-storage=reftable repo &&
     ++        cd repo &&
     ++        echo "hello" > world.txt &&
     ++        git add world.txt &&
     ++        git commit -m "first post" &&
     ++        printf "HEAD\nrefs/heads/master\n" > want &&
     ++        git show-ref | cut -f2 -d" " > got &&
     ++        test_cmp got want &&
     ++        for count in $(test_seq 1 10); do
     ++                echo "hello" >> world.txt
     ++                git commit -m "number ${count}" world.txt
     ++        done &&
     ++        git gc &&
     ++        nfiles=$(ls -1 .git/reftable | wc -l | tr -d "[ \t]" ) &&
     ++        test "${nfiles}" = "2" &&
     ++        git reflog refs/heads/master > output &&
     ++        grep "commit (initial): first post" output &&
     ++        grep "commit: number 10" output
     ++'
     ++
     ++test_done

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v7 1/6] refs.h: clarify reflog iteration order
  2020-02-26  8:49           ` [PATCH v7 0/6] " Han-Wen Nienhuys via GitGitGadget
@ 2020-02-26  8:49             ` Han-Wen Nienhuys via GitGitGadget
  2020-02-26  8:49             ` [PATCH v7 2/6] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
                               ` (7 subsequent siblings)
  8 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-26  8:49 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/refs.h b/refs.h
index 545029c6d80..87c9ec921b9 100644
--- a/refs.h
+++ b/refs.h
@@ -444,18 +444,21 @@ int delete_refs(const char *msg, struct string_list *refnames,
 int refs_delete_reflog(struct ref_store *refs, const char *refname);
 int delete_reflog(const char *refname);
 
-/* iterate over reflog entries */
+/* Iterate over reflog entries. */
 typedef int each_reflog_ent_fn(
 		struct object_id *old_oid, struct object_id *new_oid,
 		const char *committer, timestamp_t timestamp,
 		int tz, const char *msg, void *cb_data);
 
+/* Iterate in over reflog entries, oldest entry first. */
 int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
 			     each_reflog_ent_fn fn, void *cb_data);
 int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+/* Call a function for each reflog entry, oldest entry first. */
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v7 2/6] create .git/refs in files-backend.c
  2020-02-26  8:49           ` [PATCH v7 0/6] " Han-Wen Nienhuys via GitGitGadget
  2020-02-26  8:49             ` [PATCH v7 1/6] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-02-26  8:49             ` Han-Wen Nienhuys via GitGitGadget
  2020-02-26  8:49             ` [PATCH v7 3/6] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
                               ` (6 subsequent siblings)
  8 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-26  8:49 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This prepares for supporting the reftable format, which will want
create its own file system layout in .git

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/init-db.c    | 2 --
 refs/files-backend.c | 6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 944ec77fe10..45bdea05890 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
 	 */
-	safe_create_dir(git_path("refs"), 1);
-	adjust_shared_perm(git_path("refs"));
 
 	if (refs_init_db(&err))
 		die("failed to set up refs db: %s", err.buf);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 561c33ac8a9..ab7899a9c77 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3157,9 +3157,15 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
 	struct strbuf sb = STRBUF_INIT;
 
+	files_ref_path(refs, &sb, "refs");
+	safe_create_dir(sb.buf, 1);
+	/* adjust permissions even if directory already exists. */
+	adjust_shared_perm(sb.buf);
+
 	/*
 	 * Create .git/refs/{heads,tags}
 	 */
+	strbuf_reset(&sb);
 	files_ref_path(refs, &sb, "refs/heads");
 	safe_create_dir(sb.buf, 1);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v7 3/6] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-02-26  8:49           ` [PATCH v7 0/6] " Han-Wen Nienhuys via GitGitGadget
  2020-02-26  8:49             ` [PATCH v7 1/6] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
  2020-02-26  8:49             ` [PATCH v7 2/6] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-02-26  8:49             ` Han-Wen Nienhuys via GitGitGadget
  2020-02-26  8:49             ` [PATCH v7 4/6] reftable: file format documentation Jonathan Nieder via GitGitGadget
                               ` (5 subsequent siblings)
  8 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-26  8:49 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb7..3490aac3a40 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
 
 /* Virtual function declarations for ref_iterators: */
 
+/*
+ * backend-specific implementation of ref_iterator_advance.
+ * For symrefs, the function should set REF_ISSYMREF, and it should also
+ * dereference the symref to provide the OID referent.
+ */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v7 4/6] reftable: file format documentation
  2020-02-26  8:49           ` [PATCH v7 0/6] " Han-Wen Nienhuys via GitGitGadget
                               ` (2 preceding siblings ...)
  2020-02-26  8:49             ` [PATCH v7 3/6] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-02-26  8:49             ` Jonathan Nieder via GitGitGadget
  2020-02-26  8:49             ` [PATCH v7 5/6] Add reftable library Han-Wen Nienhuys via GitGitGadget
                               ` (4 subsequent siblings)
  8 siblings, 0 replies; 409+ messages in thread
From: Jonathan Nieder via GitGitGadget @ 2020-02-26  8:49 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder

From: Jonathan Nieder <jrnieder@gmail.com>

Shawn Pearce explains:

Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:

- Near constant time lookup for any single reference, even when the
  repository is cold and not in process or kernel cache.
- Near constant time verification a SHA-1 is referred to by at least
  one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.

This file format spec was originally written in July, 2017 by Shawn
Pearce.  Some refinements since then were made by Shawn and by Han-Wen
Nienhuys based on experiences implementing and experimenting with the
format.  (All of this was in the context of our work at Google and
Google is happy to contribute the result to the Git project.)

Imported from JGit[1]'s current version (c217d33ff,
"Documentation/technical/reftable: improve repo layout", 2020-02-04)
of Documentation/technical/reftable.md and converted to asciidoc by
running

  pandoc -t asciidoc -f markdown reftable.md >reftable.txt

using pandoc 2.2.1.  The result required the following additional
minor changes:

- removed the [TOC] directive to add a table of contents, since
  asciidoc does not support it
- replaced git-scm.com/docs links with linkgit: directives that link
  to other pages within Git's documentation

[1] https://eclipse.googlesource.com/jgit/jgit

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/Makefile               |    1 +
 Documentation/technical/reftable.txt | 1067 ++++++++++++++++++++++++++
 2 files changed, 1068 insertions(+)
 create mode 100644 Documentation/technical/reftable.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 8fe829cc1b8..3aab9b8d61a 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -92,6 +92,7 @@ TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
+TECH_DOCS += technical/reftable
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/signature-format
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
new file mode 100644
index 00000000000..9fa4657d9ff
--- /dev/null
+++ b/Documentation/technical/reftable.txt
@@ -0,0 +1,1067 @@
+reftable
+--------
+
+Overview
+~~~~~~~~
+
+Problem statement
+^^^^^^^^^^^^^^^^^
+
+Some repositories contain a lot of references (e.g. android at 866k,
+rails at 31k). The existing packed-refs format takes up a lot of space
+(e.g. 62M), and does not scale with additional references. Lookup of a
+single reference requires linearly scanning the file.
+
+Atomic pushes modifying multiple references require copying the entire
+packed-refs file, which can be a considerable amount of data moved
+(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
+
+Repositories with many loose references occupy a large number of disk
+blocks from the local file system, as each reference is its own file
+storing 41 bytes (and another file for the corresponding reflog). This
+negatively affects the number of inodes available when a large number of
+repositories are stored on the same filesystem. Readers can be penalized
+due to the larger number of syscalls required to traverse and read the
+`$GIT_DIR/refs` directory.
+
+Objectives
+^^^^^^^^^^
+
+* Near constant time lookup for any single reference, even when the
+repository is cold and not in process or kernel cache.
+* Near constant time verification if a SHA-1 is referred to by at least
+one reference (for allow-tip-sha1-in-want).
+* Efficient lookup of an entire namespace, such as `refs/tags/`.
+* Support atomic push with `O(size_of_update)` operations.
+* Combine reflog storage with ref storage for small transactions.
+* Separate reflog storage for base refs and historical logs.
+
+Description
+^^^^^^^^^^^
+
+A reftable file is a portable binary file format customized for
+reference storage. References are sorted, enabling linear scans, binary
+search lookup, and range scans.
+
+Storage in the file is organized into variable sized blocks. Prefix
+compression is used within a single block to reduce disk space. Block
+size and alignment is tunable by the writer.
+
+Performance
+^^^^^^^^^^^
+
+Space used, packed-refs vs. reftable:
+
+[cols=",>,>,>,>,>",options="header",]
+|===============================================================
+|repository |packed-refs |reftable |% original |avg ref |avg obj
+|android |62.2 M |36.1 M |58.0% |33 bytes |5 bytes
+|rails |1.8 M |1.1 M |57.7% |29 bytes |4 bytes
+|git |78.7 K |48.1 K |61.0% |50 bytes |4 bytes
+|git (heads) |332 b |269 b |81.0% |33 bytes |0 bytes
+|===============================================================
+
+Scan (read 866k refs), by reference name lookup (single ref from 866k
+refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):
+
+[cols=",>,>,>,>",options="header",]
+|=========================================================
+|format |cache |scan |by name |by SHA-1
+|packed-refs |cold |402 ms |409,660.1 usec |412,535.8 usec
+|packed-refs |hot | |6,844.6 usec |20,110.1 usec
+|reftable |cold |112 ms |33.9 usec |323.2 usec
+|reftable |hot | |20.2 usec |320.8 usec
+|=========================================================
+
+Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:
+
+[cols=",>,>",options="header",]
+|================================
+|format |size |avg entry
+|$GIT_DIR/logs |173 M |1209 bytes
+|reftable |5 M |37 bytes
+|================================
+
+Details
+~~~~~~~
+
+Peeling
+^^^^^^^
+
+References stored in a reftable are peeled, a record for an annotated
+(or signed) tag records both the tag object, and the object it refers
+to.
+
+Reference name encoding
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Reference names are an uninterpreted sequence of bytes that must pass
+linkgit:git-check-ref-format[1] as a valid reference name.
+
+Key unicity
+^^^^^^^^^^^
+
+Each entry must have a unique key; repeated keys are disallowed.
+
+Network byte order
+^^^^^^^^^^^^^^^^^^
+
+All multi-byte, fixed width fields are in network byte order.
+
+Ordering
+^^^^^^^^
+
+Blocks are lexicographically ordered by their first reference.
+
+Directory/file conflicts
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format accepts both `refs/heads/foo` and
+`refs/heads/foo/bar` as distinct references.
+
+This property is useful for retaining log records in reftable, but may
+confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
+references. Users of reftable may choose to continue to reject `foo` and
+`foo/bar` type conflicts to prevent problems for peers.
+
+File format
+~~~~~~~~~~~
+
+Structure
+^^^^^^^^^
+
+A reftable file has the following high-level structure:
+
+....
+first_block {
+  header
+  first_ref_block
+}
+ref_block*
+ref_index*
+obj_block*
+obj_index*
+log_block*
+log_index*
+footer
+....
+
+A log-only file omits the `ref_block`, `ref_index`, `obj_block` and
+`obj_index` sections, containing only the file header and log block:
+
+....
+first_block {
+  header
+}
+log_block*
+log_index*
+footer
+....
+
+in a log-only file the first log block immediately follows the file
+header, without padding to block alignment.
+
+Block size
+^^^^^^^^^^
+
+The file’s block size is arbitrarily determined by the writer, and does
+not have to be a power of 2. The block size must be larger than the
+longest reference name or log entry used in the repository, as
+references cannot span blocks.
+
+Powers of two that are friendly to the virtual memory system or
+filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
+yield better compression, with a possible increased cost incurred by
+readers during access.
+
+The largest block size is `16777215` bytes (15.99 MiB).
+
+Block alignment
+^^^^^^^^^^^^^^^
+
+Writers may choose to align blocks at multiples of the block size by
+including `padding` filled with NUL bytes at the end of a block to round
+out to the chosen alignment. When alignment is used, writers must
+specify the alignment with the file header’s `block_size` field.
+
+Block alignment is not required by the file format. Unaligned files must
+set `block_size = 0` in the file header, and omit `padding`. Unaligned
+files with more than one ref block must include the link:#Ref-index[ref
+index] to support fast lookup. Readers must be able to read both aligned
+and non-aligned files.
+
+Very small files (e.g. 1 only ref block) may omit `padding` and the ref
+index to reduce total file size.
+
+Header
+^^^^^^
+
+A 24-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+....
+
+Aligned files must specify `block_size` to configure readers with the
+expected block alignment. Unaligned files must set `block_size = 0`.
+
+The `min_update_index` and `max_update_index` describe bounds for the
+`update_index` field of all log records in this file. When reftables are
+used in a stack for link:#Update-transactions[transactions], these
+fields can order the files such that the prior file’s
+`max_update_index + 1` is the next file’s `min_update_index`.
+
+First ref block
+^^^^^^^^^^^^^^^
+
+The first ref block shares the same block as the file header, and is 24
+bytes smaller than all other blocks in the file. The first block
+immediately begins after the file header, at position 24.
+
+If the first block is a log block (a log-only file), its block header
+begins immediately at position 24.
+
+Ref block format
+^^^^^^^^^^^^^^^^
+
+A ref block is written as:
+
+....
+'r'
+uint24( block_len )
+ref_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
+encodes the number of bytes in the block up to, but not including the
+optional `padding`. This is always less than or equal to the file’s
+block size. In the first ref block, `block_len` includes 24 bytes for
+the file header.
+
+The 2-byte `restart_count` stores the number of entries in the
+`restart_offset` list, which must not be empty. Readers can use
+`restart_count` to binary search between restarts before starting a
+linear scan.
+
+Exactly `restart_count` 3-byte `restart_offset` values precedes the
+`restart_count`. Offsets are relative to the start of the block and
+refer to the first byte of any `ref_record` whose name has not been
+prefix compressed. Entries in the `restart_offset` list must be sorted,
+ascending. Readers can start linear scans from any of these records.
+
+A variable number of `ref_record` fill the middle of the block,
+describing reference names and values. The format is described below.
+
+As the first ref block shares the first file block with the file header,
+all `restart_offset` in the first block are relative to the start of the
+file (position 0), and include the file header. This forces the first
+`restart_offset` to be `28`.
+
+ref record
+++++++++++
+
+A `ref_record` describes a single reference, storing both the name and
+its value(s). Records are formatted as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | value_type )
+suffix
+varint( update_index_delta )
+value?
+....
+
+The `prefix_length` field specifies how many leading bytes of the prior
+reference record’s name should be copied to obtain this reference’s
+name. This must be 0 for the first reference in any block, and also must
+be 0 for any `ref_record` whose offset is listed in the `restart_offset`
+table at the end of the block.
+
+Recovering a reference name from any `ref_record` is a simple concat:
+
+....
+this_name = prior_name[0..prefix_length] + suffix
+....
+
+The `suffix_length` value provides the number of bytes available in
+`suffix` to copy from `suffix` to complete the reference name.
+
+The `update_index` that last modified the reference can be obtained by
+adding `update_index_delta` to the `min_update_index` from the file
+header: `min_update_index + update_index_delta`.
+
+The `value` follows. Its format is determined by `value_type`, one of
+the following:
+
+* `0x0`: deletion; no value data (see transactions, below)
+* `0x1`: one 20-byte object id; value of the ref
+* `0x2`: two 20-byte object ids; value of the ref, peeled target
+* `0x3`: symbolic reference: `varint( target_len ) target`
+
+Symbolic references use `0x3`, followed by the complete name of the
+reference target. No compression is applied to the target name.
+
+Types `0x4..0x7` are reserved for future use.
+
+Ref index
+^^^^^^^^^
+
+The ref index stores the name of the last reference from every ref block
+in the file, enabling reduced disk seeks for lookups. Any reference can
+be found by searching the index, identifying the containing block, and
+searching within that block.
+
+The index may be organized into a multi-level index, where the 1st level
+index block points to additional ref index blocks (2nd level), which may
+in turn point to either additional index blocks (e.g. 3rd level) or ref
+blocks (leaf level). Disk reads required to access a ref go up with
+higher index levels. Multi-level indexes may be required to ensure no
+single index block exceeds the file format’s max block size of
+`16777215` bytes (15.99 MiB). To acheive constant O(1) disk seeks for
+lookups the index must be a single level, which is permitted to exceed
+the file’s configured block size, but not the format’s max block size of
+15.99 MiB.
+
+If present, the ref index block(s) appears after the last ref block.
+
+If there are at least 4 ref blocks, a ref index block should be written
+to improve lookup times. Cold reads using the index require 2 disk reads
+(read index, read block), and binary searching < 4 blocks also requires
+<= 2 reads. Omitting the index block from smaller files saves space.
+
+If the file is unaligned and contains more than one ref block, the ref
+index must be written.
+
+Index block format:
+
+....
+'i'
+uint24( block_len )
+index_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+The index blocks begin with `block_type = 'i'` and a 3-byte `block_len`
+which encodes the number of bytes in the block, up to but not including
+the optional `padding`.
+
+The `restart_offset` and `restart_count` fields are identical in format,
+meaning and usage as in ref blocks.
+
+To reduce the number of reads required for random access in very large
+files the index block may be larger than other blocks. However, readers
+must hold the entire index in memory to benefit from this, so it’s a
+time-space tradeoff in both file size and reader memory.
+
+Increasing the file’s block size decreases the index size. Alternatively
+a multi-level index may be used, keeping index blocks within the file’s
+block size, but increasing the number of blocks that need to be
+accessed.
+
+index record
+++++++++++++
+
+An index record describes the last entry in another block. Index records
+are written as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | 0 )
+suffix
+varint( block_position )
+....
+
+Index records use prefix compression exactly like `ref_record`.
+
+Index records store `block_position` after the suffix, specifying the
+absolute position in bytes (from the start of the file) of the block
+that ends with this reference. Readers can seek to `block_position` to
+begin reading the block header.
+
+Readers must examine the block header at `block_position` to determine
+if the next block is another level index block, or the leaf-level ref
+block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the ref index must first read the footer (below) to
+obtain `ref_index_position`. If not present, the position will be 0. The
+`ref_index_position` is for the 1st level root of the ref index.
+
+Obj block format
+^^^^^^^^^^^^^^^^
+
+Object blocks are optional. Writers may choose to omit object blocks,
+especially if readers will not use the SHA-1 to ref mapping.
+
+Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
+ref blocks containing references pointing to that object directly, or as
+the peeled value of an annotated tag. Like ref blocks, object blocks use
+the file’s standard block size. The abbrevation length is available in
+the footer as `obj_id_len`.
+
+To save space in small files, object blocks may be omitted if the ref
+index is not present, as brute force search will only need to read a few
+ref blocks. When missing, readers should brute force a linear search of
+all references to lookup by SHA-1.
+
+An object block is written as:
+
+....
+'o'
+uint24( block_len )
+obj_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Fields are identical to ref block. Binary search using the restart table
+works the same as in reference blocks.
+
+Because object identifiers are abbreviated by writers to the shortest
+unique abbreviation within the reftable, obj key lengths are variable
+between 2 and 20 bytes. Readers must compare only for common prefix
+match within an obj block or obj index.
+
+obj record
+++++++++++
+
+An `obj_record` describes a single object abbreviation, and the blocks
+containing references using that unique abbreviation:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | cnt_3 )
+suffix
+varint( cnt_large )?
+varint( position_delta )*
+....
+
+Like in reference blocks, abbreviations are prefix compressed within an
+obj block. On large reftables with many unique objects, higher block
+sizes (64k), and higher restart interval (128), a `prefix_length` of 2
+or 3 and `suffix_length` of 3 may be common in obj records (unique
+abbreviation of 5-6 raw bytes, 10-12 hex digits).
+
+Each record contains `position_count` number of positions for matching
+ref blocks. For 1-7 positions the count is stored in `cnt_3`. When
+`cnt_3 = 0` the actual count follows in a varint, `cnt_large`.
+
+The use of `cnt_3` bets most objects are pointed to by only a single
+reference, some may be pointed to by a couple of references, and very
+few (if any) are pointed to by more than 7 references.
+
+A special case exists when `cnt_3 = 0` and `cnt_large = 0`: there are no
+`position_delta`, but at least one reference starts with this
+abbreviation. A reader that needs exact reference names must scan all
+references to find which specific references have the desired object.
+Writers should use this format when the `position_delta` list would have
+overflowed the file’s block size due to a high number of references
+pointing to the same object.
+
+The first `position_delta` is the position from the start of the file.
+Additional `position_delta` entries are sorted ascending and relative to
+the prior entry, e.g. a reader would perform:
+
+....
+pos = position_delta[0]
+prior = pos
+for (j = 1; j < position_count; j++) {
+  pos = prior + position_delta[j]
+  prior = pos
+}
+....
+
+With a position in hand, a reader must linearly scan the ref block,
+starting from the first `ref_record`, testing each reference’s SHA-1s
+(for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
+SHA-1 within a single ref block is not supported by the reftable format.
+Smaller block sizes reduce the number of candidates this step must
+consider.
+
+Obj index
+^^^^^^^^^
+
+The obj index stores the abbreviation from the last entry for every obj
+block in the file, enabling reduced disk seeks for all lookups. It is
+formatted exactly the same as the ref index, but refers to obj blocks.
+
+The obj index should be present if obj blocks are present, as obj blocks
+should only be written in larger files.
+
+Readers loading the obj index must first read the footer (below) to
+obtain `obj_index_position`. If not present, the position will be 0.
+
+Log block format
+^^^^^^^^^^^^^^^^
+
+Unlike ref and obj blocks, log blocks are always unaligned.
+
+Log blocks are variable in size, and do not match the `block_size`
+specified in the file header or footer. Writers should choose an
+appropriate buffer size to prepare a log block for deflation, such as
+`2 * block_size`.
+
+A log block is written as:
+
+....
+'g'
+uint24( block_len )
+zlib_deflate {
+  log_record+
+  uint24( restart_offset )+
+  uint16( restart_count )
+}
+....
+
+Log blocks look similar to ref blocks, except `block_type = 'g'`.
+
+The 4-byte block header is followed by the deflated block contents using
+zlib deflate. The `block_len` in the header is the inflated size
+(including 4-byte block header), and should be used by readers to
+preallocate the inflation output buffer. A log block’s `block_len` may
+exceed the file’s block size.
+
+Offsets within the log block (e.g. `restart_offset`) still include the
+4-byte header. Readers may prefer prefixing the inflation output buffer
+with the 4-byte header.
+
+Within the deflate container, a variable number of `log_record` describe
+reference changes. The log record format is described below. See ref
+block format (above) for a description of `restart_offset` and
+`restart_count`.
+
+Because log blocks have no alignment or padding between blocks, readers
+must keep track of the bytes consumed by the inflater to know where the
+next log block begins.
+
+log record
+++++++++++
+
+Log record keys are structured as:
+
+....
+ref_name '\0' reverse_int64( update_index )
+....
+
+where `update_index` is the unique transaction identifier. The
+`update_index` field must be unique within the scope of a `ref_name`.
+See the update transactions section below for further details.
+
+The `reverse_int64` function inverses the value so lexographical
+ordering the network byte order encoding sorts the more recent records
+with higher `update_index` values first:
+
+....
+reverse_int64(int64 t) {
+  return 0xffffffffffffffff - t;
+}
+....
+
+Log records have a similar starting structure to ref and index records,
+utilizing the same prefix compression scheme applied to the log record
+key described above.
+
+....
+    varint( prefix_length )
+    varint( (suffix_length << 3) | log_type )
+    suffix
+    log_data {
+      old_id
+      new_id
+      varint( name_length    )  name
+      varint( email_length   )  email
+      varint( time_seconds )
+      sint16( tz_offset )
+      varint( message_length )  message
+    }?
+....
+
+Log record entries use `log_type` to indicate what follows:
+
+* `0x0`: deletion; no log data.
+* `0x1`: standard git reflog data using `log_data` above.
+
+The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
+entry from the reflog of `refs/stash` in a transaction file (below),
+without needing to rewrite larger files. Readers reading a stack of
+reflogs must treat this as a deletion.
+
+For `log_type = 0x1`, the `log_data` section follows
+linkgit:git-update-ref[1] logging and includes:
+
+* two 20-byte SHA-1s (old id, new id)
+* varint string of committer’s name
+* varint string of committer’s email
+* varint time in seconds since epoch (Jan 1, 1970)
+* 2-byte timezone offset in minutes (signed)
+* varint string of message
+
+`tz_offset` is the absolute number of minutes from GMT the committer was
+at the time of the update. For example `GMT-0800` is encoded in reftable
+as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
+
+The committer email does not contain `<` or `>`, it’s the value normally
+found between the `<>` in a git commit object header.
+
+The `message_length` may be 0, in which case there was no message
+supplied for the update.
+
+Contrary to traditional reflog (which is a file), renames are encoded as
+a combination of ref deletion and ref creation.
+
+Reading the log
++++++++++++++++
+
+Readers accessing the log must first read the footer (below) to
+determine the `log_position`. The first block of the log begins at
+`log_position` bytes since the start of the file. The `log_position` is
+not block aligned.
+
+Importing logs
+++++++++++++++
+
+When importing from `$GIT_DIR/logs` writers should globally order all
+log records roughly by timestamp while preserving file order, and assign
+unique, increasing `update_index` values for each log line. Newer log
+records get higher `update_index` values.
+
+Although an import may write only a single reftable file, the reftable
+file must span many unique `update_index`, as each log line requires its
+own `update_index` to preserve semantics.
+
+Log index
+^^^^^^^^^
+
+The log index stores the log key
+(`refname \0 reverse_int64(update_index)`) for the last log record of
+every log block in the file, supporting bounded-time lookup.
+
+A log index block must be written if 2 or more log blocks are written to
+the file. If present, the log index appears after the last log block.
+There is no padding used to align the log index to block alignment.
+
+Log index format is identical to ref index, except the keys are 9 bytes
+longer to include `'\0'` and the 8-byte `reverse_int64(update_index)`.
+Records use `block_position` to refer to the start of a log block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the log index must first read the footer (below) to
+obtain `log_index_position`. If not present, the position will be 0.
+
+Footer
+^^^^^^
+
+After the last block of the file, a file footer is written. It begins
+like the file header, but is extended with additional data.
+
+A 68-byte footer appears at the end:
+
+....
+    'REFT'
+    uint8( version_number = 1 )
+    uint24( block_size )
+    uint64( min_update_index )
+    uint64( max_update_index )
+
+    uint64( ref_index_position )
+    uint64( (obj_position << 5) | obj_id_len )
+    uint64( obj_index_position )
+
+    uint64( log_position )
+    uint64( log_index_position )
+
+    uint32( CRC-32 of above )
+....
+
+If a section is missing (e.g. ref index) the corresponding position
+field (e.g. `ref_index_position`) will be 0.
+
+* `obj_position`: byte position for the first obj block.
+* `obj_id_len`: number of bytes used to abbreviate object identifiers in
+obj blocks.
+* `log_position`: byte position for the first log block.
+* `ref_index_position`: byte position for the start of the ref index.
+* `obj_index_position`: byte position for the start of the obj index.
+* `log_index_position`: byte position for the start of the log index.
+
+Reading the footer
+++++++++++++++++++
+
+Readers must seek to `file_length - 68` to access the footer. A trusted
+external source (such as `stat(2)`) is necessary to obtain
+`file_length`. When reading the footer, readers must verify:
+
+* 4-byte magic is correct
+* 1-byte version number is recognized
+* 4-byte CRC-32 matches the other 64 bytes (including magic, and
+version)
+
+Once verified, the other fields of the footer can be accessed.
+
+Varint encoding
+^^^^^^^^^^^^^^^
+
+Varint encoding is identical to the ofs-delta encoding method used
+within pack files.
+
+Decoder works such as:
+
+....
+val = buf[ptr] & 0x7f
+while (buf[ptr] & 0x80) {
+  ptr++
+  val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
+}
+....
+
+Binary search
+^^^^^^^^^^^^^
+
+Binary search within a block is supported by the `restart_offset` fields
+at the end of the block. Readers can binary search through the restart
+table to locate between which two restart points the sought reference or
+key should appear.
+
+Each record identified by a `restart_offset` stores the complete key in
+the `suffix` field of the record, making the compare operation during
+binary search straightforward.
+
+Once a restart point lexicographically before the sought reference has
+been identified, readers can linearly scan through the following record
+entries to locate the sought record, terminating if the current record
+sorts after (and therefore the sought key is not present).
+
+Restart point selection
++++++++++++++++++++++++
+
+Writers determine the restart points at file creation. The process is
+arbitrary, but every 16 or 64 records is recommended. Every 16 may be
+more suitable for smaller block sizes (4k or 8k), every 64 for larger
+block sizes (64k).
+
+More frequent restart points reduces prefix compression and increases
+space consumed by the restart table, both of which increase file size.
+
+Less frequent restart points makes prefix compression more effective,
+decreasing overall file size, with increased penalities for readers
+walking through more records after the binary search step.
+
+A maximum of `65535` restart points per block is supported.
+
+Considerations
+~~~~~~~~~~~~~~
+
+Lightweight refs dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format assumes the vast majority of references are single
+SHA-1 valued with common prefixes, such as Gerrit Code Review’s
+`refs/changes/` namespace, GitHub’s `refs/pulls/` namespace, or many
+lightweight tags in the `refs/tags/` namespace.
+
+Annotated tags storing the peeled object cost an additional 20 bytes per
+reference.
+
+Low overhead
+^^^^^^^^^^^^
+
+A reftable with very few references (e.g. git.git with 5 heads) is 269
+bytes for reftable, vs. 332 bytes for packed-refs. This supports
+reftable scaling down for transaction logs (below).
+
+Block size
+^^^^^^^^^^
+
+For a Gerrit Code Review type repository with many change refs, larger
+block sizes (64 KiB) and less frequent restart points (every 64) yield
+better compression due to more references within the block compressing
+against the prior reference.
+
+Larger block sizes reduce the index size, as the reftable will require
+fewer blocks to store the same number of references.
+
+Minimal disk seeks
+^^^^^^^^^^^^^^^^^^
+
+Assuming the index block has been loaded into memory, binary searching
+for any single reference requires exactly 1 disk seek to load the
+containing block.
+
+Scans and lookups dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Scanning all references and lookup by name (or namespace such as
+`refs/heads/`) are the most common activities performed on repositories.
+SHA-1s are stored directly with references to optimize this use case.
+
+Logs are infrequently read
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are infrequently accessed, but can be large. Deflating log blocks
+saves disk space, with some increased penalty at read time.
+
+Logs are stored in an isolated section from refs, reducing the burden on
+reference readers that want to ignore logs. Further, historical logs can
+be isolated into log-only files.
+
+Logs are read backwards
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are frequently accessed backwards (most recent N records for master
+to answer `master@{4}`), so log records are grouped by reference, and
+sorted descending by update index.
+
+Repository format
+~~~~~~~~~~~~~~~~~
+
+Version 1
+^^^^^^^^^
+
+A repository must set its `$GIT_DIR/config` to configure reftable:
+
+....
+[core]
+    repositoryformatversion = 1
+[extensions]
+    refStorage = reftable
+....
+
+Layout
+^^^^^^
+
+A collection of reftable files are stored in the `$GIT_DIR/reftable/`
+directory:
+
+....
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+where reftable files are named by a unique name such as produced by the
+function `${min_update_index}-${max_update_index}.ref`.
+
+Log-only files use the `.log` extension, while ref-only and mixed ref
+and log files use `.ref`. extension.
+
+The stack ordering file is `$GIT_DIR/reftable/tables.list` and lists the
+current files, one per line, in order, from oldest (base) to newest
+(most recent):
+
+....
+$ cat .git/reftable/tables.list
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+Readers must read `$GIT_DIR/reftable/tables.list` to determine which
+files are relevant right now, and search through the stack in reverse
+order (last reftable is examined first).
+
+Reftable files not listed in `tables.list` may be new (and about to be
+added to the stack by the active writer), or ancient and ready to be
+pruned.
+
+Backward compatibility
+^^^^^^^^^^^^^^^^^^^^^^
+
+Older clients should continue to recognize the directory as a git
+repository so they don’t look for an enclosing repository in parent
+directories. To this end, a reftable-enabled repository must contain the
+following dummy files
+
+* `.git/HEAD`, a regular file containing `ref: refs/heads/.invalid`.
+* `.git/refs/`, a directory
+* `.git/refs/heads`, a regular file
+
+Readers
+^^^^^^^
+
+Readers can obtain a consistent snapshot of the reference space by
+following:
+
+1.  Open and read the `tables.list` file.
+2.  Open each of the reftable files that it mentions.
+3.  If any of the files is missing, goto 1.
+4.  Read from the now-open files as long as necessary.
+
+Update transactions
+^^^^^^^^^^^^^^^^^^^
+
+Although reftables are immutable, mutations are supported by writing a
+new reftable and atomically appending it to the stack:
+
+1.  Acquire `tables.list.lock`.
+2.  Read `tables.list` to determine current reftables.
+3.  Select `update_index` to be most recent file’s
+`max_update_index + 1`.
+4.  Prepare temp reftable `tmp_XXXXXX`, including log entries.
+5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
+6.  Copy `tables.list` to `tables.list.lock`, appending file from (5).
+7.  Rename `tables.list.lock` to `tables.list`.
+
+During step 4 the new file’s `min_update_index` and `max_update_index`
+are both set to the `update_index` selected by step 3. All log records
+for the transaction use the same `update_index` in their keys. This
+enables later correlation of which references were updated by the same
+transaction.
+
+Because a single `tables.list.lock` file is used to manage locking, the
+repository is single-threaded for writers. Writers may have to busy-spin
+(with backoff) around creating `tables.list.lock`, for up to an
+acceptable wait period, aborting if the repository is too busy to
+mutate. Application servers wrapped around repositories (e.g. Gerrit
+Code Review) can layer their own lock/wait queue to improve fairness to
+writers.
+
+Reference deletions
+^^^^^^^^^^^^^^^^^^^
+
+Deletion of any reference can be explicitly stored by setting the `type`
+to `0x0` and omitting the `value` field of the `ref_record`. This serves
+as a tombstone, overriding any assertions about the existence of the
+reference from earlier files in the stack.
+
+Compaction
+^^^^^^^^^^
+
+A partial stack of reftables can be compacted by merging references
+using a straightforward merge join across reftables, selecting the most
+recent value for output, and omitting deleted references that do not
+appear in remaining, lower reftables.
+
+A compacted reftable should set its `min_update_index` to the smallest
+of the input files’ `min_update_index`, and its `max_update_index`
+likewise to the largest input `max_update_index`.
+
+For sake of illustration, assume the stack currently consists of
+reftable files (from oldest to newest): A, B, C, and D. The compactor is
+going to compact B and C, leaving A and D alone.
+
+1.  Obtain lock `tables.list.lock` and read the `tables.list` file.
+2.  Obtain locks `B.lock` and `C.lock`. Ownership of these locks
+prevents other processes from trying to compact these files.
+3.  Release `tables.list.lock`.
+4.  Compact `B` and `C` into a temp file
+`${min_update_index}-${max_update_index}_XXXXXX`.
+5.  Reacquire lock `tables.list.lock`.
+6.  Verify that `B` and `C` are still in the stack, in that order. This
+should always be the case, assuming that other processes are adhering to
+the locking protocol.
+7.  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
+`${min_update_index}-${max_update_index}.ref`.
+8.  Write the new stack to `tables.list.lock`, replacing `B` and `C`
+with the file from (4).
+9.  Rename `tables.list.lock` to `tables.list`.
+10. Delete `B` and `C`, perhaps after a short sleep to avoid forcing
+readers to backtrack.
+
+This strategy permits compactions to proceed independently of updates.
+
+Each reftable (compacted or not) is uniquely identified by its name, so
+open reftables can be cached by their name.
+
+Alternatives considered
+~~~~~~~~~~~~~~~~~~~~~~~
+
+bzip packed-refs
+^^^^^^^^^^^^^^^^
+
+`bzip2` can significantly shrink a large packed-refs file (e.g. 62 MiB
+compresses to 23 MiB, 37%). However the bzip format does not support
+random access to a single reference. Readers must inflate and discard
+while performing a linear scan.
+
+Breaking packed-refs into chunks (individually compressing each chunk)
+would reduce the amount of data a reader must inflate, but still leaves
+the problem of indexing chunks to support readers efficiently locating
+the correct chunk.
+
+Given the compression achieved by reftable’s encoding, it does not seem
+necessary to add the complexity of bzip/gzip/zlib.
+
+Michael Haggerty’s alternate format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Michael Haggerty proposed
+https://public-inbox.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@mail.gmail.com/[an
+alternate] format to reftable on the Git mailing list. This format uses
+smaller chunks, without the restart table, and avoids block alignment
+with padding. Reflog entries immediately follow each ref, and are thus
+interleaved between refs.
+
+Performance testing indicates reftable is faster for lookups (51%
+faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
+larger file (+ ~3.2%, 28.3M vs 29.2M):
+
+[cols=">,>,>,>",options="header",]
+|=====================================
+|format |size |seek cold |seek hot
+|mh-alt |28.3 M |23.4 usec |11.2 usec
+|reftable |29.2 M |19.9 usec |5.4 usec
+|=====================================
+
+JGit Ketch RefTree
+^^^^^^^^^^^^^^^^^^
+
+https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg03073.html[JGit Ketch]
+proposed
+https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/[RefTree],
+an encoding of references inside Git tree objects stored as part of the
+repository’s object database.
+
+The RefTree format adds additional load on the object database storage
+layer (more loose objects, more objects in packs), and relies heavily on
+the packer’s delta compression to save space. Namespaces which are flat
+(e.g. thousands of tags in refs/tags) initially create very large loose
+objects, and so RefTree does not address the problem of copying many
+references to modify a handful.
+
+Flat namespaces are not efficiently searchable in RefTree, as tree
+objects in canonical formatting cannot be binary searched. This fails
+the need to handle a large number of references in a single namespace,
+such as GitHub’s `refs/pulls`, or a project with many tags.
+
+LMDB
+^^^^
+
+David Turner proposed
+https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/[using
+LMDB], as LMDB is lightweight (64k of runtime code) and GPL-compatible
+license.
+
+A downside of LMDB is its reliance on a single C implementation. This
+makes embedding inside JGit (a popular reimplemenation of Git)
+difficult, and hoisting onto virtual storage (for JGit DFS) virtually
+impossible.
+
+A common format that can be supported by all major Git implementations
+(git-core, JGit, libgit2) is strongly preferred.
+
+Future
+~~~~~~
+
+Longer hashes
+^^^^^^^^^^^^^
+
+Version will bump (e.g. 2) to indicate `value` uses a different object
+id length other than 20. The length could be stored in an expanded file
+header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v7 5/6] Add reftable library
  2020-02-26  8:49           ` [PATCH v7 0/6] " Han-Wen Nienhuys via GitGitGadget
                               ` (3 preceding siblings ...)
  2020-02-26  8:49             ` [PATCH v7 4/6] reftable: file format documentation Jonathan Nieder via GitGitGadget
@ 2020-02-26  8:49             ` Han-Wen Nienhuys via GitGitGadget
  2020-02-26  8:49             ` [PATCH v7 6/6] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                               ` (3 subsequent siblings)
  8 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-26  8:49 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. Its rationale and
specification is in the preceding commit.

Further context and motivation can be found in background reading:

* Original discussion on JGit-dev:  https://www.eclipse.org/lists/jgit-dev/msg03389.html

* First design discussion on git@vger: https://public-inbox.org/git/CAJo=hJtTp2eA3z9wW9cHo-nA7kK40vVThqh6inXpbCcqfdMP9g@mail.gmail.com/

* Last design discussion on git@vger: https://public-inbox.org/git/CAJo=hJsZcAM9sipdVr7TMD-FD2V2W6_pvMQ791EGCDsDkQ033w@mail.gmail.com/

* First attempt at implementation: https://public-inbox.org/git/CAP8UFD0PPZSjBnxCA7ez91vBuatcHKQ+JUWvTD1iHcXzPBjPBg@mail.gmail.com/

* libgit2 support issue: https://github.com/libgit2/libgit2/issues

* GitLab support issue: https://gitlab.com/gitlab-org/git/issues/6

* go-git support issue: https://github.com/src-d/go-git/issues/1059

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 ++
 reftable/README.md     |   11 +
 reftable/VERSION       |    5 +
 reftable/basics.c      |  160 ++++++
 reftable/basics.h      |   30 ++
 reftable/block.c       |  413 +++++++++++++++
 reftable/block.h       |   71 +++
 reftable/blocksource.h |   20 +
 reftable/bytes.c       |    0
 reftable/config.h      |    1 +
 reftable/constants.h   |   25 +
 reftable/dump.c        |   97 ++++
 reftable/file.c        |   97 ++++
 reftable/iter.c        |  229 ++++++++
 reftable/iter.h        |   56 ++
 reftable/merged.c      |  290 +++++++++++
 reftable/merged.h      |   34 ++
 reftable/pq.c          |  114 ++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  720 ++++++++++++++++++++++++++
 reftable/reader.h      |   52 ++
 reftable/record.c      | 1119 ++++++++++++++++++++++++++++++++++++++++
 reftable/record.h      |   79 +++
 reftable/reftable.h    |  409 +++++++++++++++
 reftable/slice.c       |  199 +++++++
 reftable/slice.h       |   39 ++
 reftable/stack.c       | 1007 ++++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   40 ++
 reftable/system.h      |   53 ++
 reftable/tree.c        |   66 +++
 reftable/tree.h        |   24 +
 reftable/update.sh     |   13 +
 reftable/writer.c      |  637 +++++++++++++++++++++++
 reftable/writer.h      |   45 ++
 reftable/zlib-compat.c |   92 ++++
 35 files changed, 6312 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..21500563fd4
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,11 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   sh reftable/update.sh
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..f3f33eae71e
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,5 @@
+commit ccce3b3eb763e79b23a3b4677d65ecceb4155ba3
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Tue Feb 25 20:42:29 2020 +0100
+
+    C: rephrase the hash ID API in terms of a uint32_t
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..ab2d09ef637
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,160 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)((i)&0xff);
+}
+
+uint32_t get_be24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)((i)&0xff);
+}
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(int sz, int (*f)(int k, void *args), void *args)
+{
+	int lo = 0;
+	int hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		int mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		free(*p);
+		p++;
+	}
+	free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = realloc(names,
+						names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *error_str(int err)
+{
+	switch (err) {
+	case IO_ERROR:
+		return "I/O error";
+	case FORMAT_ERROR:
+		return "FORMAT_ERROR";
+	case NOT_EXIST_ERROR:
+		return "NOT_EXIST_ERROR";
+	case LOCK_ERROR:
+		return "LOCK_ERROR";
+	case API_ERROR:
+		return "API_ERROR";
+	case ZLIB_ERROR:
+		return "ZLIB_ERROR";
+	case -1:
+		return "general error";
+	default:
+		return "unknown error code";
+	}
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..fd0a9796f84
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,30 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+void put_be24(byte *out, uint32_t i);
+uint32_t get_be24(byte *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+int binsearch(int sz, int (*f)(int k, void *args), void *args);
+
+void free_names(char **a);
+void parse_names(char *buf, int size, char ***namesp);
+int names_equal(char **a, char **b);
+int names_length(char **names);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..aa69c8abe21
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,413 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "blocksource.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct record rec)
+{
+	struct slice empty = { 0 };
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = { 0 };
+	int n = 0;
+
+	record_key(rec, &key);
+	n = encode_key(&restart, out, last, key, record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	out.buf += n;
+	out.len -= n;
+
+	n = record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+
+	out.buf += n;
+	out.len -= n;
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	free(slice_yield(&key));
+	return 0;
+
+err:
+	free(slice_yield(&key));
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = { 0 };
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		slice_resize(&compressed, src_len);
+
+		while (1) {
+			uLongf dest_len = compressed.len;
+
+			zresult = compress2(compressed.buf, &dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				slice_resize(&compressed, 2 * compressed.len);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				free(slice_yield(&compressed));
+				return ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed.buf,
+			       dest_len);
+			w->next = dest_len + block_header_skip;
+			break;
+		}
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	if (!is_block_type(typ)) {
+		return FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = { 0 };
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip;
+		uLongf src_len = block->len - block_header_skip;
+
+		slice_resize(&uncompressed, sz);
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			free(slice_yield(&uncompressed));
+			return ZLIB_ERROR;
+		}
+
+		block_source_return_block(block->source, block);
+		block->data = uncompressed.buf;
+		block->len = dst_len; /* XXX: 4 bytes missing? */
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_be16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct slice key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(int idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = { 0 };
+	struct slice last_key = { 0 };
+	byte unused_extra;
+	int n = decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_compare(a->key, rkey);
+		free(slice_yield(&rkey));
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct record rec)
+{
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	{
+		struct slice in = {
+			.buf = it->br->block.data + it->next_off,
+			.len = it->br->block_len - it->next_off,
+		};
+		struct slice start = in;
+		struct slice key = { 0 };
+		byte extra;
+		int n = decode_key(&key, &extra, it->last_key, in);
+		if (n < 0) {
+			return -1;
+		}
+
+		in.buf += n;
+		in.len -= n;
+		n = record_decode(rec, key, extra, in, it->br->hash_size);
+		if (n < 0) {
+			return -1;
+		}
+		in.buf += n;
+		in.len -= n;
+
+		slice_copy(&it->last_key, key);
+		it->next_off += start.len - in.len;
+		free(slice_yield(&key));
+		return 0;
+	}
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = { 0 };
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	free(slice_yield(&it->last_key));
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		return -1;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	{
+		struct record rec = new_record(block_reader_type(br));
+		struct slice key = { 0 };
+		int result = 0;
+		int err = 0;
+		struct block_iter next = { 0 };
+		while (true) {
+			block_iter_copy_from(&next, it);
+
+			err = block_iter_next(&next, rec);
+			if (err < 0) {
+				result = -1;
+				goto exit;
+			}
+
+			record_key(rec, &key);
+			if (err > 0 || slice_compare(key, want) >= 0) {
+				result = 0;
+				goto exit;
+			}
+
+			block_iter_copy_from(it, &next);
+		}
+
+	exit:
+		free(slice_yield(&key));
+		free(slice_yield(&next.last_key));
+		record_clear(rec);
+		free(record_yield(&rec));
+
+		return result;
+	}
+}
+
+void block_writer_reset(struct block_writer *bw)
+{
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	free(slice_yield(&bw->last_key));
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..66c7772c304
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,71 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+	uint32_t header_off;
+	int restart_interval;
+	int hash_size;
+
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+	struct slice last_key;
+	int entries;
+};
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+byte block_writer_type(struct block_writer *bw);
+int block_writer_add(struct block_writer *w, struct record rec);
+int block_writer_finish(struct block_writer *w);
+void block_writer_reset(struct block_writer *bw);
+void block_writer_clear(struct block_writer *bw);
+
+struct block_reader {
+	uint32_t header_off;
+	struct block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint32_t full_block_size;
+	uint16_t restart_count;
+};
+
+struct block_iter {
+	uint32_t next_off;
+	struct block_reader *br;
+	struct slice last_key;
+};
+
+int block_reader_init(struct block_reader *br, struct block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+byte block_reader_type(struct block_reader *r);
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+int block_iter_next(struct block_iter *it, struct record rec);
+int block_iter_seek(struct block_iter *it, struct slice want);
+void block_iter_close(struct block_iter *it);
+
+#endif
diff --git a/reftable/blocksource.h b/reftable/blocksource.h
new file mode 100644
index 00000000000..f3ad3a4c229
--- /dev/null
+++ b/reftable/blocksource.h
@@ -0,0 +1,20 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCKSOURCE_H
+#define BLOCKSOURCE_H
+
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source source);
+
+#endif
diff --git a/reftable/bytes.c b/reftable/bytes.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/reftable/config.h b/reftable/config.h
new file mode 100644
index 00000000000..40a8c178f10
--- /dev/null
+++ b/reftable/config.h
@@ -0,0 +1 @@
+/* empty */
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..85399fa4758
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,25 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define VERSION 1
+#define HEADER_SIZE 24
+#define FOOTER_SIZE 68
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 00000000000..c0065792a4f
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "reftable.h"
+
+static int dump_table(const char *tablename)
+{
+	struct block_source src = { 0 };
+	int err = block_source_from_file(&src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	struct reader *r = NULL;
+	err = new_reader(&r, src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct iterator it = { 0 };
+		err = reader_seek_ref(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+
+		struct ref_record ref = { 0 };
+		while (1) {
+			err = iterator_next_ref(it, &ref);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			ref_record_print(&ref, 20);
+		}
+		iterator_destroy(&it);
+		ref_record_clear(&ref);
+	}
+
+	{
+		struct iterator it = { 0 };
+		err = reader_seek_log(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+		struct log_record log = { 0 };
+		while (1) {
+			err = iterator_next_log(it, &log);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			log_record_print(&log, 20);
+		}
+		iterator_destroy(&it);
+		log_record_clear(&log);
+	}
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int opt;
+	const char *table = NULL;
+	while ((opt = getopt(argc, argv, "t:")) != -1) {
+		switch (opt) {
+		case 't':
+			table = xstrdup(optarg);
+			break;
+		case '?':
+			printf("usage: %s [-table tablefile]\n", argv[0]);
+			return 2;
+			break;
+		}
+	}
+
+	if (table != NULL) {
+		int err = dump_table(table);
+		if (err < 0) {
+			fprintf(stderr, "%s: %s: %s\n", argv[0], table,
+				error_str(err));
+			return 1;
+		}
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..253ec400253
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	free(b);
+}
+
+static int file_read_block(void *v, struct block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int block_source_from_file(struct block_source *bs, const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	{
+		struct file_block_source *p =
+			calloc(sizeof(struct file_block_source), 1);
+		p->size = st.st_size;
+		p->fd = fd;
+
+		bs->ops = &file_vtable;
+		bs->arg = p;
+	}
+	return 0;
+}
+
+int fd_writer(void *arg, byte *data, int sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..38464617c28
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,229 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct iterator it)
+{
+	return it.ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct record rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct iterator *it)
+{
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct iterator it, struct record rec)
+{
+	return it.ops->next(it.iter_arg, rec);
+}
+
+void iterator_destroy(struct iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int iterator_next_ref(struct iterator it, struct ref_record *ref)
+{
+	struct record rec = { 0 };
+	record_from_ref(&rec, ref);
+	return iterator_next(it, rec);
+}
+
+int iterator_next_log(struct iterator it, struct log_record *log)
+{
+	struct record rec = { 0 };
+	record_from_log(&rec, log);
+	return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	free(slice_yield(&fri->oid));
+	iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct ref_record *ref = (struct ref_record *)rec.data;
+
+	while (true) {
+		int err = iterator_next_ref(fri->it, ref);
+		if (err != 0) {
+			return err;
+		}
+
+		if (fri->double_check) {
+			struct iterator it = { 0 };
+
+			int err = reader_seek_ref(fri->r, &it, ref->ref_name);
+			if (err == 0) {
+				err = iterator_next_ref(it, ref);
+			}
+
+			iterator_destroy(&it);
+
+			if (err < 0) {
+				return err;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+}
+
+struct iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reader_return_block(it->r, &it->block_reader.block);
+	free(slice_yield(&it->oid));
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reader_return_block(it->r, &it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct ref_record *ref = (struct ref_record *)rec.data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reader *r, byte *oid, int oid_len,
+			       uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		calloc(sizeof(struct indexed_table_ref_iter), 1);
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..41538c6babd
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,56 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct iterator_vtable {
+	int (*next)(void *iter_arg, struct record rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct iterator *it);
+int iterator_next(struct iterator it, struct record rec);
+bool iterator_is_null(struct iterator it);
+
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reader *r;
+	struct slice oid;
+	struct iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct iterator *,
+					  struct filtering_ref_iterator *);
+
+struct indexed_table_ref_iter {
+	struct reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reader *r, byte *oid, int oid_len,
+			       uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..c1e8a0aed9a
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,290 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct record rec = new_record(mi->typ);
+		int err = iterator_next(mi->stack[i], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			iterator_destroy(&mi->stack[i]);
+			record_clear(rec);
+			free(record_yield(&rec));
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		iterator_destroy(&mi->stack[i]);
+	}
+	free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, int idx)
+{
+	if (iterator_is_null(mi->stack[idx])) {
+		return 0;
+	}
+
+	{
+		struct record rec = new_record(mi->typ);
+		struct pq_entry e = {
+			.rec = rec,
+			.index = idx,
+		};
+		int err = iterator_next(mi->stack[idx], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			iterator_destroy(&mi->stack[idx]);
+			record_clear(rec);
+			free(record_yield(&rec));
+			return 0;
+		}
+
+		merged_iter_pqueue_add(&mi->pq, e);
+	}
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec)
+{
+	struct slice entry_key = { 0 };
+	struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
+	int err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+	record_key(entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = { 0 };
+		int err = 0, cmp = 0;
+
+		record_key(top.rec, &k);
+
+		cmp = slice_compare(k, entry_key);
+		free(slice_yield(&k));
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		record_clear(top.rec);
+		free(record_yield(&top.rec));
+	}
+
+	record_copy_from(rec, entry.rec, hash_size(mi->hash_id));
+	record_clear(entry.rec);
+	free(record_yield(&entry.rec));
+	free(slice_yield(&entry_key));
+	return 0;
+}
+
+static int merged_iter_next_void(void *p, struct record rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct iterator *it,
+				      struct merged_iter *mi)
+{
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n,
+		     uint32_t hash_id)
+{
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reader *r = stack[i];
+		if (r->hash_id != hash_id) {
+			return FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reader_min_update_index(r)) {
+			return FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reader_min_update_index(r);
+		}
+
+		last_max = reader_max_update_index(r);
+	}
+
+	{
+		struct merged_table m = {
+			.stack = stack,
+			.stack_len = n,
+			.min = first_min,
+			.max = last_max,
+			.hash_id = hash_id,
+		};
+
+		*dest = calloc(sizeof(struct merged_table), 1);
+		**dest = m;
+	}
+	return 0;
+}
+
+void merged_table_close(struct merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void merged_table_free(struct merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	free(mt);
+}
+
+uint64_t merged_max_update_index(struct merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t merged_min_update_index(struct merged_table *mt)
+{
+	return mt->min;
+}
+
+static int merged_table_seek_record(struct merged_table *mt,
+				    struct iterator *it, struct record rec)
+{
+	struct iterator *iters = calloc(sizeof(struct iterator), mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = record_type(rec),
+		.hash_id = mt->hash_id,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			iterator_destroy(&iters[i]);
+		}
+		free(iters);
+		return err;
+	}
+
+	merged.stack_len = n, err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p = malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+			  const char *name)
+{
+	struct ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+			     const char *name, uint64_t update_index)
+{
+	struct log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+			  const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..f2b20c5a26e
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct merged_table {
+	struct reader **stack;
+	int stack_len;
+	uint32_t hash_id;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct iterator *stack;
+	uint32_t hash_id;
+	int stack_len;
+	byte typ;
+	struct merged_iter_pqueue pq;
+} merged_iter;
+
+void merged_table_clear(struct merged_table *mt);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..8b1f54a5bb3
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,114 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = { 0 };
+	struct slice bk = { 0 };
+	int cmp = 0;
+	record_key(a.rec, &ak);
+	record_key(b.rec, &bk);
+
+	cmp = slice_compare(ak, bk);
+
+	free(slice_yield(&ak));
+	free(slice_yield(&bk));
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = realloc(pq->heap, pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		record_clear(pq->heap[i].rec);
+		free(record_yield(&pq->heap[i].rec));
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..b585a52ee14
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..90542d79e42
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,720 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct block_source source)
+{
+	return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size)
+{
+	int result = source.ops->read_block(source.arg, dest, off, size);
+	dest->source = source;
+	return result;
+}
+
+void block_source_return_block(struct block_source source, struct block *blockp)
+{
+	source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reader_offsets *reader_offsets_for(struct reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reader *r, struct block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(r->source, dest, off, sz);
+}
+
+void reader_return_block(struct reader *r, struct block *p)
+{
+	block_source_return_block(r->source, p);
+}
+
+uint32_t reader_hash_id(struct reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	int err = 0;
+	if (memcmp(f, "REFT", 4)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, HEADER_SIZE)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+
+	r->hash_id = SHA1_ID;
+	{
+		byte version = *f++;
+		if (version == 2) {
+			/* DO NOT SUBMIT.  Not yet in the standard. */
+			r->hash_id = SHA256_ID;
+			version = 1;
+		}
+		if (version != 1) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	{
+		uint32_t computed_crc = crc32(0, footer, f - footer);
+		uint32_t file_crc = get_be32(f);
+		f += 4;
+		if (computed_crc != file_crc) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	{
+		byte first_block_typ = header[HEADER_SIZE];
+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+		r->ref_offsets.offset = 0;
+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+					  r->log_offsets.offset > 0);
+		r->obj_offsets.present = r->obj_offsets.offset > 0;
+	}
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reader *r, struct block_source source, const char *name)
+{
+	struct block footer = { 0 };
+	struct block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reader));
+	r->size = block_source_size(source) - FOOTER_SIZE;
+	r->source = source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size, FOOTER_SIZE);
+	if (err != FOOTER_SIZE) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	/* Need +1 to read type of first block. */
+	err = reader_get_block(r, &header, 0, HEADER_SIZE + 1);
+	if (err != HEADER_SIZE + 1) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	block_source_return_block(r->source, &footer);
+	block_source_return_block(r->source, &header);
+	return err;
+}
+
+struct table_iter {
+	struct reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+		((struct ref_record *)rec.data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reader_return_block(ti->r, &ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += 24;
+	}
+
+	*typ = data[0];
+	if (is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct block block = { 0 };
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : HEADER_SIZE;
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reader_return_block(r, &block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reader_return_block(r, &block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct block_reader *brp = malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec)
+{
+	if (record_type(rec) != ti->typ) {
+		return API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = { 0 };
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct record rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct iterator *it, struct table_iter *ti)
+{
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reader *r, struct table_iter *ti,
+				uint64_t off, byte typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reader *r, struct table_iter *ti, byte typ,
+			bool index)
+{
+	struct reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reader *r, struct table_iter *ti,
+			      struct record want)
+{
+	struct record rec = new_record(record_type(want));
+	struct slice want_key = { 0 };
+	struct slice got_key = { 0 };
+	struct table_iter next = { 0 };
+	int err = -1;
+	record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_compare(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	record_clear(rec);
+	free(record_yield(&rec));
+	free(slice_yield(&want_key));
+	free(slice_yield(&got_key));
+	return err;
+}
+
+static int reader_seek_indexed(struct reader *r, struct iterator *it,
+			       struct record rec)
+{
+	struct index_record want_index = { 0 };
+	struct record want_index_rec = { 0 };
+	struct index_record index_result = { 0 };
+	struct record index_result_rec = { 0 };
+	struct table_iter index_iter = { 0 };
+	struct table_iter next = { 0 };
+	int err = 0;
+
+	record_key(rec, &want_index.last_key);
+	record_from_index(&want_index_rec, &want_index);
+	record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			calloc(sizeof(struct table_iter), 1);
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	record_clear(want_index_rec);
+	record_clear(index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reader *r, struct iterator *it,
+				struct record rec)
+{
+	struct reader_offsets *offs = reader_offsets_for(r, record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = { 0 };
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p = malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reader *r, struct iterator *it, struct record rec)
+{
+	byte typ = record_type(rec);
+
+	struct reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name)
+{
+	struct ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return reader_seek(r, it, rec);
+}
+
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+		       uint64_t update_index)
+{
+	struct log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return reader_seek(r, it, rec);
+}
+
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int new_reader(struct reader **p, struct block_source src, char const *name)
+{
+	struct reader *rd = calloc(sizeof(struct reader), 1);
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		free(rd);
+	}
+	return err;
+}
+
+void reader_free(struct reader *r)
+{
+	reader_close(r);
+	free(r);
+}
+
+static int reader_refs_for_indexed(struct reader *r, struct iterator *it,
+				   byte *oid)
+{
+	struct obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct record want_rec = { 0 };
+	struct iterator oit = { 0 };
+	struct obj_record got = { 0 };
+	struct record got_rec = { 0 };
+	int err = 0;
+
+	record_from_obj(&want_rec, &want);
+
+	err = reader_seek(r, &oit, want_rec);
+	if (err != 0) {
+		return err;
+	}
+
+	record_from_obj(&got_rec, &got);
+	err = iterator_next(oit, got_rec);
+	iterator_destroy(&oit);
+	if (err < 0) {
+		return err;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	{
+		struct indexed_table_ref_iter *itr = NULL;
+		err = new_indexed_table_ref_iter(&itr, r, oid,
+						 hash_size(r->hash_id),
+						 got.offsets, got.offset_len);
+		if (err < 0) {
+			record_clear(got_rec);
+			return err;
+		}
+		got.offsets = NULL;
+		record_clear(got_rec);
+
+		iterator_from_indexed_table_ref_iter(it, itr);
+	}
+
+	return 0;
+}
+
+static int reader_refs_for_unindexed(struct reader *r, struct iterator *it,
+				     byte *oid, int oid_len)
+{
+	struct table_iter *ti = calloc(sizeof(struct table_iter), 1);
+	struct filtering_ref_iterator *filter = NULL;
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		free(ti);
+		return err;
+	}
+
+	filter = calloc(sizeof(struct filtering_ref_iterator), 1);
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	filter->r = r;
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
+		    int oid_len)
+{
+	if (r->obj_offsets.present) {
+		return reader_refs_for_indexed(r, it, oid);
+	}
+	return reader_refs_for_unindexed(r, it, oid, oid_len);
+}
+
+uint64_t reader_max_update_index(struct reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reader_min_update_index(struct reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..4ea37c4cea6
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,52 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct block_source source);
+
+int block_source_read_block(struct block_source source, struct block *dest,
+			    uint64_t off, uint32_t size);
+void block_source_return_block(struct block_source source, struct block *ret);
+void block_source_close(struct block_source *source);
+
+struct reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+struct reader {
+	char *name;
+	struct block_source source;
+	uint32_t hash_id;
+	uint64_t size;
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	int object_id_len;
+
+	struct reader_offsets ref_offsets;
+	struct reader_offsets obj_offsets;
+	struct reader_offsets log_offsets;
+};
+
+int init_reader(struct reader *r, struct block_source source, const char *name);
+int reader_seek(struct reader *r, struct iterator *it, struct record rec);
+void reader_close(struct reader *r);
+const char *reader_name(struct reader *r);
+void reader_return_block(struct reader *r, struct block *p);
+int reader_init_block_reader(struct reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..a839aa14b2c
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1119 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = { 0 };
+	int i = 9;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	{
+		int n = sizeof(buf) - i - 1;
+		if (dest.len < n) {
+			return -1;
+		}
+		memcpy(dest.buf, &buf[i + 1], n);
+		return n;
+	}
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	in.buf += tsize;
+	in.len -= tsize;
+
+	return start_len - in.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	dest.buf += n;
+	dest.len -= n;
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	dest.buf += n;
+	dest.len -= n;
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	dest.buf += suffix_len;
+	dest.len -= suffix_len;
+
+	return start.len - dest.len;
+}
+
+static byte ref_record_type(void)
+{
+	return BLOCK_TYPE_REF;
+}
+
+static void ref_record_key(const void *r, struct slice *dest)
+{
+	const struct ref_record *rec = (const struct ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void ref_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct ref_record *ref = (struct ref_record *)rec;
+	struct ref_record *src = (struct ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = xstrdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void ref_record_print(struct ref_record *ref, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size);
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size);
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void ref_record_clear_void(void *rec)
+{
+	ref_record_clear((struct ref_record *)rec);
+}
+
+void ref_record_clear(struct ref_record *ref)
+{
+	free(ref->ref_name);
+	free(ref->target);
+	free(ref->target_value);
+	free(ref->value);
+	memset(ref, 0, sizeof(struct ref_record));
+}
+
+static byte ref_record_val_type(const void *rec)
+{
+	const struct ref_record *r = (const struct ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	s.buf += l;
+	s.len -= l;
+
+	return start.len - s.len;
+}
+
+static int ref_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	const struct ref_record *r = (const struct ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		s.buf += hash_size;
+		s.len -= hash_size;
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		s.buf += hash_size;
+		s.len -= hash_size;
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		s.buf += n;
+		s.len -= n;
+	}
+
+	return start.len - s.len;
+}
+
+static int ref_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct ref_record *r = (struct ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	in.buf += n;
+	in.len -= n;
+
+	r->ref_name = realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		in.buf += hash_size;
+		in.len -= hash_size;
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		in.buf += hash_size;
+		in.len -= hash_size;
+		break;
+	case 3: {
+		struct slice dest = { 0 };
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		in.buf += n;
+		in.len -= n;
+		seen_target = true;
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	in.buf += n;
+	in.len -= n;
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	in.buf += suffix_len;
+	in.len -= suffix_len;
+
+	return start_len - in.len;
+}
+
+struct record_vtable ref_record_vtable = {
+	.key = &ref_record_key,
+	.type = &ref_record_type,
+	.copy_from = &ref_record_copy_from,
+	.val_type = &ref_record_val_type,
+	.encode = &ref_record_encode,
+	.decode = &ref_record_decode,
+	.clear = &ref_record_clear_void,
+};
+
+static byte obj_record_type(void)
+{
+	return BLOCK_TYPE_OBJ;
+}
+
+static void obj_record_key(const void *r, struct slice *dest)
+{
+	const struct obj_record *rec = (const struct obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	const struct obj_record *src = (const struct obj_record *)src_rec;
+
+	*ref = *src;
+	ref->hash_prefix = malloc(ref->hash_prefix_len);
+	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
+
+	{
+		int olen = ref->offset_len * sizeof(uint64_t);
+		ref->offsets = malloc(olen);
+		memcpy(ref->offsets, src->offsets, olen);
+	}
+}
+
+static void obj_record_clear(void *rec)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	FREE_AND_NULL(ref->hash_prefix);
+	FREE_AND_NULL(ref->offsets);
+	memset(ref, 0, sizeof(struct obj_record));
+}
+
+static byte obj_record_val_type(const void *rec)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		s.buf += n;
+		s.len -= n;
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	{
+		uint64_t last = r->offsets[0];
+		int i = 0;
+		for (i = 1; i < r->offset_len; i++) {
+			int n = put_var_int(s, r->offsets[i] - last);
+			if (n < 0) {
+				return -1;
+			}
+			s.buf += n;
+			s.len -= n;
+			last = r->offsets[i];
+		}
+	}
+	return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct obj_record *r = (struct obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	r->hash_prefix = malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		in.buf += n;
+		in.len -= n;
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+
+	in.buf += n;
+	in.len -= n;
+
+	{
+		uint64_t last = r->offsets[0];
+		int j = 1;
+		while (j < count) {
+			uint64_t delta = 0;
+			int n = get_var_int(&delta, in);
+			if (n < 0) {
+				return n;
+			}
+
+			in.buf += n;
+			in.len -= n;
+
+			last = r->offsets[j] = (delta + last);
+			j++;
+		}
+	}
+	return start.len - in.len;
+}
+
+struct record_vtable obj_record_vtable = {
+	.key = &obj_record_key,
+	.type = &obj_record_type,
+	.copy_from = &obj_record_copy_from,
+	.val_type = &obj_record_val_type,
+	.encode = &obj_record_encode,
+	.decode = &obj_record_decode,
+	.clear = &obj_record_clear,
+};
+
+void log_record_print(struct log_record *log, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size);
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size);
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static byte log_record_type(void)
+{
+	return BLOCK_TYPE_LOG;
+}
+
+static void log_record_key(const void *r, struct slice *dest)
+{
+	const struct log_record *rec = (const struct log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_be64(dest->buf + 1 + len, ts);
+}
+
+static void log_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct log_record *dst = (struct log_record *)rec;
+	const struct log_record *src = (const struct log_record *)src_rec;
+
+	*dst = *src;
+	dst->ref_name = xstrdup(dst->ref_name);
+	dst->email = xstrdup(dst->email);
+	dst->name = xstrdup(dst->name);
+	dst->message = xstrdup(dst->message);
+	if (dst->new_hash != NULL) {
+		dst->new_hash = malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void log_record_clear_void(void *rec)
+{
+	struct log_record *r = (struct log_record *)rec;
+	log_record_clear(r);
+}
+
+void log_record_clear(struct log_record *r)
+{
+	free(r->ref_name);
+	free(r->new_hash);
+	free(r->old_hash);
+	free(r->name);
+	free(r->email);
+	free(r->message);
+	memset(r, 0, sizeof(struct log_record));
+}
+
+static byte log_record_val_type(const void *rec)
+{
+	return 1;
+}
+
+static byte zero[SHA256_SIZE] = { 0 };
+
+static int log_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct log_record *r = (struct log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	s.buf += 2 * hash_size;
+	s.len -= 2 * hash_size;
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	s.buf += n;
+	s.len -= n;
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_be16(s.buf, r->tz_offset);
+	s.buf += 2;
+	s.len -= 2;
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	s.len -= n;
+	s.buf += n;
+
+	return start.len - s.len;
+}
+
+static int log_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct log_record *r = (struct log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = { 0 };
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return FORMAT_ERROR;
+	}
+
+	r->ref_name = realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (in.len < 2 * hash_size) {
+		return FORMAT_ERROR;
+	}
+
+	r->old_hash = realloc(r->old_hash, hash_size);
+	r->new_hash = realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	in.buf += 2 * hash_size;
+	in.len -= 2 * hash_size;
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->name = realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->email = realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_be16(in.buf);
+	in.buf += 2;
+	in.len -= 2;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	in.len -= n;
+	in.buf += n;
+
+	r->message = realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	return start.len - in.len;
+
+error:
+	free(slice_yield(&dest));
+	return FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool log_record_equal(struct log_record *a, struct log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+struct record_vtable log_record_vtable = {
+	.key = &log_record_key,
+	.type = &log_record_type,
+	.copy_from = &log_record_copy_from,
+	.val_type = &log_record_val_type,
+	.encode = &log_record_encode,
+	.decode = &log_record_decode,
+	.clear = &log_record_clear_void,
+};
+
+struct record new_record(byte typ)
+{
+	struct record rec;
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct ref_record *r = calloc(1, sizeof(struct ref_record));
+		record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct obj_record *r = calloc(1, sizeof(struct obj_record));
+		record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct log_record *r = calloc(1, sizeof(struct log_record));
+		record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct index_record *r = calloc(1, sizeof(struct index_record));
+		record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+static byte index_record_type(void)
+{
+	return BLOCK_TYPE_INDEX;
+}
+
+static void index_record_key(const void *r, struct slice *dest)
+{
+	struct index_record *rec = (struct index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+				   int hash_size)
+{
+	struct index_record *dst = (struct index_record *)rec;
+	struct index_record *src = (struct index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec)
+{
+	struct index_record *idx = (struct index_record *)rec;
+	free(slice_yield(&idx->last_key));
+}
+
+static byte index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int index_record_encode(const void *rec, struct slice out, int hash_size)
+{
+	const struct index_record *r = (const struct index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	out.buf += n;
+	out.len -= n;
+
+	return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+			       struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct index_record *r = (struct index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	in.buf += n;
+	in.len -= n;
+	return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+	.key = &index_record_key,
+	.type = &index_record_type,
+	.copy_from = &index_record_copy_from,
+	.val_type = &index_record_val_type,
+	.encode = &index_record_encode,
+	.decode = &index_record_decode,
+	.clear = &index_record_clear,
+};
+
+void record_key(struct record rec, struct slice *dest)
+{
+	rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec)
+{
+	return rec.ops->type();
+}
+
+int record_encode(struct record rec, struct slice dest, int hash_size)
+{
+	return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size)
+{
+	assert(src.ops->type() == rec.ops->type());
+
+	rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec)
+{
+	return rec.ops->val_type(rec.data);
+}
+
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size)
+{
+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec)
+{
+	return rec.ops->clear(rec.data);
+}
+
+void record_from_ref(struct record *rec, struct ref_record *ref_rec)
+{
+	rec->data = ref_rec;
+	rec->ops = &ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
+{
+	rec->data = obj_rec;
+	rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec)
+{
+	rec->data = index_rec;
+	rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct log_record *log_rec)
+{
+	rec->data = log_rec;
+	rec->ops = &log_record_vtable;
+}
+
+void *record_yield(struct record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+struct ref_record *record_as_ref(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_REF);
+	return (struct ref_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct ref_record *)a)->ref_name,
+		      ((struct ref_record *)b)->ref_name);
+}
+
+bool ref_record_is_deletion(const struct ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int log_record_compare_key(const void *a, const void *b)
+{
+	struct log_record *la = (struct log_record *)a;
+	struct log_record *lb = (struct log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool log_record_is_deletion(const struct log_record *log)
+{
+	/* XXX */
+	return false;
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..726a073f430
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,79 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+struct record_vtable {
+	void (*key)(const void *rec, struct slice *dest);
+	byte (*type)(void);
+	void (*copy_from)(void *rec, const void *src, int hash_size);
+	byte (*val_type)(const void *rec);
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+	void (*clear)(void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct record {
+	void *data;
+	struct record_vtable *ops;
+};
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+int common_prefix_size(struct slice a, struct slice b);
+
+int is_block_type(byte typ);
+struct record new_record(byte typ);
+
+extern struct record_vtable ref_record_vtable;
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra);
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in);
+
+struct index_record {
+	uint64_t offset;
+	struct slice last_key;
+};
+
+struct obj_record {
+	byte *hash_prefix;
+	int hash_prefix_len;
+	uint64_t *offsets;
+	int offset_len;
+};
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size);
+void record_clear(struct record rec);
+void *record_yield(struct record *rec);
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct ref_record *refrec);
+void record_from_log(struct record *rec, struct log_record *logrec);
+struct ref_record *record_as_ref(struct record ref);
+
+/* for qsort. */
+int ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..e61669bb539
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,409 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct block_source {
+	struct block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct block {
+	uint8_t *data;
+	int len;
+	struct block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct block *dest, uint64_t off,
+			  uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int block_source_from_file(struct block_source *block_src, const char *name);
+
+/* write_options sets options for writing a single reftable. */
+struct write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+};
+
+/* ref_record holds a ref database entry target_value */
+struct ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int ref_record_is_deletion(const struct ref_record *ref);
+
+/* prints a ref_record onto stdout */
+void ref_record_print(struct ref_record *ref, int hash_size);
+
+/* frees and nulls all pointer values. */
+void ref_record_clear(struct ref_record *ref);
+
+/* returns whether two ref_records are the same */
+int ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size);
+
+/* log_record holds a reflog entry */
+struct log_record {
+	char *ref_name;
+	uint64_t update_index;
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int log_record_is_deletion(const struct log_record *log);
+
+/* frees and nulls all pointer values. */
+void log_record_clear(struct log_record *log);
+
+/* returns whether two records are equal. */
+int log_record_equal(struct log_record *a, struct log_record *b, int hash_size);
+
+void log_record_print(struct log_record *log, int hash_size);
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct iterator {
+	struct iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_ref(struct iterator it, struct ref_record *ref);
+
+/* reads the next log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int iterator_next_log(struct iterator it, struct log_record *log);
+
+/* releases resources associated with an iterator. */
+void iterator_destroy(struct iterator *it);
+
+/* block_stats holds statistics for a single block type */
+struct block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct block_stats obj_stats;
+	/* stats for index blocks */
+	struct block_stats idx_stats;
+	/* stats for log blocks */
+	struct block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* different types of errors */
+
+/* Unexpected file system behavior */
+#define IO_ERROR -2
+
+/* Format inconsistency on reading data
+ */
+#define FORMAT_ERROR -3
+
+/* File does not exist. Returned from block_source_from_file(),  because it
+   needs special handling in stack.
+*/
+#define NOT_EXIST_ERROR -4
+
+/* Trying to write out-of-date data. */
+#define LOCK_ERROR -5
+
+/* Misuse of the API:
+   - on writing a record with NULL ref_name.
+   - on writing a ref_record outside the table limits
+   - on writing a ref or log record before the stack's next_update_index
+   - on reading a ref_record from log iterator, or vice versa.
+ */
+#define API_ERROR -6
+
+/* Decompression error */
+#define ZLIB_ERROR -7
+
+/* Wrote a table without blocks. */
+#define EMPTY_TABLE_ERROR -8
+
+const char *error_str(int err);
+
+/* new_writer creates a new writer */
+struct writer *new_writer(int (*writer_func)(void *, uint8_t *, int),
+			  void *writer_arg, struct write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int fd_writer(void *fdp, uint8_t *data, int size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   stack_next_update_index(), or API_ERROR is returned.
+ */
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max);
+
+/* adds a ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   writer_set_limits(), or API_ERROR is returned.
+ */
+int writer_add_ref(struct writer *w, struct ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n);
+
+/* adds a log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int writer_add_log(struct writer *w, struct log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int writer_add_logs(struct writer *w, struct log_record *logs, int n);
+
+/* writer_close finalizes the reftable. The writer is retained so statistics can
+ * be inspected. */
+int writer_close(struct writer *w);
+
+/* writer_stats returns the statistics on the reftable being written. */
+struct stats *writer_stats(struct writer *w);
+
+/* writer_free deallocates memory for the writer */
+void writer_free(struct writer *w);
+
+struct reader;
+
+/* new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp.  The name is used for creating a
+ * stack. Typically, it is the basename of the file.
+ */
+int new_reader(struct reader **pp, struct block_source, const char *name);
+
+/* reader_seek_ref returns an iterator where 'name' would be inserted in the
+   table.
+
+   example:
+
+   struct reader *r = NULL;
+   int err = new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct iterator it  = {0};
+   err = reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct ref_record ref  = {0};
+   while (1) {
+     err = iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   iterator_destroy(&it);
+   ref_record_clear(&ref);
+ */
+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reader_hash_id(struct reader *r);
+
+/* seek to logs for the given name, older than update_index. */
+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
+		       uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reader_seek_log(struct reader *r, struct iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reader_free(struct reader *);
+
+/* return an iterator for the refs pointing to oid */
+int reader_refs_for(struct reader *r, struct iterator *it, uint8_t *oid,
+		    int oid_len);
+
+/* return the max_update_index for a table */
+uint64_t reader_max_update_index(struct reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reader_min_update_index(struct reader *r);
+
+/* a merged table is implements seeking/iterating over a stack of tables. */
+struct merged_table;
+
+/* new_merged_table creates a new merged table. It takes ownership of the stack
+   array.
+*/
+int new_merged_table(struct merged_table **dest, struct reader **stack, int n,
+		     uint32_t hash_id);
+
+/* returns the hash id used in this merged table. */
+uint32_t merged_hash_id(struct merged_table *mt);
+
+/* returns an iterator positioned just before 'name' */
+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
+			  const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
+			     const char *name, uint64_t update_index);
+
+/* like merged_table_seek_log_at but look for the newest entry. */
+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
+			  const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t merged_max_update_index(struct merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t merged_min_update_index(struct merged_table *mt);
+
+/* closes readers for the merged tables */
+void merged_table_close(struct merged_table *mt);
+
+/* releases memory for the merged_table */
+void merged_table_free(struct merged_table *m);
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct stack;
+
+/* open a new reftable stack. The tables will be stored in 'dir', while the list
+   of tables is in 'list_file'. Typically, this should be .git/reftables and
+   .git/refs respectively.
+*/
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+	      struct write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t stack_next_update_index(struct stack *st);
+
+/* add a new table to the stack. The write_table function must call
+   writer_set_limits, add refs and return an error value. */
+int stack_add(struct stack *st,
+	      int (*write_table)(struct writer *wr, void *write_arg),
+	      void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct merged_table *stack_merged_table(struct stack *st);
+
+/* frees all resources associated with the stack. */
+void stack_destroy(struct stack *st);
+
+/* reloads the stack if necessary. */
+int stack_reload(struct stack *st);
+
+/* Policy for expiring reflog entries. */
+struct log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int stack_compact_all(struct stack *st, struct log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int stack_auto_compact(struct stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_ref(struct stack *st, const char *refname,
+		   struct ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int stack_read_log(struct stack *st, const char *refname,
+		   struct log_record *log);
+
+/* statistics on past compactions. */
+struct compaction_stats {
+	uint64_t bytes;
+	int attempts;
+	int failures;
+};
+
+struct compaction_stats *stack_compaction_stats(struct stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 00000000000..567ed6cf5fe
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,199 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+byte *slice_yield(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = { 0 };
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_write(struct slice *b, byte *data, int sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_write_void(void *b, byte *data, int sz)
+{
+	return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = calloc(size, 1);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct block_source *bs, struct slice *buf)
+{
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	free(dest->data);
+}
+
+struct block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 00000000000..f12a6db228c
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+struct slice {
+	byte *buf;
+	int len;
+	int cap;
+};
+
+void slice_set_string(struct slice *dest, const char *);
+void slice_append_string(struct slice *dest, const char *);
+char *slice_to_string(struct slice src);
+const char *slice_as_string(struct slice *src);
+bool slice_equal(struct slice a, struct slice b);
+byte *slice_yield(struct slice *s);
+void slice_copy(struct slice *dest, struct slice src);
+void slice_resize(struct slice *s, int l);
+int slice_compare(struct slice a, struct slice b);
+int slice_write(struct slice *b, byte *data, int sz);
+int slice_write_void(void *b, byte *data, int sz);
+void slice_append(struct slice *dest, struct slice add);
+
+struct block_source;
+void block_source_from_slice(struct block_source *bs, struct slice *buf);
+
+struct block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..720907e6f23
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1007 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "reftable.h"
+#include "writer.h"
+
+int new_stack(struct stack **dest, const char *dir, const char *list_file,
+	      struct write_options config)
+{
+	struct stack *p = calloc(sizeof(struct stack), 1);
+	int err = 0;
+	*dest = NULL;
+	p->list_file = xstrdup(list_file);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = stack_reload(p);
+	if (err < 0) {
+		stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fread_lines(FILE *f, char ***namesp)
+{
+	long size = 0;
+	int err = fseek(f, 0, SEEK_END);
+	char *buf = NULL;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	size = ftell(f);
+	if (size < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = fseek(f, 0, SEEK_SET);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	buf = malloc(size + 1);
+	if (fread(buf, 1, size, f) != size) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+exit:
+	free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	FILE *f = fopen(filename, "r");
+	int err = 0;
+	if (f == NULL) {
+		if (errno == ENOENT) {
+			*namesp = calloc(sizeof(char *), 1);
+			return 0;
+		}
+
+		return IO_ERROR;
+	}
+	err = fread_lines(f, namesp);
+	fclose(f);
+	return err;
+}
+
+struct merged_table *stack_merged_table(struct stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void stack_destroy(struct stack *st)
+{
+	if (st->merged == NULL) {
+		return;
+	}
+
+	merged_table_close(st->merged);
+	merged_table_free(st->merged);
+	st->merged = NULL;
+
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	free(st);
+}
+
+static struct reader **stack_copy_readers(struct stack *st, int cur_len)
+{
+	struct reader **cur = calloc(sizeof(struct reader *), cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int stack_reload_once(struct stack *st, char **names, bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reader **new_tables =
+		malloc(sizeof(struct reader *) * names_len);
+	int new_tables_len = 0;
+	struct merged_table *new_merged = NULL;
+
+	struct slice table_path = { 0 };
+
+	while (*names) {
+		struct reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct block_source src = { 0 };
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_append_string(&table_path, "/");
+			slice_append_string(&table_path, name);
+
+			err = block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = new_reader(&rd, src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = new_merged_table(&new_merged, new_tables, new_tables_len,
+			       st->config.hash_id);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		merged_table_free(st->merged);
+	}
+	st->merged = new_merged;
+
+	{
+		int i = 0;
+		for (i = 0; i < cur_len; i++) {
+			if (cur[i] != NULL) {
+				reader_close(cur[i]);
+				reader_free(cur[i]);
+			}
+		}
+	}
+exit:
+	free(slice_yield(&table_path));
+	{
+		int i = 0;
+		for (i = 0; i < new_tables_len; i++) {
+			reader_close(new_tables[i]);
+		}
+	}
+	free(new_tables);
+	free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		err = read_lines(st->list_file, &names_after);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return -1;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 100;
+		usleep(delay);
+	}
+
+	return 0;
+}
+
+int stack_reload(struct stack *st)
+{
+	return stack_reload_maybe_reuse(st, true);
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int stack_add(struct stack *st, int (*write)(struct writer *wr, void *arg),
+	      void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == LOCK_ERROR) {
+			err = stack_reload(st);
+		}
+		return err;
+	}
+
+	return stack_auto_compact(st);
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "%012" PRIx64 "-%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+int stack_try_add(struct stack *st,
+		  int (*write_table)(struct writer *wr, void *arg), void *arg)
+{
+	struct slice lock_name = { 0 };
+	struct slice temp_tab_name = { 0 };
+	struct slice tab_name = { 0 };
+	struct slice next_name = { 0 };
+	struct slice table_list = { 0 };
+	struct writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+	int lock_fd = 0;
+	uint64_t next_update_index = 0;
+
+	slice_set_string(&lock_name, st->list_file);
+	slice_append_string(&lock_name, ".lock");
+
+	lock_fd = open(slice_as_string(&lock_name), O_EXCL | O_CREAT | O_WRONLY,
+		       0644);
+	if (lock_fd < 0) {
+		if (errno == EEXIST) {
+			err = LOCK_ERROR;
+			goto exit;
+		}
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = LOCK_ERROR;
+		goto exit;
+	}
+
+	next_update_index = stack_next_update_index(st);
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, next_update_index, next_update_index);
+
+	slice_set_string(&temp_tab_name, st->reftable_dir);
+	slice_append_string(&temp_tab_name, "/");
+	slice_append(&temp_tab_name, next_name);
+	slice_append_string(&temp_tab_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_name));
+	if (tab_fd < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	wr = new_writer(fd_writer, &tab_fd, &st->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = writer_close(wr);
+	if (err == EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto exit;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	if (wr->min_update_index < next_update_index) {
+		err = API_ERROR;
+		goto exit;
+	}
+
+	{
+		int i = 0;
+		for (i = 0; i < st->merged->stack_len; i++) {
+			slice_append_string(&table_list,
+					    st->merged->stack[i]->name);
+			slice_append_string(&table_list, "\n");
+		}
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_append_string(&next_name, ".ref");
+	slice_append(&table_list, next_name);
+	slice_append_string(&table_list, "\n");
+
+	slice_set_string(&tab_name, st->reftable_dir);
+	slice_append_string(&tab_name, "/");
+	slice_append(&tab_name, next_name);
+
+	err = rename(slice_as_string(&temp_tab_name),
+		     slice_as_string(&tab_name));
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	free(slice_yield(&temp_tab_name));
+
+	err = write(lock_fd, table_list.buf, table_list.len);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = close(lock_fd);
+	lock_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&tab_name));
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&tab_name));
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_reload(st);
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_name));
+	}
+	unlink(slice_as_string(&lock_name));
+
+	if (lock_fd > 0) {
+		close(lock_fd);
+		lock_fd = 0;
+	}
+
+	free(slice_yield(&lock_name));
+	free(slice_yield(&temp_tab_name));
+	free(slice_yield(&tab_name));
+	free(slice_yield(&next_name));
+	free(slice_yield(&table_list));
+	writer_free(wr);
+	return err;
+}
+
+uint64_t stack_next_update_index(struct stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reader_max_update_index(st->merged->stack[sz - 1]) + 1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct log_expiry_config *config)
+{
+	struct slice next_name = { 0 };
+	int tab_fd = -1;
+	struct writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reader_min_update_index(st->merged->stack[first]),
+		    reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_append_string(temp_tab, "/");
+	slice_append(temp_tab, next_name);
+	slice_append_string(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = new_writer(fd_writer, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+	writer_free(wr);
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		free(slice_yield(temp_tab));
+	}
+	free(slice_yield(&next_name));
+	return err;
+}
+
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+			int last, struct log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reader **subtabs =
+		calloc(sizeof(struct reader *), last - first + 1);
+	struct merged_table *mt = NULL;
+	int err = 0;
+	struct iterator it = { 0 };
+	struct ref_record ref = { 0 };
+	struct log_record log = { 0 };
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	writer_set_limits(wr, st->merged->stack[first]->min_update_index,
+			  st->merged->stack[last]->max_update_index);
+
+	err = new_merged_table(&mt, subtabs, subtabs_len, st->config.hash_id);
+	if (err < 0) {
+		free(subtabs);
+		goto exit;
+	}
+
+	err = merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = iterator_next_ref(it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+	}
+
+	err = merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = iterator_next_log(it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		/* XXX collect stats? */
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+	}
+
+exit:
+	iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		merged_table_free(mt);
+	}
+	ref_record_clear(&ref);
+
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct stack *st, int first, int last,
+			       struct log_expiry_config *expiry)
+{
+	struct slice temp_tab_name = { 0 };
+	struct slice new_table_name = { 0 };
+	struct slice lock_file_name = { 0 };
+	struct slice ref_list_contents = { 0 };
+	struct slice new_table_path = { 0 };
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **delete_on_success = calloc(sizeof(char *), compact_count + 1);
+	char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_append_string(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_name = { 0 };
+		struct slice subtab_lock = { 0 };
+		slice_set_string(&subtab_name, st->reftable_dir);
+		slice_append_string(&subtab_name, "/");
+		slice_append_string(&subtab_name,
+				    reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_name);
+		slice_append_string(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				}
+				err = IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] = (char *)slice_as_string(&subtab_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_append_string(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_append_string(&new_table_path, "/");
+
+	slice_append(&new_table_path, new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(slice_as_string(&temp_tab_name),
+			     slice_as_string(&new_table_path));
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		slice_append(&ref_list_contents, new_table_name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	{
+		char **p = delete_on_success;
+		while (*p) {
+			if (strcmp(*p, slice_as_string(&new_table_path))) {
+				unlink(*p);
+			}
+			p++;
+		}
+	}
+
+	err = stack_reload_maybe_reuse(st, first < last);
+exit:
+	free_names(delete_on_success);
+	{
+		char **p = subtable_locks;
+		while (*p) {
+			unlink(*p);
+			p++;
+		}
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	free(slice_yield(&new_table_name));
+	free(slice_yield(&new_table_path));
+	free(slice_yield(&ref_list_contents));
+	free(slice_yield(&temp_tab_name));
+	free(slice_yield(&lock_file_name));
+	return err;
+}
+
+int stack_compact_all(struct stack *st, struct log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct stack *st, int first, int last,
+				     struct log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	assert(sz > 0);
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = calloc(sizeof(struct segment), n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct stack *st)
+{
+	uint64_t *sizes = calloc(sizeof(uint64_t), st->merged->stack_len);
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		/* overhead is 24 + 68 = 92. */
+		sizes[i] = st->merged->stack[i]->size - 91;
+	}
+	return sizes;
+}
+
+int stack_auto_compact(struct stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct compaction_stats *stack_compaction_stats(struct stack *st)
+{
+	return &st->stats;
+}
+
+int stack_read_ref(struct stack *st, const char *refname,
+		   struct ref_record *ref)
+{
+	struct iterator it = { 0 };
+	struct merged_table *mt = stack_merged_table(st);
+	int err = merged_table_seek_ref(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = iterator_next_ref(it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, refname) || ref_record_is_deletion(ref)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	iterator_destroy(&it);
+	return err;
+}
+
+int stack_read_log(struct stack *st, const char *refname,
+		   struct log_record *log)
+{
+	struct iterator it = { 0 };
+	struct merged_table *mt = stack_merged_table(st);
+	int err = merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = iterator_next_log(it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) || log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..d5e2c93c293
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,40 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+
+struct stack {
+	char *list_file;
+	char *reftable_dir;
+
+	struct write_options config;
+
+	struct merged_table *merged;
+	struct compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct stack *st,
+		  int (*write_table)(struct writer *wr, void *arg), void *arg);
+int stack_write_compact(struct stack *st, struct writer *wr, int first,
+			int last, struct log_expiry_config *config);
+int fastlog2(uint64_t sz);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..99f453bfec8
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#if 1 /* REFTABLE_IN_GITCORE */
+
+#include "git-compat-util.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_IN_GITCORE */
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef uint8_t byte;
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ *
+ * TODO: in git-core, this should fallback to uncompress2 if it is available.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..9bf7fe531ff
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,66 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				calloc(sizeof(struct tree_node), 1);
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	{
+		int res = compare(key, (*rootp)->key);
+		if (res < 0) {
+			return tree_search(key, &(*rootp)->left, compare,
+					   insert);
+		} else if (res > 0) {
+			return tree_search(key, &(*rootp)->right, compare,
+					   insert);
+		}
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..86a71715aee
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,24 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100644
index 00000000000..70c0a7dee3e
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,13 @@
+#!/bin/sh
+
+set -eux
+
+((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
+git clone https://github.com/google/reftable reftable-repo) && \
+cp reftable-repo/c/*.[ch] reftable/ && \
+cp reftable-repo/LICENSE reftable/ &&
+git --git-dir reftable-repo/.git show --no-patch origin/master \
+> reftable/VERSION && \
+sed -i~ 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|' reftable/system.h
+rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
+git add reftable/*.[ch]
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..22331d193d0
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,637 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct block_stats *writer_block_stats(struct writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct writer *w, byte *data, size_t len, int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = calloc(w->pending_padding, 1);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_write_header(struct writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	/* DO NOT SUBMIT.  This has not been encoded in the standard yet. */
+	dest[4] = (hash_size(w->opts.hash_id) == SHA1_SIZE) ? 1 : 2; /* version
+								      */
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	return 24;
+}
+
+static void writer_reinit_block_writer(struct writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = HEADER_SIZE;
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
+			  void *writer_arg, struct write_options *opts)
+{
+	struct writer *wp = calloc(sizeof(struct writer), 1);
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->block = calloc(opts->block_size, 1);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void writer_free(struct writer *w)
+{
+	free(w->block);
+	free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+			     ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = calloc(sizeof(struct obj_index_tree_node), 1);
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = realloc(key->offsets,
+				       sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct writer *w, struct record rec)
+{
+	int result = -1;
+	struct slice key = { 0 };
+	int err = 0;
+	record_key(rec, &key);
+	if (slice_compare(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	free(slice_yield(&key));
+	return result;
+}
+
+int writer_add_ref(struct writer *w, struct ref_record *ref)
+{
+	struct record rec = { 0 };
+	struct ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return API_ERROR;
+	}
+
+	record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = hash_size(w->opts.hash_id),
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = hash_size(w->opts.hash_id),
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int writer_add_refs(struct writer *w, struct ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int writer_add_log(struct writer *w, struct log_record *log)
+{
+	if (log->ref_name == NULL) {
+		return API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	{
+		struct record rec = { 0 };
+		int err;
+		record_from_log(&rec, log);
+		err = writer_add_record(w, rec);
+		return err;
+	}
+}
+
+int writer_add_logs(struct writer *w, struct log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct record rec = { 0 };
+			record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, rec);
+			assert(err == 0);
+		}
+		for (i = 0; i < idx_len; i++) {
+			free(slice_yield(&idx[i].last_key));
+		}
+		free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct block_stats *bstats = writer_block_stats(w, typ);
+		bstats->index_blocks =
+			w->stats.idx_stats.blocks - before_blocks;
+		bstats->index_offset = index_start;
+		bstats->max_index_level = max_level;
+	}
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct record rec = { 0 };
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	free(slice_yield(&entry->hash));
+	free(entry);
+}
+
+static int writer_dump_object_index(struct writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_close(struct writer *w)
+{
+	byte footer[68];
+	byte *p = footer;
+
+	int err = writer_finish_public_section(w);
+	if (err != 0) {
+		goto exit;
+	}
+
+	writer_write_header(w, footer);
+	p += 24;
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+	w->pending_padding = 0;
+
+	err = padded_write(w, footer, sizeof(footer), 0);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (w->stats.log_stats.entries + w->stats.ref_stats.entries == 0) {
+		err = EMPTY_TABLE_ERROR;
+		goto exit;
+	}
+
+exit:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	free(slice_yield(&w->last_key));
+	return err;
+}
+
+void writer_clear_index(struct writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		free(slice_yield(&w->index[i].last_key));
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct block_stats *bstats = writer_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = realloc(w->index,
+				   sizeof(struct index_record) * w->index_cap);
+	}
+
+	{
+		struct index_record ir = {
+			.offset = w->next,
+		};
+		slice_copy(&ir.last_key, w->block_writer->last_key);
+		w->index[w->index_len] = ir;
+	}
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	block_writer_reset(&w->block_writer_data);
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+struct stats *writer_stats(struct writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..42ad03e9ec6
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,45 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct writer {
+	int (*write)(void *, byte *, int);
+	void *write_arg;
+	int pending_padding;
+	struct slice last_key;
+
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct write_options opts;
+
+	byte *block;
+	struct block_writer *block_writer;
+	struct block_writer block_writer_data;
+	struct index_record *index;
+	int index_len;
+	int index_cap;
+
+	/* tree for use with tsearch */
+	struct tree_node *obj_index_tree;
+
+	struct stats stats;
+};
+
+int writer_flush_block(struct writer *w);
+void writer_clear_index(struct writer *w);
+int writer_finish_public_section(struct writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v7 6/6] Reftable support for git-core
  2020-02-26  8:49           ` [PATCH v7 0/6] " Han-Wen Nienhuys via GitGitGadget
                               ` (4 preceding siblings ...)
  2020-02-26  8:49             ` [PATCH v7 5/6] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-02-26  8:49             ` Han-Wen Nienhuys via GitGitGadget
  2020-02-26 18:12               ` Junio C Hamano
  2020-02-26 21:31               ` Junio C Hamano
  2020-02-26 17:35             ` [PATCH v7 0/6] Reftable support git-core Junio C Hamano
                               ` (2 subsequent siblings)
  8 siblings, 2 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-02-26  8:49 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

TODO:

 * Resolve spots marked with XXX
 * Test strategy?

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   24 +-
 builtin/clone.c                               |    4 +-
 builtin/init-db.c                             |   55 +-
 cache.h                                       |    4 +-
 refs.c                                        |   20 +-
 refs.h                                        |    3 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1015 +++++++++++++++++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |   31 +
 13 files changed, 1152 insertions(+), 29 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 6134104ae65..706bb0a9814 100644
--- a/Makefile
+++ b/Makefile
@@ -814,6 +814,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += command-list.h
 
@@ -959,6 +960,7 @@ LIB_OBJS += rebase-interactive.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1163,7 +1165,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2347,11 +2349,28 @@ VCSSVN_OBJS += vcs-svn/fast_export.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/bytes.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2488,6 +2507,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
diff --git a/builtin/clone.c b/builtin/clone.c
index 4f6150c55c3..4bcea0c18da 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1097,7 +1097,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template,
+		DEFAULT_REF_STORAGE, /* XXX */
+		INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 45bdea05890..d51a99ed77e 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -177,7 +177,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 }
 
 static int create_default_files(const char *template_path,
-				const char *original_git_dir)
+				const char *original_git_dir,
+				const char *ref_storage_format, int flags)
 {
 	struct stat st1;
 	struct strbuf buf = STRBUF_INIT;
@@ -213,6 +214,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(ref_storage_format);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -222,6 +224,15 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -234,17 +245,21 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
+	} else {
+		/*
+		 * XXX should check whether our ref backend matches the
+		 * original one; if not, either die() or convert
+		 */
 	}
 
 	/* This forces creation of new config file */
-	xsnprintf(repo_version_string, sizeof(repo_version_string),
-		  "%d", GIT_REPO_VERSION);
+	xsnprintf(repo_version_string, sizeof(repo_version_string), "%d",
+		  !strcmp(ref_storage_format, "reftable") ?
+			  GIT_REPO_VERSION_READ :
+			  GIT_REPO_VERSION);
 	git_config_set("core.repositoryformatversion", repo_version_string);
 
 	/* Check filemode trustability */
@@ -339,7 +354,8 @@ static void separate_git_dir(const char *git_dir, const char *git_link)
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, unsigned int flags)
+	    const char *template_dir, const char *ref_storage_format,
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -378,7 +394,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 */
 	check_repository_format();
 
-	reinit = create_default_files(template_dir, original_git_dir);
+	reinit = create_default_files(template_dir, original_git_dir,
+				      ref_storage_format, flags);
 
 	create_object_directory();
 
@@ -403,6 +420,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -476,20 +495,24 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = DEFAULT_REF_STORAGE;
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
 	unsigned int flags = 0;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_END()
@@ -591,9 +614,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, flags);
+	return init_db(git_dir, real_git_dir, template_dir, ref_storage_format,
+		       flags);
 }
diff --git a/cache.h b/cache.h
index 37c899b53f7..4d905e2d565 100644
--- a/cache.h
+++ b/cache.h
@@ -627,7 +627,8 @@ int path_inside_repo(const char *prefix, const char *path);
 #define INIT_DB_EXIST_OK 0x0002
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, unsigned int flags);
+	    const char *template_dir, const char *ref_storage_format,
+	    unsigned int flags);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1041,6 +1042,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 1ab0bb54d3d..6530219762f 100644
--- a/refs.c
+++ b/refs.c
@@ -20,7 +20,7 @@
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1836,13 +1836,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1858,7 +1858,10 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs = ref_store_init(r->gitdir,
+				 r->ref_storage_format ? r->ref_storage_format :
+							 DEFAULT_REF_STORAGE,
+				 REF_STORE_ALL_CAPS);
 	return r->refs;
 }
 
@@ -1913,7 +1916,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1927,6 +1930,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1940,9 +1944,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index 87c9ec921b9..2b5985ad593 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* XXX where should this be? */
+#define DEFAULT_REF_STORAGE "files"
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 3490aac3a40..cafe5b97376 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -661,6 +661,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..4279c06010c
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1015 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *reftable_dir;
+	char *table_list_file;
+	struct stack *stack;
+};
+
+static void clear_log_record(struct log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	log_record_clear(log);
+}
+
+static void fill_log_record(struct log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = {};
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *reftable_ref_store_create(const char *path,
+						   unsigned int store_flags)
+{
+	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/reftable/tables.list", path);
+	refs->table_list_file = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", path);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/HEAD", path);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", path);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	refs->err = new_stack(&refs->stack, refs->reftable_dir,
+			      refs->table_list_file, cfg);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	FILE *file;
+	safe_create_dir(refs->reftable_dir, 1);
+
+	file = fopen(refs->table_list_file, "a");
+	if (file == NULL) {
+		return -1;
+	}
+	fclose(file);
+	return 0;
+}
+
+struct reftable_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = iterator_next_ref(ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
+	ref_record_clear(&ri->ref);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct merged_table *mt = NULL;
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else {
+		mt = stack_merged_table(refs->stack);
+		ri->err = merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	(void)refs;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid = {};
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct ref_update *)a)->refname,
+		      ((struct ref_update *)b)->refname);
+}
+
+static int write_transaction_table(struct writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = stack_next_update_index(refs->stack);
+	int err = 0;
+	struct log_record *logs = calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	writer_set_limits(writer, ts, ts);
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct log_record *log = &logs[i];
+		fill_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct object_id out_oid = {};
+			int out_flags = 0;
+			/* Memory owned by refs_resolve_ref_unsafe, no need to
+			 * free(). */
+			const char *resolved = refs_resolve_ref_unsafe(
+				transaction->ref_store, u->refname, 0, &out_oid,
+				&out_flags);
+			struct ref_record ref = {};
+			ref.ref_name =
+				(char *)(resolved ? resolved : u->refname);
+			log->ref_name = ref.ref_name;
+			ref.value = u->new_oid.hash;
+			ref.update_index = ts;
+			err = writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		err = writer_add_log(writer, &logs[i]);
+		clear_log_record(&logs[i]);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+exit:
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = stack_add(refs->stack, &write_transaction_table, transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure %s",
+			    error_str(err));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *err)
+{
+	return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+struct write_delete_refs_arg {
+	struct stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	int err = 0;
+
+	writer_set_limits(writer, ts, ts);
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct log_record log = {};
+		struct ref_record current = {};
+		fill_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (stack_read_ref(arg->stack, log.ref_name, &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		ref_record_clear(&current);
+
+		clear_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	return stack_add(refs->stack, &write_delete_refs_table, &arg);
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	writer_set_limits(writer, ts, ts);
+	err = writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct log_record log = {};
+		struct object_id new_oid = {};
+		struct object_id old_oid = {};
+		struct ref_record current = {};
+		stack_read_ref(create->refs->stack, create->refname, &current);
+
+		fill_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return stack_add(refs->stack, &write_create_symref_table, &arg);
+}
+
+struct write_rename_arg {
+	struct stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->stack);
+	struct ref_record ref = {};
+	int err = stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto exit;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto exit;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct ref_record todo[2] = {};
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct log_record todo[2] = {};
+		fill_log_record(&todo[0]);
+		fill_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = writer_add_logs(writer, todo, 2);
+
+		clear_log_record(&todo[0]);
+		clear_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	return stack_add(refs->stack, &write_rename_table, &arg);
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct iterator iter;
+	struct log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = iterator_next_log(ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	log_record_clear(&ri->log);
+	iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+
+	struct merged_table *mt = stack_merged_table(refs->stack);
+	int err = merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct iterator it = {};
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = NULL;
+	int err = 0;
+	struct log_record log = {};
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = stack_merged_table(refs->stack);
+	err = merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		err = iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid = {};
+			struct object_id new_oid = {};
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	log_record_clear(&log);
+	iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct iterator it = {};
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = NULL;
+	struct log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = stack_merged_table(refs->stack);
+	err = merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct log_record log = {};
+		err = iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (int i = len; i--;) {
+		struct log_record *log = &logs[i];
+		struct object_id old_oid = {};
+		struct object_id new_oid = {};
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (int i = 0; i < len; i++) {
+		log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct reftable_ref_store *refs;
+	struct log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct log_record log = {};
+	struct iterator it = {};
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = stack_merged_table(refs->stack);
+	err = merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		struct object_id ooid = {};
+		struct object_id noid = {};
+
+		int err = iterator_next_log(it, &log);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	log_record_clear(&log);
+	iterator_destroy(&it);
+	err = stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct reftable_ref_store *refs =
+		(struct reftable_ref_store *)ref_store;
+	struct ref_record ref = {};
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = stack_read_ref(refs->stack, refname, &ref);
+	if (err) {
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		/* XXX recurse? */
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else {
+		hashcpy(oid->hash, ref.value);
+	}
+exit:
+	ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
diff --git a/repository.c b/repository.c
index a4174ddb062..ff0988dac84 100644
--- a/repository.c
+++ b/repository.c
@@ -174,6 +174,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 040057dea6f..198d4aa0907 100644
--- a/repository.h
+++ b/repository.h
@@ -70,6 +70,9 @@ struct repository {
 	/* The store in which the refs are held. */
 	struct ref_store *refs;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 4ea7a0b081b..81dcc23fc90 100644
--- a/setup.c
+++ b/setup.c
@@ -458,9 +458,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -549,6 +551,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1191,8 +1194,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 00000000000..d35211c9db2
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+test_expect_success 'basic operation of reftable storage' '
+        git init --ref-storage=reftable repo &&
+        cd repo &&
+        echo "hello" > world.txt &&
+        git add world.txt &&
+        git commit -m "first post" &&
+        printf "HEAD\nrefs/heads/master\n" > want &&
+        git show-ref | cut -f2 -d" " > got &&
+        test_cmp got want &&
+        for count in $(test_seq 1 10); do
+                echo "hello" >> world.txt
+                git commit -m "number ${count}" world.txt
+        done &&
+        git gc &&
+        nfiles=$(ls -1 .git/reftable | wc -l | tr -d "[ \t]" ) &&
+        test "${nfiles}" = "2" &&
+        git reflog refs/heads/master > output &&
+        grep "commit (initial): first post" output &&
+        grep "commit: number 10" output
+'
+
+test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-21  6:40           ` Jonathan Nieder
@ 2020-02-26 17:16             ` Han-Wen Nienhuys
  2020-02-26 20:04               ` Junio C Hamano
  2020-02-27  0:01               ` brian m. carlson
       [not found]             ` <CAFQ2z_NQn9O3kFmHk8Cr31FY66ToU4bUdE=asHUfN++zBG+SPw@mail.gmail.com>
  1 sibling, 2 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-26 17:16 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, brian m. carlson, Junio C Hamano, Han-Wen Nienhuys via GitGitGadget

On Fri, Feb 21, 2020 at 7:40 AM Jonathan Nieder <jrnieder@gmail.com> wrote:
> > This adds the reftable library, and hooks it up as a ref backend.
>
> As promised, here's a patch to include the reftable spec in git.git.
> Please include this in the next iteration of this patch series (or if
> you prefer for it to land separately, that's also fine with me).

I did.

> [...]
> >  * support SHA256 as version 2 of the format.
>
> I'd prefer if we error out for now when someone tries to use reftable
> when the_hash_algo != sha1.  That would buy a bit more time to pin
> down the details of version 2 (e.g., should it only add an object
> format field, or object format and oid size?).

You have convinced me that we should go with the 4-byte identifier.

How about setting version=2 and extending the header by 4 bytes (which
hold the 4-byte identifier). The footer would also be increased in
size equivalently. There are no further changes.

BTW, could we document that a reftable consisting of just a footer is
a valid table too?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v7 0/6] Reftable support git-core
  2020-02-26  8:49           ` [PATCH v7 0/6] " Han-Wen Nienhuys via GitGitGadget
                               ` (5 preceding siblings ...)
  2020-02-26  8:49             ` [PATCH v7 6/6] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-02-26 17:35             ` Junio C Hamano
  2020-03-24  6:06             ` Jonathan Nieder
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
  8 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-02-26 17:35 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This adds the reftable library, and hooks it up as a ref backend.

I do not see the promised .gitattributes addition; please add one
early in the series, before the step in which you start adding files
that want different whitespace rules, so that I do not have to apply
with "git am --whitespace=nowarn".

It is rather unfortunate that you included Jonathan's reftable doc
patch in the series---it has separately been queued and merged
already in 'next'.  I'll fork hn/reftable on top of the reftable doc
topic of Jonathan's, while dropping the extra copy [4/6] in this
series, to cope with this (an alternative would be to revert the one
in 'next' and take this series as-is).

>  * spots marked with XXX in the Git source code.

What follows is my notes while "git diff master...hn/reftable" and
looking for those lines, except for the ones in the reftable
implementation proper (for which you would be a better judge).

The init-db call in clone hardcodes DEFAULT_REF_STORAGE and that
should be OK at least for now.  Eventually the code would want to
read from the system-wide and the per-user configuration files to
figure it out, but I do not think the config subsystem is prepared
to read the configuration before making this call to init_db() right
now---let's leave such a complication until after the basics is
done.  After all, "git init && git add remote && git pull" can be
used as a rough approximate for "git clone"; as long as "git init"
can be told what backend to use, that would be sufficient to get us
going.

It sounds a bit too magical to convert the repository by running
"git init" again on an already existing repository.  But a user who
runs "git init" in an existing repository and asks to use settings
that are different from what the repository uses from the command
line cannot be expecting the command to error out when the settings
do not match (i.e. as a mechanism to check the setting)---the only
sensible way to interpret such an invocation is as a request to
convert the existing repository.  Again, I do not think it is a huge
loss to do without such an automatic conversion in the initial
deployment.

In refs.c, I am puzzled by the fact that a call to ref_store_init()
in the get_submodule_ref_store() needs to pass DEFAULT_REF_STORAGE,
or any format for that matter, at all.  Wouldn't the gitdir argument
sufficient for ref_store_init() to learn what backend the repository
uses?  IOW, I am not sure why the caller needs to pass the backend
format.  

The way r->ref_storage_format is used by get_main_ref_store(), at
least when the field is set, looks a lot more sensible.  Would it be
possible that get_submodule_ref_store() function would also want to
have, not just a submodule as a mere "string", but an instance of a
repository for that submodule?

Speaking of the use of ref_store_init() and r->ref_storage_format in
get_main_ref_store(), when is the storage_format field unspecified?
Is there a chicken-and-egg problem around here?

The same comments apply to the use of the default format in
get_worktree_ref_store(), but do we foresee to "add" a worktree that
uses one ref backend off of a repository that uses a different ref
backend?  Even if we do, it probably makes sense to default to the
same format as the primary worktree uses, I would think.  It is
still unclear to me why ref_store_init() wants the format in the
first place, though, given that it takes a path to the .git/
directory (or its equivalent), and ref_store_init() should be able
to figuire it out without being told.

I think it is fine to define the textual name of the default ref
storage in refs.h

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
       [not found]             ` <CAFQ2z_NQn9O3kFmHk8Cr31FY66ToU4bUdE=asHUfN++zBG+SPw@mail.gmail.com>
@ 2020-02-26 17:41               ` Jonathan Nieder
  2020-02-26 17:54                 ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Jonathan Nieder @ 2020-02-26 17:41 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: git, brian m. carlson, Junio C Hamano,
	Han-Wen Nienhuys via GitGitGadget, Josh Steadmon

(+Josh Steadmon, fuzz testing specialist)
Han-Wen Nienhuys wrote:

> You have convinced me that we should go with the 4-byte identifier.
>
> How about setting version=2 and extending the header by 4 bytes (which hold
> the 4-byte identifier). The footer would also be increased in size.

SGTM.

> BTW, could we document that a reftable consisting of just a footer is a
> valid table too?

Yes, SGTM as well.

Do you have sample reftables collected somewhere that you use for
testing?  Asking since that would help us set up some fuzz tests. :)

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-26 17:41               ` Jonathan Nieder
@ 2020-02-26 17:54                 ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-26 17:54 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, brian m. carlson, Junio C Hamano,
	Han-Wen Nienhuys via GitGitGadget, Josh Steadmon

On Wed, Feb 26, 2020 at 6:41 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> (+Josh Steadmon, fuzz testing specialist)
> Han-Wen Nienhuys wrote:
>
> > You have convinced me that we should go with the 4-byte identifier.
> >
> > How about setting version=2 and extending the header by 4 bytes (which hold
> > the 4-byte identifier). The footer would also be increased in size.
>
> SGTM.
>
> > BTW, could we document that a reftable consisting of just a footer is a
> > valid table too?
>
> Yes, SGTM as well.
>
> Do you have sample reftables collected somewhere that you use for
> testing?  Asking since that would help us set up some fuzz tests. :)

No, but the library is standalone, so writing fuzz tests should be
easy. The C code is essentially a copy of the Go code, so you could
start fuzzing there. I believe it is really easy to setup fuzzing in
Go, and we could generate a corpus starting there.

On a related note,  I should write interoperation tests that pass
tables back and forth between, JGit, C and Go, but I'm currently
focusing on getting the code ready for git-core.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v7 6/6] Reftable support for git-core
  2020-02-26  8:49             ` [PATCH v7 6/6] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-02-26 18:12               ` Junio C Hamano
  2020-02-26 18:59                 ` Han-Wen Nienhuys
  2020-02-26 21:31               ` Junio C Hamano
  1 sibling, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-02-26 18:12 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
> new file mode 100755
> index 00000000000..d35211c9db2
> --- /dev/null
> +++ b/t/t0031-reftable.sh
> @@ -0,0 +1,31 @@
> +#!/bin/sh
> +#
> +# Copyright (c) 2020 Google LLC
> +#
> +
> +test_description='reftable basics'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'basic operation of reftable storage' '
> +        git init --ref-storage=reftable repo &&

Test script bodies are supposed to be indented with HT, not 8 SPs.

> +        cd repo &&

You are not supposed to chdir around inside tests, unless you do so
in a subshell.  i.e.

	test_expect_success 'basic operation' '
		git init --ref-storage=reftable repo &&
		(
			cd repo &&
			...
		)
	'

This is because the next test somebody else will add after this one
*will* start from inside that repo (but if and only if this step
gets run---and there are ways to skip steps).

> +        echo "hello" > world.txt &&

A shell redirection operator should immediately be followed by the
filename without SP in between, i.e.

	echo hello >world.txt &&

> +        git add world.txt &&
> +        git commit -m "first post" &&
> +        printf "HEAD\nrefs/heads/master\n" > want &&

There is an easier-to-read helper.  Also, by convention, the file
that holds the expected state is called "expect", while the file
that records the actual state observed is called "actual".

	test_write_lines HEAD refs/heads/master >expect &&

> +        git show-ref | cut -f2 -d" " > got &&

Is the order of "show-ref" output guaranteed in some way?  Otherwise
we may need to make sure that we sort it to keep it stable.

Without --show-head, HEAD should not appear in "git show-ref"
output, but the expected output has HEAD, which is puzzling.

We will not notice if a "git" command that placed on the left hand
side of a pipeline segfaults after showing its output.  

People often split a pipeline like this into separate command for
that reason (but there is probably a better way in this case, as we
see soon).

> +        test_cmp got want &&

Taking the above comments together, perhaps

	test_write_lines refs/heads/master >expect &&
	git for-each-ref --sort=refname >actual &&
	test_cmp expect actual &&

> +        for count in $(test_seq 1 10); do
> +                echo "hello" >> world.txt
> +                git commit -m "number ${count}" world.txt
> +        done &&

Style: write "do" on its own line, aligned with "for" on the
previous line.  Same for "if/then/else/fi" "while/do/done" etc.  

    An easy way to remember is that you never use any semicolon in
    your shell script, except for the double-semicolon you use to
    delimit your case arms.

When any of these steps fail, you do not notice it, unless the
failure happens to occur to "git commit" during the last round.  I
think you can use "return 1" to signal the failure to the test
framework, like so.

	for count in $(test_seq 1 10)
	do
		echo hello >>world.txt &&
		git commit -m "number $count} world.txt ||
		return 1
	done &&

> +        git gc &&
> +        nfiles=$(ls -1 .git/reftable | wc -l | tr -d "[ \t]" ) &&
> +        test "${nfiles}" = "2" &&

No need to pipe "wc -l" output to "tr" to strip spaces.  Just stop
double-quoting it on the using side, and even better is to use
test_line_count.  Perhaps

	ls -1 .git/reftable >files &&
	test_line_count = 2 files

> +        git reflog refs/heads/master > output &&
> +        grep "commit (initial): first post" output &&
> +        grep "commit: number 10" output

Do we give guarantees that we keep all the reflog entries?  Would it
help to count the number of lines in the output file here?

> +'
> +
> +test_done

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v7 6/6] Reftable support for git-core
  2020-02-26 18:12               ` Junio C Hamano
@ 2020-02-26 18:59                 ` Han-Wen Nienhuys
  2020-02-26 19:59                   ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-26 18:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wed, Feb 26, 2020 at 7:13 PM Junio C Hamano <gitster@pobox.com> wrote:
> > diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
> > new file mode 100755
> > index 00000000000..d35211c9db2
> > --- /dev/null
> > +++ b/t/t0031-reftable.sh

Thanks for your careful review. Should we not start with the C code,
though? I've written the reftable library to my best ability, but it
hasn't been peer reviewed yet.

> > +test_expect_success 'basic operation of reftable storage' '
> > +        git init --ref-storage=reftable repo &&
>
> Test script bodies are supposed to be indented with HT, not 8 SPs.

Done

> > +        cd repo &&
>
> You are not supposed to chdir around inside tests, unless you do so
> in a subshell.  i.e.
>
>         test_expect_success 'basic operation' '
>                 git init --ref-storage=reftable repo &&
>                 (
>                         cd repo &&

Done.

> > +        echo "hello" > world.txt &&
>
> A shell redirection operator should immediately be followed by the
> filename without SP in between, i.e.

Done.

>         echo hello >world.txt &&
>
> > +        git add world.txt &&
> > +        git commit -m "first post" &&
> > +        printf "HEAD\nrefs/heads/master\n" > want &&
>
> There is an easier-to-read helper.  Also, by convention, the file
> that holds the expected state is called "expect", while the file
> that records the actual state observed is called "actual".

Done

>         test_write_lines HEAD refs/heads/master >expect &&
>
> > +        git show-ref | cut -f2 -d" " > got &&
>
> Is the order of "show-ref" output guaranteed in some way?  Otherwise
> we may need to make sure that we sort it to keep it stable.

The ref iteration result is lexicographically ordered.

> Without --show-head, HEAD should not appear in "git show-ref"
> output, but the expected output has HEAD, which is puzzling.

The HEAD symref is stored in reftable too, and to the reftable code
it's just another ref. How should we refactor this? Shouldn't the
files ref backend produce "HEAD" and then it's up to the show-ref
builtin to filter it out?

> We will not notice if a "git" command that placed on the left hand
> side of a pipeline segfaults after showing its output.
>
> People often split a pipeline like this into separate command for
> that reason (but there is probably a better way in this case, as we
> see soon).
>
> > +        test_cmp got want &&
>
> Taking the above comments together, perhaps
>
>         test_write_lines refs/heads/master >expect &&
>         git for-each-ref --sort=refname >actual &&
>         test_cmp expect actual &&
>
> > +        for count in $(test_seq 1 10); do
> > +                echo "hello" >> world.txt
> > +                git commit -m "number ${count}" world.txt
> > +        done &&
>
> Style: write "do" on its own line, aligned with "for" on the
> previous line.  Same for "if/then/else/fi" "while/do/done" etc.

Done.

>     An easy way to remember is that you never use any semicolon in
>     your shell script, except for the double-semicolon you use to
>     delimit your case arms.
>
> When any of these steps fail, you do not notice it, unless the
> failure happens to occur to "git commit" during the last round.  I
> think you can use "return 1" to signal the failure to the test
> framework, like so.
>
>         for count in $(test_seq 1 10)
>         do
>                 echo hello >>world.txt &&
>                 git commit -m "number $count} world.txt ||
>                 return 1
>         done &&

Done.

>         ls -1 .git/reftable >files &&
>         test_line_count = 2 files
>

Done.

> > +        git reflog refs/heads/master > output &&
> > +        grep "commit (initial): first post" output &&
> > +        grep "commit: number 10" output
>
> Do we give guarantees that we keep all the reflog entries?  Would it
> help to count the number of lines in the output file here?

The reflog entries are only removed if someone calls into the ref
backend to expire them. Does that happen automatically?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v7 6/6] Reftable support for git-core
  2020-02-26 18:59                 ` Han-Wen Nienhuys
@ 2020-02-26 19:59                   ` Junio C Hamano
  2020-02-27 16:03                     ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-02-26 19:59 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

Han-Wen Nienhuys <hanwen@google.com> writes:

> On Wed, Feb 26, 2020 at 7:13 PM Junio C Hamano <gitster@pobox.com> wrote:
>> > diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
>> > new file mode 100755
>> > index 00000000000..d35211c9db2
>> > --- /dev/null
>> > +++ b/t/t0031-reftable.sh
>
> Thanks for your careful review. Should we not start with the C code,
> though? I've written the reftable library to my best ability, but it
> hasn't been peer reviewed yet.

My intention was to let others, especially those from outside
Google, to comment before I do ;-)

>>         test_write_lines HEAD refs/heads/master >expect &&
>>
>> > +        git show-ref | cut -f2 -d" " > got &&
>>
>> Is the order of "show-ref" output guaranteed in some way?  Otherwise
>> we may need to make sure that we sort it to keep it stable.
>
> The ref iteration result is lexicographically ordered.
>
>> Without --show-head, HEAD should not appear in "git show-ref"
>> output, but the expected output has HEAD, which is puzzling.
>
> The HEAD symref is stored in reftable too, and to the reftable code
> it's just another ref. How should we refactor this? Shouldn't the
> files ref backend produce "HEAD" and then it's up to the show-ref
> builtin to filter it out?

After looking at show-ref.c, the answer is obvious, I would think.
for_each_ref() MUST NOT include HEAD in its enumeration, and that is
why the caller of for_each_ref() makes a separate call to head_ref()
when it wants to talk about HEAD.  As a backend-agnostic abstraction
layer, behaviour of functions like for_each_ref() should not change
depending on what ref backend is in use.

> The reflog entries are only removed if someone calls into the ref
> backend to expire them. Does that happen automatically?

"gc auto" may kick in in theory but not likely in these tests, so it
would be safe and sensible to know and check the expected number of
reflog entries in this test, I guess.

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-26 17:16             ` Han-Wen Nienhuys
@ 2020-02-26 20:04               ` Junio C Hamano
  2020-02-27  0:01               ` brian m. carlson
  1 sibling, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-02-26 20:04 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Jonathan Nieder, git, brian m. carlson,
	Han-Wen Nienhuys via GitGitGadget

Han-Wen Nienhuys <hanwen@google.com> writes:

> On Fri, Feb 21, 2020 at 7:40 AM Jonathan Nieder <jrnieder@gmail.com> wrote:
>> > This adds the reftable library, and hooks it up as a ref backend.
>>
>> As promised, here's a patch to include the reftable spec in git.git.
>> Please include this in the next iteration of this patch series (or if
>> you prefer for it to land separately, that's also fine with me).
>
> I did.

Ahh, I didn't see that original request to make it part of the
series.  Let me drop the standalone patch and revert it out of
'next', so that we can polish it as a part of this topic, which
makes a lot more sense.

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v7 6/6] Reftable support for git-core
  2020-02-26  8:49             ` [PATCH v7 6/6] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
  2020-02-26 18:12               ` Junio C Hamano
@ 2020-02-26 21:31               ` Junio C Hamano
  2020-02-27 16:01                 ` Han-Wen Nienhuys
  1 sibling, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-02-26 21:31 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget, brian m. carlson
  Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  int init_db(const char *git_dir, const char *real_git_dir,
> -	    const char *template_dir, unsigned int flags)
> +	    const char *template_dir, const char *ref_storage_format,
> +	    unsigned int flags)
>  {

This change, and Brian's SHA-256 work, obviously will introduce a
conflict as the other side wants to add an extra parameter to choose
from different hash algorhtims.

I wonder if we should rather add a single pointer to a struct that
can hold various initialization options as an extra parameter.  That
way, each topic can add and manage its own new member in the struct.

>  static int create_default_files(const char *template_path,
> -				const char *original_git_dir)
> +				const char *original_git_dir,
> +				const char *ref_storage_format, int flags)

Pretty much the same story here, too.  The other side aims to be
more generic and passes a "struct repository_format *" as an extra
parameter.  I think the repository-format structure needs to have
what ref backend is in use, so instead of passing only the string
like this patch, we can use the appropriate member from the struct?

Please try to apply this series to 'master', and try to merge the
result with the tip of 'pu', to see if there are other areas that a
better coordination between you and Brian would help moving these
two topics together and work well with each other.

I am not sure how quickly Brian's SHA-256 work solidifies enough to
build on top of, but if it is a good option to build on top of the
topic, that may save some work from this topic, too, as the
mechanism to choose something (i.e. hash algorithm for Brian's
topic, ref backend for this topic) fundamental to the repository at
runtime and at initialization time needs similar supporting
infrastructure, such as changes in "[12/24] setup: allow
check_repository_format to read repository format", and "[13/24]
builtin/init-db: allow specifying hash algorithm on command line"
of that series.

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v6 0/5] Reftable support git-core
  2020-02-26 17:16             ` Han-Wen Nienhuys
  2020-02-26 20:04               ` Junio C Hamano
@ 2020-02-27  0:01               ` brian m. carlson
  1 sibling, 0 replies; 409+ messages in thread
From: brian m. carlson @ 2020-02-27  0:01 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Jonathan Nieder, git, Junio C Hamano, Han-Wen Nienhuys via GitGitGadget

[-- Attachment #1: Type: text/plain, Size: 1626 bytes --]

On 2020-02-26 at 17:16:01, Han-Wen Nienhuys wrote:
> On Fri, Feb 21, 2020 at 7:40 AM Jonathan Nieder <jrnieder@gmail.com> wrote:
> > > This adds the reftable library, and hooks it up as a ref backend.
> >
> > As promised, here's a patch to include the reftable spec in git.git.
> > Please include this in the next iteration of this patch series (or if
> > you prefer for it to land separately, that's also fine with me).
> 
> I did.
> 
> > [...]
> > >  * support SHA256 as version 2 of the format.
> >
> > I'd prefer if we error out for now when someone tries to use reftable
> > when the_hash_algo != sha1.  That would buy a bit more time to pin
> > down the details of version 2 (e.g., should it only add an object
> > format field, or object format and oid size?).

If we do this, the testsuite will start to fail with SHA-256 unless you
label those tests with the SHA1 prerequisite.

> You have convinced me that we should go with the 4-byte identifier.
> 
> How about setting version=2 and extending the header by 4 bytes (which
> hold the 4-byte identifier). The footer would also be increased in
> size equivalently. There are no further changes.

Just so you're aware, the values we're going to be using in pack index
version 3 are in sha1-file.c as the format_version values of struct
hash_algo.  They're 0x73686131 for SHA-1  ("sha1", big-endian) and
0x73323536 for SHA-256 ("s256", big-endian).

I know Jonathan is already familiar with those, but I thought I'd point
them out since others may not be.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v7 6/6] Reftable support for git-core
  2020-02-26 21:31               ` Junio C Hamano
@ 2020-02-27 16:01                 ` Han-Wen Nienhuys
  2020-02-27 16:26                   ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-27 16:01 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Han-Wen Nienhuys via GitGitGadget, brian m. carlson, git,
	Han-Wen Nienhuys

On Wed, Feb 26, 2020 at 10:31 PM Junio C Hamano <gitster@pobox.com> wrote:
> This change, and Brian's SHA-256 work, obviously will introduce a
> conflict as the other side wants to add an extra parameter to choose
> from different hash algorhtims.
>
> I wonder if we should rather add a single pointer to a struct that
> can hold various initialization options as an extra parameter.  That
> way, each topic can add and manage its own new member in the struct.
>
> >  static int create_default_files(const char *template_path,
> > -                             const char *original_git_dir)
> > +                             const char *original_git_dir,
> > +                             const char *ref_storage_format, int flags)
>
> Pretty much the same story here, too.  The other side aims to be
> more generic and passes a "struct repository_format *" as an extra
> parameter.

Sounds great to me. Could we submit a patch that changes this into
master already so Brian and can work on the same basis?

> I am not sure how quickly Brian's SHA-256 work solidifies enough to
> build on top of, but if it is a good option to build on top of the
> topic, that may save some work from this topic, too, as the
> mechanism to choose something (i.e. hash algorithm for Brian's
> topic, ref backend for this topic) fundamental to the repository at
> runtime and at initialization time needs similar supporting
> infrastructure, such as changes in "[12/24] setup: allow
> check_repository_format to read repository format", and "[13/24]
> builtin/init-db: allow specifying hash algorithm on command line"
> of that series.

Can I pull this series as a git branch somewhere?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v7 6/6] Reftable support for git-core
  2020-02-26 19:59                   ` Junio C Hamano
@ 2020-02-27 16:03                     ` Han-Wen Nienhuys
  2020-02-27 16:23                       ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-27 16:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wed, Feb 26, 2020 at 8:59 PM Junio C Hamano <gitster@pobox.com> wrote:
> >> Without --show-head, HEAD should not appear in "git show-ref"
> >> output, but the expected output has HEAD, which is puzzling.
> >
> > The HEAD symref is stored in reftable too, and to the reftable code
> > it's just another ref. How should we refactor this? Shouldn't the
> > files ref backend produce "HEAD" and then it's up to the show-ref
> > builtin to filter it out?
>
> After looking at show-ref.c, the answer is obvious, I would think.
> for_each_ref() MUST NOT include HEAD in its enumeration, and that is
> why the caller of for_each_ref() makes a separate call to head_ref()
> when it wants to talk about HEAD.  As a backend-agnostic abstraction
> layer, behaviour of functions like for_each_ref() should not change
> depending on what ref backend is in use.

So, the ref backend should manage the HEAD ref, but iteration should
not produce the HEAD ref.

Are there other refs that behave like this, or is this the only special case?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v7 6/6] Reftable support for git-core
  2020-02-27 16:03                     ` Han-Wen Nienhuys
@ 2020-02-27 16:23                       ` Junio C Hamano
  2020-02-27 17:56                         ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-02-27 16:23 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Michael Haggerty, Jeff King

Han-Wen Nienhuys <hanwen@google.com> writes:

> So, the ref backend should manage the HEAD ref, but iteration should
> not produce the HEAD ref.

Yeah, as there is a dedicated API function head_ref().

Things like ORIG_HEAD and MERGE_HEAD have always been curiosity
outside the official API, but IIRC read_ref() and friends are
prepared to read them [*1*], so the vocabulary may be unbounded [*2*].

These won't be listed in for_each_ref() iteration, either (think of
for_each_ref() as a filtered "ls -R .git/refs/" output).

Thanks.


[Footnotes]

*1* I do not offhand know if these *_HEAD pseudo-refs are written
via the refs API---it probably is the safest to arrange the ref
backends in such a way that they always go to the files backend,
even if HEAD and refs/* are managed by a backend different from the
files one.

*2* I vaguely recall that back when ref-backend infrastructure was
introduced, we had discussion on declaring that "^[A-Z_]*HEAD$"
directly under $GIT_DIR/ are these special refs (and nothing else
directly under $GIT_DIR/ is).  I do not recall what happend to the
discussion, though other folks may.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v7 6/6] Reftable support for git-core
  2020-02-27 16:01                 ` Han-Wen Nienhuys
@ 2020-02-27 16:26                   ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-02-27 16:26 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Han-Wen Nienhuys via GitGitGadget, brian m. carlson, git,
	Han-Wen Nienhuys

Han-Wen Nienhuys <hanwen@google.com> writes:

> Sounds great to me. Could we submit a patch that changes this into
> master already so Brian and can work on the same basis?

Heh, "into master already" is never be the way things work around
here.  But if these two topics can agree the best way forward, it
can be proposed, polished and then merged to 'master', just like any
other topic.  And the two topics both can build on top of that
topic.

>> I am not sure how quickly Brian's SHA-256 work solidifies enough to
>> build on top of, but if it is a good option to build on top of the
>> topic, that may save some work from this topic, too, as the
>> mechanism to choose something (i.e. hash algorithm for Brian's
>> topic, ref backend for this topic) fundamental to the repository at
>> runtime and at initialization time needs similar supporting
>> infrastructure, such as changes in "[12/24] setup: allow
>> check_repository_format to read repository format", and "[13/24]
>> builtin/init-db: allow specifying hash algorithm on command line"
>> of that series.
>
> Can I pull this series as a git branch somewhere?

There is a single merge somewhere on "git log --first-parent master..pu"
that merges bc/sha-256- something (I assume you know how to obtain
'pu').

Thanks.


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v7 6/6] Reftable support for git-core
  2020-02-27 16:23                       ` Junio C Hamano
@ 2020-02-27 17:56                         ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-02-27 17:56 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Michael Haggerty, Jeff King

On Thu, Feb 27, 2020 at 5:23 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Han-Wen Nienhuys <hanwen@google.com> writes:
>
> > So, the ref backend should manage the HEAD ref, but iteration should
> > not produce the HEAD ref.
>
> Yeah, as there is a dedicated API function head_ref().
>
> Things like ORIG_HEAD and MERGE_HEAD have always been curiosity
> outside the official API, but IIRC read_ref() and friends are
> prepared to read them [*1*], so the vocabulary may be unbounded [*2*].
>
> These won't be listed in for_each_ref() iteration, either (think of
> for_each_ref() as a filtered "ls -R .git/refs/" output).


currently the code says

 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
   return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
 }

but it looks like this should do

   return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);

instead.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v7 0/6] Reftable support git-core
  2020-02-26  8:49           ` [PATCH v7 0/6] " Han-Wen Nienhuys via GitGitGadget
                               ` (6 preceding siblings ...)
  2020-02-26 17:35             ` [PATCH v7 0/6] Reftable support git-core Junio C Hamano
@ 2020-03-24  6:06             ` Jonathan Nieder
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
  8 siblings, 0 replies; 409+ messages in thread
From: Jonathan Nieder @ 2020-03-24  6:06 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget
  Cc: git, Han-Wen Nienhuys, Derrick Stolee, Carlos Martín Nieto

Hi!

Han-Wen Nienhuys wrote:

> This adds the reftable library, and hooks it up as a ref backend.
>
> Feedback wanted:
>
>  * spots marked with XXX in the Git source code.
>
>  * what is a good test strategy? Could we have a CI flavor where we flip the
>    default to reftable, and see how it fares?
[...]
> Han-Wen Nienhuys (5):
>   refs.h: clarify reflog iteration order
>   create .git/refs in files-backend.c
>   refs: document how ref_iterator_advance_fn should handle symrefs
>   Add reftable library
>   Reftable support for git-core
>
> Jonathan Nieder (1):
>   reftable: file format documentation
[...]
>  51 files changed, 8547 insertions(+), 32 deletions(-)

This series was discussed at the Git Contributor Summit, and I
promised to provide a summary here.  Sorry for the delay.

Contribution format
~~~~~~~~~~~~~~~~~~~
This is a bit of an unusual contribution to Git:

- the heart of the series is in a single commit
- it is under a different license than most of Git
- the code doesn't use the same dialect as and "look like" the rest of
  Git

All three of these are because the code is meant to be a standalone
library that can be used by Git, libgit2, and any other project that
wants to understand reftables.  But they create obstacles to reviewing
(one reviewer said he had spent a two-hour block looking at the patch
and not made much headway).

On the other hand, people mostly agreed that having the ability to
share this code with libgit2 is beneficial.  So how can we get a
substantial review without losing that benefit?

 1. Most importantly, people would like a series of multiple patches
    that tell a story.  If review of an early patch in the series
    reveals significant changes that should happen in reftable
    upstream, that's fine --- upstream, that can happen in a patch on
    top, whereas in the series being contributed, the patch would be
    amended.

    In the end, there would be two different series of commits: the
    contributed series, and the upstream history.  The upstream
    history might be messier, and that's fine.  The contributed series
    would be applied to git.git.

    If I understood correctly, this is more important for the initial
    contribution than for later changes.  Git includes some code (e.g.
    in compat/) with outside upstreams and gets all-in-one-commit
    updates.  As long as upstream has a functional review process,
    this is fine --- here, reftable upstream (you :)) wants review on
    the initial contribution and splitting into a series that tells a
    story is the cost of making that review feasible to provide.

 2. Both Git and libgit2 have abstractions like strbuf.  We'd like
    reftable to make use of similar abstractions where they make the
    code cleaner.

 3. libgit2 has a git_malloc allocator.  reftable doesn't necessarily
    have to use it or make it pluggable --- at worst, we can #define
    malloc to use it.  But it's something to be aware of.

Maintenance
~~~~~~~~~~~
There's some interest in the maintenance story: if we run into an
issue with the reftable library, do we report and fix it on the git
mailing list or does the reftable library have its own upstream forums
for that?

Portability
~~~~~~~~~~~
There was some confusion about the scope of what the reftable library
provides.  It provides a reader and a writer and the caller is
responsible for connecting those to files.

There were some questions about mmap usage here (there is none) and
whether the library needs some kind of seeking reader interface for
abstracting the OS interface to files.

I think the upshot is

 4. Some summary documentation would be helpful e.g. in the README.md,
    to point people to what header file a person should start with to
    understand the library

 5. People are also interested in the file-oriented part of reftable,
    even though this library doesn't implement it (that's patch 6,
    which is Git-specific).

Testing
~~~~~~~
Git's testsuite has various GIT_TEST_* knobs.  A GIT_TEST_REF_BACKEND
knob would be helpful, to allow running the full testsuite with
reftable.  That's a good way to suss out edge cases.

The testsuite should *also* include some specific tests designed for
reftable.  They can demonstrate edge cases and demonstrate some of the
benefits

- case sensitivity
- handling of directory/file conflicts

That provides another entry point for people to understand reftable.

We also discussed table-driven tests that can be shared between
implementations but didn't end up saying anything too concrete about
that.

Summary
~~~~~~~
Breaking down the library into multiple patches is likely to be a
signficant amount of work, but it's a good way to break the review
into manageable pieces that get the project engaged.

Integration testing is also likely to be helpful.

People were very excited about the benefits reftable brings and it
feels a little strange to ask so much when you've already written this
code, but this seemed like the best way to make it into something the
project understands well and to get any kinks ironed out.  I'm happy
to help in any way I can.

Thoughts?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v8 0/9] Reftable support git-core
  2020-02-26  8:49           ` [PATCH v7 0/6] " Han-Wen Nienhuys via GitGitGadget
                               ` (7 preceding siblings ...)
  2020-03-24  6:06             ` Jonathan Nieder
@ 2020-04-01 11:28             ` Han-Wen Nienhuys via GitGitGadget
  2020-04-01 11:28               ` [PATCH v8 1/9] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
                                 ` (10 more replies)
  8 siblings, 11 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-01 11:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

Feedback wanted:

 * spots marked with XXX in the Git source code.
   
   
 * what is a good test strategy? Could we have a CI flavor where we flip the
   default to reftable, and see how it fares?
   
   

v8

 * propagate errors to git.
 * discard empty tables in the stack. 
 * one very basic test (t0031.sh)

v9

 * Added spec as technical doc.
 * Use 4-byte hash ID as API. Storage format for SHA256 is still pending
   discussion.

v10 

 * Serialize 4 byte hash ID to the version-2 format

v11

 * use reftable_ namespace prefix.
 * transaction API to provide locking.
 * reorganized record.{h,c}. More doc in reftable.h
 * support log record deletion.
 * pluggable malloc/free

Han-Wen Nienhuys (8):
  refs.h: clarify reflog iteration order
  create .git/refs in files-backend.c
  refs: document how ref_iterator_advance_fn should handle symrefs
  Add .gitattributes for the reftable/ directory
  reftable: define version 2 of the spec to accomodate SHA256
  reftable: clarify how empty tables should be written
  Add reftable library
  Reftable support for git-core

Jonathan Nieder (1):
  reftable: file format documentation

 Documentation/Makefile                        |    1 +
 Documentation/technical/reftable.txt          | 1080 ++++++++++++++++
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   24 +-
 builtin/clone.c                               |    5 +-
 builtin/init-db.c                             |   51 +-
 cache.h                                       |    4 +-
 refs.c                                        |   20 +-
 refs.h                                        |    7 +-
 refs/files-backend.c                          |    6 +
 refs/refs-internal.h                          |    6 +
 refs/reftable-backend.c                       | 1002 +++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   11 +
 reftable/VERSION                              |    5 +
 reftable/basics.c                             |  194 +++
 reftable/basics.h                             |   35 +
 reftable/block.c                              |  433 +++++++
 reftable/block.h                              |   76 ++
 reftable/blocksource.h                        |   22 +
 reftable/bytes.c                              |    0
 reftable/config.h                             |    1 +
 reftable/constants.h                          |   21 +
 reftable/dump.c                               |   97 ++
 reftable/file.c                               |   98 ++
 reftable/iter.c                               |  234 ++++
 reftable/iter.h                               |   56 +
 reftable/merged.c                             |  299 +++++
 reftable/merged.h                             |   34 +
 reftable/pq.c                                 |  115 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  756 +++++++++++
 reftable/reader.h                             |   60 +
 reftable/record.c                             | 1098 ++++++++++++++++
 reftable/record.h                             |  106 ++
 reftable/reftable.h                           |  520 ++++++++
 reftable/slice.c                              |  219 ++++
 reftable/slice.h                              |   42 +
 reftable/stack.c                              | 1132 +++++++++++++++++
 reftable/stack.h                              |   42 +
 reftable/system.h                             |   53 +
 reftable/tree.c                               |   67 +
 reftable/tree.h                               |   24 +
 reftable/update.sh                            |   14 +
 reftable/writer.c                             |  653 ++++++++++
 reftable/writer.h                             |   45 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |   35 +
 52 files changed, 8954 insertions(+), 31 deletions(-)
 create mode 100644 Documentation/technical/reftable.txt
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100755 t/t0031-reftable.sh


base-commit: 9fadedd637b312089337d73c3ed8447e9f0aa775
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v8
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v8
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v7:

  1:  b1e44bc431e =  1:  05634d26dd5 refs.h: clarify reflog iteration order
  2:  b68488a095e =  2:  8d34e2c5c8b create .git/refs in files-backend.c
  3:  da538ef7421 =  3:  d08f823844d refs: document how ref_iterator_advance_fn should handle symrefs
  -:  ----------- >  4:  ca95b3a371e Add .gitattributes for the reftable/ directory
  4:  aae26814983 =  5:  dfc8b131294 reftable: file format documentation
  -:  ----------- >  6:  45ab5361750 reftable: define version 2 of the spec to accomodate SHA256
  -:  ----------- >  7:  67ee5962e85 reftable: clarify how empty tables should be written
  5:  30ed43a4fdb !  8:  4f9bdd7312b Add reftable library
     @@ -5,21 +5,21 @@
          Reftable is a format for storing the ref database. Its rationale and
          specification is in the preceding commit.
      
     -    Further context and motivation can be found in background reading:
     +    This imports the upstream library as one big commit. For understanding
     +    the code, it is suggested to read the code in the following order:
      
     -    * Original discussion on JGit-dev:  https://www.eclipse.org/lists/jgit-dev/msg03389.html
     +    * The specification under Documentation/technical/reftable.txt
      
     -    * First design discussion on git@vger: https://public-inbox.org/git/CAJo=hJtTp2eA3z9wW9cHo-nA7kK40vVThqh6inXpbCcqfdMP9g@mail.gmail.com/
     +    * reftable.h - the public API
      
     -    * Last design discussion on git@vger: https://public-inbox.org/git/CAJo=hJsZcAM9sipdVr7TMD-FD2V2W6_pvMQ791EGCDsDkQ033w@mail.gmail.com/
     +    * block.{c,h} - reading and writing blocks.
      
     -    * First attempt at implementation: https://public-inbox.org/git/CAP8UFD0PPZSjBnxCA7ez91vBuatcHKQ+JUWvTD1iHcXzPBjPBg@mail.gmail.com/
     +    * writer.{c,h} - writing a complete reftable file.
      
     -    * libgit2 support issue: https://github.com/libgit2/libgit2/issues
     +    * merged.{c,h} and pq.{c,h} - reading a stack of reftables
      
     -    * GitLab support issue: https://gitlab.com/gitlab-org/git/issues/6
     -
     -    * go-git support issue: https://github.com/src-d/go-git/issues/1059
     +    * stack.{c,h} - writing and compacting stacks of reftable on the
     +    filesystem.
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
     @@ -82,11 +82,11 @@
       --- /dev/null
       +++ b/reftable/VERSION
      @@
     -+commit ccce3b3eb763e79b23a3b4677d65ecceb4155ba3
     ++commit 920f7413f7a25f308d9a203f584873f6753192ec
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Tue Feb 25 20:42:29 2020 +0100
     ++Date:   Wed Apr 1 12:41:41 2020 +0200
      +
     -+    C: rephrase the hash ID API in terms of a uint32_t
     ++    C: remove stray debug printfs
      
       diff --git a/reftable/basics.c b/reftable/basics.c
       new file mode 100644
     @@ -165,10 +165,10 @@
      +		return;
      +	}
      +	while (*p) {
     -+		free(*p);
     ++		reftable_free(*p);
      +		p++;
      +	}
     -+	free(a);
     ++	reftable_free(a);
      +}
      +
      +int names_length(char **names)
     @@ -201,8 +201,8 @@
      +		if (p < next) {
      +			if (names_len == names_cap) {
      +				names_cap = 2 * names_cap + 1;
     -+				names = realloc(names,
     -+						names_cap * sizeof(char *));
     ++				names = reftable_realloc(
     ++					names, names_cap * sizeof(char *));
      +			}
      +			names[names_len++] = xstrdup(p);
      +		}
     @@ -211,7 +211,7 @@
      +
      +	if (names_len == names_cap) {
      +		names_cap = 2 * names_cap + 1;
     -+		names = realloc(names, names_cap * sizeof(char *));
     ++		names = reftable_realloc(names, names_cap * sizeof(char *));
      +	}
      +
      +	names[names_len] = NULL;
     @@ -232,7 +232,7 @@
      +	return *a == *b;
      +}
      +
     -+const char *error_str(int err)
     ++const char *reftable_error_str(int err)
      +{
      +	switch (err) {
      +	case IO_ERROR:
     @@ -252,6 +252,40 @@
      +	default:
      +		return "unknown error code";
      +	}
     ++}
     ++
     ++void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
     ++void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
     ++void (*reftable_free_ptr)(void *) = &free;
     ++
     ++void *reftable_malloc(size_t sz)
     ++{
     ++	return (*reftable_malloc_ptr)(sz);
     ++}
     ++
     ++void *reftable_realloc(void *p, size_t sz)
     ++{
     ++	return (*reftable_realloc_ptr)(p, sz);
     ++}
     ++
     ++void reftable_free(void *p)
     ++{
     ++	reftable_free_ptr(p);
     ++}
     ++
     ++void *reftable_calloc(size_t sz)
     ++{
     ++	void *p = reftable_malloc(sz);
     ++	memset(p, 0, sz);
     ++	return p;
     ++}
     ++
     ++void reftable_set_alloc(void *(*malloc)(size_t),
     ++			void *(*realloc)(void *, size_t), void (*free)(void *))
     ++{
     ++	reftable_malloc_ptr = malloc;
     ++	reftable_realloc_ptr = realloc;
     ++	reftable_free_ptr = free;
      +}
      
       diff --git a/reftable/basics.h b/reftable/basics.h
     @@ -288,6 +322,11 @@
      +int names_equal(char **a, char **b);
      +int names_length(char **names);
      +
     ++void *reftable_malloc(size_t sz);
     ++void *reftable_realloc(void *p, size_t sz);
     ++void reftable_free(void *p);
     ++void *reftable_calloc(size_t sz);
     ++
      +#endif
      
       diff --git a/reftable/block.c b/reftable/block.c
     @@ -313,10 +352,32 @@
      +#include "reftable.h"
      +#include "zlib.h"
      +
     -+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
     -+				  struct slice key);
     ++int header_size(int version)
     ++{
     ++	switch (version) {
     ++	case 1:
     ++		return 24;
     ++	case 2:
     ++		return 28;
     ++	}
     ++	abort();
     ++}
     ++
     ++int footer_size(int version)
     ++{
     ++	switch (version) {
     ++	case 1:
     ++		return 68;
     ++	case 2:
     ++		return 72;
     ++	}
     ++	abort();
     ++}
     ++
     ++int block_writer_register_restart(struct reftable_block_writer *w, int n,
     ++				  bool restart, struct slice key);
      +
     -+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
     ++void block_writer_init(struct reftable_block_writer *bw, byte typ, byte *buf,
      +		       uint32_t block_size, uint32_t header_off, int hash_size)
      +{
      +	bw->buf = buf;
     @@ -329,14 +390,14 @@
      +	bw->entries = 0;
      +}
      +
     -+byte block_writer_type(struct block_writer *bw)
     ++byte block_writer_type(struct reftable_block_writer *bw)
      +{
      +	return bw->buf[bw->header_off];
      +}
      +
      +/* adds the record to the block. Returns -1 if it does not fit, 0 on
      +   success */
     -+int block_writer_add(struct block_writer *w, struct record rec)
     ++int block_writer_add(struct reftable_block_writer *w, struct record rec)
      +{
      +	struct slice empty = { 0 };
      +	struct slice last = w->entries % w->restart_interval == 0 ? empty :
     @@ -357,32 +418,29 @@
      +	if (n < 0) {
      +		goto err;
      +	}
     -+	out.buf += n;
     -+	out.len -= n;
     ++	slice_consume(&out, n);
      +
      +	n = record_encode(rec, out, w->hash_size);
      +	if (n < 0) {
      +		goto err;
      +	}
     -+
     -+	out.buf += n;
     -+	out.len -= n;
     ++	slice_consume(&out, n);
      +
      +	if (block_writer_register_restart(w, start.len - out.len, restart,
      +					  key) < 0) {
      +		goto err;
      +	}
      +
     -+	free(slice_yield(&key));
     ++	reftable_free(slice_yield(&key));
      +	return 0;
      +
      +err:
     -+	free(slice_yield(&key));
     ++	reftable_free(slice_yield(&key));
      +	return -1;
      +}
      +
     -+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
     -+				  struct slice key)
     ++int block_writer_register_restart(struct reftable_block_writer *w, int n,
     ++				  bool restart, struct slice key)
      +{
      +	int rlen = w->restart_len;
      +	if (rlen >= MAX_RESTARTS) {
     @@ -398,7 +456,7 @@
      +	if (restart) {
      +		if (w->restart_len == w->restart_cap) {
      +			w->restart_cap = w->restart_cap * 2 + 1;
     -+			w->restarts = realloc(
     ++			w->restarts = reftable_realloc(
      +				w->restarts, sizeof(uint32_t) * w->restart_cap);
      +		}
      +
     @@ -411,7 +469,7 @@
      +	return 0;
      +}
      +
     -+int block_writer_finish(struct block_writer *w)
     ++int block_writer_finish(struct reftable_block_writer *w)
      +{
      +	int i = 0;
      +	for (i = 0; i < w->restart_len; i++) {
     @@ -442,7 +500,7 @@
      +			}
      +
      +			if (Z_OK != zresult) {
     -+				free(slice_yield(&compressed));
     ++				reftable_free(slice_yield(&compressed));
      +				return ZLIB_ERROR;
      +			}
      +
     @@ -455,14 +513,14 @@
      +	return w->next;
      +}
      +
     -+byte block_reader_type(struct block_reader *r)
     ++byte block_reader_type(struct reftable_block_reader *r)
      +{
      +	return r->block.data[r->header_off];
      +}
      +
     -+int block_reader_init(struct block_reader *br, struct block *block,
     -+		      uint32_t header_off, uint32_t table_block_size,
     -+		      int hash_size)
     ++int block_reader_init(struct reftable_block_reader *br,
     ++		      struct reftable_block *block, uint32_t header_off,
     ++		      uint32_t table_block_size, int hash_size)
      +{
      +	uint32_t full_block_size = table_block_size;
      +	byte typ = block->data[header_off];
     @@ -485,7 +543,7 @@
      +				    uncompressed.buf + block_header_skip,
      +				    &dst_len, block->data + block_header_skip,
      +				    &src_len)) {
     -+			free(slice_yield(&uncompressed));
     ++			reftable_free(slice_yield(&uncompressed));
      +			return ZLIB_ERROR;
      +		}
      +
     @@ -526,12 +584,14 @@
      +	return 0;
      +}
      +
     -+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
     ++static uint32_t block_reader_restart_offset(struct reftable_block_reader *br,
     ++					    int i)
      +{
      +	return get_be24(br->restart_bytes + 3 * i);
      +}
      +
     -+void block_reader_start(struct block_reader *br, struct block_iter *it)
     ++void block_reader_start(struct reftable_block_reader *br,
     ++			struct reftable_block_iter *it)
      +{
      +	it->br = br;
      +	slice_resize(&it->last_key, 0);
     @@ -541,7 +601,7 @@
      +struct restart_find_args {
      +	int error;
      +	struct slice key;
     -+	struct block_reader *r;
     ++	struct reftable_block_reader *r;
      +};
      +
      +static int restart_key_less(int idx, void *args)
     @@ -566,12 +626,13 @@
      +
      +	{
      +		int result = slice_compare(a->key, rkey);
     -+		free(slice_yield(&rkey));
     ++		reftable_free(slice_yield(&rkey));
      +		return result;
      +	}
      +}
      +
     -+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
     ++void block_iter_copy_from(struct reftable_block_iter *dest,
     ++			  struct reftable_block_iter *src)
      +{
      +	dest->br = src->br;
      +	dest->next_off = src->next_off;
     @@ -579,7 +640,7 @@
      +}
      +
      +/* return < 0 for error, 0 for OK, > 0 for EOF. */
     -+int block_iter_next(struct block_iter *it, struct record rec)
     ++int block_iter_next(struct reftable_block_iter *it, struct record rec)
      +{
      +	if (it->next_off >= it->br->block_len) {
      +		return 1;
     @@ -598,23 +659,21 @@
      +			return -1;
      +		}
      +
     -+		in.buf += n;
     -+		in.len -= n;
     ++		slice_consume(&in, n);
      +		n = record_decode(rec, key, extra, in, it->br->hash_size);
      +		if (n < 0) {
      +			return -1;
      +		}
     -+		in.buf += n;
     -+		in.len -= n;
     ++		slice_consume(&in, n);
      +
      +		slice_copy(&it->last_key, key);
      +		it->next_off += start.len - in.len;
     -+		free(slice_yield(&key));
     ++		reftable_free(slice_yield(&key));
      +		return 0;
      +	}
      +}
      +
     -+int block_reader_first_key(struct block_reader *br, struct slice *key)
     ++int block_reader_first_key(struct reftable_block_reader *br, struct slice *key)
      +{
      +	struct slice empty = { 0 };
      +	int off = br->header_off + 4;
     @@ -631,18 +690,18 @@
      +	return 0;
      +}
      +
     -+int block_iter_seek(struct block_iter *it, struct slice want)
     ++int block_iter_seek(struct reftable_block_iter *it, struct slice want)
      +{
      +	return block_reader_seek(it->br, it, want);
      +}
      +
     -+void block_iter_close(struct block_iter *it)
     ++void block_iter_close(struct reftable_block_iter *it)
      +{
     -+	free(slice_yield(&it->last_key));
     ++	reftable_free(slice_yield(&it->last_key));
      +}
      +
     -+int block_reader_seek(struct block_reader *br, struct block_iter *it,
     -+		      struct slice want)
     ++int block_reader_seek(struct reftable_block_reader *br,
     ++		      struct reftable_block_iter *it, struct slice want)
      +{
      +	struct restart_find_args args = {
      +		.key = want,
     @@ -667,7 +726,7 @@
      +		struct slice key = { 0 };
      +		int result = 0;
      +		int err = 0;
     -+		struct block_iter next = { 0 };
     ++		struct reftable_block_iter next = { 0 };
      +		while (true) {
      +			block_iter_copy_from(&next, it);
      +
     @@ -687,25 +746,25 @@
      +		}
      +
      +	exit:
     -+		free(slice_yield(&key));
     -+		free(slice_yield(&next.last_key));
     ++		reftable_free(slice_yield(&key));
     ++		reftable_free(slice_yield(&next.last_key));
      +		record_clear(rec);
     -+		free(record_yield(&rec));
     ++		reftable_free(record_yield(&rec));
      +
      +		return result;
      +	}
      +}
      +
     -+void block_writer_reset(struct block_writer *bw)
     ++void block_writer_reset(struct reftable_block_writer *bw)
      +{
      +	bw->restart_len = 0;
      +	bw->last_key.len = 0;
      +}
      +
     -+void block_writer_clear(struct block_writer *bw)
     ++void block_writer_clear(struct reftable_block_writer *bw)
      +{
      +	FREE_AND_NULL(bw->restarts);
     -+	free(slice_yield(&bw->last_key));
     ++	reftable_free(slice_yield(&bw->last_key));
      +	/* the block is not owned. */
      +}
      
     @@ -729,7 +788,7 @@
      +#include "record.h"
      +#include "reftable.h"
      +
     -+struct block_writer {
     ++struct reftable_block_writer {
      +	byte *buf;
      +	uint32_t block_size;
      +	uint32_t header_off;
     @@ -744,17 +803,17 @@
      +	int entries;
      +};
      +
     -+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
     ++void block_writer_init(struct reftable_block_writer *bw, byte typ, byte *buf,
      +		       uint32_t block_size, uint32_t header_off, int hash_size);
     -+byte block_writer_type(struct block_writer *bw);
     -+int block_writer_add(struct block_writer *w, struct record rec);
     -+int block_writer_finish(struct block_writer *w);
     -+void block_writer_reset(struct block_writer *bw);
     -+void block_writer_clear(struct block_writer *bw);
     ++byte block_writer_type(struct reftable_block_writer *bw);
     ++int block_writer_add(struct reftable_block_writer *w, struct record rec);
     ++int block_writer_finish(struct reftable_block_writer *w);
     ++void block_writer_reset(struct reftable_block_writer *bw);
     ++void block_writer_clear(struct reftable_block_writer *bw);
      +
     -+struct block_reader {
     ++struct reftable_block_reader {
      +	uint32_t header_off;
     -+	struct block block;
     ++	struct reftable_block block;
      +	int hash_size;
      +
      +	/* size of the data, excluding restart data. */
     @@ -764,25 +823,30 @@
      +	uint16_t restart_count;
      +};
      +
     -+struct block_iter {
     ++struct reftable_block_iter {
      +	uint32_t next_off;
     -+	struct block_reader *br;
     ++	struct reftable_block_reader *br;
      +	struct slice last_key;
      +};
      +
     -+int block_reader_init(struct block_reader *br, struct block *bl,
     -+		      uint32_t header_off, uint32_t table_block_size,
     -+		      int hash_size);
     -+void block_reader_start(struct block_reader *br, struct block_iter *it);
     -+int block_reader_seek(struct block_reader *br, struct block_iter *it,
     -+		      struct slice want);
     -+byte block_reader_type(struct block_reader *r);
     -+int block_reader_first_key(struct block_reader *br, struct slice *key);
     -+
     -+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
     -+int block_iter_next(struct block_iter *it, struct record rec);
     -+int block_iter_seek(struct block_iter *it, struct slice want);
     -+void block_iter_close(struct block_iter *it);
     ++int block_reader_init(struct reftable_block_reader *br,
     ++		      struct reftable_block *bl, uint32_t header_off,
     ++		      uint32_t table_block_size, int hash_size);
     ++void block_reader_start(struct reftable_block_reader *br,
     ++			struct reftable_block_iter *it);
     ++int block_reader_seek(struct reftable_block_reader *br,
     ++		      struct reftable_block_iter *it, struct slice want);
     ++byte block_reader_type(struct reftable_block_reader *r);
     ++int block_reader_first_key(struct reftable_block_reader *br, struct slice *key);
     ++
     ++void block_iter_copy_from(struct reftable_block_iter *dest,
     ++			  struct reftable_block_iter *src);
     ++int block_iter_next(struct reftable_block_iter *it, struct record rec);
     ++int block_iter_seek(struct reftable_block_iter *it, struct slice want);
     ++void block_iter_close(struct reftable_block_iter *it);
     ++
     ++int header_size(int version);
     ++int footer_size(int version);
      +
      +#endif
      
     @@ -804,11 +868,13 @@
      +
      +#include "reftable.h"
      +
     -+uint64_t block_source_size(struct block_source source);
     -+int block_source_read_block(struct block_source source, struct block *dest,
     -+			    uint64_t off, uint32_t size);
     -+void block_source_return_block(struct block_source source, struct block *ret);
     -+void block_source_close(struct block_source source);
     ++uint64_t block_source_size(struct reftable_block_source source);
     ++int block_source_read_block(struct reftable_block_source source,
     ++			    struct reftable_block *dest, uint64_t off,
     ++			    uint32_t size);
     ++void block_source_return_block(struct reftable_block_source source,
     ++			       struct reftable_block *ret);
     ++void block_source_close(struct reftable_block_source source);
      +
      +#endif
      
     @@ -838,10 +904,6 @@
      +#ifndef CONSTANTS_H
      +#define CONSTANTS_H
      +
     -+#define VERSION 1
     -+#define HEADER_SIZE 24
     -+#define FOOTER_SIZE 68
     -+
      +#define BLOCK_TYPE_LOG 'g'
      +#define BLOCK_TYPE_INDEX 'i'
      +#define BLOCK_TYPE_REF 'r'
     @@ -987,10 +1049,10 @@
      +	return ((struct file_block_source *)b)->size;
      +}
      +
     -+static void file_return_block(void *b, struct block *dest)
     ++static void file_return_block(void *b, struct reftable_block *dest)
      +{
      +	memset(dest->data, 0xff, dest->len);
     -+	free(dest->data);
     ++	reftable_free(dest->data);
      +}
      +
      +static void file_close(void *b)
     @@ -1001,15 +1063,15 @@
      +		((struct file_block_source *)b)->fd = 0;
      +	}
      +
     -+	free(b);
     ++	reftable_free(b);
      +}
      +
     -+static int file_read_block(void *v, struct block *dest, uint64_t off,
     ++static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
      +			   uint32_t size)
      +{
      +	struct file_block_source *b = (struct file_block_source *)v;
      +	assert(off + size <= b->size);
     -+	dest->data = malloc(size);
     ++	dest->data = reftable_malloc(size);
      +	if (pread(b->fd, dest->data, size, off) != size) {
      +		return -1;
      +	}
     @@ -1017,14 +1079,15 @@
      +	return size;
      +}
      +
     -+struct block_source_vtable file_vtable = {
     ++struct reftable_block_source_vtable file_vtable = {
      +	.size = &file_size,
      +	.read_block = &file_read_block,
      +	.return_block = &file_return_block,
      +	.close = &file_close,
      +};
      +
     -+int block_source_from_file(struct block_source *bs, const char *name)
     ++int reftable_block_source_from_file(struct reftable_block_source *bs,
     ++				    const char *name)
      +{
      +	struct stat st = { 0 };
      +	int err = 0;
     @@ -1043,7 +1106,7 @@
      +
      +	{
      +		struct file_block_source *p =
     -+			calloc(sizeof(struct file_block_source), 1);
     ++			reftable_calloc(sizeof(struct file_block_source));
      +		p->size = st.st_size;
      +		p->fd = fd;
      +
     @@ -1053,7 +1116,7 @@
      +	return 0;
      +}
      +
     -+int fd_writer(void *arg, byte *data, int sz)
     ++int reftable_fd_write(void *arg, byte *data, int sz)
      +{
      +	int *fdp = (int *)arg;
      +	return write(*fdp, data, sz);
     @@ -1081,7 +1144,7 @@
      +#include "reader.h"
      +#include "reftable.h"
      +
     -+bool iterator_is_null(struct iterator it)
     ++bool iterator_is_null(struct reftable_iterator it)
      +{
      +	return it.ops == NULL;
      +}
     @@ -1095,23 +1158,23 @@
      +{
      +}
      +
     -+struct iterator_vtable empty_vtable = {
     ++struct reftable_iterator_vtable empty_vtable = {
      +	.next = &empty_iterator_next,
      +	.close = &empty_iterator_close,
      +};
      +
     -+void iterator_set_empty(struct iterator *it)
     ++void iterator_set_empty(struct reftable_iterator *it)
      +{
      +	it->iter_arg = NULL;
      +	it->ops = &empty_vtable;
      +}
      +
     -+int iterator_next(struct iterator it, struct record rec)
     ++int iterator_next(struct reftable_iterator it, struct record rec)
      +{
      +	return it.ops->next(it.iter_arg, rec);
      +}
      +
     -+void iterator_destroy(struct iterator *it)
     ++void reftable_iterator_destroy(struct reftable_iterator *it)
      +{
      +	if (it->ops == NULL) {
      +		return;
     @@ -1121,14 +1184,16 @@
      +	FREE_AND_NULL(it->iter_arg);
      +}
      +
     -+int iterator_next_ref(struct iterator it, struct ref_record *ref)
     ++int reftable_iterator_next_ref(struct reftable_iterator it,
     ++			       struct reftable_ref_record *ref)
      +{
      +	struct record rec = { 0 };
      +	record_from_ref(&rec, ref);
      +	return iterator_next(it, rec);
      +}
      +
     -+int iterator_next_log(struct iterator it, struct log_record *log)
     ++int reftable_iterator_next_log(struct reftable_iterator it,
     ++			       struct reftable_log_record *log)
      +{
      +	struct record rec = { 0 };
      +	record_from_log(&rec, log);
     @@ -1139,31 +1204,33 @@
      +{
      +	struct filtering_ref_iterator *fri =
      +		(struct filtering_ref_iterator *)iter_arg;
     -+	free(slice_yield(&fri->oid));
     -+	iterator_destroy(&fri->it);
     ++	reftable_free(slice_yield(&fri->oid));
     ++	reftable_iterator_destroy(&fri->it);
      +}
      +
      +static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
      +{
      +	struct filtering_ref_iterator *fri =
      +		(struct filtering_ref_iterator *)iter_arg;
     -+	struct ref_record *ref = (struct ref_record *)rec.data;
     ++	struct reftable_ref_record *ref =
     ++		(struct reftable_ref_record *)rec.data;
      +
      +	while (true) {
     -+		int err = iterator_next_ref(fri->it, ref);
     ++		int err = reftable_iterator_next_ref(fri->it, ref);
      +		if (err != 0) {
      +			return err;
      +		}
      +
      +		if (fri->double_check) {
     -+			struct iterator it = { 0 };
     ++			struct reftable_iterator it = { 0 };
      +
     -+			int err = reader_seek_ref(fri->r, &it, ref->ref_name);
     ++			int err = reftable_reader_seek_ref(fri->r, &it,
     ++							   ref->ref_name);
      +			if (err == 0) {
     -+				err = iterator_next_ref(it, ref);
     ++				err = reftable_iterator_next_ref(it, ref);
      +			}
      +
     -+			iterator_destroy(&it);
     ++			reftable_iterator_destroy(&it);
      +
      +			if (err < 0) {
      +				return err;
     @@ -1183,12 +1250,12 @@
      +	}
      +}
      +
     -+struct iterator_vtable filtering_ref_iterator_vtable = {
     ++struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
      +	.next = &filtering_ref_iterator_next,
      +	.close = &filtering_ref_iterator_close,
      +};
      +
     -+void iterator_from_filtering_ref_iterator(struct iterator *it,
     ++void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
      +					  struct filtering_ref_iterator *fri)
      +{
      +	it->iter_arg = fri;
     @@ -1200,7 +1267,7 @@
      +	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
      +	block_iter_close(&it->cur);
      +	reader_return_block(it->r, &it->block_reader.block);
     -+	free(slice_yield(&it->oid));
     ++	reftable_free(slice_yield(&it->oid));
      +}
      +
      +static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
     @@ -1231,7 +1298,8 @@
      +static int indexed_table_ref_iter_next(void *p, struct record rec)
      +{
      +	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
     -+	struct ref_record *ref = (struct ref_record *)rec.data;
     ++	struct reftable_ref_record *ref =
     ++		(struct reftable_ref_record *)rec.data;
      +
      +	while (true) {
      +		int err = block_iter_next(&it->cur, rec);
     @@ -1259,11 +1327,11 @@
      +}
      +
      +int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
     -+			       struct reader *r, byte *oid, int oid_len,
     -+			       uint64_t *offsets, int offset_len)
     ++			       struct reftable_reader *r, byte *oid,
     ++			       int oid_len, uint64_t *offsets, int offset_len)
      +{
      +	struct indexed_table_ref_iter *itr =
     -+		calloc(sizeof(struct indexed_table_ref_iter), 1);
     ++		reftable_calloc(sizeof(struct indexed_table_ref_iter));
      +	int err = 0;
      +
      +	itr->r = r;
     @@ -1275,19 +1343,19 @@
      +
      +	err = indexed_table_ref_iter_next_block(itr);
      +	if (err < 0) {
     -+		free(itr);
     ++		reftable_free(itr);
      +	} else {
      +		*dest = itr;
      +	}
      +	return err;
      +}
      +
     -+struct iterator_vtable indexed_table_ref_iter_vtable = {
     ++struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
      +	.next = &indexed_table_ref_iter_next,
      +	.close = &indexed_table_ref_iter_close,
      +};
      +
     -+void iterator_from_indexed_table_ref_iter(struct iterator *it,
     ++void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
      +					  struct indexed_table_ref_iter *itr)
      +{
      +	it->iter_arg = itr;
     @@ -1314,27 +1382,27 @@
      +#include "record.h"
      +#include "slice.h"
      +
     -+struct iterator_vtable {
     ++struct reftable_iterator_vtable {
      +	int (*next)(void *iter_arg, struct record rec);
      +	void (*close)(void *iter_arg);
      +};
      +
     -+void iterator_set_empty(struct iterator *it);
     -+int iterator_next(struct iterator it, struct record rec);
     -+bool iterator_is_null(struct iterator it);
     ++void iterator_set_empty(struct reftable_iterator *it);
     ++int iterator_next(struct reftable_iterator it, struct record rec);
     ++bool iterator_is_null(struct reftable_iterator it);
      +
      +struct filtering_ref_iterator {
      +	bool double_check;
     -+	struct reader *r;
     ++	struct reftable_reader *r;
      +	struct slice oid;
     -+	struct iterator it;
     ++	struct reftable_iterator it;
      +};
      +
     -+void iterator_from_filtering_ref_iterator(struct iterator *,
     ++void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
      +					  struct filtering_ref_iterator *);
      +
      +struct indexed_table_ref_iter {
     -+	struct reader *r;
     ++	struct reftable_reader *r;
      +	struct slice oid;
      +
      +	/* mutable */
     @@ -1343,16 +1411,16 @@
      +	/* Points to the next offset to read. */
      +	int offset_idx;
      +	int offset_len;
     -+	struct block_reader block_reader;
     -+	struct block_iter cur;
     ++	struct reftable_block_reader block_reader;
     ++	struct reftable_block_iter cur;
      +	bool finished;
      +};
      +
     -+void iterator_from_indexed_table_ref_iter(struct iterator *it,
     ++void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
      +					  struct indexed_table_ref_iter *itr);
      +int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
     -+			       struct reader *r, byte *oid, int oid_len,
     -+			       uint64_t *offsets, int offset_len);
     ++			       struct reftable_reader *r, byte *oid,
     ++			       int oid_len, uint64_t *offsets, int offset_len);
      +
      +#endif
      
     @@ -1389,9 +1457,9 @@
      +		}
      +
      +		if (err > 0) {
     -+			iterator_destroy(&mi->stack[i]);
     ++			reftable_iterator_destroy(&mi->stack[i]);
      +			record_clear(rec);
     -+			free(record_yield(&rec));
     ++			reftable_free(record_yield(&rec));
      +		} else {
      +			struct pq_entry e = {
      +				.rec = rec,
     @@ -1410,9 +1478,9 @@
      +	int i = 0;
      +	merged_iter_pqueue_clear(&mi->pq);
      +	for (i = 0; i < mi->stack_len; i++) {
     -+		iterator_destroy(&mi->stack[i]);
     ++		reftable_iterator_destroy(&mi->stack[i]);
      +	}
     -+	free(mi->stack);
     ++	reftable_free(mi->stack);
      +}
      +
      +static int merged_iter_advance_subiter(struct merged_iter *mi, int idx)
     @@ -1433,9 +1501,9 @@
      +		}
      +
      +		if (err > 0) {
     -+			iterator_destroy(&mi->stack[idx]);
     ++			reftable_iterator_destroy(&mi->stack[idx]);
      +			record_clear(rec);
     -+			free(record_yield(&rec));
     ++			reftable_free(record_yield(&rec));
      +			return 0;
      +		}
      +
     @@ -1462,7 +1530,7 @@
      +		record_key(top.rec, &k);
      +
      +		cmp = slice_compare(k, entry_key);
     -+		free(slice_yield(&k));
     ++		reftable_free(slice_yield(&k));
      +
      +		if (cmp > 0) {
      +			break;
     @@ -1474,13 +1542,13 @@
      +			return err;
      +		}
      +		record_clear(top.rec);
     -+		free(record_yield(&top.rec));
     ++		reftable_free(record_yield(&top.rec));
      +	}
      +
      +	record_copy_from(rec, entry.rec, hash_size(mi->hash_id));
      +	record_clear(entry.rec);
     -+	free(record_yield(&entry.rec));
     -+	free(slice_yield(&entry_key));
     ++	reftable_free(record_yield(&entry.rec));
     ++	reftable_free(slice_yield(&entry_key));
      +	return 0;
      +}
      +
     @@ -1494,41 +1562,42 @@
      +	return merged_iter_next(mi, rec);
      +}
      +
     -+struct iterator_vtable merged_iter_vtable = {
     ++struct reftable_iterator_vtable merged_iter_vtable = {
      +	.next = &merged_iter_next_void,
      +	.close = &merged_iter_close,
      +};
      +
     -+static void iterator_from_merged_iter(struct iterator *it,
     ++static void iterator_from_merged_iter(struct reftable_iterator *it,
      +				      struct merged_iter *mi)
      +{
      +	it->iter_arg = mi;
      +	it->ops = &merged_iter_vtable;
      +}
      +
     -+int new_merged_table(struct merged_table **dest, struct reader **stack, int n,
     -+		     uint32_t hash_id)
     ++int reftable_new_merged_table(struct reftable_merged_table **dest,
     ++			      struct reftable_reader **stack, int n,
     ++			      uint32_t hash_id)
      +{
      +	uint64_t last_max = 0;
      +	uint64_t first_min = 0;
      +	int i = 0;
      +	for (i = 0; i < n; i++) {
     -+		struct reader *r = stack[i];
     ++		struct reftable_reader *r = stack[i];
      +		if (r->hash_id != hash_id) {
      +			return FORMAT_ERROR;
      +		}
     -+		if (i > 0 && last_max >= reader_min_update_index(r)) {
     ++		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
      +			return FORMAT_ERROR;
      +		}
      +		if (i == 0) {
     -+			first_min = reader_min_update_index(r);
     ++			first_min = reftable_reader_min_update_index(r);
      +		}
      +
     -+		last_max = reader_max_update_index(r);
     ++		last_max = reftable_reader_max_update_index(r);
      +	}
      +
      +	{
     -+		struct merged_table m = {
     ++		struct reftable_merged_table m = {
      +			.stack = stack,
      +			.stack_len = n,
      +			.min = first_min,
     @@ -1536,52 +1605,56 @@
      +			.hash_id = hash_id,
      +		};
      +
     -+		*dest = calloc(sizeof(struct merged_table), 1);
     ++		*dest = reftable_calloc(sizeof(struct reftable_merged_table));
      +		**dest = m;
      +	}
      +	return 0;
      +}
      +
     -+void merged_table_close(struct merged_table *mt)
     ++void reftable_merged_table_close(struct reftable_merged_table *mt)
      +{
      +	int i = 0;
      +	for (i = 0; i < mt->stack_len; i++) {
     -+		reader_free(mt->stack[i]);
     ++		reftable_reader_free(mt->stack[i]);
      +	}
      +	FREE_AND_NULL(mt->stack);
      +	mt->stack_len = 0;
      +}
      +
      +/* clears the list of subtable, without affecting the readers themselves. */
     -+void merged_table_clear(struct merged_table *mt)
     ++void merged_table_clear(struct reftable_merged_table *mt)
      +{
      +	FREE_AND_NULL(mt->stack);
      +	mt->stack_len = 0;
      +}
      +
     -+void merged_table_free(struct merged_table *mt)
     ++void reftable_merged_table_free(struct reftable_merged_table *mt)
      +{
      +	if (mt == NULL) {
      +		return;
      +	}
      +	merged_table_clear(mt);
     -+	free(mt);
     ++	reftable_free(mt);
      +}
      +
     -+uint64_t merged_max_update_index(struct merged_table *mt)
     ++uint64_t
     ++reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
      +{
      +	return mt->max;
      +}
      +
     -+uint64_t merged_min_update_index(struct merged_table *mt)
     ++uint64_t
     ++reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
      +{
      +	return mt->min;
      +}
      +
     -+static int merged_table_seek_record(struct merged_table *mt,
     -+				    struct iterator *it, struct record rec)
     ++static int merged_table_seek_record(struct reftable_merged_table *mt,
     ++				    struct reftable_iterator *it,
     ++				    struct record rec)
      +{
     -+	struct iterator *iters = calloc(sizeof(struct iterator), mt->stack_len);
     ++	struct reftable_iterator *iters = reftable_calloc(
     ++		sizeof(struct reftable_iterator) * mt->stack_len);
      +	struct merged_iter merged = {
      +		.stack = iters,
      +		.typ = record_type(rec),
     @@ -1602,9 +1675,9 @@
      +	if (err < 0) {
      +		int i = 0;
      +		for (i = 0; i < n; i++) {
     -+			iterator_destroy(&iters[i]);
     ++			reftable_iterator_destroy(&iters[i]);
      +		}
     -+		free(iters);
     ++		reftable_free(iters);
      +		return err;
      +	}
      +
     @@ -1615,17 +1688,19 @@
      +	}
      +
      +	{
     -+		struct merged_iter *p = malloc(sizeof(struct merged_iter));
     ++		struct merged_iter *p =
     ++			reftable_malloc(sizeof(struct merged_iter));
      +		*p = merged;
      +		iterator_from_merged_iter(it, p);
      +	}
      +	return 0;
      +}
      +
     -+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
     -+			  const char *name)
     ++int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
     ++				   struct reftable_iterator *it,
     ++				   const char *name)
      +{
     -+	struct ref_record ref = {
     ++	struct reftable_ref_record ref = {
      +		.ref_name = (char *)name,
      +	};
      +	struct record rec = { 0 };
     @@ -1633,10 +1708,11 @@
      +	return merged_table_seek_record(mt, it, rec);
      +}
      +
     -+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
     -+			     const char *name, uint64_t update_index)
     ++int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
     ++				      struct reftable_iterator *it,
     ++				      const char *name, uint64_t update_index)
      +{
     -+	struct log_record log = {
     ++	struct reftable_log_record log = {
      +		.ref_name = (char *)name,
      +		.update_index = update_index,
      +	};
     @@ -1645,11 +1721,12 @@
      +	return merged_table_seek_record(mt, it, rec);
      +}
      +
     -+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
     -+			  const char *name)
     ++int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
     ++				   struct reftable_iterator *it,
     ++				   const char *name)
      +{
      +	uint64_t max = ~((uint64_t)0);
     -+	return merged_table_seek_log_at(mt, it, name, max);
     ++	return reftable_merged_table_seek_log_at(mt, it, name, max);
      +}
      
       diff --git a/reftable/merged.h b/reftable/merged.h
     @@ -1671,8 +1748,8 @@
      +#include "pq.h"
      +#include "reftable.h"
      +
     -+struct merged_table {
     -+	struct reader **stack;
     ++struct reftable_merged_table {
     ++	struct reftable_reader **stack;
      +	int stack_len;
      +	uint32_t hash_id;
      +
     @@ -1681,14 +1758,14 @@
      +};
      +
      +struct merged_iter {
     -+	struct iterator *stack;
     ++	struct reftable_iterator *stack;
      +	uint32_t hash_id;
      +	int stack_len;
      +	byte typ;
      +	struct merged_iter_pqueue pq;
     -+} merged_iter;
     ++};
      +
     -+void merged_table_clear(struct merged_table *mt);
     ++void merged_table_clear(struct reftable_merged_table *mt);
      +
      +#endif
      
     @@ -1719,8 +1796,8 @@
      +
      +	cmp = slice_compare(ak, bk);
      +
     -+	free(slice_yield(&ak));
     -+	free(slice_yield(&bk));
     ++	reftable_free(slice_yield(&ak));
     ++	reftable_free(slice_yield(&bk));
      +
      +	if (cmp == 0) {
      +		return a.index > b.index;
     @@ -1784,7 +1861,8 @@
      +	int i = 0;
      +	if (pq->len == pq->cap) {
      +		pq->cap = 2 * pq->cap + 1;
     -+		pq->heap = realloc(pq->heap, pq->cap * sizeof(struct pq_entry));
     ++		pq->heap = reftable_realloc(pq->heap,
     ++					    pq->cap * sizeof(struct pq_entry));
      +	}
      +
      +	pq->heap[pq->len++] = e;
     @@ -1806,7 +1884,7 @@
      +	int i = 0;
      +	for (i = 0; i < pq->len; i++) {
      +		record_clear(pq->heap[i].rec);
     -+		free(record_yield(&pq->heap[i].rec));
     ++		reftable_free(record_yield(&pq->heap[i].rec));
      +	}
      +	FREE_AND_NULL(pq->heap);
      +	pq->len = pq->cap = 0;
     @@ -1876,20 +1954,22 @@
      +#include "reftable.h"
      +#include "tree.h"
      +
     -+uint64_t block_source_size(struct block_source source)
     ++uint64_t block_source_size(struct reftable_block_source source)
      +{
      +	return source.ops->size(source.arg);
      +}
      +
     -+int block_source_read_block(struct block_source source, struct block *dest,
     -+			    uint64_t off, uint32_t size)
     ++int block_source_read_block(struct reftable_block_source source,
     ++			    struct reftable_block *dest, uint64_t off,
     ++			    uint32_t size)
      +{
      +	int result = source.ops->read_block(source.arg, dest, off, size);
      +	dest->source = source;
      +	return result;
      +}
      +
     -+void block_source_return_block(struct block_source source, struct block *blockp)
     ++void block_source_return_block(struct reftable_block_source source,
     ++			       struct reftable_block *blockp)
      +{
      +	source.ops->return_block(source.arg, blockp);
      +	blockp->data = NULL;
     @@ -1898,7 +1978,7 @@
      +	blockp->source.arg = NULL;
      +}
      +
     -+void block_source_close(struct block_source *source)
     ++void block_source_close(struct reftable_block_source *source)
      +{
      +	if (source->ops == NULL) {
      +		return;
     @@ -1908,7 +1988,8 @@
      +	source->ops = NULL;
      +}
      +
     -+static struct reader_offsets *reader_offsets_for(struct reader *r, byte typ)
     ++static struct reftable_reader_offsets *
     ++reader_offsets_for(struct reftable_reader *r, byte typ)
      +{
      +	switch (typ) {
      +	case BLOCK_TYPE_REF:
     @@ -1921,7 +2002,8 @@
      +	abort();
      +}
      +
     -+static int reader_get_block(struct reader *r, struct block *dest, uint64_t off,
     ++static int reader_get_block(struct reftable_reader *r,
     ++			    struct reftable_block *dest, uint64_t off,
      +			    uint32_t sz)
      +{
      +	if (off >= r->size) {
     @@ -1935,22 +2017,22 @@
      +	return block_source_read_block(r->source, dest, off, sz);
      +}
      +
     -+void reader_return_block(struct reader *r, struct block *p)
     ++void reader_return_block(struct reftable_reader *r, struct reftable_block *p)
      +{
      +	block_source_return_block(r->source, p);
      +}
      +
     -+uint32_t reader_hash_id(struct reader *r)
     ++uint32_t reftable_reader_hash_id(struct reftable_reader *r)
      +{
      +	return r->hash_id;
      +}
      +
     -+const char *reader_name(struct reader *r)
     ++const char *reader_name(struct reftable_reader *r)
      +{
      +	return r->name;
      +}
      +
     -+static int parse_footer(struct reader *r, byte *footer, byte *header)
     ++static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
      +{
      +	byte *f = footer;
      +	int err = 0;
     @@ -1960,25 +2042,12 @@
      +	}
      +	f += 4;
      +
     -+	if (memcmp(footer, header, HEADER_SIZE)) {
     ++	if (memcmp(footer, header, header_size(r->version))) {
      +		err = FORMAT_ERROR;
      +		goto exit;
      +	}
      +
     -+	r->hash_id = SHA1_ID;
     -+	{
     -+		byte version = *f++;
     -+		if (version == 2) {
     -+			/* DO NOT SUBMIT.  Not yet in the standard. */
     -+			r->hash_id = SHA256_ID;
     -+			version = 1;
     -+		}
     -+		if (version != 1) {
     -+			err = FORMAT_ERROR;
     -+			goto exit;
     -+		}
     -+	}
     -+
     ++	f++;
      +	r->block_size = get_be24(f);
      +
      +	f += 3;
     @@ -1987,6 +2056,22 @@
      +	r->max_update_index = get_be64(f);
      +	f += 8;
      +
     ++	if (r->version == 1) {
     ++		r->hash_id = SHA1_ID;
     ++	} else {
     ++		r->hash_id = get_be32(f);
     ++		switch (r->hash_id) {
     ++		case SHA1_ID:
     ++			break;
     ++		case SHA256_ID:
     ++			break;
     ++		default:
     ++			err = FORMAT_ERROR;
     ++			goto exit;
     ++		}
     ++		f += 4;
     ++	}
     ++
      +	r->ref_offsets.index_offset = get_be64(f);
      +	f += 8;
      +
     @@ -2014,7 +2099,7 @@
      +	}
      +
      +	{
     -+		byte first_block_typ = header[HEADER_SIZE];
     ++		byte first_block_typ = header[header_size(r->version)];
      +		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
      +		r->ref_offsets.offset = 0;
      +		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
     @@ -2026,27 +2111,40 @@
      +	return err;
      +}
      +
     -+int init_reader(struct reader *r, struct block_source source, const char *name)
     ++int init_reader(struct reftable_reader *r, struct reftable_block_source source,
     ++		const char *name)
      +{
     -+	struct block footer = { 0 };
     -+	struct block header = { 0 };
     ++	struct reftable_block footer = { 0 };
     ++	struct reftable_block header = { 0 };
      +	int err = 0;
      +
     -+	memset(r, 0, sizeof(struct reader));
     -+	r->size = block_source_size(source) - FOOTER_SIZE;
     -+	r->source = source;
     -+	r->name = xstrdup(name);
     -+	r->hash_id = 0;
     ++	memset(r, 0, sizeof(struct reftable_reader));
      +
     -+	err = block_source_read_block(source, &footer, r->size, FOOTER_SIZE);
     -+	if (err != FOOTER_SIZE) {
     ++	/* Need +1 to read type of first block. */
     ++	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
     ++	if (err != header_size(2) + 1) {
      +		err = IO_ERROR;
      +		goto exit;
      +	}
      +
     -+	/* Need +1 to read type of first block. */
     -+	err = reader_get_block(r, &header, 0, HEADER_SIZE + 1);
     -+	if (err != HEADER_SIZE + 1) {
     ++	if (memcmp(header.data, "REFT", 4)) {
     ++		err = FORMAT_ERROR;
     ++		goto exit;
     ++	}
     ++	r->version = header.data[4];
     ++	if (r->version != 1 && r->version != 2) {
     ++		err = FORMAT_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	r->size = block_source_size(source) - footer_size(r->version);
     ++	r->source = source;
     ++	r->name = xstrdup(name);
     ++	r->hash_id = 0;
     ++
     ++	err = block_source_read_block(source, &footer, r->size,
     ++				      footer_size(r->version));
     ++	if (err != footer_size(r->version)) {
      +		err = IO_ERROR;
      +		goto exit;
      +	}
     @@ -2059,10 +2157,10 @@
      +}
      +
      +struct table_iter {
     -+	struct reader *r;
     ++	struct reftable_reader *r;
      +	byte typ;
      +	uint64_t block_off;
     -+	struct block_iter bi;
     ++	struct reftable_block_iter bi;
      +	bool finished;
      +};
      +
     @@ -2080,7 +2178,7 @@
      +{
      +	int res = block_iter_next(&ti->bi, rec);
      +	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
     -+		((struct ref_record *)rec.data)->update_index +=
     ++		((struct reftable_ref_record *)rec.data)->update_index +=
      +			ti->r->min_update_index;
      +	}
      +
     @@ -2099,12 +2197,13 @@
      +	ti->bi.next_off = 0;
      +}
      +
     -+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off)
     ++static int32_t extract_block_size(byte *data, byte *typ, uint64_t off,
     ++				  int version)
      +{
      +	int32_t result = 0;
      +
      +	if (off == 0) {
     -+		data += 24;
     ++		data += header_size(version);
      +	}
      +
      +	*typ = data[0];
     @@ -2114,15 +2213,16 @@
      +	return result;
      +}
      +
     -+int reader_init_block_reader(struct reader *r, struct block_reader *br,
     ++int reader_init_block_reader(struct reftable_reader *r,
     ++			     struct reftable_block_reader *br,
      +			     uint64_t next_off, byte want_typ)
      +{
      +	int32_t guess_block_size = r->block_size ? r->block_size :
      +						   DEFAULT_BLOCK_SIZE;
     -+	struct block block = { 0 };
     ++	struct reftable_block block = { 0 };
      +	byte block_typ = 0;
      +	int err = 0;
     -+	uint32_t header_off = next_off ? 0 : HEADER_SIZE;
     ++	uint32_t header_off = next_off ? 0 : header_size(r->version);
      +	int32_t block_size = 0;
      +
      +	if (next_off >= r->size) {
     @@ -2134,7 +2234,8 @@
      +		return err;
      +	}
      +
     -+	block_size = extract_block_size(block.data, &block_typ, next_off);
     ++	block_size = extract_block_size(block.data, &block_typ, next_off,
     ++					r->version);
      +	if (block_size < 0) {
      +		return block_size;
      +	}
     @@ -2160,7 +2261,7 @@
      +				 struct table_iter *src)
      +{
      +	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
     -+	struct block_reader br = { 0 };
     ++	struct reftable_block_reader br = { 0 };
      +	int err = 0;
      +
      +	dest->r = src->r;
     @@ -2177,7 +2278,8 @@
      +	}
      +
      +	{
     -+		struct block_reader *brp = malloc(sizeof(struct block_reader));
     ++		struct reftable_block_reader *brp =
     ++			reftable_malloc(sizeof(struct reftable_block_reader));
      +		*brp = br;
      +
      +		dest->finished = false;
     @@ -2229,29 +2331,30 @@
      +	block_iter_close(&ti->bi);
      +}
      +
     -+struct iterator_vtable table_iter_vtable = {
     ++struct reftable_iterator_vtable table_iter_vtable = {
      +	.next = &table_iter_next_void,
      +	.close = &table_iter_close,
      +};
      +
     -+static void iterator_from_table_iter(struct iterator *it, struct table_iter *ti)
     ++static void iterator_from_table_iter(struct reftable_iterator *it,
     ++				     struct table_iter *ti)
      +{
      +	it->iter_arg = ti;
      +	it->ops = &table_iter_vtable;
      +}
      +
     -+static int reader_table_iter_at(struct reader *r, struct table_iter *ti,
     -+				uint64_t off, byte typ)
     ++static int reader_table_iter_at(struct reftable_reader *r,
     ++				struct table_iter *ti, uint64_t off, byte typ)
      +{
     -+	struct block_reader br = { 0 };
     -+	struct block_reader *brp = NULL;
     ++	struct reftable_block_reader br = { 0 };
     ++	struct reftable_block_reader *brp = NULL;
      +
      +	int err = reader_init_block_reader(r, &br, off, typ);
      +	if (err != 0) {
      +		return err;
      +	}
      +
     -+	brp = malloc(sizeof(struct block_reader));
     ++	brp = reftable_malloc(sizeof(struct reftable_block_reader));
      +	*brp = br;
      +	ti->r = r;
      +	ti->typ = block_reader_type(brp);
     @@ -2260,10 +2363,10 @@
      +	return 0;
      +}
      +
     -+static int reader_start(struct reader *r, struct table_iter *ti, byte typ,
     -+			bool index)
     ++static int reader_start(struct reftable_reader *r, struct table_iter *ti,
     ++			byte typ, bool index)
      +{
     -+	struct reader_offsets *offs = reader_offsets_for(r, typ);
     ++	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
      +	uint64_t off = offs->offset;
      +	if (index) {
      +		off = offs->index_offset;
     @@ -2276,7 +2379,7 @@
      +	return reader_table_iter_at(r, ti, off, typ);
      +}
      +
     -+static int reader_seek_linear(struct reader *r, struct table_iter *ti,
     ++static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
      +			      struct record want)
      +{
      +	struct record rec = new_record(record_type(want));
     @@ -2321,14 +2424,14 @@
      +exit:
      +	block_iter_close(&next.bi);
      +	record_clear(rec);
     -+	free(record_yield(&rec));
     -+	free(slice_yield(&want_key));
     -+	free(slice_yield(&got_key));
     ++	reftable_free(record_yield(&rec));
     ++	reftable_free(slice_yield(&want_key));
     ++	reftable_free(slice_yield(&got_key));
      +	return err;
      +}
      +
     -+static int reader_seek_indexed(struct reader *r, struct iterator *it,
     -+			       struct record rec)
     ++static int reader_seek_indexed(struct reftable_reader *r,
     ++			       struct reftable_iterator *it, struct record rec)
      +{
      +	struct index_record want_index = { 0 };
      +	struct record want_index_rec = { 0 };
     @@ -2380,7 +2483,7 @@
      +
      +	if (err == 0) {
      +		struct table_iter *malloced =
     -+			calloc(sizeof(struct table_iter), 1);
     ++			reftable_calloc(sizeof(struct table_iter));
      +		table_iter_copy_from(malloced, &next);
      +		iterator_from_table_iter(it, malloced);
      +	}
     @@ -2392,10 +2495,11 @@
      +	return err;
      +}
      +
     -+static int reader_seek_internal(struct reader *r, struct iterator *it,
     -+				struct record rec)
     ++static int reader_seek_internal(struct reftable_reader *r,
     ++				struct reftable_iterator *it, struct record rec)
      +{
     -+	struct reader_offsets *offs = reader_offsets_for(r, record_type(rec));
     ++	struct reftable_reader_offsets *offs =
     ++		reader_offsets_for(r, record_type(rec));
      +	uint64_t idx = offs->index_offset;
      +	struct table_iter ti = { 0 };
      +	int err = 0;
     @@ -2413,7 +2517,8 @@
      +	}
      +
      +	{
     -+		struct table_iter *p = malloc(sizeof(struct table_iter));
     ++		struct table_iter *p =
     ++			reftable_malloc(sizeof(struct table_iter));
      +		*p = ti;
      +		iterator_from_table_iter(it, p);
      +	}
     @@ -2421,11 +2526,12 @@
      +	return 0;
      +}
      +
     -+int reader_seek(struct reader *r, struct iterator *it, struct record rec)
     ++int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
     ++		struct record rec)
      +{
      +	byte typ = record_type(rec);
      +
     -+	struct reader_offsets *offs = reader_offsets_for(r, typ);
     ++	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
      +	if (!offs->present) {
      +		iterator_set_empty(it);
      +		return 0;
     @@ -2434,9 +2540,10 @@
      +	return reader_seek_internal(r, it, rec);
      +}
      +
     -+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name)
     ++int reftable_reader_seek_ref(struct reftable_reader *r,
     ++			     struct reftable_iterator *it, const char *name)
      +{
     -+	struct ref_record ref = {
     ++	struct reftable_ref_record ref = {
      +		.ref_name = (char *)name,
      +	};
      +	struct record rec = { 0 };
     @@ -2444,10 +2551,11 @@
      +	return reader_seek(r, it, rec);
      +}
      +
     -+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
     -+		       uint64_t update_index)
     ++int reftable_reader_seek_log_at(struct reftable_reader *r,
     ++				struct reftable_iterator *it, const char *name,
     ++				uint64_t update_index)
      +{
     -+	struct log_record log = {
     ++	struct reftable_log_record log = {
      +		.ref_name = (char *)name,
      +		.update_index = update_index,
      +	};
     @@ -2456,45 +2564,49 @@
      +	return reader_seek(r, it, rec);
      +}
      +
     -+int reader_seek_log(struct reader *r, struct iterator *it, const char *name)
     ++int reftable_reader_seek_log(struct reftable_reader *r,
     ++			     struct reftable_iterator *it, const char *name)
      +{
      +	uint64_t max = ~((uint64_t)0);
     -+	return reader_seek_log_at(r, it, name, max);
     ++	return reftable_reader_seek_log_at(r, it, name, max);
      +}
      +
     -+void reader_close(struct reader *r)
     ++void reader_close(struct reftable_reader *r)
      +{
      +	block_source_close(&r->source);
      +	FREE_AND_NULL(r->name);
      +}
      +
     -+int new_reader(struct reader **p, struct block_source src, char const *name)
     ++int reftable_new_reader(struct reftable_reader **p,
     ++			struct reftable_block_source src, char const *name)
      +{
     -+	struct reader *rd = calloc(sizeof(struct reader), 1);
     ++	struct reftable_reader *rd =
     ++		reftable_calloc(sizeof(struct reftable_reader));
      +	int err = init_reader(rd, src, name);
      +	if (err == 0) {
      +		*p = rd;
      +	} else {
     -+		free(rd);
     ++		reftable_free(rd);
      +	}
      +	return err;
      +}
      +
     -+void reader_free(struct reader *r)
     ++void reftable_reader_free(struct reftable_reader *r)
      +{
      +	reader_close(r);
     -+	free(r);
     ++	reftable_free(r);
      +}
      +
     -+static int reader_refs_for_indexed(struct reader *r, struct iterator *it,
     -+				   byte *oid)
     ++static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
     ++					    struct reftable_iterator *it,
     ++					    byte *oid)
      +{
      +	struct obj_record want = {
      +		.hash_prefix = oid,
      +		.hash_prefix_len = r->object_id_len,
      +	};
      +	struct record want_rec = { 0 };
     -+	struct iterator oit = { 0 };
     ++	struct reftable_iterator oit = { 0 };
      +	struct obj_record got = { 0 };
      +	struct record got_rec = { 0 };
      +	int err = 0;
     @@ -2508,7 +2620,7 @@
      +
      +	record_from_obj(&got_rec, &got);
      +	err = iterator_next(oit, got_rec);
     -+	iterator_destroy(&oit);
     ++	reftable_iterator_destroy(&oit);
      +	if (err < 0) {
      +		return err;
      +	}
     @@ -2537,18 +2649,19 @@
      +	return 0;
      +}
      +
     -+static int reader_refs_for_unindexed(struct reader *r, struct iterator *it,
     -+				     byte *oid, int oid_len)
     ++static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
     ++					      struct reftable_iterator *it,
     ++					      byte *oid, int oid_len)
      +{
     -+	struct table_iter *ti = calloc(sizeof(struct table_iter), 1);
     ++	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
      +	struct filtering_ref_iterator *filter = NULL;
      +	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
      +	if (err < 0) {
     -+		free(ti);
     ++		reftable_free(ti);
      +		return err;
      +	}
      +
     -+	filter = calloc(sizeof(struct filtering_ref_iterator), 1);
     ++	filter = reftable_calloc(sizeof(struct filtering_ref_iterator));
      +	slice_resize(&filter->oid, oid_len);
      +	memcpy(filter->oid.buf, oid, oid_len);
      +	filter->r = r;
     @@ -2559,21 +2672,22 @@
      +	return 0;
      +}
      +
     -+int reader_refs_for(struct reader *r, struct iterator *it, byte *oid,
     -+		    int oid_len)
     ++int reftable_reader_refs_for(struct reftable_reader *r,
     ++			     struct reftable_iterator *it, byte *oid,
     ++			     int oid_len)
      +{
      +	if (r->obj_offsets.present) {
     -+		return reader_refs_for_indexed(r, it, oid);
     ++		return reftable_reader_refs_for_indexed(r, it, oid);
      +	}
     -+	return reader_refs_for_unindexed(r, it, oid, oid_len);
     ++	return reftable_reader_refs_for_unindexed(r, it, oid, oid_len);
      +}
      +
     -+uint64_t reader_max_update_index(struct reader *r)
     ++uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
      +{
      +	return r->max_update_index;
      +}
      +
     -+uint64_t reader_min_update_index(struct reader *r)
     ++uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
      +{
      +	return r->min_update_index;
      +}
     @@ -2598,40 +2712,48 @@
      +#include "record.h"
      +#include "reftable.h"
      +
     -+uint64_t block_source_size(struct block_source source);
     ++uint64_t block_source_size(struct reftable_block_source source);
      +
     -+int block_source_read_block(struct block_source source, struct block *dest,
     -+			    uint64_t off, uint32_t size);
     -+void block_source_return_block(struct block_source source, struct block *ret);
     -+void block_source_close(struct block_source *source);
     ++int block_source_read_block(struct reftable_block_source source,
     ++			    struct reftable_block *dest, uint64_t off,
     ++			    uint32_t size);
     ++void block_source_return_block(struct reftable_block_source source,
     ++			       struct reftable_block *ret);
     ++void block_source_close(struct reftable_block_source *source);
      +
     -+struct reader_offsets {
     ++struct reftable_reader_offsets {
      +	bool present;
      +	uint64_t offset;
      +	uint64_t index_offset;
      +};
      +
     -+struct reader {
     ++struct reftable_reader {
      +	char *name;
     -+	struct block_source source;
     ++	struct reftable_block_source source;
      +	uint32_t hash_id;
     ++
     ++	// Size of the file, excluding the footer.
      +	uint64_t size;
      +	uint32_t block_size;
      +	uint64_t min_update_index;
      +	uint64_t max_update_index;
      +	int object_id_len;
     ++	int version;
      +
     -+	struct reader_offsets ref_offsets;
     -+	struct reader_offsets obj_offsets;
     -+	struct reader_offsets log_offsets;
     ++	struct reftable_reader_offsets ref_offsets;
     ++	struct reftable_reader_offsets obj_offsets;
     ++	struct reftable_reader_offsets log_offsets;
      +};
      +
     -+int init_reader(struct reader *r, struct block_source source, const char *name);
     -+int reader_seek(struct reader *r, struct iterator *it, struct record rec);
     -+void reader_close(struct reader *r);
     -+const char *reader_name(struct reader *r);
     -+void reader_return_block(struct reader *r, struct block *p);
     -+int reader_init_block_reader(struct reader *r, struct block_reader *br,
     ++int init_reader(struct reftable_reader *r, struct reftable_block_source source,
     ++		const char *name);
     ++int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
     ++		struct record rec);
     ++void reader_close(struct reftable_reader *r);
     ++const char *reader_name(struct reftable_reader *r);
     ++void reader_return_block(struct reftable_reader *r, struct reftable_block *p);
     ++int reader_init_block_reader(struct reftable_reader *r,
     ++			     struct reftable_block_reader *br,
      +			     uint64_t next_off, byte want_typ);
      +
      +#endif
     @@ -2649,6 +2771,8 @@
      +https://developers.google.com/open-source/licenses/bsd
      +*/
      +
     ++/* record.c - methods for different types of records. */
     ++
      +#include "record.h"
      +
      +#include "system.h"
     @@ -2656,18 +2780,6 @@
      +#include "constants.h"
      +#include "reftable.h"
      +
     -+int is_block_type(byte typ)
     -+{
     -+	switch (typ) {
     -+	case BLOCK_TYPE_REF:
     -+	case BLOCK_TYPE_LOG:
     -+	case BLOCK_TYPE_OBJ:
     -+	case BLOCK_TYPE_INDEX:
     -+		return true;
     -+	}
     -+	return false;
     -+}
     -+
      +int get_var_int(uint64_t *dest, struct slice in)
      +{
      +	int ptr = 0;
     @@ -2716,17 +2828,16 @@
      +	}
      +}
      +
     -+int common_prefix_size(struct slice a, struct slice b)
     ++int is_block_type(byte typ)
      +{
     -+	int p = 0;
     -+	while (p < a.len && p < b.len) {
     -+		if (a.buf[p] != b.buf[p]) {
     -+			break;
     -+		}
     -+		p++;
     ++	switch (typ) {
     ++	case BLOCK_TYPE_REF:
     ++	case BLOCK_TYPE_LOG:
     ++	case BLOCK_TYPE_OBJ:
     ++	case BLOCK_TYPE_INDEX:
     ++		return true;
      +	}
     -+
     -+	return p;
     ++	return false;
      +}
      +
      +static int decode_string(struct slice *dest, struct slice in)
     @@ -2737,8 +2848,7 @@
      +	if (n <= 0) {
      +		return -1;
      +	}
     -+	in.buf += n;
     -+	in.len -= n;
     ++	slice_consume(&in, n);
      +	if (in.len < tsize) {
      +		return -1;
      +	}
     @@ -2746,12 +2856,29 @@
      +	slice_resize(dest, tsize + 1);
      +	dest->buf[tsize] = 0;
      +	memcpy(dest->buf, in.buf, tsize);
     -+	in.buf += tsize;
     -+	in.len -= tsize;
     ++	slice_consume(&in, tsize);
      +
      +	return start_len - in.len;
      +}
      +
     ++static int encode_string(char *str, struct slice s)
     ++{
     ++	struct slice start = s;
     ++	int l = strlen(str);
     ++	int n = put_var_int(s, l);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
     ++	slice_consume(&s, n);
     ++	if (s.len < l) {
     ++		return -1;
     ++	}
     ++	memcpy(s.buf, str, l);
     ++	slice_consume(&s, l);
     ++
     ++	return start.len - s.len;
     ++}
     ++
      +int encode_key(bool *restart, struct slice dest, struct slice prev_key,
      +	       struct slice key, byte extra)
      +{
     @@ -2762,8 +2889,7 @@
      +	if (n < 0) {
      +		return -1;
      +	}
     -+	dest.buf += n;
     -+	dest.len -= n;
     ++	slice_consume(&dest, n);
      +
      +	*restart = (prefix_len == 0);
      +
     @@ -2771,39 +2897,77 @@
      +	if (n < 0) {
      +		return -1;
      +	}
     -+	dest.buf += n;
     -+	dest.len -= n;
     ++	slice_consume(&dest, n);
      +
      +	if (dest.len < suffix_len) {
      +		return -1;
      +	}
      +	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
     -+	dest.buf += suffix_len;
     -+	dest.len -= suffix_len;
     ++	slice_consume(&dest, suffix_len);
      +
      +	return start.len - dest.len;
      +}
      +
     -+static byte ref_record_type(void)
     ++int decode_key(struct slice *key, byte *extra, struct slice last_key,
     ++	       struct slice in)
     ++{
     ++	int start_len = in.len;
     ++	uint64_t prefix_len = 0;
     ++	uint64_t suffix_len = 0;
     ++	int n = get_var_int(&prefix_len, in);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
     ++	slice_consume(&in, n);
     ++
     ++	if (prefix_len > last_key.len) {
     ++		return -1;
     ++	}
     ++
     ++	n = get_var_int(&suffix_len, in);
     ++	if (n <= 0) {
     ++		return -1;
     ++	}
     ++	slice_consume(&in, n);
     ++
     ++	*extra = (byte)(suffix_len & 0x7);
     ++	suffix_len >>= 3;
     ++
     ++	if (in.len < suffix_len) {
     ++		return -1;
     ++	}
     ++
     ++	slice_resize(key, suffix_len + prefix_len);
     ++	memcpy(key->buf, last_key.buf, prefix_len);
     ++
     ++	memcpy(key->buf + prefix_len, in.buf, suffix_len);
     ++	slice_consume(&in, suffix_len);
     ++
     ++	return start_len - in.len;
     ++}
     ++
     ++static byte reftable_ref_record_type(void)
      +{
      +	return BLOCK_TYPE_REF;
      +}
      +
     -+static void ref_record_key(const void *r, struct slice *dest)
     ++static void reftable_ref_record_key(const void *r, struct slice *dest)
      +{
     -+	const struct ref_record *rec = (const struct ref_record *)r;
     ++	const struct reftable_ref_record *rec =
     ++		(const struct reftable_ref_record *)r;
      +	slice_set_string(dest, rec->ref_name);
      +}
      +
     -+static void ref_record_copy_from(void *rec, const void *src_rec, int hash_size)
     ++static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
     ++					  int hash_size)
      +{
     -+	struct ref_record *ref = (struct ref_record *)rec;
     -+	struct ref_record *src = (struct ref_record *)src_rec;
     ++	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
     ++	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
      +	assert(hash_size > 0);
      +
      +	/* This is simple and correct, but we could probably reuse the hash
      +	   fields. */
     -+	ref_record_clear(ref);
     ++	reftable_ref_record_clear(ref);
      +	if (src->ref_name != NULL) {
      +		ref->ref_name = xstrdup(src->ref_name);
      +	}
     @@ -2813,12 +2977,12 @@
      +	}
      +
      +	if (src->target_value != NULL) {
     -+		ref->target_value = malloc(hash_size);
     ++		ref->target_value = reftable_malloc(hash_size);
      +		memcpy(ref->target_value, src->target_value, hash_size);
      +	}
      +
      +	if (src->value != NULL) {
     -+		ref->value = malloc(hash_size);
     ++		ref->value = reftable_malloc(hash_size);
      +		memcpy(ref->value, src->value, hash_size);
      +	}
      +	ref->update_index = src->update_index;
     @@ -2845,7 +3009,7 @@
      +	}
      +}
      +
     -+void ref_record_print(struct ref_record *ref, int hash_size)
     ++void reftable_ref_record_print(struct reftable_ref_record *ref, int hash_size)
      +{
      +	char hex[SHA256_SIZE + 1] = { 0 };
      +
     @@ -2864,23 +3028,24 @@
      +	printf("}\n");
      +}
      +
     -+static void ref_record_clear_void(void *rec)
     ++static void reftable_ref_record_clear_void(void *rec)
      +{
     -+	ref_record_clear((struct ref_record *)rec);
     ++	reftable_ref_record_clear((struct reftable_ref_record *)rec);
      +}
      +
     -+void ref_record_clear(struct ref_record *ref)
     ++void reftable_ref_record_clear(struct reftable_ref_record *ref)
      +{
     -+	free(ref->ref_name);
     -+	free(ref->target);
     -+	free(ref->target_value);
     -+	free(ref->value);
     -+	memset(ref, 0, sizeof(struct ref_record));
     ++	reftable_free(ref->ref_name);
     ++	reftable_free(ref->target);
     ++	reftable_free(ref->target_value);
     ++	reftable_free(ref->value);
     ++	memset(ref, 0, sizeof(struct reftable_ref_record));
      +}
      +
     -+static byte ref_record_val_type(const void *rec)
     ++static byte reftable_ref_record_val_type(const void *rec)
      +{
     -+	const struct ref_record *r = (const struct ref_record *)rec;
     ++	const struct reftable_ref_record *r =
     ++		(const struct reftable_ref_record *)rec;
      +	if (r->value != NULL) {
      +		if (r->target_value != NULL) {
      +			return 2;
     @@ -2893,45 +3058,25 @@
      +	return 0;
      +}
      +
     -+static int encode_string(char *str, struct slice s)
     -+{
     -+	struct slice start = s;
     -+	int l = strlen(str);
     -+	int n = put_var_int(s, l);
     -+	if (n < 0) {
     -+		return -1;
     -+	}
     -+	s.buf += n;
     -+	s.len -= n;
     -+	if (s.len < l) {
     -+		return -1;
     -+	}
     -+	memcpy(s.buf, str, l);
     -+	s.buf += l;
     -+	s.len -= l;
     -+
     -+	return start.len - s.len;
     -+}
     -+
     -+static int ref_record_encode(const void *rec, struct slice s, int hash_size)
     ++static int reftable_ref_record_encode(const void *rec, struct slice s,
     ++				      int hash_size)
      +{
     -+	const struct ref_record *r = (const struct ref_record *)rec;
     ++	const struct reftable_ref_record *r =
     ++		(const struct reftable_ref_record *)rec;
      +	struct slice start = s;
      +	int n = put_var_int(s, r->update_index);
      +	assert(hash_size > 0);
      +	if (n < 0) {
      +		return -1;
      +	}
     -+	s.buf += n;
     -+	s.len -= n;
     ++	slice_consume(&s, n);
      +
      +	if (r->value != NULL) {
      +		if (s.len < hash_size) {
      +			return -1;
      +		}
      +		memcpy(s.buf, r->value, hash_size);
     -+		s.buf += hash_size;
     -+		s.len -= hash_size;
     ++		slice_consume(&s, hash_size);
      +	}
      +
      +	if (r->target_value != NULL) {
     @@ -2939,8 +3084,7 @@
      +			return -1;
      +		}
      +		memcpy(s.buf, r->target_value, hash_size);
     -+		s.buf += hash_size;
     -+		s.len -= hash_size;
     ++		slice_consume(&s, hash_size);
      +	}
      +
      +	if (r->target != NULL) {
     @@ -2948,17 +3092,17 @@
      +		if (n < 0) {
      +			return -1;
      +		}
     -+		s.buf += n;
     -+		s.len -= n;
     ++		slice_consume(&s, n);
      +	}
      +
      +	return start.len - s.len;
      +}
      +
     -+static int ref_record_decode(void *rec, struct slice key, byte val_type,
     -+			     struct slice in, int hash_size)
     ++static int reftable_ref_record_decode(void *rec, struct slice key,
     ++				      byte val_type, struct slice in,
     ++				      int hash_size)
      +{
     -+	struct ref_record *r = (struct ref_record *)rec;
     ++	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
      +	struct slice start = in;
      +	bool seen_value = false;
      +	bool seen_target_value = false;
     @@ -2970,10 +3114,9 @@
      +	}
      +	assert(hash_size > 0);
      +
     -+	in.buf += n;
     -+	in.len -= n;
     ++	slice_consume(&in, n);
      +
     -+	r->ref_name = realloc(r->ref_name, key.len + 1);
     ++	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
      +	memcpy(r->ref_name, key.buf, key.len);
      +	r->ref_name[key.len] = 0;
      +
     @@ -2985,22 +3128,20 @@
      +		}
      +
      +		if (r->value == NULL) {
     -+			r->value = malloc(hash_size);
     ++			r->value = reftable_malloc(hash_size);
      +		}
      +		seen_value = true;
      +		memcpy(r->value, in.buf, hash_size);
     -+		in.buf += hash_size;
     -+		in.len -= hash_size;
     ++		slice_consume(&in, hash_size);
      +		if (val_type == 1) {
      +			break;
      +		}
      +		if (r->target_value == NULL) {
     -+			r->target_value = malloc(hash_size);
     ++			r->target_value = reftable_malloc(hash_size);
      +		}
      +		seen_target_value = true;
      +		memcpy(r->target_value, in.buf, hash_size);
     -+		in.buf += hash_size;
     -+		in.len -= hash_size;
     ++		slice_consume(&in, hash_size);
      +		break;
      +	case 3: {
      +		struct slice dest = { 0 };
     @@ -3008,8 +3149,7 @@
      +		if (n < 0) {
      +			return -1;
      +		}
     -+		in.buf += n;
     -+		in.len -= n;
     ++		slice_consume(&in, n);
      +		seen_target = true;
      +		r->target = (char *)slice_as_string(&dest);
      +	} break;
     @@ -3034,55 +3174,14 @@
      +	return start.len - in.len;
      +}
      +
     -+int decode_key(struct slice *key, byte *extra, struct slice last_key,
     -+	       struct slice in)
     -+{
     -+	int start_len = in.len;
     -+	uint64_t prefix_len = 0;
     -+	uint64_t suffix_len = 0;
     -+	int n = get_var_int(&prefix_len, in);
     -+	if (n < 0) {
     -+		return -1;
     -+	}
     -+	in.buf += n;
     -+	in.len -= n;
     -+
     -+	if (prefix_len > last_key.len) {
     -+		return -1;
     -+	}
     -+
     -+	n = get_var_int(&suffix_len, in);
     -+	if (n <= 0) {
     -+		return -1;
     -+	}
     -+	in.buf += n;
     -+	in.len -= n;
     -+
     -+	*extra = (byte)(suffix_len & 0x7);
     -+	suffix_len >>= 3;
     -+
     -+	if (in.len < suffix_len) {
     -+		return -1;
     -+	}
     -+
     -+	slice_resize(key, suffix_len + prefix_len);
     -+	memcpy(key->buf, last_key.buf, prefix_len);
     -+
     -+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
     -+	in.buf += suffix_len;
     -+	in.len -= suffix_len;
     -+
     -+	return start_len - in.len;
     -+}
     -+
     -+struct record_vtable ref_record_vtable = {
     -+	.key = &ref_record_key,
     -+	.type = &ref_record_type,
     -+	.copy_from = &ref_record_copy_from,
     -+	.val_type = &ref_record_val_type,
     -+	.encode = &ref_record_encode,
     -+	.decode = &ref_record_decode,
     -+	.clear = &ref_record_clear_void,
     ++struct record_vtable reftable_ref_record_vtable = {
     ++	.key = &reftable_ref_record_key,
     ++	.type = &reftable_ref_record_type,
     ++	.copy_from = &reftable_ref_record_copy_from,
     ++	.val_type = &reftable_ref_record_val_type,
     ++	.encode = &reftable_ref_record_encode,
     ++	.decode = &reftable_ref_record_decode,
     ++	.clear = &reftable_ref_record_clear_void,
      +};
      +
      +static byte obj_record_type(void)
     @@ -3103,12 +3202,12 @@
      +	const struct obj_record *src = (const struct obj_record *)src_rec;
      +
      +	*ref = *src;
     -+	ref->hash_prefix = malloc(ref->hash_prefix_len);
     ++	ref->hash_prefix = reftable_malloc(ref->hash_prefix_len);
      +	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
      +
      +	{
      +		int olen = ref->offset_len * sizeof(uint64_t);
     -+		ref->offsets = malloc(olen);
     ++		ref->offsets = reftable_malloc(olen);
      +		memcpy(ref->offsets, src->offsets, olen);
      +	}
      +}
     @@ -3140,8 +3239,7 @@
      +		if (n < 0) {
      +			return -1;
      +		}
     -+		s.buf += n;
     -+		s.len -= n;
     ++		slice_consume(&s, n);
      +	}
      +	if (r->offset_len == 0) {
      +		return start.len - s.len;
     @@ -3150,8 +3248,7 @@
      +	if (n < 0) {
      +		return -1;
      +	}
     -+	s.buf += n;
     -+	s.len -= n;
     ++	slice_consume(&s, n);
      +
      +	{
      +		uint64_t last = r->offsets[0];
     @@ -3161,8 +3258,7 @@
      +			if (n < 0) {
      +				return -1;
      +			}
     -+			s.buf += n;
     -+			s.len -= n;
     ++			slice_consume(&s, n);
      +			last = r->offsets[i];
      +		}
      +	}
     @@ -3176,7 +3272,7 @@
      +	struct obj_record *r = (struct obj_record *)rec;
      +	uint64_t count = val_type;
      +	int n = 0;
     -+	r->hash_prefix = malloc(key.len);
     ++	r->hash_prefix = reftable_malloc(key.len);
      +	memcpy(r->hash_prefix, key.buf, key.len);
      +	r->hash_prefix_len = key.len;
      +
     @@ -3186,8 +3282,7 @@
      +			return n;
      +		}
      +
     -+		in.buf += n;
     -+		in.len -= n;
     ++		slice_consume(&in, n);
      +	}
      +
      +	r->offsets = NULL;
     @@ -3196,16 +3291,14 @@
      +		return start.len - in.len;
      +	}
      +
     -+	r->offsets = malloc(count * sizeof(uint64_t));
     ++	r->offsets = reftable_malloc(count * sizeof(uint64_t));
      +	r->offset_len = count;
      +
      +	n = get_var_int(&r->offsets[0], in);
      +	if (n < 0) {
      +		return n;
      +	}
     -+
     -+	in.buf += n;
     -+	in.len -= n;
     ++	slice_consume(&in, n);
      +
      +	{
      +		uint64_t last = r->offsets[0];
     @@ -3216,9 +3309,7 @@
      +			if (n < 0) {
      +				return n;
      +			}
     -+
     -+			in.buf += n;
     -+			in.len -= n;
     ++			slice_consume(&in, n);
      +
      +			last = r->offsets[j] = (delta + last);
      +			j++;
     @@ -3237,7 +3328,7 @@
      +	.clear = &obj_record_clear,
      +};
      +
     -+void log_record_print(struct log_record *log, int hash_size)
     ++void reftable_log_record_print(struct reftable_log_record *log, int hash_size)
      +{
      +	char hex[SHA256_SIZE + 1] = { 0 };
      +
     @@ -3250,14 +3341,15 @@
      +	printf("%s\n\n%s\n}\n", hex, log->message);
      +}
      +
     -+static byte log_record_type(void)
     ++static byte reftable_log_record_type(void)
      +{
      +	return BLOCK_TYPE_LOG;
      +}
      +
     -+static void log_record_key(const void *r, struct slice *dest)
     ++static void reftable_log_record_key(const void *r, struct slice *dest)
      +{
     -+	const struct log_record *rec = (const struct log_record *)r;
     ++	const struct reftable_log_record *rec =
     ++		(const struct reftable_log_record *)r;
      +	int len = strlen(rec->ref_name);
      +	uint64_t ts = 0;
      +	slice_resize(dest, len + 9);
     @@ -3266,10 +3358,12 @@
      +	put_be64(dest->buf + 1 + len, ts);
      +}
      +
     -+static void log_record_copy_from(void *rec, const void *src_rec, int hash_size)
     ++static void reftable_log_record_copy_from(void *rec, const void *src_rec,
     ++					  int hash_size)
      +{
     -+	struct log_record *dst = (struct log_record *)rec;
     -+	const struct log_record *src = (const struct log_record *)src_rec;
     ++	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
     ++	const struct reftable_log_record *src =
     ++		(const struct reftable_log_record *)src_rec;
      +
      +	*dst = *src;
      +	dst->ref_name = xstrdup(dst->ref_name);
     @@ -3277,46 +3371,54 @@
      +	dst->name = xstrdup(dst->name);
      +	dst->message = xstrdup(dst->message);
      +	if (dst->new_hash != NULL) {
     -+		dst->new_hash = malloc(hash_size);
     ++		dst->new_hash = reftable_malloc(hash_size);
      +		memcpy(dst->new_hash, src->new_hash, hash_size);
      +	}
      +	if (dst->old_hash != NULL) {
     -+		dst->old_hash = malloc(hash_size);
     ++		dst->old_hash = reftable_malloc(hash_size);
      +		memcpy(dst->old_hash, src->old_hash, hash_size);
      +	}
      +}
      +
     -+static void log_record_clear_void(void *rec)
     ++static void reftable_log_record_clear_void(void *rec)
      +{
     -+	struct log_record *r = (struct log_record *)rec;
     -+	log_record_clear(r);
     ++	struct reftable_log_record *r = (struct reftable_log_record *)rec;
     ++	reftable_log_record_clear(r);
      +}
      +
     -+void log_record_clear(struct log_record *r)
     ++void reftable_log_record_clear(struct reftable_log_record *r)
      +{
     -+	free(r->ref_name);
     -+	free(r->new_hash);
     -+	free(r->old_hash);
     -+	free(r->name);
     -+	free(r->email);
     -+	free(r->message);
     -+	memset(r, 0, sizeof(struct log_record));
     ++	reftable_free(r->ref_name);
     ++	reftable_free(r->new_hash);
     ++	reftable_free(r->old_hash);
     ++	reftable_free(r->name);
     ++	reftable_free(r->email);
     ++	reftable_free(r->message);
     ++	memset(r, 0, sizeof(struct reftable_log_record));
      +}
      +
     -+static byte log_record_val_type(const void *rec)
     ++static byte reftable_log_record_val_type(const void *rec)
      +{
     -+	return 1;
     ++	const struct reftable_log_record *log =
     ++		(const struct reftable_log_record *)rec;
     ++
     ++	return reftable_log_record_is_deletion(log) ? 0 : 1;
      +}
      +
      +static byte zero[SHA256_SIZE] = { 0 };
      +
     -+static int log_record_encode(const void *rec, struct slice s, int hash_size)
     ++static int reftable_log_record_encode(const void *rec, struct slice s,
     ++				      int hash_size)
      +{
     -+	struct log_record *r = (struct log_record *)rec;
     ++	struct reftable_log_record *r = (struct reftable_log_record *)rec;
      +	struct slice start = s;
      +	int n = 0;
      +	byte *oldh = r->old_hash;
      +	byte *newh = r->new_hash;
     ++	if (reftable_log_record_is_deletion(r)) {
     ++		return 0;
     ++	}
     ++
      +	if (oldh == NULL) {
      +		oldh = zero;
      +	}
     @@ -3330,53 +3432,48 @@
      +
      +	memcpy(s.buf, oldh, hash_size);
      +	memcpy(s.buf + hash_size, newh, hash_size);
     -+	s.buf += 2 * hash_size;
     -+	s.len -= 2 * hash_size;
     ++	slice_consume(&s, 2 * hash_size);
      +
      +	n = encode_string(r->name ? r->name : "", s);
      +	if (n < 0) {
      +		return -1;
      +	}
     -+	s.len -= n;
     -+	s.buf += n;
     ++	slice_consume(&s, n);
      +
      +	n = encode_string(r->email ? r->email : "", s);
      +	if (n < 0) {
      +		return -1;
      +	}
     -+	s.len -= n;
     -+	s.buf += n;
     ++	slice_consume(&s, n);
      +
      +	n = put_var_int(s, r->time);
      +	if (n < 0) {
      +		return -1;
      +	}
     -+	s.buf += n;
     -+	s.len -= n;
     ++	slice_consume(&s, n);
      +
      +	if (s.len < 2) {
      +		return -1;
      +	}
      +
      +	put_be16(s.buf, r->tz_offset);
     -+	s.buf += 2;
     -+	s.len -= 2;
     ++	slice_consume(&s, 2);
      +
      +	n = encode_string(r->message ? r->message : "", s);
      +	if (n < 0) {
      +		return -1;
      +	}
     -+	s.len -= n;
     -+	s.buf += n;
     ++	slice_consume(&s, n);
      +
      +	return start.len - s.len;
      +}
      +
     -+static int log_record_decode(void *rec, struct slice key, byte val_type,
     -+			     struct slice in, int hash_size)
     ++static int reftable_log_record_decode(void *rec, struct slice key,
     ++				      byte val_type, struct slice in,
     ++				      int hash_size)
      +{
      +	struct slice start = in;
     -+	struct log_record *r = (struct log_record *)rec;
     ++	struct reftable_log_record *r = (struct reftable_log_record *)rec;
      +	uint64_t max = 0;
      +	uint64_t ts = 0;
      +	struct slice dest = { 0 };
     @@ -3386,33 +3483,35 @@
      +		return FORMAT_ERROR;
      +	}
      +
     -+	r->ref_name = realloc(r->ref_name, key.len - 8);
     ++	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
      +	memcpy(r->ref_name, key.buf, key.len - 8);
      +	ts = get_be64(key.buf + key.len - 8);
      +
      +	r->update_index = (~max) - ts;
      +
     ++	if (val_type == 0) {
     ++		return 0;
     ++	}
     ++
      +	if (in.len < 2 * hash_size) {
      +		return FORMAT_ERROR;
      +	}
      +
     -+	r->old_hash = realloc(r->old_hash, hash_size);
     -+	r->new_hash = realloc(r->new_hash, hash_size);
     ++	r->old_hash = reftable_realloc(r->old_hash, hash_size);
     ++	r->new_hash = reftable_realloc(r->new_hash, hash_size);
      +
      +	memcpy(r->old_hash, in.buf, hash_size);
      +	memcpy(r->new_hash, in.buf + hash_size, hash_size);
      +
     -+	in.buf += 2 * hash_size;
     -+	in.len -= 2 * hash_size;
     ++	slice_consume(&in, 2 * hash_size);
      +
      +	n = decode_string(&dest, in);
      +	if (n < 0) {
      +		goto error;
      +	}
     -+	in.len -= n;
     -+	in.buf += n;
     ++	slice_consume(&in, n);
      +
     -+	r->name = realloc(r->name, dest.len + 1);
     ++	r->name = reftable_realloc(r->name, dest.len + 1);
      +	memcpy(r->name, dest.buf, dest.len);
      +	r->name[dest.len] = 0;
      +
     @@ -3421,10 +3520,9 @@
      +	if (n < 0) {
      +		goto error;
      +	}
     -+	in.len -= n;
     -+	in.buf += n;
     ++	slice_consume(&in, n);
      +
     -+	r->email = realloc(r->email, dest.len + 1);
     ++	r->email = reftable_realloc(r->email, dest.len + 1);
      +	memcpy(r->email, dest.buf, dest.len);
      +	r->email[dest.len] = 0;
      +
     @@ -3433,33 +3531,30 @@
      +	if (n < 0) {
      +		goto error;
      +	}
     -+	in.len -= n;
     -+	in.buf += n;
     ++	slice_consume(&in, n);
      +	r->time = ts;
      +	if (in.len < 2) {
      +		goto error;
      +	}
      +
      +	r->tz_offset = get_be16(in.buf);
     -+	in.buf += 2;
     -+	in.len -= 2;
     ++	slice_consume(&in, 2);
      +
      +	slice_resize(&dest, 0);
      +	n = decode_string(&dest, in);
      +	if (n < 0) {
      +		goto error;
      +	}
     -+	in.len -= n;
     -+	in.buf += n;
     ++	slice_consume(&in, n);
      +
     -+	r->message = realloc(r->message, dest.len + 1);
     ++	r->message = reftable_realloc(r->message, dest.len + 1);
      +	memcpy(r->message, dest.buf, dest.len);
      +	r->message[dest.len] = 0;
      +
      +	return start.len - in.len;
      +
      +error:
     -+	free(slice_yield(&dest));
     ++	reftable_free(slice_yield(&dest));
      +	return FORMAT_ERROR;
      +}
      +
     @@ -3486,7 +3581,8 @@
      +	return !memcmp(a, b, sz);
      +}
      +
     -+bool log_record_equal(struct log_record *a, struct log_record *b, int hash_size)
     ++bool reftable_log_record_equal(struct reftable_log_record *a,
     ++			       struct reftable_log_record *b, int hash_size)
      +{
      +	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
      +	       null_streq(a->message, b->message) &&
     @@ -3496,14 +3592,14 @@
      +	       a->update_index == b->update_index;
      +}
      +
     -+struct record_vtable log_record_vtable = {
     -+	.key = &log_record_key,
     -+	.type = &log_record_type,
     -+	.copy_from = &log_record_copy_from,
     -+	.val_type = &log_record_val_type,
     -+	.encode = &log_record_encode,
     -+	.decode = &log_record_decode,
     -+	.clear = &log_record_clear_void,
     ++struct record_vtable reftable_log_record_vtable = {
     ++	.key = &reftable_log_record_key,
     ++	.type = &reftable_log_record_type,
     ++	.copy_from = &reftable_log_record_copy_from,
     ++	.val_type = &reftable_log_record_val_type,
     ++	.encode = &reftable_log_record_encode,
     ++	.decode = &reftable_log_record_decode,
     ++	.clear = &reftable_log_record_clear_void,
      +};
      +
      +struct record new_record(byte typ)
     @@ -3511,23 +3607,27 @@
      +	struct record rec;
      +	switch (typ) {
      +	case BLOCK_TYPE_REF: {
     -+		struct ref_record *r = calloc(1, sizeof(struct ref_record));
     ++		struct reftable_ref_record *r =
     ++			reftable_calloc(sizeof(struct reftable_ref_record));
      +		record_from_ref(&rec, r);
      +		return rec;
      +	}
      +
      +	case BLOCK_TYPE_OBJ: {
     -+		struct obj_record *r = calloc(1, sizeof(struct obj_record));
     ++		struct obj_record *r =
     ++			reftable_calloc(sizeof(struct obj_record));
      +		record_from_obj(&rec, r);
      +		return rec;
      +	}
      +	case BLOCK_TYPE_LOG: {
     -+		struct log_record *r = calloc(1, sizeof(struct log_record));
     ++		struct reftable_log_record *r =
     ++			reftable_calloc(sizeof(struct reftable_log_record));
      +		record_from_log(&rec, r);
      +		return rec;
      +	}
      +	case BLOCK_TYPE_INDEX: {
     -+		struct index_record *r = calloc(1, sizeof(struct index_record));
     ++		struct index_record *r =
     ++			reftable_calloc(sizeof(struct index_record));
      +		record_from_index(&rec, r);
      +		return rec;
      +	}
     @@ -3560,7 +3660,7 @@
      +static void index_record_clear(void *rec)
      +{
      +	struct index_record *idx = (struct index_record *)rec;
     -+	free(slice_yield(&idx->last_key));
     ++	reftable_free(slice_yield(&idx->last_key));
      +}
      +
      +static byte index_record_val_type(const void *rec)
     @@ -3578,8 +3678,7 @@
      +		return n;
      +	}
      +
     -+	out.buf += n;
     -+	out.len -= n;
     ++	slice_consume(&out, n);
      +
      +	return start.len - out.len;
      +}
     @@ -3598,8 +3697,7 @@
      +		return n;
      +	}
      +
     -+	in.buf += n;
     -+	in.len -= n;
     ++	slice_consume(&in, n);
      +	return start.len - in.len;
      +}
      +
     @@ -3651,10 +3749,10 @@
      +	return rec.ops->clear(rec.data);
      +}
      +
     -+void record_from_ref(struct record *rec, struct ref_record *ref_rec)
     ++void record_from_ref(struct record *rec, struct reftable_ref_record *ref_rec)
      +{
      +	rec->data = ref_rec;
     -+	rec->ops = &ref_record_vtable;
     ++	rec->ops = &reftable_ref_record_vtable;
      +}
      +
      +void record_from_obj(struct record *rec, struct obj_record *obj_rec)
     @@ -3669,10 +3767,10 @@
      +	rec->ops = &index_record_vtable;
      +}
      +
     -+void record_from_log(struct record *rec, struct log_record *log_rec)
     ++void record_from_log(struct record *rec, struct reftable_log_record *log_rec)
      +{
      +	rec->data = log_rec;
     -+	rec->ops = &log_record_vtable;
     ++	rec->ops = &reftable_log_record_vtable;
      +}
      +
      +void *record_yield(struct record *rec)
     @@ -3682,10 +3780,10 @@
      +	return p;
      +}
      +
     -+struct ref_record *record_as_ref(struct record rec)
     ++struct reftable_ref_record *record_as_ref(struct record rec)
      +{
      +	assert(record_type(rec) == BLOCK_TYPE_REF);
     -+	return (struct ref_record *)rec.data;
     ++	return (struct reftable_ref_record *)rec.data;
      +}
      +
      +static bool hash_equal(byte *a, byte *b, int hash_size)
     @@ -3706,7 +3804,8 @@
      +	return a == b;
      +}
      +
     -+bool ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size)
     ++bool reftable_ref_record_equal(struct reftable_ref_record *a,
     ++			       struct reftable_ref_record *b, int hash_size)
      +{
      +	assert(hash_size > 0);
      +	return 0 == strcmp(a->ref_name, b->ref_name) &&
     @@ -3716,22 +3815,22 @@
      +	       str_equal(a->target, b->target);
      +}
      +
     -+int ref_record_compare_name(const void *a, const void *b)
     ++int reftable_ref_record_compare_name(const void *a, const void *b)
      +{
     -+	return strcmp(((struct ref_record *)a)->ref_name,
     -+		      ((struct ref_record *)b)->ref_name);
     ++	return strcmp(((struct reftable_ref_record *)a)->ref_name,
     ++		      ((struct reftable_ref_record *)b)->ref_name);
      +}
      +
     -+bool ref_record_is_deletion(const struct ref_record *ref)
     ++bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
      +{
      +	return ref->value == NULL && ref->target == NULL &&
      +	       ref->target_value == NULL;
      +}
      +
     -+int log_record_compare_key(const void *a, const void *b)
     ++int reftable_log_record_compare_key(const void *a, const void *b)
      +{
     -+	struct log_record *la = (struct log_record *)a;
     -+	struct log_record *lb = (struct log_record *)b;
     ++	struct reftable_log_record *la = (struct reftable_log_record *)a;
     ++	struct reftable_log_record *lb = (struct reftable_log_record *)b;
      +
      +	int cmp = strcmp(la->ref_name, lb->ref_name);
      +	if (cmp) {
     @@ -3743,10 +3842,12 @@
      +	return (la->update_index < lb->update_index) ? 1 : 0;
      +}
      +
     -+bool log_record_is_deletion(const struct log_record *log)
     ++bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
      +{
     -+	/* XXX */
     -+	return false;
     ++	return (log->new_hash == NULL && log->old_hash == NULL &&
     ++		log->name == NULL && log->email == NULL &&
     ++		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
     ++		log->message == NULL);
      +}
      +
      +int hash_size(uint32_t id)
     @@ -3780,14 +3881,33 @@
      +#include "reftable.h"
      +#include "slice.h"
      +
     ++/* utilities for de/encoding varints */
     ++
     ++int get_var_int(uint64_t *dest, struct slice in);
     ++int put_var_int(struct slice dest, uint64_t val);
     ++
     ++/* Methods for records. */
      +struct record_vtable {
     ++	/* encode the key of to a byte slice. */
      +	void (*key)(const void *rec, struct slice *dest);
     ++
     ++	/* The record type of ('r' for ref). */
      +	byte (*type)(void);
     -+	void (*copy_from)(void *rec, const void *src, int hash_size);
     ++
     ++	void (*copy_from)(void *dest, const void *src, int hash_size);
     ++
     ++	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
     ++	 * vs ref deletion) */
      +	byte (*val_type)(const void *rec);
     ++
     ++	/* encodes rec into dest, returning how much space was used. */
      +	int (*encode)(const void *rec, struct slice dest, int hash_size);
     ++
     ++	/* decode data from `src` into the record. */
      +	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
      +		      int hash_size);
     ++
     ++	/* deallocate and null the record. */
      +	void (*clear)(void *rec);
      +};
      +
     @@ -3797,32 +3917,34 @@
      +	struct record_vtable *ops;
      +};
      +
     -+int get_var_int(uint64_t *dest, struct slice in);
     -+int put_var_int(struct slice dest, uint64_t val);
     -+int common_prefix_size(struct slice a, struct slice b);
     -+
      +int is_block_type(byte typ);
     ++
      +struct record new_record(byte typ);
      +
     -+extern struct record_vtable ref_record_vtable;
     ++extern struct record_vtable reftable_ref_record_vtable;
      +
      +int encode_key(bool *restart, struct slice dest, struct slice prev_key,
      +	       struct slice key, byte extra);
      +int decode_key(struct slice *key, byte *extra, struct slice last_key,
      +	       struct slice in);
      +
     ++/* index_record are used internally to speed up lookups. */
      +struct index_record {
     -+	uint64_t offset;
     -+	struct slice last_key;
     ++	uint64_t offset; /* Offset of block */
     ++	struct slice last_key; /* Last key of the block. */
      +};
      +
     ++/* obj_record stores an object ID => ref mapping. */
      +struct obj_record {
     -+	byte *hash_prefix;
     -+	int hash_prefix_len;
     -+	uint64_t *offsets;
     ++	byte *hash_prefix; /* leading bytes of the object ID */
     ++	int hash_prefix_len; /* number of leading bytes. Constant
     ++			      * across a single table. */
     ++	uint64_t *offsets; /* a vector of file offsets. */
      +	int offset_len;
      +};
      +
     ++/* see struct record_vtable */
     ++
      +void record_key(struct record rec, struct slice *dest);
      +byte record_type(struct record rec);
      +void record_copy_from(struct record rec, struct record src, int hash_size);
     @@ -3831,18 +3953,24 @@
      +int record_decode(struct record rec, struct slice key, byte extra,
      +		  struct slice src, int hash_size);
      +void record_clear(struct record rec);
     ++
     ++/* clear out the record, yielding the record data that was encapsulated. */
      +void *record_yield(struct record *rec);
     ++
     ++/* initialize generic records from concrete records. The generic record should
     ++ * be zeroed out. */
     ++
      +void record_from_obj(struct record *rec, struct obj_record *objrec);
      +void record_from_index(struct record *rec, struct index_record *idxrec);
     -+void record_from_ref(struct record *rec, struct ref_record *refrec);
     -+void record_from_log(struct record *rec, struct log_record *logrec);
     -+struct ref_record *record_as_ref(struct record ref);
     ++void record_from_ref(struct record *rec, struct reftable_ref_record *refrec);
     ++void record_from_log(struct record *rec, struct reftable_log_record *logrec);
     ++struct reftable_ref_record *record_as_ref(struct record ref);
      +
      +/* for qsort. */
     -+int ref_record_compare_name(const void *a, const void *b);
     ++int reftable_ref_record_compare_name(const void *a, const void *b);
      +
      +/* for qsort. */
     -+int log_record_compare_key(const void *a, const void *b);
     ++int reftable_log_record_compare_key(const void *a, const void *b);
      +
      +#endif
      
     @@ -3863,65 +3991,20 @@
      +#define REFTABLE_H
      +
      +#include <stdint.h>
     ++#include <stddef.h>
      +
     -+/* block_source is a generic wrapper for a seekable readable file.
     -+   It is generally passed around by value.
     -+ */
     -+struct block_source {
     -+	struct block_source_vtable *ops;
     -+	void *arg;
     -+};
     ++void reftable_set_alloc(void *(*malloc)(size_t),
     ++			void *(*realloc)(void *, size_t), void (*free)(void *));
      +
     -+/* a contiguous segment of bytes. It keeps track of its generating block_source
     -+   so it can return itself into the pool.
     -+*/
     -+struct block {
     -+	uint8_t *data;
     -+	int len;
     -+	struct block_source source;
     -+};
     ++/****************************************************************
     ++ Basic data types
      +
     -+/* block_source_vtable are the operations that make up block_source */
     -+struct block_source_vtable {
     -+	/* returns the size of a block source */
     -+	uint64_t (*size)(void *source);
     ++ Reftables store the state of each ref in struct reftable_ref_record, and they
     ++ store a sequence of reflog updates in struct reftable_log_record.
     ++ ****************************************************************/
      +
     -+	/* reads a segment from the block source. It is an error to read
     -+	   beyond the end of the block */
     -+	int (*read_block)(void *source, struct block *dest, uint64_t off,
     -+			  uint32_t size);
     -+	/* mark the block as read; may return the data back to malloc */
     -+	void (*return_block)(void *source, struct block *blockp);
     -+
     -+	/* release all resources associated with the block source */
     -+	void (*close)(void *source);
     -+};
     -+
     -+/* opens a file on the file system as a block_source */
     -+int block_source_from_file(struct block_source *block_src, const char *name);
     -+
     -+/* write_options sets options for writing a single reftable. */
     -+struct write_options {
     -+	/* boolean: do not pad out blocks to block size. */
     -+	int unpadded;
     -+
     -+	/* the blocksize. Should be less than 2^24. */
     -+	uint32_t block_size;
     -+
     -+	/* boolean: do not generate a SHA1 => ref index. */
     -+	int skip_index_objects;
     -+
     -+	/* how often to write complete keys in each block. */
     -+	int restart_interval;
     -+
     -+	/* 4-byte identifier ("sha1", "s256") of the hash.
     -+	 * Defaults to SHA1 if unset
     -+	 */
     -+	uint32_t hash_id;
     -+};
     -+
     -+/* ref_record holds a ref database entry target_value */
     -+struct ref_record {
     ++/* reftable_ref_record holds a ref database entry target_value */
     ++struct reftable_ref_record {
      +	char *ref_name; /* Name of the ref, malloced. */
      +	uint64_t update_index; /* Logical timestamp at which this value is
      +				  written */
     @@ -3931,21 +4014,23 @@
      +};
      +
      +/* returns whether 'ref' represents a deletion */
     -+int ref_record_is_deletion(const struct ref_record *ref);
     ++int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
      +
     -+/* prints a ref_record onto stdout */
     -+void ref_record_print(struct ref_record *ref, int hash_size);
     ++/* prints a reftable_ref_record onto stdout */
     ++void reftable_ref_record_print(struct reftable_ref_record *ref, int hash_size);
      +
      +/* frees and nulls all pointer values. */
     -+void ref_record_clear(struct ref_record *ref);
     ++void reftable_ref_record_clear(struct reftable_ref_record *ref);
      +
     -+/* returns whether two ref_records are the same */
     -+int ref_record_equal(struct ref_record *a, struct ref_record *b, int hash_size);
     ++/* returns whether two reftable_ref_records are the same */
     ++int reftable_ref_record_equal(struct reftable_ref_record *a,
     ++			      struct reftable_ref_record *b, int hash_size);
      +
     -+/* log_record holds a reflog entry */
     -+struct log_record {
     ++/* reftable_log_record holds a reflog entry */
     ++struct reftable_log_record {
      +	char *ref_name;
     -+	uint64_t update_index;
     ++	uint64_t update_index; /* logical timestamp of a transactional update.
     ++				*/
      +	uint8_t *new_hash;
      +	uint8_t *old_hash;
      +	char *name;
     @@ -3956,39 +4041,88 @@
      +};
      +
      +/* returns whether 'ref' represents the deletion of a log record. */
     -+int log_record_is_deletion(const struct log_record *log);
     ++int reftable_log_record_is_deletion(const struct reftable_log_record *log);
      +
      +/* frees and nulls all pointer values. */
     -+void log_record_clear(struct log_record *log);
     ++void reftable_log_record_clear(struct reftable_log_record *log);
      +
      +/* returns whether two records are equal. */
     -+int log_record_equal(struct log_record *a, struct log_record *b, int hash_size);
     ++int reftable_log_record_equal(struct reftable_log_record *a,
     ++			      struct reftable_log_record *b, int hash_size);
      +
     -+void log_record_print(struct log_record *log, int hash_size);
     ++/* dumps a reftable_log_record on stdout, for debugging/testing. */
     ++void reftable_log_record_print(struct reftable_log_record *log, int hash_size);
      +
     -+/* iterator is the generic interface for walking over data stored in a
     -+   reftable. It is generally passed around by value.
     -+*/
     -+struct iterator {
     -+	struct iterator_vtable *ops;
     -+	void *iter_arg;
     ++/****************************************************************
     ++ Error handling
     ++
     ++ Error are signaled with negative integer return values. 0 means success.
     ++ ****************************************************************/
     ++
     ++/* different types of errors */
     ++enum reftable_error {
     ++	/* Unexpected file system behavior */
     ++	IO_ERROR = -2,
     ++
     ++	/* Format inconsistency on reading data
     ++	 */
     ++	FORMAT_ERROR = -3,
     ++
     ++	/* File does not exist. Returned from block_source_from_file(),  because
     ++	   it needs special handling in stack.
     ++	*/
     ++	NOT_EXIST_ERROR = -4,
     ++
     ++	/* Trying to write out-of-date data. */
     ++	LOCK_ERROR = -5,
     ++
     ++	/* Misuse of the API:
     ++	   - on writing a record with NULL ref_name.
     ++	   - on writing a reftable_ref_record outside the table limits
     ++	   - on writing a ref or log record before the stack's next_update_index
     ++	   - on reading a reftable_ref_record from log iterator, or vice versa.
     ++	*/
     ++	API_ERROR = -6,
     ++
     ++	/* Decompression error */
     ++	ZLIB_ERROR = -7,
     ++
     ++	/* Wrote a table without blocks. */
     ++	EMPTY_TABLE_ERROR = -8,
      +};
      +
     -+/* reads the next ref_record. Returns < 0 for error, 0 for OK and > 0:
     -+   end of iteration.
     -+*/
     -+int iterator_next_ref(struct iterator it, struct ref_record *ref);
     ++/* convert the numeric error code to a string. The string should not be
     ++ * deallocated. */
     ++const char *reftable_error_str(int err);
      +
     -+/* reads the next log_record. Returns < 0 for error, 0 for OK and > 0:
     -+   end of iteration.
     -+*/
     -+int iterator_next_log(struct iterator it, struct log_record *log);
     ++/****************************************************************
     ++ Writing
      +
     -+/* releases resources associated with an iterator. */
     -+void iterator_destroy(struct iterator *it);
     ++ Writing single reftables
     ++ ****************************************************************/
     ++
     ++/* reftable_write_options sets options for writing a single reftable. */
     ++struct reftable_write_options {
     ++	/* boolean: do not pad out blocks to block size. */
     ++	int unpadded;
     ++
     ++	/* the blocksize. Should be less than 2^24. */
     ++	uint32_t block_size;
      +
     -+/* block_stats holds statistics for a single block type */
     -+struct block_stats {
     ++	/* boolean: do not generate a SHA1 => ref index. */
     ++	int skip_index_objects;
     ++
     ++	/* how often to write complete keys in each block. */
     ++	int restart_interval;
     ++
     ++	/* 4-byte identifier ("sha1", "s256") of the hash.
     ++	 * Defaults to SHA1 if unset
     ++	 */
     ++	uint32_t hash_id;
     ++};
     ++
     ++/* reftable_block_stats holds statistics for a single block type */
     ++struct reftable_block_stats {
      +	/* total number of entries written */
      +	int entries;
      +	/* total number of key restarts */
     @@ -4008,119 +4142,175 @@
      +};
      +
      +/* stats holds overall statistics for a single reftable */
     -+struct stats {
     ++struct reftable_stats {
      +	/* total number of blocks written. */
      +	int blocks;
      +	/* stats for ref data */
     -+	struct block_stats ref_stats;
     ++	struct reftable_block_stats ref_stats;
      +	/* stats for the SHA1 to ref map. */
     -+	struct block_stats obj_stats;
     ++	struct reftable_block_stats obj_stats;
      +	/* stats for index blocks */
     -+	struct block_stats idx_stats;
     ++	struct reftable_block_stats idx_stats;
      +	/* stats for log blocks */
     -+	struct block_stats log_stats;
     ++	struct reftable_block_stats log_stats;
      +
      +	/* disambiguation length of shortened object IDs. */
      +	int object_id_len;
      +};
      +
     -+/* different types of errors */
     -+
     -+/* Unexpected file system behavior */
     -+#define IO_ERROR -2
     -+
     -+/* Format inconsistency on reading data
     -+ */
     -+#define FORMAT_ERROR -3
     -+
     -+/* File does not exist. Returned from block_source_from_file(),  because it
     -+   needs special handling in stack.
     -+*/
     -+#define NOT_EXIST_ERROR -4
     -+
     -+/* Trying to write out-of-date data. */
     -+#define LOCK_ERROR -5
     -+
     -+/* Misuse of the API:
     -+   - on writing a record with NULL ref_name.
     -+   - on writing a ref_record outside the table limits
     -+   - on writing a ref or log record before the stack's next_update_index
     -+   - on reading a ref_record from log iterator, or vice versa.
     -+ */
     -+#define API_ERROR -6
     -+
     -+/* Decompression error */
     -+#define ZLIB_ERROR -7
     -+
     -+/* Wrote a table without blocks. */
     -+#define EMPTY_TABLE_ERROR -8
     -+
     -+const char *error_str(int err);
     -+
     -+/* new_writer creates a new writer */
     -+struct writer *new_writer(int (*writer_func)(void *, uint8_t *, int),
     -+			  void *writer_arg, struct write_options *opts);
     ++/* reftable_new_writer creates a new writer */
     ++struct reftable_writer *
     ++reftable_new_writer(int (*writer_func)(void *, uint8_t *, int),
     ++		    void *writer_arg, struct reftable_write_options *opts);
      +
      +/* write to a file descriptor. fdp should be an int* pointing to the fd. */
     -+int fd_writer(void *fdp, uint8_t *data, int size);
     ++int reftable_fd_write(void *fdp, uint8_t *data, int size);
      +
      +/* Set the range of update indices for the records we will add.  When
      +   writing a table into a stack, the min should be at least
     -+   stack_next_update_index(), or API_ERROR is returned.
     ++   reftable_stack_next_update_index(), or API_ERROR is returned.
     ++
     ++   For transactional updates, typically min==max. When converting an existing
     ++   ref database into a single reftable, this would be a range of update-index
     ++   timestamps.
      + */
     -+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max);
     ++void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
     ++				uint64_t max);
      +
     -+/* adds a ref_record. Must be called in ascending
     ++/* adds a reftable_ref_record. Must be called in ascending
      +   order. The update_index must be within the limits set by
     -+   writer_set_limits(), or API_ERROR is returned.
     ++   reftable_writer_set_limits(), or API_ERROR is returned.
     ++
     ++   It is an error to write a ref record after a log record.
      + */
     -+int writer_add_ref(struct writer *w, struct ref_record *ref);
     ++int reftable_writer_add_ref(struct reftable_writer *w,
     ++			    struct reftable_ref_record *ref);
      +
      +/* Convenience function to add multiple refs. Will sort the refs by
      +   name before adding. */
     -+int writer_add_refs(struct writer *w, struct ref_record *refs, int n);
     ++int reftable_writer_add_refs(struct reftable_writer *w,
     ++			     struct reftable_ref_record *refs, int n);
      +
     -+/* adds a log_record. Must be called in ascending order (with more
     ++/* adds a reftable_log_record. Must be called in ascending order (with more
      +   recent log entries first.)
      + */
     -+int writer_add_log(struct writer *w, struct log_record *log);
     ++int reftable_writer_add_log(struct reftable_writer *w,
     ++			    struct reftable_log_record *log);
      +
      +/* Convenience function to add multiple logs. Will sort the records by
      +   key before adding. */
     -+int writer_add_logs(struct writer *w, struct log_record *logs, int n);
     ++int reftable_writer_add_logs(struct reftable_writer *w,
     ++			     struct reftable_log_record *logs, int n);
     ++
     ++/* reftable_writer_close finalizes the reftable. The writer is retained so
     ++ * statistics can be inspected. */
     ++int reftable_writer_close(struct reftable_writer *w);
     ++
     ++/* writer_stats returns the statistics on the reftable being written.
     ++
     ++   This struct becomes invalid when the writer is freed.
     ++ */
     ++const struct reftable_stats *writer_stats(struct reftable_writer *w);
     ++
     ++/* reftable_writer_free deallocates memory for the writer */
     ++void reftable_writer_free(struct reftable_writer *w);
     ++
     ++/****************************************************************
     ++ * ITERATING
     ++ ****************************************************************/
      +
     -+/* writer_close finalizes the reftable. The writer is retained so statistics can
     -+ * be inspected. */
     -+int writer_close(struct writer *w);
     ++/* iterator is the generic interface for walking over data stored in a
     ++   reftable. It is generally passed around by value.
     ++*/
     ++struct reftable_iterator {
     ++	struct reftable_iterator_vtable *ops;
     ++	void *iter_arg;
     ++};
      +
     -+/* writer_stats returns the statistics on the reftable being written. */
     -+struct stats *writer_stats(struct writer *w);
     ++/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
     ++   end of iteration.
     ++*/
     ++int reftable_iterator_next_ref(struct reftable_iterator it,
     ++			       struct reftable_ref_record *ref);
      +
     -+/* writer_free deallocates memory for the writer */
     -+void writer_free(struct writer *w);
     ++/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
     ++   end of iteration.
     ++*/
     ++int reftable_iterator_next_log(struct reftable_iterator it,
     ++			       struct reftable_log_record *log);
      +
     -+struct reader;
     ++/* releases resources associated with an iterator. */
     ++void reftable_iterator_destroy(struct reftable_iterator *it);
     ++
     ++/****************************************************************
     ++ Reading single tables
      +
     -+/* new_reader opens a reftable for reading. If successful, returns 0
     ++ The follow routines are for reading single files. For an application-level
     ++ interface, skip ahead to struct reftable_merged_table and struct
     ++ reftable_stack.
     ++ ****************************************************************/
     ++
     ++/* block_source is a generic wrapper for a seekable readable file.
     ++   It is generally passed around by value.
     ++ */
     ++struct reftable_block_source {
     ++	struct reftable_block_source_vtable *ops;
     ++	void *arg;
     ++};
     ++
     ++/* a contiguous segment of bytes. It keeps track of its generating block_source
     ++   so it can return itself into the pool.
     ++*/
     ++struct reftable_block {
     ++	uint8_t *data;
     ++	int len;
     ++	struct reftable_block_source source;
     ++};
     ++
     ++/* block_source_vtable are the operations that make up block_source */
     ++struct reftable_block_source_vtable {
     ++	/* returns the size of a block source */
     ++	uint64_t (*size)(void *source);
     ++
     ++	/* reads a segment from the block source. It is an error to read
     ++	   beyond the end of the block */
     ++	int (*read_block)(void *source, struct reftable_block *dest,
     ++			  uint64_t off, uint32_t size);
     ++	/* mark the block as read; may return the data back to malloc */
     ++	void (*return_block)(void *source, struct reftable_block *blockp);
     ++
     ++	/* release all resources associated with the block source */
     ++	void (*close)(void *source);
     ++};
     ++
     ++/* opens a file on the file system as a block_source */
     ++int reftable_block_source_from_file(struct reftable_block_source *block_src,
     ++				    const char *name);
     ++
     ++/* The reader struct is a handle to an open reftable file. */
     ++struct reftable_reader;
     ++
     ++/* reftable_new_reader opens a reftable for reading. If successful, returns 0
      + * code and sets pp.  The name is used for creating a
      + * stack. Typically, it is the basename of the file.
      + */
     -+int new_reader(struct reader **pp, struct block_source, const char *name);
     ++int reftable_new_reader(struct reftable_reader **pp,
     ++			struct reftable_block_source, const char *name);
      +
     -+/* reader_seek_ref returns an iterator where 'name' would be inserted in the
     -+   table.
     ++/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
     ++   in the table.  To seek to the start of the table, use name = "".
      +
      +   example:
      +
     -+   struct reader *r = NULL;
     -+   int err = new_reader(&r, src, "filename");
     ++   struct reftable_reader *r = NULL;
     ++   int err = reftable_new_reader(&r, src, "filename");
      +   if (err < 0) { ... }
     -+   struct iterator it  = {0};
     -+   err = reader_seek_ref(r, &it, "refs/heads/master");
     ++   struct reftable_iterator it  = {0};
     ++   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
      +   if (err < 0) { ... }
     -+   struct ref_record ref  = {0};
     ++   struct reftable_ref_record ref  = {0};
      +   while (1) {
     -+     err = iterator_next_ref(it, &ref);
     ++     err = reftable_iterator_next_ref(it, &ref);
      +     if (err > 0) {
      +       break;
      +     }
     @@ -4129,103 +4319,149 @@
      +     }
      +     ..found..
      +   }
     -+   iterator_destroy(&it);
     -+   ref_record_clear(&ref);
     ++   reftable_iterator_destroy(&it);
     ++   reftable_ref_record_clear(&ref);
      + */
     -+int reader_seek_ref(struct reader *r, struct iterator *it, const char *name);
     ++int reftable_reader_seek_ref(struct reftable_reader *r,
     ++			     struct reftable_iterator *it, const char *name);
      +
      +/* returns the hash ID used in this table. */
     -+uint32_t reader_hash_id(struct reader *r);
     ++uint32_t reftable_reader_hash_id(struct reftable_reader *r);
      +
     -+/* seek to logs for the given name, older than update_index. */
     -+int reader_seek_log_at(struct reader *r, struct iterator *it, const char *name,
     -+		       uint64_t update_index);
     ++/* seek to logs for the given name, older than update_index. To seek to the
     ++   start of the table, use name = "".
     ++ */
     ++int reftable_reader_seek_log_at(struct reftable_reader *r,
     ++				struct reftable_iterator *it, const char *name,
     ++				uint64_t update_index);
      +
      +/* seek to newest log entry for given name. */
     -+int reader_seek_log(struct reader *r, struct iterator *it, const char *name);
     ++int reftable_reader_seek_log(struct reftable_reader *r,
     ++			     struct reftable_iterator *it, const char *name);
      +
      +/* closes and deallocates a reader. */
     -+void reader_free(struct reader *);
     ++void reftable_reader_free(struct reftable_reader *);
      +
      +/* return an iterator for the refs pointing to oid */
     -+int reader_refs_for(struct reader *r, struct iterator *it, uint8_t *oid,
     -+		    int oid_len);
     ++int reftable_reader_refs_for(struct reftable_reader *r,
     ++			     struct reftable_iterator *it, uint8_t *oid,
     ++			     int oid_len);
      +
      +/* return the max_update_index for a table */
     -+uint64_t reader_max_update_index(struct reader *r);
     ++uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
      +
      +/* return the min_update_index for a table */
     -+uint64_t reader_min_update_index(struct reader *r);
     ++uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
     ++
     ++/****************************************************************
     ++ Merged tables
      +
     -+/* a merged table is implements seeking/iterating over a stack of tables. */
     -+struct merged_table;
     ++ A ref database kept in a sequence of table files. The merged_table presents a
     ++ unified view to reading (seeking, iterating) a sequence of immutable tables.
     ++ ****************************************************************/
      +
     -+/* new_merged_table creates a new merged table. It takes ownership of the stack
     -+   array.
     ++/* A merged table is implements seeking/iterating over a stack of tables. */
     ++struct reftable_merged_table;
     ++
     ++/* reftable_new_merged_table creates a new merged table. It takes ownership of
     ++   the stack array.
      +*/
     -+int new_merged_table(struct merged_table **dest, struct reader **stack, int n,
     -+		     uint32_t hash_id);
     ++int reftable_new_merged_table(struct reftable_merged_table **dest,
     ++			      struct reftable_reader **stack, int n,
     ++			      uint32_t hash_id);
      +
      +/* returns the hash id used in this merged table. */
     -+uint32_t merged_hash_id(struct merged_table *mt);
     ++uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *mt);
      +
      +/* returns an iterator positioned just before 'name' */
     -+int merged_table_seek_ref(struct merged_table *mt, struct iterator *it,
     -+			  const char *name);
     ++int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
     ++				   struct reftable_iterator *it,
     ++				   const char *name);
      +
      +/* returns an iterator for log entry, at given update_index */
     -+int merged_table_seek_log_at(struct merged_table *mt, struct iterator *it,
     -+			     const char *name, uint64_t update_index);
     ++int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
     ++				      struct reftable_iterator *it,
     ++				      const char *name, uint64_t update_index);
      +
     -+/* like merged_table_seek_log_at but look for the newest entry. */
     -+int merged_table_seek_log(struct merged_table *mt, struct iterator *it,
     -+			  const char *name);
     ++/* like reftable_merged_table_seek_log_at but look for the newest entry. */
     ++int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
     ++				   struct reftable_iterator *it,
     ++				   const char *name);
      +
      +/* returns the max update_index covered by this merged table. */
     -+uint64_t merged_max_update_index(struct merged_table *mt);
     ++uint64_t
     ++reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
      +
      +/* returns the min update_index covered by this merged table. */
     -+uint64_t merged_min_update_index(struct merged_table *mt);
     ++uint64_t
     ++reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
      +
      +/* closes readers for the merged tables */
     -+void merged_table_close(struct merged_table *mt);
     ++void reftable_merged_table_close(struct reftable_merged_table *mt);
      +
      +/* releases memory for the merged_table */
     -+void merged_table_free(struct merged_table *m);
     ++void reftable_merged_table_free(struct reftable_merged_table *m);
     ++
     ++/****************************************************************
     ++ Mutable ref database
     ++
     ++ The stack presents an interface to a mutable sequence of reftables.
     ++ ****************************************************************/
      +
      +/* a stack is a stack of reftables, which can be mutated by pushing a table to
      + * the top of the stack */
     -+struct stack;
     ++struct reftable_stack;
      +
     -+/* open a new reftable stack. The tables will be stored in 'dir', while the list
     -+   of tables is in 'list_file'. Typically, this should be .git/reftables and
     -+   .git/refs respectively.
     ++/* open a new reftable stack. The tables along with the table list will be
     ++   stored in 'dir'. Typically, this should be .git/reftables.
      +*/
     -+int new_stack(struct stack **dest, const char *dir, const char *list_file,
     -+	      struct write_options config);
     ++int reftable_new_stack(struct reftable_stack **dest, const char *dir,
     ++		       struct reftable_write_options config);
      +
      +/* returns the update_index at which a next table should be written. */
     -+uint64_t stack_next_update_index(struct stack *st);
     ++uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
     ++
     ++/* holds a transaction to add tables at the top of a stack. */
     ++struct reftable_addition;
     ++
     ++/*
     ++  returns a new transaction to add reftables to the given stack. As a side
     ++  effect, the ref database is locked.
     ++*/ 
     ++int reftable_stack_new_addition(struct reftable_addition **dest, struct reftable_stack *st);
     ++
     ++/* Adds a reftable to transaction. */ 
     ++int reftable_addition_add(struct reftable_addition *add,
     ++                          int (*write_table)(struct reftable_writer *wr, void *arg),
     ++                          void *arg);
     ++
     ++/* Commits the transaction, releasing the lock. */
     ++int reftable_addition_commit(struct reftable_addition *add);
     ++
     ++/* Release all non-committed data from the transaction; releases the lock if held. */
     ++void reftable_addition_close(struct reftable_addition *add);
      +
      +/* add a new table to the stack. The write_table function must call
     -+   writer_set_limits, add refs and return an error value. */
     -+int stack_add(struct stack *st,
     -+	      int (*write_table)(struct writer *wr, void *write_arg),
     -+	      void *write_arg);
     ++   reftable_writer_set_limits, add refs and return an error value. */
     ++int reftable_stack_add(struct reftable_stack *st,
     ++		       int (*write_table)(struct reftable_writer *wr,
     ++					  void *write_arg),
     ++		       void *write_arg);
      +
      +/* returns the merged_table for seeking. This table is valid until the
      +   next write or reload, and should not be closed or deleted.
      +*/
     -+struct merged_table *stack_merged_table(struct stack *st);
     ++struct reftable_merged_table *
     ++reftable_stack_merged_table(struct reftable_stack *st);
      +
      +/* frees all resources associated with the stack. */
     -+void stack_destroy(struct stack *st);
     ++void reftable_stack_destroy(struct reftable_stack *st);
      +
      +/* reloads the stack if necessary. */
     -+int stack_reload(struct stack *st);
     ++int reftable_stack_reload(struct reftable_stack *st);
      +
      +/* Policy for expiring reflog entries. */
     -+struct log_expiry_config {
     ++struct reftable_log_expiry_config {
      +	/* Drop entries older than this timestamp */
      +	uint64_t time;
      +
     @@ -4235,29 +4471,32 @@
      +
      +/* compacts all reftables into a giant table. Expire reflog entries if config is
      + * non-NULL */
     -+int stack_compact_all(struct stack *st, struct log_expiry_config *config);
     ++int reftable_stack_compact_all(struct reftable_stack *st,
     ++			       struct reftable_log_expiry_config *config);
      +
      +/* heuristically compact unbalanced table stack. */
     -+int stack_auto_compact(struct stack *st);
     ++int reftable_stack_auto_compact(struct reftable_stack *st);
      +
      +/* convenience function to read a single ref. Returns < 0 for error, 0
      +   for success, and 1 if ref not found. */
     -+int stack_read_ref(struct stack *st, const char *refname,
     -+		   struct ref_record *ref);
     ++int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
     ++			    struct reftable_ref_record *ref);
      +
      +/* convenience function to read a single log. Returns < 0 for error, 0
      +   for success, and 1 if ref not found. */
     -+int stack_read_log(struct stack *st, const char *refname,
     -+		   struct log_record *log);
     ++int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
     ++			    struct reftable_log_record *log);
      +
      +/* statistics on past compactions. */
     -+struct compaction_stats {
     -+	uint64_t bytes;
     -+	int attempts;
     -+	int failures;
     ++struct reftable_compaction_stats {
     ++	uint64_t bytes; /* total number of bytes written */
     ++	int attempts; /* how often we tried to compact */
     ++	int failures; /* failures happen on concurrent updates */
      +};
      +
     -+struct compaction_stats *stack_compaction_stats(struct stack *st);
     ++/* return statistics for compaction up till now. */
     ++struct reftable_compaction_stats *
     ++reftable_stack_compaction_stats(struct reftable_stack *st);
      +
      +#endif
      
     @@ -4304,7 +4543,7 @@
      +			c = l;
      +		}
      +		s->cap = c;
     -+		s->buf = realloc(s->buf, s->cap);
     ++		s->buf = reftable_realloc(s->buf, s->cap);
      +	}
      +	s->len = l;
      +}
     @@ -4325,6 +4564,12 @@
      +	memcpy(s->buf + end, a.buf, a.len);
      +}
      +
     ++void slice_consume(struct slice *s, int n)
     ++{
     ++	s->buf += n;
     ++	s->len -= n;
     ++}
     ++
      +byte *slice_yield(struct slice *s)
      +{
      +	byte *p = s->buf;
     @@ -4394,7 +4639,7 @@
      +		if (newcap < b->len + sz) {
      +			newcap = (b->len + sz);
      +		}
     -+		b->buf = realloc(b->buf, newcap);
     ++		b->buf = reftable_realloc(b->buf, newcap);
      +		b->cap = newcap;
      +	}
      +
     @@ -4413,57 +4658,71 @@
      +	return ((struct slice *)b)->len;
      +}
      +
     -+static void slice_return_block(void *b, struct block *dest)
     ++static void slice_return_block(void *b, struct reftable_block *dest)
      +{
      +	memset(dest->data, 0xff, dest->len);
     -+	free(dest->data);
     ++	reftable_free(dest->data);
      +}
      +
      +static void slice_close(void *b)
      +{
      +}
      +
     -+static int slice_read_block(void *v, struct block *dest, uint64_t off,
     ++static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
      +			    uint32_t size)
      +{
      +	struct slice *b = (struct slice *)v;
      +	assert(off + size <= b->len);
     -+	dest->data = calloc(size, 1);
     ++	dest->data = reftable_calloc(size);
      +	memcpy(dest->data, b->buf + off, size);
      +	dest->len = size;
      +	return size;
      +}
      +
     -+struct block_source_vtable slice_vtable = {
     ++struct reftable_block_source_vtable slice_vtable = {
      +	.size = &slice_size,
      +	.read_block = &slice_read_block,
      +	.return_block = &slice_return_block,
      +	.close = &slice_close,
      +};
      +
     -+void block_source_from_slice(struct block_source *bs, struct slice *buf)
     ++void block_source_from_slice(struct reftable_block_source *bs,
     ++			     struct slice *buf)
      +{
      +	bs->ops = &slice_vtable;
      +	bs->arg = buf;
      +}
      +
     -+static void malloc_return_block(void *b, struct block *dest)
     ++static void malloc_return_block(void *b, struct reftable_block *dest)
      +{
      +	memset(dest->data, 0xff, dest->len);
     -+	free(dest->data);
     ++	reftable_free(dest->data);
      +}
      +
     -+struct block_source_vtable malloc_vtable = {
     ++struct reftable_block_source_vtable malloc_vtable = {
      +	.return_block = &malloc_return_block,
      +};
      +
     -+struct block_source malloc_block_source_instance = {
     ++struct reftable_block_source malloc_block_source_instance = {
      +	.ops = &malloc_vtable,
      +};
      +
     -+struct block_source malloc_block_source(void)
     ++struct reftable_block_source malloc_block_source(void)
      +{
      +	return malloc_block_source_instance;
     ++}
     ++
     ++int common_prefix_size(struct slice a, struct slice b)
     ++{
     ++	int p = 0;
     ++	while (p < a.len && p < b.len) {
     ++		if (a.buf[p] != b.buf[p]) {
     ++			break;
     ++		}
     ++		p++;
     ++	}
     ++
     ++	return p;
      +}
      
       diff --git a/reftable/slice.h b/reftable/slice.h
     @@ -4498,16 +4757,19 @@
      +bool slice_equal(struct slice a, struct slice b);
      +byte *slice_yield(struct slice *s);
      +void slice_copy(struct slice *dest, struct slice src);
     ++void slice_consume(struct slice *s, int n);
      +void slice_resize(struct slice *s, int l);
      +int slice_compare(struct slice a, struct slice b);
      +int slice_write(struct slice *b, byte *data, int sz);
      +int slice_write_void(void *b, byte *data, int sz);
      +void slice_append(struct slice *dest, struct slice add);
     ++int common_prefix_size(struct slice a, struct slice b);
      +
     -+struct block_source;
     -+void block_source_from_slice(struct block_source *bs, struct slice *buf);
     ++struct reftable_block_source;
     ++void block_source_from_slice(struct reftable_block_source *bs,
     ++			     struct slice *buf);
      +
     -+struct block_source malloc_block_source(void);
     ++struct reftable_block_source malloc_block_source(void);
      +
      +#endif
      
     @@ -4532,19 +4794,26 @@
      +#include "reftable.h"
      +#include "writer.h"
      +
     -+int new_stack(struct stack **dest, const char *dir, const char *list_file,
     -+	      struct write_options config)
     ++int reftable_new_stack(struct reftable_stack **dest, const char *dir,
     ++		       struct reftable_write_options config)
      +{
     -+	struct stack *p = calloc(sizeof(struct stack), 1);
     ++	struct reftable_stack *p =
     ++		reftable_calloc(sizeof(struct reftable_stack));
     ++	struct slice list_file_name = {};
      +	int err = 0;
      +	*dest = NULL;
     -+	p->list_file = xstrdup(list_file);
     ++
     ++	slice_set_string(&list_file_name, dir);
     ++	slice_append_string(&list_file_name, "/reftables.list");
     ++
     ++	p->list_file = slice_to_string(list_file_name);
     ++	reftable_free(slice_yield(&list_file_name));
      +	p->reftable_dir = xstrdup(dir);
      +	p->config = config;
      +
     -+	err = stack_reload(p);
     ++	err = reftable_stack_reload(p);
      +	if (err < 0) {
     -+		stack_destroy(p);
     ++		reftable_stack_destroy(p);
      +	} else {
      +		*dest = p;
      +	}
     @@ -4571,7 +4840,7 @@
      +		goto exit;
      +	}
      +
     -+	buf = malloc(size + 1);
     ++	buf = reftable_malloc(size + 1);
      +	if (fread(buf, 1, size, f) != size) {
      +		err = IO_ERROR;
      +		goto exit;
     @@ -4580,7 +4849,7 @@
      +
      +	parse_names(buf, size, namesp);
      +exit:
     -+	free(buf);
     ++	reftable_free(buf);
      +	return err;
      +}
      +
     @@ -4590,7 +4859,7 @@
      +	int err = 0;
      +	if (f == NULL) {
      +		if (errno == ENOENT) {
     -+			*namesp = calloc(sizeof(char *), 1);
     ++			*namesp = reftable_calloc(sizeof(char *));
      +			return 0;
      +		}
      +
     @@ -4601,30 +4870,33 @@
      +	return err;
      +}
      +
     -+struct merged_table *stack_merged_table(struct stack *st)
     ++struct reftable_merged_table *
     ++reftable_stack_merged_table(struct reftable_stack *st)
      +{
      +	return st->merged;
      +}
      +
      +/* Close and free the stack */
     -+void stack_destroy(struct stack *st)
     ++void reftable_stack_destroy(struct reftable_stack *st)
      +{
      +	if (st->merged == NULL) {
      +		return;
      +	}
      +
     -+	merged_table_close(st->merged);
     -+	merged_table_free(st->merged);
     ++	reftable_merged_table_close(st->merged);
     ++	reftable_merged_table_free(st->merged);
      +	st->merged = NULL;
      +
      +	FREE_AND_NULL(st->list_file);
      +	FREE_AND_NULL(st->reftable_dir);
     -+	free(st);
     ++	reftable_free(st);
      +}
      +
     -+static struct reader **stack_copy_readers(struct stack *st, int cur_len)
     ++static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
     ++						   int cur_len)
      +{
     -+	struct reader **cur = calloc(sizeof(struct reader *), cur_len);
     ++	struct reftable_reader **cur =
     ++		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
      +	int i = 0;
      +	for (i = 0; i < cur_len; i++) {
      +		cur[i] = st->merged->stack[i];
     @@ -4632,21 +4904,22 @@
      +	return cur;
      +}
      +
     -+static int stack_reload_once(struct stack *st, char **names, bool reuse_open)
     ++static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
     ++				      bool reuse_open)
      +{
      +	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
     -+	struct reader **cur = stack_copy_readers(st, cur_len);
     ++	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
      +	int err = 0;
      +	int names_len = names_length(names);
     -+	struct reader **new_tables =
     -+		malloc(sizeof(struct reader *) * names_len);
     ++	struct reftable_reader **new_tables =
     ++		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
      +	int new_tables_len = 0;
     -+	struct merged_table *new_merged = NULL;
     ++	struct reftable_merged_table *new_merged = NULL;
      +
      +	struct slice table_path = { 0 };
      +
      +	while (*names) {
     -+		struct reader *rd = NULL;
     ++		struct reftable_reader *rd = NULL;
      +		char *name = *names++;
      +
      +		/* this is linear; we assume compaction keeps the number of
     @@ -4661,18 +4934,18 @@
      +		}
      +
      +		if (rd == NULL) {
     -+			struct block_source src = { 0 };
     ++			struct reftable_block_source src = { 0 };
      +			slice_set_string(&table_path, st->reftable_dir);
      +			slice_append_string(&table_path, "/");
      +			slice_append_string(&table_path, name);
      +
     -+			err = block_source_from_file(
     ++			err = reftable_block_source_from_file(
      +				&src, slice_as_string(&table_path));
      +			if (err < 0) {
      +				goto exit;
      +			}
      +
     -+			err = new_reader(&rd, src, name);
     ++			err = reftable_new_reader(&rd, src, name);
      +			if (err < 0) {
      +				goto exit;
      +			}
     @@ -4682,8 +4955,8 @@
      +	}
      +
      +	/* success! */
     -+	err = new_merged_table(&new_merged, new_tables, new_tables_len,
     -+			       st->config.hash_id);
     ++	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
     ++					st->config.hash_id);
      +	if (err < 0) {
      +		goto exit;
      +	}
     @@ -4692,7 +4965,7 @@
      +	new_tables_len = 0;
      +	if (st->merged != NULL) {
      +		merged_table_clear(st->merged);
     -+		merged_table_free(st->merged);
     ++		reftable_merged_table_free(st->merged);
      +	}
      +	st->merged = new_merged;
      +
     @@ -4701,20 +4974,20 @@
      +		for (i = 0; i < cur_len; i++) {
      +			if (cur[i] != NULL) {
      +				reader_close(cur[i]);
     -+				reader_free(cur[i]);
     ++				reftable_reader_free(cur[i]);
      +			}
      +		}
      +	}
      +exit:
     -+	free(slice_yield(&table_path));
     ++	reftable_free(slice_yield(&table_path));
      +	{
      +		int i = 0;
      +		for (i = 0; i < new_tables_len; i++) {
      +			reader_close(new_tables[i]);
      +		}
      +	}
     -+	free(new_tables);
     -+	free(cur);
     ++	reftable_free(new_tables);
     ++	reftable_free(cur);
      +	return err;
      +}
      +
     @@ -4731,7 +5004,8 @@
      +	return udiff;
      +}
      +
     -+static int stack_reload_maybe_reuse(struct stack *st, bool reuse_open)
     ++static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
     ++					     bool reuse_open)
      +{
      +	struct timeval deadline = { 0 };
      +	int err = gettimeofday(&deadline, NULL);
     @@ -4763,7 +5037,7 @@
      +			free_names(names);
      +			return err;
      +		}
     -+		err = stack_reload_once(st, names, reuse_open);
     ++		err = reftable_stack_reload_once(st, names, reuse_open);
      +		if (err == 0) {
      +			free_names(names);
      +			break;
     @@ -4794,15 +5068,15 @@
      +	return 0;
      +}
      +
     -+int stack_reload(struct stack *st)
     ++int reftable_stack_reload(struct reftable_stack *st)
      +{
     -+	return stack_reload_maybe_reuse(st, true);
     ++	return reftable_stack_reload_maybe_reuse(st, true);
      +}
      +
      +/* -1 = error
      + 0 = up to date
      + 1 = changed. */
     -+static int stack_uptodate(struct stack *st)
     ++static int stack_uptodate(struct reftable_stack *st)
      +{
      +	char **names = NULL;
      +	int err = read_lines(st->list_file, &names);
     @@ -4833,18 +5107,19 @@
      +	return err;
      +}
      +
     -+int stack_add(struct stack *st, int (*write)(struct writer *wr, void *arg),
     -+	      void *arg)
     ++int reftable_stack_add(struct reftable_stack *st,
     ++		       int (*write)(struct reftable_writer *wr, void *arg),
     ++		       void *arg)
      +{
      +	int err = stack_try_add(st, write, arg);
      +	if (err < 0) {
      +		if (err == LOCK_ERROR) {
     -+			err = stack_reload(st);
     ++			err = reftable_stack_reload(st);
      +		}
      +		return err;
      +	}
      +
     -+	return stack_auto_compact(st);
     ++	return reftable_stack_auto_compact(st);
      +}
      +
      +static void format_name(struct slice *dest, uint64_t min, uint64_t max)
     @@ -4854,34 +5129,35 @@
      +	slice_set_string(dest, buf);
      +}
      +
     -+int stack_try_add(struct stack *st,
     -+		  int (*write_table)(struct writer *wr, void *arg), void *arg)
     ++struct reftable_addition {
     ++	int lock_file_fd;
     ++	struct slice lock_file_name;
     ++	struct reftable_stack *stack;
     ++	char **names;
     ++	char **new_tables;
     ++	int new_tables_len;
     ++	uint64_t next_update_index;
     ++};
     ++
     ++static int reftable_stack_init_addition(struct reftable_addition *add,
     ++					struct reftable_stack *st)
      +{
     -+	struct slice lock_name = { 0 };
     -+	struct slice temp_tab_name = { 0 };
     -+	struct slice tab_name = { 0 };
     -+	struct slice next_name = { 0 };
     -+	struct slice table_list = { 0 };
     -+	struct writer *wr = NULL;
      +	int err = 0;
     -+	int tab_fd = 0;
     -+	int lock_fd = 0;
     -+	uint64_t next_update_index = 0;
     ++	add->stack = st;
      +
     -+	slice_set_string(&lock_name, st->list_file);
     -+	slice_append_string(&lock_name, ".lock");
     ++	slice_set_string(&add->lock_file_name, st->list_file);
     ++	slice_append_string(&add->lock_file_name, ".lock");
      +
     -+	lock_fd = open(slice_as_string(&lock_name), O_EXCL | O_CREAT | O_WRONLY,
     -+		       0644);
     -+	if (lock_fd < 0) {
     ++	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
     ++				 O_EXCL | O_CREAT | O_WRONLY, 0644);
     ++	if (add->lock_file_fd < 0) {
      +		if (errno == EEXIST) {
      +			err = LOCK_ERROR;
     -+			goto exit;
     ++		} else {
     ++			err = IO_ERROR;
      +		}
     -+		err = IO_ERROR;
      +		goto exit;
      +	}
     -+
      +	err = stack_uptodate(st);
      +	if (err < 0) {
      +		goto exit;
     @@ -4892,29 +5168,160 @@
      +		goto exit;
      +	}
      +
     -+	next_update_index = stack_next_update_index(st);
     ++	add->next_update_index = reftable_stack_next_update_index(st);
     ++exit:
     ++	if (err) {
     ++		reftable_addition_close(add);
     ++	}
     ++	return err;
     ++}
     ++
     ++void reftable_addition_close(struct reftable_addition *add)
     ++{
     ++	int i = 0;
     ++	struct slice nm = {};
     ++	for (i = 0; i < add->new_tables_len; i++) {
     ++		slice_set_string(&nm, add->stack->list_file);
     ++		slice_append_string(&nm, "/");
     ++		slice_append_string(&nm, add->new_tables[i]);
     ++		unlink(slice_as_string(&nm));
     ++
     ++		reftable_free(add->new_tables[i]);
     ++		add->new_tables[i] = NULL;
     ++	}
     ++	reftable_free(add->new_tables);
     ++	add->new_tables = NULL;
     ++	add->new_tables_len = 0;
     ++
     ++	if (add->lock_file_fd > 0) {
     ++		close(add->lock_file_fd);
     ++		add->lock_file_fd = 0;
     ++	}
     ++	if (add->lock_file_name.len > 0) {
     ++		unlink(slice_as_string(&add->lock_file_name));
     ++		reftable_free(slice_yield(&add->lock_file_name));
     ++	}
     ++
     ++	free_names(add->names);
     ++	add->names = NULL;
     ++}
     ++
     ++int reftable_addition_commit(struct reftable_addition *add)
     ++{
     ++	struct slice table_list = { 0 };
     ++	int i = 0;
     ++	int err = 0;
     ++	if (add->new_tables_len == 0) {
     ++		goto exit;
     ++	}
     ++
     ++	for (i = 0; i < add->stack->merged->stack_len; i++) {
     ++		slice_append_string(&table_list,
     ++				    add->stack->merged->stack[i]->name);
     ++		slice_append_string(&table_list, "\n");
     ++	}
     ++	for (i = 0; i < add->new_tables_len; i++) {
     ++		slice_append_string(&table_list, add->new_tables[i]);
     ++		slice_append_string(&table_list, "\n");
     ++	}
     ++
     ++	err = write(add->lock_file_fd, table_list.buf, table_list.len);
     ++	free(slice_yield(&table_list));
     ++	if (err < 0) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	err = close(add->lock_file_fd);
     ++	add->lock_file_fd = 0;
     ++	if (err < 0) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	err = rename(slice_as_string(&add->lock_file_name),
     ++		     add->stack->list_file);
     ++	if (err < 0) {
     ++		err = IO_ERROR;
     ++		goto exit;
     ++	}
     ++
     ++	err = reftable_stack_reload(add->stack);
     ++
     ++exit:
     ++	reftable_addition_close(add);
     ++	return err;
     ++}
     ++
     ++int reftable_stack_new_addition(struct reftable_addition **dest,
     ++				struct reftable_stack *st)
     ++{
     ++	int err = 0;
     ++	*dest = reftable_malloc(sizeof(**dest));
     ++	err = reftable_stack_init_addition(*dest, st);
     ++	if (err) {
     ++		reftable_free(*dest);
     ++		*dest = NULL;
     ++	}
     ++	return err;
     ++}
     ++
     ++int stack_try_add(struct reftable_stack *st,
     ++		  int (*write_table)(struct reftable_writer *wr, void *arg),
     ++		  void *arg)
     ++{
     ++	struct reftable_addition add = { 0 };
     ++	int err = reftable_stack_init_addition(&add, st);
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	err = reftable_addition_add(&add, write_table, arg);
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	err = reftable_addition_commit(&add);
     ++exit:
     ++	reftable_addition_close(&add);
     ++	return err;
     ++}
     ++
     ++int reftable_addition_add(struct reftable_addition *add,
     ++			  int (*write_table)(struct reftable_writer *wr,
     ++					     void *arg),
     ++			  void *arg)
     ++{
     ++	struct slice temp_tab_file_name = { 0 };
     ++	struct slice tab_file_name = { 0 };
     ++	struct slice next_name = { 0 };
     ++	struct reftable_writer *wr = NULL;
     ++	int err = 0;
     ++	int tab_fd = 0;
     ++	uint64_t next_update_index = 0;
      +
      +	slice_resize(&next_name, 0);
      +	format_name(&next_name, next_update_index, next_update_index);
      +
     -+	slice_set_string(&temp_tab_name, st->reftable_dir);
     -+	slice_append_string(&temp_tab_name, "/");
     -+	slice_append(&temp_tab_name, next_name);
     -+	slice_append_string(&temp_tab_name, ".temp.XXXXXX");
     ++	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
     ++	slice_append_string(&temp_tab_file_name, "/");
     ++	slice_append(&temp_tab_file_name, next_name);
     ++	slice_append_string(&temp_tab_file_name, ".temp.XXXXXX");
      +
     -+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_name));
     ++	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
      +	if (tab_fd < 0) {
      +		err = IO_ERROR;
      +		goto exit;
      +	}
      +
     -+	wr = new_writer(fd_writer, &tab_fd, &st->config);
     ++	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
     ++				 &add->stack->config);
      +	err = write_table(wr, arg);
      +	if (err < 0) {
      +		goto exit;
      +	}
      +
     -+	err = writer_close(wr);
     ++	err = reftable_writer_close(wr);
      +	if (err == EMPTY_TABLE_ERROR) {
      +		err = 0;
      +		goto exit;
     @@ -4935,98 +5342,64 @@
      +		goto exit;
      +	}
      +
     -+	{
     -+		int i = 0;
     -+		for (i = 0; i < st->merged->stack_len; i++) {
     -+			slice_append_string(&table_list,
     -+					    st->merged->stack[i]->name);
     -+			slice_append_string(&table_list, "\n");
     -+		}
     -+	}
     -+
      +	format_name(&next_name, wr->min_update_index, wr->max_update_index);
      +	slice_append_string(&next_name, ".ref");
     -+	slice_append(&table_list, next_name);
     -+	slice_append_string(&table_list, "\n");
      +
     -+	slice_set_string(&tab_name, st->reftable_dir);
     -+	slice_append_string(&tab_name, "/");
     -+	slice_append(&tab_name, next_name);
     ++	slice_set_string(&tab_file_name, add->stack->reftable_dir);
     ++	slice_append_string(&tab_file_name, "/");
     ++	slice_append(&tab_file_name, next_name);
      +
     -+	err = rename(slice_as_string(&temp_tab_name),
     -+		     slice_as_string(&tab_name));
     ++	err = rename(slice_as_string(&temp_tab_file_name),
     ++		     slice_as_string(&tab_file_name));
      +	if (err < 0) {
      +		err = IO_ERROR;
      +		goto exit;
      +	}
     -+	free(slice_yield(&temp_tab_name));
      +
     -+	err = write(lock_fd, table_list.buf, table_list.len);
     -+	if (err < 0) {
     -+		err = IO_ERROR;
     -+		goto exit;
     -+	}
     -+	err = close(lock_fd);
     -+	lock_fd = 0;
     -+	if (err < 0) {
     -+		unlink(slice_as_string(&tab_name));
     -+		err = IO_ERROR;
     -+		goto exit;
     -+	}
     -+
     -+	err = rename(slice_as_string(&lock_name), st->list_file);
     -+	if (err < 0) {
     -+		unlink(slice_as_string(&tab_name));
     -+		err = IO_ERROR;
     -+		goto exit;
     -+	}
     -+
     -+	err = stack_reload(st);
     ++	add->new_tables = reftable_realloc(add->new_tables,
     ++					   sizeof(*add->new_tables) *
     ++						   (add->new_tables_len + 1));
     ++	add->new_tables[add->new_tables_len] = slice_to_string(next_name);
     ++	add->new_tables_len++;
      +exit:
      +	if (tab_fd > 0) {
      +		close(tab_fd);
      +		tab_fd = 0;
      +	}
     -+	if (temp_tab_name.len > 0) {
     -+		unlink(slice_as_string(&temp_tab_name));
     -+	}
     -+	unlink(slice_as_string(&lock_name));
     -+
     -+	if (lock_fd > 0) {
     -+		close(lock_fd);
     -+		lock_fd = 0;
     ++	if (temp_tab_file_name.len > 0) {
     ++		unlink(slice_as_string(&temp_tab_file_name));
      +	}
      +
     -+	free(slice_yield(&lock_name));
     -+	free(slice_yield(&temp_tab_name));
     -+	free(slice_yield(&tab_name));
     -+	free(slice_yield(&next_name));
     -+	free(slice_yield(&table_list));
     -+	writer_free(wr);
     ++	reftable_free(slice_yield(&temp_tab_file_name));
     ++	reftable_free(slice_yield(&tab_file_name));
     ++	reftable_free(slice_yield(&next_name));
     ++	reftable_writer_free(wr);
      +	return err;
      +}
      +
     -+uint64_t stack_next_update_index(struct stack *st)
     ++uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
      +{
      +	int sz = st->merged->stack_len;
      +	if (sz > 0) {
     -+		return reader_max_update_index(st->merged->stack[sz - 1]) + 1;
     ++		return reftable_reader_max_update_index(
     ++			       st->merged->stack[sz - 1]) +
     ++		       1;
      +	}
      +	return 1;
      +}
      +
     -+static int stack_compact_locked(struct stack *st, int first, int last,
     ++static int stack_compact_locked(struct reftable_stack *st, int first, int last,
      +				struct slice *temp_tab,
     -+				struct log_expiry_config *config)
     ++				struct reftable_log_expiry_config *config)
      +{
      +	struct slice next_name = { 0 };
      +	int tab_fd = -1;
     -+	struct writer *wr = NULL;
     ++	struct reftable_writer *wr = NULL;
      +	int err = 0;
      +
      +	format_name(&next_name,
     -+		    reader_min_update_index(st->merged->stack[first]),
     -+		    reader_max_update_index(st->merged->stack[first]));
     ++		    reftable_reader_min_update_index(st->merged->stack[first]),
     ++		    reftable_reader_max_update_index(st->merged->stack[first]));
      +
      +	slice_set_string(temp_tab, st->reftable_dir);
      +	slice_append_string(temp_tab, "/");
     @@ -5034,17 +5407,17 @@
      +	slice_append_string(temp_tab, ".temp.XXXXXX");
      +
      +	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
     -+	wr = new_writer(fd_writer, &tab_fd, &st->config);
     ++	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
      +
      +	err = stack_write_compact(st, wr, first, last, config);
      +	if (err < 0) {
      +		goto exit;
      +	}
     -+	err = writer_close(wr);
     ++	err = reftable_writer_close(wr);
      +	if (err < 0) {
      +		goto exit;
      +	}
     -+	writer_free(wr);
     ++	reftable_writer_free(wr);
      +
      +	err = close(tab_fd);
      +	tab_fd = 0;
     @@ -5056,46 +5429,49 @@
      +	}
      +	if (err != 0 && temp_tab->len > 0) {
      +		unlink(slice_as_string(temp_tab));
     -+		free(slice_yield(temp_tab));
     ++		reftable_free(slice_yield(temp_tab));
      +	}
     -+	free(slice_yield(&next_name));
     ++	reftable_free(slice_yield(&next_name));
      +	return err;
      +}
      +
     -+int stack_write_compact(struct stack *st, struct writer *wr, int first,
     -+			int last, struct log_expiry_config *config)
     ++int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
     ++			int first, int last,
     ++			struct reftable_log_expiry_config *config)
      +{
      +	int subtabs_len = last - first + 1;
     -+	struct reader **subtabs =
     -+		calloc(sizeof(struct reader *), last - first + 1);
     -+	struct merged_table *mt = NULL;
     ++	struct reftable_reader **subtabs = reftable_calloc(
     ++		sizeof(struct reftable_reader *) * (last - first + 1));
     ++	struct reftable_merged_table *mt = NULL;
      +	int err = 0;
     -+	struct iterator it = { 0 };
     -+	struct ref_record ref = { 0 };
     -+	struct log_record log = { 0 };
     ++	struct reftable_iterator it = { 0 };
     ++	struct reftable_ref_record ref = { 0 };
     ++	struct reftable_log_record log = { 0 };
      +
      +	int i = 0, j = 0;
      +	for (i = first, j = 0; i <= last; i++) {
     -+		struct reader *t = st->merged->stack[i];
     ++		struct reftable_reader *t = st->merged->stack[i];
      +		subtabs[j++] = t;
      +		st->stats.bytes += t->size;
      +	}
     -+	writer_set_limits(wr, st->merged->stack[first]->min_update_index,
     -+			  st->merged->stack[last]->max_update_index);
     ++	reftable_writer_set_limits(wr,
     ++				   st->merged->stack[first]->min_update_index,
     ++				   st->merged->stack[last]->max_update_index);
      +
     -+	err = new_merged_table(&mt, subtabs, subtabs_len, st->config.hash_id);
     ++	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
     ++					st->config.hash_id);
      +	if (err < 0) {
     -+		free(subtabs);
     ++		reftable_free(subtabs);
      +		goto exit;
      +	}
      +
     -+	err = merged_table_seek_ref(mt, &it, "");
     ++	err = reftable_merged_table_seek_ref(mt, &it, "");
      +	if (err < 0) {
      +		goto exit;
      +	}
      +
      +	while (true) {
     -+		err = iterator_next_ref(it, &ref);
     ++		err = reftable_iterator_next_ref(it, &ref);
      +		if (err > 0) {
      +			err = 0;
      +			break;
     @@ -5103,23 +5479,23 @@
      +		if (err < 0) {
      +			break;
      +		}
     -+		if (first == 0 && ref_record_is_deletion(&ref)) {
     ++		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
      +			continue;
      +		}
      +
     -+		err = writer_add_ref(wr, &ref);
     ++		err = reftable_writer_add_ref(wr, &ref);
      +		if (err < 0) {
      +			break;
      +		}
      +	}
      +
     -+	err = merged_table_seek_log(mt, &it, "");
     ++	err = reftable_merged_table_seek_log(mt, &it, "");
      +	if (err < 0) {
      +		goto exit;
      +	}
      +
      +	while (true) {
     -+		err = iterator_next_log(it, &log);
     ++		err = reftable_iterator_next_log(it, &log);
      +		if (err > 0) {
      +			err = 0;
      +			break;
     @@ -5127,7 +5503,7 @@
      +		if (err < 0) {
      +			break;
      +		}
     -+		if (first == 0 && log_record_is_deletion(&log)) {
     ++		if (first == 0 && reftable_log_record_is_deletion(&log)) {
      +			continue;
      +		}
      +
     @@ -5143,28 +5519,28 @@
      +			continue;
      +		}
      +
     -+		err = writer_add_log(wr, &log);
     ++		err = reftable_writer_add_log(wr, &log);
      +		if (err < 0) {
      +			break;
      +		}
      +	}
      +
      +exit:
     -+	iterator_destroy(&it);
     ++	reftable_iterator_destroy(&it);
      +	if (mt != NULL) {
      +		merged_table_clear(mt);
     -+		merged_table_free(mt);
     ++		reftable_merged_table_free(mt);
      +	}
     -+	ref_record_clear(&ref);
     ++	reftable_ref_record_clear(&ref);
      +
      +	return err;
      +}
      +
      +/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
     -+static int stack_compact_range(struct stack *st, int first, int last,
     -+			       struct log_expiry_config *expiry)
     ++static int stack_compact_range(struct reftable_stack *st, int first, int last,
     ++			       struct reftable_log_expiry_config *expiry)
      +{
     -+	struct slice temp_tab_name = { 0 };
     ++	struct slice temp_tab_file_name = { 0 };
      +	struct slice new_table_name = { 0 };
      +	struct slice lock_file_name = { 0 };
      +	struct slice ref_list_contents = { 0 };
     @@ -5173,8 +5549,10 @@
      +	bool have_lock = false;
      +	int lock_file_fd = 0;
      +	int compact_count = last - first + 1;
     -+	char **delete_on_success = calloc(sizeof(char *), compact_count + 1);
     -+	char **subtable_locks = calloc(sizeof(char *), compact_count + 1);
     ++	char **delete_on_success =
     ++		reftable_calloc(sizeof(char *) * (compact_count + 1));
     ++	char **subtable_locks =
     ++		reftable_calloc(sizeof(char *) * (compact_count + 1));
      +	int i = 0;
      +	int j = 0;
      +	bool is_empty_table = false;
     @@ -5206,14 +5584,14 @@
      +	}
      +
      +	for (i = first, j = 0; i <= last; i++) {
     -+		struct slice subtab_name = { 0 };
     ++		struct slice subtab_file_name = { 0 };
      +		struct slice subtab_lock = { 0 };
     -+		slice_set_string(&subtab_name, st->reftable_dir);
     -+		slice_append_string(&subtab_name, "/");
     -+		slice_append_string(&subtab_name,
     ++		slice_set_string(&subtab_file_name, st->reftable_dir);
     ++		slice_append_string(&subtab_file_name, "/");
     ++		slice_append_string(&subtab_file_name,
      +				    reader_name(st->merged->stack[i]));
      +
     -+		slice_copy(&subtab_lock, subtab_name);
     ++		slice_copy(&subtab_lock, subtab_file_name);
      +		slice_append_string(&subtab_lock, ".lock");
      +
      +		{
     @@ -5231,7 +5609,8 @@
      +		}
      +
      +		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
     -+		delete_on_success[j] = (char *)slice_as_string(&subtab_name);
     ++		delete_on_success[j] =
     ++			(char *)slice_as_string(&subtab_file_name);
      +		j++;
      +
      +		if (err != 0) {
     @@ -5245,7 +5624,8 @@
      +	}
      +	have_lock = false;
      +
     -+	err = stack_compact_locked(st, first, last, &temp_tab_name, expiry);
     ++	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
     ++				   expiry);
      +	/* Compaction + tombstones can create an empty table out of non-empty
      +	 * tables. */
      +	is_empty_table = (err == EMPTY_TABLE_ERROR);
     @@ -5278,7 +5658,7 @@
      +	slice_append(&new_table_path, new_table_name);
      +
      +	if (!is_empty_table) {
     -+		err = rename(slice_as_string(&temp_tab_name),
     ++		err = rename(slice_as_string(&temp_tab_file_name),
      +			     slice_as_string(&new_table_path));
      +		if (err < 0) {
      +			goto exit;
     @@ -5329,7 +5709,7 @@
      +		}
      +	}
      +
     -+	err = stack_reload_maybe_reuse(st, first < last);
     ++	err = reftable_stack_reload_maybe_reuse(st, first < last);
      +exit:
      +	free_names(delete_on_success);
      +	{
     @@ -5347,21 +5727,23 @@
      +	if (have_lock) {
      +		unlink(slice_as_string(&lock_file_name));
      +	}
     -+	free(slice_yield(&new_table_name));
     -+	free(slice_yield(&new_table_path));
     -+	free(slice_yield(&ref_list_contents));
     -+	free(slice_yield(&temp_tab_name));
     -+	free(slice_yield(&lock_file_name));
     ++	reftable_free(slice_yield(&new_table_name));
     ++	reftable_free(slice_yield(&new_table_path));
     ++	reftable_free(slice_yield(&ref_list_contents));
     ++	reftable_free(slice_yield(&temp_tab_file_name));
     ++	reftable_free(slice_yield(&lock_file_name));
      +	return err;
      +}
      +
     -+int stack_compact_all(struct stack *st, struct log_expiry_config *config)
     ++int reftable_stack_compact_all(struct reftable_stack *st,
     ++			       struct reftable_log_expiry_config *config)
      +{
      +	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
      +}
      +
     -+static int stack_compact_range_stats(struct stack *st, int first, int last,
     -+				     struct log_expiry_config *config)
     ++static int stack_compact_range_stats(struct reftable_stack *st, int first,
     ++				     int last,
     ++				     struct reftable_log_expiry_config *config)
      +{
      +	int err = stack_compact_range(st, first, last, config);
      +	if (err > 0) {
     @@ -5387,7 +5769,7 @@
      +
      +struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
      +{
     -+	struct segment *segs = calloc(sizeof(struct segment), n);
     ++	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
      +	int next = 0;
      +	struct segment cur = { 0 };
      +	int i = 0;
     @@ -5406,8 +5788,9 @@
      +		cur.end = i + 1;
      +		cur.bytes += sizes[i];
      +	}
     -+
     -+	segs[next++] = cur;
     ++	if (next > 0) {
     ++		segs[next++] = cur;
     ++	}
      +	*seglen = next;
      +	return segs;
      +}
     @@ -5440,13 +5823,14 @@
      +		min_seg.bytes += sizes[prev];
      +	}
      +
     -+	free(segs);
     ++	reftable_free(segs);
      +	return min_seg;
      +}
      +
     -+static uint64_t *stack_table_sizes_for_compaction(struct stack *st)
     ++static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
      +{
     -+	uint64_t *sizes = calloc(sizeof(uint64_t), st->merged->stack_len);
     ++	uint64_t *sizes =
     ++		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
      +	int i = 0;
      +	for (i = 0; i < st->merged->stack_len; i++) {
      +		/* overhead is 24 + 68 = 92. */
     @@ -5455,12 +5839,12 @@
      +	return sizes;
      +}
      +
     -+int stack_auto_compact(struct stack *st)
     ++int reftable_stack_auto_compact(struct reftable_stack *st)
      +{
      +	uint64_t *sizes = stack_table_sizes_for_compaction(st);
      +	struct segment seg =
      +		suggest_compaction_segment(sizes, st->merged->stack_len);
     -+	free(sizes);
     ++	reftable_free(sizes);
      +	if (segment_size(&seg) > 0) {
      +		return stack_compact_range_stats(st, seg.start, seg.end - 1,
      +						 NULL);
     @@ -5469,58 +5853,61 @@
      +	return 0;
      +}
      +
     -+struct compaction_stats *stack_compaction_stats(struct stack *st)
     ++struct reftable_compaction_stats *
     ++reftable_stack_compaction_stats(struct reftable_stack *st)
      +{
      +	return &st->stats;
      +}
      +
     -+int stack_read_ref(struct stack *st, const char *refname,
     -+		   struct ref_record *ref)
     ++int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
     ++			    struct reftable_ref_record *ref)
      +{
     -+	struct iterator it = { 0 };
     -+	struct merged_table *mt = stack_merged_table(st);
     -+	int err = merged_table_seek_ref(mt, &it, refname);
     ++	struct reftable_iterator it = { 0 };
     ++	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
     ++	int err = reftable_merged_table_seek_ref(mt, &it, refname);
      +	if (err) {
      +		goto exit;
      +	}
      +
     -+	err = iterator_next_ref(it, ref);
     ++	err = reftable_iterator_next_ref(it, ref);
      +	if (err) {
      +		goto exit;
      +	}
      +
     -+	if (strcmp(ref->ref_name, refname) || ref_record_is_deletion(ref)) {
     ++	if (strcmp(ref->ref_name, refname) ||
     ++	    reftable_ref_record_is_deletion(ref)) {
      +		err = 1;
      +		goto exit;
      +	}
      +
      +exit:
     -+	iterator_destroy(&it);
     ++	reftable_iterator_destroy(&it);
      +	return err;
      +}
      +
     -+int stack_read_log(struct stack *st, const char *refname,
     -+		   struct log_record *log)
     ++int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
     ++			    struct reftable_log_record *log)
      +{
     -+	struct iterator it = { 0 };
     -+	struct merged_table *mt = stack_merged_table(st);
     -+	int err = merged_table_seek_log(mt, &it, refname);
     ++	struct reftable_iterator it = { 0 };
     ++	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
     ++	int err = reftable_merged_table_seek_log(mt, &it, refname);
      +	if (err) {
      +		goto exit;
      +	}
      +
     -+	err = iterator_next_log(it, log);
     ++	err = reftable_iterator_next_log(it, log);
      +	if (err) {
      +		goto exit;
      +	}
      +
     -+	if (strcmp(log->ref_name, refname) || log_record_is_deletion(log)) {
     ++	if (strcmp(log->ref_name, refname) ||
     ++	    reftable_log_record_is_deletion(log)) {
      +		err = 1;
      +		goto exit;
      +	}
      +
      +exit:
     -+	iterator_destroy(&it);
     ++	reftable_iterator_destroy(&it);
      +	return err;
      +}
      
     @@ -5542,21 +5929,23 @@
      +
      +#include "reftable.h"
      +
     -+struct stack {
     ++struct reftable_stack {
      +	char *list_file;
      +	char *reftable_dir;
      +
     -+	struct write_options config;
     ++	struct reftable_write_options config;
      +
     -+	struct merged_table *merged;
     -+	struct compaction_stats stats;
     ++	struct reftable_merged_table *merged;
     ++	struct reftable_compaction_stats stats;
      +};
      +
      +int read_lines(const char *filename, char ***lines);
     -+int stack_try_add(struct stack *st,
     -+		  int (*write_table)(struct writer *wr, void *arg), void *arg);
     -+int stack_write_compact(struct stack *st, struct writer *wr, int first,
     -+			int last, struct log_expiry_config *config);
     ++int stack_try_add(struct reftable_stack *st,
     ++		  int (*write_table)(struct reftable_writer *wr, void *arg),
     ++		  void *arg);
     ++int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
     ++			int first, int last,
     ++			struct reftable_log_expiry_config *config);
      +int fastlog2(uint64_t sz);
      +
      +struct segment {
     @@ -5644,6 +6033,7 @@
      +
      +#include "tree.h"
      +
     ++#include "basics.h"
      +#include "system.h"
      +
      +struct tree_node *tree_search(void *key, struct tree_node **rootp,
     @@ -5655,7 +6045,7 @@
      +			return NULL;
      +		} else {
      +			struct tree_node *n =
     -+				calloc(sizeof(struct tree_node), 1);
     ++				reftable_calloc(sizeof(struct tree_node));
      +			n->key = key;
      +			*rootp = n;
      +			return *rootp;
     @@ -5698,7 +6088,7 @@
      +	if (t->right != NULL) {
      +		tree_free(t->right);
      +	}
     -+	free(t);
     ++	reftable_free(t);
      +}
      
       diff --git a/reftable/tree.h b/reftable/tree.h
     @@ -5743,6 +6133,7 @@
      +((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
      +git clone https://github.com/google/reftable reftable-repo) && \
      +cp reftable-repo/c/*.[ch] reftable/ && \
     ++cp reftable-repo/c/include/*.[ch] reftable/ && \
      +cp reftable-repo/LICENSE reftable/ &&
      +git --git-dir reftable-repo/.git show --no-patch origin/master \
      +> reftable/VERSION && \
     @@ -5773,7 +6164,8 @@
      +#include "reftable.h"
      +#include "tree.h"
      +
     -+static struct block_stats *writer_block_stats(struct writer *w, byte typ)
     ++static struct reftable_block_stats *
     ++writer_reftable_block_stats(struct reftable_writer *w, byte typ)
      +{
      +	switch (typ) {
      +	case 'r':
     @@ -5791,18 +6183,19 @@
      +
      +/* write data, queuing the padding for the next write. Returns negative for
      + * error. */
     -+static int padded_write(struct writer *w, byte *data, size_t len, int padding)
     ++static int padded_write(struct reftable_writer *w, byte *data, size_t len,
     ++			int padding)
      +{
      +	int n = 0;
      +	if (w->pending_padding > 0) {
     -+		byte *zeroed = calloc(w->pending_padding, 1);
     ++		byte *zeroed = reftable_calloc(w->pending_padding);
      +		int n = w->write(w->write_arg, zeroed, w->pending_padding);
      +		if (n < 0) {
      +			return n;
      +		}
      +
      +		w->pending_padding = 0;
     -+		free(zeroed);
     ++		reftable_free(zeroed);
      +	}
      +
      +	w->pending_padding = padding;
     @@ -5814,7 +6207,7 @@
      +	return 0;
      +}
      +
     -+static void options_set_defaults(struct write_options *opts)
     ++static void options_set_defaults(struct reftable_write_options *opts)
      +{
      +	if (opts->restart_interval == 0) {
      +		opts->restart_interval = 16;
     @@ -5828,25 +6221,31 @@
      +	}
      +}
      +
     -+static int writer_write_header(struct writer *w, byte *dest)
     ++static int writer_version(struct reftable_writer *w)
     ++{
     ++	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
     ++}
     ++
     ++static int writer_write_header(struct reftable_writer *w, byte *dest)
      +{
      +	memcpy((char *)dest, "REFT", 4);
      +
     -+	/* DO NOT SUBMIT.  This has not been encoded in the standard yet. */
     -+	dest[4] = (hash_size(w->opts.hash_id) == SHA1_SIZE) ? 1 : 2; /* version
     -+								      */
     ++	dest[4] = writer_version(w);
      +
      +	put_be24(dest + 5, w->opts.block_size);
      +	put_be64(dest + 8, w->min_update_index);
      +	put_be64(dest + 16, w->max_update_index);
     -+	return 24;
     ++	if (writer_version(w) == 2) {
     ++		put_be32(dest + 24, w->opts.hash_id);
     ++	}
     ++	return header_size(writer_version(w));
      +}
      +
     -+static void writer_reinit_block_writer(struct writer *w, byte typ)
     ++static void writer_reinit_block_writer(struct reftable_writer *w, byte typ)
      +{
      +	int block_start = 0;
      +	if (w->next == 0) {
     -+		block_start = HEADER_SIZE;
     ++		block_start = header_size(writer_version(w));
      +	}
      +
      +	block_writer_init(&w->block_writer_data, typ, w->block,
     @@ -5856,16 +6255,18 @@
      +	w->block_writer->restart_interval = w->opts.restart_interval;
      +}
      +
     -+struct writer *new_writer(int (*writer_func)(void *, byte *, int),
     -+			  void *writer_arg, struct write_options *opts)
     ++struct reftable_writer *
     ++reftable_new_writer(int (*writer_func)(void *, byte *, int), void *writer_arg,
     ++		    struct reftable_write_options *opts)
      +{
     -+	struct writer *wp = calloc(sizeof(struct writer), 1);
     ++	struct reftable_writer *wp =
     ++		reftable_calloc(sizeof(struct reftable_writer));
      +	options_set_defaults(opts);
      +	if (opts->block_size >= (1 << 24)) {
      +		/* TODO - error return? */
      +		abort();
      +	}
     -+	wp->block = calloc(opts->block_size, 1);
     ++	wp->block = reftable_calloc(opts->block_size);
      +	wp->write = writer_func;
      +	wp->write_arg = writer_arg;
      +	wp->opts = *opts;
     @@ -5874,16 +6275,17 @@
      +	return wp;
      +}
      +
     -+void writer_set_limits(struct writer *w, uint64_t min, uint64_t max)
     ++void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
     ++				uint64_t max)
      +{
      +	w->min_update_index = min;
      +	w->max_update_index = max;
      +}
      +
     -+void writer_free(struct writer *w)
     ++void reftable_writer_free(struct reftable_writer *w)
      +{
     -+	free(w->block);
     -+	free(w);
     ++	reftable_free(w->block);
     ++	reftable_free(w);
      +}
      +
      +struct obj_index_tree_node {
     @@ -5899,7 +6301,7 @@
      +			     ((const struct obj_index_tree_node *)b)->hash);
      +}
      +
     -+static void writer_index_hash(struct writer *w, struct slice hash)
     ++static void writer_index_hash(struct reftable_writer *w, struct slice hash)
      +{
      +	uint64_t off = w->next;
      +
     @@ -5909,7 +6311,7 @@
      +					     &obj_index_tree_node_compare, 0);
      +	struct obj_index_tree_node *key = NULL;
      +	if (node == NULL) {
     -+		key = calloc(sizeof(struct obj_index_tree_node), 1);
     ++		key = reftable_calloc(sizeof(struct obj_index_tree_node));
      +		slice_copy(&key->hash, hash);
      +		tree_search((void *)key, &w->obj_index_tree,
      +			    &obj_index_tree_node_compare, 1);
     @@ -5923,14 +6325,14 @@
      +
      +	if (key->offset_len == key->offset_cap) {
      +		key->offset_cap = 2 * key->offset_cap + 1;
     -+		key->offsets = realloc(key->offsets,
     -+				       sizeof(uint64_t) * key->offset_cap);
     ++		key->offsets = reftable_realloc(
     ++			key->offsets, sizeof(uint64_t) * key->offset_cap);
      +	}
      +
      +	key->offsets[key->offset_len++] = off;
      +}
      +
     -+static int writer_add_record(struct writer *w, struct record rec)
     ++static int writer_add_record(struct reftable_writer *w, struct record rec)
      +{
      +	int result = -1;
      +	struct slice key = { 0 };
     @@ -5967,14 +6369,15 @@
      +
      +	result = 0;
      +exit:
     -+	free(slice_yield(&key));
     ++	reftable_free(slice_yield(&key));
      +	return result;
      +}
      +
     -+int writer_add_ref(struct writer *w, struct ref_record *ref)
     ++int reftable_writer_add_ref(struct reftable_writer *w,
     ++			    struct reftable_ref_record *ref)
      +{
      +	struct record rec = { 0 };
     -+	struct ref_record copy = *ref;
     ++	struct reftable_ref_record copy = *ref;
      +	int err = 0;
      +
      +	if (ref->ref_name == NULL) {
     @@ -6010,18 +6413,20 @@
      +	return 0;
      +}
      +
     -+int writer_add_refs(struct writer *w, struct ref_record *refs, int n)
     ++int reftable_writer_add_refs(struct reftable_writer *w,
     ++			     struct reftable_ref_record *refs, int n)
      +{
      +	int err = 0;
      +	int i = 0;
     -+	QSORT(refs, n, ref_record_compare_name);
     ++	QSORT(refs, n, reftable_ref_record_compare_name);
      +	for (i = 0; err == 0 && i < n; i++) {
     -+		err = writer_add_ref(w, &refs[i]);
     ++		err = reftable_writer_add_ref(w, &refs[i]);
      +	}
      +	return err;
      +}
      +
     -+int writer_add_log(struct writer *w, struct log_record *log)
     ++int reftable_writer_add_log(struct reftable_writer *w,
     ++			    struct reftable_log_record *log)
      +{
      +	if (log->ref_name == NULL) {
      +		return API_ERROR;
     @@ -6047,18 +6452,19 @@
      +	}
      +}
      +
     -+int writer_add_logs(struct writer *w, struct log_record *logs, int n)
     ++int reftable_writer_add_logs(struct reftable_writer *w,
     ++			     struct reftable_log_record *logs, int n)
      +{
      +	int err = 0;
      +	int i = 0;
     -+	QSORT(logs, n, log_record_compare_key);
     ++	QSORT(logs, n, reftable_log_record_compare_key);
      +	for (i = 0; err == 0 && i < n; i++) {
     -+		err = writer_add_log(w, &logs[i]);
     ++		err = reftable_writer_add_log(w, &logs[i]);
      +	}
      +	return err;
      +}
      +
     -+static int writer_finish_section(struct writer *w)
     ++static int writer_finish_section(struct reftable_writer *w)
      +{
      +	byte typ = block_writer_type(w->block_writer);
      +	uint64_t index_start = 0;
     @@ -6105,9 +6511,9 @@
      +			assert(err == 0);
      +		}
      +		for (i = 0; i < idx_len; i++) {
     -+			free(slice_yield(&idx[i].last_key));
     ++			reftable_free(slice_yield(&idx[i].last_key));
      +		}
     -+		free(idx);
     ++		reftable_free(idx);
      +	}
      +
      +	writer_clear_index(w);
     @@ -6118,7 +6524,8 @@
      +	}
      +
      +	{
     -+		struct block_stats *bstats = writer_block_stats(w, typ);
     ++		struct reftable_block_stats *bstats =
     ++			writer_reftable_block_stats(w, typ);
      +		bstats->index_blocks =
      +			w->stats.idx_stats.blocks - before_blocks;
      +		bstats->index_offset = index_start;
     @@ -6150,7 +6557,7 @@
      +}
      +
      +struct write_record_arg {
     -+	struct writer *w;
     ++	struct reftable_writer *w;
      +	int err;
      +};
      +
     @@ -6199,11 +6606,11 @@
      +	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
      +
      +	FREE_AND_NULL(entry->offsets);
     -+	free(slice_yield(&entry->hash));
     -+	free(entry);
     ++	reftable_free(slice_yield(&entry->hash));
     ++	reftable_free(entry);
      +}
      +
     -+static int writer_dump_object_index(struct writer *w)
     ++static int writer_dump_object_index(struct reftable_writer *w)
      +{
      +	struct write_record_arg closure = { .w = w };
      +	struct common_prefix_arg common = { 0 };
     @@ -6224,7 +6631,7 @@
      +	return writer_finish_section(w);
      +}
      +
     -+int writer_finish_public_section(struct writer *w)
     ++int writer_finish_public_section(struct reftable_writer *w)
      +{
      +	byte typ = 0;
      +	int err = 0;
     @@ -6256,9 +6663,9 @@
      +	return 0;
      +}
      +
     -+int writer_close(struct writer *w)
     ++int reftable_writer_close(struct reftable_writer *w)
      +{
     -+	byte footer[68];
     ++	byte footer[72];
      +	byte *p = footer;
      +
      +	int err = writer_finish_public_section(w);
     @@ -6266,8 +6673,7 @@
      +		goto exit;
      +	}
      +
     -+	writer_write_header(w, footer);
     -+	p += 24;
     ++	p += writer_write_header(w, footer);
      +	put_be64(p, w->stats.ref_stats.index_offset);
      +	p += 8;
      +	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
     @@ -6284,7 +6690,7 @@
      +	p += 4;
      +	w->pending_padding = 0;
      +
     -+	err = padded_write(w, footer, sizeof(footer), 0);
     ++	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
      +	if (err < 0) {
      +		goto exit;
      +	}
     @@ -6298,15 +6704,15 @@
      +	/* free up memory. */
      +	block_writer_clear(&w->block_writer_data);
      +	writer_clear_index(w);
     -+	free(slice_yield(&w->last_key));
     ++	reftable_free(slice_yield(&w->last_key));
      +	return err;
      +}
      +
     -+void writer_clear_index(struct writer *w)
     ++void writer_clear_index(struct reftable_writer *w)
      +{
      +	int i = 0;
      +	for (i = 0; i < w->index_len; i++) {
     -+		free(slice_yield(&w->index[i].last_key));
     ++		reftable_free(slice_yield(&w->index[i].last_key));
      +	}
      +
      +	FREE_AND_NULL(w->index);
     @@ -6316,10 +6722,11 @@
      +
      +const int debug = 0;
      +
     -+static int writer_flush_nonempty_block(struct writer *w)
     ++static int writer_flush_nonempty_block(struct reftable_writer *w)
      +{
      +	byte typ = block_writer_type(w->block_writer);
     -+	struct block_stats *bstats = writer_block_stats(w, typ);
     ++	struct reftable_block_stats *bstats =
     ++		writer_reftable_block_stats(w, typ);
      +	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
      +	int raw_bytes = block_writer_finish(w->block_writer);
      +	int padding = 0;
     @@ -6358,8 +6765,8 @@
      +
      +	if (w->index_cap == w->index_len) {
      +		w->index_cap = 2 * w->index_cap + 1;
     -+		w->index = realloc(w->index,
     -+				   sizeof(struct index_record) * w->index_cap);
     ++		w->index = reftable_realloc(
     ++			w->index, sizeof(struct index_record) * w->index_cap);
      +	}
      +
      +	{
     @@ -6377,7 +6784,7 @@
      +	return 0;
      +}
      +
     -+int writer_flush_block(struct writer *w)
     ++int writer_flush_block(struct reftable_writer *w)
      +{
      +	if (w->block_writer == NULL) {
      +		return 0;
     @@ -6388,7 +6795,7 @@
      +	return writer_flush_nonempty_block(w);
      +}
      +
     -+struct stats *writer_stats(struct writer *w)
     ++const struct reftable_stats *writer_stats(struct reftable_writer *w)
      +{
      +	return &w->stats;
      +}
     @@ -6415,7 +6822,7 @@
      +#include "slice.h"
      +#include "tree.h"
      +
     -+struct writer {
     ++struct reftable_writer {
      +	int (*write)(void *, byte *, int);
      +	void *write_arg;
      +	int pending_padding;
     @@ -6423,11 +6830,11 @@
      +
      +	uint64_t next;
      +	uint64_t min_update_index, max_update_index;
     -+	struct write_options opts;
     ++	struct reftable_write_options opts;
      +
      +	byte *block;
     -+	struct block_writer *block_writer;
     -+	struct block_writer block_writer_data;
     ++	struct reftable_block_writer *block_writer;
     ++	struct reftable_block_writer block_writer_data;
      +	struct index_record *index;
      +	int index_len;
      +	int index_cap;
     @@ -6435,12 +6842,12 @@
      +	/* tree for use with tsearch */
      +	struct tree_node *obj_index_tree;
      +
     -+	struct stats stats;
     ++	struct reftable_stats stats;
      +};
      +
     -+int writer_flush_block(struct writer *w);
     -+void writer_clear_index(struct writer *w);
     -+int writer_finish_public_section(struct writer *w);
     ++int writer_flush_block(struct reftable_writer *w);
     ++void writer_clear_index(struct reftable_writer *w);
     ++int writer_finish_public_section(struct reftable_writer *w);
      +
      +#endif
      
  6:  a622d8066c7 !  9:  b29c4ecc1c4 Reftable support for git-core
     @@ -6,6 +6,7 @@
      
          TODO:
      
     +     * "git show-ref" shows "HEAD"
           * Resolve spots marked with XXX
           * Test strategy?
      
     @@ -104,8 +105,9 @@ $^
       		}
       	}
       
     --	init_db(git_dir, real_git_dir, option_template, INIT_DB_QUIET);
     +-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
      +	init_db(git_dir, real_git_dir, option_template,
     ++                GIT_HASH_UNKNOWN,
      +		DEFAULT_REF_STORAGE, /* XXX */
      +		INIT_DB_QUIET);
       
     @@ -116,20 +118,28 @@ $^
       --- a/builtin/init-db.c
       +++ b/builtin/init-db.c
      @@
     + 	return 1;
       }
       
     - static int create_default_files(const char *template_path,
     --				const char *original_git_dir)
     -+				const char *original_git_dir,
     -+				const char *ref_storage_format, int flags)
     +-void initialize_repository_version(int hash_algo)
     ++void initialize_repository_version(int hash_algo, const char *ref_storage_format)
       {
     - 	struct stat st1;
     - 	struct strbuf buf = STRBUF_INIT;
     + 	char repo_version_string[10];
     + 	int repo_version = GIT_REPO_VERSION;
     +@@
     + 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
     + #endif
     + 
     +-	if (hash_algo != GIT_HASH_SHA1)
     ++	if (hash_algo != GIT_HASH_SHA1 || !strcmp(ref_storage_format, "reftable"))
     + 		repo_version = GIT_REPO_VERSION_READ;
     + 
     + 	/* This forces creation of new config file */
      @@
       	is_bare_repository_cfg = init_is_bare_repository;
       	if (init_shared_repository != -1)
       		set_shared_repository(init_shared_repository);
     -+	the_repository->ref_storage_format = xstrdup(ref_storage_format);
     ++	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
       
       	/*
       	 * We would have created the above under user's umask -- under
     @@ -166,35 +176,28 @@ $^
      +		 */
       	}
       
     - 	/* This forces creation of new config file */
     --	xsnprintf(repo_version_string, sizeof(repo_version_string),
     --		  "%d", GIT_REPO_VERSION);
     -+	xsnprintf(repo_version_string, sizeof(repo_version_string), "%d",
     -+		  !strcmp(ref_storage_format, "reftable") ?
     -+			  GIT_REPO_VERSION_READ :
     -+			  GIT_REPO_VERSION);
     - 	git_config_set("core.repositoryformatversion", repo_version_string);
     +-	initialize_repository_version(fmt->hash_algo);
     ++	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
       
       	/* Check filemode trustability */
     + 	path = git_path_buf(&buf, "config");
      @@
       }
       
       int init_db(const char *git_dir, const char *real_git_dir,
     --	    const char *template_dir, unsigned int flags)
     -+	    const char *template_dir, const char *ref_storage_format,
     -+	    unsigned int flags)
     +-	    const char *template_dir, int hash, unsigned int flags)
     ++	    const char *template_dir, int hash, const char *ref_storage_format,
     ++            unsigned int flags)
       {
       	int reinit;
       	int exist_ok = flags & INIT_DB_EXIST_OK;
      @@
     + 	 * is an attempt to reinitialize new repository with an old tool.
       	 */
     - 	check_repository_format();
     - 
     --	reinit = create_default_files(template_dir, original_git_dir);
     -+	reinit = create_default_files(template_dir, original_git_dir,
     -+				      ref_storage_format, flags);
     + 	check_repository_format(&repo_fmt);
     ++        repo_fmt.ref_storage = xstrdup(ref_storage_format);
       
     - 	create_object_directory();
     + 	validate_hash_algorithm(&repo_fmt, hash);
       
      @@
       		git_config_set("receive.denyNonFastforwards", "true");
     @@ -213,7 +216,9 @@ $^
       	const char *real_git_dir = NULL;
       	const char *work_tree;
       	const char *template_dir = NULL;
     - 	unsigned int flags = 0;
     +@@
     + 	const char *object_format = NULL;
     + 	int hash_algo = GIT_HASH_UNKNOWN;
       	const struct option init_db_options[] = {
      -		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
      -				N_("directory from which templates will be used")),
     @@ -235,7 +240,7 @@ $^
      +			   N_("the ref storage format to use")),
       		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
       			   N_("separate git dir from working tree")),
     - 		OPT_END()
     + 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
      @@
       	}
       
     @@ -245,21 +250,21 @@ $^
       	UNLEAK(work_tree);
       
       	flags |= INIT_DB_EXIST_OK;
     --	return init_db(git_dir, real_git_dir, template_dir, flags);
     -+	return init_db(git_dir, real_git_dir, template_dir, ref_storage_format,
     -+		       flags);
     +-	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
     ++	return init_db(git_dir, real_git_dir, template_dir, hash_algo, ref_storage_format, flags);
       }
      
       diff --git a/cache.h b/cache.h
       --- a/cache.h
       +++ b/cache.h
      @@
     - #define INIT_DB_EXIST_OK 0x0002
       
       int init_db(const char *git_dir, const char *real_git_dir,
     --	    const char *template_dir, unsigned int flags);
     -+	    const char *template_dir, const char *ref_storage_format,
     -+	    unsigned int flags);
     + 	    const char *template_dir, int hash_algo,
     ++            const char *ref_storage_format,
     + 	    unsigned int flags);
     +-void initialize_repository_version(int hash_algo);
     ++void initialize_repository_version(int hash_algo, const char *ref_storage_format);
       
       void sanitize_stdfds(void);
       int daemonize(void);
     @@ -350,7 +355,6 @@ $^
       struct string_list_item;
       struct worktree;
       
     -+/* XXX where should this be? */
      +#define DEFAULT_REF_STORAGE "files"
      +
       /*
     @@ -386,26 +390,25 @@ $^
      +
      +extern struct ref_storage_be refs_be_reftable;
      +
     -+struct reftable_ref_store {
     ++struct git_reftable_ref_store {
      +	struct ref_store base;
      +	unsigned int store_flags;
      +
      +	int err;
      +	char *reftable_dir;
     -+	char *table_list_file;
     -+	struct stack *stack;
     ++	struct reftable_stack *stack;
      +};
      +
     -+static void clear_log_record(struct log_record *log)
     ++static void clear_reftable_log_record(struct reftable_log_record *log)
      +{
      +	log->old_hash = NULL;
      +	log->new_hash = NULL;
      +	log->message = NULL;
      +	log->ref_name = NULL;
     -+	log_record_clear(log);
     ++	reftable_log_record_clear(log);
      +}
      +
     -+static void fill_log_record(struct log_record *log)
     ++static void fill_reftable_log_record(struct reftable_log_record *log)
      +{
      +	const char *info = git_committer_info(0);
      +	struct ident_split split = {};
     @@ -413,7 +416,7 @@ $^
      +	int sign = 1;
      +	assert(0 == result);
      +
     -+	log_record_clear(log);
     ++	reftable_log_record_clear(log);
      +	log->name =
      +		xstrndup(split.name_begin, split.name_end - split.name_begin);
      +	log->email =
     @@ -431,12 +434,12 @@ $^
      +	log->tz_offset = sign * atoi(split.tz_begin);
      +}
      +
     -+static struct ref_store *reftable_ref_store_create(const char *path,
     ++static struct ref_store *git_reftable_ref_store_create(const char *path,
      +						   unsigned int store_flags)
      +{
     -+	struct reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
     ++	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
      +	struct ref_store *ref_store = (struct ref_store *)refs;
     -+	struct write_options cfg = {
     ++	struct reftable_write_options cfg = {
      +		.block_size = 4096,
      +		.hash_id = the_hash_algo->format_id,
      +	};
     @@ -449,10 +452,6 @@ $^
      +	refs->reftable_dir = xstrdup(sb.buf);
      +	strbuf_reset(&sb);
      +
     -+	strbuf_addf(&sb, "%s/reftable/tables.list", path);
     -+	refs->table_list_file = xstrdup(sb.buf);
     -+	strbuf_reset(&sb);
     -+
      +	strbuf_addf(&sb, "%s/refs", path);
      +	safe_create_dir(sb.buf, 1);
      +	strbuf_reset(&sb);
     @@ -464,31 +463,23 @@ $^
      +	strbuf_addf(&sb, "%s/refs/heads", path);
      +	write_file(sb.buf, "this repository uses the reftable format");
      +
     -+	refs->err = new_stack(&refs->stack, refs->reftable_dir,
     -+			      refs->table_list_file, cfg);
     ++	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
      +	strbuf_release(&sb);
      +	return ref_store;
      +}
      +
      +static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
      +{
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     -+	FILE *file;
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
      +	safe_create_dir(refs->reftable_dir, 1);
     -+
     -+	file = fopen(refs->table_list_file, "a");
     -+	if (file == NULL) {
     -+		return -1;
     -+	}
     -+	fclose(file);
      +	return 0;
      +}
      +
     -+struct reftable_iterator {
     ++struct git_reftable_iterator {
      +	struct ref_iterator base;
     -+	struct iterator iter;
     -+	struct ref_record ref;
     ++	struct reftable_iterator iter;
     ++	struct reftable_ref_record ref;
      +	struct object_id oid;
      +	struct ref_store *ref_store;
      +	unsigned int flags;
     @@ -498,9 +489,9 @@ $^
      +
      +static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
      +{
     -+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
     ++	struct git_reftable_iterator *ri = (struct git_reftable_iterator *)ref_iterator;
      +	while (ri->err == 0) {
     -+		ri->err = iterator_next_ref(ri->iter, &ri->ref);
     ++		ri->err = reftable_iterator_next_ref(ri->iter, &ri->ref);
      +		if (ri->err) {
      +			break;
      +		}
     @@ -554,7 +545,7 @@ $^
      +static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
      +				      struct object_id *peeled)
      +{
     -+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
     ++	struct git_reftable_iterator *ri = (struct git_reftable_iterator *)ref_iterator;
      +	if (ri->ref.target_value != NULL) {
      +		hashcpy(peeled->hash, ri->ref.target_value);
      +		return 0;
     @@ -565,9 +556,9 @@ $^
      +
      +static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
      +{
     -+	struct reftable_iterator *ri = (struct reftable_iterator *)ref_iterator;
     -+	ref_record_clear(&ri->ref);
     -+	iterator_destroy(&ri->iter);
     ++	struct git_reftable_iterator *ri = (struct git_reftable_iterator *)ref_iterator;
     ++	reftable_ref_record_clear(&ri->ref);
     ++	reftable_iterator_destroy(&ri->iter);
      +	return 0;
      +}
      +
     @@ -580,16 +571,16 @@ $^
      +reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
      +			    unsigned int flags)
      +{
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     -+	struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
     -+	struct merged_table *mt = NULL;
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
     ++	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
     ++	struct reftable_merged_table *mt = NULL;
      +
      +	if (refs->err < 0) {
      +		ri->err = refs->err;
      +	} else {
     -+		mt = stack_merged_table(refs->stack);
     -+		ri->err = merged_table_seek_ref(mt, &ri->iter, prefix);
     ++		mt = reftable_stack_merged_table(refs->stack);
     ++		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
      +	}
      +
      +	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
     @@ -610,8 +601,8 @@ $^
      +				      struct ref_transaction *transaction,
      +				      struct strbuf *err)
      +{
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
      +	(void)refs;
      +	return 0;
      +}
     @@ -640,19 +631,19 @@ $^
      +		      ((struct ref_update *)b)->refname);
      +}
      +
     -+static int write_transaction_table(struct writer *writer, void *arg)
     ++static int write_transaction_table(struct reftable_writer *writer, void *arg)
      +{
      +	struct ref_transaction *transaction = (struct ref_transaction *)arg;
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)transaction->ref_store;
     -+	uint64_t ts = stack_next_update_index(refs->stack);
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)transaction->ref_store;
     ++	uint64_t ts = reftable_stack_next_update_index(refs->stack);
      +	int err = 0;
     -+	struct log_record *logs = calloc(transaction->nr, sizeof(*logs));
     ++	struct reftable_log_record *logs = calloc(transaction->nr, sizeof(*logs));
      +	struct ref_update **sorted =
      +		malloc(transaction->nr * sizeof(struct ref_update *));
      +	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
      +	QSORT(sorted, transaction->nr, ref_update_cmp);
     -+	writer_set_limits(writer, ts, ts);
     ++	reftable_writer_set_limits(writer, ts, ts);
      +
      +	for (int i = 0; i < transaction->nr; i++) {
      +		struct ref_update *u = sorted[i];
     @@ -667,8 +658,8 @@ $^
      +
      +	for (int i = 0; i < transaction->nr; i++) {
      +		struct ref_update *u = sorted[i];
     -+		struct log_record *log = &logs[i];
     -+		fill_log_record(log);
     ++		struct reftable_log_record *log = &logs[i];
     ++		fill_reftable_log_record(log);
      +		log->ref_name = (char *)u->refname;
      +		log->old_hash = u->old_oid.hash;
      +		log->new_hash = u->new_oid.hash;
     @@ -683,13 +674,13 @@ $^
      +			const char *resolved = refs_resolve_ref_unsafe(
      +				transaction->ref_store, u->refname, 0, &out_oid,
      +				&out_flags);
     -+			struct ref_record ref = {};
     ++			struct reftable_ref_record ref = {};
      +			ref.ref_name =
      +				(char *)(resolved ? resolved : u->refname);
      +			log->ref_name = ref.ref_name;
      +			ref.value = u->new_oid.hash;
      +			ref.update_index = ts;
     -+			err = writer_add_ref(writer, &ref);
     ++			err = reftable_writer_add_ref(writer, &ref);
      +			if (err < 0) {
      +				goto exit;
      +			}
     @@ -697,8 +688,8 @@ $^
      +	}
      +
      +	for (int i = 0; i < transaction->nr; i++) {
     -+		err = writer_add_log(writer, &logs[i]);
     -+		clear_log_record(&logs[i]);
     ++		err = reftable_writer_add_log(writer, &logs[i]);
     ++		clear_reftable_log_record(&logs[i]);
      +		if (err < 0) {
      +			goto exit;
      +		}
     @@ -714,17 +705,17 @@ $^
      +				       struct ref_transaction *transaction,
      +				       struct strbuf *errmsg)
      +{
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
      +	int err = 0;
      +	if (refs->err < 0) {
      +		return refs->err;
      +	}
      +
     -+	err = stack_add(refs->stack, &write_transaction_table, transaction);
     ++	err = reftable_stack_add(refs->stack, &write_transaction_table, transaction);
      +	if (err < 0) {
      +		strbuf_addf(errmsg, "reftable: transaction failure %s",
     -+			    error_str(err));
     ++			    reftable_error_str(err));
      +		return -1;
      +	}
      +
     @@ -739,49 +730,49 @@ $^
      +}
      +
      +struct write_delete_refs_arg {
     -+	struct stack *stack;
     ++	struct reftable_stack *stack;
      +	struct string_list *refnames;
      +	const char *logmsg;
      +	unsigned int flags;
      +};
      +
     -+static int write_delete_refs_table(struct writer *writer, void *argv)
     ++static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
      +{
      +	struct write_delete_refs_arg *arg =
      +		(struct write_delete_refs_arg *)argv;
     -+	uint64_t ts = stack_next_update_index(arg->stack);
     ++	uint64_t ts = reftable_stack_next_update_index(arg->stack);
      +	int err = 0;
      +
     -+	writer_set_limits(writer, ts, ts);
     ++	reftable_writer_set_limits(writer, ts, ts);
      +	for (int i = 0; i < arg->refnames->nr; i++) {
     -+		struct ref_record ref = {
     ++		struct reftable_ref_record ref = {
      +			.ref_name = (char *)arg->refnames->items[i].string,
      +			.update_index = ts,
      +		};
     -+		err = writer_add_ref(writer, &ref);
     ++		err = reftable_writer_add_ref(writer, &ref);
      +		if (err < 0) {
      +			return err;
      +		}
      +	}
      +
      +	for (int i = 0; i < arg->refnames->nr; i++) {
     -+		struct log_record log = {};
     -+		struct ref_record current = {};
     -+		fill_log_record(&log);
     ++		struct reftable_log_record log = {};
     ++		struct reftable_ref_record current = {};
     ++		fill_reftable_log_record(&log);
      +		log.message = xstrdup(arg->logmsg);
      +		log.new_hash = NULL;
      +		log.old_hash = NULL;
      +		log.update_index = ts;
      +		log.ref_name = (char *)arg->refnames->items[i].string;
      +
     -+		if (stack_read_ref(arg->stack, log.ref_name, &current) == 0) {
     ++		if (reftable_stack_read_ref(arg->stack, log.ref_name, &current) == 0) {
      +			log.old_hash = current.value;
      +		}
     -+		err = writer_add_log(writer, &log);
     ++		err = reftable_writer_add_log(writer, &log);
      +		log.old_hash = NULL;
     -+		ref_record_clear(&current);
     ++		reftable_ref_record_clear(&current);
      +
     -+		clear_log_record(&log);
     ++		clear_reftable_log_record(&log);
      +		if (err < 0) {
      +			return err;
      +		}
     @@ -793,8 +784,8 @@ $^
      +				struct string_list *refnames,
      +				unsigned int flags)
      +{
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
      +	struct write_delete_refs_arg arg = {
      +		.stack = refs->stack,
      +		.refnames = refnames,
     @@ -805,52 +796,52 @@ $^
      +		return refs->err;
      +	}
      +
     -+	return stack_add(refs->stack, &write_delete_refs_table, &arg);
     ++	return reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
      +}
      +
      +static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
      +{
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
      +	if (refs->err < 0) {
      +		return refs->err;
      +	}
     -+	return stack_compact_all(refs->stack, NULL);
     ++	return reftable_stack_compact_all(refs->stack, NULL);
      +}
      +
      +struct write_create_symref_arg {
     -+	struct reftable_ref_store *refs;
     ++	struct git_reftable_ref_store *refs;
      +	const char *refname;
      +	const char *target;
      +	const char *logmsg;
      +};
      +
     -+static int write_create_symref_table(struct writer *writer, void *arg)
     ++static int write_create_symref_table(struct reftable_writer *writer, void *arg)
      +{
      +	struct write_create_symref_arg *create =
      +		(struct write_create_symref_arg *)arg;
     -+	uint64_t ts = stack_next_update_index(create->refs->stack);
     ++	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
      +	int err = 0;
      +
     -+	struct ref_record ref = {
     ++	struct reftable_ref_record ref = {
      +		.ref_name = (char *)create->refname,
      +		.target = (char *)create->target,
      +		.update_index = ts,
      +	};
     -+	writer_set_limits(writer, ts, ts);
     -+	err = writer_add_ref(writer, &ref);
     ++	reftable_writer_set_limits(writer, ts, ts);
     ++	err = reftable_writer_add_ref(writer, &ref);
      +	if (err < 0) {
      +		return err;
      +	}
      +
      +	{
     -+		struct log_record log = {};
     ++		struct reftable_log_record log = {};
      +		struct object_id new_oid = {};
      +		struct object_id old_oid = {};
     -+		struct ref_record current = {};
     -+		stack_read_ref(create->refs->stack, create->refname, &current);
     ++		struct reftable_ref_record current = {};
     ++		reftable_stack_read_ref(create->refs->stack, create->refname, &current);
      +
     -+		fill_log_record(&log);
     ++		fill_reftable_log_record(&log);
      +		log.ref_name = current.ref_name;
      +		if (refs_resolve_ref_unsafe(
      +			    (struct ref_store *)create->refs, create->refname,
     @@ -865,12 +856,12 @@ $^
      +		}
      +
      +		if (log.old_hash != NULL || log.new_hash != NULL) {
     -+			writer_add_log(writer, &log);
     ++			reftable_writer_add_log(writer, &log);
      +		}
      +		log.ref_name = NULL;
      +		log.old_hash = NULL;
      +		log.new_hash = NULL;
     -+		clear_log_record(&log);
     ++		clear_reftable_log_record(&log);
      +	}
      +	return 0;
      +}
     @@ -879,8 +870,8 @@ $^
      +				  const char *refname, const char *target,
      +				  const char *logmsg)
      +{
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
      +	struct write_create_symref_arg arg = { .refs = refs,
      +					       .refname = refname,
      +					       .target = target,
     @@ -888,55 +879,55 @@ $^
      +	if (refs->err < 0) {
      +		return refs->err;
      +	}
     -+	return stack_add(refs->stack, &write_create_symref_table, &arg);
     ++	return reftable_stack_add(refs->stack, &write_create_symref_table, &arg);
      +}
      +
      +struct write_rename_arg {
     -+	struct stack *stack;
     ++	struct reftable_stack *stack;
      +	const char *oldname;
      +	const char *newname;
      +	const char *logmsg;
      +};
      +
     -+static int write_rename_table(struct writer *writer, void *argv)
     ++static int write_rename_table(struct reftable_writer *writer, void *argv)
      +{
      +	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
     -+	uint64_t ts = stack_next_update_index(arg->stack);
     -+	struct ref_record ref = {};
     -+	int err = stack_read_ref(arg->stack, arg->oldname, &ref);
     ++	uint64_t ts = reftable_stack_next_update_index(arg->stack);
     ++	struct reftable_ref_record ref = {};
     ++	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
      +
      +	if (err) {
      +		goto exit;
      +	}
      +
      +	/* XXX do ref renames overwrite the target? */
     -+	if (stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
     ++	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
      +		goto exit;
      +	}
      +
      +	free(ref.ref_name);
      +	ref.ref_name = strdup(arg->newname);
     -+	writer_set_limits(writer, ts, ts);
     ++	reftable_writer_set_limits(writer, ts, ts);
      +	ref.update_index = ts;
      +
      +	{
     -+		struct ref_record todo[2] = {};
     ++		struct reftable_ref_record todo[2] = {};
      +		todo[0].ref_name = (char *)arg->oldname;
      +		todo[0].update_index = ts;
      +		/* leave todo[0] empty */
      +		todo[1] = ref;
      +		todo[1].update_index = ts;
      +
     -+		err = writer_add_refs(writer, todo, 2);
     ++		err = reftable_writer_add_refs(writer, todo, 2);
      +		if (err < 0) {
      +			goto exit;
      +		}
      +	}
      +
      +	if (ref.value != NULL) {
     -+		struct log_record todo[2] = {};
     -+		fill_log_record(&todo[0]);
     -+		fill_log_record(&todo[1]);
     ++		struct reftable_log_record todo[2] = {};
     ++		fill_reftable_log_record(&todo[0]);
     ++		fill_reftable_log_record(&todo[1]);
      +
      +		todo[0].ref_name = (char *)arg->oldname;
      +		todo[0].update_index = ts;
     @@ -950,10 +941,10 @@ $^
      +		todo[1].new_hash = ref.value;
      +		todo[1].message = (char *)arg->logmsg;
      +
     -+		err = writer_add_logs(writer, todo, 2);
     ++		err = reftable_writer_add_logs(writer, todo, 2);
      +
     -+		clear_log_record(&todo[0]);
     -+		clear_log_record(&todo[1]);
     ++		clear_reftable_log_record(&todo[0]);
     ++		clear_reftable_log_record(&todo[1]);
      +
      +		if (err < 0) {
      +			goto exit;
     @@ -964,7 +955,7 @@ $^
      +	}
      +
      +exit:
     -+	ref_record_clear(&ref);
     ++	reftable_ref_record_clear(&ref);
      +	return err;
      +}
      +
     @@ -972,8 +963,8 @@ $^
      +			       const char *oldrefname, const char *newrefname,
      +			       const char *logmsg)
      +{
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
      +	struct write_rename_arg arg = {
      +		.stack = refs->stack,
      +		.oldname = oldrefname,
     @@ -984,7 +975,7 @@ $^
      +		return refs->err;
      +	}
      +
     -+	return stack_add(refs->stack, &write_rename_table, &arg);
     ++	return reftable_stack_add(refs->stack, &write_rename_table, &arg);
      +}
      +
      +static int reftable_copy_ref(struct ref_store *ref_store,
     @@ -996,8 +987,8 @@ $^
      +
      +struct reftable_reflog_ref_iterator {
      +	struct ref_iterator base;
     -+	struct iterator iter;
     -+	struct log_record log;
     ++	struct reftable_iterator iter;
     ++	struct reftable_log_record log;
      +	struct object_id oid;
      +	char *last_name;
      +};
     @@ -1009,7 +1000,7 @@ $^
      +		(struct reftable_reflog_ref_iterator *)ref_iterator;
      +
      +	while (1) {
     -+		int err = iterator_next_log(ri->iter, &ri->log);
     ++		int err = reftable_iterator_next_log(ri->iter, &ri->log);
      +		if (err > 0) {
      +			return ITER_DONE;
      +		}
     @@ -1041,8 +1032,8 @@ $^
      +{
      +	struct reftable_reflog_ref_iterator *ri =
      +		(struct reftable_reflog_ref_iterator *)ref_iterator;
     -+	log_record_clear(&ri->log);
     -+	iterator_destroy(&ri->iter);
     ++	reftable_log_record_clear(&ri->log);
     ++	reftable_iterator_destroy(&ri->iter);
      +	return 0;
      +}
      +
     @@ -1055,11 +1046,11 @@ $^
      +reftable_reflog_iterator_begin(struct ref_store *ref_store)
      +{
      +	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
      +
     -+	struct merged_table *mt = stack_merged_table(refs->stack);
     -+	int err = merged_table_seek_log(mt, &ri->iter, "");
     ++	struct reftable_merged_table *mt = reftable_stack_merged_table(refs->stack);
     ++	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
      +	if (err < 0) {
      +		free(ri);
      +		return NULL;
     @@ -1077,21 +1068,21 @@ $^
      +					  const char *refname,
      +					  each_reflog_ent_fn fn, void *cb_data)
      +{
     -+	struct iterator it = {};
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     -+	struct merged_table *mt = NULL;
     ++	struct reftable_iterator it = {};
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_merged_table *mt = NULL;
      +	int err = 0;
     -+	struct log_record log = {};
     ++	struct reftable_log_record log = {};
      +
      +	if (refs->err < 0) {
      +		return refs->err;
      +	}
      +
     -+	mt = stack_merged_table(refs->stack);
     -+	err = merged_table_seek_log(mt, &it, refname);
     ++	mt = reftable_stack_merged_table(refs->stack);
     ++	err = reftable_merged_table_seek_log(mt, &it, refname);
      +	while (err == 0) {
     -+		err = iterator_next_log(it, &log);
     ++		err = reftable_iterator_next_log(it, &log);
      +		if (err != 0) {
      +			break;
      +		}
     @@ -1120,8 +1111,8 @@ $^
      +		}
      +	}
      +
     -+	log_record_clear(&log);
     -+	iterator_destroy(&it);
     ++	reftable_log_record_clear(&log);
     ++	reftable_iterator_destroy(&it);
      +	if (err > 0) {
      +		err = 0;
      +	}
     @@ -1133,11 +1124,11 @@ $^
      +					  const char *refname,
      +					  each_reflog_ent_fn fn, void *cb_data)
      +{
     -+	struct iterator it = {};
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     -+	struct merged_table *mt = NULL;
     -+	struct log_record *logs = NULL;
     ++	struct reftable_iterator it = {};
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_merged_table *mt = NULL;
     ++	struct reftable_log_record *logs = NULL;
      +	int cap = 0;
      +	int len = 0;
      +	int err = 0;
     @@ -1145,12 +1136,12 @@ $^
      +	if (refs->err < 0) {
      +		return refs->err;
      +	}
     -+	mt = stack_merged_table(refs->stack);
     -+	err = merged_table_seek_log(mt, &it, refname);
     ++	mt = reftable_stack_merged_table(refs->stack);
     ++	err = reftable_merged_table_seek_log(mt, &it, refname);
      +
      +	while (err == 0) {
     -+		struct log_record log = {};
     -+		err = iterator_next_log(it, &log);
     ++		struct reftable_log_record log = {};
     ++		err = reftable_iterator_next_log(it, &log);
      +		if (err != 0) {
      +			break;
      +		}
     @@ -1168,7 +1159,7 @@ $^
      +	}
      +
      +	for (int i = len; i--;) {
     -+		struct log_record *log = &logs[i];
     ++		struct reftable_log_record *log = &logs[i];
      +		struct object_id old_oid = {};
      +		struct object_id new_oid = {};
      +		const char *full_committer = "";
     @@ -1187,11 +1178,11 @@ $^
      +	}
      +
      +	for (int i = 0; i < len; i++) {
     -+		log_record_clear(&logs[i]);
     ++		reftable_log_record_clear(&logs[i]);
      +	}
      +	free(logs);
      +
     -+	iterator_destroy(&it);
     ++	reftable_iterator_destroy(&it);
      +	if (err > 0) {
      +		err = 0;
      +	}
     @@ -1219,8 +1210,8 @@ $^
      +}
      +
      +struct reflog_expiry_arg {
     -+	struct reftable_ref_store *refs;
     -+	struct log_record *tombstones;
     ++	struct git_reftable_ref_store *refs;
     ++	struct reftable_log_record *tombstones;
      +	int len;
      +	int cap;
      +};
     @@ -1229,7 +1220,7 @@ $^
      +{
      +	int i = 0;
      +	for (; i < arg->len; i++) {
     -+		log_record_clear(&arg->tombstones[i]);
     ++		reftable_log_record_clear(&arg->tombstones[i]);
      +	}
      +
      +	FREE_AND_NULL(arg->tombstones);
     @@ -1238,7 +1229,7 @@ $^
      +static void add_log_tombstone(struct reflog_expiry_arg *arg,
      +			      const char *refname, uint64_t ts)
      +{
     -+	struct log_record tombstone = {
     ++	struct reftable_log_record tombstone = {
      +		.ref_name = xstrdup(refname),
      +		.update_index = ts,
      +	};
     @@ -1250,14 +1241,14 @@ $^
      +	arg->tombstones[arg->len++] = tombstone;
      +}
      +
     -+static int write_reflog_expiry_table(struct writer *writer, void *argv)
     ++static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
      +{
      +	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
     -+	uint64_t ts = stack_next_update_index(arg->refs->stack);
     ++	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
      +	int i = 0;
     -+	writer_set_limits(writer, ts, ts);
     ++	reftable_writer_set_limits(writer, ts, ts);
      +	for (i = 0; i < arg->len; i++) {
     -+		int err = writer_add_log(writer, &arg->tombstones[i]);
     ++		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
      +		if (err) {
      +			return err;
      +		}
     @@ -1286,21 +1277,21 @@ $^
      +	  It would be better if the refs backend supported an API that sets a
      +	  criterion for all refs, passing the criterion to pack_refs().
      +	*/
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     -+	struct merged_table *mt = NULL;
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_merged_table *mt = NULL;
      +	struct reflog_expiry_arg arg = {
      +		.refs = refs,
      +	};
     -+	struct log_record log = {};
     -+	struct iterator it = {};
     ++	struct reftable_log_record log = {};
     ++	struct reftable_iterator it = {};
      +	int err = 0;
      +	if (refs->err < 0) {
      +		return refs->err;
      +	}
      +
     -+	mt = stack_merged_table(refs->stack);
     -+	err = merged_table_seek_log(mt, &it, refname);
     ++	mt = reftable_stack_merged_table(refs->stack);
     ++	err = reftable_merged_table_seek_log(mt, &it, refname);
      +	if (err < 0) {
      +		return err;
      +	}
     @@ -1309,7 +1300,7 @@ $^
      +		struct object_id ooid = {};
      +		struct object_id noid = {};
      +
     -+		int err = iterator_next_log(it, &log);
     ++		int err = reftable_iterator_next_log(it, &log);
      +		if (err < 0) {
      +			return err;
      +		}
     @@ -1326,9 +1317,9 @@ $^
      +			add_log_tombstone(&arg, refname, log.update_index);
      +		}
      +	}
     -+	log_record_clear(&log);
     -+	iterator_destroy(&it);
     -+	err = stack_add(refs->stack, &write_reflog_expiry_table, &arg);
     ++	reftable_log_record_clear(&log);
     ++	reftable_iterator_destroy(&it);
     ++	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
      +	clear_log_tombstones(&arg);
      +	return err;
      +}
     @@ -1337,15 +1328,15 @@ $^
      +				 const char *refname, struct object_id *oid,
      +				 struct strbuf *referent, unsigned int *type)
      +{
     -+	struct reftable_ref_store *refs =
     -+		(struct reftable_ref_store *)ref_store;
     -+	struct ref_record ref = {};
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_ref_record ref = {};
      +	int err = 0;
      +	if (refs->err < 0) {
      +		return refs->err;
      +	}
      +
     -+	err = stack_read_ref(refs->stack, refname, &ref);
     ++	err = reftable_stack_read_ref(refs->stack, refname, &ref);
      +	if (err) {
      +		goto exit;
      +	}
     @@ -1358,14 +1349,14 @@ $^
      +		hashcpy(oid->hash, ref.value);
      +	}
      +exit:
     -+	ref_record_clear(&ref);
     ++	reftable_ref_record_clear(&ref);
      +	return err;
      +}
      +
      +struct ref_storage_be refs_be_reftable = {
      +	&refs_be_files,
      +	"reftable",
     -+	reftable_ref_store_create,
     ++	git_reftable_ref_store_create,
      +	reftable_init_db,
      +	reftable_transaction_prepare,
      +	reftable_transaction_finish,
     @@ -1471,24 +1462,28 @@ $^
      +. ./test-lib.sh
      +
      +test_expect_success 'basic operation of reftable storage' '
     -+        git init --ref-storage=reftable repo &&
     -+        cd repo &&
     -+        echo "hello" > world.txt &&
     -+        git add world.txt &&
     -+        git commit -m "first post" &&
     -+        printf "HEAD\nrefs/heads/master\n" > want &&
     -+        git show-ref | cut -f2 -d" " > got &&
     -+        test_cmp got want &&
     -+        for count in $(test_seq 1 10); do
     -+                echo "hello" >> world.txt
     -+                git commit -m "number ${count}" world.txt
     -+        done &&
     -+        git gc &&
     -+        nfiles=$(ls -1 .git/reftable | wc -l | tr -d "[ \t]" ) &&
     -+        test "${nfiles}" = "2" &&
     -+        git reflog refs/heads/master > output &&
     -+        grep "commit (initial): first post" output &&
     -+        grep "commit: number 10" output
     ++	git init --ref-storage=reftable repo && (
     ++	cd repo &&
     ++	echo "hello" >world.txt &&
     ++	git add world.txt &&
     ++	git commit -m "first post" &&
     ++	test_write_lines HEAD refs/heads/master >expect &&
     ++	git show-ref &&
     ++	git show-ref | cut -f2 -d" " > actual &&
     ++	test_cmp actual expect &&
     ++	for count in $(test_seq 1 10)
     ++	do
     ++		echo "hello" >>world.txt
     ++		git commit -m "number ${count}" world.txt ||
     ++		return 1
     ++	done &&
     ++	git gc &&
     ++	nfiles=$(ls -1 .git/reftable | wc -l ) &&
     ++	test ${nfiles} = "2" &&
     ++	git reflog refs/heads/master >output &&
     ++	test_line_count = 11 output &&
     ++	grep "commit (initial): first post" output &&
     ++	grep "commit: number 10" output )
      +'
      +
      +test_done

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v8 1/9] refs.h: clarify reflog iteration order
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
@ 2020-04-01 11:28               ` Han-Wen Nienhuys via GitGitGadget
  2020-04-01 11:28               ` [PATCH v8 2/9] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
                                 ` (9 subsequent siblings)
  10 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-01 11:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/refs.h b/refs.h
index 545029c6d80..87c9ec921b9 100644
--- a/refs.h
+++ b/refs.h
@@ -444,18 +444,21 @@ int delete_refs(const char *msg, struct string_list *refnames,
 int refs_delete_reflog(struct ref_store *refs, const char *refname);
 int delete_reflog(const char *refname);
 
-/* iterate over reflog entries */
+/* Iterate over reflog entries. */
 typedef int each_reflog_ent_fn(
 		struct object_id *old_oid, struct object_id *new_oid,
 		const char *committer, timestamp_t timestamp,
 		int tz, const char *msg, void *cb_data);
 
+/* Iterate in over reflog entries, oldest entry first. */
 int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
 			     each_reflog_ent_fn fn, void *cb_data);
 int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+/* Call a function for each reflog entry, oldest entry first. */
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v8 2/9] create .git/refs in files-backend.c
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
  2020-04-01 11:28               ` [PATCH v8 1/9] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-04-01 11:28               ` Han-Wen Nienhuys via GitGitGadget
  2020-04-01 11:28               ` [PATCH v8 3/9] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
                                 ` (8 subsequent siblings)
  10 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-01 11:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This prepares for supporting the reftable format, which will want
create its own file system layout in .git

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/init-db.c    | 2 --
 refs/files-backend.c | 6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0b7222e7188..3b50b1aa0e5 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -251,8 +251,6 @@ static int create_default_files(const char *template_path,
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
 	 */
-	safe_create_dir(git_path("refs"), 1);
-	adjust_shared_perm(git_path("refs"));
 
 	if (refs_init_db(&err))
 		die("failed to set up refs db: %s", err.buf);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 561c33ac8a9..ab7899a9c77 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3157,9 +3157,15 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
 	struct strbuf sb = STRBUF_INIT;
 
+	files_ref_path(refs, &sb, "refs");
+	safe_create_dir(sb.buf, 1);
+	/* adjust permissions even if directory already exists. */
+	adjust_shared_perm(sb.buf);
+
 	/*
 	 * Create .git/refs/{heads,tags}
 	 */
+	strbuf_reset(&sb);
 	files_ref_path(refs, &sb, "refs/heads");
 	safe_create_dir(sb.buf, 1);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v8 3/9] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
  2020-04-01 11:28               ` [PATCH v8 1/9] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
  2020-04-01 11:28               ` [PATCH v8 2/9] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-04-01 11:28               ` Han-Wen Nienhuys via GitGitGadget
  2020-04-01 11:28               ` [PATCH v8 4/9] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                 ` (7 subsequent siblings)
  10 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-01 11:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb7..3490aac3a40 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
 
 /* Virtual function declarations for ref_iterators: */
 
+/*
+ * backend-specific implementation of ref_iterator_advance.
+ * For symrefs, the function should set REF_ISSYMREF, and it should also
+ * dereference the symref to provide the OID referent.
+ */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v8 4/9] Add .gitattributes for the reftable/ directory
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
                                 ` (2 preceding siblings ...)
  2020-04-01 11:28               ` [PATCH v8 3/9] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-04-01 11:28               ` Han-Wen Nienhuys via GitGitGadget
  2020-04-01 11:28               ` [PATCH v8 5/9] reftable: file format documentation Jonathan Nieder via GitGitGadget
                                 ` (6 subsequent siblings)
  10 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-01 11:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 00000000000..f44451a3795
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v8 5/9] reftable: file format documentation
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
                                 ` (3 preceding siblings ...)
  2020-04-01 11:28               ` [PATCH v8 4/9] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-04-01 11:28               ` Jonathan Nieder via GitGitGadget
  2020-04-01 11:28               ` [PATCH v8 6/9] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
                                 ` (5 subsequent siblings)
  10 siblings, 0 replies; 409+ messages in thread
From: Jonathan Nieder via GitGitGadget @ 2020-04-01 11:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder

From: Jonathan Nieder <jrnieder@gmail.com>

Shawn Pearce explains:

Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:

- Near constant time lookup for any single reference, even when the
  repository is cold and not in process or kernel cache.
- Near constant time verification a SHA-1 is referred to by at least
  one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.

This file format spec was originally written in July, 2017 by Shawn
Pearce.  Some refinements since then were made by Shawn and by Han-Wen
Nienhuys based on experiences implementing and experimenting with the
format.  (All of this was in the context of our work at Google and
Google is happy to contribute the result to the Git project.)

Imported from JGit[1]'s current version (c217d33ff,
"Documentation/technical/reftable: improve repo layout", 2020-02-04)
of Documentation/technical/reftable.md and converted to asciidoc by
running

  pandoc -t asciidoc -f markdown reftable.md >reftable.txt

using pandoc 2.2.1.  The result required the following additional
minor changes:

- removed the [TOC] directive to add a table of contents, since
  asciidoc does not support it
- replaced git-scm.com/docs links with linkgit: directives that link
  to other pages within Git's documentation

[1] https://eclipse.googlesource.com/jgit/jgit

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/Makefile               |    1 +
 Documentation/technical/reftable.txt | 1067 ++++++++++++++++++++++++++
 2 files changed, 1068 insertions(+)
 create mode 100644 Documentation/technical/reftable.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 8fe829cc1b8..3aab9b8d61a 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -92,6 +92,7 @@ TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
+TECH_DOCS += technical/reftable
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/signature-format
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
new file mode 100644
index 00000000000..9fa4657d9ff
--- /dev/null
+++ b/Documentation/technical/reftable.txt
@@ -0,0 +1,1067 @@
+reftable
+--------
+
+Overview
+~~~~~~~~
+
+Problem statement
+^^^^^^^^^^^^^^^^^
+
+Some repositories contain a lot of references (e.g. android at 866k,
+rails at 31k). The existing packed-refs format takes up a lot of space
+(e.g. 62M), and does not scale with additional references. Lookup of a
+single reference requires linearly scanning the file.
+
+Atomic pushes modifying multiple references require copying the entire
+packed-refs file, which can be a considerable amount of data moved
+(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
+
+Repositories with many loose references occupy a large number of disk
+blocks from the local file system, as each reference is its own file
+storing 41 bytes (and another file for the corresponding reflog). This
+negatively affects the number of inodes available when a large number of
+repositories are stored on the same filesystem. Readers can be penalized
+due to the larger number of syscalls required to traverse and read the
+`$GIT_DIR/refs` directory.
+
+Objectives
+^^^^^^^^^^
+
+* Near constant time lookup for any single reference, even when the
+repository is cold and not in process or kernel cache.
+* Near constant time verification if a SHA-1 is referred to by at least
+one reference (for allow-tip-sha1-in-want).
+* Efficient lookup of an entire namespace, such as `refs/tags/`.
+* Support atomic push with `O(size_of_update)` operations.
+* Combine reflog storage with ref storage for small transactions.
+* Separate reflog storage for base refs and historical logs.
+
+Description
+^^^^^^^^^^^
+
+A reftable file is a portable binary file format customized for
+reference storage. References are sorted, enabling linear scans, binary
+search lookup, and range scans.
+
+Storage in the file is organized into variable sized blocks. Prefix
+compression is used within a single block to reduce disk space. Block
+size and alignment is tunable by the writer.
+
+Performance
+^^^^^^^^^^^
+
+Space used, packed-refs vs. reftable:
+
+[cols=",>,>,>,>,>",options="header",]
+|===============================================================
+|repository |packed-refs |reftable |% original |avg ref |avg obj
+|android |62.2 M |36.1 M |58.0% |33 bytes |5 bytes
+|rails |1.8 M |1.1 M |57.7% |29 bytes |4 bytes
+|git |78.7 K |48.1 K |61.0% |50 bytes |4 bytes
+|git (heads) |332 b |269 b |81.0% |33 bytes |0 bytes
+|===============================================================
+
+Scan (read 866k refs), by reference name lookup (single ref from 866k
+refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):
+
+[cols=",>,>,>,>",options="header",]
+|=========================================================
+|format |cache |scan |by name |by SHA-1
+|packed-refs |cold |402 ms |409,660.1 usec |412,535.8 usec
+|packed-refs |hot | |6,844.6 usec |20,110.1 usec
+|reftable |cold |112 ms |33.9 usec |323.2 usec
+|reftable |hot | |20.2 usec |320.8 usec
+|=========================================================
+
+Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:
+
+[cols=",>,>",options="header",]
+|================================
+|format |size |avg entry
+|$GIT_DIR/logs |173 M |1209 bytes
+|reftable |5 M |37 bytes
+|================================
+
+Details
+~~~~~~~
+
+Peeling
+^^^^^^^
+
+References stored in a reftable are peeled, a record for an annotated
+(or signed) tag records both the tag object, and the object it refers
+to.
+
+Reference name encoding
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Reference names are an uninterpreted sequence of bytes that must pass
+linkgit:git-check-ref-format[1] as a valid reference name.
+
+Key unicity
+^^^^^^^^^^^
+
+Each entry must have a unique key; repeated keys are disallowed.
+
+Network byte order
+^^^^^^^^^^^^^^^^^^
+
+All multi-byte, fixed width fields are in network byte order.
+
+Ordering
+^^^^^^^^
+
+Blocks are lexicographically ordered by their first reference.
+
+Directory/file conflicts
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format accepts both `refs/heads/foo` and
+`refs/heads/foo/bar` as distinct references.
+
+This property is useful for retaining log records in reftable, but may
+confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
+references. Users of reftable may choose to continue to reject `foo` and
+`foo/bar` type conflicts to prevent problems for peers.
+
+File format
+~~~~~~~~~~~
+
+Structure
+^^^^^^^^^
+
+A reftable file has the following high-level structure:
+
+....
+first_block {
+  header
+  first_ref_block
+}
+ref_block*
+ref_index*
+obj_block*
+obj_index*
+log_block*
+log_index*
+footer
+....
+
+A log-only file omits the `ref_block`, `ref_index`, `obj_block` and
+`obj_index` sections, containing only the file header and log block:
+
+....
+first_block {
+  header
+}
+log_block*
+log_index*
+footer
+....
+
+in a log-only file the first log block immediately follows the file
+header, without padding to block alignment.
+
+Block size
+^^^^^^^^^^
+
+The file’s block size is arbitrarily determined by the writer, and does
+not have to be a power of 2. The block size must be larger than the
+longest reference name or log entry used in the repository, as
+references cannot span blocks.
+
+Powers of two that are friendly to the virtual memory system or
+filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
+yield better compression, with a possible increased cost incurred by
+readers during access.
+
+The largest block size is `16777215` bytes (15.99 MiB).
+
+Block alignment
+^^^^^^^^^^^^^^^
+
+Writers may choose to align blocks at multiples of the block size by
+including `padding` filled with NUL bytes at the end of a block to round
+out to the chosen alignment. When alignment is used, writers must
+specify the alignment with the file header’s `block_size` field.
+
+Block alignment is not required by the file format. Unaligned files must
+set `block_size = 0` in the file header, and omit `padding`. Unaligned
+files with more than one ref block must include the link:#Ref-index[ref
+index] to support fast lookup. Readers must be able to read both aligned
+and non-aligned files.
+
+Very small files (e.g. 1 only ref block) may omit `padding` and the ref
+index to reduce total file size.
+
+Header
+^^^^^^
+
+A 24-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+....
+
+Aligned files must specify `block_size` to configure readers with the
+expected block alignment. Unaligned files must set `block_size = 0`.
+
+The `min_update_index` and `max_update_index` describe bounds for the
+`update_index` field of all log records in this file. When reftables are
+used in a stack for link:#Update-transactions[transactions], these
+fields can order the files such that the prior file’s
+`max_update_index + 1` is the next file’s `min_update_index`.
+
+First ref block
+^^^^^^^^^^^^^^^
+
+The first ref block shares the same block as the file header, and is 24
+bytes smaller than all other blocks in the file. The first block
+immediately begins after the file header, at position 24.
+
+If the first block is a log block (a log-only file), its block header
+begins immediately at position 24.
+
+Ref block format
+^^^^^^^^^^^^^^^^
+
+A ref block is written as:
+
+....
+'r'
+uint24( block_len )
+ref_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
+encodes the number of bytes in the block up to, but not including the
+optional `padding`. This is always less than or equal to the file’s
+block size. In the first ref block, `block_len` includes 24 bytes for
+the file header.
+
+The 2-byte `restart_count` stores the number of entries in the
+`restart_offset` list, which must not be empty. Readers can use
+`restart_count` to binary search between restarts before starting a
+linear scan.
+
+Exactly `restart_count` 3-byte `restart_offset` values precedes the
+`restart_count`. Offsets are relative to the start of the block and
+refer to the first byte of any `ref_record` whose name has not been
+prefix compressed. Entries in the `restart_offset` list must be sorted,
+ascending. Readers can start linear scans from any of these records.
+
+A variable number of `ref_record` fill the middle of the block,
+describing reference names and values. The format is described below.
+
+As the first ref block shares the first file block with the file header,
+all `restart_offset` in the first block are relative to the start of the
+file (position 0), and include the file header. This forces the first
+`restart_offset` to be `28`.
+
+ref record
+++++++++++
+
+A `ref_record` describes a single reference, storing both the name and
+its value(s). Records are formatted as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | value_type )
+suffix
+varint( update_index_delta )
+value?
+....
+
+The `prefix_length` field specifies how many leading bytes of the prior
+reference record’s name should be copied to obtain this reference’s
+name. This must be 0 for the first reference in any block, and also must
+be 0 for any `ref_record` whose offset is listed in the `restart_offset`
+table at the end of the block.
+
+Recovering a reference name from any `ref_record` is a simple concat:
+
+....
+this_name = prior_name[0..prefix_length] + suffix
+....
+
+The `suffix_length` value provides the number of bytes available in
+`suffix` to copy from `suffix` to complete the reference name.
+
+The `update_index` that last modified the reference can be obtained by
+adding `update_index_delta` to the `min_update_index` from the file
+header: `min_update_index + update_index_delta`.
+
+The `value` follows. Its format is determined by `value_type`, one of
+the following:
+
+* `0x0`: deletion; no value data (see transactions, below)
+* `0x1`: one 20-byte object id; value of the ref
+* `0x2`: two 20-byte object ids; value of the ref, peeled target
+* `0x3`: symbolic reference: `varint( target_len ) target`
+
+Symbolic references use `0x3`, followed by the complete name of the
+reference target. No compression is applied to the target name.
+
+Types `0x4..0x7` are reserved for future use.
+
+Ref index
+^^^^^^^^^
+
+The ref index stores the name of the last reference from every ref block
+in the file, enabling reduced disk seeks for lookups. Any reference can
+be found by searching the index, identifying the containing block, and
+searching within that block.
+
+The index may be organized into a multi-level index, where the 1st level
+index block points to additional ref index blocks (2nd level), which may
+in turn point to either additional index blocks (e.g. 3rd level) or ref
+blocks (leaf level). Disk reads required to access a ref go up with
+higher index levels. Multi-level indexes may be required to ensure no
+single index block exceeds the file format’s max block size of
+`16777215` bytes (15.99 MiB). To acheive constant O(1) disk seeks for
+lookups the index must be a single level, which is permitted to exceed
+the file’s configured block size, but not the format’s max block size of
+15.99 MiB.
+
+If present, the ref index block(s) appears after the last ref block.
+
+If there are at least 4 ref blocks, a ref index block should be written
+to improve lookup times. Cold reads using the index require 2 disk reads
+(read index, read block), and binary searching < 4 blocks also requires
+<= 2 reads. Omitting the index block from smaller files saves space.
+
+If the file is unaligned and contains more than one ref block, the ref
+index must be written.
+
+Index block format:
+
+....
+'i'
+uint24( block_len )
+index_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+The index blocks begin with `block_type = 'i'` and a 3-byte `block_len`
+which encodes the number of bytes in the block, up to but not including
+the optional `padding`.
+
+The `restart_offset` and `restart_count` fields are identical in format,
+meaning and usage as in ref blocks.
+
+To reduce the number of reads required for random access in very large
+files the index block may be larger than other blocks. However, readers
+must hold the entire index in memory to benefit from this, so it’s a
+time-space tradeoff in both file size and reader memory.
+
+Increasing the file’s block size decreases the index size. Alternatively
+a multi-level index may be used, keeping index blocks within the file’s
+block size, but increasing the number of blocks that need to be
+accessed.
+
+index record
+++++++++++++
+
+An index record describes the last entry in another block. Index records
+are written as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | 0 )
+suffix
+varint( block_position )
+....
+
+Index records use prefix compression exactly like `ref_record`.
+
+Index records store `block_position` after the suffix, specifying the
+absolute position in bytes (from the start of the file) of the block
+that ends with this reference. Readers can seek to `block_position` to
+begin reading the block header.
+
+Readers must examine the block header at `block_position` to determine
+if the next block is another level index block, or the leaf-level ref
+block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the ref index must first read the footer (below) to
+obtain `ref_index_position`. If not present, the position will be 0. The
+`ref_index_position` is for the 1st level root of the ref index.
+
+Obj block format
+^^^^^^^^^^^^^^^^
+
+Object blocks are optional. Writers may choose to omit object blocks,
+especially if readers will not use the SHA-1 to ref mapping.
+
+Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
+ref blocks containing references pointing to that object directly, or as
+the peeled value of an annotated tag. Like ref blocks, object blocks use
+the file’s standard block size. The abbrevation length is available in
+the footer as `obj_id_len`.
+
+To save space in small files, object blocks may be omitted if the ref
+index is not present, as brute force search will only need to read a few
+ref blocks. When missing, readers should brute force a linear search of
+all references to lookup by SHA-1.
+
+An object block is written as:
+
+....
+'o'
+uint24( block_len )
+obj_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Fields are identical to ref block. Binary search using the restart table
+works the same as in reference blocks.
+
+Because object identifiers are abbreviated by writers to the shortest
+unique abbreviation within the reftable, obj key lengths are variable
+between 2 and 20 bytes. Readers must compare only for common prefix
+match within an obj block or obj index.
+
+obj record
+++++++++++
+
+An `obj_record` describes a single object abbreviation, and the blocks
+containing references using that unique abbreviation:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | cnt_3 )
+suffix
+varint( cnt_large )?
+varint( position_delta )*
+....
+
+Like in reference blocks, abbreviations are prefix compressed within an
+obj block. On large reftables with many unique objects, higher block
+sizes (64k), and higher restart interval (128), a `prefix_length` of 2
+or 3 and `suffix_length` of 3 may be common in obj records (unique
+abbreviation of 5-6 raw bytes, 10-12 hex digits).
+
+Each record contains `position_count` number of positions for matching
+ref blocks. For 1-7 positions the count is stored in `cnt_3`. When
+`cnt_3 = 0` the actual count follows in a varint, `cnt_large`.
+
+The use of `cnt_3` bets most objects are pointed to by only a single
+reference, some may be pointed to by a couple of references, and very
+few (if any) are pointed to by more than 7 references.
+
+A special case exists when `cnt_3 = 0` and `cnt_large = 0`: there are no
+`position_delta`, but at least one reference starts with this
+abbreviation. A reader that needs exact reference names must scan all
+references to find which specific references have the desired object.
+Writers should use this format when the `position_delta` list would have
+overflowed the file’s block size due to a high number of references
+pointing to the same object.
+
+The first `position_delta` is the position from the start of the file.
+Additional `position_delta` entries are sorted ascending and relative to
+the prior entry, e.g. a reader would perform:
+
+....
+pos = position_delta[0]
+prior = pos
+for (j = 1; j < position_count; j++) {
+  pos = prior + position_delta[j]
+  prior = pos
+}
+....
+
+With a position in hand, a reader must linearly scan the ref block,
+starting from the first `ref_record`, testing each reference’s SHA-1s
+(for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
+SHA-1 within a single ref block is not supported by the reftable format.
+Smaller block sizes reduce the number of candidates this step must
+consider.
+
+Obj index
+^^^^^^^^^
+
+The obj index stores the abbreviation from the last entry for every obj
+block in the file, enabling reduced disk seeks for all lookups. It is
+formatted exactly the same as the ref index, but refers to obj blocks.
+
+The obj index should be present if obj blocks are present, as obj blocks
+should only be written in larger files.
+
+Readers loading the obj index must first read the footer (below) to
+obtain `obj_index_position`. If not present, the position will be 0.
+
+Log block format
+^^^^^^^^^^^^^^^^
+
+Unlike ref and obj blocks, log blocks are always unaligned.
+
+Log blocks are variable in size, and do not match the `block_size`
+specified in the file header or footer. Writers should choose an
+appropriate buffer size to prepare a log block for deflation, such as
+`2 * block_size`.
+
+A log block is written as:
+
+....
+'g'
+uint24( block_len )
+zlib_deflate {
+  log_record+
+  uint24( restart_offset )+
+  uint16( restart_count )
+}
+....
+
+Log blocks look similar to ref blocks, except `block_type = 'g'`.
+
+The 4-byte block header is followed by the deflated block contents using
+zlib deflate. The `block_len` in the header is the inflated size
+(including 4-byte block header), and should be used by readers to
+preallocate the inflation output buffer. A log block’s `block_len` may
+exceed the file’s block size.
+
+Offsets within the log block (e.g. `restart_offset`) still include the
+4-byte header. Readers may prefer prefixing the inflation output buffer
+with the 4-byte header.
+
+Within the deflate container, a variable number of `log_record` describe
+reference changes. The log record format is described below. See ref
+block format (above) for a description of `restart_offset` and
+`restart_count`.
+
+Because log blocks have no alignment or padding between blocks, readers
+must keep track of the bytes consumed by the inflater to know where the
+next log block begins.
+
+log record
+++++++++++
+
+Log record keys are structured as:
+
+....
+ref_name '\0' reverse_int64( update_index )
+....
+
+where `update_index` is the unique transaction identifier. The
+`update_index` field must be unique within the scope of a `ref_name`.
+See the update transactions section below for further details.
+
+The `reverse_int64` function inverses the value so lexographical
+ordering the network byte order encoding sorts the more recent records
+with higher `update_index` values first:
+
+....
+reverse_int64(int64 t) {
+  return 0xffffffffffffffff - t;
+}
+....
+
+Log records have a similar starting structure to ref and index records,
+utilizing the same prefix compression scheme applied to the log record
+key described above.
+
+....
+    varint( prefix_length )
+    varint( (suffix_length << 3) | log_type )
+    suffix
+    log_data {
+      old_id
+      new_id
+      varint( name_length    )  name
+      varint( email_length   )  email
+      varint( time_seconds )
+      sint16( tz_offset )
+      varint( message_length )  message
+    }?
+....
+
+Log record entries use `log_type` to indicate what follows:
+
+* `0x0`: deletion; no log data.
+* `0x1`: standard git reflog data using `log_data` above.
+
+The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
+entry from the reflog of `refs/stash` in a transaction file (below),
+without needing to rewrite larger files. Readers reading a stack of
+reflogs must treat this as a deletion.
+
+For `log_type = 0x1`, the `log_data` section follows
+linkgit:git-update-ref[1] logging and includes:
+
+* two 20-byte SHA-1s (old id, new id)
+* varint string of committer’s name
+* varint string of committer’s email
+* varint time in seconds since epoch (Jan 1, 1970)
+* 2-byte timezone offset in minutes (signed)
+* varint string of message
+
+`tz_offset` is the absolute number of minutes from GMT the committer was
+at the time of the update. For example `GMT-0800` is encoded in reftable
+as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
+
+The committer email does not contain `<` or `>`, it’s the value normally
+found between the `<>` in a git commit object header.
+
+The `message_length` may be 0, in which case there was no message
+supplied for the update.
+
+Contrary to traditional reflog (which is a file), renames are encoded as
+a combination of ref deletion and ref creation.
+
+Reading the log
++++++++++++++++
+
+Readers accessing the log must first read the footer (below) to
+determine the `log_position`. The first block of the log begins at
+`log_position` bytes since the start of the file. The `log_position` is
+not block aligned.
+
+Importing logs
+++++++++++++++
+
+When importing from `$GIT_DIR/logs` writers should globally order all
+log records roughly by timestamp while preserving file order, and assign
+unique, increasing `update_index` values for each log line. Newer log
+records get higher `update_index` values.
+
+Although an import may write only a single reftable file, the reftable
+file must span many unique `update_index`, as each log line requires its
+own `update_index` to preserve semantics.
+
+Log index
+^^^^^^^^^
+
+The log index stores the log key
+(`refname \0 reverse_int64(update_index)`) for the last log record of
+every log block in the file, supporting bounded-time lookup.
+
+A log index block must be written if 2 or more log blocks are written to
+the file. If present, the log index appears after the last log block.
+There is no padding used to align the log index to block alignment.
+
+Log index format is identical to ref index, except the keys are 9 bytes
+longer to include `'\0'` and the 8-byte `reverse_int64(update_index)`.
+Records use `block_position` to refer to the start of a log block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the log index must first read the footer (below) to
+obtain `log_index_position`. If not present, the position will be 0.
+
+Footer
+^^^^^^
+
+After the last block of the file, a file footer is written. It begins
+like the file header, but is extended with additional data.
+
+A 68-byte footer appears at the end:
+
+....
+    'REFT'
+    uint8( version_number = 1 )
+    uint24( block_size )
+    uint64( min_update_index )
+    uint64( max_update_index )
+
+    uint64( ref_index_position )
+    uint64( (obj_position << 5) | obj_id_len )
+    uint64( obj_index_position )
+
+    uint64( log_position )
+    uint64( log_index_position )
+
+    uint32( CRC-32 of above )
+....
+
+If a section is missing (e.g. ref index) the corresponding position
+field (e.g. `ref_index_position`) will be 0.
+
+* `obj_position`: byte position for the first obj block.
+* `obj_id_len`: number of bytes used to abbreviate object identifiers in
+obj blocks.
+* `log_position`: byte position for the first log block.
+* `ref_index_position`: byte position for the start of the ref index.
+* `obj_index_position`: byte position for the start of the obj index.
+* `log_index_position`: byte position for the start of the log index.
+
+Reading the footer
+++++++++++++++++++
+
+Readers must seek to `file_length - 68` to access the footer. A trusted
+external source (such as `stat(2)`) is necessary to obtain
+`file_length`. When reading the footer, readers must verify:
+
+* 4-byte magic is correct
+* 1-byte version number is recognized
+* 4-byte CRC-32 matches the other 64 bytes (including magic, and
+version)
+
+Once verified, the other fields of the footer can be accessed.
+
+Varint encoding
+^^^^^^^^^^^^^^^
+
+Varint encoding is identical to the ofs-delta encoding method used
+within pack files.
+
+Decoder works such as:
+
+....
+val = buf[ptr] & 0x7f
+while (buf[ptr] & 0x80) {
+  ptr++
+  val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
+}
+....
+
+Binary search
+^^^^^^^^^^^^^
+
+Binary search within a block is supported by the `restart_offset` fields
+at the end of the block. Readers can binary search through the restart
+table to locate between which two restart points the sought reference or
+key should appear.
+
+Each record identified by a `restart_offset` stores the complete key in
+the `suffix` field of the record, making the compare operation during
+binary search straightforward.
+
+Once a restart point lexicographically before the sought reference has
+been identified, readers can linearly scan through the following record
+entries to locate the sought record, terminating if the current record
+sorts after (and therefore the sought key is not present).
+
+Restart point selection
++++++++++++++++++++++++
+
+Writers determine the restart points at file creation. The process is
+arbitrary, but every 16 or 64 records is recommended. Every 16 may be
+more suitable for smaller block sizes (4k or 8k), every 64 for larger
+block sizes (64k).
+
+More frequent restart points reduces prefix compression and increases
+space consumed by the restart table, both of which increase file size.
+
+Less frequent restart points makes prefix compression more effective,
+decreasing overall file size, with increased penalities for readers
+walking through more records after the binary search step.
+
+A maximum of `65535` restart points per block is supported.
+
+Considerations
+~~~~~~~~~~~~~~
+
+Lightweight refs dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format assumes the vast majority of references are single
+SHA-1 valued with common prefixes, such as Gerrit Code Review’s
+`refs/changes/` namespace, GitHub’s `refs/pulls/` namespace, or many
+lightweight tags in the `refs/tags/` namespace.
+
+Annotated tags storing the peeled object cost an additional 20 bytes per
+reference.
+
+Low overhead
+^^^^^^^^^^^^
+
+A reftable with very few references (e.g. git.git with 5 heads) is 269
+bytes for reftable, vs. 332 bytes for packed-refs. This supports
+reftable scaling down for transaction logs (below).
+
+Block size
+^^^^^^^^^^
+
+For a Gerrit Code Review type repository with many change refs, larger
+block sizes (64 KiB) and less frequent restart points (every 64) yield
+better compression due to more references within the block compressing
+against the prior reference.
+
+Larger block sizes reduce the index size, as the reftable will require
+fewer blocks to store the same number of references.
+
+Minimal disk seeks
+^^^^^^^^^^^^^^^^^^
+
+Assuming the index block has been loaded into memory, binary searching
+for any single reference requires exactly 1 disk seek to load the
+containing block.
+
+Scans and lookups dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Scanning all references and lookup by name (or namespace such as
+`refs/heads/`) are the most common activities performed on repositories.
+SHA-1s are stored directly with references to optimize this use case.
+
+Logs are infrequently read
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are infrequently accessed, but can be large. Deflating log blocks
+saves disk space, with some increased penalty at read time.
+
+Logs are stored in an isolated section from refs, reducing the burden on
+reference readers that want to ignore logs. Further, historical logs can
+be isolated into log-only files.
+
+Logs are read backwards
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are frequently accessed backwards (most recent N records for master
+to answer `master@{4}`), so log records are grouped by reference, and
+sorted descending by update index.
+
+Repository format
+~~~~~~~~~~~~~~~~~
+
+Version 1
+^^^^^^^^^
+
+A repository must set its `$GIT_DIR/config` to configure reftable:
+
+....
+[core]
+    repositoryformatversion = 1
+[extensions]
+    refStorage = reftable
+....
+
+Layout
+^^^^^^
+
+A collection of reftable files are stored in the `$GIT_DIR/reftable/`
+directory:
+
+....
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+where reftable files are named by a unique name such as produced by the
+function `${min_update_index}-${max_update_index}.ref`.
+
+Log-only files use the `.log` extension, while ref-only and mixed ref
+and log files use `.ref`. extension.
+
+The stack ordering file is `$GIT_DIR/reftable/tables.list` and lists the
+current files, one per line, in order, from oldest (base) to newest
+(most recent):
+
+....
+$ cat .git/reftable/tables.list
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+Readers must read `$GIT_DIR/reftable/tables.list` to determine which
+files are relevant right now, and search through the stack in reverse
+order (last reftable is examined first).
+
+Reftable files not listed in `tables.list` may be new (and about to be
+added to the stack by the active writer), or ancient and ready to be
+pruned.
+
+Backward compatibility
+^^^^^^^^^^^^^^^^^^^^^^
+
+Older clients should continue to recognize the directory as a git
+repository so they don’t look for an enclosing repository in parent
+directories. To this end, a reftable-enabled repository must contain the
+following dummy files
+
+* `.git/HEAD`, a regular file containing `ref: refs/heads/.invalid`.
+* `.git/refs/`, a directory
+* `.git/refs/heads`, a regular file
+
+Readers
+^^^^^^^
+
+Readers can obtain a consistent snapshot of the reference space by
+following:
+
+1.  Open and read the `tables.list` file.
+2.  Open each of the reftable files that it mentions.
+3.  If any of the files is missing, goto 1.
+4.  Read from the now-open files as long as necessary.
+
+Update transactions
+^^^^^^^^^^^^^^^^^^^
+
+Although reftables are immutable, mutations are supported by writing a
+new reftable and atomically appending it to the stack:
+
+1.  Acquire `tables.list.lock`.
+2.  Read `tables.list` to determine current reftables.
+3.  Select `update_index` to be most recent file’s
+`max_update_index + 1`.
+4.  Prepare temp reftable `tmp_XXXXXX`, including log entries.
+5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
+6.  Copy `tables.list` to `tables.list.lock`, appending file from (5).
+7.  Rename `tables.list.lock` to `tables.list`.
+
+During step 4 the new file’s `min_update_index` and `max_update_index`
+are both set to the `update_index` selected by step 3. All log records
+for the transaction use the same `update_index` in their keys. This
+enables later correlation of which references were updated by the same
+transaction.
+
+Because a single `tables.list.lock` file is used to manage locking, the
+repository is single-threaded for writers. Writers may have to busy-spin
+(with backoff) around creating `tables.list.lock`, for up to an
+acceptable wait period, aborting if the repository is too busy to
+mutate. Application servers wrapped around repositories (e.g. Gerrit
+Code Review) can layer their own lock/wait queue to improve fairness to
+writers.
+
+Reference deletions
+^^^^^^^^^^^^^^^^^^^
+
+Deletion of any reference can be explicitly stored by setting the `type`
+to `0x0` and omitting the `value` field of the `ref_record`. This serves
+as a tombstone, overriding any assertions about the existence of the
+reference from earlier files in the stack.
+
+Compaction
+^^^^^^^^^^
+
+A partial stack of reftables can be compacted by merging references
+using a straightforward merge join across reftables, selecting the most
+recent value for output, and omitting deleted references that do not
+appear in remaining, lower reftables.
+
+A compacted reftable should set its `min_update_index` to the smallest
+of the input files’ `min_update_index`, and its `max_update_index`
+likewise to the largest input `max_update_index`.
+
+For sake of illustration, assume the stack currently consists of
+reftable files (from oldest to newest): A, B, C, and D. The compactor is
+going to compact B and C, leaving A and D alone.
+
+1.  Obtain lock `tables.list.lock` and read the `tables.list` file.
+2.  Obtain locks `B.lock` and `C.lock`. Ownership of these locks
+prevents other processes from trying to compact these files.
+3.  Release `tables.list.lock`.
+4.  Compact `B` and `C` into a temp file
+`${min_update_index}-${max_update_index}_XXXXXX`.
+5.  Reacquire lock `tables.list.lock`.
+6.  Verify that `B` and `C` are still in the stack, in that order. This
+should always be the case, assuming that other processes are adhering to
+the locking protocol.
+7.  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
+`${min_update_index}-${max_update_index}.ref`.
+8.  Write the new stack to `tables.list.lock`, replacing `B` and `C`
+with the file from (4).
+9.  Rename `tables.list.lock` to `tables.list`.
+10. Delete `B` and `C`, perhaps after a short sleep to avoid forcing
+readers to backtrack.
+
+This strategy permits compactions to proceed independently of updates.
+
+Each reftable (compacted or not) is uniquely identified by its name, so
+open reftables can be cached by their name.
+
+Alternatives considered
+~~~~~~~~~~~~~~~~~~~~~~~
+
+bzip packed-refs
+^^^^^^^^^^^^^^^^
+
+`bzip2` can significantly shrink a large packed-refs file (e.g. 62 MiB
+compresses to 23 MiB, 37%). However the bzip format does not support
+random access to a single reference. Readers must inflate and discard
+while performing a linear scan.
+
+Breaking packed-refs into chunks (individually compressing each chunk)
+would reduce the amount of data a reader must inflate, but still leaves
+the problem of indexing chunks to support readers efficiently locating
+the correct chunk.
+
+Given the compression achieved by reftable’s encoding, it does not seem
+necessary to add the complexity of bzip/gzip/zlib.
+
+Michael Haggerty’s alternate format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Michael Haggerty proposed
+https://public-inbox.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@mail.gmail.com/[an
+alternate] format to reftable on the Git mailing list. This format uses
+smaller chunks, without the restart table, and avoids block alignment
+with padding. Reflog entries immediately follow each ref, and are thus
+interleaved between refs.
+
+Performance testing indicates reftable is faster for lookups (51%
+faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
+larger file (+ ~3.2%, 28.3M vs 29.2M):
+
+[cols=">,>,>,>",options="header",]
+|=====================================
+|format |size |seek cold |seek hot
+|mh-alt |28.3 M |23.4 usec |11.2 usec
+|reftable |29.2 M |19.9 usec |5.4 usec
+|=====================================
+
+JGit Ketch RefTree
+^^^^^^^^^^^^^^^^^^
+
+https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg03073.html[JGit Ketch]
+proposed
+https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/[RefTree],
+an encoding of references inside Git tree objects stored as part of the
+repository’s object database.
+
+The RefTree format adds additional load on the object database storage
+layer (more loose objects, more objects in packs), and relies heavily on
+the packer’s delta compression to save space. Namespaces which are flat
+(e.g. thousands of tags in refs/tags) initially create very large loose
+objects, and so RefTree does not address the problem of copying many
+references to modify a handful.
+
+Flat namespaces are not efficiently searchable in RefTree, as tree
+objects in canonical formatting cannot be binary searched. This fails
+the need to handle a large number of references in a single namespace,
+such as GitHub’s `refs/pulls`, or a project with many tags.
+
+LMDB
+^^^^
+
+David Turner proposed
+https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/[using
+LMDB], as LMDB is lightweight (64k of runtime code) and GPL-compatible
+license.
+
+A downside of LMDB is its reliance on a single C implementation. This
+makes embedding inside JGit (a popular reimplemenation of Git)
+difficult, and hoisting onto virtual storage (for JGit DFS) virtually
+impossible.
+
+A common format that can be supported by all major Git implementations
+(git-core, JGit, libgit2) is strongly preferred.
+
+Future
+~~~~~~
+
+Longer hashes
+^^^^^^^^^^^^^
+
+Version will bump (e.g. 2) to indicate `value` uses a different object
+id length other than 20. The length could be stored in an expanded file
+header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v8 6/9] reftable: define version 2 of the spec to accomodate SHA256
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
                                 ` (4 preceding siblings ...)
  2020-04-01 11:28               ` [PATCH v8 5/9] reftable: file format documentation Jonathan Nieder via GitGitGadget
@ 2020-04-01 11:28               ` Han-Wen Nienhuys via GitGitGadget
  2020-04-01 11:28               ` [PATCH v8 7/9] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
                                 ` (4 subsequent siblings)
  10 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-01 11:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 50 ++++++++++++++++------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index 9fa4657d9ff..ee3f36ea851 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -193,8 +193,8 @@ and non-aligned files.
 Very small files (e.g. 1 only ref block) may omit `padding` and the ref
 index to reduce total file size.
 
-Header
-^^^^^^
+Header (version 1)
+^^^^^^^^^^^^^^^^^^
 
 A 24-byte header appears at the beginning of the file:
 
@@ -215,6 +215,24 @@ used in a stack for link:#Update-transactions[transactions], these
 fields can order the files such that the prior file’s
 `max_update_index + 1` is the next file’s `min_update_index`.
 
+Header (version 2)
+^^^^^^^^^^^^^^^^^^
+
+A 28-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+uint32( hash_id )
+....
+
+The header is identical to `version_number=1`, with the 4-byte hash ID
+("sha1" for SHA1 and "s256" for SHA-256) append to the header.
+
+
 First ref block
 ^^^^^^^^^^^^^^^
 
@@ -671,14 +689,8 @@ Footer
 After the last block of the file, a file footer is written. It begins
 like the file header, but is extended with additional data.
 
-A 68-byte footer appears at the end:
-
 ....
-    'REFT'
-    uint8( version_number = 1 )
-    uint24( block_size )
-    uint64( min_update_index )
-    uint64( max_update_index )
+    HEADER
 
     uint64( ref_index_position )
     uint64( (obj_position << 5) | obj_id_len )
@@ -701,12 +713,16 @@ obj blocks.
 * `obj_index_position`: byte position for the start of the obj index.
 * `log_index_position`: byte position for the start of the log index.
 
+The size of the footer is 68 bytes for version 1, and 72 bytes for
+version 2.
+
 Reading the footer
 ++++++++++++++++++
 
-Readers must seek to `file_length - 68` to access the footer. A trusted
-external source (such as `stat(2)`) is necessary to obtain
-`file_length`. When reading the footer, readers must verify:
+Readers must first read the file start to determine the version
+number. Then they seek to `file_length - FOOTER_LENGTH` to access the
+footer. A trusted external source (such as `stat(2)`) is necessary to
+obtain `file_length`. When reading the footer, readers must verify:
 
 * 4-byte magic is correct
 * 1-byte version number is recognized
@@ -1055,13 +1071,3 @@ impossible.
 
 A common format that can be supported by all major Git implementations
 (git-core, JGit, libgit2) is strongly preferred.
-
-Future
-~~~~~~
-
-Longer hashes
-^^^^^^^^^^^^^
-
-Version will bump (e.g. 2) to indicate `value` uses a different object
-id length other than 20. The length could be stored in an expanded file
-header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v8 7/9] reftable: clarify how empty tables should be written
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
                                 ` (5 preceding siblings ...)
  2020-04-01 11:28               ` [PATCH v8 6/9] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
@ 2020-04-01 11:28               ` Han-Wen Nienhuys via GitGitGadget
  2020-04-01 11:28               ` [PATCH v8 8/9] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                 ` (3 subsequent siblings)
  10 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-01 11:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The format allows for some ambiguity, as a lone footer also starts
with a valid file header. However, the current JGit code will barf on
this. This commit codifies this behavior into the standard.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index ee3f36ea851..df5cb5f11b8 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -731,6 +731,13 @@ version)
 
 Once verified, the other fields of the footer can be accessed.
 
+Empty tables
+++++++++++++
+
+A reftable may be empty. In this case, the file starts with a header
+and is immediately followed by a footer.
+
+
 Varint encoding
 ^^^^^^^^^^^^^^^
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v8 8/9] Add reftable library
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
                                 ` (6 preceding siblings ...)
  2020-04-01 11:28               ` [PATCH v8 7/9] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
@ 2020-04-01 11:28               ` Han-Wen Nienhuys via GitGitGadget
  2020-04-01 11:28               ` [PATCH v8 9/9] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                 ` (2 subsequent siblings)
  10 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-01 11:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. Its rationale and
specification is in the preceding commit.

This imports the upstream library as one big commit. For understanding
the code, it is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* block.{c,h} - reading and writing blocks.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 ++
 reftable/README.md     |   11 +
 reftable/VERSION       |    5 +
 reftable/basics.c      |  194 +++++++
 reftable/basics.h      |   35 ++
 reftable/block.c       |  433 +++++++++++++++
 reftable/block.h       |   76 +++
 reftable/blocksource.h |   22 +
 reftable/bytes.c       |    0
 reftable/config.h      |    1 +
 reftable/constants.h   |   21 +
 reftable/dump.c        |   97 ++++
 reftable/file.c        |   98 ++++
 reftable/iter.c        |  234 +++++++++
 reftable/iter.h        |   56 ++
 reftable/merged.c      |  299 +++++++++++
 reftable/merged.h      |   34 ++
 reftable/pq.c          |  115 ++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  756 +++++++++++++++++++++++++++
 reftable/reader.h      |   60 +++
 reftable/record.c      | 1098 ++++++++++++++++++++++++++++++++++++++
 reftable/record.h      |  106 ++++
 reftable/reftable.h    |  520 ++++++++++++++++++
 reftable/slice.c       |  219 ++++++++
 reftable/slice.h       |   42 ++
 reftable/stack.c       | 1132 ++++++++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   42 ++
 reftable/system.h      |   53 ++
 reftable/tree.c        |   67 +++
 reftable/tree.h        |   24 +
 reftable/update.sh     |   14 +
 reftable/writer.c      |  653 +++++++++++++++++++++++
 reftable/writer.h      |   45 ++
 reftable/zlib-compat.c |   92 ++++
 35 files changed, 6719 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..21500563fd4
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,11 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   sh reftable/update.sh
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..a811e02b389
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,5 @@
+commit 920f7413f7a25f308d9a203f584873f6753192ec
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Wed Apr 1 12:41:41 2020 +0200
+
+    C: remove stray debug printfs
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..3080cc4f500
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,194 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)((i)&0xff);
+}
+
+uint32_t get_be24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)((i)&0xff);
+}
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(int sz, int (*f)(int k, void *args), void *args)
+{
+	int lo = 0;
+	int hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		int mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	switch (err) {
+	case IO_ERROR:
+		return "I/O error";
+	case FORMAT_ERROR:
+		return "FORMAT_ERROR";
+	case NOT_EXIST_ERROR:
+		return "NOT_EXIST_ERROR";
+	case LOCK_ERROR:
+		return "LOCK_ERROR";
+	case API_ERROR:
+		return "API_ERROR";
+	case ZLIB_ERROR:
+		return "ZLIB_ERROR";
+	case -1:
+		return "general error";
+	default:
+		return "unknown error code";
+	}
+}
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..aa389782b62
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,35 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+void put_be24(byte *out, uint32_t i);
+uint32_t get_be24(byte *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+int binsearch(int sz, int (*f)(int k, void *args), void *args);
+
+void free_names(char **a);
+void parse_names(char *buf, int size, char ***namesp);
+int names_equal(char **a, char **b);
+int names_length(char **names);
+
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..3bb04306222
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,433 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "blocksource.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct reftable_block_writer *w, int n,
+				  bool restart, struct slice key);
+
+void block_writer_init(struct reftable_block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+}
+
+byte block_writer_type(struct reftable_block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct reftable_block_writer *w, struct record rec)
+{
+	struct slice empty = { 0 };
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = { 0 };
+	int n = 0;
+
+	record_key(rec, &key);
+	n = encode_key(&restart, out, last, key, record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	n = record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	reftable_free(slice_yield(&key));
+	return 0;
+
+err:
+	reftable_free(slice_yield(&key));
+	return -1;
+}
+
+int block_writer_register_restart(struct reftable_block_writer *w, int n,
+				  bool restart, struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct reftable_block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = { 0 };
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		slice_resize(&compressed, src_len);
+
+		while (1) {
+			uLongf dest_len = compressed.len;
+
+			zresult = compress2(compressed.buf, &dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				slice_resize(&compressed, 2 * compressed.len);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				reftable_free(slice_yield(&compressed));
+				return ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed.buf,
+			       dest_len);
+			w->next = dest_len + block_header_skip;
+			break;
+		}
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct reftable_block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct reftable_block_reader *br,
+		      struct reftable_block *block, uint32_t header_off,
+		      uint32_t table_block_size, int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	if (!is_block_type(typ)) {
+		return FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = { 0 };
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip;
+		uLongf src_len = block->len - block_header_skip;
+
+		slice_resize(&uncompressed, sz);
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			reftable_free(slice_yield(&uncompressed));
+			return ZLIB_ERROR;
+		}
+
+		block_source_return_block(block->source, block);
+		block->data = uncompressed.buf;
+		block->len = dst_len; /* XXX: 4 bytes missing? */
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_be16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct reftable_block_reader *br,
+					    int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct reftable_block_reader *br,
+			struct reftable_block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct slice key;
+	struct reftable_block_reader *r;
+};
+
+static int restart_key_less(int idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = { 0 };
+	struct slice last_key = { 0 };
+	byte unused_extra;
+	int n = decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_compare(a->key, rkey);
+		reftable_free(slice_yield(&rkey));
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct reftable_block_iter *dest,
+			  struct reftable_block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct reftable_block_iter *it, struct record rec)
+{
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	{
+		struct slice in = {
+			.buf = it->br->block.data + it->next_off,
+			.len = it->br->block_len - it->next_off,
+		};
+		struct slice start = in;
+		struct slice key = { 0 };
+		byte extra;
+		int n = decode_key(&key, &extra, it->last_key, in);
+		if (n < 0) {
+			return -1;
+		}
+
+		slice_consume(&in, n);
+		n = record_decode(rec, key, extra, in, it->br->hash_size);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+
+		slice_copy(&it->last_key, key);
+		it->next_off += start.len - in.len;
+		reftable_free(slice_yield(&key));
+		return 0;
+	}
+}
+
+int block_reader_first_key(struct reftable_block_reader *br, struct slice *key)
+{
+	struct slice empty = { 0 };
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct reftable_block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct reftable_block_iter *it)
+{
+	reftable_free(slice_yield(&it->last_key));
+}
+
+int block_reader_seek(struct reftable_block_reader *br,
+		      struct reftable_block_iter *it, struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		return -1;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	{
+		struct record rec = new_record(block_reader_type(br));
+		struct slice key = { 0 };
+		int result = 0;
+		int err = 0;
+		struct reftable_block_iter next = { 0 };
+		while (true) {
+			block_iter_copy_from(&next, it);
+
+			err = block_iter_next(&next, rec);
+			if (err < 0) {
+				result = -1;
+				goto exit;
+			}
+
+			record_key(rec, &key);
+			if (err > 0 || slice_compare(key, want) >= 0) {
+				result = 0;
+				goto exit;
+			}
+
+			block_iter_copy_from(it, &next);
+		}
+
+	exit:
+		reftable_free(slice_yield(&key));
+		reftable_free(slice_yield(&next.last_key));
+		record_clear(rec);
+		reftable_free(record_yield(&rec));
+
+		return result;
+	}
+}
+
+void block_writer_reset(struct reftable_block_writer *bw)
+{
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+void block_writer_clear(struct reftable_block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	reftable_free(slice_yield(&bw->last_key));
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..336d621ceb2
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,76 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+struct reftable_block_writer {
+	byte *buf;
+	uint32_t block_size;
+	uint32_t header_off;
+	int restart_interval;
+	int hash_size;
+
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+	struct slice last_key;
+	int entries;
+};
+
+void block_writer_init(struct reftable_block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+byte block_writer_type(struct reftable_block_writer *bw);
+int block_writer_add(struct reftable_block_writer *w, struct record rec);
+int block_writer_finish(struct reftable_block_writer *w);
+void block_writer_reset(struct reftable_block_writer *bw);
+void block_writer_clear(struct reftable_block_writer *bw);
+
+struct reftable_block_reader {
+	uint32_t header_off;
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint32_t full_block_size;
+	uint16_t restart_count;
+};
+
+struct reftable_block_iter {
+	uint32_t next_off;
+	struct reftable_block_reader *br;
+	struct slice last_key;
+};
+
+int block_reader_init(struct reftable_block_reader *br,
+		      struct reftable_block *bl, uint32_t header_off,
+		      uint32_t table_block_size, int hash_size);
+void block_reader_start(struct reftable_block_reader *br,
+			struct reftable_block_iter *it);
+int block_reader_seek(struct reftable_block_reader *br,
+		      struct reftable_block_iter *it, struct slice want);
+byte block_reader_type(struct reftable_block_reader *r);
+int block_reader_first_key(struct reftable_block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct reftable_block_iter *dest,
+			  struct reftable_block_iter *src);
+int block_iter_next(struct reftable_block_iter *it, struct record rec);
+int block_iter_seek(struct reftable_block_iter *it, struct slice want);
+void block_iter_close(struct reftable_block_iter *it);
+
+int header_size(int version);
+int footer_size(int version);
+
+#endif
diff --git a/reftable/blocksource.h b/reftable/blocksource.h
new file mode 100644
index 00000000000..1fdf0bf4557
--- /dev/null
+++ b/reftable/blocksource.h
@@ -0,0 +1,22 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCKSOURCE_H
+#define BLOCKSOURCE_H
+
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source source);
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_return_block(struct reftable_block_source source,
+			       struct reftable_block *ret);
+void block_source_close(struct reftable_block_source source);
+
+#endif
diff --git a/reftable/bytes.c b/reftable/bytes.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/reftable/config.h b/reftable/config.h
new file mode 100644
index 00000000000..40a8c178f10
--- /dev/null
+++ b/reftable/config.h
@@ -0,0 +1 @@
+/* empty */
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..5eee72c4c11
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 00000000000..c0065792a4f
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "reftable.h"
+
+static int dump_table(const char *tablename)
+{
+	struct block_source src = { 0 };
+	int err = block_source_from_file(&src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	struct reader *r = NULL;
+	err = new_reader(&r, src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct iterator it = { 0 };
+		err = reader_seek_ref(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+
+		struct ref_record ref = { 0 };
+		while (1) {
+			err = iterator_next_ref(it, &ref);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			ref_record_print(&ref, 20);
+		}
+		iterator_destroy(&it);
+		ref_record_clear(&ref);
+	}
+
+	{
+		struct iterator it = { 0 };
+		err = reader_seek_log(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+		struct log_record log = { 0 };
+		while (1) {
+			err = iterator_next_log(it, &log);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			log_record_print(&log, 20);
+		}
+		iterator_destroy(&it);
+		log_record_clear(&log);
+	}
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int opt;
+	const char *table = NULL;
+	while ((opt = getopt(argc, argv, "t:")) != -1) {
+		switch (opt) {
+		case 't':
+			table = xstrdup(optarg);
+			break;
+		case '?':
+			printf("usage: %s [-table tablefile]\n", argv[0]);
+			return 2;
+			break;
+		}
+	}
+
+	if (table != NULL) {
+		int err = dump_table(table);
+		if (err < 0) {
+			fprintf(stderr, "%s: %s: %s\n", argv[0], table,
+				error_str(err));
+			return 1;
+		}
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..82659960b30
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,98 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	{
+		struct file_block_source *p =
+			reftable_calloc(sizeof(struct file_block_source));
+		p->size = st.st_size;
+		p->fd = fd;
+
+		bs->ops = &file_vtable;
+		bs->arg = p;
+	}
+	return 0;
+}
+
+int reftable_fd_write(void *arg, byte *data, int sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..ed3bb5ea486
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,234 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator it)
+{
+	return it.ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct record rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator it, struct record rec)
+{
+	return it.ops->next(it.iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator it,
+			       struct reftable_ref_record *ref)
+{
+	struct record rec = { 0 };
+	record_from_ref(&rec, ref);
+	return iterator_next(it, rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator it,
+			       struct reftable_log_record *log)
+{
+	struct record rec = { 0 };
+	record_from_log(&rec, log);
+	return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	reftable_free(slice_yield(&fri->oid));
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec.data;
+
+	while (true) {
+		int err = reftable_iterator_next_ref(fri->it, ref);
+		if (err != 0) {
+			return err;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			int err = reftable_reader_seek_ref(fri->r, &it,
+							   ref->ref_name);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				return err;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reader_return_block(it->r, &it->block_reader.block);
+	reftable_free(slice_yield(&it->oid));
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reader_return_block(it->r, &it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec.data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..81ec0e14d0b
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,56 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct record rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator it, struct record rec);
+bool iterator_is_null(struct reftable_iterator it);
+
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_reader *r;
+	struct slice oid;
+	struct reftable_iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct reftable_block_reader block_reader;
+	struct reftable_block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..5e4558b8920
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,299 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct record rec = new_record(mi->typ);
+		int err = iterator_next(mi->stack[i], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			record_clear(rec);
+			reftable_free(record_yield(&rec));
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, int idx)
+{
+	if (iterator_is_null(mi->stack[idx])) {
+		return 0;
+	}
+
+	{
+		struct record rec = new_record(mi->typ);
+		struct pq_entry e = {
+			.rec = rec,
+			.index = idx,
+		};
+		int err = iterator_next(mi->stack[idx], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[idx]);
+			record_clear(rec);
+			reftable_free(record_yield(&rec));
+			return 0;
+		}
+
+		merged_iter_pqueue_add(&mi->pq, e);
+	}
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec)
+{
+	struct slice entry_key = { 0 };
+	struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
+	int err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+	record_key(entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = { 0 };
+		int err = 0, cmp = 0;
+
+		record_key(top.rec, &k);
+
+		cmp = slice_compare(k, entry_key);
+		reftable_free(slice_yield(&k));
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		record_clear(top.rec);
+		reftable_free(record_yield(&top.rec));
+	}
+
+	record_copy_from(rec, entry.rec, hash_size(mi->hash_id));
+	record_clear(entry.rec);
+	reftable_free(record_yield(&entry.rec));
+	reftable_free(slice_yield(&entry_key));
+	return 0;
+}
+
+static int merged_iter_next_void(void *p, struct record rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id)
+{
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reftable_reader *r = stack[i];
+		if (r->hash_id != hash_id) {
+			return FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
+			return FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reftable_reader_min_update_index(r);
+		}
+
+		last_max = reftable_reader_max_update_index(r);
+	}
+
+	{
+		struct reftable_merged_table m = {
+			.stack = stack,
+			.stack_len = n,
+			.min = first_min,
+			.max = last_max,
+			.hash_id = hash_id,
+		};
+
+		*dest = reftable_calloc(sizeof(struct reftable_merged_table));
+		**dest = m;
+	}
+	return 0;
+}
+
+void reftable_merged_table_close(struct reftable_merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reftable_reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+static int merged_table_seek_record(struct reftable_merged_table *mt,
+				    struct reftable_iterator *it,
+				    struct record rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = record_type(rec),
+		.hash_id = mt->hash_id,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n, err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..16ca3c7d901
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_reader **stack;
+	int stack_len;
+	uint32_t hash_id;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	int stack_len;
+	byte typ;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..210070ff4cd
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,115 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = { 0 };
+	struct slice bk = { 0 };
+	int cmp = 0;
+	record_key(a.rec, &ak);
+	record_key(b.rec, &bk);
+
+	cmp = slice_compare(ak, bk);
+
+	reftable_free(slice_yield(&ak));
+	reftable_free(slice_yield(&bk));
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		record_clear(pq->heap[i].rec);
+		reftable_free(record_yield(&pq->heap[i].rec));
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..b585a52ee14
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..4060bd139c4
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,756 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source source)
+{
+	return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source.ops->read_block(source.arg, dest, off, size);
+	dest->source = source;
+	return result;
+}
+
+void block_source_return_block(struct reftable_block_source source,
+			       struct reftable_block *blockp)
+{
+	source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(r->source, dest, off, sz);
+}
+
+void reader_return_block(struct reftable_reader *r, struct reftable_block *p)
+{
+	block_source_return_block(r->source, p);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	int err = 0;
+	if (memcmp(f, "REFT", 4)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	{
+		uint32_t computed_crc = crc32(0, footer, f - footer);
+		uint32_t file_crc = get_be32(f);
+		f += 4;
+		if (computed_crc != file_crc) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	{
+		byte first_block_typ = header[header_size(r->version)];
+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+		r->ref_offsets.offset = 0;
+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+					  r->log_offsets.offset > 0);
+		r->obj_offsets.present = r->obj_offsets.offset > 0;
+	}
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	block_source_return_block(r->source, &footer);
+	block_source_return_block(r->source, &header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct reftable_block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec.data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reader_return_block(ti->r, &ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r,
+			     struct reftable_block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reader_return_block(r, &block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reader_return_block(r, &block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct reftable_block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct reftable_block_reader *brp =
+			reftable_malloc(sizeof(struct reftable_block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec)
+{
+	if (record_type(rec) != ti->typ) {
+		return API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = { 0 };
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct record rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off, byte typ)
+{
+	struct reftable_block_reader br = { 0 };
+	struct reftable_block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = reftable_malloc(sizeof(struct reftable_block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			byte typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct record want)
+{
+	struct record rec = new_record(record_type(want));
+	struct slice want_key = { 0 };
+	struct slice got_key = { 0 };
+	struct table_iter next = { 0 };
+	int err = -1;
+	record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_compare(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	record_clear(rec);
+	reftable_free(record_yield(&rec));
+	reftable_free(slice_yield(&want_key));
+	reftable_free(slice_yield(&got_key));
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it, struct record rec)
+{
+	struct index_record want_index = { 0 };
+	struct record want_index_rec = { 0 };
+	struct index_record index_result = { 0 };
+	struct record index_result_rec = { 0 };
+	struct table_iter index_iter = { 0 };
+	struct table_iter next = { 0 };
+	int err = 0;
+
+	record_key(rec, &want_index.last_key);
+	record_from_index(&want_index_rec, &want_index);
+	record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	record_clear(want_index_rec);
+	record_clear(index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it, struct record rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = { 0 };
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct record rec)
+{
+	byte typ = record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return reader_seek(r, it, rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return reader_seek(r, it, rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    byte *oid)
+{
+	struct obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct obj_record got = { 0 };
+	struct record got_rec = { 0 };
+	int err = 0;
+
+	record_from_obj(&want_rec, &want);
+
+	err = reader_seek(r, &oit, want_rec);
+	if (err != 0) {
+		return err;
+	}
+
+	record_from_obj(&got_rec, &got);
+	err = iterator_next(oit, got_rec);
+	reftable_iterator_destroy(&oit);
+	if (err < 0) {
+		return err;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	{
+		struct indexed_table_ref_iter *itr = NULL;
+		err = new_indexed_table_ref_iter(&itr, r, oid,
+						 hash_size(r->hash_id),
+						 got.offsets, got.offset_len);
+		if (err < 0) {
+			record_clear(got_rec);
+			return err;
+		}
+		got.offsets = NULL;
+		record_clear(got_rec);
+
+		iterator_from_indexed_table_ref_iter(it, itr);
+	}
+
+	return 0;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      byte *oid, int oid_len)
+{
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_calloc(sizeof(struct filtering_ref_iterator));
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	filter->r = r;
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, byte *oid,
+			     int oid_len)
+{
+	if (r->obj_offsets.present) {
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	}
+	return reftable_reader_refs_for_unindexed(r, it, oid, oid_len);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..f878867ea71
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source source);
+
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_return_block(struct reftable_block_source source,
+			       struct reftable_block *ret);
+void block_source_close(struct reftable_block_source *source);
+
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+struct reftable_reader {
+	char *name;
+	struct reftable_block_source source;
+	uint32_t hash_id;
+
+	// Size of the file, excluding the footer.
+	uint64_t size;
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct record rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+void reader_return_block(struct reftable_reader *r, struct reftable_block *p);
+int reader_init_block_reader(struct reftable_reader *r,
+			     struct reftable_block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..a33beadc688
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1098 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = { 0 };
+	int i = 9;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	{
+		int n = sizeof(buf) - i - 1;
+		if (dest.len < n) {
+			return -1;
+		}
+		memcpy(dest.buf, &buf[i + 1], n);
+		return n;
+	}
+}
+
+int is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	slice_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	slice_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	slice_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	slice_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static byte reftable_ref_record_type(void)
+{
+	return BLOCK_TYPE_REF;
+}
+
+static void reftable_ref_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = xstrdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size);
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size);
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->ref_name);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static byte reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	slice_consume(&in, n);
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct slice dest = { 0 };
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+		seen_target = true;
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+struct record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = &reftable_ref_record_type,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+};
+
+static byte obj_record_type(void)
+{
+	return BLOCK_TYPE_OBJ;
+}
+
+static void obj_record_key(const void *r, struct slice *dest)
+{
+	const struct obj_record *rec = (const struct obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	const struct obj_record *src = (const struct obj_record *)src_rec;
+
+	*ref = *src;
+	ref->hash_prefix = reftable_malloc(ref->hash_prefix_len);
+	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
+
+	{
+		int olen = ref->offset_len * sizeof(uint64_t);
+		ref->offsets = reftable_malloc(olen);
+		memcpy(ref->offsets, src->offsets, olen);
+	}
+}
+
+static void obj_record_clear(void *rec)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	FREE_AND_NULL(ref->hash_prefix);
+	FREE_AND_NULL(ref->offsets);
+	memset(ref, 0, sizeof(struct obj_record));
+}
+
+static byte obj_record_val_type(const void *rec)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	{
+		uint64_t last = r->offsets[0];
+		int i = 0;
+		for (i = 1; i < r->offset_len; i++) {
+			int n = put_var_int(s, r->offsets[i] - last);
+			if (n < 0) {
+				return -1;
+			}
+			slice_consume(&s, n);
+			last = r->offsets[i];
+		}
+	}
+	return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct obj_record *r = (struct obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		slice_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+	slice_consume(&in, n);
+
+	{
+		uint64_t last = r->offsets[0];
+		int j = 1;
+		while (j < count) {
+			uint64_t delta = 0;
+			int n = get_var_int(&delta, in);
+			if (n < 0) {
+				return n;
+			}
+			slice_consume(&in, n);
+
+			last = r->offsets[j] = (delta + last);
+			j++;
+		}
+	}
+	return start.len - in.len;
+}
+
+struct record_vtable obj_record_vtable = {
+	.key = &obj_record_key,
+	.type = &obj_record_type,
+	.copy_from = &obj_record_copy_from,
+	.val_type = &obj_record_val_type,
+	.encode = &obj_record_encode,
+	.decode = &obj_record_decode,
+	.clear = &obj_record_clear,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size);
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size);
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static byte reftable_log_record_type(void)
+{
+	return BLOCK_TYPE_LOG;
+}
+
+static void reftable_log_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_be64(dest->buf + 1 + len, ts);
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	*dst = *src;
+	dst->ref_name = xstrdup(dst->ref_name);
+	dst->email = xstrdup(dst->email);
+	dst->name = xstrdup(dst->name);
+	dst->message = xstrdup(dst->message);
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->ref_name);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static byte reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static byte zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r)) {
+		return 0;
+	}
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	slice_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_be16(s.buf, r->tz_offset);
+	slice_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = { 0 };
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return FORMAT_ERROR;
+	}
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size) {
+		return FORMAT_ERROR;
+	}
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	slice_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_be16(in.buf);
+	slice_consume(&in, 2);
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	return start.len - in.len;
+
+error:
+	reftable_free(slice_yield(&dest));
+	return FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+struct record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = &reftable_log_record_type,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+};
+
+struct record new_record(byte typ)
+{
+	struct record rec;
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct obj_record *r =
+			reftable_calloc(sizeof(struct obj_record));
+		record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct index_record *r =
+			reftable_calloc(sizeof(struct index_record));
+		record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+static byte index_record_type(void)
+{
+	return BLOCK_TYPE_INDEX;
+}
+
+static void index_record_key(const void *r, struct slice *dest)
+{
+	struct index_record *rec = (struct index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+				   int hash_size)
+{
+	struct index_record *dst = (struct index_record *)rec;
+	struct index_record *src = (struct index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec)
+{
+	struct index_record *idx = (struct index_record *)rec;
+	reftable_free(slice_yield(&idx->last_key));
+}
+
+static byte index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int index_record_encode(const void *rec, struct slice out, int hash_size)
+{
+	const struct index_record *r = (const struct index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+			       struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct index_record *r = (struct index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+	.key = &index_record_key,
+	.type = &index_record_type,
+	.copy_from = &index_record_copy_from,
+	.val_type = &index_record_val_type,
+	.encode = &index_record_encode,
+	.decode = &index_record_decode,
+	.clear = &index_record_clear,
+};
+
+void record_key(struct record rec, struct slice *dest)
+{
+	rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec)
+{
+	return rec.ops->type();
+}
+
+int record_encode(struct record rec, struct slice dest, int hash_size)
+{
+	return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size)
+{
+	assert(src.ops->type() == rec.ops->type());
+
+	rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec)
+{
+	return rec.ops->val_type(rec.data);
+}
+
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size)
+{
+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec)
+{
+	return rec.ops->clear(rec.data);
+}
+
+void record_from_ref(struct record *rec, struct reftable_ref_record *ref_rec)
+{
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
+{
+	rec->data = obj_rec;
+	rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec)
+{
+	rec->data = index_rec;
+	rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct reftable_log_record *log_rec)
+{
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+void *record_yield(struct record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+struct reftable_ref_record *record_as_ref(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->ref_name,
+		      ((struct reftable_ref_record *)b)->ref_name);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..25ac7e0dd0f
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,106 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+
+/* Methods for records. */
+struct record_vtable {
+	/* encode the key of to a byte slice. */
+	void (*key)(const void *rec, struct slice *dest);
+
+	/* The record type of ('r' for ref). */
+	byte (*type)(void);
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	byte (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct record {
+	void *data;
+	struct record_vtable *ops;
+};
+
+int is_block_type(byte typ);
+
+struct record new_record(byte typ);
+
+extern struct record_vtable reftable_ref_record_vtable;
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra);
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in);
+
+/* index_record are used internally to speed up lookups. */
+struct index_record {
+	uint64_t offset; /* Offset of block */
+	struct slice last_key; /* Last key of the block. */
+};
+
+/* obj_record stores an object ID => ref mapping. */
+struct obj_record {
+	byte *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size);
+void record_clear(struct record rec);
+
+/* clear out the record, yielding the record data that was encapsulated. */
+void *record_yield(struct record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct reftable_ref_record *refrec);
+void record_from_log(struct record *rec, struct reftable_log_record *logrec);
+struct reftable_ref_record *record_as_ref(struct record ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..48722428b56
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,520 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref, int hash_size);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *ref_name;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log, int hash_size);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL ref_name.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	API_ERROR = -6,
+
+	/* Decompression error */
+	ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	EMPTY_TABLE_ERROR = -8,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, uint8_t *, int),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, uint8_t *data, int size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp.  The name is used for creating a
+ * stack. Typically, it is the basename of the file.
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to oid */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid,
+			     int oid_len);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id);
+
+/* returns the hash id used in this merged table. */
+uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *mt);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* closes readers for the merged tables */
+void reftable_merged_table_close(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/ 
+int reftable_stack_new_addition(struct reftable_addition **dest, struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */ 
+int reftable_addition_add(struct reftable_addition *add,
+                          int (*write_table)(struct reftable_writer *wr, void *arg),
+                          void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction; releases the lock if held. */
+void reftable_addition_close(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* reloads the stack if necessary. */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 00000000000..7cb9a5e60d0
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,219 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+void slice_consume(struct slice *s, int n)
+{
+	s->buf += n;
+	s->len -= n;
+}
+
+byte *slice_yield(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = { 0 };
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_write(struct slice *b, byte *data, int sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = reftable_realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_write_void(void *b, byte *data, int sz)
+{
+	return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf)
+{
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 00000000000..f4c8b3a8e9f
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,42 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+struct slice {
+	byte *buf;
+	int len;
+	int cap;
+};
+
+void slice_set_string(struct slice *dest, const char *);
+void slice_append_string(struct slice *dest, const char *);
+char *slice_to_string(struct slice src);
+const char *slice_as_string(struct slice *src);
+bool slice_equal(struct slice a, struct slice b);
+byte *slice_yield(struct slice *s);
+void slice_copy(struct slice *dest, struct slice src);
+void slice_consume(struct slice *s, int n);
+void slice_resize(struct slice *s, int l);
+int slice_compare(struct slice a, struct slice b);
+int slice_write(struct slice *b, byte *data, int sz);
+int slice_write_void(void *b, byte *data, int sz);
+void slice_append(struct slice *dest, struct slice add);
+int common_prefix_size(struct slice a, struct slice b);
+
+struct reftable_block_source;
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..83a7e92fe42
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1132 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct slice list_file_name = {};
+	int err = 0;
+	*dest = NULL;
+
+	slice_set_string(&list_file_name, dir);
+	slice_append_string(&list_file_name, "/reftables.list");
+
+	p->list_file = slice_to_string(list_file_name);
+	reftable_free(slice_yield(&list_file_name));
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload(p);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fread_lines(FILE *f, char ***namesp)
+{
+	long size = 0;
+	int err = fseek(f, 0, SEEK_END);
+	char *buf = NULL;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	size = ftell(f);
+	if (size < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = fseek(f, 0, SEEK_SET);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (fread(buf, 1, size, f) != size) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+exit:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	FILE *f = fopen(filename, "r");
+	int err = 0;
+	if (f == NULL) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return IO_ERROR;
+	}
+	err = fread_lines(f, namesp);
+	fclose(f);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged == NULL) {
+		return;
+	}
+
+	reftable_merged_table_close(st->merged);
+	reftable_merged_table_free(st->merged);
+	st->merged = NULL;
+
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_tables =
+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
+	int new_tables_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+
+	struct slice table_path = { 0 };
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_append_string(&table_path, "/");
+			slice_append_string(&table_path, name);
+
+			err = reftable_block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = reftable_new_reader(&rd, src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
+					st->config.hash_id);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	st->merged = new_merged;
+
+	{
+		int i = 0;
+		for (i = 0; i < cur_len; i++) {
+			if (cur[i] != NULL) {
+				reader_close(cur[i]);
+				reftable_reader_free(cur[i]);
+			}
+		}
+	}
+exit:
+	reftable_free(slice_yield(&table_path));
+	{
+		int i = 0;
+		for (i = 0; i < new_tables_len; i++) {
+			reader_close(new_tables[i]);
+		}
+	}
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+					     bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		err = read_lines(st->list_file, &names_after);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return -1;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 100;
+		usleep(delay);
+	}
+
+	return 0;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	return reftable_stack_reload_maybe_reuse(st, true);
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == LOCK_ERROR) {
+			err = reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	return reftable_stack_auto_compact(st);
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "%012" PRIx64 "-%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct slice lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	slice_set_string(&add->lock_file_name, st->list_file);
+	slice_append_string(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = LOCK_ERROR;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = LOCK_ERROR;
+		goto exit;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+exit:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct slice nm = {};
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_set_string(&nm, add->stack->list_file);
+		slice_append_string(&nm, "/");
+		slice_append_string(&nm, add->new_tables[i]);
+		unlink(slice_as_string(&nm));
+
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(slice_as_string(&add->lock_file_name));
+		reftable_free(slice_yield(&add->lock_file_name));
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct slice table_list = { 0 };
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0) {
+		goto exit;
+	}
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		slice_append_string(&table_list,
+				    add->stack->merged->stack[i]->name);
+		slice_append_string(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_append_string(&table_list, add->new_tables[i]);
+		slice_append_string(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	free(slice_yield(&table_list));
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&add->lock_file_name),
+		     add->stack->list_file);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+exit:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	*dest = reftable_malloc(sizeof(**dest));
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = { 0 };
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_addition_commit(&add);
+exit:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice tab_file_name = { 0 };
+	struct slice next_name = { 0 };
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+	uint64_t next_update_index = 0;
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, next_update_index, next_update_index);
+
+	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
+	slice_append_string(&temp_tab_file_name, "/");
+	slice_append(&temp_tab_file_name, next_name);
+	slice_append_string(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
+	if (tab_fd < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_writer_close(wr);
+	if (err == EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto exit;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	if (wr->min_update_index < next_update_index) {
+		err = API_ERROR;
+		goto exit;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_append_string(&next_name, ".ref");
+
+	slice_set_string(&tab_file_name, add->stack->reftable_dir);
+	slice_append_string(&tab_file_name, "/");
+	slice_append(&tab_file_name, next_name);
+
+	err = rename(slice_as_string(&temp_tab_file_name),
+		     slice_as_string(&tab_file_name));
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = slice_to_string(next_name);
+	add->new_tables_len++;
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_file_name));
+	}
+
+	reftable_free(slice_yield(&temp_tab_file_name));
+	reftable_free(slice_yield(&tab_file_name));
+	reftable_free(slice_yield(&next_name));
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reftable_reader_max_update_index(
+			       st->merged->stack[sz - 1]) +
+		       1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct slice next_name = { 0 };
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->merged->stack[first]),
+		    reftable_reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_append_string(temp_tab, "/");
+	slice_append(temp_tab, next_name);
+	slice_append_string(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = reftable_writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+	reftable_writer_free(wr);
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		reftable_free(slice_yield(temp_tab));
+	}
+	reftable_free(slice_yield(&next_name));
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_reader **subtabs = reftable_calloc(
+		sizeof(struct reftable_reader *) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr,
+				   st->merged->stack[first]->min_update_index,
+				   st->merged->stack[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto exit;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_ref(it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+	}
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_log(it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		/* XXX collect stats? */
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice new_table_name = { 0 };
+	struct slice lock_file_name = { 0 };
+	struct slice ref_list_contents = { 0 };
+	struct slice new_table_path = { 0 };
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_append_string(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_file_name = { 0 };
+		struct slice subtab_lock = { 0 };
+		slice_set_string(&subtab_file_name, st->reftable_dir);
+		slice_append_string(&subtab_file_name, "/");
+		slice_append_string(&subtab_file_name,
+				    reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_file_name);
+		slice_append_string(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				}
+				err = IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] =
+			(char *)slice_as_string(&subtab_file_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_append_string(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_append_string(&new_table_path, "/");
+
+	slice_append(&new_table_path, new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(slice_as_string(&temp_tab_file_name),
+			     slice_as_string(&new_table_path));
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		slice_append(&ref_list_contents, new_table_name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	{
+		char **p = delete_on_success;
+		while (*p) {
+			if (strcmp(*p, slice_as_string(&new_table_path))) {
+				unlink(*p);
+			}
+			p++;
+		}
+	}
+
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+exit:
+	free_names(delete_on_success);
+	{
+		char **p = subtable_locks;
+		while (*p) {
+			unlink(*p);
+			p++;
+		}
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	reftable_free(slice_yield(&new_table_name));
+	reftable_free(slice_yield(&new_table_path));
+	reftable_free(slice_yield(&ref_list_contents));
+	reftable_free(slice_yield(&temp_tab_file_name));
+	reftable_free(slice_yield(&lock_file_name));
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	assert(sz > 0);
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	if (next > 0) {
+		segs[next++] = cur;
+	}
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		/* overhead is 24 + 68 = 92. */
+		sizes[i] = st->merged->stack[i]->size - 91;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_ref(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_ref(it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, refname) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_log(it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..362d6a9c56f
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,42 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+
+	struct reftable_write_options config;
+
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..99f453bfec8
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#if 1 /* REFTABLE_IN_GITCORE */
+
+#include "git-compat-util.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_IN_GITCORE */
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef uint8_t byte;
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ *
+ * TODO: in git-core, this should fallback to uncompress2 if it is available.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..0341c865569
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,67 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	{
+		int res = compare(key, (*rootp)->key);
+		if (res < 0) {
+			return tree_search(key, &(*rootp)->left, compare,
+					   insert);
+		} else if (res > 0) {
+			return tree_search(key, &(*rootp)->right, compare,
+					   insert);
+		}
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..86a71715aee
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,24 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100644
index 00000000000..0fd90f89675
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,14 @@
+#!/bin/sh
+
+set -eux
+
+((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
+git clone https://github.com/google/reftable reftable-repo) && \
+cp reftable-repo/c/*.[ch] reftable/ && \
+cp reftable-repo/c/include/*.[ch] reftable/ && \
+cp reftable-repo/LICENSE reftable/ &&
+git --git-dir reftable-repo/.git show --no-patch origin/master \
+> reftable/VERSION && \
+sed -i~ 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|' reftable/system.h
+rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
+git add reftable/*.[ch]
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..c5b9f66686b
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,653 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, byte *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, byte *, int), void *writer_arg,
+		    struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+			     ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = reftable_calloc(sizeof(struct obj_index_tree_node));
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w, struct record rec)
+{
+	int result = -1;
+	struct slice key = { 0 };
+	int err = 0;
+	record_key(rec, &key);
+	if (slice_compare(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	reftable_free(slice_yield(&key));
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return API_ERROR;
+	}
+
+	record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = hash_size(w->opts.hash_id),
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = hash_size(w->opts.hash_id),
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	if (log->ref_name == NULL) {
+		return API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	{
+		struct record rec = { 0 };
+		int err;
+		record_from_log(&rec, log);
+		err = writer_add_record(w, rec);
+		return err;
+	}
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct record rec = { 0 };
+			record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, rec);
+			assert(err == 0);
+		}
+		for (i = 0; i < idx_len; i++) {
+			reftable_free(slice_yield(&idx[i].last_key));
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_block_stats *bstats =
+			writer_reftable_block_stats(w, typ);
+		bstats->index_blocks =
+			w->stats.idx_stats.blocks - before_blocks;
+		bstats->index_offset = index_start;
+		bstats->max_index_level = max_level;
+	}
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct record rec = { 0 };
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	reftable_free(slice_yield(&entry->hash));
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	byte footer[72];
+	byte *p = footer;
+
+	int err = writer_finish_public_section(w);
+	if (err != 0) {
+		goto exit;
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+	w->pending_padding = 0;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (w->stats.log_stats.entries + w->stats.ref_stats.entries == 0) {
+		err = EMPTY_TABLE_ERROR;
+		goto exit;
+	}
+
+exit:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	reftable_free(slice_yield(&w->last_key));
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		reftable_free(slice_yield(&w->index[i].last_key));
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index, sizeof(struct index_record) * w->index_cap);
+	}
+
+	{
+		struct index_record ir = {
+			.offset = w->next,
+		};
+		slice_copy(&ir.last_key, w->block_writer->last_key);
+		w->index[w->index_len] = ir;
+	}
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	block_writer_reset(&w->block_writer_data);
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..6564dee60b2
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,45 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, byte *, int);
+	void *write_arg;
+	int pending_padding;
+	struct slice last_key;
+
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	byte *block;
+	struct reftable_block_writer *block_writer;
+	struct reftable_block_writer block_writer_data;
+	struct index_record *index;
+	int index_len;
+	int index_cap;
+
+	/* tree for use with tsearch */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+int writer_flush_block(struct reftable_writer *w);
+void writer_clear_index(struct reftable_writer *w);
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v8 9/9] Reftable support for git-core
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
                                 ` (7 preceding siblings ...)
  2020-04-01 11:28               ` [PATCH v8 8/9] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-04-01 11:28               ` Han-Wen Nienhuys via GitGitGadget
  2020-04-15 23:29               ` [PATCH v8 0/9] Reftable support git-core Junio C Hamano
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
  10 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-01 11:28 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

TODO:

 * "git show-ref" shows "HEAD"
 * Resolve spots marked with XXX
 * Test strategy?

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   24 +-
 builtin/clone.c                               |    5 +-
 builtin/init-db.c                             |   49 +-
 cache.h                                       |    4 +-
 refs.c                                        |   20 +-
 refs.h                                        |    2 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1002 +++++++++++++++++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |   35 +
 13 files changed, 1138 insertions(+), 28 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index ef1ff2228f0..9c1a7f0b817 100644
--- a/Makefile
+++ b/Makefile
@@ -814,6 +814,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += command-list.h
 
@@ -960,6 +961,7 @@ LIB_OBJS += rebase-interactive.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1164,7 +1166,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2348,11 +2350,28 @@ VCSSVN_OBJS += vcs-svn/fast_export.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/bytes.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2489,6 +2508,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
diff --git a/builtin/clone.c b/builtin/clone.c
index d8b1f413aad..3bead96e447 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1109,7 +1109,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template,
+                GIT_HASH_UNKNOWN,
+		DEFAULT_REF_STORAGE, /* XXX */
+		INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 3b50b1aa0e5..888b421338d 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,7 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo, const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +188,7 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 || !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -238,6 +238,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -247,6 +248,15 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -259,15 +269,17 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
+	} else {
+		/*
+		 * XXX should check whether our ref backend matches the
+		 * original one; if not, either die() or convert
+		 */
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
@@ -381,7 +393,8 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash, unsigned int flags)
+	    const char *template_dir, int hash, const char *ref_storage_format,
+            unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -420,6 +433,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+        repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -448,6 +462,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -521,6 +537,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = DEFAULT_REF_STORAGE;
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -528,15 +545,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *object_format = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
@@ -646,9 +666,10 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo, ref_storage_format, flags);
 }
diff --git a/cache.h b/cache.h
index c77b95870a5..83c4908ba06 100644
--- a/cache.h
+++ b/cache.h
@@ -628,8 +628,9 @@ int path_inside_repo(const char *prefix, const char *path);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
+            const char *ref_storage_format,
 	    unsigned int flags);
-void initialize_repository_version(int hash_algo);
+void initialize_repository_version(int hash_algo, const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1043,6 +1044,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 1ab0bb54d3d..6530219762f 100644
--- a/refs.c
+++ b/refs.c
@@ -20,7 +20,7 @@
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1836,13 +1836,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1858,7 +1858,10 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs = ref_store_init(r->gitdir,
+				 r->ref_storage_format ? r->ref_storage_format :
+							 DEFAULT_REF_STORAGE,
+				 REF_STORE_ALL_CAPS);
 	return r->refs;
 }
 
@@ -1913,7 +1916,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1927,6 +1930,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1940,9 +1944,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index 87c9ec921b9..d7fbbe26e6a 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,8 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+#define DEFAULT_REF_STORAGE "files"
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 3490aac3a40..cafe5b97376 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -661,6 +661,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..6e845e9c649
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1002 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *reftable_dir;
+	struct reftable_stack *stack;
+};
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = {};
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						   unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", path);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/HEAD", path);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", path);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	safe_create_dir(refs->reftable_dir, 1);
+	return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri = (struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri = (struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri = (struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct reftable_merged_table *mt = NULL;
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else {
+		mt = reftable_stack_merged_table(refs->stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	(void)refs;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid = {};
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct ref_update *)a)->refname,
+		      ((struct ref_update *)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = reftable_stack_next_update_index(refs->stack);
+	int err = 0;
+	struct reftable_log_record *logs = calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct object_id out_oid = {};
+			int out_flags = 0;
+			/* Memory owned by refs_resolve_ref_unsafe, no need to
+			 * free(). */
+			const char *resolved = refs_resolve_ref_unsafe(
+				transaction->ref_store, u->refname, 0, &out_oid,
+				&out_flags);
+			struct reftable_ref_record ref = {};
+			ref.ref_name =
+				(char *)(resolved ? resolved : u->refname);
+			log->ref_name = ref.ref_name;
+			ref.value = u->new_oid.hash;
+			ref.update_index = ts;
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+exit:
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_add(refs->stack, &write_transaction_table, transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *err)
+{
+	return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = {};
+		struct reftable_ref_record current = {};
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.ref_name, &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	return reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_log_record log = {};
+		struct object_id new_oid = {};
+		struct object_id old_oid = {};
+		struct reftable_ref_record current = {};
+		reftable_stack_read_ref(create->refs->stack, create->refname, &current);
+
+		fill_reftable_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			reftable_writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_add(refs->stack, &write_create_symref_table, &arg);
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = {};
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto exit;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto exit;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = {};
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = {};
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+exit:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	return reftable_stack_add(refs->stack, &write_rename_table, &arg);
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	struct reftable_merged_table *mt = reftable_stack_merged_table(refs->stack);
+	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = {};
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = {};
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		err = reftable_iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid = {};
+			struct object_id new_oid = {};
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = {};
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = {};
+		err = reftable_iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (int i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid = {};
+		struct object_id new_oid = {};
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (int i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct reftable_log_record log = {};
+	struct reftable_iterator it = {};
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		struct object_id ooid = {};
+		struct object_id noid = {};
+
+		int err = reftable_iterator_next_log(it, &log);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_ref_record ref = {};
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_read_ref(refs->stack, refname, &ref);
+	if (err) {
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		/* XXX recurse? */
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else {
+		hashcpy(oid->hash, ref.value);
+	}
+exit:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
diff --git a/repository.c b/repository.c
index 6f7f6f002b1..087760bc184 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 040057dea6f..198d4aa0907 100644
--- a/repository.h
+++ b/repository.h
@@ -70,6 +70,9 @@ struct repository {
 	/* The store in which the refs are held. */
 	struct ref_store *refs;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 65fe5ecefbe..2ef970f9f88 100644
--- a/setup.c
+++ b/setup.c
@@ -468,9 +468,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -559,6 +561,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1204,8 +1207,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 00000000000..3ebf13c2f42
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+test_expect_success 'basic operation of reftable storage' '
+	git init --ref-storage=reftable repo && (
+	cd repo &&
+	echo "hello" >world.txt &&
+	git add world.txt &&
+	git commit -m "first post" &&
+	test_write_lines HEAD refs/heads/master >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect &&
+	for count in $(test_seq 1 10)
+	do
+		echo "hello" >>world.txt
+		git commit -m "number ${count}" world.txt ||
+		return 1
+	done &&
+	git gc &&
+	nfiles=$(ls -1 .git/reftable | wc -l ) &&
+	test ${nfiles} = "2" &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 11 output &&
+	grep "commit (initial): first post" output &&
+	grep "commit: number 10" output )
+'
+
+test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v8 0/9] Reftable support git-core
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
                                 ` (8 preceding siblings ...)
  2020-04-01 11:28               ` [PATCH v8 9/9] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-04-15 23:29               ` Junio C Hamano
  2020-04-18  3:22                 ` Danh Doan
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
  10 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-04-15 23:29 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This adds the reftable library, and hooks it up as a ref backend.

Since I queued the topic to 'pu', I needed to apply these fix-up
patches on top.  The change to the Makefile is about "make clean"
which forgot to clean the reftable library.  All the rest are ws
clean-ups.

There are build breakages reported on Windows, with possible fixes
(which IIRC were reported to segfault X-<), but I do not have URL or
message-IDs handy.

Thanks.


 Makefile            | 2 +-
 builtin/clone.c     | 2 +-
 builtin/init-db.c   | 4 ++--
 cache.h             | 2 +-
 reftable/reftable.h | 8 ++++----
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/Makefile b/Makefile
index 9c1a7f0b81..3d5a585d8f 100644
--- a/Makefile
+++ b/Makefile
@@ -3126,7 +3126,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index 3bead96e44..54b1441b95 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1110,7 +1110,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	init_db(git_dir, real_git_dir, option_template,
-                GIT_HASH_UNKNOWN,
+		GIT_HASH_UNKNOWN,
 		DEFAULT_REF_STORAGE, /* XXX */
 		INIT_DB_QUIET);
 
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 888b421338..70645a1848 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -394,7 +394,7 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash, const char *ref_storage_format,
-            unsigned int flags)
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -433,7 +433,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
-        repo_fmt.ref_storage = xstrdup(ref_storage_format);
+	repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
diff --git a/cache.h b/cache.h
index 83c4908ba0..89e257b7a5 100644
--- a/cache.h
+++ b/cache.h
@@ -628,7 +628,7 @@ int path_inside_repo(const char *prefix, const char *path);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
-            const char *ref_storage_format,
+	    const char *ref_storage_format,
 	    unsigned int flags);
 void initialize_repository_version(int hash_algo, const char *ref_storage_format);
 
diff --git a/reftable/reftable.h b/reftable/reftable.h
index 48722428b5..3086884be4 100644
--- a/reftable/reftable.h
+++ b/reftable/reftable.h
@@ -446,13 +446,13 @@ struct reftable_addition;
 /*
   returns a new transaction to add reftables to the given stack. As a side
   effect, the ref database is locked.
-*/ 
+*/
 int reftable_stack_new_addition(struct reftable_addition **dest, struct reftable_stack *st);
 
-/* Adds a reftable to transaction. */ 
+/* Adds a reftable to transaction. */
 int reftable_addition_add(struct reftable_addition *add,
-                          int (*write_table)(struct reftable_writer *wr, void *arg),
-                          void *arg);
+			  int (*write_table)(struct reftable_writer *wr, void *arg),
+			  void *arg);
 
 /* Commits the transaction, releasing the lock. */
 int reftable_addition_commit(struct reftable_addition *add);

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v8 0/9] Reftable support git-core
  2020-04-15 23:29               ` [PATCH v8 0/9] Reftable support git-core Junio C Hamano
@ 2020-04-18  3:22                 ` Danh Doan
  0 siblings, 0 replies; 409+ messages in thread
From: Danh Doan @ 2020-04-18  3:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On 2020-04-15 16:29:33-0700, Junio C Hamano <gitster@pobox.com> wrote:
> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > This adds the reftable library, and hooks it up as a ref backend.
> 
> Since I queued the topic to 'pu', I needed to apply these fix-up
> patches on top.  The change to the Makefile is about "make clean"
> which forgot to clean the reftable library.  All the rest are ws
> clean-ups.
> 
> There are build breakages reported on Windows, with possible fixes
> (which IIRC were reported to segfault X-<), but I do not have URL or
> message-IDs handy.

FYI, it's also segfault in Linux even if only the test is applied.
IOW, the code to fix breakages on Windows doesn't affect the segfault.

Here is the message: <nycvar.QRO.7.76.6.2004101604210.46@tvgsbejvaqbjf.bet>

-- 
Danh

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v9 00/10] Reftable support git-core
  2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
                                 ` (9 preceding siblings ...)
  2020-04-15 23:29               ` [PATCH v8 0/9] Reftable support git-core Junio C Hamano
@ 2020-04-20 21:14               ` Han-Wen Nienhuys via GitGitGadget
  2020-04-20 21:14                 ` [PATCH v9 01/10] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
                                   ` (11 more replies)
  10 siblings, 12 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-20 21:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

Feedback wanted:

 * spots marked with XXX in the Git source code.
   
   
 * what is a good test strategy? Could we have a CI flavor where we flip the
   default to reftable, and see how it fares?
   
   

v9

 * Added spec as technical doc.
 * Use 4-byte hash ID as API. Storage format for SHA256 is still pending
   discussion.

v10 

 * Serialize 4 byte hash ID to the version-2 format

v11

 * use reftable_ namespace prefix.
 * transaction API to provide locking.
 * reorganized record.{h,c}. More doc in reftable.h
 * support log record deletion.
 * pluggable malloc/free

v12 

 * More doc in slice.h, block.h, reader.h, writer.h, basics.h
 * Folded in Dscho's patch for struct initialization
 * Add "refs/" as default prefix for for_each_[raw]ref
 * Fix nullptr deref in log record copying.
 * Copy over the iteration prefix; this fixes "warning: bad replace ref
   name: e"
 * Some unresolved issues with "git gc"; disabled that part of t0031 for
   now.

Han-Wen Nienhuys (9):
  refs.h: clarify reflog iteration order
  Iterate over the "refs/" namespace in for_each_[raw]ref
  create .git/refs in files-backend.c
  refs: document how ref_iterator_advance_fn should handle symrefs
  Add .gitattributes for the reftable/ directory
  reftable: define version 2 of the spec to accomodate SHA256
  reftable: clarify how empty tables should be written
  Add reftable library
  Reftable support for git-core

Jonathan Nieder (1):
  reftable: file format documentation

 Documentation/Makefile                        |    1 +
 Documentation/technical/reftable.txt          | 1080 ++++++++++++++++
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   26 +-
 builtin/clone.c                               |    5 +-
 builtin/init-db.c                             |   51 +-
 cache.h                                       |    4 +-
 refs.c                                        |   26 +-
 refs.h                                        |    7 +-
 refs/files-backend.c                          |    6 +
 refs/refs-internal.h                          |    6 +
 refs/reftable-backend.c                       | 1021 +++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   11 +
 reftable/VERSION                              |    5 +
 reftable/basics.c                             |  189 +++
 reftable/basics.h                             |   53 +
 reftable/block.c                              |  425 +++++++
 reftable/block.h                              |  124 ++
 reftable/blocksource.h                        |   22 +
 reftable/bytes.c                              |    0
 reftable/config.h                             |    1 +
 reftable/constants.h                          |   21 +
 reftable/dump.c                               |   97 ++
 reftable/file.c                               |   98 ++
 reftable/iter.c                               |  234 ++++
 reftable/iter.h                               |   60 +
 reftable/merged.c                             |  307 +++++
 reftable/merged.h                             |   34 +
 reftable/pq.c                                 |  115 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  754 +++++++++++
 reftable/reader.h                             |   68 +
 reftable/record.c                             | 1119 ++++++++++++++++
 reftable/record.h                             |  117 ++
 reftable/reftable.h                           |  523 ++++++++
 reftable/slice.c                              |  224 ++++
 reftable/slice.h                              |   76 ++
 reftable/stack.c                              | 1132 +++++++++++++++++
 reftable/stack.h                              |   42 +
 reftable/system.h                             |   53 +
 reftable/tree.c                               |   67 +
 reftable/tree.h                               |   34 +
 reftable/update.sh                            |   14 +
 reftable/writer.c                             |  661 ++++++++++
 reftable/writer.h                             |   60 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |   35 +
 52 files changed, 9155 insertions(+), 35 deletions(-)
 create mode 100644 Documentation/technical/reftable.txt
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100755 t/t0031-reftable.sh


base-commit: 048abe1751e6727bfbacf7b80466d78e04631f94
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v9
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v9
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v8:

  1:  05634d26dd5 =  1:  b600d0bc6dd refs.h: clarify reflog iteration order
  -:  ----------- >  2:  89b68145e8f Iterate over the "refs/" namespace in for_each_[raw]ref
  2:  8d34e2c5c8b =  3:  91f1efe24ec create .git/refs in files-backend.c
  3:  d08f823844d =  4:  56f65a2a0d7 refs: document how ref_iterator_advance_fn should handle symrefs
  4:  ca95b3a371e =  5:  b432c1cc2ae Add .gitattributes for the reftable/ directory
  5:  dfc8b131294 =  6:  b1d94be691a reftable: file format documentation
  6:  45ab5361750 =  7:  3e418d27f67 reftable: define version 2 of the spec to accomodate SHA256
  7:  67ee5962e85 =  8:  6f62be4067b reftable: clarify how empty tables should be written
  8:  4f9bdd7312b !  9:  a30001ad1e8 Add reftable library
     @@ reftable/README.md (new)
      
       ## reftable/VERSION (new) ##
      @@
     -+commit 920f7413f7a25f308d9a203f584873f6753192ec
     ++commit b68690acad769d59e80cbb4f79c442ece133e6bd
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Wed Apr 1 12:41:41 2020 +0200
     ++Date:   Mon Apr 20 21:33:11 2020 +0200
      +
     -+    C: remove stray debug printfs
     ++    C: fix search/replace error in dump.c
      
       ## reftable/basics.c (new) ##
      @@
     @@ reftable/basics.c (new)
      +	out[1] = (uint8_t)((i)&0xff);
      +}
      +
     -+/*
     -+  find smallest index i in [0, sz) at which f(i) is true, assuming
     -+  that f is ascending. Return sz if f(i) is false for all indices.
     -+*/
      +int binsearch(int sz, int (*f)(int k, void *args), void *args)
      +{
      +	int lo = 0;
     @@ reftable/basics.c (new)
      +	return len;
      +}
      +
     -+/* parse a newline separated list of names. Empty names are discarded. */
      +void parse_names(char *buf, int size, char ***namesp)
      +{
      +	char **names = NULL;
     @@ reftable/basics.h (new)
      +#define true 1
      +#define false 0
      +
     ++/* Bigendian en/decoding of integers */
     ++
      +void put_be24(byte *out, uint32_t i);
      +uint32_t get_be24(byte *in);
      +void put_be16(uint8_t *out, uint16_t i);
      +
     ++/*
     ++  find smallest index i in [0, sz) at which f(i) is true, assuming
     ++  that f is ascending. Return sz if f(i) is false for all indices.
     ++*/
      +int binsearch(int sz, int (*f)(int k, void *args), void *args);
      +
     ++/*
     ++  Frees a NULL terminated array of malloced strings. The array itself is also
     ++  freed.
     ++ */
      +void free_names(char **a);
     ++
     ++/* parse a newline separated list of names. Empty names are discarded. */
      +void parse_names(char *buf, int size, char ***namesp);
     ++
     ++/* compares two NULL-terminated arrays of strings. */
      +int names_equal(char **a, char **b);
     ++
     ++/* returns the array size of a NULL-terminated array of strings. */
      +int names_length(char **names);
      +
     ++/* Allocation routines; they invoke the functions set through
     ++ * reftable_set_alloc() */
      +void *reftable_malloc(size_t sz);
      +void *reftable_realloc(void *p, size_t sz);
      +void reftable_free(void *p);
     @@ reftable/block.c (new)
      +	abort();
      +}
      +
     -+int block_writer_register_restart(struct reftable_block_writer *w, int n,
     -+				  bool restart, struct slice key);
     ++int block_writer_register_restart(struct block_writer *w, int n, bool restart,
     ++				  struct slice key);
      +
     -+void block_writer_init(struct reftable_block_writer *bw, byte typ, byte *buf,
     ++void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
      +		       uint32_t block_size, uint32_t header_off, int hash_size)
      +{
      +	bw->buf = buf;
     @@ reftable/block.c (new)
      +	bw->next = header_off + 4;
      +	bw->restart_interval = 16;
      +	bw->entries = 0;
     ++	bw->restart_len = 0;
     ++	bw->last_key.len = 0;
      +}
      +
     -+byte block_writer_type(struct reftable_block_writer *bw)
     ++byte block_writer_type(struct block_writer *bw)
      +{
      +	return bw->buf[bw->header_off];
      +}
      +
      +/* adds the record to the block. Returns -1 if it does not fit, 0 on
      +   success */
     -+int block_writer_add(struct reftable_block_writer *w, struct record rec)
     ++int block_writer_add(struct block_writer *w, struct record rec)
      +{
      +	struct slice empty = { 0 };
      +	struct slice last = w->entries % w->restart_interval == 0 ? empty :
     @@ reftable/block.c (new)
      +		goto err;
      +	}
      +
     -+	reftable_free(slice_yield(&key));
     ++	slice_clear(&key);
      +	return 0;
      +
      +err:
     -+	reftable_free(slice_yield(&key));
     ++	slice_clear(&key);
      +	return -1;
      +}
      +
     -+int block_writer_register_restart(struct reftable_block_writer *w, int n,
     -+				  bool restart, struct slice key)
     ++int block_writer_register_restart(struct block_writer *w, int n, bool restart,
     ++				  struct slice key)
      +{
      +	int rlen = w->restart_len;
      +	if (rlen >= MAX_RESTARTS) {
     @@ reftable/block.c (new)
      +	return 0;
      +}
      +
     -+int block_writer_finish(struct reftable_block_writer *w)
     ++int block_writer_finish(struct block_writer *w)
      +{
      +	int i = 0;
      +	for (i = 0; i < w->restart_len; i++) {
     @@ reftable/block.c (new)
      +			}
      +
      +			if (Z_OK != zresult) {
     -+				reftable_free(slice_yield(&compressed));
     ++				slice_clear(&compressed);
      +				return ZLIB_ERROR;
      +			}
      +
     @@ reftable/block.c (new)
      +	return w->next;
      +}
      +
     -+byte block_reader_type(struct reftable_block_reader *r)
     ++byte block_reader_type(struct block_reader *r)
      +{
      +	return r->block.data[r->header_off];
      +}
      +
     -+int block_reader_init(struct reftable_block_reader *br,
     -+		      struct reftable_block *block, uint32_t header_off,
     -+		      uint32_t table_block_size, int hash_size)
     ++int block_reader_init(struct block_reader *br, struct reftable_block *block,
     ++		      uint32_t header_off, uint32_t table_block_size,
     ++		      int hash_size)
      +{
      +	uint32_t full_block_size = table_block_size;
      +	byte typ = block->data[header_off];
     @@ reftable/block.c (new)
      +				    uncompressed.buf + block_header_skip,
      +				    &dst_len, block->data + block_header_skip,
      +				    &src_len)) {
     -+			reftable_free(slice_yield(&uncompressed));
     ++			slice_clear(&uncompressed);
      +			return ZLIB_ERROR;
      +		}
      +
     @@ reftable/block.c (new)
      +	return 0;
      +}
      +
     -+static uint32_t block_reader_restart_offset(struct reftable_block_reader *br,
     -+					    int i)
     ++static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
      +{
      +	return get_be24(br->restart_bytes + 3 * i);
      +}
      +
     -+void block_reader_start(struct reftable_block_reader *br,
     -+			struct reftable_block_iter *it)
     ++void block_reader_start(struct block_reader *br, struct block_iter *it)
      +{
      +	it->br = br;
      +	slice_resize(&it->last_key, 0);
     @@ reftable/block.c (new)
      +struct restart_find_args {
      +	int error;
      +	struct slice key;
     -+	struct reftable_block_reader *r;
     ++	struct block_reader *r;
      +};
      +
      +static int restart_key_less(int idx, void *args)
     @@ reftable/block.c (new)
      +
      +	{
      +		int result = slice_compare(a->key, rkey);
     -+		reftable_free(slice_yield(&rkey));
     ++		slice_clear(&rkey);
      +		return result;
      +	}
      +}
      +
     -+void block_iter_copy_from(struct reftable_block_iter *dest,
     -+			  struct reftable_block_iter *src)
     ++void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
      +{
      +	dest->br = src->br;
      +	dest->next_off = src->next_off;
     @@ reftable/block.c (new)
      +}
      +
      +/* return < 0 for error, 0 for OK, > 0 for EOF. */
     -+int block_iter_next(struct reftable_block_iter *it, struct record rec)
     ++int block_iter_next(struct block_iter *it, struct record rec)
      +{
      +	if (it->next_off >= it->br->block_len) {
      +		return 1;
     @@ reftable/block.c (new)
      +
      +		slice_copy(&it->last_key, key);
      +		it->next_off += start.len - in.len;
     -+		reftable_free(slice_yield(&key));
     ++		slice_clear(&key);
      +		return 0;
      +	}
      +}
      +
     -+int block_reader_first_key(struct reftable_block_reader *br, struct slice *key)
     ++int block_reader_first_key(struct block_reader *br, struct slice *key)
      +{
      +	struct slice empty = { 0 };
      +	int off = br->header_off + 4;
     @@ reftable/block.c (new)
      +	return 0;
      +}
      +
     -+int block_iter_seek(struct reftable_block_iter *it, struct slice want)
     ++int block_iter_seek(struct block_iter *it, struct slice want)
      +{
      +	return block_reader_seek(it->br, it, want);
      +}
      +
     -+void block_iter_close(struct reftable_block_iter *it)
     ++void block_iter_close(struct block_iter *it)
      +{
     -+	reftable_free(slice_yield(&it->last_key));
     ++	slice_clear(&it->last_key);
      +}
      +
     -+int block_reader_seek(struct reftable_block_reader *br,
     -+		      struct reftable_block_iter *it, struct slice want)
     ++int block_reader_seek(struct block_reader *br, struct block_iter *it,
     ++		      struct slice want)
      +{
      +	struct restart_find_args args = {
      +		.key = want,
     @@ reftable/block.c (new)
      +		struct slice key = { 0 };
      +		int result = 0;
      +		int err = 0;
     -+		struct reftable_block_iter next = { 0 };
     ++		struct block_iter next = { 0 };
      +		while (true) {
      +			block_iter_copy_from(&next, it);
      +
     @@ reftable/block.c (new)
      +		}
      +
      +	exit:
     -+		reftable_free(slice_yield(&key));
     -+		reftable_free(slice_yield(&next.last_key));
     -+		record_clear(rec);
     -+		reftable_free(record_yield(&rec));
     ++		slice_clear(&key);
     ++		slice_clear(&next.last_key);
     ++		record_destroy(&rec);
      +
      +		return result;
      +	}
      +}
      +
     -+void block_writer_reset(struct reftable_block_writer *bw)
     -+{
     -+	bw->restart_len = 0;
     -+	bw->last_key.len = 0;
     -+}
     -+
     -+void block_writer_clear(struct reftable_block_writer *bw)
     ++void block_writer_clear(struct block_writer *bw)
      +{
      +	FREE_AND_NULL(bw->restarts);
     -+	reftable_free(slice_yield(&bw->last_key));
     ++	slice_clear(&bw->last_key);
      +	/* the block is not owned. */
      +}
      
     @@ reftable/block.h (new)
      +#include "record.h"
      +#include "reftable.h"
      +
     -+struct reftable_block_writer {
     ++/*
     ++  Writes reftable blocks. The block_writer is reused across blocks to minimize
     ++  allocation overhead.
     ++*/
     ++struct block_writer {
      +	byte *buf;
      +	uint32_t block_size;
     ++
     ++	/* Offset ofof the global header. Nonzero in the first block only. */
      +	uint32_t header_off;
     ++
     ++	/* How often to restart keys. */
      +	int restart_interval;
      +	int hash_size;
      +
     ++	/* Offset of next byte to write. */
      +	uint32_t next;
      +	uint32_t *restarts;
      +	uint32_t restart_len;
      +	uint32_t restart_cap;
     ++
      +	struct slice last_key;
      +	int entries;
      +};
      +
     -+void block_writer_init(struct reftable_block_writer *bw, byte typ, byte *buf,
     ++/*
     ++  initializes the blockwriter to write `typ` entries, using `buf` as temporary
     ++  storage. `buf` is not owned by the block_writer. */
     ++void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
      +		       uint32_t block_size, uint32_t header_off, int hash_size);
     -+byte block_writer_type(struct reftable_block_writer *bw);
     -+int block_writer_add(struct reftable_block_writer *w, struct record rec);
     -+int block_writer_finish(struct reftable_block_writer *w);
     -+void block_writer_reset(struct reftable_block_writer *bw);
     -+void block_writer_clear(struct reftable_block_writer *bw);
      +
     -+struct reftable_block_reader {
     ++/*
     ++  returns the block type (eg. 'r' for ref records.
     ++*/
     ++byte block_writer_type(struct block_writer *bw);
     ++
     ++/* appends the record, or -1 if it doesn't fit. */
     ++int block_writer_add(struct block_writer *w, struct record rec);
     ++
     ++/* appends the key restarts, and compress the block if necessary. */
     ++int block_writer_finish(struct block_writer *w);
     ++
     ++/* clears out internally allocated block_writer members. */
     ++void block_writer_clear(struct block_writer *bw);
     ++
     ++/* Read a block. */
     ++struct block_reader {
     ++	/* offset of the block header; nonzero for the first block in a
     ++	 * reftable. */
      +	uint32_t header_off;
     ++
     ++	/* the memory block */
      +	struct reftable_block block;
      +	int hash_size;
      +
      +	/* size of the data, excluding restart data. */
      +	uint32_t block_len;
      +	byte *restart_bytes;
     -+	uint32_t full_block_size;
      +	uint16_t restart_count;
     ++
     ++	/* size of the data in the file. For log blocks, this is the compressed
     ++	 * size. */
     ++	uint32_t full_block_size;
      +};
      +
     -+struct reftable_block_iter {
     ++/* Iterate over entries in a block */
     ++struct block_iter {
     ++	/* offset within the block of the next entry to read. */
      +	uint32_t next_off;
     -+	struct reftable_block_reader *br;
     ++	struct block_reader *br;
     ++
     ++	/* key for last entry we read. */
      +	struct slice last_key;
      +};
      +
     -+int block_reader_init(struct reftable_block_reader *br,
     -+		      struct reftable_block *bl, uint32_t header_off,
     -+		      uint32_t table_block_size, int hash_size);
     -+void block_reader_start(struct reftable_block_reader *br,
     -+			struct reftable_block_iter *it);
     -+int block_reader_seek(struct reftable_block_reader *br,
     -+		      struct reftable_block_iter *it, struct slice want);
     -+byte block_reader_type(struct reftable_block_reader *r);
     -+int block_reader_first_key(struct reftable_block_reader *br, struct slice *key);
     -+
     -+void block_iter_copy_from(struct reftable_block_iter *dest,
     -+			  struct reftable_block_iter *src);
     -+int block_iter_next(struct reftable_block_iter *it, struct record rec);
     -+int block_iter_seek(struct reftable_block_iter *it, struct slice want);
     -+void block_iter_close(struct reftable_block_iter *it);
     ++/* initializes a block reader */
     ++int block_reader_init(struct block_reader *br, struct reftable_block *bl,
     ++		      uint32_t header_off, uint32_t table_block_size,
     ++		      int hash_size);
     ++
     ++/* Position `it` at start of the block */
     ++void block_reader_start(struct block_reader *br, struct block_iter *it);
     ++
     ++/* Position `it` to the `want` key in the block */
     ++int block_reader_seek(struct block_reader *br, struct block_iter *it,
     ++		      struct slice want);
     ++
     ++/* Returns the block type (eg. 'r' for refs) */
     ++byte block_reader_type(struct block_reader *r);
     ++
     ++/* Decodes the first key in the block */
     ++int block_reader_first_key(struct block_reader *br, struct slice *key);
     ++
     ++void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
     ++int block_iter_next(struct block_iter *it, struct record rec);
     ++
     ++/* Seek to `want` with in the block pointed to by `it` */
     ++int block_iter_seek(struct block_iter *it, struct slice want);
     ++
     ++/* deallocate memory for `it`. The block reader and its block is left intact. */
     ++void block_iter_close(struct block_iter *it);
      +
     ++/* size of file header, depending on format version */
      +int header_size(int version);
     ++
     ++/* size of file footer, depending on format version */
      +int footer_size(int version);
      +
      +#endif
     @@ reftable/iter.c (new)
      +{
      +	struct filtering_ref_iterator *fri =
      +		(struct filtering_ref_iterator *)iter_arg;
     -+	reftable_free(slice_yield(&fri->oid));
     ++	slice_clear(&fri->oid);
      +	reftable_iterator_destroy(&fri->it);
      +}
      +
     @@ reftable/iter.c (new)
      +	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
      +	block_iter_close(&it->cur);
      +	reader_return_block(it->r, &it->block_reader.block);
     -+	reftable_free(slice_yield(&it->oid));
     ++	slice_clear(&it->oid);
      +}
      +
      +static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
     @@ reftable/iter.h (new)
      +int iterator_next(struct reftable_iterator it, struct record rec);
      +bool iterator_is_null(struct reftable_iterator it);
      +
     ++/* iterator that produces only ref records that point to `oid` */
      +struct filtering_ref_iterator {
      +	bool double_check;
      +	struct reftable_reader *r;
     @@ reftable/iter.h (new)
      +void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
      +					  struct filtering_ref_iterator *);
      +
     ++/* iterator that produces only ref records that point to `oid`,
     ++   but using the object index.
     ++ */
      +struct indexed_table_ref_iter {
      +	struct reftable_reader *r;
      +	struct slice oid;
     @@ reftable/iter.h (new)
      +	/* Points to the next offset to read. */
      +	int offset_idx;
      +	int offset_len;
     -+	struct reftable_block_reader block_reader;
     -+	struct reftable_block_iter cur;
     ++	struct block_reader block_reader;
     ++	struct block_iter cur;
      +	bool finished;
      +};
      +
     @@ reftable/merged.c (new)
      +
      +		if (err > 0) {
      +			reftable_iterator_destroy(&mi->stack[i]);
     -+			record_clear(rec);
     -+			reftable_free(record_yield(&rec));
     ++			record_destroy(&rec);
      +		} else {
      +			struct pq_entry e = {
      +				.rec = rec,
     @@ reftable/merged.c (new)
      +
      +		if (err > 0) {
      +			reftable_iterator_destroy(&mi->stack[idx]);
     -+			record_clear(rec);
     -+			reftable_free(record_yield(&rec));
     ++			record_destroy(&rec);
      +			return 0;
      +		}
      +
     @@ reftable/merged.c (new)
      +		return err;
      +	}
      +
     ++
     ++        /*
     ++          One can also use reftable as datacenter-local storage, where the ref
     ++          database is maintained in globally consistent database (eg.
     ++          CockroachDB or Spanner). In this scenario, replication delays together
     ++          with compaction may cause newer tables to contain older entries. In
     ++          such a deployment, the loop below must be changed to collect all
     ++          entries for the same key, and return new the newest one.
     ++        */
      +	record_key(entry.rec, &entry_key);
      +	while (!merged_iter_pqueue_is_empty(mi->pq)) {
      +		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
     @@ reftable/merged.c (new)
      +		record_key(top.rec, &k);
      +
      +		cmp = slice_compare(k, entry_key);
     -+		reftable_free(slice_yield(&k));
     ++		slice_clear(&k);
      +
      +		if (cmp > 0) {
      +			break;
     @@ reftable/merged.c (new)
      +	record_copy_from(rec, entry.rec, hash_size(mi->hash_id));
      +	record_clear(entry.rec);
      +	reftable_free(record_yield(&entry.rec));
     -+	reftable_free(slice_yield(&entry_key));
     ++	slice_clear(&entry_key);
      +	return 0;
      +}
      +
     @@ reftable/merged.c (new)
      +		return err;
      +	}
      +
     -+	merged.stack_len = n, err = merged_iter_init(&merged);
     ++	merged.stack_len = n;
     ++	err = merged_iter_init(&merged);
      +	if (err < 0) {
      +		merged_iter_close(&merged);
      +		return err;
     @@ reftable/pq.c (new)
      +
      +	cmp = slice_compare(ak, bk);
      +
     -+	reftable_free(slice_yield(&ak));
     -+	reftable_free(slice_yield(&bk));
     ++	slice_clear(&ak);
     ++	slice_clear(&bk);
      +
      +	if (cmp == 0) {
      +		return a.index > b.index;
     @@ reftable/reader.c (new)
      +	struct reftable_reader *r;
      +	byte typ;
      +	uint64_t block_off;
     -+	struct reftable_block_iter bi;
     ++	struct block_iter bi;
      +	bool finished;
      +};
      +
     @@ reftable/reader.c (new)
      +	return result;
      +}
      +
     -+int reader_init_block_reader(struct reftable_reader *r,
     -+			     struct reftable_block_reader *br,
     ++int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
      +			     uint64_t next_off, byte want_typ)
      +{
      +	int32_t guess_block_size = r->block_size ? r->block_size :
     @@ reftable/reader.c (new)
      +				 struct table_iter *src)
      +{
      +	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
     -+	struct reftable_block_reader br = { 0 };
     ++	struct block_reader br = { 0 };
      +	int err = 0;
      +
      +	dest->r = src->r;
     @@ reftable/reader.c (new)
      +	}
      +
      +	{
     -+		struct reftable_block_reader *brp =
     -+			reftable_malloc(sizeof(struct reftable_block_reader));
     ++		struct block_reader *brp =
     ++			reftable_malloc(sizeof(struct block_reader));
      +		*brp = br;
      +
      +		dest->finished = false;
     @@ reftable/reader.c (new)
      +static int reader_table_iter_at(struct reftable_reader *r,
      +				struct table_iter *ti, uint64_t off, byte typ)
      +{
     -+	struct reftable_block_reader br = { 0 };
     -+	struct reftable_block_reader *brp = NULL;
     ++	struct block_reader br = { 0 };
     ++	struct block_reader *brp = NULL;
      +
      +	int err = reader_init_block_reader(r, &br, off, typ);
      +	if (err != 0) {
      +		return err;
      +	}
      +
     -+	brp = reftable_malloc(sizeof(struct reftable_block_reader));
     ++	brp = reftable_malloc(sizeof(struct block_reader));
      +	*brp = br;
      +	ti->r = r;
      +	ti->typ = block_reader_type(brp);
     @@ reftable/reader.c (new)
      +
      +exit:
      +	block_iter_close(&next.bi);
     -+	record_clear(rec);
     -+	reftable_free(record_yield(&rec));
     -+	reftable_free(slice_yield(&want_key));
     -+	reftable_free(slice_yield(&got_key));
     ++	record_destroy(&rec);
     ++	slice_clear(&want_key);
     ++	slice_clear(&got_key);
      +	return err;
      +}
      +
     @@ reftable/reader.h (new)
      +			       struct reftable_block *ret);
      +void block_source_close(struct reftable_block_source *source);
      +
     ++/* metadata for a block type */
      +struct reftable_reader_offsets {
      +	bool present;
      +	uint64_t offset;
      +	uint64_t index_offset;
      +};
      +
     ++/* The state for reading a reftable file. */
      +struct reftable_reader {
     ++	/* for convience, associate a name with the instance. */
      +	char *name;
      +	struct reftable_block_source source;
     -+	uint32_t hash_id;
      +
     -+	// Size of the file, excluding the footer.
     ++	/* Size of the file, excluding the footer. */
      +	uint64_t size;
     ++
     ++	/* 'sha1' for SHA1, 's256' for SHA-256 */
     ++	uint32_t hash_id;
     ++
      +	uint32_t block_size;
      +	uint64_t min_update_index;
      +	uint64_t max_update_index;
     ++	/* Length of the OID keys in the 'o' section */
      +	int object_id_len;
      +	int version;
      +
     @@ reftable/reader.h (new)
      +void reader_close(struct reftable_reader *r);
      +const char *reader_name(struct reftable_reader *r);
      +void reader_return_block(struct reftable_reader *r, struct reftable_block *p);
     -+int reader_init_block_reader(struct reftable_reader *r,
     -+			     struct reftable_block_reader *br,
     ++
     ++/* initialize a block reader to read from `r` */
     ++int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
      +			     uint64_t next_off, byte want_typ);
      +
      +#endif
     @@ reftable/record.c (new)
      +		(const struct reftable_log_record *)src_rec;
      +
      +	*dst = *src;
     -+	dst->ref_name = xstrdup(dst->ref_name);
     -+	dst->email = xstrdup(dst->email);
     -+	dst->name = xstrdup(dst->name);
     -+	dst->message = xstrdup(dst->message);
     ++	if (dst->ref_name != NULL) {
     ++		dst->ref_name = xstrdup(dst->ref_name);
     ++	}
     ++	if (dst->email != NULL) {
     ++		dst->email = xstrdup(dst->email);
     ++	}
     ++	if (dst->name != NULL) {
     ++		dst->name = xstrdup(dst->name);
     ++	}
     ++	if (dst->message != NULL) {
     ++		dst->message = xstrdup(dst->message);
     ++	}
     ++
      +	if (dst->new_hash != NULL) {
      +		dst->new_hash = reftable_malloc(hash_size);
      +		memcpy(dst->new_hash, src->new_hash, hash_size);
     @@ reftable/record.c (new)
      +	return start.len - in.len;
      +
      +error:
     -+	reftable_free(slice_yield(&dest));
     ++	slice_clear(&dest);
      +	return FORMAT_ERROR;
      +}
      +
     @@ reftable/record.c (new)
      +	return rec;
      +}
      +
     ++void *record_yield(struct record *rec)
     ++{
     ++	void *p = rec->data;
     ++	rec->data = NULL;
     ++	return p;
     ++}
     ++
     ++void record_destroy(struct record *rec)
     ++{
     ++	record_clear(*rec);
     ++	reftable_free(record_yield(rec));
     ++}
     ++
      +static byte index_record_type(void)
      +{
      +	return BLOCK_TYPE_INDEX;
     @@ reftable/record.c (new)
      +static void index_record_clear(void *rec)
      +{
      +	struct index_record *idx = (struct index_record *)rec;
     -+	reftable_free(slice_yield(&idx->last_key));
     ++	slice_clear(&idx->last_key);
      +}
      +
      +static byte index_record_val_type(const void *rec)
     @@ reftable/record.c (new)
      +	rec->ops = &reftable_log_record_vtable;
      +}
      +
     -+void *record_yield(struct record *rec)
     -+{
     -+	void *p = rec->data;
     -+	rec->data = NULL;
     -+	return p;
     -+}
     -+
      +struct reftable_ref_record *record_as_ref(struct record rec)
      +{
      +	assert(record_type(rec) == BLOCK_TYPE_REF);
      +	return (struct reftable_ref_record *)rec.data;
      +}
      +
     ++struct reftable_log_record *record_as_log(struct record rec)
     ++{
     ++	assert(record_type(rec) == BLOCK_TYPE_LOG);
     ++	return (struct reftable_log_record *)rec.data;
     ++}
     ++
      +static bool hash_equal(byte *a, byte *b, int hash_size)
      +{
      +	if (a != NULL && b != NULL) {
     @@ reftable/record.h (new)
      +	struct record_vtable *ops;
      +};
      +
     ++/* returns true for recognized block types. Block start with the block type. */
      +int is_block_type(byte typ);
      +
     ++/* creates a malloced record of the given type. Dispose with record_destroy */
      +struct record new_record(byte typ);
      +
      +extern struct record_vtable reftable_ref_record_vtable;
      +
     ++/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
     ++   number of bytes written. */
      +int encode_key(bool *restart, struct slice dest, struct slice prev_key,
      +	       struct slice key, byte extra);
     ++
     ++/* Decode into `key` and `extra` from `in` */
      +int decode_key(struct slice *key, byte *extra, struct slice last_key,
      +	       struct slice in);
      +
     @@ reftable/record.h (new)
      +int record_encode(struct record rec, struct slice dest, int hash_size);
      +int record_decode(struct record rec, struct slice key, byte extra,
      +		  struct slice src, int hash_size);
     ++
     ++/* zeroes out the embedded record */
      +void record_clear(struct record rec);
      +
      +/* clear out the record, yielding the record data that was encapsulated. */
      +void *record_yield(struct record *rec);
      +
     ++/* clear and deallocate embedded record, and zero `rec`. */
     ++void record_destroy(struct record *rec);
     ++
      +/* initialize generic records from concrete records. The generic record should
      + * be zeroed out. */
     -+
      +void record_from_obj(struct record *rec, struct obj_record *objrec);
      +void record_from_index(struct record *rec, struct index_record *idxrec);
      +void record_from_ref(struct record *rec, struct reftable_ref_record *refrec);
      +void record_from_log(struct record *rec, struct reftable_log_record *logrec);
      +struct reftable_ref_record *record_as_ref(struct record ref);
     ++struct reftable_log_record *record_as_log(struct record ref);
      +
      +/* for qsort. */
      +int reftable_ref_record_compare_name(const void *a, const void *b);
     @@ reftable/reftable.h (new)
      +/*
      +  returns a new transaction to add reftables to the given stack. As a side
      +  effect, the ref database is locked.
     -+*/ 
     -+int reftable_stack_new_addition(struct reftable_addition **dest, struct reftable_stack *st);
     ++*/
     ++int reftable_stack_new_addition(struct reftable_addition **dest,
     ++				struct reftable_stack *st);
      +
     -+/* Adds a reftable to transaction. */ 
     ++/* Adds a reftable to transaction. */
      +int reftable_addition_add(struct reftable_addition *add,
     -+                          int (*write_table)(struct reftable_writer *wr, void *arg),
     -+                          void *arg);
     ++			  int (*write_table)(struct reftable_writer *wr,
     ++					     void *arg),
     ++			  void *arg);
      +
      +/* Commits the transaction, releasing the lock. */
      +int reftable_addition_commit(struct reftable_addition *add);
      +
     -+/* Release all non-committed data from the transaction; releases the lock if held. */
     ++/* Release all non-committed data from the transaction; releases the lock if
     ++ * held. */
      +void reftable_addition_close(struct reftable_addition *add);
      +
      +/* add a new table to the stack. The write_table function must call
     @@ reftable/slice.c (new)
      +	return p;
      +}
      +
     ++void slice_clear(struct slice *s)
     ++{
     ++	reftable_free(slice_yield(s));
     ++}
     ++
      +void slice_copy(struct slice *dest, struct slice src)
      +{
      +	slice_resize(dest, src.len);
     @@ reftable/slice.h (new)
      +#include "basics.h"
      +#include "reftable.h"
      +
     ++/*
     ++  provides bounds-checked byte ranges.
     ++  To use, initialize as "slice x = {0};"
     ++ */
      +struct slice {
     -+	byte *buf;
      +	int len;
      +	int cap;
     ++	byte *buf;
      +};
      +
     -+void slice_set_string(struct slice *dest, const char *);
     -+void slice_append_string(struct slice *dest, const char *);
     ++void slice_set_string(struct slice *dest, const char *src);
     ++void slice_append_string(struct slice *dest, const char *src);
     ++/* Set length to 0, but retain buffer */
     ++void slice_clear(struct slice *slice);
     ++
     ++/* Return a malloced string for `src` */
      +char *slice_to_string(struct slice src);
     ++
     ++/* Ensure that `buf` is \0 terminated. */
      +const char *slice_as_string(struct slice *src);
     ++
     ++/* Compare slices */
      +bool slice_equal(struct slice a, struct slice b);
     ++
     ++/* Return `buf`, clearing out `s` */
      +byte *slice_yield(struct slice *s);
     ++
     ++/* Copy bytes */
      +void slice_copy(struct slice *dest, struct slice src);
     ++
     ++/* Advance `buf` by `n`, and decrease length. A copy of the slice
     ++   should be kept for deallocating the slice. */
      +void slice_consume(struct slice *s, int n);
     ++
     ++/* Set length of the slice to `l` */
      +void slice_resize(struct slice *s, int l);
     ++
     ++/* Signed comparison */
      +int slice_compare(struct slice a, struct slice b);
     -+int slice_write(struct slice *b, byte *data, int sz);
     -+int slice_write_void(void *b, byte *data, int sz);
     ++
     ++/* Append `data` to the `dest` slice.  */
     ++int slice_write(struct slice *dest, byte *data, int sz);
     ++
     ++/* Append `add` to `dest. */
      +void slice_append(struct slice *dest, struct slice add);
     ++
     ++/* Like slice_write, but suitable for passing to reftable_new_writer
     ++ */
     ++int slice_write_void(void *b, byte *data, int sz);
     ++
     ++/* Find the longest shared prefix size of `a` and `b` */
      +int common_prefix_size(struct slice a, struct slice b);
      +
      +struct reftable_block_source;
     ++
     ++/* Create an in-memory block source for reading reftables */
      +void block_source_from_slice(struct reftable_block_source *bs,
      +			     struct slice *buf);
      +
     @@ reftable/stack.c (new)
      +	slice_append_string(&list_file_name, "/reftables.list");
      +
      +	p->list_file = slice_to_string(list_file_name);
     -+	reftable_free(slice_yield(&list_file_name));
     ++	slice_clear(&list_file_name);
      +	p->reftable_dir = xstrdup(dir);
      +	p->config = config;
      +
     @@ reftable/stack.c (new)
      +		}
      +	}
      +exit:
     -+	reftable_free(slice_yield(&table_path));
     ++	slice_clear(&table_path);
      +	{
      +		int i = 0;
      +		for (i = 0; i < new_tables_len; i++) {
     @@ reftable/stack.c (new)
      +	}
      +	if (add->lock_file_name.len > 0) {
      +		unlink(slice_as_string(&add->lock_file_name));
     -+		reftable_free(slice_yield(&add->lock_file_name));
     ++		slice_clear(&add->lock_file_name);
      +	}
      +
      +	free_names(add->names);
     @@ reftable/stack.c (new)
      +	}
      +
      +	err = write(add->lock_file_fd, table_list.buf, table_list.len);
     -+	free(slice_yield(&table_list));
     ++	slice_clear(&table_list);
      +	if (err < 0) {
      +		err = IO_ERROR;
      +		goto exit;
     @@ reftable/stack.c (new)
      +		unlink(slice_as_string(&temp_tab_file_name));
      +	}
      +
     -+	reftable_free(slice_yield(&temp_tab_file_name));
     -+	reftable_free(slice_yield(&tab_file_name));
     -+	reftable_free(slice_yield(&next_name));
     ++	slice_clear(&temp_tab_file_name);
     ++	slice_clear(&tab_file_name);
     ++	slice_clear(&next_name);
      +	reftable_writer_free(wr);
      +	return err;
      +}
     @@ reftable/stack.c (new)
      +	}
      +	if (err != 0 && temp_tab->len > 0) {
      +		unlink(slice_as_string(temp_tab));
     -+		reftable_free(slice_yield(temp_tab));
     ++		slice_clear(temp_tab);
      +	}
     -+	reftable_free(slice_yield(&next_name));
     ++	slice_clear(&next_name);
      +	return err;
      +}
      +
     @@ reftable/stack.c (new)
      +	if (have_lock) {
      +		unlink(slice_as_string(&lock_file_name));
      +	}
     -+	reftable_free(slice_yield(&new_table_name));
     -+	reftable_free(slice_yield(&new_table_path));
     -+	reftable_free(slice_yield(&ref_list_contents));
     -+	reftable_free(slice_yield(&temp_tab_file_name));
     -+	reftable_free(slice_yield(&lock_file_name));
     ++	slice_clear(&new_table_name);
     ++	slice_clear(&new_table_path);
     ++	slice_clear(&ref_list_contents);
     ++	slice_clear(&temp_tab_file_name);
     ++	slice_clear(&lock_file_name);
      +	return err;
      +}
      +
     @@ reftable/tree.h (new)
      +#ifndef TREE_H
      +#define TREE_H
      +
     ++/* tree_node is a generic binary search tree. */
      +struct tree_node {
      +	void *key;
      +	struct tree_node *left, *right;
      +};
      +
     ++/* looks for `key` in `rootp` using `compare` as comparison function. If insert
     ++   is set, insert the key if it's not found. Else, return NULL.
     ++*/
      +struct tree_node *tree_search(void *key, struct tree_node **rootp,
      +			      int (*compare)(const void *, const void *),
      +			      int insert);
     ++
     ++/* performs an infix walk of the tree. */
      +void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
      +		void *arg);
     ++
     ++/*
     ++  deallocates the tree nodes recursively. Keys should be deallocated separately
     ++  by walking over the tree. */
      +void tree_free(struct tree_node *t);
      +
      +#endif
     @@ reftable/writer.c (new)
      +
      +	result = 0;
      +exit:
     -+	reftable_free(slice_yield(&key));
     ++	slice_clear(&key);
      +	return result;
      +}
      +
     @@ reftable/writer.c (new)
      +			assert(err == 0);
      +		}
      +		for (i = 0; i < idx_len; i++) {
     -+			reftable_free(slice_yield(&idx[i].last_key));
     ++			slice_clear(&idx[i].last_key);
      +		}
      +		reftable_free(idx);
      +	}
     @@ reftable/writer.c (new)
      +	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
      +
      +	FREE_AND_NULL(entry->offsets);
     -+	reftable_free(slice_yield(&entry->hash));
     ++	slice_clear(&entry->hash);
      +	reftable_free(entry);
      +}
      +
     @@ reftable/writer.c (new)
      +{
      +	byte footer[72];
      +	byte *p = footer;
     -+
      +	int err = writer_finish_public_section(w);
     ++	int empty_table = w->next == 0;
      +	if (err != 0) {
      +		goto exit;
      +	}
     ++	w->pending_padding = 0;
     ++	if (empty_table) {
     ++		// Empty tables need a header anyway.
     ++		byte header[28];
     ++		int n = writer_write_header(w, header);
     ++		err = padded_write(w, header, n, 0);
     ++		if (err < 0) {
     ++			goto exit;
     ++		}
     ++	}
      +
      +	p += writer_write_header(w, footer);
      +	put_be64(p, w->stats.ref_stats.index_offset);
     @@ reftable/writer.c (new)
      +
      +	put_be32(p, crc32(0, footer, p - footer));
      +	p += 4;
     -+	w->pending_padding = 0;
      +
      +	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
      +	if (err < 0) {
      +		goto exit;
      +	}
      +
     -+	if (w->stats.log_stats.entries + w->stats.ref_stats.entries == 0) {
     ++	if (empty_table) {
      +		err = EMPTY_TABLE_ERROR;
      +		goto exit;
      +	}
     @@ reftable/writer.c (new)
      +	/* free up memory. */
      +	block_writer_clear(&w->block_writer_data);
      +	writer_clear_index(w);
     -+	reftable_free(slice_yield(&w->last_key));
     ++	slice_clear(&w->last_key);
      +	return err;
      +}
      +
     @@ reftable/writer.c (new)
      +{
      +	int i = 0;
      +	for (i = 0; i < w->index_len; i++) {
     -+		reftable_free(slice_yield(&w->index[i].last_key));
     ++		slice_clear(&w->index[i].last_key);
      +	}
      +
      +	FREE_AND_NULL(w->index);
     @@ reftable/writer.c (new)
      +
      +	w->index_len++;
      +	w->next += padding + raw_bytes;
     -+	block_writer_reset(&w->block_writer_data);
      +	w->block_writer = NULL;
      +	return 0;
      +}
     @@ reftable/writer.h (new)
      +	int pending_padding;
      +	struct slice last_key;
      +
     ++	/* offset of next block to write. */
      +	uint64_t next;
      +	uint64_t min_update_index, max_update_index;
      +	struct reftable_write_options opts;
      +
     ++	/* memory buffer for writing */
      +	byte *block;
     -+	struct reftable_block_writer *block_writer;
     -+	struct reftable_block_writer block_writer_data;
     ++
     ++	/* writer for the current section. NULL or points to
     ++	 * block_writer_data */
     ++	struct block_writer *block_writer;
     ++
     ++	struct block_writer block_writer_data;
     ++
     ++	/* pending index records for the current section */
      +	struct index_record *index;
      +	int index_len;
      +	int index_cap;
      +
     -+	/* tree for use with tsearch */
     ++	/*
     ++	  tree for use with tsearch; used to populate the 'o' inverse OID
     ++	  map */
      +	struct tree_node *obj_index_tree;
      +
      +	struct reftable_stats stats;
      +};
      +
     ++/* finishes a block, and writes it to storage */
      +int writer_flush_block(struct reftable_writer *w);
     ++
     ++/* deallocates memory related to the index */
      +void writer_clear_index(struct reftable_writer *w);
     ++
     ++/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
      +int writer_finish_public_section(struct reftable_writer *w);
      +
      +#endif
  9:  b29c4ecc1c4 ! 10:  ad72edbcfd4 Reftable support for git-core
     @@ Commit message
      
          TODO:
      
     -     * "git show-ref" shows "HEAD"
           * Resolve spots marked with XXX
           * Test strategy?
      
          Example use: see t/t0031-reftable.sh
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
     +    Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
          Co-authored-by: Jeff King <peff@peff.net>
      
       ## Documentation/technical/repository-version.txt ##
     @@ Makefile: $(XDIFF_LIB): $(XDIFF_OBJS)
       export DEFAULT_EDITOR DEFAULT_PAGER
       
       Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
     +@@ Makefile: cocciclean:
     + clean: profile-clean coverage-clean cocciclean
     + 	$(RM) *.res
     + 	$(RM) $(OBJECTS)
     +-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
     ++	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
     + 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
     + 	$(RM) $(TEST_PROGRAMS)
     + 	$(RM) $(FUZZ_PROGRAMS)
      
       ## builtin/clone.c ##
      @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
     @@ refs/reftable-backend.c (new)
      +static void fill_reftable_log_record(struct reftable_log_record *log)
      +{
      +	const char *info = git_committer_info(0);
     -+	struct ident_split split = {};
     ++	struct ident_split split = { NULL };
      +	int result = split_ident_line(&split, info, strlen(info));
      +	int sign = 1;
      +	assert(0 == result);
     @@ refs/reftable-backend.c (new)
      +}
      +
      +static struct ref_store *git_reftable_ref_store_create(const char *path,
     -+						   unsigned int store_flags)
     ++						       unsigned int store_flags)
      +{
      +	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
      +	struct ref_store *ref_store = (struct ref_store *)refs;
     @@ refs/reftable-backend.c (new)
      +	struct ref_store *ref_store;
      +	unsigned int flags;
      +	int err;
     -+	char *prefix;
     ++	const char *prefix;
      +};
      +
      +static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
      +{
     -+	struct git_reftable_iterator *ri = (struct git_reftable_iterator *)ref_iterator;
     ++	struct git_reftable_iterator *ri =
     ++		(struct git_reftable_iterator *)ref_iterator;
      +	while (ri->err == 0) {
      +		ri->err = reftable_iterator_next_ref(ri->iter, &ri->ref);
      +		if (ri->err) {
     @@ refs/reftable-backend.c (new)
      +static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
      +				      struct object_id *peeled)
      +{
     -+	struct git_reftable_iterator *ri = (struct git_reftable_iterator *)ref_iterator;
     ++	struct git_reftable_iterator *ri =
     ++		(struct git_reftable_iterator *)ref_iterator;
      +	if (ri->ref.target_value != NULL) {
      +		hashcpy(peeled->hash, ri->ref.target_value);
      +		return 0;
     @@ refs/reftable-backend.c (new)
      +
      +static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
      +{
     -+	struct git_reftable_iterator *ri = (struct git_reftable_iterator *)ref_iterator;
     ++	struct git_reftable_iterator *ri =
     ++		(struct git_reftable_iterator *)ref_iterator;
      +	reftable_ref_record_clear(&ri->ref);
      +	reftable_iterator_destroy(&ri->iter);
      +	return 0;
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
     ++        ri->prefix = prefix;
      +	ri->base.oid = &ri->oid;
      +	ri->flags = flags;
      +	ri->ref_store = ref_store;
     @@ refs/reftable-backend.c (new)
      +static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
      +				  struct object_id *want_oid)
      +{
     -+	struct object_id out_oid = {};
     ++	struct object_id out_oid;
      +	int out_flags = 0;
      +	const char *resolved = refs_resolve_ref_unsafe(
      +		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
     @@ refs/reftable-backend.c (new)
      +		(struct git_reftable_ref_store *)transaction->ref_store;
      +	uint64_t ts = reftable_stack_next_update_index(refs->stack);
      +	int err = 0;
     -+	struct reftable_log_record *logs = calloc(transaction->nr, sizeof(*logs));
     ++	struct reftable_log_record *logs =
     ++		calloc(transaction->nr, sizeof(*logs));
      +	struct ref_update **sorted =
      +		malloc(transaction->nr * sizeof(struct ref_update *));
      +	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
     @@ refs/reftable-backend.c (new)
      +		log->message = u->msg;
      +
      +		if (u->flags & REF_HAVE_NEW) {
     -+			struct object_id out_oid = {};
     ++			struct object_id out_oid;
      +			int out_flags = 0;
      +			/* Memory owned by refs_resolve_ref_unsafe, no need to
      +			 * free(). */
      +			const char *resolved = refs_resolve_ref_unsafe(
      +				transaction->ref_store, u->refname, 0, &out_oid,
      +				&out_flags);
     -+			struct reftable_ref_record ref = {};
     ++			struct reftable_ref_record ref = { NULL };
      +			ref.ref_name =
      +				(char *)(resolved ? resolved : u->refname);
      +			log->ref_name = ref.ref_name;
     @@ refs/reftable-backend.c (new)
      +		return refs->err;
      +	}
      +
     -+	err = reftable_stack_add(refs->stack, &write_transaction_table, transaction);
     ++	err = reftable_stack_add(refs->stack, &write_transaction_table,
     ++				 transaction);
      +	if (err < 0) {
      +		strbuf_addf(errmsg, "reftable: transaction failure %s",
      +			    reftable_error_str(err));
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +	for (int i = 0; i < arg->refnames->nr; i++) {
     -+		struct reftable_log_record log = {};
     -+		struct reftable_ref_record current = {};
     ++		struct reftable_log_record log = { NULL };
     ++		struct reftable_ref_record current = { NULL };
      +		fill_reftable_log_record(&log);
      +		log.message = xstrdup(arg->logmsg);
      +		log.new_hash = NULL;
     @@ refs/reftable-backend.c (new)
      +		log.update_index = ts;
      +		log.ref_name = (char *)arg->refnames->items[i].string;
      +
     -+		if (reftable_stack_read_ref(arg->stack, log.ref_name, &current) == 0) {
     ++		if (reftable_stack_read_ref(arg->stack, log.ref_name,
     ++					    &current) == 0) {
      +			log.old_hash = current.value;
      +		}
      +		err = reftable_writer_add_log(writer, &log);
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +	{
     -+		struct reftable_log_record log = {};
     -+		struct object_id new_oid = {};
     -+		struct object_id old_oid = {};
     -+		struct reftable_ref_record current = {};
     -+		reftable_stack_read_ref(create->refs->stack, create->refname, &current);
     ++		struct reftable_log_record log = { NULL };
     ++		struct object_id new_oid;
     ++		struct object_id old_oid;
     ++		struct reftable_ref_record current = { NULL };
     ++		reftable_stack_read_ref(create->refs->stack, create->refname,
     ++					&current);
      +
      +		fill_reftable_log_record(&log);
      +		log.ref_name = current.ref_name;
     @@ refs/reftable-backend.c (new)
      +	if (refs->err < 0) {
      +		return refs->err;
      +	}
     -+	return reftable_stack_add(refs->stack, &write_create_symref_table, &arg);
     ++	return reftable_stack_add(refs->stack, &write_create_symref_table,
     ++				  &arg);
      +}
      +
      +struct write_rename_arg {
     @@ refs/reftable-backend.c (new)
      +{
      +	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
      +	uint64_t ts = reftable_stack_next_update_index(arg->stack);
     -+	struct reftable_ref_record ref = {};
     ++	struct reftable_ref_record ref = { NULL };
      +	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
      +
      +	if (err) {
     @@ refs/reftable-backend.c (new)
      +	ref.update_index = ts;
      +
      +	{
     -+		struct reftable_ref_record todo[2] = {};
     ++		struct reftable_ref_record todo[2] = { { NULL } };
      +		todo[0].ref_name = (char *)arg->oldname;
      +		todo[0].update_index = ts;
      +		/* leave todo[0] empty */
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +	if (ref.value != NULL) {
     -+		struct reftable_log_record todo[2] = {};
     ++		struct reftable_log_record todo[2] = { { NULL } };
      +		fill_reftable_log_record(&todo[0]);
      +		fill_reftable_log_record(&todo[1]);
      +
     @@ refs/reftable-backend.c (new)
      +			return ITER_ERROR;
      +		}
      +
     ++                if (reftable_log_record_is_deletion(&ri->log)) {
     ++                        /* XXX - Why does the reftable_stack filter these? */
     ++                        continue;
     ++                }
      +		ri->base.refname = ri->log.ref_name;
      +		if (ri->last_name != NULL &&
      +		    !strcmp(ri->log.ref_name, ri->last_name)) {
     ++                        /* we want the refnames that we have reflogs for, so we
     ++                         * skip if we've already produced this name. This could
     ++                         * be faster by seeking directly to
     ++                         * reflog@update_index==0.
     ++                         */
      +			continue;
      +		}
      +
     @@ refs/reftable-backend.c (new)
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
      +
     -+	struct reftable_merged_table *mt = reftable_stack_merged_table(refs->stack);
     ++	struct reftable_merged_table *mt =
     ++		reftable_stack_merged_table(refs->stack);
      +	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
      +	if (err < 0) {
      +		free(ri);
     @@ refs/reftable-backend.c (new)
      +					  const char *refname,
      +					  each_reflog_ent_fn fn, void *cb_data)
      +{
     -+	struct reftable_iterator it = {};
     ++	struct reftable_iterator it = { NULL };
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
      +	struct reftable_merged_table *mt = NULL;
      +	int err = 0;
     -+	struct reftable_log_record log = {};
     ++	struct reftable_log_record log = { NULL };
      +
      +	if (refs->err < 0) {
      +		return refs->err;
     @@ refs/reftable-backend.c (new)
      +		}
      +
      +		{
     -+			struct object_id old_oid = {};
     -+			struct object_id new_oid = {};
     ++			struct object_id old_oid;
     ++			struct object_id new_oid;
      +			const char *full_committer = "";
      +
      +			hashcpy(old_oid.hash, log.old_hash);
     @@ refs/reftable-backend.c (new)
      +					  const char *refname,
      +					  each_reflog_ent_fn fn, void *cb_data)
      +{
     -+	struct reftable_iterator it = {};
     ++	struct reftable_iterator it = { NULL };
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
      +	struct reftable_merged_table *mt = NULL;
     @@ refs/reftable-backend.c (new)
      +	err = reftable_merged_table_seek_log(mt, &it, refname);
      +
      +	while (err == 0) {
     -+		struct reftable_log_record log = {};
     ++		struct reftable_log_record log = { NULL };
      +		err = reftable_iterator_next_log(it, &log);
      +		if (err != 0) {
      +			break;
     @@ refs/reftable-backend.c (new)
      +
      +	for (int i = len; i--;) {
      +		struct reftable_log_record *log = &logs[i];
     -+		struct object_id old_oid = {};
     -+		struct object_id new_oid = {};
     ++		struct object_id old_oid;
     ++		struct object_id new_oid;
      +		const char *full_committer = "";
      +
      +		hashcpy(old_oid.hash, log->old_hash);
     @@ refs/reftable-backend.c (new)
      +	struct reflog_expiry_arg arg = {
      +		.refs = refs,
      +	};
     -+	struct reftable_log_record log = {};
     -+	struct reftable_iterator it = {};
     ++	struct reftable_log_record log = { NULL };
     ++	struct reftable_iterator it = { NULL };
      +	int err = 0;
      +	if (refs->err < 0) {
      +		return refs->err;
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +	while (1) {
     -+		struct object_id ooid = {};
     -+		struct object_id noid = {};
     ++		struct object_id ooid;
     ++		struct object_id noid;
      +
      +		int err = reftable_iterator_next_log(it, &log);
      +		if (err < 0) {
     @@ refs/reftable-backend.c (new)
      +{
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
     -+	struct reftable_ref_record ref = {};
     ++	struct reftable_ref_record ref = { NULL };
      +	int err = 0;
      +	if (refs->err < 0) {
      +		return refs->err;
     @@ t/t0031-reftable.sh (new)
      +
      +. ./test-lib.sh
      +
     ++# XXX - fix GC
      +test_expect_success 'basic operation of reftable storage' '
     -+	git init --ref-storage=reftable repo && (
     -+	cd repo &&
     -+	echo "hello" >world.txt &&
     -+	git add world.txt &&
     -+	git commit -m "first post" &&
     -+	test_write_lines HEAD refs/heads/master >expect &&
     ++	rm -rf .git &&
     ++	git init --ref-storage=reftable &&
     ++	mv .git/hooks .git/hooks-disabled &&
     ++	test_commit file &&
     ++	test_write_lines refs/heads/master refs/tags/file >expect &&
      +	git show-ref &&
      +	git show-ref | cut -f2 -d" " > actual &&
      +	test_cmp actual expect &&
      +	for count in $(test_seq 1 10)
      +	do
     -+		echo "hello" >>world.txt
     -+		git commit -m "number ${count}" world.txt ||
     -+		return 1
     ++		test_commit "number $count" file.t $count number-$count ||
     ++ 		return 1
      +	done &&
     ++(true || (test_pause &&
      +	git gc &&
     -+	nfiles=$(ls -1 .git/reftable | wc -l ) &&
     -+	test ${nfiles} = "2" &&
     -+	git reflog refs/heads/master >output &&
     ++	ls -1 .git/reftable >table-files &&
     ++	test_line_count = 2 table-files &&
     ++ 	git reflog refs/heads/master >output &&
      +	test_line_count = 11 output &&
      +	grep "commit (initial): first post" output &&
     -+	grep "commit: number 10" output )
     ++	grep "commit: number 10" output ))
      +'
      +
      +test_done

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v9 01/10] refs.h: clarify reflog iteration order
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
@ 2020-04-20 21:14                 ` Han-Wen Nienhuys via GitGitGadget
  2020-04-20 21:14                 ` [PATCH v9 02/10] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
                                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-20 21:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/refs.h b/refs.h
index 545029c6d80..87c9ec921b9 100644
--- a/refs.h
+++ b/refs.h
@@ -444,18 +444,21 @@ int delete_refs(const char *msg, struct string_list *refnames,
 int refs_delete_reflog(struct ref_store *refs, const char *refname);
 int delete_reflog(const char *refname);
 
-/* iterate over reflog entries */
+/* Iterate over reflog entries. */
 typedef int each_reflog_ent_fn(
 		struct object_id *old_oid, struct object_id *new_oid,
 		const char *committer, timestamp_t timestamp,
 		int tz, const char *msg, void *cb_data);
 
+/* Iterate in over reflog entries, oldest entry first. */
 int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
 			     each_reflog_ent_fn fn, void *cb_data);
 int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+/* Call a function for each reflog entry, oldest entry first. */
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v9 02/10] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
  2020-04-20 21:14                 ` [PATCH v9 01/10] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-04-20 21:14                 ` Han-Wen Nienhuys via GitGitGadget
  2020-04-20 21:14                 ` [PATCH v9 03/10] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
                                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-20 21:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 1ab0bb54d3d..3b9742d03cb 100644
--- a/refs.c
+++ b/refs.c
@@ -1569,7 +1569,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1629,8 +1629,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v9 03/10] create .git/refs in files-backend.c
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
  2020-04-20 21:14                 ` [PATCH v9 01/10] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
  2020-04-20 21:14                 ` [PATCH v9 02/10] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-04-20 21:14                 ` Han-Wen Nienhuys via GitGitGadget
  2020-04-20 21:14                 ` [PATCH v9 04/10] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
                                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-20 21:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This prepares for supporting the reftable format, which will want
create its own file system layout in .git

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/init-db.c    | 2 --
 refs/files-backend.c | 6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0b7222e7188..3b50b1aa0e5 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -251,8 +251,6 @@ static int create_default_files(const char *template_path,
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
 	 */
-	safe_create_dir(git_path("refs"), 1);
-	adjust_shared_perm(git_path("refs"));
 
 	if (refs_init_db(&err))
 		die("failed to set up refs db: %s", err.buf);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 561c33ac8a9..ab7899a9c77 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3157,9 +3157,15 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
 	struct strbuf sb = STRBUF_INIT;
 
+	files_ref_path(refs, &sb, "refs");
+	safe_create_dir(sb.buf, 1);
+	/* adjust permissions even if directory already exists. */
+	adjust_shared_perm(sb.buf);
+
 	/*
 	 * Create .git/refs/{heads,tags}
 	 */
+	strbuf_reset(&sb);
 	files_ref_path(refs, &sb, "refs/heads");
 	safe_create_dir(sb.buf, 1);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v9 04/10] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
                                   ` (2 preceding siblings ...)
  2020-04-20 21:14                 ` [PATCH v9 03/10] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-04-20 21:14                 ` Han-Wen Nienhuys via GitGitGadget
  2020-04-20 21:14                 ` [PATCH v9 05/10] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-20 21:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb7..3490aac3a40 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
 
 /* Virtual function declarations for ref_iterators: */
 
+/*
+ * backend-specific implementation of ref_iterator_advance.
+ * For symrefs, the function should set REF_ISSYMREF, and it should also
+ * dereference the symref to provide the OID referent.
+ */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v9 05/10] Add .gitattributes for the reftable/ directory
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
                                   ` (3 preceding siblings ...)
  2020-04-20 21:14                 ` [PATCH v9 04/10] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-04-20 21:14                 ` Han-Wen Nienhuys via GitGitGadget
  2020-04-20 21:14                 ` [PATCH v9 06/10] reftable: file format documentation Jonathan Nieder via GitGitGadget
                                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-20 21:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 00000000000..f44451a3795
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v9 06/10] reftable: file format documentation
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
                                   ` (4 preceding siblings ...)
  2020-04-20 21:14                 ` [PATCH v9 05/10] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-04-20 21:14                 ` Jonathan Nieder via GitGitGadget
  2020-04-20 21:14                 ` [PATCH v9 07/10] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
                                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 409+ messages in thread
From: Jonathan Nieder via GitGitGadget @ 2020-04-20 21:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder

From: Jonathan Nieder <jrnieder@gmail.com>

Shawn Pearce explains:

Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:

- Near constant time lookup for any single reference, even when the
  repository is cold and not in process or kernel cache.
- Near constant time verification a SHA-1 is referred to by at least
  one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.

This file format spec was originally written in July, 2017 by Shawn
Pearce.  Some refinements since then were made by Shawn and by Han-Wen
Nienhuys based on experiences implementing and experimenting with the
format.  (All of this was in the context of our work at Google and
Google is happy to contribute the result to the Git project.)

Imported from JGit[1]'s current version (c217d33ff,
"Documentation/technical/reftable: improve repo layout", 2020-02-04)
of Documentation/technical/reftable.md and converted to asciidoc by
running

  pandoc -t asciidoc -f markdown reftable.md >reftable.txt

using pandoc 2.2.1.  The result required the following additional
minor changes:

- removed the [TOC] directive to add a table of contents, since
  asciidoc does not support it
- replaced git-scm.com/docs links with linkgit: directives that link
  to other pages within Git's documentation

[1] https://eclipse.googlesource.com/jgit/jgit

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/Makefile               |    1 +
 Documentation/technical/reftable.txt | 1067 ++++++++++++++++++++++++++
 2 files changed, 1068 insertions(+)
 create mode 100644 Documentation/technical/reftable.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 8fe829cc1b8..3aab9b8d61a 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -92,6 +92,7 @@ TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
+TECH_DOCS += technical/reftable
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/signature-format
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
new file mode 100644
index 00000000000..9fa4657d9ff
--- /dev/null
+++ b/Documentation/technical/reftable.txt
@@ -0,0 +1,1067 @@
+reftable
+--------
+
+Overview
+~~~~~~~~
+
+Problem statement
+^^^^^^^^^^^^^^^^^
+
+Some repositories contain a lot of references (e.g. android at 866k,
+rails at 31k). The existing packed-refs format takes up a lot of space
+(e.g. 62M), and does not scale with additional references. Lookup of a
+single reference requires linearly scanning the file.
+
+Atomic pushes modifying multiple references require copying the entire
+packed-refs file, which can be a considerable amount of data moved
+(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
+
+Repositories with many loose references occupy a large number of disk
+blocks from the local file system, as each reference is its own file
+storing 41 bytes (and another file for the corresponding reflog). This
+negatively affects the number of inodes available when a large number of
+repositories are stored on the same filesystem. Readers can be penalized
+due to the larger number of syscalls required to traverse and read the
+`$GIT_DIR/refs` directory.
+
+Objectives
+^^^^^^^^^^
+
+* Near constant time lookup for any single reference, even when the
+repository is cold and not in process or kernel cache.
+* Near constant time verification if a SHA-1 is referred to by at least
+one reference (for allow-tip-sha1-in-want).
+* Efficient lookup of an entire namespace, such as `refs/tags/`.
+* Support atomic push with `O(size_of_update)` operations.
+* Combine reflog storage with ref storage for small transactions.
+* Separate reflog storage for base refs and historical logs.
+
+Description
+^^^^^^^^^^^
+
+A reftable file is a portable binary file format customized for
+reference storage. References are sorted, enabling linear scans, binary
+search lookup, and range scans.
+
+Storage in the file is organized into variable sized blocks. Prefix
+compression is used within a single block to reduce disk space. Block
+size and alignment is tunable by the writer.
+
+Performance
+^^^^^^^^^^^
+
+Space used, packed-refs vs. reftable:
+
+[cols=",>,>,>,>,>",options="header",]
+|===============================================================
+|repository |packed-refs |reftable |% original |avg ref |avg obj
+|android |62.2 M |36.1 M |58.0% |33 bytes |5 bytes
+|rails |1.8 M |1.1 M |57.7% |29 bytes |4 bytes
+|git |78.7 K |48.1 K |61.0% |50 bytes |4 bytes
+|git (heads) |332 b |269 b |81.0% |33 bytes |0 bytes
+|===============================================================
+
+Scan (read 866k refs), by reference name lookup (single ref from 866k
+refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):
+
+[cols=",>,>,>,>",options="header",]
+|=========================================================
+|format |cache |scan |by name |by SHA-1
+|packed-refs |cold |402 ms |409,660.1 usec |412,535.8 usec
+|packed-refs |hot | |6,844.6 usec |20,110.1 usec
+|reftable |cold |112 ms |33.9 usec |323.2 usec
+|reftable |hot | |20.2 usec |320.8 usec
+|=========================================================
+
+Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:
+
+[cols=",>,>",options="header",]
+|================================
+|format |size |avg entry
+|$GIT_DIR/logs |173 M |1209 bytes
+|reftable |5 M |37 bytes
+|================================
+
+Details
+~~~~~~~
+
+Peeling
+^^^^^^^
+
+References stored in a reftable are peeled, a record for an annotated
+(or signed) tag records both the tag object, and the object it refers
+to.
+
+Reference name encoding
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Reference names are an uninterpreted sequence of bytes that must pass
+linkgit:git-check-ref-format[1] as a valid reference name.
+
+Key unicity
+^^^^^^^^^^^
+
+Each entry must have a unique key; repeated keys are disallowed.
+
+Network byte order
+^^^^^^^^^^^^^^^^^^
+
+All multi-byte, fixed width fields are in network byte order.
+
+Ordering
+^^^^^^^^
+
+Blocks are lexicographically ordered by their first reference.
+
+Directory/file conflicts
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format accepts both `refs/heads/foo` and
+`refs/heads/foo/bar` as distinct references.
+
+This property is useful for retaining log records in reftable, but may
+confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
+references. Users of reftable may choose to continue to reject `foo` and
+`foo/bar` type conflicts to prevent problems for peers.
+
+File format
+~~~~~~~~~~~
+
+Structure
+^^^^^^^^^
+
+A reftable file has the following high-level structure:
+
+....
+first_block {
+  header
+  first_ref_block
+}
+ref_block*
+ref_index*
+obj_block*
+obj_index*
+log_block*
+log_index*
+footer
+....
+
+A log-only file omits the `ref_block`, `ref_index`, `obj_block` and
+`obj_index` sections, containing only the file header and log block:
+
+....
+first_block {
+  header
+}
+log_block*
+log_index*
+footer
+....
+
+in a log-only file the first log block immediately follows the file
+header, without padding to block alignment.
+
+Block size
+^^^^^^^^^^
+
+The file’s block size is arbitrarily determined by the writer, and does
+not have to be a power of 2. The block size must be larger than the
+longest reference name or log entry used in the repository, as
+references cannot span blocks.
+
+Powers of two that are friendly to the virtual memory system or
+filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
+yield better compression, with a possible increased cost incurred by
+readers during access.
+
+The largest block size is `16777215` bytes (15.99 MiB).
+
+Block alignment
+^^^^^^^^^^^^^^^
+
+Writers may choose to align blocks at multiples of the block size by
+including `padding` filled with NUL bytes at the end of a block to round
+out to the chosen alignment. When alignment is used, writers must
+specify the alignment with the file header’s `block_size` field.
+
+Block alignment is not required by the file format. Unaligned files must
+set `block_size = 0` in the file header, and omit `padding`. Unaligned
+files with more than one ref block must include the link:#Ref-index[ref
+index] to support fast lookup. Readers must be able to read both aligned
+and non-aligned files.
+
+Very small files (e.g. 1 only ref block) may omit `padding` and the ref
+index to reduce total file size.
+
+Header
+^^^^^^
+
+A 24-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+....
+
+Aligned files must specify `block_size` to configure readers with the
+expected block alignment. Unaligned files must set `block_size = 0`.
+
+The `min_update_index` and `max_update_index` describe bounds for the
+`update_index` field of all log records in this file. When reftables are
+used in a stack for link:#Update-transactions[transactions], these
+fields can order the files such that the prior file’s
+`max_update_index + 1` is the next file’s `min_update_index`.
+
+First ref block
+^^^^^^^^^^^^^^^
+
+The first ref block shares the same block as the file header, and is 24
+bytes smaller than all other blocks in the file. The first block
+immediately begins after the file header, at position 24.
+
+If the first block is a log block (a log-only file), its block header
+begins immediately at position 24.
+
+Ref block format
+^^^^^^^^^^^^^^^^
+
+A ref block is written as:
+
+....
+'r'
+uint24( block_len )
+ref_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
+encodes the number of bytes in the block up to, but not including the
+optional `padding`. This is always less than or equal to the file’s
+block size. In the first ref block, `block_len` includes 24 bytes for
+the file header.
+
+The 2-byte `restart_count` stores the number of entries in the
+`restart_offset` list, which must not be empty. Readers can use
+`restart_count` to binary search between restarts before starting a
+linear scan.
+
+Exactly `restart_count` 3-byte `restart_offset` values precedes the
+`restart_count`. Offsets are relative to the start of the block and
+refer to the first byte of any `ref_record` whose name has not been
+prefix compressed. Entries in the `restart_offset` list must be sorted,
+ascending. Readers can start linear scans from any of these records.
+
+A variable number of `ref_record` fill the middle of the block,
+describing reference names and values. The format is described below.
+
+As the first ref block shares the first file block with the file header,
+all `restart_offset` in the first block are relative to the start of the
+file (position 0), and include the file header. This forces the first
+`restart_offset` to be `28`.
+
+ref record
+++++++++++
+
+A `ref_record` describes a single reference, storing both the name and
+its value(s). Records are formatted as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | value_type )
+suffix
+varint( update_index_delta )
+value?
+....
+
+The `prefix_length` field specifies how many leading bytes of the prior
+reference record’s name should be copied to obtain this reference’s
+name. This must be 0 for the first reference in any block, and also must
+be 0 for any `ref_record` whose offset is listed in the `restart_offset`
+table at the end of the block.
+
+Recovering a reference name from any `ref_record` is a simple concat:
+
+....
+this_name = prior_name[0..prefix_length] + suffix
+....
+
+The `suffix_length` value provides the number of bytes available in
+`suffix` to copy from `suffix` to complete the reference name.
+
+The `update_index` that last modified the reference can be obtained by
+adding `update_index_delta` to the `min_update_index` from the file
+header: `min_update_index + update_index_delta`.
+
+The `value` follows. Its format is determined by `value_type`, one of
+the following:
+
+* `0x0`: deletion; no value data (see transactions, below)
+* `0x1`: one 20-byte object id; value of the ref
+* `0x2`: two 20-byte object ids; value of the ref, peeled target
+* `0x3`: symbolic reference: `varint( target_len ) target`
+
+Symbolic references use `0x3`, followed by the complete name of the
+reference target. No compression is applied to the target name.
+
+Types `0x4..0x7` are reserved for future use.
+
+Ref index
+^^^^^^^^^
+
+The ref index stores the name of the last reference from every ref block
+in the file, enabling reduced disk seeks for lookups. Any reference can
+be found by searching the index, identifying the containing block, and
+searching within that block.
+
+The index may be organized into a multi-level index, where the 1st level
+index block points to additional ref index blocks (2nd level), which may
+in turn point to either additional index blocks (e.g. 3rd level) or ref
+blocks (leaf level). Disk reads required to access a ref go up with
+higher index levels. Multi-level indexes may be required to ensure no
+single index block exceeds the file format’s max block size of
+`16777215` bytes (15.99 MiB). To acheive constant O(1) disk seeks for
+lookups the index must be a single level, which is permitted to exceed
+the file’s configured block size, but not the format’s max block size of
+15.99 MiB.
+
+If present, the ref index block(s) appears after the last ref block.
+
+If there are at least 4 ref blocks, a ref index block should be written
+to improve lookup times. Cold reads using the index require 2 disk reads
+(read index, read block), and binary searching < 4 blocks also requires
+<= 2 reads. Omitting the index block from smaller files saves space.
+
+If the file is unaligned and contains more than one ref block, the ref
+index must be written.
+
+Index block format:
+
+....
+'i'
+uint24( block_len )
+index_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+The index blocks begin with `block_type = 'i'` and a 3-byte `block_len`
+which encodes the number of bytes in the block, up to but not including
+the optional `padding`.
+
+The `restart_offset` and `restart_count` fields are identical in format,
+meaning and usage as in ref blocks.
+
+To reduce the number of reads required for random access in very large
+files the index block may be larger than other blocks. However, readers
+must hold the entire index in memory to benefit from this, so it’s a
+time-space tradeoff in both file size and reader memory.
+
+Increasing the file’s block size decreases the index size. Alternatively
+a multi-level index may be used, keeping index blocks within the file’s
+block size, but increasing the number of blocks that need to be
+accessed.
+
+index record
+++++++++++++
+
+An index record describes the last entry in another block. Index records
+are written as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | 0 )
+suffix
+varint( block_position )
+....
+
+Index records use prefix compression exactly like `ref_record`.
+
+Index records store `block_position` after the suffix, specifying the
+absolute position in bytes (from the start of the file) of the block
+that ends with this reference. Readers can seek to `block_position` to
+begin reading the block header.
+
+Readers must examine the block header at `block_position` to determine
+if the next block is another level index block, or the leaf-level ref
+block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the ref index must first read the footer (below) to
+obtain `ref_index_position`. If not present, the position will be 0. The
+`ref_index_position` is for the 1st level root of the ref index.
+
+Obj block format
+^^^^^^^^^^^^^^^^
+
+Object blocks are optional. Writers may choose to omit object blocks,
+especially if readers will not use the SHA-1 to ref mapping.
+
+Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
+ref blocks containing references pointing to that object directly, or as
+the peeled value of an annotated tag. Like ref blocks, object blocks use
+the file’s standard block size. The abbrevation length is available in
+the footer as `obj_id_len`.
+
+To save space in small files, object blocks may be omitted if the ref
+index is not present, as brute force search will only need to read a few
+ref blocks. When missing, readers should brute force a linear search of
+all references to lookup by SHA-1.
+
+An object block is written as:
+
+....
+'o'
+uint24( block_len )
+obj_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Fields are identical to ref block. Binary search using the restart table
+works the same as in reference blocks.
+
+Because object identifiers are abbreviated by writers to the shortest
+unique abbreviation within the reftable, obj key lengths are variable
+between 2 and 20 bytes. Readers must compare only for common prefix
+match within an obj block or obj index.
+
+obj record
+++++++++++
+
+An `obj_record` describes a single object abbreviation, and the blocks
+containing references using that unique abbreviation:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | cnt_3 )
+suffix
+varint( cnt_large )?
+varint( position_delta )*
+....
+
+Like in reference blocks, abbreviations are prefix compressed within an
+obj block. On large reftables with many unique objects, higher block
+sizes (64k), and higher restart interval (128), a `prefix_length` of 2
+or 3 and `suffix_length` of 3 may be common in obj records (unique
+abbreviation of 5-6 raw bytes, 10-12 hex digits).
+
+Each record contains `position_count` number of positions for matching
+ref blocks. For 1-7 positions the count is stored in `cnt_3`. When
+`cnt_3 = 0` the actual count follows in a varint, `cnt_large`.
+
+The use of `cnt_3` bets most objects are pointed to by only a single
+reference, some may be pointed to by a couple of references, and very
+few (if any) are pointed to by more than 7 references.
+
+A special case exists when `cnt_3 = 0` and `cnt_large = 0`: there are no
+`position_delta`, but at least one reference starts with this
+abbreviation. A reader that needs exact reference names must scan all
+references to find which specific references have the desired object.
+Writers should use this format when the `position_delta` list would have
+overflowed the file’s block size due to a high number of references
+pointing to the same object.
+
+The first `position_delta` is the position from the start of the file.
+Additional `position_delta` entries are sorted ascending and relative to
+the prior entry, e.g. a reader would perform:
+
+....
+pos = position_delta[0]
+prior = pos
+for (j = 1; j < position_count; j++) {
+  pos = prior + position_delta[j]
+  prior = pos
+}
+....
+
+With a position in hand, a reader must linearly scan the ref block,
+starting from the first `ref_record`, testing each reference’s SHA-1s
+(for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
+SHA-1 within a single ref block is not supported by the reftable format.
+Smaller block sizes reduce the number of candidates this step must
+consider.
+
+Obj index
+^^^^^^^^^
+
+The obj index stores the abbreviation from the last entry for every obj
+block in the file, enabling reduced disk seeks for all lookups. It is
+formatted exactly the same as the ref index, but refers to obj blocks.
+
+The obj index should be present if obj blocks are present, as obj blocks
+should only be written in larger files.
+
+Readers loading the obj index must first read the footer (below) to
+obtain `obj_index_position`. If not present, the position will be 0.
+
+Log block format
+^^^^^^^^^^^^^^^^
+
+Unlike ref and obj blocks, log blocks are always unaligned.
+
+Log blocks are variable in size, and do not match the `block_size`
+specified in the file header or footer. Writers should choose an
+appropriate buffer size to prepare a log block for deflation, such as
+`2 * block_size`.
+
+A log block is written as:
+
+....
+'g'
+uint24( block_len )
+zlib_deflate {
+  log_record+
+  uint24( restart_offset )+
+  uint16( restart_count )
+}
+....
+
+Log blocks look similar to ref blocks, except `block_type = 'g'`.
+
+The 4-byte block header is followed by the deflated block contents using
+zlib deflate. The `block_len` in the header is the inflated size
+(including 4-byte block header), and should be used by readers to
+preallocate the inflation output buffer. A log block’s `block_len` may
+exceed the file’s block size.
+
+Offsets within the log block (e.g. `restart_offset`) still include the
+4-byte header. Readers may prefer prefixing the inflation output buffer
+with the 4-byte header.
+
+Within the deflate container, a variable number of `log_record` describe
+reference changes. The log record format is described below. See ref
+block format (above) for a description of `restart_offset` and
+`restart_count`.
+
+Because log blocks have no alignment or padding between blocks, readers
+must keep track of the bytes consumed by the inflater to know where the
+next log block begins.
+
+log record
+++++++++++
+
+Log record keys are structured as:
+
+....
+ref_name '\0' reverse_int64( update_index )
+....
+
+where `update_index` is the unique transaction identifier. The
+`update_index` field must be unique within the scope of a `ref_name`.
+See the update transactions section below for further details.
+
+The `reverse_int64` function inverses the value so lexographical
+ordering the network byte order encoding sorts the more recent records
+with higher `update_index` values first:
+
+....
+reverse_int64(int64 t) {
+  return 0xffffffffffffffff - t;
+}
+....
+
+Log records have a similar starting structure to ref and index records,
+utilizing the same prefix compression scheme applied to the log record
+key described above.
+
+....
+    varint( prefix_length )
+    varint( (suffix_length << 3) | log_type )
+    suffix
+    log_data {
+      old_id
+      new_id
+      varint( name_length    )  name
+      varint( email_length   )  email
+      varint( time_seconds )
+      sint16( tz_offset )
+      varint( message_length )  message
+    }?
+....
+
+Log record entries use `log_type` to indicate what follows:
+
+* `0x0`: deletion; no log data.
+* `0x1`: standard git reflog data using `log_data` above.
+
+The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
+entry from the reflog of `refs/stash` in a transaction file (below),
+without needing to rewrite larger files. Readers reading a stack of
+reflogs must treat this as a deletion.
+
+For `log_type = 0x1`, the `log_data` section follows
+linkgit:git-update-ref[1] logging and includes:
+
+* two 20-byte SHA-1s (old id, new id)
+* varint string of committer’s name
+* varint string of committer’s email
+* varint time in seconds since epoch (Jan 1, 1970)
+* 2-byte timezone offset in minutes (signed)
+* varint string of message
+
+`tz_offset` is the absolute number of minutes from GMT the committer was
+at the time of the update. For example `GMT-0800` is encoded in reftable
+as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
+
+The committer email does not contain `<` or `>`, it’s the value normally
+found between the `<>` in a git commit object header.
+
+The `message_length` may be 0, in which case there was no message
+supplied for the update.
+
+Contrary to traditional reflog (which is a file), renames are encoded as
+a combination of ref deletion and ref creation.
+
+Reading the log
++++++++++++++++
+
+Readers accessing the log must first read the footer (below) to
+determine the `log_position`. The first block of the log begins at
+`log_position` bytes since the start of the file. The `log_position` is
+not block aligned.
+
+Importing logs
+++++++++++++++
+
+When importing from `$GIT_DIR/logs` writers should globally order all
+log records roughly by timestamp while preserving file order, and assign
+unique, increasing `update_index` values for each log line. Newer log
+records get higher `update_index` values.
+
+Although an import may write only a single reftable file, the reftable
+file must span many unique `update_index`, as each log line requires its
+own `update_index` to preserve semantics.
+
+Log index
+^^^^^^^^^
+
+The log index stores the log key
+(`refname \0 reverse_int64(update_index)`) for the last log record of
+every log block in the file, supporting bounded-time lookup.
+
+A log index block must be written if 2 or more log blocks are written to
+the file. If present, the log index appears after the last log block.
+There is no padding used to align the log index to block alignment.
+
+Log index format is identical to ref index, except the keys are 9 bytes
+longer to include `'\0'` and the 8-byte `reverse_int64(update_index)`.
+Records use `block_position` to refer to the start of a log block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the log index must first read the footer (below) to
+obtain `log_index_position`. If not present, the position will be 0.
+
+Footer
+^^^^^^
+
+After the last block of the file, a file footer is written. It begins
+like the file header, but is extended with additional data.
+
+A 68-byte footer appears at the end:
+
+....
+    'REFT'
+    uint8( version_number = 1 )
+    uint24( block_size )
+    uint64( min_update_index )
+    uint64( max_update_index )
+
+    uint64( ref_index_position )
+    uint64( (obj_position << 5) | obj_id_len )
+    uint64( obj_index_position )
+
+    uint64( log_position )
+    uint64( log_index_position )
+
+    uint32( CRC-32 of above )
+....
+
+If a section is missing (e.g. ref index) the corresponding position
+field (e.g. `ref_index_position`) will be 0.
+
+* `obj_position`: byte position for the first obj block.
+* `obj_id_len`: number of bytes used to abbreviate object identifiers in
+obj blocks.
+* `log_position`: byte position for the first log block.
+* `ref_index_position`: byte position for the start of the ref index.
+* `obj_index_position`: byte position for the start of the obj index.
+* `log_index_position`: byte position for the start of the log index.
+
+Reading the footer
+++++++++++++++++++
+
+Readers must seek to `file_length - 68` to access the footer. A trusted
+external source (such as `stat(2)`) is necessary to obtain
+`file_length`. When reading the footer, readers must verify:
+
+* 4-byte magic is correct
+* 1-byte version number is recognized
+* 4-byte CRC-32 matches the other 64 bytes (including magic, and
+version)
+
+Once verified, the other fields of the footer can be accessed.
+
+Varint encoding
+^^^^^^^^^^^^^^^
+
+Varint encoding is identical to the ofs-delta encoding method used
+within pack files.
+
+Decoder works such as:
+
+....
+val = buf[ptr] & 0x7f
+while (buf[ptr] & 0x80) {
+  ptr++
+  val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
+}
+....
+
+Binary search
+^^^^^^^^^^^^^
+
+Binary search within a block is supported by the `restart_offset` fields
+at the end of the block. Readers can binary search through the restart
+table to locate between which two restart points the sought reference or
+key should appear.
+
+Each record identified by a `restart_offset` stores the complete key in
+the `suffix` field of the record, making the compare operation during
+binary search straightforward.
+
+Once a restart point lexicographically before the sought reference has
+been identified, readers can linearly scan through the following record
+entries to locate the sought record, terminating if the current record
+sorts after (and therefore the sought key is not present).
+
+Restart point selection
++++++++++++++++++++++++
+
+Writers determine the restart points at file creation. The process is
+arbitrary, but every 16 or 64 records is recommended. Every 16 may be
+more suitable for smaller block sizes (4k or 8k), every 64 for larger
+block sizes (64k).
+
+More frequent restart points reduces prefix compression and increases
+space consumed by the restart table, both of which increase file size.
+
+Less frequent restart points makes prefix compression more effective,
+decreasing overall file size, with increased penalities for readers
+walking through more records after the binary search step.
+
+A maximum of `65535` restart points per block is supported.
+
+Considerations
+~~~~~~~~~~~~~~
+
+Lightweight refs dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format assumes the vast majority of references are single
+SHA-1 valued with common prefixes, such as Gerrit Code Review’s
+`refs/changes/` namespace, GitHub’s `refs/pulls/` namespace, or many
+lightweight tags in the `refs/tags/` namespace.
+
+Annotated tags storing the peeled object cost an additional 20 bytes per
+reference.
+
+Low overhead
+^^^^^^^^^^^^
+
+A reftable with very few references (e.g. git.git with 5 heads) is 269
+bytes for reftable, vs. 332 bytes for packed-refs. This supports
+reftable scaling down for transaction logs (below).
+
+Block size
+^^^^^^^^^^
+
+For a Gerrit Code Review type repository with many change refs, larger
+block sizes (64 KiB) and less frequent restart points (every 64) yield
+better compression due to more references within the block compressing
+against the prior reference.
+
+Larger block sizes reduce the index size, as the reftable will require
+fewer blocks to store the same number of references.
+
+Minimal disk seeks
+^^^^^^^^^^^^^^^^^^
+
+Assuming the index block has been loaded into memory, binary searching
+for any single reference requires exactly 1 disk seek to load the
+containing block.
+
+Scans and lookups dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Scanning all references and lookup by name (or namespace such as
+`refs/heads/`) are the most common activities performed on repositories.
+SHA-1s are stored directly with references to optimize this use case.
+
+Logs are infrequently read
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are infrequently accessed, but can be large. Deflating log blocks
+saves disk space, with some increased penalty at read time.
+
+Logs are stored in an isolated section from refs, reducing the burden on
+reference readers that want to ignore logs. Further, historical logs can
+be isolated into log-only files.
+
+Logs are read backwards
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are frequently accessed backwards (most recent N records for master
+to answer `master@{4}`), so log records are grouped by reference, and
+sorted descending by update index.
+
+Repository format
+~~~~~~~~~~~~~~~~~
+
+Version 1
+^^^^^^^^^
+
+A repository must set its `$GIT_DIR/config` to configure reftable:
+
+....
+[core]
+    repositoryformatversion = 1
+[extensions]
+    refStorage = reftable
+....
+
+Layout
+^^^^^^
+
+A collection of reftable files are stored in the `$GIT_DIR/reftable/`
+directory:
+
+....
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+where reftable files are named by a unique name such as produced by the
+function `${min_update_index}-${max_update_index}.ref`.
+
+Log-only files use the `.log` extension, while ref-only and mixed ref
+and log files use `.ref`. extension.
+
+The stack ordering file is `$GIT_DIR/reftable/tables.list` and lists the
+current files, one per line, in order, from oldest (base) to newest
+(most recent):
+
+....
+$ cat .git/reftable/tables.list
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+Readers must read `$GIT_DIR/reftable/tables.list` to determine which
+files are relevant right now, and search through the stack in reverse
+order (last reftable is examined first).
+
+Reftable files not listed in `tables.list` may be new (and about to be
+added to the stack by the active writer), or ancient and ready to be
+pruned.
+
+Backward compatibility
+^^^^^^^^^^^^^^^^^^^^^^
+
+Older clients should continue to recognize the directory as a git
+repository so they don’t look for an enclosing repository in parent
+directories. To this end, a reftable-enabled repository must contain the
+following dummy files
+
+* `.git/HEAD`, a regular file containing `ref: refs/heads/.invalid`.
+* `.git/refs/`, a directory
+* `.git/refs/heads`, a regular file
+
+Readers
+^^^^^^^
+
+Readers can obtain a consistent snapshot of the reference space by
+following:
+
+1.  Open and read the `tables.list` file.
+2.  Open each of the reftable files that it mentions.
+3.  If any of the files is missing, goto 1.
+4.  Read from the now-open files as long as necessary.
+
+Update transactions
+^^^^^^^^^^^^^^^^^^^
+
+Although reftables are immutable, mutations are supported by writing a
+new reftable and atomically appending it to the stack:
+
+1.  Acquire `tables.list.lock`.
+2.  Read `tables.list` to determine current reftables.
+3.  Select `update_index` to be most recent file’s
+`max_update_index + 1`.
+4.  Prepare temp reftable `tmp_XXXXXX`, including log entries.
+5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
+6.  Copy `tables.list` to `tables.list.lock`, appending file from (5).
+7.  Rename `tables.list.lock` to `tables.list`.
+
+During step 4 the new file’s `min_update_index` and `max_update_index`
+are both set to the `update_index` selected by step 3. All log records
+for the transaction use the same `update_index` in their keys. This
+enables later correlation of which references were updated by the same
+transaction.
+
+Because a single `tables.list.lock` file is used to manage locking, the
+repository is single-threaded for writers. Writers may have to busy-spin
+(with backoff) around creating `tables.list.lock`, for up to an
+acceptable wait period, aborting if the repository is too busy to
+mutate. Application servers wrapped around repositories (e.g. Gerrit
+Code Review) can layer their own lock/wait queue to improve fairness to
+writers.
+
+Reference deletions
+^^^^^^^^^^^^^^^^^^^
+
+Deletion of any reference can be explicitly stored by setting the `type`
+to `0x0` and omitting the `value` field of the `ref_record`. This serves
+as a tombstone, overriding any assertions about the existence of the
+reference from earlier files in the stack.
+
+Compaction
+^^^^^^^^^^
+
+A partial stack of reftables can be compacted by merging references
+using a straightforward merge join across reftables, selecting the most
+recent value for output, and omitting deleted references that do not
+appear in remaining, lower reftables.
+
+A compacted reftable should set its `min_update_index` to the smallest
+of the input files’ `min_update_index`, and its `max_update_index`
+likewise to the largest input `max_update_index`.
+
+For sake of illustration, assume the stack currently consists of
+reftable files (from oldest to newest): A, B, C, and D. The compactor is
+going to compact B and C, leaving A and D alone.
+
+1.  Obtain lock `tables.list.lock` and read the `tables.list` file.
+2.  Obtain locks `B.lock` and `C.lock`. Ownership of these locks
+prevents other processes from trying to compact these files.
+3.  Release `tables.list.lock`.
+4.  Compact `B` and `C` into a temp file
+`${min_update_index}-${max_update_index}_XXXXXX`.
+5.  Reacquire lock `tables.list.lock`.
+6.  Verify that `B` and `C` are still in the stack, in that order. This
+should always be the case, assuming that other processes are adhering to
+the locking protocol.
+7.  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
+`${min_update_index}-${max_update_index}.ref`.
+8.  Write the new stack to `tables.list.lock`, replacing `B` and `C`
+with the file from (4).
+9.  Rename `tables.list.lock` to `tables.list`.
+10. Delete `B` and `C`, perhaps after a short sleep to avoid forcing
+readers to backtrack.
+
+This strategy permits compactions to proceed independently of updates.
+
+Each reftable (compacted or not) is uniquely identified by its name, so
+open reftables can be cached by their name.
+
+Alternatives considered
+~~~~~~~~~~~~~~~~~~~~~~~
+
+bzip packed-refs
+^^^^^^^^^^^^^^^^
+
+`bzip2` can significantly shrink a large packed-refs file (e.g. 62 MiB
+compresses to 23 MiB, 37%). However the bzip format does not support
+random access to a single reference. Readers must inflate and discard
+while performing a linear scan.
+
+Breaking packed-refs into chunks (individually compressing each chunk)
+would reduce the amount of data a reader must inflate, but still leaves
+the problem of indexing chunks to support readers efficiently locating
+the correct chunk.
+
+Given the compression achieved by reftable’s encoding, it does not seem
+necessary to add the complexity of bzip/gzip/zlib.
+
+Michael Haggerty’s alternate format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Michael Haggerty proposed
+https://public-inbox.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@mail.gmail.com/[an
+alternate] format to reftable on the Git mailing list. This format uses
+smaller chunks, without the restart table, and avoids block alignment
+with padding. Reflog entries immediately follow each ref, and are thus
+interleaved between refs.
+
+Performance testing indicates reftable is faster for lookups (51%
+faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
+larger file (+ ~3.2%, 28.3M vs 29.2M):
+
+[cols=">,>,>,>",options="header",]
+|=====================================
+|format |size |seek cold |seek hot
+|mh-alt |28.3 M |23.4 usec |11.2 usec
+|reftable |29.2 M |19.9 usec |5.4 usec
+|=====================================
+
+JGit Ketch RefTree
+^^^^^^^^^^^^^^^^^^
+
+https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg03073.html[JGit Ketch]
+proposed
+https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/[RefTree],
+an encoding of references inside Git tree objects stored as part of the
+repository’s object database.
+
+The RefTree format adds additional load on the object database storage
+layer (more loose objects, more objects in packs), and relies heavily on
+the packer’s delta compression to save space. Namespaces which are flat
+(e.g. thousands of tags in refs/tags) initially create very large loose
+objects, and so RefTree does not address the problem of copying many
+references to modify a handful.
+
+Flat namespaces are not efficiently searchable in RefTree, as tree
+objects in canonical formatting cannot be binary searched. This fails
+the need to handle a large number of references in a single namespace,
+such as GitHub’s `refs/pulls`, or a project with many tags.
+
+LMDB
+^^^^
+
+David Turner proposed
+https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/[using
+LMDB], as LMDB is lightweight (64k of runtime code) and GPL-compatible
+license.
+
+A downside of LMDB is its reliance on a single C implementation. This
+makes embedding inside JGit (a popular reimplemenation of Git)
+difficult, and hoisting onto virtual storage (for JGit DFS) virtually
+impossible.
+
+A common format that can be supported by all major Git implementations
+(git-core, JGit, libgit2) is strongly preferred.
+
+Future
+~~~~~~
+
+Longer hashes
+^^^^^^^^^^^^^
+
+Version will bump (e.g. 2) to indicate `value` uses a different object
+id length other than 20. The length could be stored in an expanded file
+header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v9 07/10] reftable: define version 2 of the spec to accomodate SHA256
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
                                   ` (5 preceding siblings ...)
  2020-04-20 21:14                 ` [PATCH v9 06/10] reftable: file format documentation Jonathan Nieder via GitGitGadget
@ 2020-04-20 21:14                 ` Han-Wen Nienhuys via GitGitGadget
  2020-04-20 21:14                 ` [PATCH v9 08/10] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
                                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-20 21:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 50 ++++++++++++++++------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index 9fa4657d9ff..ee3f36ea851 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -193,8 +193,8 @@ and non-aligned files.
 Very small files (e.g. 1 only ref block) may omit `padding` and the ref
 index to reduce total file size.
 
-Header
-^^^^^^
+Header (version 1)
+^^^^^^^^^^^^^^^^^^
 
 A 24-byte header appears at the beginning of the file:
 
@@ -215,6 +215,24 @@ used in a stack for link:#Update-transactions[transactions], these
 fields can order the files such that the prior file’s
 `max_update_index + 1` is the next file’s `min_update_index`.
 
+Header (version 2)
+^^^^^^^^^^^^^^^^^^
+
+A 28-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+uint32( hash_id )
+....
+
+The header is identical to `version_number=1`, with the 4-byte hash ID
+("sha1" for SHA1 and "s256" for SHA-256) append to the header.
+
+
 First ref block
 ^^^^^^^^^^^^^^^
 
@@ -671,14 +689,8 @@ Footer
 After the last block of the file, a file footer is written. It begins
 like the file header, but is extended with additional data.
 
-A 68-byte footer appears at the end:
-
 ....
-    'REFT'
-    uint8( version_number = 1 )
-    uint24( block_size )
-    uint64( min_update_index )
-    uint64( max_update_index )
+    HEADER
 
     uint64( ref_index_position )
     uint64( (obj_position << 5) | obj_id_len )
@@ -701,12 +713,16 @@ obj blocks.
 * `obj_index_position`: byte position for the start of the obj index.
 * `log_index_position`: byte position for the start of the log index.
 
+The size of the footer is 68 bytes for version 1, and 72 bytes for
+version 2.
+
 Reading the footer
 ++++++++++++++++++
 
-Readers must seek to `file_length - 68` to access the footer. A trusted
-external source (such as `stat(2)`) is necessary to obtain
-`file_length`. When reading the footer, readers must verify:
+Readers must first read the file start to determine the version
+number. Then they seek to `file_length - FOOTER_LENGTH` to access the
+footer. A trusted external source (such as `stat(2)`) is necessary to
+obtain `file_length`. When reading the footer, readers must verify:
 
 * 4-byte magic is correct
 * 1-byte version number is recognized
@@ -1055,13 +1071,3 @@ impossible.
 
 A common format that can be supported by all major Git implementations
 (git-core, JGit, libgit2) is strongly preferred.
-
-Future
-~~~~~~
-
-Longer hashes
-^^^^^^^^^^^^^
-
-Version will bump (e.g. 2) to indicate `value` uses a different object
-id length other than 20. The length could be stored in an expanded file
-header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v9 08/10] reftable: clarify how empty tables should be written
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
                                   ` (6 preceding siblings ...)
  2020-04-20 21:14                 ` [PATCH v9 07/10] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
@ 2020-04-20 21:14                 ` Han-Wen Nienhuys via GitGitGadget
  2020-04-20 21:14                 ` [PATCH v9 09/10] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-20 21:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The format allows for some ambiguity, as a lone footer also starts
with a valid file header. However, the current JGit code will barf on
this. This commit codifies this behavior into the standard.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index ee3f36ea851..df5cb5f11b8 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -731,6 +731,13 @@ version)
 
 Once verified, the other fields of the footer can be accessed.
 
+Empty tables
+++++++++++++
+
+A reftable may be empty. In this case, the file starts with a header
+and is immediately followed by a footer.
+
+
 Varint encoding
 ^^^^^^^^^^^^^^^
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v9 09/10] Add reftable library
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
                                   ` (7 preceding siblings ...)
  2020-04-20 21:14                 ` [PATCH v9 08/10] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
@ 2020-04-20 21:14                 ` Han-Wen Nienhuys via GitGitGadget
  2020-04-20 22:06                   ` Junio C Hamano
  2020-04-20 21:14                 ` [PATCH v9 10/10] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-20 21:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. Its rationale and
specification is in the preceding commit.

This imports the upstream library as one big commit. For understanding
the code, it is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* block.{c,h} - reading and writing blocks.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 ++
 reftable/README.md     |   11 +
 reftable/VERSION       |    5 +
 reftable/basics.c      |  189 +++++++
 reftable/basics.h      |   53 ++
 reftable/block.c       |  425 +++++++++++++++
 reftable/block.h       |  124 +++++
 reftable/blocksource.h |   22 +
 reftable/bytes.c       |    0
 reftable/config.h      |    1 +
 reftable/constants.h   |   21 +
 reftable/dump.c        |   97 ++++
 reftable/file.c        |   98 ++++
 reftable/iter.c        |  234 +++++++++
 reftable/iter.h        |   60 +++
 reftable/merged.c      |  307 +++++++++++
 reftable/merged.h      |   34 ++
 reftable/pq.c          |  115 ++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  754 ++++++++++++++++++++++++++
 reftable/reader.h      |   68 +++
 reftable/record.c      | 1119 +++++++++++++++++++++++++++++++++++++++
 reftable/record.h      |  117 +++++
 reftable/reftable.h    |  523 +++++++++++++++++++
 reftable/slice.c       |  224 ++++++++
 reftable/slice.h       |   76 +++
 reftable/stack.c       | 1132 ++++++++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   42 ++
 reftable/system.h      |   53 ++
 reftable/tree.c        |   67 +++
 reftable/tree.h        |   34 ++
 reftable/update.sh     |   14 +
 reftable/writer.c      |  661 +++++++++++++++++++++++
 reftable/writer.h      |   60 +++
 reftable/zlib-compat.c |   92 ++++
 35 files changed, 6897 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..21500563fd4
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,11 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   sh reftable/update.sh
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..b801511d16d
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,5 @@
+commit b68690acad769d59e80cbb4f79c442ece133e6bd
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Mon Apr 20 21:33:11 2020 +0200
+
+    C: fix search/replace error in dump.c
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..a0db42b74d3
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,189 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)((i)&0xff);
+}
+
+uint32_t get_be24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)((i)&0xff);
+}
+
+int binsearch(int sz, int (*f)(int k, void *args), void *args)
+{
+	int lo = 0;
+	int hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		int mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	switch (err) {
+	case IO_ERROR:
+		return "I/O error";
+	case FORMAT_ERROR:
+		return "FORMAT_ERROR";
+	case NOT_EXIST_ERROR:
+		return "NOT_EXIST_ERROR";
+	case LOCK_ERROR:
+		return "LOCK_ERROR";
+	case API_ERROR:
+		return "API_ERROR";
+	case ZLIB_ERROR:
+		return "ZLIB_ERROR";
+	case -1:
+		return "general error";
+	default:
+		return "unknown error code";
+	}
+}
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..6d89eb5d931
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+/* Bigendian en/decoding of integers */
+
+void put_be24(byte *out, uint32_t i);
+uint32_t get_be24(byte *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(int sz, int (*f)(int k, void *args), void *args);
+
+/*
+  Frees a NULL terminated array of malloced strings. The array itself is also
+  freed.
+ */
+void free_names(char **a);
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp);
+
+/* compares two NULL-terminated arrays of strings. */
+int names_equal(char **a, char **b);
+
+/* returns the array size of a NULL-terminated array of strings. */
+int names_length(char **names);
+
+/* Allocation routines; they invoke the functions set through
+ * reftable_set_alloc() */
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..ad6ef86a1f6
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,425 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "blocksource.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct record rec)
+{
+	struct slice empty = { 0 };
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = { 0 };
+	int n = 0;
+
+	record_key(rec, &key);
+	n = encode_key(&restart, out, last, key, record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	n = record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	slice_clear(&key);
+	return 0;
+
+err:
+	slice_clear(&key);
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = { 0 };
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		slice_resize(&compressed, src_len);
+
+		while (1) {
+			uLongf dest_len = compressed.len;
+
+			zresult = compress2(compressed.buf, &dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				slice_resize(&compressed, 2 * compressed.len);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				slice_clear(&compressed);
+				return ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed.buf,
+			       dest_len);
+			w->next = dest_len + block_header_skip;
+			break;
+		}
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct reftable_block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	if (!is_block_type(typ)) {
+		return FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = { 0 };
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip;
+		uLongf src_len = block->len - block_header_skip;
+
+		slice_resize(&uncompressed, sz);
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			slice_clear(&uncompressed);
+			return ZLIB_ERROR;
+		}
+
+		block_source_return_block(block->source, block);
+		block->data = uncompressed.buf;
+		block->len = dst_len; /* XXX: 4 bytes missing? */
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_be16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct slice key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(int idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = { 0 };
+	struct slice last_key = { 0 };
+	byte unused_extra;
+	int n = decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_compare(a->key, rkey);
+		slice_clear(&rkey);
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct record rec)
+{
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	{
+		struct slice in = {
+			.buf = it->br->block.data + it->next_off,
+			.len = it->br->block_len - it->next_off,
+		};
+		struct slice start = in;
+		struct slice key = { 0 };
+		byte extra;
+		int n = decode_key(&key, &extra, it->last_key, in);
+		if (n < 0) {
+			return -1;
+		}
+
+		slice_consume(&in, n);
+		n = record_decode(rec, key, extra, in, it->br->hash_size);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+
+		slice_copy(&it->last_key, key);
+		it->next_off += start.len - in.len;
+		slice_clear(&key);
+		return 0;
+	}
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = { 0 };
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	slice_clear(&it->last_key);
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		return -1;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	{
+		struct record rec = new_record(block_reader_type(br));
+		struct slice key = { 0 };
+		int result = 0;
+		int err = 0;
+		struct block_iter next = { 0 };
+		while (true) {
+			block_iter_copy_from(&next, it);
+
+			err = block_iter_next(&next, rec);
+			if (err < 0) {
+				result = -1;
+				goto exit;
+			}
+
+			record_key(rec, &key);
+			if (err > 0 || slice_compare(key, want) >= 0) {
+				result = 0;
+				goto exit;
+			}
+
+			block_iter_copy_from(it, &next);
+		}
+
+	exit:
+		slice_clear(&key);
+		slice_clear(&next.last_key);
+		record_destroy(&rec);
+
+		return result;
+	}
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	slice_clear(&bw->last_key);
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..62b2e0fec6d
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,124 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+/*
+  Writes reftable blocks. The block_writer is reused across blocks to minimize
+  allocation overhead.
+*/
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+
+	/* Offset ofof the global header. Nonzero in the first block only. */
+	uint32_t header_off;
+
+	/* How often to restart keys. */
+	int restart_interval;
+	int hash_size;
+
+	/* Offset of next byte to write. */
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+
+	struct slice last_key;
+	int entries;
+};
+
+/*
+  initializes the blockwriter to write `typ` entries, using `buf` as temporary
+  storage. `buf` is not owned by the block_writer. */
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+
+/*
+  returns the block type (eg. 'r' for ref records.
+*/
+byte block_writer_type(struct block_writer *bw);
+
+/* appends the record, or -1 if it doesn't fit. */
+int block_writer_add(struct block_writer *w, struct record rec);
+
+/* appends the key restarts, and compress the block if necessary. */
+int block_writer_finish(struct block_writer *w);
+
+/* clears out internally allocated block_writer members. */
+void block_writer_clear(struct block_writer *bw);
+
+/* Read a block. */
+struct block_reader {
+	/* offset of the block header; nonzero for the first block in a
+	 * reftable. */
+	uint32_t header_off;
+
+	/* the memory block */
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint16_t restart_count;
+
+	/* size of the data in the file. For log blocks, this is the compressed
+	 * size. */
+	uint32_t full_block_size;
+};
+
+/* Iterate over entries in a block */
+struct block_iter {
+	/* offset within the block of the next entry to read. */
+	uint32_t next_off;
+	struct block_reader *br;
+
+	/* key for last entry we read. */
+	struct slice last_key;
+};
+
+/* initializes a block reader */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Position `it` at start of the block */
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+
+/* Position `it` to the `want` key in the block */
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+
+/* Returns the block type (eg. 'r' for refs) */
+byte block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+int block_iter_next(struct block_iter *it, struct record rec);
+
+/* Seek to `want` with in the block pointed to by `it` */
+int block_iter_seek(struct block_iter *it, struct slice want);
+
+/* deallocate memory for `it`. The block reader and its block is left intact. */
+void block_iter_close(struct block_iter *it);
+
+/* size of file header, depending on format version */
+int header_size(int version);
+
+/* size of file footer, depending on format version */
+int footer_size(int version);
+
+#endif
diff --git a/reftable/blocksource.h b/reftable/blocksource.h
new file mode 100644
index 00000000000..1fdf0bf4557
--- /dev/null
+++ b/reftable/blocksource.h
@@ -0,0 +1,22 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCKSOURCE_H
+#define BLOCKSOURCE_H
+
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source source);
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_return_block(struct reftable_block_source source,
+			       struct reftable_block *ret);
+void block_source_close(struct reftable_block_source source);
+
+#endif
diff --git a/reftable/bytes.c b/reftable/bytes.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/reftable/config.h b/reftable/config.h
new file mode 100644
index 00000000000..40a8c178f10
--- /dev/null
+++ b/reftable/config.h
@@ -0,0 +1 @@
+/* empty */
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..5eee72c4c11
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 00000000000..c0065792a4f
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "reftable.h"
+
+static int dump_table(const char *tablename)
+{
+	struct block_source src = { 0 };
+	int err = block_source_from_file(&src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	struct reader *r = NULL;
+	err = new_reader(&r, src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct iterator it = { 0 };
+		err = reader_seek_ref(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+
+		struct ref_record ref = { 0 };
+		while (1) {
+			err = iterator_next_ref(it, &ref);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			ref_record_print(&ref, 20);
+		}
+		iterator_destroy(&it);
+		ref_record_clear(&ref);
+	}
+
+	{
+		struct iterator it = { 0 };
+		err = reader_seek_log(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+		struct log_record log = { 0 };
+		while (1) {
+			err = iterator_next_log(it, &log);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			log_record_print(&log, 20);
+		}
+		iterator_destroy(&it);
+		log_record_clear(&log);
+	}
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int opt;
+	const char *table = NULL;
+	while ((opt = getopt(argc, argv, "t:")) != -1) {
+		switch (opt) {
+		case 't':
+			table = xstrdup(optarg);
+			break;
+		case '?':
+			printf("usage: %s [-table tablefile]\n", argv[0]);
+			return 2;
+			break;
+		}
+	}
+
+	if (table != NULL) {
+		int err = dump_table(table);
+		if (err < 0) {
+			fprintf(stderr, "%s: %s: %s\n", argv[0], table,
+				error_str(err));
+			return 1;
+		}
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..82659960b30
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,98 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	{
+		struct file_block_source *p =
+			reftable_calloc(sizeof(struct file_block_source));
+		p->size = st.st_size;
+		p->fd = fd;
+
+		bs->ops = &file_vtable;
+		bs->arg = p;
+	}
+	return 0;
+}
+
+int reftable_fd_write(void *arg, byte *data, int sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..e694dc952c1
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,234 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator it)
+{
+	return it.ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct record rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator it, struct record rec)
+{
+	return it.ops->next(it.iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator it,
+			       struct reftable_ref_record *ref)
+{
+	struct record rec = { 0 };
+	record_from_ref(&rec, ref);
+	return iterator_next(it, rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator it,
+			       struct reftable_log_record *log)
+{
+	struct record rec = { 0 };
+	record_from_log(&rec, log);
+	return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	slice_clear(&fri->oid);
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec.data;
+
+	while (true) {
+		int err = reftable_iterator_next_ref(fri->it, ref);
+		if (err != 0) {
+			return err;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			int err = reftable_reader_seek_ref(fri->r, &it,
+							   ref->ref_name);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				return err;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reader_return_block(it->r, &it->block_reader.block);
+	slice_clear(&it->oid);
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reader_return_block(it->r, &it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec.data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..408b7f41a74
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct record rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator it, struct record rec);
+bool iterator_is_null(struct reftable_iterator it);
+
+/* iterator that produces only ref records that point to `oid` */
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_reader *r;
+	struct slice oid;
+	struct reftable_iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+/* iterator that produces only ref records that point to `oid`,
+   but using the object index.
+ */
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..83a20363a2c
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,307 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct record rec = new_record(mi->typ);
+		int err = iterator_next(mi->stack[i], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			record_destroy(&rec);
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, int idx)
+{
+	if (iterator_is_null(mi->stack[idx])) {
+		return 0;
+	}
+
+	{
+		struct record rec = new_record(mi->typ);
+		struct pq_entry e = {
+			.rec = rec,
+			.index = idx,
+		};
+		int err = iterator_next(mi->stack[idx], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[idx]);
+			record_destroy(&rec);
+			return 0;
+		}
+
+		merged_iter_pqueue_add(&mi->pq, e);
+	}
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec)
+{
+	struct slice entry_key = { 0 };
+	struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
+	int err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+
+        /*
+          One can also use reftable as datacenter-local storage, where the ref
+          database is maintained in globally consistent database (eg.
+          CockroachDB or Spanner). In this scenario, replication delays together
+          with compaction may cause newer tables to contain older entries. In
+          such a deployment, the loop below must be changed to collect all
+          entries for the same key, and return new the newest one.
+        */
+	record_key(entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = { 0 };
+		int err = 0, cmp = 0;
+
+		record_key(top.rec, &k);
+
+		cmp = slice_compare(k, entry_key);
+		slice_clear(&k);
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		record_clear(top.rec);
+		reftable_free(record_yield(&top.rec));
+	}
+
+	record_copy_from(rec, entry.rec, hash_size(mi->hash_id));
+	record_clear(entry.rec);
+	reftable_free(record_yield(&entry.rec));
+	slice_clear(&entry_key);
+	return 0;
+}
+
+static int merged_iter_next_void(void *p, struct record rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id)
+{
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reftable_reader *r = stack[i];
+		if (r->hash_id != hash_id) {
+			return FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
+			return FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reftable_reader_min_update_index(r);
+		}
+
+		last_max = reftable_reader_max_update_index(r);
+	}
+
+	{
+		struct reftable_merged_table m = {
+			.stack = stack,
+			.stack_len = n,
+			.min = first_min,
+			.max = last_max,
+			.hash_id = hash_id,
+		};
+
+		*dest = reftable_calloc(sizeof(struct reftable_merged_table));
+		**dest = m;
+	}
+	return 0;
+}
+
+void reftable_merged_table_close(struct reftable_merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reftable_reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+static int merged_table_seek_record(struct reftable_merged_table *mt,
+				    struct reftable_iterator *it,
+				    struct record rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = record_type(rec),
+		.hash_id = mt->hash_id,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n;
+	err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..16ca3c7d901
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_reader **stack;
+	int stack_len;
+	uint32_t hash_id;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	int stack_len;
+	byte typ;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..bfad96c6695
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,115 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = { 0 };
+	struct slice bk = { 0 };
+	int cmp = 0;
+	record_key(a.rec, &ak);
+	record_key(b.rec, &bk);
+
+	cmp = slice_compare(ak, bk);
+
+	slice_clear(&ak);
+	slice_clear(&bk);
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		record_clear(pq->heap[i].rec);
+		reftable_free(record_yield(&pq->heap[i].rec));
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..b585a52ee14
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..26fb5f198c6
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,754 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source source)
+{
+	return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source.ops->read_block(source.arg, dest, off, size);
+	dest->source = source;
+	return result;
+}
+
+void block_source_return_block(struct reftable_block_source source,
+			       struct reftable_block *blockp)
+{
+	source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(r->source, dest, off, sz);
+}
+
+void reader_return_block(struct reftable_reader *r, struct reftable_block *p)
+{
+	block_source_return_block(r->source, p);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	int err = 0;
+	if (memcmp(f, "REFT", 4)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	{
+		uint32_t computed_crc = crc32(0, footer, f - footer);
+		uint32_t file_crc = get_be32(f);
+		f += 4;
+		if (computed_crc != file_crc) {
+			err = FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	{
+		byte first_block_typ = header[header_size(r->version)];
+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+		r->ref_offsets.offset = 0;
+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+					  r->log_offsets.offset > 0);
+		r->obj_offsets.present = r->obj_offsets.offset > 0;
+	}
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = FORMAT_ERROR;
+		goto exit;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	block_source_return_block(r->source, &footer);
+	block_source_return_block(r->source, &header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec.data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reader_return_block(ti->r, &ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reader_return_block(r, &block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reader_return_block(r, &block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct block_reader *brp =
+			reftable_malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec)
+{
+	if (record_type(rec) != ti->typ) {
+		return API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = { 0 };
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct record rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off, byte typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = reftable_malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			byte typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct record want)
+{
+	struct record rec = new_record(record_type(want));
+	struct slice want_key = { 0 };
+	struct slice got_key = { 0 };
+	struct table_iter next = { 0 };
+	int err = -1;
+	record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_compare(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	record_destroy(&rec);
+	slice_clear(&want_key);
+	slice_clear(&got_key);
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it, struct record rec)
+{
+	struct index_record want_index = { 0 };
+	struct record want_index_rec = { 0 };
+	struct index_record index_result = { 0 };
+	struct record index_result_rec = { 0 };
+	struct table_iter index_iter = { 0 };
+	struct table_iter next = { 0 };
+	int err = 0;
+
+	record_key(rec, &want_index.last_key);
+	record_from_index(&want_index_rec, &want_index);
+	record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	record_clear(want_index_rec);
+	record_clear(index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it, struct record rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = { 0 };
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct record rec)
+{
+	byte typ = record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return reader_seek(r, it, rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return reader_seek(r, it, rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    byte *oid)
+{
+	struct obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct obj_record got = { 0 };
+	struct record got_rec = { 0 };
+	int err = 0;
+
+	record_from_obj(&want_rec, &want);
+
+	err = reader_seek(r, &oit, want_rec);
+	if (err != 0) {
+		return err;
+	}
+
+	record_from_obj(&got_rec, &got);
+	err = iterator_next(oit, got_rec);
+	reftable_iterator_destroy(&oit);
+	if (err < 0) {
+		return err;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	{
+		struct indexed_table_ref_iter *itr = NULL;
+		err = new_indexed_table_ref_iter(&itr, r, oid,
+						 hash_size(r->hash_id),
+						 got.offsets, got.offset_len);
+		if (err < 0) {
+			record_clear(got_rec);
+			return err;
+		}
+		got.offsets = NULL;
+		record_clear(got_rec);
+
+		iterator_from_indexed_table_ref_iter(it, itr);
+	}
+
+	return 0;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      byte *oid, int oid_len)
+{
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_calloc(sizeof(struct filtering_ref_iterator));
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	filter->r = r;
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, byte *oid,
+			     int oid_len)
+{
+	if (r->obj_offsets.present) {
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	}
+	return reftable_reader_refs_for_unindexed(r, it, oid, oid_len);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..80d8a5adbaa
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,68 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source source);
+
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_return_block(struct reftable_block_source source,
+			       struct reftable_block *ret);
+void block_source_close(struct reftable_block_source *source);
+
+/* metadata for a block type */
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+/* The state for reading a reftable file. */
+struct reftable_reader {
+	/* for convience, associate a name with the instance. */
+	char *name;
+	struct reftable_block_source source;
+
+	/* Size of the file, excluding the footer. */
+	uint64_t size;
+
+	/* 'sha1' for SHA1, 's256' for SHA-256 */
+	uint32_t hash_id;
+
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	/* Length of the OID keys in the 'o' section */
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct record rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+void reader_return_block(struct reftable_reader *r, struct reftable_block *p);
+
+/* initialize a block reader to read from `r` */
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..4394c8ced37
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1119 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = { 0 };
+	int i = 9;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	{
+		int n = sizeof(buf) - i - 1;
+		if (dest.len < n) {
+			return -1;
+		}
+		memcpy(dest.buf, &buf[i + 1], n);
+		return n;
+	}
+}
+
+int is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	slice_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	slice_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	slice_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	slice_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static byte reftable_ref_record_type(void)
+{
+	return BLOCK_TYPE_REF;
+}
+
+static void reftable_ref_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = xstrdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size);
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size);
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->ref_name);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static byte reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	slice_consume(&in, n);
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct slice dest = { 0 };
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+		seen_target = true;
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+struct record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = &reftable_ref_record_type,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+};
+
+static byte obj_record_type(void)
+{
+	return BLOCK_TYPE_OBJ;
+}
+
+static void obj_record_key(const void *r, struct slice *dest)
+{
+	const struct obj_record *rec = (const struct obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	const struct obj_record *src = (const struct obj_record *)src_rec;
+
+	*ref = *src;
+	ref->hash_prefix = reftable_malloc(ref->hash_prefix_len);
+	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
+
+	{
+		int olen = ref->offset_len * sizeof(uint64_t);
+		ref->offsets = reftable_malloc(olen);
+		memcpy(ref->offsets, src->offsets, olen);
+	}
+}
+
+static void obj_record_clear(void *rec)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	FREE_AND_NULL(ref->hash_prefix);
+	FREE_AND_NULL(ref->offsets);
+	memset(ref, 0, sizeof(struct obj_record));
+}
+
+static byte obj_record_val_type(const void *rec)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	{
+		uint64_t last = r->offsets[0];
+		int i = 0;
+		for (i = 1; i < r->offset_len; i++) {
+			int n = put_var_int(s, r->offsets[i] - last);
+			if (n < 0) {
+				return -1;
+			}
+			slice_consume(&s, n);
+			last = r->offsets[i];
+		}
+	}
+	return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct obj_record *r = (struct obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		slice_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+	slice_consume(&in, n);
+
+	{
+		uint64_t last = r->offsets[0];
+		int j = 1;
+		while (j < count) {
+			uint64_t delta = 0;
+			int n = get_var_int(&delta, in);
+			if (n < 0) {
+				return n;
+			}
+			slice_consume(&in, n);
+
+			last = r->offsets[j] = (delta + last);
+			j++;
+		}
+	}
+	return start.len - in.len;
+}
+
+struct record_vtable obj_record_vtable = {
+	.key = &obj_record_key,
+	.type = &obj_record_type,
+	.copy_from = &obj_record_copy_from,
+	.val_type = &obj_record_val_type,
+	.encode = &obj_record_encode,
+	.decode = &obj_record_decode,
+	.clear = &obj_record_clear,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log, int hash_size)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size);
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size);
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static byte reftable_log_record_type(void)
+{
+	return BLOCK_TYPE_LOG;
+}
+
+static void reftable_log_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_be64(dest->buf + 1 + len, ts);
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	*dst = *src;
+	if (dst->ref_name != NULL) {
+		dst->ref_name = xstrdup(dst->ref_name);
+	}
+	if (dst->email != NULL) {
+		dst->email = xstrdup(dst->email);
+	}
+	if (dst->name != NULL) {
+		dst->name = xstrdup(dst->name);
+	}
+	if (dst->message != NULL) {
+		dst->message = xstrdup(dst->message);
+	}
+
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->ref_name);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static byte reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static byte zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r)) {
+		return 0;
+	}
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	slice_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_be16(s.buf, r->tz_offset);
+	slice_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = { 0 };
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return FORMAT_ERROR;
+	}
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size) {
+		return FORMAT_ERROR;
+	}
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	slice_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_be16(in.buf);
+	slice_consume(&in, 2);
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	return start.len - in.len;
+
+error:
+	slice_clear(&dest);
+	return FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+struct record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = &reftable_log_record_type,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+};
+
+struct record new_record(byte typ)
+{
+	struct record rec;
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct obj_record *r =
+			reftable_calloc(sizeof(struct obj_record));
+		record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct index_record *r =
+			reftable_calloc(sizeof(struct index_record));
+		record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+void *record_yield(struct record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+void record_destroy(struct record *rec)
+{
+	record_clear(*rec);
+	reftable_free(record_yield(rec));
+}
+
+static byte index_record_type(void)
+{
+	return BLOCK_TYPE_INDEX;
+}
+
+static void index_record_key(const void *r, struct slice *dest)
+{
+	struct index_record *rec = (struct index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+				   int hash_size)
+{
+	struct index_record *dst = (struct index_record *)rec;
+	struct index_record *src = (struct index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec)
+{
+	struct index_record *idx = (struct index_record *)rec;
+	slice_clear(&idx->last_key);
+}
+
+static byte index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int index_record_encode(const void *rec, struct slice out, int hash_size)
+{
+	const struct index_record *r = (const struct index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+			       struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct index_record *r = (struct index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+	.key = &index_record_key,
+	.type = &index_record_type,
+	.copy_from = &index_record_copy_from,
+	.val_type = &index_record_val_type,
+	.encode = &index_record_encode,
+	.decode = &index_record_decode,
+	.clear = &index_record_clear,
+};
+
+void record_key(struct record rec, struct slice *dest)
+{
+	rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec)
+{
+	return rec.ops->type();
+}
+
+int record_encode(struct record rec, struct slice dest, int hash_size)
+{
+	return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size)
+{
+	assert(src.ops->type() == rec.ops->type());
+
+	rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec)
+{
+	return rec.ops->val_type(rec.data);
+}
+
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size)
+{
+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec)
+{
+	return rec.ops->clear(rec.data);
+}
+
+void record_from_ref(struct record *rec, struct reftable_ref_record *ref_rec)
+{
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
+{
+	rec->data = obj_rec;
+	rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec)
+{
+	rec->data = index_rec;
+	rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct reftable_log_record *log_rec)
+{
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+struct reftable_ref_record *record_as_ref(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec.data;
+}
+
+struct reftable_log_record *record_as_log(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_LOG);
+	return (struct reftable_log_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->ref_name,
+		      ((struct reftable_ref_record *)b)->ref_name);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..bc795ad03fb
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,117 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+
+/* Methods for records. */
+struct record_vtable {
+	/* encode the key of to a byte slice. */
+	void (*key)(const void *rec, struct slice *dest);
+
+	/* The record type of ('r' for ref). */
+	byte (*type)(void);
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	byte (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct record {
+	void *data;
+	struct record_vtable *ops;
+};
+
+/* returns true for recognized block types. Block start with the block type. */
+int is_block_type(byte typ);
+
+/* creates a malloced record of the given type. Dispose with record_destroy */
+struct record new_record(byte typ);
+
+extern struct record_vtable reftable_ref_record_vtable;
+
+/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
+   number of bytes written. */
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra);
+
+/* Decode into `key` and `extra` from `in` */
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in);
+
+/* index_record are used internally to speed up lookups. */
+struct index_record {
+	uint64_t offset; /* Offset of block */
+	struct slice last_key; /* Last key of the block. */
+};
+
+/* obj_record stores an object ID => ref mapping. */
+struct obj_record {
+	byte *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size);
+
+/* zeroes out the embedded record */
+void record_clear(struct record rec);
+
+/* clear out the record, yielding the record data that was encapsulated. */
+void *record_yield(struct record *rec);
+
+/* clear and deallocate embedded record, and zero `rec`. */
+void record_destroy(struct record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct reftable_ref_record *refrec);
+void record_from_log(struct record *rec, struct reftable_log_record *logrec);
+struct reftable_ref_record *record_as_ref(struct record ref);
+struct reftable_log_record *record_as_log(struct record ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..dcf5c433d4a
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,523 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref, int hash_size);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *ref_name;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log, int hash_size);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL ref_name.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	API_ERROR = -6,
+
+	/* Decompression error */
+	ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	EMPTY_TABLE_ERROR = -8,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, uint8_t *, int),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, uint8_t *data, int size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp.  The name is used for creating a
+ * stack. Typically, it is the basename of the file.
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to oid */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid,
+			     int oid_len);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id);
+
+/* returns the hash id used in this merged table. */
+uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *mt);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* closes readers for the merged tables */
+void reftable_merged_table_close(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction; releases the lock if
+ * held. */
+void reftable_addition_close(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* reloads the stack if necessary. */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 00000000000..f3f190c3918
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,224 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+void slice_consume(struct slice *s, int n)
+{
+	s->buf += n;
+	s->len -= n;
+}
+
+byte *slice_yield(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_clear(struct slice *s)
+{
+	reftable_free(slice_yield(s));
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = { 0 };
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_write(struct slice *b, byte *data, int sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = reftable_realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_write_void(void *b, byte *data, int sz)
+{
+	return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf)
+{
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 00000000000..d7e0603d346
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,76 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+/*
+  provides bounds-checked byte ranges.
+  To use, initialize as "slice x = {0};"
+ */
+struct slice {
+	int len;
+	int cap;
+	byte *buf;
+};
+
+void slice_set_string(struct slice *dest, const char *src);
+void slice_append_string(struct slice *dest, const char *src);
+/* Set length to 0, but retain buffer */
+void slice_clear(struct slice *slice);
+
+/* Return a malloced string for `src` */
+char *slice_to_string(struct slice src);
+
+/* Ensure that `buf` is \0 terminated. */
+const char *slice_as_string(struct slice *src);
+
+/* Compare slices */
+bool slice_equal(struct slice a, struct slice b);
+
+/* Return `buf`, clearing out `s` */
+byte *slice_yield(struct slice *s);
+
+/* Copy bytes */
+void slice_copy(struct slice *dest, struct slice src);
+
+/* Advance `buf` by `n`, and decrease length. A copy of the slice
+   should be kept for deallocating the slice. */
+void slice_consume(struct slice *s, int n);
+
+/* Set length of the slice to `l` */
+void slice_resize(struct slice *s, int l);
+
+/* Signed comparison */
+int slice_compare(struct slice a, struct slice b);
+
+/* Append `data` to the `dest` slice.  */
+int slice_write(struct slice *dest, byte *data, int sz);
+
+/* Append `add` to `dest. */
+void slice_append(struct slice *dest, struct slice add);
+
+/* Like slice_write, but suitable for passing to reftable_new_writer
+ */
+int slice_write_void(void *b, byte *data, int sz);
+
+/* Find the longest shared prefix size of `a` and `b` */
+int common_prefix_size(struct slice a, struct slice b);
+
+struct reftable_block_source;
+
+/* Create an in-memory block source for reading reftables */
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..d34320d03f5
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1132 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct slice list_file_name = {};
+	int err = 0;
+	*dest = NULL;
+
+	slice_set_string(&list_file_name, dir);
+	slice_append_string(&list_file_name, "/reftables.list");
+
+	p->list_file = slice_to_string(list_file_name);
+	slice_clear(&list_file_name);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload(p);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fread_lines(FILE *f, char ***namesp)
+{
+	long size = 0;
+	int err = fseek(f, 0, SEEK_END);
+	char *buf = NULL;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	size = ftell(f);
+	if (size < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	err = fseek(f, 0, SEEK_SET);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (fread(buf, 1, size, f) != size) {
+		err = IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+exit:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	FILE *f = fopen(filename, "r");
+	int err = 0;
+	if (f == NULL) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return IO_ERROR;
+	}
+	err = fread_lines(f, namesp);
+	fclose(f);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged == NULL) {
+		return;
+	}
+
+	reftable_merged_table_close(st->merged);
+	reftable_merged_table_free(st->merged);
+	st->merged = NULL;
+
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_tables =
+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
+	int new_tables_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+
+	struct slice table_path = { 0 };
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_append_string(&table_path, "/");
+			slice_append_string(&table_path, name);
+
+			err = reftable_block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = reftable_new_reader(&rd, src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
+					st->config.hash_id);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	st->merged = new_merged;
+
+	{
+		int i = 0;
+		for (i = 0; i < cur_len; i++) {
+			if (cur[i] != NULL) {
+				reader_close(cur[i]);
+				reftable_reader_free(cur[i]);
+			}
+		}
+	}
+exit:
+	slice_clear(&table_path);
+	{
+		int i = 0;
+		for (i = 0; i < new_tables_len; i++) {
+			reader_close(new_tables[i]);
+		}
+	}
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+					     bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		err = read_lines(st->list_file, &names_after);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return -1;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 100;
+		usleep(delay);
+	}
+
+	return 0;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	return reftable_stack_reload_maybe_reuse(st, true);
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == LOCK_ERROR) {
+			err = reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	return reftable_stack_auto_compact(st);
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "%012" PRIx64 "-%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct slice lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	slice_set_string(&add->lock_file_name, st->list_file);
+	slice_append_string(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = LOCK_ERROR;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = LOCK_ERROR;
+		goto exit;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+exit:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct slice nm = {};
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_set_string(&nm, add->stack->list_file);
+		slice_append_string(&nm, "/");
+		slice_append_string(&nm, add->new_tables[i]);
+		unlink(slice_as_string(&nm));
+
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(slice_as_string(&add->lock_file_name));
+		slice_clear(&add->lock_file_name);
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct slice table_list = { 0 };
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0) {
+		goto exit;
+	}
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		slice_append_string(&table_list,
+				    add->stack->merged->stack[i]->name);
+		slice_append_string(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_append_string(&table_list, add->new_tables[i]);
+		slice_append_string(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	slice_clear(&table_list);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&add->lock_file_name),
+		     add->stack->list_file);
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+exit:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	*dest = reftable_malloc(sizeof(**dest));
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = { 0 };
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_addition_commit(&add);
+exit:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice tab_file_name = { 0 };
+	struct slice next_name = { 0 };
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+	uint64_t next_update_index = 0;
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, next_update_index, next_update_index);
+
+	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
+	slice_append_string(&temp_tab_file_name, "/");
+	slice_append(&temp_tab_file_name, next_name);
+	slice_append_string(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
+	if (tab_fd < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_writer_close(wr);
+	if (err == EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto exit;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	if (wr->min_update_index < next_update_index) {
+		err = API_ERROR;
+		goto exit;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_append_string(&next_name, ".ref");
+
+	slice_set_string(&tab_file_name, add->stack->reftable_dir);
+	slice_append_string(&tab_file_name, "/");
+	slice_append(&tab_file_name, next_name);
+
+	err = rename(slice_as_string(&temp_tab_file_name),
+		     slice_as_string(&tab_file_name));
+	if (err < 0) {
+		err = IO_ERROR;
+		goto exit;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = slice_to_string(next_name);
+	add->new_tables_len++;
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_file_name));
+	}
+
+	slice_clear(&temp_tab_file_name);
+	slice_clear(&tab_file_name);
+	slice_clear(&next_name);
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reftable_reader_max_update_index(
+			       st->merged->stack[sz - 1]) +
+		       1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct slice next_name = { 0 };
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->merged->stack[first]),
+		    reftable_reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_append_string(temp_tab, "/");
+	slice_append(temp_tab, next_name);
+	slice_append_string(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = reftable_writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+	reftable_writer_free(wr);
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		slice_clear(temp_tab);
+	}
+	slice_clear(&next_name);
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_reader **subtabs = reftable_calloc(
+		sizeof(struct reftable_reader *) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr,
+				   st->merged->stack[first]->min_update_index,
+				   st->merged->stack[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto exit;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_ref(it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+	}
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_log(it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		/* XXX collect stats? */
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice new_table_name = { 0 };
+	struct slice lock_file_name = { 0 };
+	struct slice ref_list_contents = { 0 };
+	struct slice new_table_path = { 0 };
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_append_string(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_file_name = { 0 };
+		struct slice subtab_lock = { 0 };
+		slice_set_string(&subtab_file_name, st->reftable_dir);
+		slice_append_string(&subtab_file_name, "/");
+		slice_append_string(&subtab_file_name,
+				    reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_file_name);
+		slice_append_string(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				}
+				err = IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] =
+			(char *)slice_as_string(&subtab_file_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_append_string(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_append_string(&new_table_path, "/");
+
+	slice_append(&new_table_path, new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(slice_as_string(&temp_tab_file_name),
+			     slice_as_string(&new_table_path));
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		slice_append(&ref_list_contents, new_table_name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	{
+		char **p = delete_on_success;
+		while (*p) {
+			if (strcmp(*p, slice_as_string(&new_table_path))) {
+				unlink(*p);
+			}
+			p++;
+		}
+	}
+
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+exit:
+	free_names(delete_on_success);
+	{
+		char **p = subtable_locks;
+		while (*p) {
+			unlink(*p);
+			p++;
+		}
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	slice_clear(&new_table_name);
+	slice_clear(&new_table_path);
+	slice_clear(&ref_list_contents);
+	slice_clear(&temp_tab_file_name);
+	slice_clear(&lock_file_name);
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	assert(sz > 0);
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	if (next > 0) {
+		segs[next++] = cur;
+	}
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		/* overhead is 24 + 68 = 92. */
+		sizes[i] = st->merged->stack[i]->size - 91;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_ref(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_ref(it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, refname) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_log(it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..362d6a9c56f
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,42 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+
+	struct reftable_write_options config;
+
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..99f453bfec8
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#if 1 /* REFTABLE_IN_GITCORE */
+
+#include "git-compat-util.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_IN_GITCORE */
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef uint8_t byte;
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ *
+ * TODO: in git-core, this should fallback to uncompress2 if it is available.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..0341c865569
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,67 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	{
+		int res = compare(key, (*rootp)->key);
+		if (res < 0) {
+			return tree_search(key, &(*rootp)->left, compare,
+					   insert);
+		} else if (res > 0) {
+			return tree_search(key, &(*rootp)->right, compare,
+					   insert);
+		}
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..954512e9a3a
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+/* tree_node is a generic binary search tree. */
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+/* looks for `key` in `rootp` using `compare` as comparison function. If insert
+   is set, insert the key if it's not found. Else, return NULL.
+*/
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+
+/* performs an infix walk of the tree. */
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+
+/*
+  deallocates the tree nodes recursively. Keys should be deallocated separately
+  by walking over the tree. */
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100644
index 00000000000..0fd90f89675
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,14 @@
+#!/bin/sh
+
+set -eux
+
+((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
+git clone https://github.com/google/reftable reftable-repo) && \
+cp reftable-repo/c/*.[ch] reftable/ && \
+cp reftable-repo/c/include/*.[ch] reftable/ && \
+cp reftable-repo/LICENSE reftable/ &&
+git --git-dir reftable-repo/.git show --no-patch origin/master \
+> reftable/VERSION && \
+sed -i~ 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|' reftable/system.h
+rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
+git add reftable/*.[ch]
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..0d810134414
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,661 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, byte *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, byte *, int), void *writer_arg,
+		    struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+			     ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = reftable_calloc(sizeof(struct obj_index_tree_node));
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w, struct record rec)
+{
+	int result = -1;
+	struct slice key = { 0 };
+	int err = 0;
+	record_key(rec, &key);
+	if (slice_compare(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	slice_clear(&key);
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return API_ERROR;
+	}
+
+	record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = hash_size(w->opts.hash_id),
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = hash_size(w->opts.hash_id),
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	if (log->ref_name == NULL) {
+		return API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	{
+		struct record rec = { 0 };
+		int err;
+		record_from_log(&rec, log);
+		err = writer_add_record(w, rec);
+		return err;
+	}
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct record rec = { 0 };
+			record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, rec);
+			assert(err == 0);
+		}
+		for (i = 0; i < idx_len; i++) {
+			slice_clear(&idx[i].last_key);
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_block_stats *bstats =
+			writer_reftable_block_stats(w, typ);
+		bstats->index_blocks =
+			w->stats.idx_stats.blocks - before_blocks;
+		bstats->index_offset = index_start;
+		bstats->max_index_level = max_level;
+	}
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct record rec = { 0 };
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	slice_clear(&entry->hash);
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	byte footer[72];
+	byte *p = footer;
+	int err = writer_finish_public_section(w);
+	int empty_table = w->next == 0;
+	if (err != 0) {
+		goto exit;
+	}
+	w->pending_padding = 0;
+	if (empty_table) {
+		// Empty tables need a header anyway.
+		byte header[28];
+		int n = writer_write_header(w, header);
+		err = padded_write(w, header, n, 0);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (empty_table) {
+		err = EMPTY_TABLE_ERROR;
+		goto exit;
+	}
+
+exit:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	slice_clear(&w->last_key);
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		slice_clear(&w->index[i].last_key);
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index, sizeof(struct index_record) * w->index_cap);
+	}
+
+	{
+		struct index_record ir = {
+			.offset = w->next,
+		};
+		slice_copy(&ir.last_key, w->block_writer->last_key);
+		w->index[w->index_len] = ir;
+	}
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..5115e762b12
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, byte *, int);
+	void *write_arg;
+	int pending_padding;
+	struct slice last_key;
+
+	/* offset of next block to write. */
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	/* memory buffer for writing */
+	byte *block;
+
+	/* writer for the current section. NULL or points to
+	 * block_writer_data */
+	struct block_writer *block_writer;
+
+	struct block_writer block_writer_data;
+
+	/* pending index records for the current section */
+	struct index_record *index;
+	int index_len;
+	int index_cap;
+
+	/*
+	  tree for use with tsearch; used to populate the 'o' inverse OID
+	  map */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+/* finishes a block, and writes it to storage */
+int writer_flush_block(struct reftable_writer *w);
+
+/* deallocates memory related to the index */
+void writer_clear_index(struct reftable_writer *w);
+
+/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v9 10/10] Reftable support for git-core
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
                                   ` (8 preceding siblings ...)
  2020-04-20 21:14                 ` [PATCH v9 09/10] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-04-20 21:14                 ` Han-Wen Nienhuys via GitGitGadget
  2020-04-21 20:13                 ` [PATCH v9 00/10] Reftable support git-core Junio C Hamano
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
  11 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-20 21:14 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

TODO:

 * Resolve spots marked with XXX
 * Test strategy?

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   26 +-
 builtin/clone.c                               |    5 +-
 builtin/init-db.c                             |   49 +-
 cache.h                                       |    4 +-
 refs.c                                        |   20 +-
 refs.h                                        |    2 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1021 +++++++++++++++++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |   35 +
 13 files changed, 1158 insertions(+), 29 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index ef1ff2228f0..3d5a585d8fe 100644
--- a/Makefile
+++ b/Makefile
@@ -814,6 +814,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += command-list.h
 
@@ -960,6 +961,7 @@ LIB_OBJS += rebase-interactive.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1164,7 +1166,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2348,11 +2350,28 @@ VCSSVN_OBJS += vcs-svn/fast_export.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/bytes.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2489,6 +2508,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -3104,7 +3126,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index d8b1f413aad..3bead96e447 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1109,7 +1109,10 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template,
+                GIT_HASH_UNKNOWN,
+		DEFAULT_REF_STORAGE, /* XXX */
+		INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 3b50b1aa0e5..888b421338d 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,7 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo, const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +188,7 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 || !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -238,6 +238,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -247,6 +248,15 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -259,15 +269,17 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
+	} else {
+		/*
+		 * XXX should check whether our ref backend matches the
+		 * original one; if not, either die() or convert
+		 */
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
@@ -381,7 +393,8 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash, unsigned int flags)
+	    const char *template_dir, int hash, const char *ref_storage_format,
+            unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -420,6 +433,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+        repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -448,6 +462,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -521,6 +537,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = DEFAULT_REF_STORAGE;
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -528,15 +545,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *object_format = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
@@ -646,9 +666,10 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo, ref_storage_format, flags);
 }
diff --git a/cache.h b/cache.h
index c77b95870a5..83c4908ba06 100644
--- a/cache.h
+++ b/cache.h
@@ -628,8 +628,9 @@ int path_inside_repo(const char *prefix, const char *path);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
+            const char *ref_storage_format,
 	    unsigned int flags);
-void initialize_repository_version(int hash_algo);
+void initialize_repository_version(int hash_algo, const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1043,6 +1044,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 3b9742d03cb..7b90b84f29c 100644
--- a/refs.c
+++ b/refs.c
@@ -20,7 +20,7 @@
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1836,13 +1836,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1858,7 +1858,10 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs = ref_store_init(r->gitdir,
+				 r->ref_storage_format ? r->ref_storage_format :
+							 DEFAULT_REF_STORAGE,
+				 REF_STORE_ALL_CAPS);
 	return r->refs;
 }
 
@@ -1913,7 +1916,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1927,6 +1930,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1940,9 +1944,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index 87c9ec921b9..d7fbbe26e6a 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,8 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+#define DEFAULT_REF_STORAGE "files"
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 3490aac3a40..cafe5b97376 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -661,6 +661,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..e114ab71342
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1021 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *reftable_dir;
+	struct reftable_stack *stack;
+};
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = { NULL };
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						       unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", path);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/HEAD", path);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", path);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	safe_create_dir(refs->reftable_dir, 1);
+	return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	const char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct reftable_merged_table *mt = NULL;
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else {
+		mt = reftable_stack_merged_table(refs->stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+        ri->prefix = prefix;
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	(void)refs;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid;
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct ref_update *)a)->refname,
+		      ((struct ref_update *)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = reftable_stack_next_update_index(refs->stack);
+	int err = 0;
+	struct reftable_log_record *logs =
+		calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct object_id out_oid;
+			int out_flags = 0;
+			/* Memory owned by refs_resolve_ref_unsafe, no need to
+			 * free(). */
+			const char *resolved = refs_resolve_ref_unsafe(
+				transaction->ref_store, u->refname, 0, &out_oid,
+				&out_flags);
+			struct reftable_ref_record ref = { NULL };
+			ref.ref_name =
+				(char *)(resolved ? resolved : u->refname);
+			log->ref_name = ref.ref_name;
+			ref.value = u->new_oid.hash;
+			ref.update_index = ts;
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (int i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+exit:
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_add(refs->stack, &write_transaction_table,
+				 transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *err)
+{
+	return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (int i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = { NULL };
+		struct reftable_ref_record current = { NULL };
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.ref_name,
+					    &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	return reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_log_record log = { NULL };
+		struct object_id new_oid;
+		struct object_id old_oid;
+		struct reftable_ref_record current = { NULL };
+		reftable_stack_read_ref(create->refs->stack, create->refname,
+					&current);
+
+		fill_reftable_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			reftable_writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_add(refs->stack, &write_create_symref_table,
+				  &arg);
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = { NULL };
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto exit;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto exit;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = { { NULL } };
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = { { NULL } };
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+exit:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	return reftable_stack_add(refs->stack, &write_rename_table, &arg);
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+                if (reftable_log_record_is_deletion(&ri->log)) {
+                        /* XXX - Why does the reftable_stack filter these? */
+                        continue;
+                }
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+                        /* we want the refnames that we have reflogs for, so we
+                         * skip if we've already produced this name. This could
+                         * be faster by seeking directly to
+                         * reflog@update_index==0.
+                         */
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(refs->stack);
+	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = { NULL };
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		err = reftable_iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid;
+			struct object_id new_oid;
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = { NULL };
+		err = reftable_iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (int i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (int i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct reftable_log_record log = { NULL };
+	struct reftable_iterator it = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		struct object_id ooid;
+		struct object_id noid;
+
+		int err = reftable_iterator_next_log(it, &log);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_read_ref(refs->stack, refname, &ref);
+	if (err) {
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		/* XXX recurse? */
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else {
+		hashcpy(oid->hash, ref.value);
+	}
+exit:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
diff --git a/repository.c b/repository.c
index 6f7f6f002b1..087760bc184 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 040057dea6f..198d4aa0907 100644
--- a/repository.h
+++ b/repository.h
@@ -70,6 +70,9 @@ struct repository {
 	/* The store in which the refs are held. */
 	struct ref_store *refs;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 65fe5ecefbe..2ef970f9f88 100644
--- a/setup.c
+++ b/setup.c
@@ -468,9 +468,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -559,6 +561,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1204,8 +1207,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 00000000000..4da1bd0846f
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,35 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+# XXX - fix GC
+test_expect_success 'basic operation of reftable storage' '
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	mv .git/hooks .git/hooks-disabled &&
+	test_commit file &&
+	test_write_lines refs/heads/master refs/tags/file >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect &&
+	for count in $(test_seq 1 10)
+	do
+		test_commit "number $count" file.t $count number-$count ||
+ 		return 1
+	done &&
+(true || (test_pause &&
+	git gc &&
+	ls -1 .git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+ 	git reflog refs/heads/master >output &&
+	test_line_count = 11 output &&
+	grep "commit (initial): first post" output &&
+	grep "commit: number 10" output ))
+'
+
+test_done
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v9 09/10] Add reftable library
  2020-04-20 21:14                 ` [PATCH v9 09/10] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-04-20 22:06                   ` Junio C Hamano
  2020-04-21 19:04                     ` Han-Wen Nienhuys
  2020-04-22 17:35                     ` Johannes Schindelin
  0 siblings, 2 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-04-20 22:06 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/reftable/update.sh b/reftable/update.sh
> new file mode 100644

Shouldn't this be executable, even though the readme tells the
reader to run "sh update.sh"?

> index 00000000000..0fd90f89675
> --- /dev/null
> +++ b/reftable/update.sh
> @@ -0,0 +1,14 @@
> +#!/bin/sh
> +
> +set -eux

Use of -e and -u are understandable but -x is a bit unusual unless
one is debugging...

> +((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
> +git clone https://github.com/google/reftable reftable-repo) && \

I think you can discard most of the '\' at the end of the line, as
the shell knows, when you stop a line with &&, you haven't finished
talking to it yet.  That would make the result slightly less noisy
to follow.  We don't rely on "set -e" in this project, but this is a
borrowed file from our point of view---as long as we are using -e,
wouldn't it make more sense to take full advantage of it and do
without &&?

> +cp reftable-repo/c/*.[ch] reftable/ && \
> +cp reftable-repo/c/include/*.[ch] reftable/ && \
> +cp reftable-repo/LICENSE reftable/ &&
> +git --git-dir reftable-repo/.git show --no-patch origin/master \
> +> reftable/VERSION && \

> +sed -i~ 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|' reftable/system.h

"sed -i" is probably GNUism that does not port well.

> +rm reftable/*_test.c reftable/test_framework.* reftable/compat.*

I presume that this is because we copied everything that match
c/*.[ch] and is not about removing what used to exist in reftable/
directory locally?  Not complaining, but checking if I am reading
the intention of the code correctly.

> +git add reftable/*.[ch]

As we flattened when copying from c/include/*.[ch], this "git add"
can also be flat, which makes sense.  Don't you need to also add the
LICENSE file in there?

Two comments on the design:

 - The list of "what to copy in" largely depends on the upstream
   reftable library; hardcoding the patterns here would make it
   harder to reorganize the files for the upstream project.  I
   wonder if it makes more sense to reserve a single path in the
   reftable-repo and hardcode the knowledge of what to copy out
   there?  That would mean the resulting update.sh here can become

	(git -C reftable-repo pull --ff-only && git clone ...) &&
	reftable-repo/copy-out-to-git-core.sh reftable/

   and the copy-out-to-git-core.sh script that comes with the
   upstream reftable library is the only one that needs to know
   where the license and C source files are in the same version as
   itself, and how to produce $1/VERSION file using the commit log
   message of the commit it comes from.

 - The above procedure (with or without the "who should know what to
   copy" change) would cope well with modified and new files in the
   reftable upstream, but does not remove a library file that used
   to exist (and copied to the git-core project) that no longer is
   needed and shipped.  It appears that there has to be some way to
   remove them.

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v9 09/10] Add reftable library
  2020-04-20 22:06                   ` Junio C Hamano
@ 2020-04-21 19:04                     ` Han-Wen Nienhuys
  2020-04-22 17:35                     ` Johannes Schindelin
  1 sibling, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-04-21 19:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Apr 21, 2020 at 12:07 AM Junio C Hamano <gitster@pobox.com> wrote:
> Shouldn't this be executable, even though the readme tells the
> reader to run "sh update.sh"?

Done.

(should I post an updated version of the patch series for these minor
changes, or is it OK to fold them with the next major update?)

> > +set -eux
>
> Use of -e and -u are understandable but -x is a bit unusual unless
> one is debugging...

Done.

> > +((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
> > +git clone https://github.com/google/reftable reftable-repo) && \
>
> I think you can discard most of the '\' at the end of the line, as
> the shell knows, when you stop a line with &&, you haven't finished
> talking to it yet.  That would make the result slightly less noisy
> to follow.  We don't rely on "set -e" in this project, but this is a
> borrowed file from our point of view---as long as we are using -e,
> wouldn't it make more sense to take full advantage of it and do
> without &&?

Done.

> > +cp reftable-repo/c/*.[ch] reftable/ && \
> > +cp reftable-repo/c/include/*.[ch] reftable/ && \
> > +cp reftable-repo/LICENSE reftable/ &&
> > +git --git-dir reftable-repo/.git show --no-patch origin/master \
> > +> reftable/VERSION && \
>
> > +sed -i~ 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|' reftable/system.h
>
> "sed -i" is probably GNUism that does not port well.

done.

> > +rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
>
> I presume that this is because we copied everything that match
> c/*.[ch] and is not about removing what used to exist in reftable/
> directory locally?  Not complaining, but checking if I am reading
> the intention of the code correctly.

correct.

> > +git add reftable/*.[ch]
>
> As we flattened when copying from c/include/*.[ch], this "git add"
> can also be flat, which makes sense.  Don't you need to also add the
> LICENSE file in there?

Done.

> Two comments on the design:
>
>  - The list of "what to copy in" largely depends on the upstream
>    reftable library; hardcoding the patterns here would make it
>    harder to reorganize the files for the upstream project.  I
>    wonder if it makes more sense to reserve a single path in the
>    reftable-repo and hardcode the knowledge of what to copy out
>    there?  That would mean the resulting update.sh here can become
>
>         (git -C reftable-repo pull --ff-only && git clone ...) &&
>         reftable-repo/copy-out-to-git-core.sh reftable/
>
>    and the copy-out-to-git-core.sh script that comes with the
>    upstream reftable library is the only one that needs to know
>    where the license and C source files are in the same version as
>    itself, and how to produce $1/VERSION file using the commit log
>    message of the commit it comes from.
>
>  - The above procedure (with or without the "who should know what to
>    copy" change) would cope well with modified and new files in the
>    reftable upstream, but does not remove a library file that used
>    to exist (and copied to the git-core project) that no longer is
>    needed and shipped.  It appears that there has to be some way to
>    remove them.

You are correct, but once this is working to the extent that it's
accepted into git-core, I expect imports to be infrequent, and I think
it's not worth the effort to automate this further. If files are
renamed, the result will likely not compile, so this mistake will be
easy to catch.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v9 00/10] Reftable support git-core
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
                                   ` (9 preceding siblings ...)
  2020-04-20 21:14                 ` [PATCH v9 10/10] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-04-21 20:13                 ` Junio C Hamano
  2020-04-23 21:27                   ` Han-Wen Nienhuys
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
  11 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-04-21 20:13 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  * what is a good test strategy? Could we have a CI flavor where we flip the
>    default to reftable, and see how it fares?

I do not know if the current tests are prepared for it yet,
e.g. there may be places that say "cat .git/refs/heads/master" and
do something to its contents.  But I think that it is a good goal to
shoot for to make sure we can run all the tests with reftable
enabled.

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v9 09/10] Add reftable library
  2020-04-20 22:06                   ` Junio C Hamano
  2020-04-21 19:04                     ` Han-Wen Nienhuys
@ 2020-04-22 17:35                     ` Johannes Schindelin
  1 sibling, 0 replies; 409+ messages in thread
From: Johannes Schindelin @ 2020-04-22 17:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Han-Wen Nienhuys

Hi,

On Mon, 20 Apr 2020, Junio C Hamano wrote:

> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > diff --git a/reftable/update.sh b/reftable/update.sh
> > new file mode 100644
>
> Shouldn't this be executable, even though the readme tells the
> reader to run "sh update.sh"?

Do we really want to have such a shell script? To be honest, I'd rather do
a subtree merge. In my mind, this should make it easier to maintain this
code elsewhere, yet debug and fix it in git.git (because there is no doubt
whatsoever in my mind that the vast majority of bug fixes in this code
will come in via git.git, which _does_ make me wonder whether it makes
sense to keep the reftable library separate, as if it was truly an
independent part of Git: it is absolutely not).

Of course, this would make it even more desirable to do away with code
that reads like Java (and is probably a silent, defiant protest against
Git's requirement to have no declarations after statements), e.g.

	int block_iter_next(struct block_iter *it, struct record rec)
	{
		if (it->next_off >= it->br->block_len) {
			return 1;
		}

		{
			struct slice in = {
				.buf = it->br->block.data + it->next_off,
				.len = it->br->block_len - it->next_off,
			};
			struct slice start = in;
			struct slice key = { 0 };
			byte extra;
			int n = decode_key(&key, &extra, it->last_key, in);
			if (n < 0) {
				return -1;
			}

			[...]
		}
	}

The fact that this code uses a coding style that is very different from
Git's, on purpose, makes it quite awkward to even so much as debug, say,
the segmentation fault that was caused by my rewrite of the test script
(which _also_ looks _very_ different from the existing test cases, adding
to the awkwardness):

	test_expect_success 'basic operation of reftable storage' '
		rm -rf .git &&
		git init --ref-storage=reftable &&
		mv .git/hooks .git/hooks-disabled &&
		test_commit file &&
		test_write_lines HEAD refs/heads/master refs/tags/file >expect &&
		git show-ref &&
		git show-ref | cut -f2 -d" " > actual &&
		test_cmp actual expect &&
		for count in $(test_seq 1 10)
		do
			test_commit "number $count" file.t $count number-$count ||
			return 1
		done &&
		git gc &&
		ls -1 .git/reftable >table-files &&
		test_line_count = 2 table-files &&
		git reflog refs/heads/master >output &&
		test_line_count = 11 output &&
		grep "commit (initial): first post" output &&
		grep "commit: number 10" output
	'

The fact that this rather trivial usage can still cause a segmentation
fault makes me believe that this rationale for the purposefully-different
coding style implies a fundamental fallacy (see
https://github.com/git-for-windows/git/commit/00eb1fd4496ef763f840878fb5249507ae83af43#commitcomment-38614650):

	Each time you use an unitialized variable, you put another mental burden
	on the reader to check that it is actually OK in that place, and the Git
	source code (which puts the declaration and use) far apart makes this
	harder to verify.

The major challenge with writing C code these days, especially when you
come from a background of Java or other languages that take care of memory
management "for you" is that you _do_ have to manage memory in C,
explicitly. And if you already start not paying attention to the
initialization of local variables, you go down the path where all of a
sudden you end up with a segmentation fault and have no idea where it is
coming from, and the most likely explanation is that the memory was
managed incorrectly.

So I would recommend to actually not get sloppy with confusing declaration
and initialization and initializing unnecessarily and trying to get cute
with introducing code blocks "just to introduce a narrower scope for local
variables".

Instead, I would suggest to stick with the style Git developed; We have
gotten relatively good with keeping our memory management straight that
way.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v9 00/10] Reftable support git-core
  2020-04-21 20:13                 ` [PATCH v9 00/10] Reftable support git-core Junio C Hamano
@ 2020-04-23 21:27                   ` Han-Wen Nienhuys
  2020-04-23 21:43                     ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-04-23 21:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Apr 21, 2020 at 10:14 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> >  * what is a good test strategy? Could we have a CI flavor where we flip the
> >    default to reftable, and see how it fares?
>
> I do not know if the current tests are prepared for it yet,
> e.g. there may be places that say "cat .git/refs/heads/master" and
> do something to its contents.  But I think that it is a good goal to
> shoot for to make sure we can run all the tests with reftable
> enabled.

Obviously, it would be nice if all tests pass, but that doesn't seem
very realistic.

hanwen@chiquinho:~/vc/git/t$ grep '^ok' log | wc
  18457  154017  896360
hanwen@chiquinho:~/vc/git/t$ grep '^not ok' log | wc
   3416   39825  228427

I expect that the bulk of this number reflects a (hopefully) small
number of bugs in the reftable glue, but there are many tests that do
things that are just not compatible with a more abstract ref database.
For example,

not ok 10 - check rev-list
#
# echo $SHA >"$REAL/HEAD" &&
# test "$SHA" = "$(git rev-list HEAD)"
#

What is the right way to approach this? Should the test use

  git update-ref HEAD $SHA

instead of writing to the loose ref? Or should we skip it in case of
reftable? Similarly

  git update-ref $m $A &&
  test $A = $(cat .git/$m)

My suggestion to leave these tests as is, but skip them if they can't
work for reftable. However, this needs a new primitive in the test
library; what should that primitive look like? Maybe a function, eg.

   test_expect_succes_skip_reftable

or similar?
-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v9 00/10] Reftable support git-core
  2020-04-23 21:27                   ` Han-Wen Nienhuys
@ 2020-04-23 21:43                     ` Junio C Hamano
  2020-04-23 21:52                       ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-04-23 21:43 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

Han-Wen Nienhuys <hanwen@google.com> writes:

> For example,
>
> not ok 10 - check rev-list
> #
> # echo $SHA >"$REAL/HEAD" &&
> # test "$SHA" = "$(git rev-list HEAD)"
> #
>
> What is the right way to approach this? Should the test use
>
>   git update-ref HEAD $SHA
>
> instead of writing to the loose ref?

Preferred.  

I didn't bother checking the context, but if the test is checking
"the history leading to $SHA has only one commit, i.e.  $SHA, and
rev-list can handle that correctly", certainly that would be a
preferred rewrite, rather than skipping the check for reftable,
which may risk not noticing that HEAD is broken with reftable.

> Or should we skip it in case of
> reftable? Similarly
>
>   git update-ref $m $A &&
>   test $A = $(cat .git/$m)

Ditto.  

We have "git show-ref -s --verify" for that; "cat .git/$m" certainly is wrong.

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v9 00/10] Reftable support git-core
  2020-04-23 21:43                     ` Junio C Hamano
@ 2020-04-23 21:52                       ` Junio C Hamano
  2020-04-25 13:58                         ` Johannes Schindelin
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-04-23 21:52 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

Junio C Hamano <gitster@pobox.com> writes:

> Han-Wen Nienhuys <hanwen@google.com> writes:
>
>> For example,
>>
>> not ok 10 - check rev-list
>> #
>> # echo $SHA >"$REAL/HEAD" &&
>> # test "$SHA" = "$(git rev-list HEAD)"
>> #
>>
>> What is the right way to approach this? Should the test use
>>
>>   git update-ref HEAD $SHA
>>
>> instead of writing to the loose ref?
>
> Preferred.  
>
> I didn't bother checking the context, but if the test is checking
> "the history leading to $SHA has only one commit, i.e.  $SHA, and
> rev-list can handle that correctly", certainly that would be a
> preferred rewrite, rather than skipping the check for reftable,
> which may risk not noticing that HEAD is broken with reftable.

Now I have.  The test is about various low-level machineries we have
work correctly even if .git is *not* a directory but is a "gitfile:
$other location" (which is an underlying mechanism for multiple
worktree support etc.), and it is making sure "git rev-list"
understands HEAD in such a repository that uses the gitfile mechanism.

If I didn't know it, I might have said that "if we are interested in
seeing $SHA is a root commit, we should check it more directly,
perhaps by making sure 'cat-file commit $SHA' does not say 'parent'
and that won't need to write to .git/HEAD at all", but the point of
the test is to ensure 'rev-list' works correctly in such a
repository, I think "update-ref HEAD $SHA" would be the right "fix"
for the test.

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v9 00/10] Reftable support git-core
  2020-04-23 21:52                       ` Junio C Hamano
@ 2020-04-25 13:58                         ` Johannes Schindelin
  0 siblings, 0 replies; 409+ messages in thread
From: Johannes Schindelin @ 2020-04-25 13:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Han-Wen Nienhuys, Han-Wen Nienhuys via GitGitGadget, git,
	Han-Wen Nienhuys

Hi Junio & Han-Wen,

On Thu, 23 Apr 2020, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
> > Han-Wen Nienhuys <hanwen@google.com> writes:
> >
> >> For example,
> >>
> >> not ok 10 - check rev-list
> >> #
> >> # echo $SHA >"$REAL/HEAD" &&
> >> # test "$SHA" = "$(git rev-list HEAD)"
> >> #
> >>
> >> What is the right way to approach this? Should the test use
> >>
> >>   git update-ref HEAD $SHA
> >>
> >> instead of writing to the loose ref?
> >
> > Preferred.
> >
> > I didn't bother checking the context, but if the test is checking
> > "the history leading to $SHA has only one commit, i.e.  $SHA, and
> > rev-list can handle that correctly", certainly that would be a
> > preferred rewrite, rather than skipping the check for reftable,
> > which may risk not noticing that HEAD is broken with reftable.
>
> Now I have.  The test is about various low-level machineries we have
> work correctly even if .git is *not* a directory but is a "gitfile:
> $other location" (which is an underlying mechanism for multiple
> worktree support etc.), and it is making sure "git rev-list"
> understands HEAD in such a repository that uses the gitfile mechanism.
>
> If I didn't know it, I might have said that "if we are interested in
> seeing $SHA is a root commit, we should check it more directly,
> perhaps by making sure 'cat-file commit $SHA' does not say 'parent'
> and that won't need to write to .git/HEAD at all", but the point of
> the test is to ensure 'rev-list' works correctly in such a
> repository, I think "update-ref HEAD $SHA" would be the right "fix"
> for the test.

I believe the common strategy for this kind of thing is to

- introduce a `GIT_TEST_*` variable that is used to change the default
  (follow e.g. GIT_TEST_COMMIT_GRAPH as an example)

- either add a prereq that guards the test cases that cannot handle that
  knob (!REFTABLE in this case), or follow GIT_TEST_COMMIT_GRAPH's example
  again and _force_ its value to 0 for those test cases.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v10 00/12] Reftable support git-core
  2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
                                   ` (10 preceding siblings ...)
  2020-04-21 20:13                 ` [PATCH v9 00/10] Reftable support git-core Junio C Hamano
@ 2020-04-27 20:13                 ` Han-Wen Nienhuys via GitGitGadget
  2020-04-27 20:13                   ` [PATCH v10 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
                                     ` (12 more replies)
  11 siblings, 13 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1

Summary 18816 tests pass 2994 tests fail

Some issues:

 * worktree support looks broken
 * no detection for file/dir conflicts. You can have branch {a, a/b}
   simultaneously.
 * submodules (eg. t1013)
 * many tests inspect .git/logs/ directly.

v13

 * Avoid exposing tombstones from reftable_stack to git 
 * Write peeled tags
 * Fix GC issue: use git-refpack in test
 * Fix unborn branches: must propagate errno from read_raw_ref.
 * add GIT_TEST_REFTABLE environment to control default.
 * add REFTABLE test prerequisite.
 * fix & test update-ref -d.
 * print error message for failed update-ref transactions.

Han-Wen Nienhuys (11):
  refs.h: clarify reflog iteration order
  Iterate over the "refs/" namespace in for_each_[raw]ref
  create .git/refs in files-backend.c
  refs: document how ref_iterator_advance_fn should handle symrefs
  Add .gitattributes for the reftable/ directory
  reftable: define version 2 of the spec to accomodate SHA256
  reftable: clarify how empty tables should be written
  Add reftable library
  Reftable support for git-core
  Add some reftable testing infrastructure
  t: use update-ref and show-ref to reading/writing refs

Jonathan Nieder (1):
  reftable: file format documentation

 Documentation/Makefile                        |    1 +
 Documentation/technical/reftable.txt          | 1080 ++++++++++++++++
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   26 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   63 +-
 cache.h                                       |    6 +-
 refs.c                                        |   33 +-
 refs.h                                        |    8 +-
 refs/files-backend.c                          |    6 +
 refs/refs-internal.h                          |    6 +
 refs/reftable-backend.c                       | 1045 +++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   11 +
 reftable/VERSION                              |    5 +
 reftable/basics.c                             |  209 +++
 reftable/basics.h                             |   53 +
 reftable/block.c                              |  425 ++++++
 reftable/block.h                              |  124 ++
 reftable/blocksource.h                        |   22 +
 reftable/bytes.c                              |    0
 reftable/config.h                             |    1 +
 reftable/constants.h                          |   21 +
 reftable/dump.c                               |   97 ++
 reftable/file.c                               |   98 ++
 reftable/iter.c                               |  234 ++++
 reftable/iter.h                               |   60 +
 reftable/merged.c                             |  327 +++++
 reftable/merged.h                             |   36 +
 reftable/pq.c                                 |  115 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  754 +++++++++++
 reftable/reader.h                             |   68 +
 reftable/record.c                             | 1126 ++++++++++++++++
 reftable/record.h                             |  121 ++
 reftable/reftable.h                           |  527 ++++++++
 reftable/slice.c                              |  224 ++++
 reftable/slice.h                              |   76 ++
 reftable/stack.c                              | 1151 +++++++++++++++++
 reftable/stack.h                              |   44 +
 reftable/system.h                             |   53 +
 reftable/tree.c                               |   67 +
 reftable/tree.h                               |   34 +
 reftable/update.sh                            |   23 +
 reftable/writer.c                             |  661 ++++++++++
 reftable/writer.h                             |   60 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0002-gitfile.sh                            |    2 +-
 t/t0031-reftable.sh                           |   99 ++
 t/t1400-update-ref.sh                         |   32 +-
 t/t1506-rev-parse-diagnosis.sh                |    2 +-
 t/t3210-pack-refs.sh                          |    6 +
 t/t6050-replace.sh                            |    2 +-
 t/t9020-remote-svn.sh                         |    4 +-
 t/test-lib.sh                                 |    5 +
 59 files changed, 9378 insertions(+), 60 deletions(-)
 create mode 100644 Documentation/technical/reftable.txt
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100755 t/t0031-reftable.sh


base-commit: e870325ee8575d5c3d7afe0ba2c9be072c692b65
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v10
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v10
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v9:

  1:  b600d0bc6dd =  1:  d3d8ed2032c refs.h: clarify reflog iteration order
  2:  89b68145e8f =  2:  45fd65f72e0 Iterate over the "refs/" namespace in for_each_[raw]ref
  3:  91f1efe24ec =  3:  bc89bcd9c8c create .git/refs in files-backend.c
  4:  56f65a2a0d7 =  4:  fd67cdff0cd refs: document how ref_iterator_advance_fn should handle symrefs
  5:  b432c1cc2ae =  5:  5373828f854 Add .gitattributes for the reftable/ directory
  6:  b1d94be691a =  6:  8eeed29ebbf reftable: file format documentation
  7:  3e418d27f67 =  7:  5ebc272e23d reftable: define version 2 of the spec to accomodate SHA256
  8:  6f62be4067b =  8:  95aae041613 reftable: clarify how empty tables should be written
  9:  a30001ad1e8 !  9:  59209a5ad39 Add reftable library
     @@ Commit message
      
          * reftable.h - the public API
      
     +    * record.{c,h} - reading and writing records
     +
          * block.{c,h} - reading and writing blocks.
      
          * writer.{c,h} - writing a complete reftable file.
     @@ reftable/README.md (new)
      
       ## reftable/VERSION (new) ##
      @@
     -+commit b68690acad769d59e80cbb4f79c442ece133e6bd
     ++commit a6d6b31bea256044621d789df64a8159647f21bc
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Mon Apr 20 21:33:11 2020 +0200
     ++Date:   Mon Apr 27 20:46:21 2020 +0200
      +
     -+    C: fix search/replace error in dump.c
     ++    C: prefix error codes with REFTABLE_
      
       ## reftable/basics.c (new) ##
      @@
     @@ reftable/basics.c (new)
      +const char *reftable_error_str(int err)
      +{
      +	switch (err) {
     -+	case IO_ERROR:
     ++	case REFTABLE_IO_ERROR:
      +		return "I/O error";
     -+	case FORMAT_ERROR:
     -+		return "FORMAT_ERROR";
     -+	case NOT_EXIST_ERROR:
     -+		return "NOT_EXIST_ERROR";
     -+	case LOCK_ERROR:
     -+		return "LOCK_ERROR";
     -+	case API_ERROR:
     -+		return "API_ERROR";
     -+	case ZLIB_ERROR:
     -+		return "ZLIB_ERROR";
     ++	case REFTABLE_FORMAT_ERROR:
     ++		return "corrupt reftable file";
     ++	case REFTABLE_NOT_EXIST_ERROR:
     ++		return "file does not exist";
     ++	case REFTABLE_LOCK_ERROR:
     ++		return "data is outdated";
     ++	case REFTABLE_API_ERROR:
     ++		return "misuse of the reftable API";
     ++	case REFTABLE_ZLIB_ERROR:
     ++		return "zlib failure";
      +	case -1:
      +		return "general error";
      +	default:
     @@ reftable/basics.c (new)
      +	}
      +}
      +
     ++int reftable_error_to_errno(int err) {
     ++	switch (err) {
     ++	case REFTABLE_IO_ERROR:
     ++		return EIO;
     ++	case REFTABLE_FORMAT_ERROR:
     ++		return EFAULT;
     ++	case REFTABLE_NOT_EXIST_ERROR:
     ++		return ENOENT;
     ++	case REFTABLE_LOCK_ERROR:
     ++		return EBUSY;
     ++	case REFTABLE_API_ERROR:
     ++		return EINVAL;
     ++	case REFTABLE_ZLIB_ERROR:
     ++		return EDOM;
     ++	default:
     ++		return ERANGE;
     ++	}
     ++}
     ++
     ++
      +void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
      +void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
      +void (*reftable_free_ptr)(void *) = &free;
     @@ reftable/block.c (new)
      +
      +			if (Z_OK != zresult) {
      +				slice_clear(&compressed);
     -+				return ZLIB_ERROR;
     ++				return REFTABLE_ZLIB_ERROR;
      +			}
      +
      +			memcpy(w->buf + block_header_skip, compressed.buf,
     @@ reftable/block.c (new)
      +	uint32_t sz = get_be24(block->data + header_off + 1);
      +
      +	if (!is_block_type(typ)) {
     -+		return FORMAT_ERROR;
     ++		return REFTABLE_FORMAT_ERROR;
      +	}
      +
      +	if (typ == BLOCK_TYPE_LOG) {
     @@ reftable/block.c (new)
      +				    &dst_len, block->data + block_header_skip,
      +				    &src_len)) {
      +			slice_clear(&uncompressed);
     -+			return ZLIB_ERROR;
     ++			return REFTABLE_ZLIB_ERROR;
      +		}
      +
      +		block_source_return_block(block->source, block);
     @@ reftable/file.c (new)
      +	int fd = open(name, O_RDONLY);
      +	if (fd < 0) {
      +		if (errno == ENOENT) {
     -+			return NOT_EXIST_ERROR;
     ++			return REFTABLE_NOT_EXIST_ERROR;
      +		}
      +		return -1;
      +	}
     @@ reftable/iter.c (new)
      +		}
      +		if (err > 0) {
      +			/* indexed block does not exist. */
     -+			return FORMAT_ERROR;
     ++			return REFTABLE_FORMAT_ERROR;
      +		}
      +	}
      +	block_reader_start(&it->block_reader, &it->cur);
     @@ reftable/merged.c (new)
      +	return 0;
      +}
      +
     -+static int merged_iter_next(struct merged_iter *mi, struct record rec)
     ++static int merged_iter_next_entry(struct merged_iter *mi, struct record rec)
      +{
      +	struct slice entry_key = { 0 };
     -+	struct pq_entry entry = merged_iter_pqueue_remove(&mi->pq);
     -+	int err = merged_iter_advance_subiter(mi, entry.index);
     ++	struct pq_entry entry = { 0 };
     ++	int err = 0;
     ++
     ++	if (merged_iter_pqueue_is_empty(mi->pq)) {
     ++		return 1;
     ++	}
     ++
     ++	entry = merged_iter_pqueue_remove(&mi->pq);
     ++	err = merged_iter_advance_subiter(mi, entry.index);
      +	if (err < 0) {
      +		return err;
      +	}
      +
     -+
     -+        /*
     -+          One can also use reftable as datacenter-local storage, where the ref
     -+          database is maintained in globally consistent database (eg.
     -+          CockroachDB or Spanner). In this scenario, replication delays together
     -+          with compaction may cause newer tables to contain older entries. In
     -+          such a deployment, the loop below must be changed to collect all
     -+          entries for the same key, and return new the newest one.
     -+        */
     ++	/*
     ++	  One can also use reftable as datacenter-local storage, where the ref
     ++	  database is maintained in globally consistent database (eg.
     ++	  CockroachDB or Spanner). In this scenario, replication delays together
     ++	  with compaction may cause newer tables to contain older entries. In
     ++	  such a deployment, the loop below must be changed to collect all
     ++	  entries for the same key, and return new the newest one.
     ++	*/
      +	record_key(entry.rec, &entry_key);
      +	while (!merged_iter_pqueue_is_empty(mi->pq)) {
      +		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
     @@ reftable/merged.c (new)
      +	return 0;
      +}
      +
     ++static int merged_iter_next(struct merged_iter *mi, struct record rec)
     ++{
     ++	while (true) {
     ++		int err = merged_iter_next_entry(mi, rec);
     ++		if (err == 0 && mi->suppress_deletions &&
     ++		    record_is_deletion(rec)) {
     ++			continue;
     ++		}
     ++
     ++		return err;
     ++	}
     ++}
     ++
      +static int merged_iter_next_void(void *p, struct record rec)
      +{
      +	struct merged_iter *mi = (struct merged_iter *)p;
     @@ reftable/merged.c (new)
      +	for (i = 0; i < n; i++) {
      +		struct reftable_reader *r = stack[i];
      +		if (r->hash_id != hash_id) {
     -+			return FORMAT_ERROR;
     ++			return REFTABLE_FORMAT_ERROR;
      +		}
      +		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
     -+			return FORMAT_ERROR;
     ++			return REFTABLE_FORMAT_ERROR;
      +		}
      +		if (i == 0) {
      +			first_min = reftable_reader_min_update_index(r);
     @@ reftable/merged.c (new)
      +		.stack = iters,
      +		.typ = record_type(rec),
      +		.hash_id = mt->hash_id,
     ++		.suppress_deletions = mt->suppress_deletions,
      +	};
      +	int n = 0;
      +	int err = 0;
     @@ reftable/merged.h (new)
      +	struct reftable_reader **stack;
      +	int stack_len;
      +	uint32_t hash_id;
     ++	bool suppress_deletions;
      +
      +	uint64_t min;
      +	uint64_t max;
     @@ reftable/merged.h (new)
      +	uint32_t hash_id;
      +	int stack_len;
      +	byte typ;
     ++	bool suppress_deletions;
      +	struct merged_iter_pqueue pq;
      +};
      +
     @@ reftable/reader.c (new)
      +	byte *f = footer;
      +	int err = 0;
      +	if (memcmp(f, "REFT", 4)) {
     -+		err = FORMAT_ERROR;
     ++		err = REFTABLE_FORMAT_ERROR;
      +		goto exit;
      +	}
      +	f += 4;
      +
      +	if (memcmp(footer, header, header_size(r->version))) {
     -+		err = FORMAT_ERROR;
     ++		err = REFTABLE_FORMAT_ERROR;
      +		goto exit;
      +	}
      +
     @@ reftable/reader.c (new)
      +		case SHA256_ID:
      +			break;
      +		default:
     -+			err = FORMAT_ERROR;
     ++			err = REFTABLE_FORMAT_ERROR;
      +			goto exit;
      +		}
      +		f += 4;
     @@ reftable/reader.c (new)
      +		uint32_t file_crc = get_be32(f);
      +		f += 4;
      +		if (computed_crc != file_crc) {
     -+			err = FORMAT_ERROR;
     ++			err = REFTABLE_FORMAT_ERROR;
      +			goto exit;
      +		}
      +	}
     @@ reftable/reader.c (new)
      +	/* Need +1 to read type of first block. */
      +	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
      +	if (err != header_size(2) + 1) {
     -+		err = IO_ERROR;
     ++		err = REFTABLE_IO_ERROR;
      +		goto exit;
      +	}
      +
      +	if (memcmp(header.data, "REFT", 4)) {
     -+		err = FORMAT_ERROR;
     ++		err = REFTABLE_FORMAT_ERROR;
      +		goto exit;
      +	}
      +	r->version = header.data[4];
      +	if (r->version != 1 && r->version != 2) {
     -+		err = FORMAT_ERROR;
     ++		err = REFTABLE_FORMAT_ERROR;
      +		goto exit;
      +	}
      +
     @@ reftable/reader.c (new)
      +	err = block_source_read_block(source, &footer, r->size,
      +				      footer_size(r->version));
      +	if (err != footer_size(r->version)) {
     -+		err = IO_ERROR;
     ++		err = REFTABLE_IO_ERROR;
      +		goto exit;
      +	}
      +
     @@ reftable/reader.c (new)
      +static int table_iter_next(struct table_iter *ti, struct record rec)
      +{
      +	if (record_type(rec) != ti->typ) {
     -+		return API_ERROR;
     ++		return REFTABLE_API_ERROR;
      +	}
      +
      +	while (true) {
     @@ reftable/reader.c (new)
      +		}
      +
      +		if (next.typ != BLOCK_TYPE_INDEX) {
     -+			err = FORMAT_ERROR;
     ++			err = REFTABLE_FORMAT_ERROR;
      +			break;
      +		}
      +
     @@ reftable/record.c (new)
      +	return start_len - in.len;
      +}
      +
     -+static byte reftable_ref_record_type(void)
     -+{
     -+	return BLOCK_TYPE_REF;
     -+}
     -+
      +static void reftable_ref_record_key(const void *r, struct slice *dest)
      +{
      +	const struct reftable_ref_record *rec =
     @@ reftable/record.c (new)
      +	}
      +}
      +
     -+void reftable_ref_record_print(struct reftable_ref_record *ref, int hash_size)
     ++void reftable_ref_record_print(struct reftable_ref_record *ref,
     ++			       uint32_t hash_id)
      +{
      +	char hex[SHA256_SIZE + 1] = { 0 };
     -+
      +	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
      +	if (ref->value != NULL) {
     -+		hex_format(hex, ref->value, hash_size);
     ++		hex_format(hex, ref->value, hash_size(hash_id));
      +		printf("%s", hex);
      +	}
      +	if (ref->target_value != NULL) {
     -+		hex_format(hex, ref->target_value, hash_size);
     ++		hex_format(hex, ref->target_value, hash_size(hash_id));
      +		printf(" (T %s)", hex);
      +	}
      +	if (ref->target != NULL) {
     @@ reftable/record.c (new)
      +	return start.len - in.len;
      +}
      +
     ++static bool reftable_ref_record_is_deletion_void(const void *p)
     ++{
     ++	return reftable_ref_record_is_deletion(
     ++		(const struct reftable_ref_record *)p);
     ++}
     ++
      +struct record_vtable reftable_ref_record_vtable = {
      +	.key = &reftable_ref_record_key,
     -+	.type = &reftable_ref_record_type,
     ++	.type = BLOCK_TYPE_REF,
      +	.copy_from = &reftable_ref_record_copy_from,
      +	.val_type = &reftable_ref_record_val_type,
      +	.encode = &reftable_ref_record_encode,
      +	.decode = &reftable_ref_record_decode,
      +	.clear = &reftable_ref_record_clear_void,
     ++	.is_deletion = &reftable_ref_record_is_deletion_void,
      +};
      +
     -+static byte obj_record_type(void)
     -+{
     -+	return BLOCK_TYPE_OBJ;
     -+}
     -+
      +static void obj_record_key(const void *r, struct slice *dest)
      +{
      +	const struct obj_record *rec = (const struct obj_record *)r;
     @@ reftable/record.c (new)
      +	return start.len - in.len;
      +}
      +
     ++static bool not_a_deletion(const void *p)
     ++{
     ++	return false;
     ++}
     ++
      +struct record_vtable obj_record_vtable = {
      +	.key = &obj_record_key,
     -+	.type = &obj_record_type,
     ++	.type = BLOCK_TYPE_OBJ,
      +	.copy_from = &obj_record_copy_from,
      +	.val_type = &obj_record_val_type,
      +	.encode = &obj_record_encode,
      +	.decode = &obj_record_decode,
      +	.clear = &obj_record_clear,
     ++	.is_deletion = not_a_deletion,
      +};
      +
     -+void reftable_log_record_print(struct reftable_log_record *log, int hash_size)
     ++void reftable_log_record_print(struct reftable_log_record *log,
     ++			       uint32_t hash_id)
      +{
      +	char hex[SHA256_SIZE + 1] = { 0 };
      +
      +	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
      +	       log->update_index, log->name, log->email, log->time,
      +	       log->tz_offset);
     -+	hex_format(hex, log->old_hash, hash_size);
     ++	hex_format(hex, log->old_hash, hash_size(hash_id));
      +	printf("%s => ", hex);
     -+	hex_format(hex, log->new_hash, hash_size);
     ++	hex_format(hex, log->new_hash, hash_size(hash_id));
      +	printf("%s\n\n%s\n}\n", hex, log->message);
      +}
      +
     -+static byte reftable_log_record_type(void)
     -+{
     -+	return BLOCK_TYPE_LOG;
     -+}
     -+
      +static void reftable_log_record_key(const void *r, struct slice *dest)
      +{
      +	const struct reftable_log_record *rec =
     @@ reftable/record.c (new)
      +	int n;
      +
      +	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
     -+		return FORMAT_ERROR;
     ++		return REFTABLE_FORMAT_ERROR;
      +	}
      +
      +	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
     @@ reftable/record.c (new)
      +	}
      +
      +	if (in.len < 2 * hash_size) {
     -+		return FORMAT_ERROR;
     ++		return REFTABLE_FORMAT_ERROR;
      +	}
      +
      +	r->old_hash = reftable_realloc(r->old_hash, hash_size);
     @@ reftable/record.c (new)
      +
      +error:
      +	slice_clear(&dest);
     -+	return FORMAT_ERROR;
     ++	return REFTABLE_FORMAT_ERROR;
      +}
      +
      +static bool null_streq(char *a, char *b)
     @@ reftable/record.c (new)
      +	       a->update_index == b->update_index;
      +}
      +
     ++static bool reftable_log_record_is_deletion_void(const void *p)
     ++{
     ++	return reftable_log_record_is_deletion(
     ++		(const struct reftable_log_record *)p);
     ++}
     ++
      +struct record_vtable reftable_log_record_vtable = {
      +	.key = &reftable_log_record_key,
     -+	.type = &reftable_log_record_type,
     ++	.type = BLOCK_TYPE_LOG,
      +	.copy_from = &reftable_log_record_copy_from,
      +	.val_type = &reftable_log_record_val_type,
      +	.encode = &reftable_log_record_encode,
      +	.decode = &reftable_log_record_decode,
      +	.clear = &reftable_log_record_clear_void,
     ++	.is_deletion = &reftable_log_record_is_deletion_void,
      +};
      +
      +struct record new_record(byte typ)
     @@ reftable/record.c (new)
      +	reftable_free(record_yield(rec));
      +}
      +
     -+static byte index_record_type(void)
     -+{
     -+	return BLOCK_TYPE_INDEX;
     -+}
     -+
      +static void index_record_key(const void *r, struct slice *dest)
      +{
      +	struct index_record *rec = (struct index_record *)r;
     @@ reftable/record.c (new)
      +
      +struct record_vtable index_record_vtable = {
      +	.key = &index_record_key,
     -+	.type = &index_record_type,
     ++	.type = BLOCK_TYPE_INDEX,
      +	.copy_from = &index_record_copy_from,
      +	.val_type = &index_record_val_type,
      +	.encode = &index_record_encode,
      +	.decode = &index_record_decode,
      +	.clear = &index_record_clear,
     ++	.is_deletion = &not_a_deletion,
      +};
      +
      +void record_key(struct record rec, struct slice *dest)
     @@ reftable/record.c (new)
      +
      +byte record_type(struct record rec)
      +{
     -+	return rec.ops->type();
     ++	return rec.ops->type;
      +}
      +
      +int record_encode(struct record rec, struct slice dest, int hash_size)
     @@ reftable/record.c (new)
      +
      +void record_copy_from(struct record rec, struct record src, int hash_size)
      +{
     -+	assert(src.ops->type() == rec.ops->type());
     ++	assert(src.ops->type == rec.ops->type);
      +
      +	rec.ops->copy_from(rec.data, src.data, hash_size);
      +}
     @@ reftable/record.c (new)
      +	return rec.ops->clear(rec.data);
      +}
      +
     ++bool record_is_deletion(struct record rec)
     ++{
     ++	return rec.ops->is_deletion(rec.data);
     ++}
     ++
      +void record_from_ref(struct record *rec, struct reftable_ref_record *ref_rec)
      +{
      +	rec->data = ref_rec;
     @@ reftable/record.h (new)
      +	void (*key)(const void *rec, struct slice *dest);
      +
      +	/* The record type of ('r' for ref). */
     -+	byte (*type)(void);
     ++	byte type;
      +
      +	void (*copy_from)(void *dest, const void *src, int hash_size);
      +
     @@ reftable/record.h (new)
      +
      +	/* deallocate and null the record. */
      +	void (*clear)(void *rec);
     ++
     ++	/* is this a tombstone? */
     ++	bool (*is_deletion)(const void *rec);
      +};
      +
      +/* record is a generic wrapper for different types of records. */
     @@ reftable/record.h (new)
      +int record_encode(struct record rec, struct slice dest, int hash_size);
      +int record_decode(struct record rec, struct slice key, byte extra,
      +		  struct slice src, int hash_size);
     ++bool record_is_deletion(struct record rec);
      +
      +/* zeroes out the embedded record */
      +void record_clear(struct record rec);
     @@ reftable/reftable.h (new)
      +int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
      +
      +/* prints a reftable_ref_record onto stdout */
     -+void reftable_ref_record_print(struct reftable_ref_record *ref, int hash_size);
     ++void reftable_ref_record_print(struct reftable_ref_record *ref,
     ++			       uint32_t hash_id);
      +
      +/* frees and nulls all pointer values. */
      +void reftable_ref_record_clear(struct reftable_ref_record *ref);
     @@ reftable/reftable.h (new)
      +			      struct reftable_log_record *b, int hash_size);
      +
      +/* dumps a reftable_log_record on stdout, for debugging/testing. */
     -+void reftable_log_record_print(struct reftable_log_record *log, int hash_size);
     ++void reftable_log_record_print(struct reftable_log_record *log,
     ++			       uint32_t hash_id);
      +
      +/****************************************************************
      + Error handling
     @@ reftable/reftable.h (new)
      +/* different types of errors */
      +enum reftable_error {
      +	/* Unexpected file system behavior */
     -+	IO_ERROR = -2,
     ++	REFTABLE_IO_ERROR = -2,
      +
      +	/* Format inconsistency on reading data
      +	 */
     -+	FORMAT_ERROR = -3,
     ++	REFTABLE_FORMAT_ERROR = -3,
      +
      +	/* File does not exist. Returned from block_source_from_file(),  because
      +	   it needs special handling in stack.
      +	*/
     -+	NOT_EXIST_ERROR = -4,
     ++	REFTABLE_NOT_EXIST_ERROR = -4,
      +
      +	/* Trying to write out-of-date data. */
     -+	LOCK_ERROR = -5,
     ++	REFTABLE_LOCK_ERROR = -5,
      +
      +	/* Misuse of the API:
      +	   - on writing a record with NULL ref_name.
     @@ reftable/reftable.h (new)
      +	   - on writing a ref or log record before the stack's next_update_index
      +	   - on reading a reftable_ref_record from log iterator, or vice versa.
      +	*/
     -+	API_ERROR = -6,
     ++	REFTABLE_API_ERROR = -6,
      +
      +	/* Decompression error */
     -+	ZLIB_ERROR = -7,
     ++	REFTABLE_ZLIB_ERROR = -7,
      +
      +	/* Wrote a table without blocks. */
     -+	EMPTY_TABLE_ERROR = -8,
     ++	REFTABLE_EMPTY_TABLE_ERROR = -8,
      +};
      +
      +/* convert the numeric error code to a string. The string should not be
      + * deallocated. */
      +const char *reftable_error_str(int err);
      +
     ++/*
     ++ * Convert the numeric error code to an equivalent errno code.
     ++ */
     ++int reftable_error_to_errno(int err);
     ++
      +/****************************************************************
      + Writing
      +
     @@ reftable/reftable.h (new)
      +
      +/* Set the range of update indices for the records we will add.  When
      +   writing a table into a stack, the min should be at least
     -+   reftable_stack_next_update_index(), or API_ERROR is returned.
     ++   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
      +
      +   For transactional updates, typically min==max. When converting an existing
      +   ref database into a single reftable, this would be a range of update-index
     @@ reftable/reftable.h (new)
      +
      +/* adds a reftable_ref_record. Must be called in ascending
      +   order. The update_index must be within the limits set by
     -+   reftable_writer_set_limits(), or API_ERROR is returned.
     ++   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
      +
      +   It is an error to write a ref record after a log record.
      + */
     @@ reftable/reftable.h (new)
      +			      struct reftable_reader **stack, int n,
      +			      uint32_t hash_id);
      +
     -+/* returns the hash id used in this merged table. */
     -+uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *mt);
     -+
      +/* returns an iterator positioned just before 'name' */
      +int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
      +				   struct reftable_iterator *it,
     @@ reftable/stack.c (new)
      +		reftable_calloc(sizeof(struct reftable_stack));
      +	struct slice list_file_name = {};
      +	int err = 0;
     ++
     ++	if (config.hash_id == 0) {
     ++		config.hash_id = SHA1_ID;
     ++	}
     ++
      +	*dest = NULL;
      +
      +	slice_set_string(&list_file_name, dir);
     -+	slice_append_string(&list_file_name, "/reftables.list");
     ++	slice_append_string(&list_file_name, "/tables.list");
      +
      +	p->list_file = slice_to_string(list_file_name);
      +	slice_clear(&list_file_name);
     @@ reftable/stack.c (new)
      +	return err;
      +}
      +
     -+static int fread_lines(FILE *f, char ***namesp)
     ++static int fd_read_lines(int fd, char ***namesp)
      +{
     -+	long size = 0;
     -+	int err = fseek(f, 0, SEEK_END);
     ++	off_t size = lseek(fd, 0, SEEK_END);
      +	char *buf = NULL;
     -+	if (err < 0) {
     -+		err = IO_ERROR;
     -+		goto exit;
     -+	}
     -+	size = ftell(f);
     ++	int err = 0;
      +	if (size < 0) {
     -+		err = IO_ERROR;
     ++		err = REFTABLE_IO_ERROR;
      +		goto exit;
      +	}
     -+	err = fseek(f, 0, SEEK_SET);
     ++	err = lseek(fd, 0, SEEK_SET);
      +	if (err < 0) {
     -+		err = IO_ERROR;
     ++		err = REFTABLE_IO_ERROR;
      +		goto exit;
      +	}
      +
      +	buf = reftable_malloc(size + 1);
     -+	if (fread(buf, 1, size, f) != size) {
     -+		err = IO_ERROR;
     ++	if (read(fd, buf, size) != size) {
     ++		err = REFTABLE_IO_ERROR;
      +		goto exit;
      +	}
      +	buf[size] = 0;
      +
      +	parse_names(buf, size, namesp);
     ++
      +exit:
      +	reftable_free(buf);
      +	return err;
     @@ reftable/stack.c (new)
      +
      +int read_lines(const char *filename, char ***namesp)
      +{
     -+	FILE *f = fopen(filename, "r");
     ++	int fd = open(filename, O_RDONLY, 0644);
      +	int err = 0;
     -+	if (f == NULL) {
     ++	if (fd < 0) {
      +		if (errno == ENOENT) {
      +			*namesp = reftable_calloc(sizeof(char *));
      +			return 0;
      +		}
      +
     -+		return IO_ERROR;
     ++		return REFTABLE_IO_ERROR;
      +	}
     -+	err = fread_lines(f, namesp);
     -+	fclose(f);
     ++	err = fd_read_lines(fd, namesp);
     ++	close(fd);
      +	return err;
      +}
      +
     @@ reftable/stack.c (new)
      +		merged_table_clear(st->merged);
      +		reftable_merged_table_free(st->merged);
      +	}
     ++	new_merged->suppress_deletions = true;
      +	st->merged = new_merged;
      +
      +	{
     @@ reftable/stack.c (new)
      +		char **names_after = NULL;
      +		struct timeval now = { 0 };
      +		int err = gettimeofday(&now, NULL);
     ++		int err2 = 0;
      +		if (err < 0) {
      +			return err;
      +		}
     @@ reftable/stack.c (new)
      +			free_names(names);
      +			break;
      +		}
     -+		if (err != NOT_EXIST_ERROR) {
     ++		if (err != REFTABLE_NOT_EXIST_ERROR) {
      +			free_names(names);
      +			return err;
      +		}
      +
     -+		err = read_lines(st->list_file, &names_after);
     -+		if (err < 0) {
     ++		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
     ++		   writer. Check if there was one by checking if the name list
     ++		   changed.
     ++		*/
     ++		err2 = read_lines(st->list_file, &names_after);
     ++		if (err2 < 0) {
      +			free_names(names);
     -+			return err;
     ++			return err2;
      +		}
      +
      +		if (names_equal(names_after, names)) {
      +			free_names(names);
      +			free_names(names_after);
     -+			return -1;
     ++			return err;
      +		}
      +		free_names(names);
      +		free_names(names_after);
     @@ reftable/stack.c (new)
      +{
      +	int err = stack_try_add(st, write, arg);
      +	if (err < 0) {
     -+		if (err == LOCK_ERROR) {
     -+			err = reftable_stack_reload(st);
     ++		if (err == REFTABLE_LOCK_ERROR) {
     ++			// Ignore error return, we want to propagate
     ++			// REFTABLE_LOCK_ERROR.
     ++			reftable_stack_reload(st);
      +		}
      +		return err;
      +	}
      +
     -+	return reftable_stack_auto_compact(st);
     ++	if (!st->disable_auto_compact) {
     ++		return reftable_stack_auto_compact(st);
     ++	}
     ++
     ++	return 0;
      +}
      +
      +static void format_name(struct slice *dest, uint64_t min, uint64_t max)
      +{
      +	char buf[100];
     -+	snprintf(buf, sizeof(buf), "%012" PRIx64 "-%012" PRIx64, min, max);
     ++	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
      +	slice_set_string(dest, buf);
      +}
      +
     @@ reftable/stack.c (new)
      +				 O_EXCL | O_CREAT | O_WRONLY, 0644);
      +	if (add->lock_file_fd < 0) {
      +		if (errno == EEXIST) {
     -+			err = LOCK_ERROR;
     ++			err = REFTABLE_LOCK_ERROR;
      +		} else {
     -+			err = IO_ERROR;
     ++			err = REFTABLE_IO_ERROR;
      +		}
      +		goto exit;
      +	}
     @@ reftable/stack.c (new)
      +	}
      +
      +	if (err > 1) {
     -+		err = LOCK_ERROR;
     ++		err = REFTABLE_LOCK_ERROR;
      +		goto exit;
      +	}
      +
     @@ reftable/stack.c (new)
      +	err = write(add->lock_file_fd, table_list.buf, table_list.len);
      +	slice_clear(&table_list);
      +	if (err < 0) {
     -+		err = IO_ERROR;
     ++		err = REFTABLE_IO_ERROR;
      +		goto exit;
      +	}
      +
      +	err = close(add->lock_file_fd);
      +	add->lock_file_fd = 0;
      +	if (err < 0) {
     -+		err = IO_ERROR;
     ++		err = REFTABLE_IO_ERROR;
      +		goto exit;
      +	}
      +
      +	err = rename(slice_as_string(&add->lock_file_name),
      +		     add->stack->list_file);
      +	if (err < 0) {
     -+		err = IO_ERROR;
     ++		err = REFTABLE_IO_ERROR;
      +		goto exit;
      +	}
      +
     @@ reftable/stack.c (new)
      +
      +	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
      +	if (tab_fd < 0) {
     -+		err = IO_ERROR;
     ++		err = REFTABLE_IO_ERROR;
      +		goto exit;
      +	}
      +
     @@ reftable/stack.c (new)
      +	}
      +
      +	err = reftable_writer_close(wr);
     -+	if (err == EMPTY_TABLE_ERROR) {
     ++	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
      +		err = 0;
      +		goto exit;
      +	}
     @@ reftable/stack.c (new)
      +	err = close(tab_fd);
      +	tab_fd = 0;
      +	if (err < 0) {
     -+		err = IO_ERROR;
     ++		err = REFTABLE_IO_ERROR;
      +		goto exit;
      +	}
      +
      +	if (wr->min_update_index < next_update_index) {
     -+		err = API_ERROR;
     ++		err = REFTABLE_API_ERROR;
      +		goto exit;
      +	}
      +
     @@ reftable/stack.c (new)
      +	err = rename(slice_as_string(&temp_tab_file_name),
      +		     slice_as_string(&tab_file_name));
      +	if (err < 0) {
     -+		err = IO_ERROR;
     ++		err = REFTABLE_IO_ERROR;
      +		goto exit;
      +	}
      +
     @@ reftable/stack.c (new)
      +		if (errno == EEXIST) {
      +			err = 1;
      +		} else {
     -+			err = IO_ERROR;
     ++			err = REFTABLE_IO_ERROR;
      +		}
      +		goto exit;
      +	}
     @@ reftable/stack.c (new)
      +			} else if (sublock_file_fd < 0) {
      +				if (errno == EEXIST) {
      +					err = 1;
     ++				} else {
     ++					err = REFTABLE_IO_ERROR;
      +				}
     -+				err = IO_ERROR;
      +			}
      +		}
      +
     @@ reftable/stack.c (new)
      +				   expiry);
      +	/* Compaction + tombstones can create an empty table out of non-empty
      +	 * tables. */
     -+	is_empty_table = (err == EMPTY_TABLE_ERROR);
     ++	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
      +	if (is_empty_table) {
      +		err = 0;
      +	}
     @@ reftable/stack.c (new)
      +		if (errno == EEXIST) {
      +			err = 1;
      +		} else {
     -+			err = IO_ERROR;
     ++			err = REFTABLE_IO_ERROR;
      +		}
      +		goto exit;
      +	}
     @@ reftable/stack.c (new)
      +		err = rename(slice_as_string(&temp_tab_file_name),
      +			     slice_as_string(&new_table_path));
      +		if (err < 0) {
     ++			err = REFTABLE_IO_ERROR;
      +			goto exit;
      +		}
      +	}
     @@ reftable/stack.c (new)
      +
      +	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
      +	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
      +		unlink(slice_as_string(&new_table_path));
      +		goto exit;
      +	}
      +	err = close(lock_file_fd);
      +	lock_file_fd = 0;
      +	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
      +		unlink(slice_as_string(&new_table_path));
      +		goto exit;
      +	}
      +
      +	err = rename(slice_as_string(&lock_file_name), st->list_file);
      +	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
      +		unlink(slice_as_string(&new_table_path));
      +		goto exit;
      +	}
     @@ reftable/stack.c (new)
      +{
      +	uint64_t *sizes =
      +		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
     ++	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
     ++	int overhead = footer_size(version) + header_size(version) - 1;
      +	int i = 0;
      +	for (i = 0; i < st->merged->stack_len; i++) {
     -+		/* overhead is 24 + 68 = 92. */
     -+		sizes[i] = st->merged->stack[i]->size - 91;
     ++		sizes[i] = st->merged->stack[i]->size - overhead;
      +	}
      +	return sizes;
      +}
     @@ reftable/stack.h (new)
      +#define STACK_H
      +
      +#include "reftable.h"
     ++#include "system.h"
      +
      +struct reftable_stack {
      +	char *list_file;
      +	char *reftable_dir;
     ++	bool disable_auto_compact;
      +
      +	struct reftable_write_options config;
      +
     @@ reftable/update.sh (new)
      @@
      +#!/bin/sh
      +
     -+set -eux
     ++set -eu
      +
     -+((cd reftable-repo && git fetch origin && git checkout origin/master ) ||
     -+git clone https://github.com/google/reftable reftable-repo) && \
     -+cp reftable-repo/c/*.[ch] reftable/ && \
     -+cp reftable-repo/c/include/*.[ch] reftable/ && \
     -+cp reftable-repo/LICENSE reftable/ &&
     ++# Override this to import from somewhere else, say "../reftable".
     ++SRC=${SRC:-origin} BRANCH=${BRANCH:-origin/master}
     ++
     ++((git --git-dir reftable-repo/.git fetch ${SRC} && cd reftable-repo && git checkout ${BRANCH} ) ||
     ++   git clone https://github.com/google/reftable reftable-repo)
     ++
     ++cp reftable-repo/c/*.[ch] reftable/
     ++cp reftable-repo/c/include/*.[ch] reftable/
     ++cp reftable-repo/LICENSE reftable/
      +git --git-dir reftable-repo/.git show --no-patch origin/master \
     -+> reftable/VERSION && \
     -+sed -i~ 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|' reftable/system.h
     ++  > reftable/VERSION
     ++
     ++mv reftable/system.h reftable/system.h~
     ++sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
     ++
     ++# Remove unittests and compatibility hacks we don't need here.  
      +rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
     -+git add reftable/*.[ch]
     ++
     ++git add reftable/*.[ch] reftable/LICENSE reftable/VERSION 
      
       ## reftable/writer.c (new) ##
      @@
     @@ reftable/writer.c (new)
      +	int err = 0;
      +
      +	if (ref->ref_name == NULL) {
     -+		return API_ERROR;
     ++		return REFTABLE_API_ERROR;
      +	}
      +	if (ref->update_index < w->min_update_index ||
      +	    ref->update_index > w->max_update_index) {
     -+		return API_ERROR;
     ++		return REFTABLE_API_ERROR;
      +	}
      +
      +	record_from_ref(&rec, &copy);
     @@ reftable/writer.c (new)
      +			    struct reftable_log_record *log)
      +{
      +	if (log->ref_name == NULL) {
     -+		return API_ERROR;
     ++		return REFTABLE_API_ERROR;
      +	}
      +
      +	if (w->block_writer != NULL &&
     @@ reftable/writer.c (new)
      +	}
      +
      +	if (empty_table) {
     -+		err = EMPTY_TABLE_ERROR;
     ++		err = REFTABLE_EMPTY_TABLE_ERROR;
      +		goto exit;
      +	}
      +
 10:  ad72edbcfd4 ! 10:  be2371cd6e4 Reftable support for git-core
     @@ Commit message
          TODO:
      
           * Resolve spots marked with XXX
     -     * Test strategy?
     +
     +     * Detect and prevent directory/file conflicts in naming.
     +
     +     * Support worktrees (t0002-gitfile "linked repo" testcase)
      
          Example use: see t/t0031-reftable.sh
      
     @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
       	}
       
      -	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
     -+	init_db(git_dir, real_git_dir, option_template,
     -+                GIT_HASH_UNKNOWN,
     -+		DEFAULT_REF_STORAGE, /* XXX */
     -+		INIT_DB_QUIET);
     ++	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
     ++		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
       
       	if (real_git_dir)
       		git_dir = real_git_dir;
     @@ builtin/init-db.c: static int needs_work_tree_config(const char *git_dir, const
       }
       
      -void initialize_repository_version(int hash_algo)
     -+void initialize_repository_version(int hash_algo, const char *ref_storage_format)
     ++void initialize_repository_version(int hash_algo,
     ++				   const char *ref_storage_format)
       {
       	char repo_version_string[10];
       	int repo_version = GIT_REPO_VERSION;
     @@ builtin/init-db.c: void initialize_repository_version(int hash_algo)
       #endif
       
      -	if (hash_algo != GIT_HASH_SHA1)
     -+	if (hash_algo != GIT_HASH_SHA1 || !strcmp(ref_storage_format, "reftable"))
     ++	if (hash_algo != GIT_HASH_SHA1 ||
     ++	    !strcmp(ref_storage_format, "reftable"))
       		repo_version = GIT_REPO_VERSION_READ;
       
       	/* This forces creation of new config file */
     @@ builtin/init-db.c: static int create_default_files(const char *template_path,
      +	path = git_path_buf(&buf, "HEAD");
      +	reinit = (!access(path, R_OK) ||
      +		  readlink(path, junk, sizeof(junk) - 1) != -1);
     ++
     ++        /*
     ++         * refs/heads is a file when using reftable. We can't reinitialize with
     ++         * a reftable because it will overwrite HEAD
     ++         */
     ++	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
     ++			      is_directory(git_path_buf(&buf, "refs/heads"))) {
     ++		die("cannot switch ref storage format.");
     ++	}
      +
       	/*
       	 * We need to create a "refs" dir in any case so that older
       	 * versions of git can tell that this is a repository.
     + 	 */
     +-
     + 	if (refs_init_db(&err))
     + 		die("failed to set up refs db: %s", err.buf);
     + 
      @@ builtin/init-db.c: static int create_default_files(const char *template_path,
       	 * Create the default symlink from ".git/HEAD" to the "master"
       	 * branch, if it does not exist yet.
     @@ builtin/init-db.c: static int create_default_files(const char *template_path,
       	if (!reinit) {
       		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
       			exit(1);
     -+	} else {
     -+		/*
     -+		 * XXX should check whether our ref backend matches the
     -+		 * original one; if not, either die() or convert
     -+		 */
     - 	}
     - 
     +-	}
     +-
      -	initialize_repository_version(fmt->hash_algo);
     -+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
     ++	} 
       
     ++	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
     ++        
       	/* Check filemode trustability */
       	path = git_path_buf(&buf, "config");
     + 	filemode = TEST_FILEMODE;
      @@ builtin/init-db.c: static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
       }
       
       int init_db(const char *git_dir, const char *real_git_dir,
      -	    const char *template_dir, int hash, unsigned int flags)
      +	    const char *template_dir, int hash, const char *ref_storage_format,
     -+            unsigned int flags)
     ++	    unsigned int flags)
       {
       	int reinit;
       	int exist_ok = flags & INIT_DB_EXIST_OK;
     @@ builtin/init-db.c: int init_db(const char *git_dir, const char *real_git_dir,
       	 * is an attempt to reinitialize new repository with an old tool.
       	 */
       	check_repository_format(&repo_fmt);
     -+        repo_fmt.ref_storage = xstrdup(ref_storage_format);
     ++	repo_fmt.ref_storage = xstrdup(ref_storage_format);
       
       	validate_hash_algorithm(&repo_fmt, hash);
       
     @@ builtin/init-db.c: int cmd_init_db(int argc, const char **argv, const char *pref
       
       	flags |= INIT_DB_EXIST_OK;
      -	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
     -+	return init_db(git_dir, real_git_dir, template_dir, hash_algo, ref_storage_format, flags);
     ++	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
     ++		       ref_storage_format, flags);
       }
      
       ## cache.h ##
     @@ cache.h: int path_inside_repo(const char *prefix, const char *path);
       
       int init_db(const char *git_dir, const char *real_git_dir,
       	    const char *template_dir, int hash_algo,
     -+            const char *ref_storage_format,
     - 	    unsigned int flags);
     +-	    unsigned int flags);
      -void initialize_repository_version(int hash_algo);
     -+void initialize_repository_version(int hash_algo, const char *ref_storage_format);
     ++	    const char *ref_storage_format, unsigned int flags);
     ++void initialize_repository_version(int hash_algo,
     ++				   const char *ref_storage_format);
       
       void sanitize_stdfds(void);
       int daemonize(void);
     @@ cache.h: struct repository_format {
      
       ## refs.c ##
      @@
     + #include "argv-array.h"
     + #include "repository.h"
     + 
     ++const char *default_ref_storage(void)
     ++{
     ++	const char *test = getenv("GIT_TEST_REFTABLE");
     ++	return test ? "reftable" : "files";
     ++}
     ++
       /*
        * List of all available backends
        */
     @@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
       	if (!r->gitdir)
       		BUG("attempting to get main_ref_store outside of repository");
       
     --	r->refs = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
     -+	r->refs = ref_store_init(r->gitdir,
     -+				 r->ref_storage_format ? r->ref_storage_format :
     -+							 DEFAULT_REF_STORAGE,
     -+				 REF_STORE_ALL_CAPS);
     - 	return r->refs;
     +-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
     ++	r->refs_private = ref_store_init(
     ++                r->gitdir,
     ++                r->ref_storage_format ? r->ref_storage_format : DEFAULT_REF_STORAGE,
     ++                REF_STORE_ALL_CAPS);
     + 	return r->refs_private;
       }
       
      @@ refs.c: struct ref_store *get_submodule_ref_store(const char *submodule)
     @@ refs.h: struct string_list;
       struct string_list_item;
       struct worktree;
       
     -+#define DEFAULT_REF_STORAGE "files"
     ++/* Returns the ref storage backend to use by default. */
     ++const char *default_ref_storage(void);
      +
       /*
        * Resolve a reference, recursively following symbolic refererences.
     @@ refs/reftable-backend.c (new)
      +	unsigned int store_flags;
      +
      +	int err;
     ++        char *repo_dir;
      +	char *reftable_dir;
      +	struct reftable_stack *stack;
      +};
     @@ refs/reftable-backend.c (new)
      +
      +	base_ref_store_init(ref_store, &refs_be_reftable);
      +	refs->store_flags = store_flags;
     -+
     ++        refs->repo_dir = xstrdup(path);
      +	strbuf_addf(&sb, "%s/reftable", path);
      +	refs->reftable_dir = xstrdup(sb.buf);
      +	strbuf_reset(&sb);
      +
     -+	strbuf_addf(&sb, "%s/refs", path);
     -+	safe_create_dir(sb.buf, 1);
     -+	strbuf_reset(&sb);
     -+
     -+	strbuf_addf(&sb, "%s/HEAD", path);
     -+	write_file(sb.buf, "ref: refs/.invalid");
     -+	strbuf_reset(&sb);
     -+
     -+	strbuf_addf(&sb, "%s/refs/heads", path);
     -+	write_file(sb.buf, "this repository uses the reftable format");
     -+
      +	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
      +	strbuf_release(&sb);
      +	return ref_store;
     @@ refs/reftable-backend.c (new)
      +{
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
     ++	struct strbuf sb = STRBUF_INIT;
     ++
      +	safe_create_dir(refs->reftable_dir, 1);
     -+	return 0;
     ++
     ++	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
     ++	write_file(sb.buf, "ref: refs/.invalid");
     ++	strbuf_reset(&sb);
     ++
     ++	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
     ++	safe_create_dir(sb.buf, 1);
     ++	strbuf_reset(&sb);
     ++
     ++	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
     ++	write_file(sb.buf, "this repository uses the reftable format");
     ++
     ++        return 0;
      +}
      +
      +struct git_reftable_iterator {
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
     -+        ri->prefix = prefix;
     ++	ri->prefix = prefix;
      +	ri->base.oid = &ri->oid;
      +	ri->flags = flags;
      +	ri->ref_store = ref_store;
     @@ refs/reftable-backend.c (new)
      +	const char *resolved = refs_resolve_ref_unsafe(
      +		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
      +	if (is_null_oid(want_oid) != (resolved == NULL)) {
     -+		return LOCK_ERROR;
     ++		return REFTABLE_LOCK_ERROR;
      +	}
      +
      +	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
     -+		return LOCK_ERROR;
     ++		return REFTABLE_LOCK_ERROR;
      +	}
      +
      +	return 0;
     @@ refs/reftable-backend.c (new)
      +		(struct git_reftable_ref_store *)transaction->ref_store;
      +	uint64_t ts = reftable_stack_next_update_index(refs->stack);
      +	int err = 0;
     ++	int i = 0;
      +	struct reftable_log_record *logs =
      +		calloc(transaction->nr, sizeof(*logs));
      +	struct ref_update **sorted =
     @@ refs/reftable-backend.c (new)
      +	QSORT(sorted, transaction->nr, ref_update_cmp);
      +	reftable_writer_set_limits(writer, ts, ts);
      +
     -+	for (int i = 0; i < transaction->nr; i++) {
     ++	for (i = 0; i < transaction->nr; i++) {
      +		struct ref_update *u = sorted[i];
      +		if (u->flags & REF_HAVE_OLD) {
      +			err = reftable_check_old_oid(transaction->ref_store,
     @@ refs/reftable-backend.c (new)
      +		}
      +	}
      +
     -+	for (int i = 0; i < transaction->nr; i++) {
     ++	for (i = 0; i < transaction->nr; i++) {
      +		struct ref_update *u = sorted[i];
      +		struct reftable_log_record *log = &logs[i];
      +		fill_reftable_log_record(log);
     @@ refs/reftable-backend.c (new)
      +				transaction->ref_store, u->refname, 0, &out_oid,
      +				&out_flags);
      +			struct reftable_ref_record ref = { NULL };
     ++			struct object_id peeled;
     ++			int peel_error = peel_object(&u->new_oid, &peeled);
     ++
      +			ref.ref_name =
      +				(char *)(resolved ? resolved : u->refname);
      +			log->ref_name = ref.ref_name;
     -+			ref.value = u->new_oid.hash;
     ++
     ++			if (!is_null_oid(&u->new_oid)) {
     ++				ref.value = u->new_oid.hash;
     ++			}
      +			ref.update_index = ts;
     ++			if (!peel_error) {
     ++				ref.target_value = peeled.hash;
     ++			}
     ++
      +			err = reftable_writer_add_ref(writer, &ref);
      +			if (err < 0) {
      +				goto exit;
     @@ refs/reftable-backend.c (new)
      +		}
      +	}
      +
     -+	for (int i = 0; i < transaction->nr; i++) {
     ++	for (i = 0; i < transaction->nr; i++) {
      +		err = reftable_writer_add_log(writer, &logs[i]);
      +		clear_reftable_log_record(&logs[i]);
      +		if (err < 0) {
     @@ refs/reftable-backend.c (new)
      +	err = reftable_stack_add(refs->stack, &write_transaction_table,
      +				 transaction);
      +	if (err < 0) {
     -+		strbuf_addf(errmsg, "reftable: transaction failure %s",
     ++		strbuf_addf(errmsg, "reftable: transaction failure: %s",
      +			    reftable_error_str(err));
      +		return -1;
      +	}
     @@ refs/reftable-backend.c (new)
      +		(struct write_delete_refs_arg *)argv;
      +	uint64_t ts = reftable_stack_next_update_index(arg->stack);
      +	int err = 0;
     ++	int i = 0;
      +
      +	reftable_writer_set_limits(writer, ts, ts);
     -+	for (int i = 0; i < arg->refnames->nr; i++) {
     ++	for (i = 0; i < arg->refnames->nr; i++) {
      +		struct reftable_ref_record ref = {
      +			.ref_name = (char *)arg->refnames->items[i].string,
      +			.update_index = ts,
     @@ refs/reftable-backend.c (new)
      +		}
      +	}
      +
     -+	for (int i = 0; i < arg->refnames->nr; i++) {
     ++	for (i = 0; i < arg->refnames->nr; i++) {
      +		struct reftable_log_record log = { NULL };
      +		struct reftable_ref_record current = { NULL };
      +		fill_reftable_log_record(&log);
     @@ refs/reftable-backend.c (new)
      +			return ITER_ERROR;
      +		}
      +
     -+                if (reftable_log_record_is_deletion(&ri->log)) {
     -+                        /* XXX - Why does the reftable_stack filter these? */
     -+                        continue;
     -+                }
      +		ri->base.refname = ri->log.ref_name;
      +		if (ri->last_name != NULL &&
      +		    !strcmp(ri->log.ref_name, ri->last_name)) {
     -+                        /* we want the refnames that we have reflogs for, so we
     -+                         * skip if we've already produced this name. This could
     -+                         * be faster by seeking directly to
     -+                         * reflog@update_index==0.
     -+                         */
     ++			/* we want the refnames that we have reflogs for, so we
     ++			 * skip if we've already produced this name. This could
     ++			 * be faster by seeking directly to
     ++			 * reflog@update_index==0.
     ++			 */
      +			continue;
      +		}
      +
     @@ refs/reftable-backend.c (new)
      +	int cap = 0;
      +	int len = 0;
      +	int err = 0;
     ++	int i = 0;
      +
      +	if (refs->err < 0) {
      +		return refs->err;
     @@ refs/reftable-backend.c (new)
      +		logs[len++] = log;
      +	}
      +
     -+	for (int i = len; i--;) {
     ++	for (i = len; i--;) {
      +		struct reftable_log_record *log = &logs[i];
      +		struct object_id old_oid;
      +		struct object_id new_oid;
     @@ refs/reftable-backend.c (new)
      +		}
      +	}
      +
     -+	for (int i = 0; i < len; i++) {
     ++	for (i = 0; i < len; i++) {
      +		reftable_log_record_clear(&logs[i]);
      +	}
      +	free(logs);
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +	err = reftable_stack_read_ref(refs->stack, refname, &ref);
     -+	if (err) {
     ++        if (err > 0) {
     ++                errno = ENOENT;
     ++                err = -1;
     ++                goto exit;
     ++        }
     ++	if (err < 0) {
     ++                errno = reftable_error_to_errno(err);
     ++                err = -1;
      +		goto exit;
      +	}
      +	if (ref.target != NULL) {
     @@ refs/reftable-backend.c (new)
      +		strbuf_reset(referent);
      +		strbuf_addstr(referent, ref.target);
      +		*type |= REF_ISSYMREF;
     -+	} else {
     ++	} else if (ref.value != NULL) {
      +		hashcpy(oid->hash, ref.value);
     -+	}
     ++	} else {
     ++                *type |= REF_ISBROKEN;
     ++                errno = EINVAL;
     ++                err = -1;
     ++        }
      +exit:
      +	reftable_ref_record_clear(&ref);
      +	return err;
     @@ refs/reftable-backend.c (new)
      +	reftable_reflog_expire
      +};
      
     + ## reftable/update.sh ##
     +@@ reftable/update.sh: SRC=${SRC:-origin} BRANCH=${BRANCH:-origin/master}
     + cp reftable-repo/c/*.[ch] reftable/
     + cp reftable-repo/c/include/*.[ch] reftable/
     + cp reftable-repo/LICENSE reftable/
     +-git --git-dir reftable-repo/.git show --no-patch origin/master \
     ++git --git-dir reftable-repo/.git show --no-patch ${BRANCH} \
     +   > reftable/VERSION
     + 
     + mv reftable/system.h reftable/system.h~
     +
       ## repository.c ##
      @@ repository.c: int repo_init(struct repository *repo,
       	if (worktree)
     @@ repository.c: int repo_init(struct repository *repo,
      
       ## repository.h ##
      @@ repository.h: struct repository {
     - 	/* The store in which the refs are held. */
     - 	struct ref_store *refs;
     + 	 */
     + 	struct ref_store *refs_private;
       
      +	/* The format to use for the ref database. */
      +	char *ref_storage_format;
     @@ t/t0031-reftable.sh (new)
      +
      +. ./test-lib.sh
      +
     -+# XXX - fix GC
     -+test_expect_success 'basic operation of reftable storage' '
     ++INVALID_SHA1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
     ++
     ++initialize ()  {
      +	rm -rf .git &&
      +	git init --ref-storage=reftable &&
     -+	mv .git/hooks .git/hooks-disabled &&
     ++	mv .git/hooks .git/hooks-disabled
     ++}
     ++
     ++test_expect_success 'delete ref' '
     ++	initialize &&
     ++	test_commit file &&
     ++	SHA=$(git show-ref -s --verify HEAD) &&
     ++	test_write_lines "$SHA refs/heads/master" "$SHA refs/tags/file" >expect &&
     ++	git show-ref > actual &&
     ++	! git update-ref -d refs/tags/file $INVALID_SHA1 &&
     ++	test_cmp expect actual &&
     ++	git update-ref -d refs/tags/file $SHA  &&
     ++	test_write_lines "$SHA refs/heads/master" >expect &&
     ++	git show-ref > actual &&
     ++	test_cmp expect actual
     ++'
     ++
     ++test_expect_success 'basic operation of reftable storage: commit, reflog, repack' '
     ++	initialize &&
      +	test_commit file &&
      +	test_write_lines refs/heads/master refs/tags/file >expect &&
      +	git show-ref &&
     @@ t/t0031-reftable.sh (new)
      +	for count in $(test_seq 1 10)
      +	do
      +		test_commit "number $count" file.t $count number-$count ||
     -+ 		return 1
     ++	        return 1
      +	done &&
     -+(true || (test_pause &&
     -+	git gc &&
     ++	git pack-refs &&
      +	ls -1 .git/reftable >table-files &&
      +	test_line_count = 2 table-files &&
     -+ 	git reflog refs/heads/master >output &&
     ++	git reflog refs/heads/master >output &&
      +	test_line_count = 11 output &&
     -+	grep "commit (initial): first post" output &&
     -+	grep "commit: number 10" output ))
     ++	grep "commit (initial): file" output &&
     ++	grep "commit: number 10" output &&
     ++	git gc &&
     ++	git reflog refs/heads/master >output &&
     ++	test_line_count = 0 output 
     ++'
     ++
     ++# This matches show-ref's output
     ++print_ref() {
     ++	echo "$(git rev-parse "$1") $1"
     ++}
     ++
     ++test_expect_success 'peeled tags are stored' '
     ++	initialize &&
     ++	test_commit file &&
     ++	git tag -m "annotated tag" test_tag HEAD &&
     ++	{
     ++		print_ref "refs/heads/master" &&
     ++		print_ref "refs/tags/file" &&
     ++		print_ref "refs/tags/test_tag" &&
     ++		print_ref "refs/tags/test_tag^{}" 
     ++	} >expect &&
     ++	git show-ref -d >actual &&
     ++	test_cmp expect actual
     ++'
     ++
     ++test_expect_success 'show-ref works on fresh repo' '
     ++	initialize &&
     ++	rm -rf .git &&
     ++	git init --ref-storage=reftable &&
     ++	>expect &&
     ++	! git show-ref > actual &&
     ++	test_cmp expect actual
     ++'
     ++
     ++test_expect_success 'checkout unborn branch' '
     ++	initialize &&
     ++	git checkout -b master
     ++'
     ++
     ++test_expect_success 'do not clobber existing repo' '
     ++	rm -rf .git &&
     ++	git init --ref-storage=files &&
     ++	cat .git/HEAD > expect &&
     ++	test_commit file &&
     ++	(git init --ref-storage=reftable || true) &&
     ++	cat .git/HEAD > actual &&
     ++	test_cmp expect actual
      +'
      +
      +test_done
  -:  ----------- > 11:  8af67b85e49 Add some reftable testing infrastructure
  -:  ----------- > 12:  b02b83a92ad t: use update-ref and show-ref to reading/writing refs

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v10 01/12] refs.h: clarify reflog iteration order
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
@ 2020-04-27 20:13                   ` Han-Wen Nienhuys via GitGitGadget
  2020-04-27 20:13                   ` [PATCH v10 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
                                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/refs.h b/refs.h
index 545029c6d80..87c9ec921b9 100644
--- a/refs.h
+++ b/refs.h
@@ -444,18 +444,21 @@ int delete_refs(const char *msg, struct string_list *refnames,
 int refs_delete_reflog(struct ref_store *refs, const char *refname);
 int delete_reflog(const char *refname);
 
-/* iterate over reflog entries */
+/* Iterate over reflog entries. */
 typedef int each_reflog_ent_fn(
 		struct object_id *old_oid, struct object_id *new_oid,
 		const char *committer, timestamp_t timestamp,
 		int tz, const char *msg, void *cb_data);
 
+/* Iterate in over reflog entries, oldest entry first. */
 int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
 			     each_reflog_ent_fn fn, void *cb_data);
 int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+/* Call a function for each reflog entry, oldest entry first. */
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v10 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
  2020-04-27 20:13                   ` [PATCH v10 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-04-27 20:13                   ` Han-Wen Nienhuys via GitGitGadget
  2020-04-30 21:17                     ` Emily Shaffer
  2020-04-27 20:13                   ` [PATCH v10 03/12] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
                                     ` (10 subsequent siblings)
  12 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index b8759116cd0..05e05579408 100644
--- a/refs.c
+++ b/refs.c
@@ -1569,7 +1569,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1629,8 +1629,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v10 03/12] create .git/refs in files-backend.c
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
  2020-04-27 20:13                   ` [PATCH v10 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
  2020-04-27 20:13                   ` [PATCH v10 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-04-27 20:13                   ` Han-Wen Nienhuys via GitGitGadget
  2020-04-30 21:24                     ` Emily Shaffer
  2020-04-27 20:13                   ` [PATCH v10 04/12] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
                                     ` (9 subsequent siblings)
  12 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This prepares for supporting the reftable format, which will want
create its own file system layout in .git

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/init-db.c    | 2 --
 refs/files-backend.c | 6 ++++++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0b7222e7188..3b50b1aa0e5 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -251,8 +251,6 @@ static int create_default_files(const char *template_path,
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
 	 */
-	safe_create_dir(git_path("refs"), 1);
-	adjust_shared_perm(git_path("refs"));
 
 	if (refs_init_db(&err))
 		die("failed to set up refs db: %s", err.buf);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 561c33ac8a9..ab7899a9c77 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3157,9 +3157,15 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
 	struct strbuf sb = STRBUF_INIT;
 
+	files_ref_path(refs, &sb, "refs");
+	safe_create_dir(sb.buf, 1);
+	/* adjust permissions even if directory already exists. */
+	adjust_shared_perm(sb.buf);
+
 	/*
 	 * Create .git/refs/{heads,tags}
 	 */
+	strbuf_reset(&sb);
 	files_ref_path(refs, &sb, "refs/heads");
 	safe_create_dir(sb.buf, 1);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v10 04/12] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
                                     ` (2 preceding siblings ...)
  2020-04-27 20:13                   ` [PATCH v10 03/12] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-04-27 20:13                   ` Han-Wen Nienhuys via GitGitGadget
  2020-04-27 20:13                   ` [PATCH v10 05/12] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb7..3490aac3a40 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
 
 /* Virtual function declarations for ref_iterators: */
 
+/*
+ * backend-specific implementation of ref_iterator_advance.
+ * For symrefs, the function should set REF_ISSYMREF, and it should also
+ * dereference the symref to provide the OID referent.
+ */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v10 05/12] Add .gitattributes for the reftable/ directory
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
                                     ` (3 preceding siblings ...)
  2020-04-27 20:13                   ` [PATCH v10 04/12] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-04-27 20:13                   ` Han-Wen Nienhuys via GitGitGadget
  2020-04-27 20:13                   ` [PATCH v10 06/12] reftable: file format documentation Jonathan Nieder via GitGitGadget
                                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 00000000000..f44451a3795
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v10 06/12] reftable: file format documentation
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
                                     ` (4 preceding siblings ...)
  2020-04-27 20:13                   ` [PATCH v10 05/12] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-04-27 20:13                   ` Jonathan Nieder via GitGitGadget
  2020-04-27 20:13                   ` [PATCH v10 07/12] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
                                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Jonathan Nieder via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder

From: Jonathan Nieder <jrnieder@gmail.com>

Shawn Pearce explains:

Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:

- Near constant time lookup for any single reference, even when the
  repository is cold and not in process or kernel cache.
- Near constant time verification a SHA-1 is referred to by at least
  one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.

This file format spec was originally written in July, 2017 by Shawn
Pearce.  Some refinements since then were made by Shawn and by Han-Wen
Nienhuys based on experiences implementing and experimenting with the
format.  (All of this was in the context of our work at Google and
Google is happy to contribute the result to the Git project.)

Imported from JGit[1]'s current version (c217d33ff,
"Documentation/technical/reftable: improve repo layout", 2020-02-04)
of Documentation/technical/reftable.md and converted to asciidoc by
running

  pandoc -t asciidoc -f markdown reftable.md >reftable.txt

using pandoc 2.2.1.  The result required the following additional
minor changes:

- removed the [TOC] directive to add a table of contents, since
  asciidoc does not support it
- replaced git-scm.com/docs links with linkgit: directives that link
  to other pages within Git's documentation

[1] https://eclipse.googlesource.com/jgit/jgit

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/Makefile               |    1 +
 Documentation/technical/reftable.txt | 1067 ++++++++++++++++++++++++++
 2 files changed, 1068 insertions(+)
 create mode 100644 Documentation/technical/reftable.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 59e6ce3a2a6..c73eedbb3fc 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -93,6 +93,7 @@ TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
+TECH_DOCS += technical/reftable
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/signature-format
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
new file mode 100644
index 00000000000..9fa4657d9ff
--- /dev/null
+++ b/Documentation/technical/reftable.txt
@@ -0,0 +1,1067 @@
+reftable
+--------
+
+Overview
+~~~~~~~~
+
+Problem statement
+^^^^^^^^^^^^^^^^^
+
+Some repositories contain a lot of references (e.g. android at 866k,
+rails at 31k). The existing packed-refs format takes up a lot of space
+(e.g. 62M), and does not scale with additional references. Lookup of a
+single reference requires linearly scanning the file.
+
+Atomic pushes modifying multiple references require copying the entire
+packed-refs file, which can be a considerable amount of data moved
+(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
+
+Repositories with many loose references occupy a large number of disk
+blocks from the local file system, as each reference is its own file
+storing 41 bytes (and another file for the corresponding reflog). This
+negatively affects the number of inodes available when a large number of
+repositories are stored on the same filesystem. Readers can be penalized
+due to the larger number of syscalls required to traverse and read the
+`$GIT_DIR/refs` directory.
+
+Objectives
+^^^^^^^^^^
+
+* Near constant time lookup for any single reference, even when the
+repository is cold and not in process or kernel cache.
+* Near constant time verification if a SHA-1 is referred to by at least
+one reference (for allow-tip-sha1-in-want).
+* Efficient lookup of an entire namespace, such as `refs/tags/`.
+* Support atomic push with `O(size_of_update)` operations.
+* Combine reflog storage with ref storage for small transactions.
+* Separate reflog storage for base refs and historical logs.
+
+Description
+^^^^^^^^^^^
+
+A reftable file is a portable binary file format customized for
+reference storage. References are sorted, enabling linear scans, binary
+search lookup, and range scans.
+
+Storage in the file is organized into variable sized blocks. Prefix
+compression is used within a single block to reduce disk space. Block
+size and alignment is tunable by the writer.
+
+Performance
+^^^^^^^^^^^
+
+Space used, packed-refs vs. reftable:
+
+[cols=",>,>,>,>,>",options="header",]
+|===============================================================
+|repository |packed-refs |reftable |% original |avg ref |avg obj
+|android |62.2 M |36.1 M |58.0% |33 bytes |5 bytes
+|rails |1.8 M |1.1 M |57.7% |29 bytes |4 bytes
+|git |78.7 K |48.1 K |61.0% |50 bytes |4 bytes
+|git (heads) |332 b |269 b |81.0% |33 bytes |0 bytes
+|===============================================================
+
+Scan (read 866k refs), by reference name lookup (single ref from 866k
+refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):
+
+[cols=",>,>,>,>",options="header",]
+|=========================================================
+|format |cache |scan |by name |by SHA-1
+|packed-refs |cold |402 ms |409,660.1 usec |412,535.8 usec
+|packed-refs |hot | |6,844.6 usec |20,110.1 usec
+|reftable |cold |112 ms |33.9 usec |323.2 usec
+|reftable |hot | |20.2 usec |320.8 usec
+|=========================================================
+
+Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:
+
+[cols=",>,>",options="header",]
+|================================
+|format |size |avg entry
+|$GIT_DIR/logs |173 M |1209 bytes
+|reftable |5 M |37 bytes
+|================================
+
+Details
+~~~~~~~
+
+Peeling
+^^^^^^^
+
+References stored in a reftable are peeled, a record for an annotated
+(or signed) tag records both the tag object, and the object it refers
+to.
+
+Reference name encoding
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Reference names are an uninterpreted sequence of bytes that must pass
+linkgit:git-check-ref-format[1] as a valid reference name.
+
+Key unicity
+^^^^^^^^^^^
+
+Each entry must have a unique key; repeated keys are disallowed.
+
+Network byte order
+^^^^^^^^^^^^^^^^^^
+
+All multi-byte, fixed width fields are in network byte order.
+
+Ordering
+^^^^^^^^
+
+Blocks are lexicographically ordered by their first reference.
+
+Directory/file conflicts
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format accepts both `refs/heads/foo` and
+`refs/heads/foo/bar` as distinct references.
+
+This property is useful for retaining log records in reftable, but may
+confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
+references. Users of reftable may choose to continue to reject `foo` and
+`foo/bar` type conflicts to prevent problems for peers.
+
+File format
+~~~~~~~~~~~
+
+Structure
+^^^^^^^^^
+
+A reftable file has the following high-level structure:
+
+....
+first_block {
+  header
+  first_ref_block
+}
+ref_block*
+ref_index*
+obj_block*
+obj_index*
+log_block*
+log_index*
+footer
+....
+
+A log-only file omits the `ref_block`, `ref_index`, `obj_block` and
+`obj_index` sections, containing only the file header and log block:
+
+....
+first_block {
+  header
+}
+log_block*
+log_index*
+footer
+....
+
+in a log-only file the first log block immediately follows the file
+header, without padding to block alignment.
+
+Block size
+^^^^^^^^^^
+
+The file’s block size is arbitrarily determined by the writer, and does
+not have to be a power of 2. The block size must be larger than the
+longest reference name or log entry used in the repository, as
+references cannot span blocks.
+
+Powers of two that are friendly to the virtual memory system or
+filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
+yield better compression, with a possible increased cost incurred by
+readers during access.
+
+The largest block size is `16777215` bytes (15.99 MiB).
+
+Block alignment
+^^^^^^^^^^^^^^^
+
+Writers may choose to align blocks at multiples of the block size by
+including `padding` filled with NUL bytes at the end of a block to round
+out to the chosen alignment. When alignment is used, writers must
+specify the alignment with the file header’s `block_size` field.
+
+Block alignment is not required by the file format. Unaligned files must
+set `block_size = 0` in the file header, and omit `padding`. Unaligned
+files with more than one ref block must include the link:#Ref-index[ref
+index] to support fast lookup. Readers must be able to read both aligned
+and non-aligned files.
+
+Very small files (e.g. 1 only ref block) may omit `padding` and the ref
+index to reduce total file size.
+
+Header
+^^^^^^
+
+A 24-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+....
+
+Aligned files must specify `block_size` to configure readers with the
+expected block alignment. Unaligned files must set `block_size = 0`.
+
+The `min_update_index` and `max_update_index` describe bounds for the
+`update_index` field of all log records in this file. When reftables are
+used in a stack for link:#Update-transactions[transactions], these
+fields can order the files such that the prior file’s
+`max_update_index + 1` is the next file’s `min_update_index`.
+
+First ref block
+^^^^^^^^^^^^^^^
+
+The first ref block shares the same block as the file header, and is 24
+bytes smaller than all other blocks in the file. The first block
+immediately begins after the file header, at position 24.
+
+If the first block is a log block (a log-only file), its block header
+begins immediately at position 24.
+
+Ref block format
+^^^^^^^^^^^^^^^^
+
+A ref block is written as:
+
+....
+'r'
+uint24( block_len )
+ref_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
+encodes the number of bytes in the block up to, but not including the
+optional `padding`. This is always less than or equal to the file’s
+block size. In the first ref block, `block_len` includes 24 bytes for
+the file header.
+
+The 2-byte `restart_count` stores the number of entries in the
+`restart_offset` list, which must not be empty. Readers can use
+`restart_count` to binary search between restarts before starting a
+linear scan.
+
+Exactly `restart_count` 3-byte `restart_offset` values precedes the
+`restart_count`. Offsets are relative to the start of the block and
+refer to the first byte of any `ref_record` whose name has not been
+prefix compressed. Entries in the `restart_offset` list must be sorted,
+ascending. Readers can start linear scans from any of these records.
+
+A variable number of `ref_record` fill the middle of the block,
+describing reference names and values. The format is described below.
+
+As the first ref block shares the first file block with the file header,
+all `restart_offset` in the first block are relative to the start of the
+file (position 0), and include the file header. This forces the first
+`restart_offset` to be `28`.
+
+ref record
+++++++++++
+
+A `ref_record` describes a single reference, storing both the name and
+its value(s). Records are formatted as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | value_type )
+suffix
+varint( update_index_delta )
+value?
+....
+
+The `prefix_length` field specifies how many leading bytes of the prior
+reference record’s name should be copied to obtain this reference’s
+name. This must be 0 for the first reference in any block, and also must
+be 0 for any `ref_record` whose offset is listed in the `restart_offset`
+table at the end of the block.
+
+Recovering a reference name from any `ref_record` is a simple concat:
+
+....
+this_name = prior_name[0..prefix_length] + suffix
+....
+
+The `suffix_length` value provides the number of bytes available in
+`suffix` to copy from `suffix` to complete the reference name.
+
+The `update_index` that last modified the reference can be obtained by
+adding `update_index_delta` to the `min_update_index` from the file
+header: `min_update_index + update_index_delta`.
+
+The `value` follows. Its format is determined by `value_type`, one of
+the following:
+
+* `0x0`: deletion; no value data (see transactions, below)
+* `0x1`: one 20-byte object id; value of the ref
+* `0x2`: two 20-byte object ids; value of the ref, peeled target
+* `0x3`: symbolic reference: `varint( target_len ) target`
+
+Symbolic references use `0x3`, followed by the complete name of the
+reference target. No compression is applied to the target name.
+
+Types `0x4..0x7` are reserved for future use.
+
+Ref index
+^^^^^^^^^
+
+The ref index stores the name of the last reference from every ref block
+in the file, enabling reduced disk seeks for lookups. Any reference can
+be found by searching the index, identifying the containing block, and
+searching within that block.
+
+The index may be organized into a multi-level index, where the 1st level
+index block points to additional ref index blocks (2nd level), which may
+in turn point to either additional index blocks (e.g. 3rd level) or ref
+blocks (leaf level). Disk reads required to access a ref go up with
+higher index levels. Multi-level indexes may be required to ensure no
+single index block exceeds the file format’s max block size of
+`16777215` bytes (15.99 MiB). To acheive constant O(1) disk seeks for
+lookups the index must be a single level, which is permitted to exceed
+the file’s configured block size, but not the format’s max block size of
+15.99 MiB.
+
+If present, the ref index block(s) appears after the last ref block.
+
+If there are at least 4 ref blocks, a ref index block should be written
+to improve lookup times. Cold reads using the index require 2 disk reads
+(read index, read block), and binary searching < 4 blocks also requires
+<= 2 reads. Omitting the index block from smaller files saves space.
+
+If the file is unaligned and contains more than one ref block, the ref
+index must be written.
+
+Index block format:
+
+....
+'i'
+uint24( block_len )
+index_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+The index blocks begin with `block_type = 'i'` and a 3-byte `block_len`
+which encodes the number of bytes in the block, up to but not including
+the optional `padding`.
+
+The `restart_offset` and `restart_count` fields are identical in format,
+meaning and usage as in ref blocks.
+
+To reduce the number of reads required for random access in very large
+files the index block may be larger than other blocks. However, readers
+must hold the entire index in memory to benefit from this, so it’s a
+time-space tradeoff in both file size and reader memory.
+
+Increasing the file’s block size decreases the index size. Alternatively
+a multi-level index may be used, keeping index blocks within the file’s
+block size, but increasing the number of blocks that need to be
+accessed.
+
+index record
+++++++++++++
+
+An index record describes the last entry in another block. Index records
+are written as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | 0 )
+suffix
+varint( block_position )
+....
+
+Index records use prefix compression exactly like `ref_record`.
+
+Index records store `block_position` after the suffix, specifying the
+absolute position in bytes (from the start of the file) of the block
+that ends with this reference. Readers can seek to `block_position` to
+begin reading the block header.
+
+Readers must examine the block header at `block_position` to determine
+if the next block is another level index block, or the leaf-level ref
+block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the ref index must first read the footer (below) to
+obtain `ref_index_position`. If not present, the position will be 0. The
+`ref_index_position` is for the 1st level root of the ref index.
+
+Obj block format
+^^^^^^^^^^^^^^^^
+
+Object blocks are optional. Writers may choose to omit object blocks,
+especially if readers will not use the SHA-1 to ref mapping.
+
+Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
+ref blocks containing references pointing to that object directly, or as
+the peeled value of an annotated tag. Like ref blocks, object blocks use
+the file’s standard block size. The abbrevation length is available in
+the footer as `obj_id_len`.
+
+To save space in small files, object blocks may be omitted if the ref
+index is not present, as brute force search will only need to read a few
+ref blocks. When missing, readers should brute force a linear search of
+all references to lookup by SHA-1.
+
+An object block is written as:
+
+....
+'o'
+uint24( block_len )
+obj_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Fields are identical to ref block. Binary search using the restart table
+works the same as in reference blocks.
+
+Because object identifiers are abbreviated by writers to the shortest
+unique abbreviation within the reftable, obj key lengths are variable
+between 2 and 20 bytes. Readers must compare only for common prefix
+match within an obj block or obj index.
+
+obj record
+++++++++++
+
+An `obj_record` describes a single object abbreviation, and the blocks
+containing references using that unique abbreviation:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | cnt_3 )
+suffix
+varint( cnt_large )?
+varint( position_delta )*
+....
+
+Like in reference blocks, abbreviations are prefix compressed within an
+obj block. On large reftables with many unique objects, higher block
+sizes (64k), and higher restart interval (128), a `prefix_length` of 2
+or 3 and `suffix_length` of 3 may be common in obj records (unique
+abbreviation of 5-6 raw bytes, 10-12 hex digits).
+
+Each record contains `position_count` number of positions for matching
+ref blocks. For 1-7 positions the count is stored in `cnt_3`. When
+`cnt_3 = 0` the actual count follows in a varint, `cnt_large`.
+
+The use of `cnt_3` bets most objects are pointed to by only a single
+reference, some may be pointed to by a couple of references, and very
+few (if any) are pointed to by more than 7 references.
+
+A special case exists when `cnt_3 = 0` and `cnt_large = 0`: there are no
+`position_delta`, but at least one reference starts with this
+abbreviation. A reader that needs exact reference names must scan all
+references to find which specific references have the desired object.
+Writers should use this format when the `position_delta` list would have
+overflowed the file’s block size due to a high number of references
+pointing to the same object.
+
+The first `position_delta` is the position from the start of the file.
+Additional `position_delta` entries are sorted ascending and relative to
+the prior entry, e.g. a reader would perform:
+
+....
+pos = position_delta[0]
+prior = pos
+for (j = 1; j < position_count; j++) {
+  pos = prior + position_delta[j]
+  prior = pos
+}
+....
+
+With a position in hand, a reader must linearly scan the ref block,
+starting from the first `ref_record`, testing each reference’s SHA-1s
+(for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
+SHA-1 within a single ref block is not supported by the reftable format.
+Smaller block sizes reduce the number of candidates this step must
+consider.
+
+Obj index
+^^^^^^^^^
+
+The obj index stores the abbreviation from the last entry for every obj
+block in the file, enabling reduced disk seeks for all lookups. It is
+formatted exactly the same as the ref index, but refers to obj blocks.
+
+The obj index should be present if obj blocks are present, as obj blocks
+should only be written in larger files.
+
+Readers loading the obj index must first read the footer (below) to
+obtain `obj_index_position`. If not present, the position will be 0.
+
+Log block format
+^^^^^^^^^^^^^^^^
+
+Unlike ref and obj blocks, log blocks are always unaligned.
+
+Log blocks are variable in size, and do not match the `block_size`
+specified in the file header or footer. Writers should choose an
+appropriate buffer size to prepare a log block for deflation, such as
+`2 * block_size`.
+
+A log block is written as:
+
+....
+'g'
+uint24( block_len )
+zlib_deflate {
+  log_record+
+  uint24( restart_offset )+
+  uint16( restart_count )
+}
+....
+
+Log blocks look similar to ref blocks, except `block_type = 'g'`.
+
+The 4-byte block header is followed by the deflated block contents using
+zlib deflate. The `block_len` in the header is the inflated size
+(including 4-byte block header), and should be used by readers to
+preallocate the inflation output buffer. A log block’s `block_len` may
+exceed the file’s block size.
+
+Offsets within the log block (e.g. `restart_offset`) still include the
+4-byte header. Readers may prefer prefixing the inflation output buffer
+with the 4-byte header.
+
+Within the deflate container, a variable number of `log_record` describe
+reference changes. The log record format is described below. See ref
+block format (above) for a description of `restart_offset` and
+`restart_count`.
+
+Because log blocks have no alignment or padding between blocks, readers
+must keep track of the bytes consumed by the inflater to know where the
+next log block begins.
+
+log record
+++++++++++
+
+Log record keys are structured as:
+
+....
+ref_name '\0' reverse_int64( update_index )
+....
+
+where `update_index` is the unique transaction identifier. The
+`update_index` field must be unique within the scope of a `ref_name`.
+See the update transactions section below for further details.
+
+The `reverse_int64` function inverses the value so lexographical
+ordering the network byte order encoding sorts the more recent records
+with higher `update_index` values first:
+
+....
+reverse_int64(int64 t) {
+  return 0xffffffffffffffff - t;
+}
+....
+
+Log records have a similar starting structure to ref and index records,
+utilizing the same prefix compression scheme applied to the log record
+key described above.
+
+....
+    varint( prefix_length )
+    varint( (suffix_length << 3) | log_type )
+    suffix
+    log_data {
+      old_id
+      new_id
+      varint( name_length    )  name
+      varint( email_length   )  email
+      varint( time_seconds )
+      sint16( tz_offset )
+      varint( message_length )  message
+    }?
+....
+
+Log record entries use `log_type` to indicate what follows:
+
+* `0x0`: deletion; no log data.
+* `0x1`: standard git reflog data using `log_data` above.
+
+The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
+entry from the reflog of `refs/stash` in a transaction file (below),
+without needing to rewrite larger files. Readers reading a stack of
+reflogs must treat this as a deletion.
+
+For `log_type = 0x1`, the `log_data` section follows
+linkgit:git-update-ref[1] logging and includes:
+
+* two 20-byte SHA-1s (old id, new id)
+* varint string of committer’s name
+* varint string of committer’s email
+* varint time in seconds since epoch (Jan 1, 1970)
+* 2-byte timezone offset in minutes (signed)
+* varint string of message
+
+`tz_offset` is the absolute number of minutes from GMT the committer was
+at the time of the update. For example `GMT-0800` is encoded in reftable
+as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
+
+The committer email does not contain `<` or `>`, it’s the value normally
+found between the `<>` in a git commit object header.
+
+The `message_length` may be 0, in which case there was no message
+supplied for the update.
+
+Contrary to traditional reflog (which is a file), renames are encoded as
+a combination of ref deletion and ref creation.
+
+Reading the log
++++++++++++++++
+
+Readers accessing the log must first read the footer (below) to
+determine the `log_position`. The first block of the log begins at
+`log_position` bytes since the start of the file. The `log_position` is
+not block aligned.
+
+Importing logs
+++++++++++++++
+
+When importing from `$GIT_DIR/logs` writers should globally order all
+log records roughly by timestamp while preserving file order, and assign
+unique, increasing `update_index` values for each log line. Newer log
+records get higher `update_index` values.
+
+Although an import may write only a single reftable file, the reftable
+file must span many unique `update_index`, as each log line requires its
+own `update_index` to preserve semantics.
+
+Log index
+^^^^^^^^^
+
+The log index stores the log key
+(`refname \0 reverse_int64(update_index)`) for the last log record of
+every log block in the file, supporting bounded-time lookup.
+
+A log index block must be written if 2 or more log blocks are written to
+the file. If present, the log index appears after the last log block.
+There is no padding used to align the log index to block alignment.
+
+Log index format is identical to ref index, except the keys are 9 bytes
+longer to include `'\0'` and the 8-byte `reverse_int64(update_index)`.
+Records use `block_position` to refer to the start of a log block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the log index must first read the footer (below) to
+obtain `log_index_position`. If not present, the position will be 0.
+
+Footer
+^^^^^^
+
+After the last block of the file, a file footer is written. It begins
+like the file header, but is extended with additional data.
+
+A 68-byte footer appears at the end:
+
+....
+    'REFT'
+    uint8( version_number = 1 )
+    uint24( block_size )
+    uint64( min_update_index )
+    uint64( max_update_index )
+
+    uint64( ref_index_position )
+    uint64( (obj_position << 5) | obj_id_len )
+    uint64( obj_index_position )
+
+    uint64( log_position )
+    uint64( log_index_position )
+
+    uint32( CRC-32 of above )
+....
+
+If a section is missing (e.g. ref index) the corresponding position
+field (e.g. `ref_index_position`) will be 0.
+
+* `obj_position`: byte position for the first obj block.
+* `obj_id_len`: number of bytes used to abbreviate object identifiers in
+obj blocks.
+* `log_position`: byte position for the first log block.
+* `ref_index_position`: byte position for the start of the ref index.
+* `obj_index_position`: byte position for the start of the obj index.
+* `log_index_position`: byte position for the start of the log index.
+
+Reading the footer
+++++++++++++++++++
+
+Readers must seek to `file_length - 68` to access the footer. A trusted
+external source (such as `stat(2)`) is necessary to obtain
+`file_length`. When reading the footer, readers must verify:
+
+* 4-byte magic is correct
+* 1-byte version number is recognized
+* 4-byte CRC-32 matches the other 64 bytes (including magic, and
+version)
+
+Once verified, the other fields of the footer can be accessed.
+
+Varint encoding
+^^^^^^^^^^^^^^^
+
+Varint encoding is identical to the ofs-delta encoding method used
+within pack files.
+
+Decoder works such as:
+
+....
+val = buf[ptr] & 0x7f
+while (buf[ptr] & 0x80) {
+  ptr++
+  val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
+}
+....
+
+Binary search
+^^^^^^^^^^^^^
+
+Binary search within a block is supported by the `restart_offset` fields
+at the end of the block. Readers can binary search through the restart
+table to locate between which two restart points the sought reference or
+key should appear.
+
+Each record identified by a `restart_offset` stores the complete key in
+the `suffix` field of the record, making the compare operation during
+binary search straightforward.
+
+Once a restart point lexicographically before the sought reference has
+been identified, readers can linearly scan through the following record
+entries to locate the sought record, terminating if the current record
+sorts after (and therefore the sought key is not present).
+
+Restart point selection
++++++++++++++++++++++++
+
+Writers determine the restart points at file creation. The process is
+arbitrary, but every 16 or 64 records is recommended. Every 16 may be
+more suitable for smaller block sizes (4k or 8k), every 64 for larger
+block sizes (64k).
+
+More frequent restart points reduces prefix compression and increases
+space consumed by the restart table, both of which increase file size.
+
+Less frequent restart points makes prefix compression more effective,
+decreasing overall file size, with increased penalities for readers
+walking through more records after the binary search step.
+
+A maximum of `65535` restart points per block is supported.
+
+Considerations
+~~~~~~~~~~~~~~
+
+Lightweight refs dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format assumes the vast majority of references are single
+SHA-1 valued with common prefixes, such as Gerrit Code Review’s
+`refs/changes/` namespace, GitHub’s `refs/pulls/` namespace, or many
+lightweight tags in the `refs/tags/` namespace.
+
+Annotated tags storing the peeled object cost an additional 20 bytes per
+reference.
+
+Low overhead
+^^^^^^^^^^^^
+
+A reftable with very few references (e.g. git.git with 5 heads) is 269
+bytes for reftable, vs. 332 bytes for packed-refs. This supports
+reftable scaling down for transaction logs (below).
+
+Block size
+^^^^^^^^^^
+
+For a Gerrit Code Review type repository with many change refs, larger
+block sizes (64 KiB) and less frequent restart points (every 64) yield
+better compression due to more references within the block compressing
+against the prior reference.
+
+Larger block sizes reduce the index size, as the reftable will require
+fewer blocks to store the same number of references.
+
+Minimal disk seeks
+^^^^^^^^^^^^^^^^^^
+
+Assuming the index block has been loaded into memory, binary searching
+for any single reference requires exactly 1 disk seek to load the
+containing block.
+
+Scans and lookups dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Scanning all references and lookup by name (or namespace such as
+`refs/heads/`) are the most common activities performed on repositories.
+SHA-1s are stored directly with references to optimize this use case.
+
+Logs are infrequently read
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are infrequently accessed, but can be large. Deflating log blocks
+saves disk space, with some increased penalty at read time.
+
+Logs are stored in an isolated section from refs, reducing the burden on
+reference readers that want to ignore logs. Further, historical logs can
+be isolated into log-only files.
+
+Logs are read backwards
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are frequently accessed backwards (most recent N records for master
+to answer `master@{4}`), so log records are grouped by reference, and
+sorted descending by update index.
+
+Repository format
+~~~~~~~~~~~~~~~~~
+
+Version 1
+^^^^^^^^^
+
+A repository must set its `$GIT_DIR/config` to configure reftable:
+
+....
+[core]
+    repositoryformatversion = 1
+[extensions]
+    refStorage = reftable
+....
+
+Layout
+^^^^^^
+
+A collection of reftable files are stored in the `$GIT_DIR/reftable/`
+directory:
+
+....
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+where reftable files are named by a unique name such as produced by the
+function `${min_update_index}-${max_update_index}.ref`.
+
+Log-only files use the `.log` extension, while ref-only and mixed ref
+and log files use `.ref`. extension.
+
+The stack ordering file is `$GIT_DIR/reftable/tables.list` and lists the
+current files, one per line, in order, from oldest (base) to newest
+(most recent):
+
+....
+$ cat .git/reftable/tables.list
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+Readers must read `$GIT_DIR/reftable/tables.list` to determine which
+files are relevant right now, and search through the stack in reverse
+order (last reftable is examined first).
+
+Reftable files not listed in `tables.list` may be new (and about to be
+added to the stack by the active writer), or ancient and ready to be
+pruned.
+
+Backward compatibility
+^^^^^^^^^^^^^^^^^^^^^^
+
+Older clients should continue to recognize the directory as a git
+repository so they don’t look for an enclosing repository in parent
+directories. To this end, a reftable-enabled repository must contain the
+following dummy files
+
+* `.git/HEAD`, a regular file containing `ref: refs/heads/.invalid`.
+* `.git/refs/`, a directory
+* `.git/refs/heads`, a regular file
+
+Readers
+^^^^^^^
+
+Readers can obtain a consistent snapshot of the reference space by
+following:
+
+1.  Open and read the `tables.list` file.
+2.  Open each of the reftable files that it mentions.
+3.  If any of the files is missing, goto 1.
+4.  Read from the now-open files as long as necessary.
+
+Update transactions
+^^^^^^^^^^^^^^^^^^^
+
+Although reftables are immutable, mutations are supported by writing a
+new reftable and atomically appending it to the stack:
+
+1.  Acquire `tables.list.lock`.
+2.  Read `tables.list` to determine current reftables.
+3.  Select `update_index` to be most recent file’s
+`max_update_index + 1`.
+4.  Prepare temp reftable `tmp_XXXXXX`, including log entries.
+5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
+6.  Copy `tables.list` to `tables.list.lock`, appending file from (5).
+7.  Rename `tables.list.lock` to `tables.list`.
+
+During step 4 the new file’s `min_update_index` and `max_update_index`
+are both set to the `update_index` selected by step 3. All log records
+for the transaction use the same `update_index` in their keys. This
+enables later correlation of which references were updated by the same
+transaction.
+
+Because a single `tables.list.lock` file is used to manage locking, the
+repository is single-threaded for writers. Writers may have to busy-spin
+(with backoff) around creating `tables.list.lock`, for up to an
+acceptable wait period, aborting if the repository is too busy to
+mutate. Application servers wrapped around repositories (e.g. Gerrit
+Code Review) can layer their own lock/wait queue to improve fairness to
+writers.
+
+Reference deletions
+^^^^^^^^^^^^^^^^^^^
+
+Deletion of any reference can be explicitly stored by setting the `type`
+to `0x0` and omitting the `value` field of the `ref_record`. This serves
+as a tombstone, overriding any assertions about the existence of the
+reference from earlier files in the stack.
+
+Compaction
+^^^^^^^^^^
+
+A partial stack of reftables can be compacted by merging references
+using a straightforward merge join across reftables, selecting the most
+recent value for output, and omitting deleted references that do not
+appear in remaining, lower reftables.
+
+A compacted reftable should set its `min_update_index` to the smallest
+of the input files’ `min_update_index`, and its `max_update_index`
+likewise to the largest input `max_update_index`.
+
+For sake of illustration, assume the stack currently consists of
+reftable files (from oldest to newest): A, B, C, and D. The compactor is
+going to compact B and C, leaving A and D alone.
+
+1.  Obtain lock `tables.list.lock` and read the `tables.list` file.
+2.  Obtain locks `B.lock` and `C.lock`. Ownership of these locks
+prevents other processes from trying to compact these files.
+3.  Release `tables.list.lock`.
+4.  Compact `B` and `C` into a temp file
+`${min_update_index}-${max_update_index}_XXXXXX`.
+5.  Reacquire lock `tables.list.lock`.
+6.  Verify that `B` and `C` are still in the stack, in that order. This
+should always be the case, assuming that other processes are adhering to
+the locking protocol.
+7.  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
+`${min_update_index}-${max_update_index}.ref`.
+8.  Write the new stack to `tables.list.lock`, replacing `B` and `C`
+with the file from (4).
+9.  Rename `tables.list.lock` to `tables.list`.
+10. Delete `B` and `C`, perhaps after a short sleep to avoid forcing
+readers to backtrack.
+
+This strategy permits compactions to proceed independently of updates.
+
+Each reftable (compacted or not) is uniquely identified by its name, so
+open reftables can be cached by their name.
+
+Alternatives considered
+~~~~~~~~~~~~~~~~~~~~~~~
+
+bzip packed-refs
+^^^^^^^^^^^^^^^^
+
+`bzip2` can significantly shrink a large packed-refs file (e.g. 62 MiB
+compresses to 23 MiB, 37%). However the bzip format does not support
+random access to a single reference. Readers must inflate and discard
+while performing a linear scan.
+
+Breaking packed-refs into chunks (individually compressing each chunk)
+would reduce the amount of data a reader must inflate, but still leaves
+the problem of indexing chunks to support readers efficiently locating
+the correct chunk.
+
+Given the compression achieved by reftable’s encoding, it does not seem
+necessary to add the complexity of bzip/gzip/zlib.
+
+Michael Haggerty’s alternate format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Michael Haggerty proposed
+https://public-inbox.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@mail.gmail.com/[an
+alternate] format to reftable on the Git mailing list. This format uses
+smaller chunks, without the restart table, and avoids block alignment
+with padding. Reflog entries immediately follow each ref, and are thus
+interleaved between refs.
+
+Performance testing indicates reftable is faster for lookups (51%
+faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
+larger file (+ ~3.2%, 28.3M vs 29.2M):
+
+[cols=">,>,>,>",options="header",]
+|=====================================
+|format |size |seek cold |seek hot
+|mh-alt |28.3 M |23.4 usec |11.2 usec
+|reftable |29.2 M |19.9 usec |5.4 usec
+|=====================================
+
+JGit Ketch RefTree
+^^^^^^^^^^^^^^^^^^
+
+https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg03073.html[JGit Ketch]
+proposed
+https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/[RefTree],
+an encoding of references inside Git tree objects stored as part of the
+repository’s object database.
+
+The RefTree format adds additional load on the object database storage
+layer (more loose objects, more objects in packs), and relies heavily on
+the packer’s delta compression to save space. Namespaces which are flat
+(e.g. thousands of tags in refs/tags) initially create very large loose
+objects, and so RefTree does not address the problem of copying many
+references to modify a handful.
+
+Flat namespaces are not efficiently searchable in RefTree, as tree
+objects in canonical formatting cannot be binary searched. This fails
+the need to handle a large number of references in a single namespace,
+such as GitHub’s `refs/pulls`, or a project with many tags.
+
+LMDB
+^^^^
+
+David Turner proposed
+https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/[using
+LMDB], as LMDB is lightweight (64k of runtime code) and GPL-compatible
+license.
+
+A downside of LMDB is its reliance on a single C implementation. This
+makes embedding inside JGit (a popular reimplemenation of Git)
+difficult, and hoisting onto virtual storage (for JGit DFS) virtually
+impossible.
+
+A common format that can be supported by all major Git implementations
+(git-core, JGit, libgit2) is strongly preferred.
+
+Future
+~~~~~~
+
+Longer hashes
+^^^^^^^^^^^^^
+
+Version will bump (e.g. 2) to indicate `value` uses a different object
+id length other than 20. The length could be stored in an expanded file
+header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v10 07/12] reftable: define version 2 of the spec to accomodate SHA256
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
                                     ` (5 preceding siblings ...)
  2020-04-27 20:13                   ` [PATCH v10 06/12] reftable: file format documentation Jonathan Nieder via GitGitGadget
@ 2020-04-27 20:13                   ` Han-Wen Nienhuys via GitGitGadget
  2020-04-27 20:13                   ` [PATCH v10 08/12] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
                                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 50 ++++++++++++++++------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index 9fa4657d9ff..ee3f36ea851 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -193,8 +193,8 @@ and non-aligned files.
 Very small files (e.g. 1 only ref block) may omit `padding` and the ref
 index to reduce total file size.
 
-Header
-^^^^^^
+Header (version 1)
+^^^^^^^^^^^^^^^^^^
 
 A 24-byte header appears at the beginning of the file:
 
@@ -215,6 +215,24 @@ used in a stack for link:#Update-transactions[transactions], these
 fields can order the files such that the prior file’s
 `max_update_index + 1` is the next file’s `min_update_index`.
 
+Header (version 2)
+^^^^^^^^^^^^^^^^^^
+
+A 28-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+uint32( hash_id )
+....
+
+The header is identical to `version_number=1`, with the 4-byte hash ID
+("sha1" for SHA1 and "s256" for SHA-256) append to the header.
+
+
 First ref block
 ^^^^^^^^^^^^^^^
 
@@ -671,14 +689,8 @@ Footer
 After the last block of the file, a file footer is written. It begins
 like the file header, but is extended with additional data.
 
-A 68-byte footer appears at the end:
-
 ....
-    'REFT'
-    uint8( version_number = 1 )
-    uint24( block_size )
-    uint64( min_update_index )
-    uint64( max_update_index )
+    HEADER
 
     uint64( ref_index_position )
     uint64( (obj_position << 5) | obj_id_len )
@@ -701,12 +713,16 @@ obj blocks.
 * `obj_index_position`: byte position for the start of the obj index.
 * `log_index_position`: byte position for the start of the log index.
 
+The size of the footer is 68 bytes for version 1, and 72 bytes for
+version 2.
+
 Reading the footer
 ++++++++++++++++++
 
-Readers must seek to `file_length - 68` to access the footer. A trusted
-external source (such as `stat(2)`) is necessary to obtain
-`file_length`. When reading the footer, readers must verify:
+Readers must first read the file start to determine the version
+number. Then they seek to `file_length - FOOTER_LENGTH` to access the
+footer. A trusted external source (such as `stat(2)`) is necessary to
+obtain `file_length`. When reading the footer, readers must verify:
 
 * 4-byte magic is correct
 * 1-byte version number is recognized
@@ -1055,13 +1071,3 @@ impossible.
 
 A common format that can be supported by all major Git implementations
 (git-core, JGit, libgit2) is strongly preferred.
-
-Future
-~~~~~~
-
-Longer hashes
-^^^^^^^^^^^^^
-
-Version will bump (e.g. 2) to indicate `value` uses a different object
-id length other than 20. The length could be stored in an expanded file
-header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v10 08/12] reftable: clarify how empty tables should be written
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
                                     ` (6 preceding siblings ...)
  2020-04-27 20:13                   ` [PATCH v10 07/12] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
@ 2020-04-27 20:13                   ` Han-Wen Nienhuys via GitGitGadget
  2020-04-27 20:13                   ` [PATCH v10 09/12] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The format allows for some ambiguity, as a lone footer also starts
with a valid file header. However, the current JGit code will barf on
this. This commit codifies this behavior into the standard.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index ee3f36ea851..df5cb5f11b8 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -731,6 +731,13 @@ version)
 
 Once verified, the other fields of the footer can be accessed.
 
+Empty tables
+++++++++++++
+
+A reftable may be empty. In this case, the file starts with a header
+and is immediately followed by a footer.
+
+
 Varint encoding
 ^^^^^^^^^^^^^^^
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v10 09/12] Add reftable library
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
                                     ` (7 preceding siblings ...)
  2020-04-27 20:13                   ` [PATCH v10 08/12] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
@ 2020-04-27 20:13                   ` Han-Wen Nienhuys via GitGitGadget
  2020-04-28 14:55                     ` Danh Doan
  2020-04-27 20:13                   ` [PATCH v10 10/12] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                     ` (3 subsequent siblings)
  12 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. Its rationale and
specification is in the preceding commit.

This imports the upstream library as one big commit. For understanding
the code, it is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* record.{c,h} - reading and writing records

* block.{c,h} - reading and writing blocks.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 ++
 reftable/README.md     |   11 +
 reftable/VERSION       |    5 +
 reftable/basics.c      |  209 ++++++++
 reftable/basics.h      |   53 ++
 reftable/block.c       |  425 +++++++++++++++
 reftable/block.h       |  124 +++++
 reftable/blocksource.h |   22 +
 reftable/bytes.c       |    0
 reftable/config.h      |    1 +
 reftable/constants.h   |   21 +
 reftable/dump.c        |   97 ++++
 reftable/file.c        |   98 ++++
 reftable/iter.c        |  234 ++++++++
 reftable/iter.h        |   60 +++
 reftable/merged.c      |  327 ++++++++++++
 reftable/merged.h      |   36 ++
 reftable/pq.c          |  115 ++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  754 ++++++++++++++++++++++++++
 reftable/reader.h      |   68 +++
 reftable/record.c      | 1126 +++++++++++++++++++++++++++++++++++++++
 reftable/record.h      |  121 +++++
 reftable/reftable.h    |  527 ++++++++++++++++++
 reftable/slice.c       |  224 ++++++++
 reftable/slice.h       |   76 +++
 reftable/stack.c       | 1151 ++++++++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   44 ++
 reftable/system.h      |   53 ++
 reftable/tree.c        |   67 +++
 reftable/tree.h        |   34 ++
 reftable/update.sh     |   23 +
 reftable/writer.c      |  661 +++++++++++++++++++++++
 reftable/writer.h      |   60 +++
 reftable/zlib-compat.c |   92 ++++
 35 files changed, 6984 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/bytes.c
 create mode 100644 reftable/config.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..21500563fd4
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,11 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   sh reftable/update.sh
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..5d6fb5d9130
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,5 @@
+commit a6d6b31bea256044621d789df64a8159647f21bc
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Mon Apr 27 20:46:21 2020 +0200
+
+    C: prefix error codes with REFTABLE_
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..87fc235850e
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,209 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)((i)&0xff);
+}
+
+uint32_t get_be24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)((i)&0xff);
+}
+
+int binsearch(int sz, int (*f)(int k, void *args), void *args)
+{
+	int lo = 0;
+	int hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		int mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return "I/O error";
+	case REFTABLE_FORMAT_ERROR:
+		return "corrupt reftable file";
+	case REFTABLE_NOT_EXIST_ERROR:
+		return "file does not exist";
+	case REFTABLE_LOCK_ERROR:
+		return "data is outdated";
+	case REFTABLE_API_ERROR:
+		return "misuse of the reftable API";
+	case REFTABLE_ZLIB_ERROR:
+		return "zlib failure";
+	case -1:
+		return "general error";
+	default:
+		return "unknown error code";
+	}
+}
+
+int reftable_error_to_errno(int err) {
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return EIO;
+	case REFTABLE_FORMAT_ERROR:
+		return EFAULT;
+	case REFTABLE_NOT_EXIST_ERROR:
+		return ENOENT;
+	case REFTABLE_LOCK_ERROR:
+		return EBUSY;
+	case REFTABLE_API_ERROR:
+		return EINVAL;
+	case REFTABLE_ZLIB_ERROR:
+		return EDOM;
+	default:
+		return ERANGE;
+	}
+}
+
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..6d89eb5d931
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+/* Bigendian en/decoding of integers */
+
+void put_be24(byte *out, uint32_t i);
+uint32_t get_be24(byte *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(int sz, int (*f)(int k, void *args), void *args);
+
+/*
+  Frees a NULL terminated array of malloced strings. The array itself is also
+  freed.
+ */
+void free_names(char **a);
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp);
+
+/* compares two NULL-terminated arrays of strings. */
+int names_equal(char **a, char **b);
+
+/* returns the array size of a NULL-terminated array of strings. */
+int names_length(char **names);
+
+/* Allocation routines; they invoke the functions set through
+ * reftable_set_alloc() */
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..4b4344fcd87
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,425 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "blocksource.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct record rec)
+{
+	struct slice empty = { 0 };
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = { 0 };
+	int n = 0;
+
+	record_key(rec, &key);
+	n = encode_key(&restart, out, last, key, record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	n = record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	slice_clear(&key);
+	return 0;
+
+err:
+	slice_clear(&key);
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = { 0 };
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		slice_resize(&compressed, src_len);
+
+		while (1) {
+			uLongf dest_len = compressed.len;
+
+			zresult = compress2(compressed.buf, &dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				slice_resize(&compressed, 2 * compressed.len);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				slice_clear(&compressed);
+				return REFTABLE_ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed.buf,
+			       dest_len);
+			w->next = dest_len + block_header_skip;
+			break;
+		}
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct reftable_block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	if (!is_block_type(typ)) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = { 0 };
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip;
+		uLongf src_len = block->len - block_header_skip;
+
+		slice_resize(&uncompressed, sz);
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			slice_clear(&uncompressed);
+			return REFTABLE_ZLIB_ERROR;
+		}
+
+		block_source_return_block(block->source, block);
+		block->data = uncompressed.buf;
+		block->len = dst_len; /* XXX: 4 bytes missing? */
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_be16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct slice key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(int idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = { 0 };
+	struct slice last_key = { 0 };
+	byte unused_extra;
+	int n = decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_compare(a->key, rkey);
+		slice_clear(&rkey);
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct record rec)
+{
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	{
+		struct slice in = {
+			.buf = it->br->block.data + it->next_off,
+			.len = it->br->block_len - it->next_off,
+		};
+		struct slice start = in;
+		struct slice key = { 0 };
+		byte extra;
+		int n = decode_key(&key, &extra, it->last_key, in);
+		if (n < 0) {
+			return -1;
+		}
+
+		slice_consume(&in, n);
+		n = record_decode(rec, key, extra, in, it->br->hash_size);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+
+		slice_copy(&it->last_key, key);
+		it->next_off += start.len - in.len;
+		slice_clear(&key);
+		return 0;
+	}
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = { 0 };
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	slice_clear(&it->last_key);
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		return -1;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	{
+		struct record rec = new_record(block_reader_type(br));
+		struct slice key = { 0 };
+		int result = 0;
+		int err = 0;
+		struct block_iter next = { 0 };
+		while (true) {
+			block_iter_copy_from(&next, it);
+
+			err = block_iter_next(&next, rec);
+			if (err < 0) {
+				result = -1;
+				goto exit;
+			}
+
+			record_key(rec, &key);
+			if (err > 0 || slice_compare(key, want) >= 0) {
+				result = 0;
+				goto exit;
+			}
+
+			block_iter_copy_from(it, &next);
+		}
+
+	exit:
+		slice_clear(&key);
+		slice_clear(&next.last_key);
+		record_destroy(&rec);
+
+		return result;
+	}
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	slice_clear(&bw->last_key);
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..62b2e0fec6d
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,124 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+/*
+  Writes reftable blocks. The block_writer is reused across blocks to minimize
+  allocation overhead.
+*/
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+
+	/* Offset ofof the global header. Nonzero in the first block only. */
+	uint32_t header_off;
+
+	/* How often to restart keys. */
+	int restart_interval;
+	int hash_size;
+
+	/* Offset of next byte to write. */
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+
+	struct slice last_key;
+	int entries;
+};
+
+/*
+  initializes the blockwriter to write `typ` entries, using `buf` as temporary
+  storage. `buf` is not owned by the block_writer. */
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+
+/*
+  returns the block type (eg. 'r' for ref records.
+*/
+byte block_writer_type(struct block_writer *bw);
+
+/* appends the record, or -1 if it doesn't fit. */
+int block_writer_add(struct block_writer *w, struct record rec);
+
+/* appends the key restarts, and compress the block if necessary. */
+int block_writer_finish(struct block_writer *w);
+
+/* clears out internally allocated block_writer members. */
+void block_writer_clear(struct block_writer *bw);
+
+/* Read a block. */
+struct block_reader {
+	/* offset of the block header; nonzero for the first block in a
+	 * reftable. */
+	uint32_t header_off;
+
+	/* the memory block */
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint16_t restart_count;
+
+	/* size of the data in the file. For log blocks, this is the compressed
+	 * size. */
+	uint32_t full_block_size;
+};
+
+/* Iterate over entries in a block */
+struct block_iter {
+	/* offset within the block of the next entry to read. */
+	uint32_t next_off;
+	struct block_reader *br;
+
+	/* key for last entry we read. */
+	struct slice last_key;
+};
+
+/* initializes a block reader */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Position `it` at start of the block */
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+
+/* Position `it` to the `want` key in the block */
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+
+/* Returns the block type (eg. 'r' for refs) */
+byte block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+int block_iter_next(struct block_iter *it, struct record rec);
+
+/* Seek to `want` with in the block pointed to by `it` */
+int block_iter_seek(struct block_iter *it, struct slice want);
+
+/* deallocate memory for `it`. The block reader and its block is left intact. */
+void block_iter_close(struct block_iter *it);
+
+/* size of file header, depending on format version */
+int header_size(int version);
+
+/* size of file footer, depending on format version */
+int footer_size(int version);
+
+#endif
diff --git a/reftable/blocksource.h b/reftable/blocksource.h
new file mode 100644
index 00000000000..1fdf0bf4557
--- /dev/null
+++ b/reftable/blocksource.h
@@ -0,0 +1,22 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCKSOURCE_H
+#define BLOCKSOURCE_H
+
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source source);
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_return_block(struct reftable_block_source source,
+			       struct reftable_block *ret);
+void block_source_close(struct reftable_block_source source);
+
+#endif
diff --git a/reftable/bytes.c b/reftable/bytes.c
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/reftable/config.h b/reftable/config.h
new file mode 100644
index 00000000000..40a8c178f10
--- /dev/null
+++ b/reftable/config.h
@@ -0,0 +1 @@
+/* empty */
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..5eee72c4c11
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 00000000000..c0065792a4f
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "reftable.h"
+
+static int dump_table(const char *tablename)
+{
+	struct block_source src = { 0 };
+	int err = block_source_from_file(&src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	struct reader *r = NULL;
+	err = new_reader(&r, src, tablename);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct iterator it = { 0 };
+		err = reader_seek_ref(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+
+		struct ref_record ref = { 0 };
+		while (1) {
+			err = iterator_next_ref(it, &ref);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			ref_record_print(&ref, 20);
+		}
+		iterator_destroy(&it);
+		ref_record_clear(&ref);
+	}
+
+	{
+		struct iterator it = { 0 };
+		err = reader_seek_log(r, &it, "");
+		if (err < 0) {
+			return err;
+		}
+		struct log_record log = { 0 };
+		while (1) {
+			err = iterator_next_log(it, &log);
+			if (err > 0) {
+				break;
+			}
+			if (err < 0) {
+				return err;
+			}
+			log_record_print(&log, 20);
+		}
+		iterator_destroy(&it);
+		log_record_clear(&log);
+	}
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int opt;
+	const char *table = NULL;
+	while ((opt = getopt(argc, argv, "t:")) != -1) {
+		switch (opt) {
+		case 't':
+			table = xstrdup(optarg);
+			break;
+		case '?':
+			printf("usage: %s [-table tablefile]\n", argv[0]);
+			return 2;
+			break;
+		}
+	}
+
+	if (table != NULL) {
+		int err = dump_table(table);
+		if (err < 0) {
+			fprintf(stderr, "%s: %s: %s\n", argv[0], table,
+				error_str(err));
+			return 1;
+		}
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..f29448ee4eb
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,98 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return REFTABLE_NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	{
+		struct file_block_source *p =
+			reftable_calloc(sizeof(struct file_block_source));
+		p->size = st.st_size;
+		p->fd = fd;
+
+		bs->ops = &file_vtable;
+		bs->arg = p;
+	}
+	return 0;
+}
+
+int reftable_fd_write(void *arg, byte *data, int sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..747f3eae256
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,234 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator it)
+{
+	return it.ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct record rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator it, struct record rec)
+{
+	return it.ops->next(it.iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator it,
+			       struct reftable_ref_record *ref)
+{
+	struct record rec = { 0 };
+	record_from_ref(&rec, ref);
+	return iterator_next(it, rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator it,
+			       struct reftable_log_record *log)
+{
+	struct record rec = { 0 };
+	record_from_log(&rec, log);
+	return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	slice_clear(&fri->oid);
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec.data;
+
+	while (true) {
+		int err = reftable_iterator_next_ref(fri->it, ref);
+		if (err != 0) {
+			return err;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			int err = reftable_reader_seek_ref(fri->r, &it,
+							   ref->ref_name);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				return err;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reader_return_block(it->r, &it->block_reader.block);
+	slice_clear(&it->oid);
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reader_return_block(it->r, &it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return REFTABLE_FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec.data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..408b7f41a74
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct record rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator it, struct record rec);
+bool iterator_is_null(struct reftable_iterator it);
+
+/* iterator that produces only ref records that point to `oid` */
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_reader *r;
+	struct slice oid;
+	struct reftable_iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+/* iterator that produces only ref records that point to `oid`,
+   but using the object index.
+ */
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..929ccc621db
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,327 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct record rec = new_record(mi->typ);
+		int err = iterator_next(mi->stack[i], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			record_destroy(&rec);
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, int idx)
+{
+	if (iterator_is_null(mi->stack[idx])) {
+		return 0;
+	}
+
+	{
+		struct record rec = new_record(mi->typ);
+		struct pq_entry e = {
+			.rec = rec,
+			.index = idx,
+		};
+		int err = iterator_next(mi->stack[idx], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[idx]);
+			record_destroy(&rec);
+			return 0;
+		}
+
+		merged_iter_pqueue_add(&mi->pq, e);
+	}
+	return 0;
+}
+
+static int merged_iter_next_entry(struct merged_iter *mi, struct record rec)
+{
+	struct slice entry_key = { 0 };
+	struct pq_entry entry = { 0 };
+	int err = 0;
+
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	entry = merged_iter_pqueue_remove(&mi->pq);
+	err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+	/*
+	  One can also use reftable as datacenter-local storage, where the ref
+	  database is maintained in globally consistent database (eg.
+	  CockroachDB or Spanner). In this scenario, replication delays together
+	  with compaction may cause newer tables to contain older entries. In
+	  such a deployment, the loop below must be changed to collect all
+	  entries for the same key, and return new the newest one.
+	*/
+	record_key(entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = { 0 };
+		int err = 0, cmp = 0;
+
+		record_key(top.rec, &k);
+
+		cmp = slice_compare(k, entry_key);
+		slice_clear(&k);
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		record_clear(top.rec);
+		reftable_free(record_yield(&top.rec));
+	}
+
+	record_copy_from(rec, entry.rec, hash_size(mi->hash_id));
+	record_clear(entry.rec);
+	reftable_free(record_yield(&entry.rec));
+	slice_clear(&entry_key);
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec)
+{
+	while (true) {
+		int err = merged_iter_next_entry(mi, rec);
+		if (err == 0 && mi->suppress_deletions &&
+		    record_is_deletion(rec)) {
+			continue;
+		}
+
+		return err;
+	}
+}
+
+static int merged_iter_next_void(void *p, struct record rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id)
+{
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reftable_reader *r = stack[i];
+		if (r->hash_id != hash_id) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reftable_reader_min_update_index(r);
+		}
+
+		last_max = reftable_reader_max_update_index(r);
+	}
+
+	{
+		struct reftable_merged_table m = {
+			.stack = stack,
+			.stack_len = n,
+			.min = first_min,
+			.max = last_max,
+			.hash_id = hash_id,
+		};
+
+		*dest = reftable_calloc(sizeof(struct reftable_merged_table));
+		**dest = m;
+	}
+	return 0;
+}
+
+void reftable_merged_table_close(struct reftable_merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reftable_reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+static int merged_table_seek_record(struct reftable_merged_table *mt,
+				    struct reftable_iterator *it,
+				    struct record rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = record_type(rec),
+		.hash_id = mt->hash_id,
+		.suppress_deletions = mt->suppress_deletions,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n;
+	err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..d6e1a72733e
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,36 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_reader **stack;
+	int stack_len;
+	uint32_t hash_id;
+	bool suppress_deletions;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	int stack_len;
+	byte typ;
+	bool suppress_deletions;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..bfad96c6695
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,115 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = { 0 };
+	struct slice bk = { 0 };
+	int cmp = 0;
+	record_key(a.rec, &ak);
+	record_key(b.rec, &bk);
+
+	cmp = slice_compare(ak, bk);
+
+	slice_clear(&ak);
+	slice_clear(&bk);
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		record_clear(pq->heap[i].rec);
+		reftable_free(record_yield(&pq->heap[i].rec));
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..b585a52ee14
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..d91b089f703
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,754 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source source)
+{
+	return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source.ops->read_block(source.arg, dest, off, size);
+	dest->source = source;
+	return result;
+}
+
+void block_source_return_block(struct reftable_block_source source,
+			       struct reftable_block *blockp)
+{
+	source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(r->source, dest, off, sz);
+}
+
+void reader_return_block(struct reftable_reader *r, struct reftable_block *p)
+{
+	block_source_return_block(r->source, p);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	int err = 0;
+	if (memcmp(f, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = REFTABLE_FORMAT_ERROR;
+			goto exit;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	{
+		uint32_t computed_crc = crc32(0, footer, f - footer);
+		uint32_t file_crc = get_be32(f);
+		f += 4;
+		if (computed_crc != file_crc) {
+			err = REFTABLE_FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	{
+		byte first_block_typ = header[header_size(r->version)];
+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+		r->ref_offsets.offset = 0;
+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+					  r->log_offsets.offset > 0);
+		r->obj_offsets.present = r->obj_offsets.offset > 0;
+	}
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	block_source_return_block(r->source, &footer);
+	block_source_return_block(r->source, &header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec.data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reader_return_block(ti->r, &ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reader_return_block(r, &block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reader_return_block(r, &block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct block_reader *brp =
+			reftable_malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec)
+{
+	if (record_type(rec) != ti->typ) {
+		return REFTABLE_API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = { 0 };
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct record rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off, byte typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = reftable_malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			byte typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct record want)
+{
+	struct record rec = new_record(record_type(want));
+	struct slice want_key = { 0 };
+	struct slice got_key = { 0 };
+	struct table_iter next = { 0 };
+	int err = -1;
+	record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_compare(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	record_destroy(&rec);
+	slice_clear(&want_key);
+	slice_clear(&got_key);
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it, struct record rec)
+{
+	struct index_record want_index = { 0 };
+	struct record want_index_rec = { 0 };
+	struct index_record index_result = { 0 };
+	struct record index_result_rec = { 0 };
+	struct table_iter index_iter = { 0 };
+	struct table_iter next = { 0 };
+	int err = 0;
+
+	record_key(rec, &want_index.last_key);
+	record_from_index(&want_index_rec, &want_index);
+	record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = REFTABLE_FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	record_clear(want_index_rec);
+	record_clear(index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it, struct record rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = { 0 };
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct record rec)
+{
+	byte typ = record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return reader_seek(r, it, rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return reader_seek(r, it, rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    byte *oid)
+{
+	struct obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct obj_record got = { 0 };
+	struct record got_rec = { 0 };
+	int err = 0;
+
+	record_from_obj(&want_rec, &want);
+
+	err = reader_seek(r, &oit, want_rec);
+	if (err != 0) {
+		return err;
+	}
+
+	record_from_obj(&got_rec, &got);
+	err = iterator_next(oit, got_rec);
+	reftable_iterator_destroy(&oit);
+	if (err < 0) {
+		return err;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	{
+		struct indexed_table_ref_iter *itr = NULL;
+		err = new_indexed_table_ref_iter(&itr, r, oid,
+						 hash_size(r->hash_id),
+						 got.offsets, got.offset_len);
+		if (err < 0) {
+			record_clear(got_rec);
+			return err;
+		}
+		got.offsets = NULL;
+		record_clear(got_rec);
+
+		iterator_from_indexed_table_ref_iter(it, itr);
+	}
+
+	return 0;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      byte *oid, int oid_len)
+{
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_calloc(sizeof(struct filtering_ref_iterator));
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	filter->r = r;
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, byte *oid,
+			     int oid_len)
+{
+	if (r->obj_offsets.present) {
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	}
+	return reftable_reader_refs_for_unindexed(r, it, oid, oid_len);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..80d8a5adbaa
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,68 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source source);
+
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_return_block(struct reftable_block_source source,
+			       struct reftable_block *ret);
+void block_source_close(struct reftable_block_source *source);
+
+/* metadata for a block type */
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+/* The state for reading a reftable file. */
+struct reftable_reader {
+	/* for convience, associate a name with the instance. */
+	char *name;
+	struct reftable_block_source source;
+
+	/* Size of the file, excluding the footer. */
+	uint64_t size;
+
+	/* 'sha1' for SHA1, 's256' for SHA-256 */
+	uint32_t hash_id;
+
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	/* Length of the OID keys in the 'o' section */
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct record rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+void reader_return_block(struct reftable_reader *r, struct reftable_block *p);
+
+/* initialize a block reader to read from `r` */
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..b0f18f26c55
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1126 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = { 0 };
+	int i = 9;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	{
+		int n = sizeof(buf) - i - 1;
+		if (dest.len < n) {
+			return -1;
+		}
+		memcpy(dest.buf, &buf[i + 1], n);
+		return n;
+	}
+}
+
+int is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	slice_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	slice_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	slice_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	slice_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static void reftable_ref_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = xstrdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size(hash_id));
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size(hash_id));
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->ref_name);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static byte reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	slice_consume(&in, n);
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct slice dest = { 0 };
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+		seen_target = true;
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+static bool reftable_ref_record_is_deletion_void(const void *p)
+{
+	return reftable_ref_record_is_deletion(
+		(const struct reftable_ref_record *)p);
+}
+
+struct record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = BLOCK_TYPE_REF,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+	.is_deletion = &reftable_ref_record_is_deletion_void,
+};
+
+static void obj_record_key(const void *r, struct slice *dest)
+{
+	const struct obj_record *rec = (const struct obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	const struct obj_record *src = (const struct obj_record *)src_rec;
+
+	*ref = *src;
+	ref->hash_prefix = reftable_malloc(ref->hash_prefix_len);
+	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
+
+	{
+		int olen = ref->offset_len * sizeof(uint64_t);
+		ref->offsets = reftable_malloc(olen);
+		memcpy(ref->offsets, src->offsets, olen);
+	}
+}
+
+static void obj_record_clear(void *rec)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	FREE_AND_NULL(ref->hash_prefix);
+	FREE_AND_NULL(ref->offsets);
+	memset(ref, 0, sizeof(struct obj_record));
+}
+
+static byte obj_record_val_type(const void *rec)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	{
+		uint64_t last = r->offsets[0];
+		int i = 0;
+		for (i = 1; i < r->offset_len; i++) {
+			int n = put_var_int(s, r->offsets[i] - last);
+			if (n < 0) {
+				return -1;
+			}
+			slice_consume(&s, n);
+			last = r->offsets[i];
+		}
+	}
+	return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct obj_record *r = (struct obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		slice_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+	slice_consume(&in, n);
+
+	{
+		uint64_t last = r->offsets[0];
+		int j = 1;
+		while (j < count) {
+			uint64_t delta = 0;
+			int n = get_var_int(&delta, in);
+			if (n < 0) {
+				return n;
+			}
+			slice_consume(&in, n);
+
+			last = r->offsets[j] = (delta + last);
+			j++;
+		}
+	}
+	return start.len - in.len;
+}
+
+static bool not_a_deletion(const void *p)
+{
+	return false;
+}
+
+struct record_vtable obj_record_vtable = {
+	.key = &obj_record_key,
+	.type = BLOCK_TYPE_OBJ,
+	.copy_from = &obj_record_copy_from,
+	.val_type = &obj_record_val_type,
+	.encode = &obj_record_encode,
+	.decode = &obj_record_decode,
+	.clear = &obj_record_clear,
+	.is_deletion = not_a_deletion,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size(hash_id));
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size(hash_id));
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static void reftable_log_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_be64(dest->buf + 1 + len, ts);
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	*dst = *src;
+	if (dst->ref_name != NULL) {
+		dst->ref_name = xstrdup(dst->ref_name);
+	}
+	if (dst->email != NULL) {
+		dst->email = xstrdup(dst->email);
+	}
+	if (dst->name != NULL) {
+		dst->name = xstrdup(dst->name);
+	}
+	if (dst->message != NULL) {
+		dst->message = xstrdup(dst->message);
+	}
+
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->ref_name);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static byte reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static byte zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r)) {
+		return 0;
+	}
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	slice_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_be16(s.buf, r->tz_offset);
+	slice_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = { 0 };
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	slice_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_be16(in.buf);
+	slice_consume(&in, 2);
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	return start.len - in.len;
+
+error:
+	slice_clear(&dest);
+	return REFTABLE_FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+static bool reftable_log_record_is_deletion_void(const void *p)
+{
+	return reftable_log_record_is_deletion(
+		(const struct reftable_log_record *)p);
+}
+
+struct record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = BLOCK_TYPE_LOG,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+	.is_deletion = &reftable_log_record_is_deletion_void,
+};
+
+struct record new_record(byte typ)
+{
+	struct record rec;
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct obj_record *r =
+			reftable_calloc(sizeof(struct obj_record));
+		record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct index_record *r =
+			reftable_calloc(sizeof(struct index_record));
+		record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+void *record_yield(struct record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+void record_destroy(struct record *rec)
+{
+	record_clear(*rec);
+	reftable_free(record_yield(rec));
+}
+
+static void index_record_key(const void *r, struct slice *dest)
+{
+	struct index_record *rec = (struct index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+				   int hash_size)
+{
+	struct index_record *dst = (struct index_record *)rec;
+	struct index_record *src = (struct index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec)
+{
+	struct index_record *idx = (struct index_record *)rec;
+	slice_clear(&idx->last_key);
+}
+
+static byte index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int index_record_encode(const void *rec, struct slice out, int hash_size)
+{
+	const struct index_record *r = (const struct index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+			       struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct index_record *r = (struct index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+	.key = &index_record_key,
+	.type = BLOCK_TYPE_INDEX,
+	.copy_from = &index_record_copy_from,
+	.val_type = &index_record_val_type,
+	.encode = &index_record_encode,
+	.decode = &index_record_decode,
+	.clear = &index_record_clear,
+	.is_deletion = &not_a_deletion,
+};
+
+void record_key(struct record rec, struct slice *dest)
+{
+	rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec)
+{
+	return rec.ops->type;
+}
+
+int record_encode(struct record rec, struct slice dest, int hash_size)
+{
+	return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size)
+{
+	assert(src.ops->type == rec.ops->type);
+
+	rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec)
+{
+	return rec.ops->val_type(rec.data);
+}
+
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size)
+{
+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec)
+{
+	return rec.ops->clear(rec.data);
+}
+
+bool record_is_deletion(struct record rec)
+{
+	return rec.ops->is_deletion(rec.data);
+}
+
+void record_from_ref(struct record *rec, struct reftable_ref_record *ref_rec)
+{
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
+{
+	rec->data = obj_rec;
+	rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec)
+{
+	rec->data = index_rec;
+	rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct reftable_log_record *log_rec)
+{
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+struct reftable_ref_record *record_as_ref(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec.data;
+}
+
+struct reftable_log_record *record_as_log(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_LOG);
+	return (struct reftable_log_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->ref_name,
+		      ((struct reftable_ref_record *)b)->ref_name);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..011d9bc56fc
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,121 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+
+/* Methods for records. */
+struct record_vtable {
+	/* encode the key of to a byte slice. */
+	void (*key)(const void *rec, struct slice *dest);
+
+	/* The record type of ('r' for ref). */
+	byte type;
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	byte (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+
+	/* is this a tombstone? */
+	bool (*is_deletion)(const void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct record {
+	void *data;
+	struct record_vtable *ops;
+};
+
+/* returns true for recognized block types. Block start with the block type. */
+int is_block_type(byte typ);
+
+/* creates a malloced record of the given type. Dispose with record_destroy */
+struct record new_record(byte typ);
+
+extern struct record_vtable reftable_ref_record_vtable;
+
+/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
+   number of bytes written. */
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra);
+
+/* Decode into `key` and `extra` from `in` */
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in);
+
+/* index_record are used internally to speed up lookups. */
+struct index_record {
+	uint64_t offset; /* Offset of block */
+	struct slice last_key; /* Last key of the block. */
+};
+
+/* obj_record stores an object ID => ref mapping. */
+struct obj_record {
+	byte *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size);
+bool record_is_deletion(struct record rec);
+
+/* zeroes out the embedded record */
+void record_clear(struct record rec);
+
+/* clear out the record, yielding the record data that was encapsulated. */
+void *record_yield(struct record *rec);
+
+/* clear and deallocate embedded record, and zero `rec`. */
+void record_destroy(struct record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct reftable_ref_record *refrec);
+void record_from_log(struct record *rec, struct reftable_log_record *logrec);
+struct reftable_ref_record *record_as_ref(struct record ref);
+struct reftable_log_record *record_as_log(struct record ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..318b15af717
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,527 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *ref_name;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	REFTABLE_IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	REFTABLE_FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	REFTABLE_NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	REFTABLE_LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL ref_name.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	REFTABLE_API_ERROR = -6,
+
+	/* Decompression error */
+	REFTABLE_ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	REFTABLE_EMPTY_TABLE_ERROR = -8,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/*
+ * Convert the numeric error code to an equivalent errno code.
+ */
+int reftable_error_to_errno(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, uint8_t *, int),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, uint8_t *data, int size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp.  The name is used for creating a
+ * stack. Typically, it is the basename of the file.
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to oid */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid,
+			     int oid_len);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* closes readers for the merged tables */
+void reftable_merged_table_close(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction; releases the lock if
+ * held. */
+void reftable_addition_close(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* reloads the stack if necessary. */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 00000000000..f3f190c3918
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,224 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+void slice_consume(struct slice *s, int n)
+{
+	s->buf += n;
+	s->len -= n;
+}
+
+byte *slice_yield(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_clear(struct slice *s)
+{
+	reftable_free(slice_yield(s));
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = { 0 };
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_write(struct slice *b, byte *data, int sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = reftable_realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_write_void(void *b, byte *data, int sz)
+{
+	return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf)
+{
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 00000000000..d7e0603d346
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,76 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+/*
+  provides bounds-checked byte ranges.
+  To use, initialize as "slice x = {0};"
+ */
+struct slice {
+	int len;
+	int cap;
+	byte *buf;
+};
+
+void slice_set_string(struct slice *dest, const char *src);
+void slice_append_string(struct slice *dest, const char *src);
+/* Set length to 0, but retain buffer */
+void slice_clear(struct slice *slice);
+
+/* Return a malloced string for `src` */
+char *slice_to_string(struct slice src);
+
+/* Ensure that `buf` is \0 terminated. */
+const char *slice_as_string(struct slice *src);
+
+/* Compare slices */
+bool slice_equal(struct slice a, struct slice b);
+
+/* Return `buf`, clearing out `s` */
+byte *slice_yield(struct slice *s);
+
+/* Copy bytes */
+void slice_copy(struct slice *dest, struct slice src);
+
+/* Advance `buf` by `n`, and decrease length. A copy of the slice
+   should be kept for deallocating the slice. */
+void slice_consume(struct slice *s, int n);
+
+/* Set length of the slice to `l` */
+void slice_resize(struct slice *s, int l);
+
+/* Signed comparison */
+int slice_compare(struct slice a, struct slice b);
+
+/* Append `data` to the `dest` slice.  */
+int slice_write(struct slice *dest, byte *data, int sz);
+
+/* Append `add` to `dest. */
+void slice_append(struct slice *dest, struct slice add);
+
+/* Like slice_write, but suitable for passing to reftable_new_writer
+ */
+int slice_write_void(void *b, byte *data, int sz);
+
+/* Find the longest shared prefix size of `a` and `b` */
+int common_prefix_size(struct slice a, struct slice b);
+
+struct reftable_block_source;
+
+/* Create an in-memory block source for reading reftables */
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..e7b625d924a
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1151 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct slice list_file_name = {};
+	int err = 0;
+
+	if (config.hash_id == 0) {
+		config.hash_id = SHA1_ID;
+	}
+
+	*dest = NULL;
+
+	slice_set_string(&list_file_name, dir);
+	slice_append_string(&list_file_name, "/tables.list");
+
+	p->list_file = slice_to_string(list_file_name);
+	slice_clear(&list_file_name);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload(p);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fd_read_lines(int fd, char ***namesp)
+{
+	off_t size = lseek(fd, 0, SEEK_END);
+	char *buf = NULL;
+	int err = 0;
+	if (size < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+	err = lseek(fd, 0, SEEK_SET);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (read(fd, buf, size) != size) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+
+exit:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	int fd = open(filename, O_RDONLY, 0644);
+	int err = 0;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return REFTABLE_IO_ERROR;
+	}
+	err = fd_read_lines(fd, namesp);
+	close(fd);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged == NULL) {
+		return;
+	}
+
+	reftable_merged_table_close(st->merged);
+	reftable_merged_table_free(st->merged);
+	st->merged = NULL;
+
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_tables =
+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
+	int new_tables_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+
+	struct slice table_path = { 0 };
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_append_string(&table_path, "/");
+			slice_append_string(&table_path, name);
+
+			err = reftable_block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = reftable_new_reader(&rd, src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
+					st->config.hash_id);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	new_merged->suppress_deletions = true;
+	st->merged = new_merged;
+
+	{
+		int i = 0;
+		for (i = 0; i < cur_len; i++) {
+			if (cur[i] != NULL) {
+				reader_close(cur[i]);
+				reftable_reader_free(cur[i]);
+			}
+		}
+	}
+exit:
+	slice_clear(&table_path);
+	{
+		int i = 0;
+		for (i = 0; i < new_tables_len; i++) {
+			reader_close(new_tables[i]);
+		}
+	}
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+					     bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		int err2 = 0;
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != REFTABLE_NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
+		   writer. Check if there was one by checking if the name list
+		   changed.
+		*/
+		err2 = read_lines(st->list_file, &names_after);
+		if (err2 < 0) {
+			free_names(names);
+			return err2;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return err;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 100;
+		usleep(delay);
+	}
+
+	return 0;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	return reftable_stack_reload_maybe_reuse(st, true);
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == REFTABLE_LOCK_ERROR) {
+			// Ignore error return, we want to propagate
+			// REFTABLE_LOCK_ERROR.
+			reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	if (!st->disable_auto_compact) {
+		return reftable_stack_auto_compact(st);
+	}
+
+	return 0;
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct slice lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	slice_set_string(&add->lock_file_name, st->list_file);
+	slice_append_string(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = REFTABLE_LOCK_ERROR;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = REFTABLE_LOCK_ERROR;
+		goto exit;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+exit:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct slice nm = {};
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_set_string(&nm, add->stack->list_file);
+		slice_append_string(&nm, "/");
+		slice_append_string(&nm, add->new_tables[i]);
+		unlink(slice_as_string(&nm));
+
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(slice_as_string(&add->lock_file_name));
+		slice_clear(&add->lock_file_name);
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct slice table_list = { 0 };
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0) {
+		goto exit;
+	}
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		slice_append_string(&table_list,
+				    add->stack->merged->stack[i]->name);
+		slice_append_string(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_append_string(&table_list, add->new_tables[i]);
+		slice_append_string(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	slice_clear(&table_list);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&add->lock_file_name),
+		     add->stack->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+exit:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	*dest = reftable_malloc(sizeof(**dest));
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = { 0 };
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_addition_commit(&add);
+exit:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice tab_file_name = { 0 };
+	struct slice next_name = { 0 };
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+	uint64_t next_update_index = 0;
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, next_update_index, next_update_index);
+
+	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
+	slice_append_string(&temp_tab_file_name, "/");
+	slice_append(&temp_tab_file_name, next_name);
+	slice_append_string(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
+	if (tab_fd < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_writer_close(wr);
+	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto exit;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	if (wr->min_update_index < next_update_index) {
+		err = REFTABLE_API_ERROR;
+		goto exit;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_append_string(&next_name, ".ref");
+
+	slice_set_string(&tab_file_name, add->stack->reftable_dir);
+	slice_append_string(&tab_file_name, "/");
+	slice_append(&tab_file_name, next_name);
+
+	err = rename(slice_as_string(&temp_tab_file_name),
+		     slice_as_string(&tab_file_name));
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = slice_to_string(next_name);
+	add->new_tables_len++;
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_file_name));
+	}
+
+	slice_clear(&temp_tab_file_name);
+	slice_clear(&tab_file_name);
+	slice_clear(&next_name);
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reftable_reader_max_update_index(
+			       st->merged->stack[sz - 1]) +
+		       1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct slice next_name = { 0 };
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->merged->stack[first]),
+		    reftable_reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_append_string(temp_tab, "/");
+	slice_append(temp_tab, next_name);
+	slice_append_string(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = reftable_writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+	reftable_writer_free(wr);
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		slice_clear(temp_tab);
+	}
+	slice_clear(&next_name);
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_reader **subtabs = reftable_calloc(
+		sizeof(struct reftable_reader *) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr,
+				   st->merged->stack[first]->min_update_index,
+				   st->merged->stack[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto exit;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_ref(it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+	}
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_log(it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		/* XXX collect stats? */
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice new_table_name = { 0 };
+	struct slice lock_file_name = { 0 };
+	struct slice ref_list_contents = { 0 };
+	struct slice new_table_path = { 0 };
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_append_string(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_file_name = { 0 };
+		struct slice subtab_lock = { 0 };
+		slice_set_string(&subtab_file_name, st->reftable_dir);
+		slice_append_string(&subtab_file_name, "/");
+		slice_append_string(&subtab_file_name,
+				    reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_file_name);
+		slice_append_string(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				} else {
+					err = REFTABLE_IO_ERROR;
+				}
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] =
+			(char *)slice_as_string(&subtab_file_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_append_string(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_append_string(&new_table_path, "/");
+
+	slice_append(&new_table_path, new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(slice_as_string(&temp_tab_file_name),
+			     slice_as_string(&new_table_path));
+		if (err < 0) {
+			err = REFTABLE_IO_ERROR;
+			goto exit;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		slice_append(&ref_list_contents, new_table_name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	{
+		char **p = delete_on_success;
+		while (*p) {
+			if (strcmp(*p, slice_as_string(&new_table_path))) {
+				unlink(*p);
+			}
+			p++;
+		}
+	}
+
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+exit:
+	free_names(delete_on_success);
+	{
+		char **p = subtable_locks;
+		while (*p) {
+			unlink(*p);
+			p++;
+		}
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	slice_clear(&new_table_name);
+	slice_clear(&new_table_path);
+	slice_clear(&ref_list_contents);
+	slice_clear(&temp_tab_file_name);
+	slice_clear(&lock_file_name);
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	assert(sz > 0);
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	if (next > 0) {
+		segs[next++] = cur;
+	}
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
+	int overhead = footer_size(version) + header_size(version) - 1;
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		sizes[i] = st->merged->stack[i]->size - overhead;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_ref(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_ref(it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, refname) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_log(it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..26017f1f2d4
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,44 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+#include "system.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+	bool disable_auto_compact;
+
+	struct reftable_write_options config;
+
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..99f453bfec8
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#if 1 /* REFTABLE_IN_GITCORE */
+
+#include "git-compat-util.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_IN_GITCORE */
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef uint8_t byte;
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ *
+ * TODO: in git-core, this should fallback to uncompress2 if it is available.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..0341c865569
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,67 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	{
+		int res = compare(key, (*rootp)->key);
+		if (res < 0) {
+			return tree_search(key, &(*rootp)->left, compare,
+					   insert);
+		} else if (res > 0) {
+			return tree_search(key, &(*rootp)->right, compare,
+					   insert);
+		}
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..954512e9a3a
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+/* tree_node is a generic binary search tree. */
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+/* looks for `key` in `rootp` using `compare` as comparison function. If insert
+   is set, insert the key if it's not found. Else, return NULL.
+*/
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+
+/* performs an infix walk of the tree. */
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+
+/*
+  deallocates the tree nodes recursively. Keys should be deallocated separately
+  by walking over the tree. */
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100755
index 00000000000..c37776c573f
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+set -eu
+
+# Override this to import from somewhere else, say "../reftable".
+SRC=${SRC:-origin} BRANCH=${BRANCH:-origin/master}
+
+((git --git-dir reftable-repo/.git fetch ${SRC} && cd reftable-repo && git checkout ${BRANCH} ) ||
+   git clone https://github.com/google/reftable reftable-repo)
+
+cp reftable-repo/c/*.[ch] reftable/
+cp reftable-repo/c/include/*.[ch] reftable/
+cp reftable-repo/LICENSE reftable/
+git --git-dir reftable-repo/.git show --no-patch origin/master \
+  > reftable/VERSION
+
+mv reftable/system.h reftable/system.h~
+sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
+
+# Remove unittests and compatibility hacks we don't need here.  
+rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
+
+git add reftable/*.[ch] reftable/LICENSE reftable/VERSION 
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..123bae75286
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,661 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, byte *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, byte *, int), void *writer_arg,
+		    struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+			     ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = reftable_calloc(sizeof(struct obj_index_tree_node));
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w, struct record rec)
+{
+	int result = -1;
+	struct slice key = { 0 };
+	int err = 0;
+	record_key(rec, &key);
+	if (slice_compare(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	slice_clear(&key);
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return REFTABLE_API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return REFTABLE_API_ERROR;
+	}
+
+	record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = hash_size(w->opts.hash_id),
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = hash_size(w->opts.hash_id),
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	if (log->ref_name == NULL) {
+		return REFTABLE_API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	{
+		struct record rec = { 0 };
+		int err;
+		record_from_log(&rec, log);
+		err = writer_add_record(w, rec);
+		return err;
+	}
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct record rec = { 0 };
+			record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, rec);
+			assert(err == 0);
+		}
+		for (i = 0; i < idx_len; i++) {
+			slice_clear(&idx[i].last_key);
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_block_stats *bstats =
+			writer_reftable_block_stats(w, typ);
+		bstats->index_blocks =
+			w->stats.idx_stats.blocks - before_blocks;
+		bstats->index_offset = index_start;
+		bstats->max_index_level = max_level;
+	}
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct record rec = { 0 };
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	slice_clear(&entry->hash);
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	byte footer[72];
+	byte *p = footer;
+	int err = writer_finish_public_section(w);
+	int empty_table = w->next == 0;
+	if (err != 0) {
+		goto exit;
+	}
+	w->pending_padding = 0;
+	if (empty_table) {
+		// Empty tables need a header anyway.
+		byte header[28];
+		int n = writer_write_header(w, header);
+		err = padded_write(w, header, n, 0);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (empty_table) {
+		err = REFTABLE_EMPTY_TABLE_ERROR;
+		goto exit;
+	}
+
+exit:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	slice_clear(&w->last_key);
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		slice_clear(&w->index[i].last_key);
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index, sizeof(struct index_record) * w->index_cap);
+	}
+
+	{
+		struct index_record ir = {
+			.offset = w->next,
+		};
+		slice_copy(&ir.last_key, w->block_writer->last_key);
+		w->index[w->index_len] = ir;
+	}
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..5115e762b12
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, byte *, int);
+	void *write_arg;
+	int pending_padding;
+	struct slice last_key;
+
+	/* offset of next block to write. */
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	/* memory buffer for writing */
+	byte *block;
+
+	/* writer for the current section. NULL or points to
+	 * block_writer_data */
+	struct block_writer *block_writer;
+
+	struct block_writer block_writer_data;
+
+	/* pending index records for the current section */
+	struct index_record *index;
+	int index_len;
+	int index_cap;
+
+	/*
+	  tree for use with tsearch; used to populate the 'o' inverse OID
+	  map */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+/* finishes a block, and writes it to storage */
+int writer_flush_block(struct reftable_writer *w);
+
+/* deallocates memory related to the index */
+void writer_clear_index(struct reftable_writer *w);
+
+/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v10 10/12] Reftable support for git-core
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
                                     ` (8 preceding siblings ...)
  2020-04-27 20:13                   ` [PATCH v10 09/12] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-04-27 20:13                   ` Han-Wen Nienhuys via GitGitGadget
  2020-04-27 20:13                   ` [PATCH v10 11/12] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
                                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

TODO:

 * Resolve spots marked with XXX

 * Detect and prevent directory/file conflicts in naming.

 * Support worktrees (t0002-gitfile "linked repo" testcase)

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   26 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   61 +-
 cache.h                                       |    6 +-
 refs.c                                        |   26 +-
 refs.h                                        |    3 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1045 +++++++++++++++++
 reftable/update.sh                            |    2 +-
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |   99 ++
 14 files changed, 1262 insertions(+), 34 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index dc356ce4ddc..8d5393a96ff 100644
--- a/Makefile
+++ b/Makefile
@@ -814,6 +814,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += command-list.h
 
@@ -961,6 +962,7 @@ LIB_OBJS += rebase-interactive.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1164,7 +1166,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2348,11 +2350,28 @@ VCSSVN_OBJS += vcs-svn/fast_export.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/bytes.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2489,6 +2508,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -3104,7 +3126,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index a4f836d1baf..5259738de88 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1108,7 +1108,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
+		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 3b50b1aa0e5..eb2781ed6af 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +189,8 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 ||
+	    !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -238,6 +240,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -247,11 +250,28 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
+        /*
+         * refs/heads is a file when using reftable. We can't reinitialize with
+         * a reftable because it will overwrite HEAD
+         */
+	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
+			      is_directory(git_path_buf(&buf, "refs/heads"))) {
+		die("cannot switch ref storage format.");
+	}
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
 	 */
-
 	if (refs_init_db(&err))
 		die("failed to set up refs db: %s", err.buf);
 
@@ -259,16 +279,13 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
-	}
-
-	initialize_repository_version(fmt->hash_algo);
+	} 
 
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
+        
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
 	filemode = TEST_FILEMODE;
@@ -381,7 +398,8 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash, unsigned int flags)
+	    const char *template_dir, int hash, const char *ref_storage_format,
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -420,6 +438,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+	repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -448,6 +467,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -521,6 +542,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = DEFAULT_REF_STORAGE;
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -528,15 +550,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *object_format = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
@@ -646,9 +671,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
+		       ref_storage_format, flags);
 }
diff --git a/cache.h b/cache.h
index 0f0485ecfe2..8cb884773c3 100644
--- a/cache.h
+++ b/cache.h
@@ -628,8 +628,9 @@ int path_inside_repo(const char *prefix, const char *path);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
-	    unsigned int flags);
-void initialize_repository_version(int hash_algo);
+	    const char *ref_storage_format, unsigned int flags);
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1043,6 +1044,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 05e05579408..cf1e645266f 100644
--- a/refs.c
+++ b/refs.c
@@ -17,10 +17,16 @@
 #include "argv-array.h"
 #include "repository.h"
 
+const char *default_ref_storage(void)
+{
+	const char *test = getenv("GIT_TEST_REFTABLE");
+	return test ? "reftable" : "files";
+}
+
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1836,13 +1842,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1858,7 +1864,10 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(
+                r->gitdir,
+                r->ref_storage_format ? r->ref_storage_format : DEFAULT_REF_STORAGE,
+                REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
 
@@ -1913,7 +1922,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1927,6 +1936,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1940,9 +1950,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index 87c9ec921b9..e024da47ef6 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* Returns the ref storage backend to use by default. */
+const char *default_ref_storage(void);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 3490aac3a40..cafe5b97376 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -661,6 +661,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..e0eb9f09b1d
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1045 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+        char *repo_dir;
+	char *reftable_dir;
+	struct reftable_stack *stack;
+};
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = { NULL };
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						       unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+        refs->repo_dir = xstrdup(path);
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct strbuf sb = STRBUF_INIT;
+
+	safe_create_dir(refs->reftable_dir, 1);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+        return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	const char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct reftable_merged_table *mt = NULL;
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else {
+		mt = reftable_stack_merged_table(refs->stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->prefix = prefix;
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	(void)refs;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid;
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct ref_update *)a)->refname,
+		      ((struct ref_update *)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = reftable_stack_next_update_index(refs->stack);
+	int err = 0;
+	int i = 0;
+	struct reftable_log_record *logs =
+		calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct object_id out_oid;
+			int out_flags = 0;
+			/* Memory owned by refs_resolve_ref_unsafe, no need to
+			 * free(). */
+			const char *resolved = refs_resolve_ref_unsafe(
+				transaction->ref_store, u->refname, 0, &out_oid,
+				&out_flags);
+			struct reftable_ref_record ref = { NULL };
+			struct object_id peeled;
+			int peel_error = peel_object(&u->new_oid, &peeled);
+
+			ref.ref_name =
+				(char *)(resolved ? resolved : u->refname);
+			log->ref_name = ref.ref_name;
+
+			if (!is_null_oid(&u->new_oid)) {
+				ref.value = u->new_oid.hash;
+			}
+			ref.update_index = ts;
+			if (!peel_error) {
+				ref.target_value = peeled.hash;
+			}
+
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+exit:
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_add(refs->stack, &write_transaction_table,
+				 transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure: %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *err)
+{
+	return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	int i = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = { NULL };
+		struct reftable_ref_record current = { NULL };
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.ref_name,
+					    &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	return reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_log_record log = { NULL };
+		struct object_id new_oid;
+		struct object_id old_oid;
+		struct reftable_ref_record current = { NULL };
+		reftable_stack_read_ref(create->refs->stack, create->refname,
+					&current);
+
+		fill_reftable_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			reftable_writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_add(refs->stack, &write_create_symref_table,
+				  &arg);
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = { NULL };
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto exit;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto exit;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = { { NULL } };
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = { { NULL } };
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+exit:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	return reftable_stack_add(refs->stack, &write_rename_table, &arg);
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			/* we want the refnames that we have reflogs for, so we
+			 * skip if we've already produced this name. This could
+			 * be faster by seeking directly to
+			 * reflog@update_index==0.
+			 */
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(refs->stack);
+	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = { NULL };
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		err = reftable_iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid;
+			struct object_id new_oid;
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+	int i = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = { NULL };
+		err = reftable_iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct reftable_log_record log = { NULL };
+	struct reftable_iterator it = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		struct object_id ooid;
+		struct object_id noid;
+
+		int err = reftable_iterator_next_log(it, &log);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_read_ref(refs->stack, refname, &ref);
+        if (err > 0) {
+                errno = ENOENT;
+                err = -1;
+                goto exit;
+        }
+	if (err < 0) {
+                errno = reftable_error_to_errno(err);
+                err = -1;
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		/* XXX recurse? */
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else if (ref.value != NULL) {
+		hashcpy(oid->hash, ref.value);
+	} else {
+                *type |= REF_ISBROKEN;
+                errno = EINVAL;
+                err = -1;
+        }
+exit:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
diff --git a/reftable/update.sh b/reftable/update.sh
index c37776c573f..b5632d64b36 100755
--- a/reftable/update.sh
+++ b/reftable/update.sh
@@ -11,7 +11,7 @@ SRC=${SRC:-origin} BRANCH=${BRANCH:-origin/master}
 cp reftable-repo/c/*.[ch] reftable/
 cp reftable-repo/c/include/*.[ch] reftable/
 cp reftable-repo/LICENSE reftable/
-git --git-dir reftable-repo/.git show --no-patch origin/master \
+git --git-dir reftable-repo/.git show --no-patch ${BRANCH} \
   > reftable/VERSION
 
 mv reftable/system.h reftable/system.h~
diff --git a/repository.c b/repository.c
index 6f7f6f002b1..087760bc184 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 6534fbb7b31..f57b73f4a27 100644
--- a/repository.h
+++ b/repository.h
@@ -74,6 +74,9 @@ struct repository {
 	 */
 	struct ref_store *refs_private;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 65fe5ecefbe..2ef970f9f88 100644
--- a/setup.c
+++ b/setup.c
@@ -468,9 +468,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -559,6 +561,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1204,8 +1207,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 00000000000..6b940f5f282
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,99 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+INVALID_SHA1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+initialize ()  {
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	mv .git/hooks .git/hooks-disabled
+}
+
+test_expect_success 'delete ref' '
+	initialize &&
+	test_commit file &&
+	SHA=$(git show-ref -s --verify HEAD) &&
+	test_write_lines "$SHA refs/heads/master" "$SHA refs/tags/file" >expect &&
+	git show-ref > actual &&
+	! git update-ref -d refs/tags/file $INVALID_SHA1 &&
+	test_cmp expect actual &&
+	git update-ref -d refs/tags/file $SHA  &&
+	test_write_lines "$SHA refs/heads/master" >expect &&
+	git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basic operation of reftable storage: commit, reflog, repack' '
+	initialize &&
+	test_commit file &&
+	test_write_lines refs/heads/master refs/tags/file >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect &&
+	for count in $(test_seq 1 10)
+	do
+		test_commit "number $count" file.t $count number-$count ||
+	        return 1
+	done &&
+	git pack-refs &&
+	ls -1 .git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 11 output &&
+	grep "commit (initial): file" output &&
+	grep "commit: number 10" output &&
+	git gc &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 0 output 
+'
+
+# This matches show-ref's output
+print_ref() {
+	echo "$(git rev-parse "$1") $1"
+}
+
+test_expect_success 'peeled tags are stored' '
+	initialize &&
+	test_commit file &&
+	git tag -m "annotated tag" test_tag HEAD &&
+	{
+		print_ref "refs/heads/master" &&
+		print_ref "refs/tags/file" &&
+		print_ref "refs/tags/test_tag" &&
+		print_ref "refs/tags/test_tag^{}" 
+	} >expect &&
+	git show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'show-ref works on fresh repo' '
+	initialize &&
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	>expect &&
+	! git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'checkout unborn branch' '
+	initialize &&
+	git checkout -b master
+'
+
+test_expect_success 'do not clobber existing repo' '
+	rm -rf .git &&
+	git init --ref-storage=files &&
+	cat .git/HEAD > expect &&
+	test_commit file &&
+	(git init --ref-storage=reftable || true) &&
+	cat .git/HEAD > actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v10 11/12] Add some reftable testing infrastructure
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
                                     ` (9 preceding siblings ...)
  2020-04-27 20:13                   ` [PATCH v10 10/12] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-04-27 20:13                   ` Han-Wen Nienhuys via GitGitGadget
  2020-04-27 20:13                   ` [PATCH v10 12/12] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* Add GIT_TEST_REFTABLE environment var to control default ref storage

* Add test_prerequisite REFTABLE. Skip t/t3210-pack-refs.sh for REFTABLE.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/clone.c      |  2 +-
 builtin/init-db.c    |  2 +-
 refs.c               | 13 +++++++------
 t/t3210-pack-refs.sh |  6 ++++++
 t/test-lib.sh        |  5 +++++
 5 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 5259738de88..7a59781a327 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1109,7 +1109,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
-		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
+		default_ref_storage(), INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index eb2781ed6af..6aaa51ec74e 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -542,7 +542,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
-	const char *ref_storage_format = DEFAULT_REF_STORAGE;
+	const char *ref_storage_format = default_ref_storage();
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
diff --git a/refs.c b/refs.c
index cf1e645266f..7d7aad49e3c 100644
--- a/refs.c
+++ b/refs.c
@@ -1864,10 +1864,11 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(
-                r->gitdir,
-                r->ref_storage_format ? r->ref_storage_format : DEFAULT_REF_STORAGE,
-                REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(r->gitdir,
+					 r->ref_storage_format ?
+						 r->ref_storage_format :
+						 default_ref_storage(),
+					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
 
@@ -1922,7 +1923,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
+	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1936,7 +1937,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
-	const char *format = DEFAULT_REF_STORAGE; /* XXX */
+	const char *format = default_ref_storage();
 	struct ref_store *refs;
 	const char *id;
 
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index f41b2afb996..edaef2c175a 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -11,6 +11,12 @@ semantic is still the same.
 '
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success 'enable reflogs' '
 	git config core.logallrefupdates true
 '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 0bb1105ec37..3877a91b268 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1502,6 +1502,11 @@ FreeBSD)
 	;;
 esac
 
+if test -n "$GIT_TEST_REFTABLE"
+then
+  test_set_prereq REFTABLE
+fi
+
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v10 12/12] t: use update-ref and show-ref to reading/writing refs
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
                                     ` (10 preceding siblings ...)
  2020-04-27 20:13                   ` [PATCH v10 11/12] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-04-27 20:13                   ` Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-04-27 20:13 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reading and writing .git/refs/* assumes that refs are stored in the 'files'
ref backend.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/t0002-gitfile.sh             |  2 +-
 t/t1400-update-ref.sh          | 32 ++++++++++++++++----------------
 t/t1506-rev-parse-diagnosis.sh |  2 +-
 t/t6050-replace.sh             |  2 +-
 t/t9020-remote-svn.sh          |  4 ++--
 5 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/t/t0002-gitfile.sh b/t/t0002-gitfile.sh
index 0aa9908ea12..960ed150cb5 100755
--- a/t/t0002-gitfile.sh
+++ b/t/t0002-gitfile.sh
@@ -62,7 +62,7 @@ test_expect_success 'check commit-tree' '
 '
 
 test_expect_success 'check rev-list' '
-	echo $SHA >"$REAL/HEAD" &&
+	git update-ref "HEAD" "$SHA" &&
 	test "$SHA" = "$(git rev-list HEAD)"
 '
 
diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index a6224ef65fe..00dcd2fbd46 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -37,15 +37,15 @@ test_expect_success setup '
 
 test_expect_success "create $m" '
 	git update-ref $m $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "create $m with oldvalue verification" '
 	git update-ref $m $B $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "fail to delete $m with stale ref" '
 	test_must_fail git update-ref -d $m $A &&
-	test $B = "$(cat .git/$m)"
+	test $B = "$(git show-ref -s --verify $m)"
 '
 test_expect_success "delete $m" '
 	test_when_finished "rm -f .git/$m" &&
@@ -56,7 +56,7 @@ test_expect_success "delete $m" '
 test_expect_success "delete $m without oldvalue verification" '
 	test_when_finished "rm -f .git/$m" &&
 	git update-ref $m $A &&
-	test $A = $(cat .git/$m) &&
+	test $A = $(git show-ref -s --verify $m) &&
 	git update-ref -d $m &&
 	test_path_is_missing .git/$m
 '
@@ -69,15 +69,15 @@ test_expect_success "fail to create $n" '
 
 test_expect_success "create $m (by HEAD)" '
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "create $m (by HEAD) with oldvalue verification" '
 	git update-ref HEAD $B $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "fail to delete $m (by HEAD) with stale ref" '
 	test_must_fail git update-ref -d HEAD $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "delete $m (by HEAD)" '
 	test_when_finished "rm -f .git/$m" &&
@@ -178,14 +178,14 @@ test_expect_success '--no-create-reflog overrides core.logAllRefUpdates=always'
 
 test_expect_success "create $m (by HEAD)" '
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success 'pack refs' '
 	git pack-refs --all
 '
 test_expect_success "move $m (by HEAD)" '
 	git update-ref HEAD $B $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "delete $m (by HEAD) should remove both packed and loose $m" '
 	test_when_finished "rm -f .git/$m" &&
@@ -255,7 +255,7 @@ test_expect_success '(not) change HEAD with wrong SHA1' '
 '
 test_expect_success "(not) changed .git/$m" '
 	test_when_finished "rm -f .git/$m" &&
-	! test $B = $(cat .git/$m)
+	! test $B = $(git show-ref -s --verify $m)
 '
 
 rm -f .git/logs/refs/heads/master
@@ -263,19 +263,19 @@ test_expect_success "create $m (logged by touch)" '
 	test_config core.logAllRefUpdates false &&
 	GIT_COMMITTER_DATE="2005-05-26 23:30" \
 	git update-ref --create-reflog HEAD $A -m "Initial Creation" &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "update $m (logged by touch)" '
 	test_config core.logAllRefUpdates false &&
 	GIT_COMMITTER_DATE="2005-05-26 23:31" \
 	git update-ref HEAD $B $A -m "Switch" &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "set $m (logged by touch)" '
 	test_config core.logAllRefUpdates false &&
 	GIT_COMMITTER_DATE="2005-05-26 23:41" \
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 
 test_expect_success 'empty directory removal' '
@@ -319,19 +319,19 @@ test_expect_success "create $m (logged by config)" '
 	test_config core.logAllRefUpdates true &&
 	GIT_COMMITTER_DATE="2005-05-26 23:32" \
 	git update-ref HEAD $A -m "Initial Creation" &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "update $m (logged by config)" '
 	test_config core.logAllRefUpdates true &&
 	GIT_COMMITTER_DATE="2005-05-26 23:33" \
 	git update-ref HEAD'" $B $A "'-m "Switch" &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "set $m (logged by config)" '
 	test_config core.logAllRefUpdates true &&
 	GIT_COMMITTER_DATE="2005-05-26 23:43" \
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 
 cat >expect <<EOF
diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
index 52edcbdcc32..dbf690b9c1b 100755
--- a/t/t1506-rev-parse-diagnosis.sh
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -207,7 +207,7 @@ test_expect_success 'arg before dashdash must be a revision (ambiguous)' '
 	{
 		# we do not want to use rev-parse here, because
 		# we are testing it
-		cat .git/refs/heads/foobar &&
+		git show-ref -s refs/heads/foobar &&
 		printf "%s\n" --
 	} >expect &&
 	git rev-parse foobar -- >actual &&
diff --git a/t/t6050-replace.sh b/t/t6050-replace.sh
index e7e64e085dd..c80dc10b8f1 100755
--- a/t/t6050-replace.sh
+++ b/t/t6050-replace.sh
@@ -135,7 +135,7 @@ test_expect_success 'tag replaced commit' '
 test_expect_success '"git fsck" works' '
      git fsck master >fsck_master.out &&
      test_i18ngrep "dangling commit $R" fsck_master.out &&
-     test_i18ngrep "dangling tag $(cat .git/refs/tags/mytag)" fsck_master.out &&
+     test_i18ngrep "dangling tag $(git show-ref -s refs/tags/mytag)" fsck_master.out &&
      test -z "$(git fsck)"
 '
 
diff --git a/t/t9020-remote-svn.sh b/t/t9020-remote-svn.sh
index 6fca08e5e35..9fcfa969a9b 100755
--- a/t/t9020-remote-svn.sh
+++ b/t/t9020-remote-svn.sh
@@ -48,8 +48,8 @@ test_expect_success REMOTE_SVN 'simple fetch' '
 '
 
 test_debug '
-	cat .git/refs/svn/svnsim/master
-	cat .git/refs/remotes/svnsim/master
+	git show-ref -s refs/svn/svnsim/master
+	git show-ref -s refs/remotes/svnsim/master
 '
 
 test_expect_success REMOTE_SVN 'repeated fetch, nothing shall change' '
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v10 09/12] Add reftable library
  2020-04-27 20:13                   ` [PATCH v10 09/12] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-04-28 14:55                     ` Danh Doan
  2020-04-28 15:29                       ` Junio C Hamano
  2020-04-28 20:21                       ` Han-Wen Nienhuys
  0 siblings, 2 replies; 409+ messages in thread
From: Danh Doan @ 2020-04-28 14:55 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

On 2020-04-27 20:13:35+0000, Han-Wen Nienhuys via GitGitGadget <gitgitgadget@gmail.com> wrote:
[..snip..]
> diff --git a/reftable/LICENSE b/reftable/LICENSE
> new file mode 100644
> index 00000000000..402e0f9356b
> --- /dev/null
> +++ b/reftable/LICENSE
> @@ -0,0 +1,31 @@
> +BSD License
> +
> +Copyright (c) 2020, Google LLC
> +All rights reserved.

Linking against GPLv2 code will make this code GPLv2, right?

> --- /dev/null
> +++ b/reftable/README.md
> @@ -0,0 +1,11 @@
> +
> +The source code in this directory comes from https://github.com/google/reftable.
> +
> +The VERSION file keeps track of the current version of the reftable library.
> +
> +To update the library, do:
> +
> +   sh reftable/update.sh

I agree with Dscho, we should do a git-subtree merge instead.

[..snip..]
> +#include "system.h"
> +
> +void put_be24(byte *out, uint32_t i)

So, we introduce a new type `byte`?

> +{
> +	out[0] = (byte)((i >> 16) & 0xff);
> +	out[1] = (byte)((i >> 8) & 0xff);
> +	out[2] = (byte)((i)&0xff);

At my first glance, I thought that code is:

	out[2] = (byte)( (i)(&0xff) )

It's totally un-parsable.

Anyway, we're gonna accept "expand tab to 4 spaces" in Git, now?

> +}
> +
> +uint32_t get_be24(byte *in)
> +{
> +	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
> +	       (uint32_t)(in[2]);

I don't think we need this much cast. One cast can rule them all.

> +}
> +
> +void put_be16(uint8_t *out, uint16_t i)
> +{
> +	out[0] = (uint8_t)((i >> 8) & 0xff);
> +	out[1] = (uint8_t)((i)&0xff);

This is un-parsable, see above.

> +}
> +
> +int binsearch(int sz, int (*f)(int k, void *args), void *args)

binary search?

I think we want all size to be `size_t`

> +{
> +	int lo = 0;
> +	int hi = sz;
> +
> +	/* invariant: (hi == sz) || f(hi) == true
> +	   (lo == 0 && f(0) == true) || fi(lo) == false
> +	 */

Comment style

> +	while (hi - lo > 1) {
> +		int mid = lo + (hi - lo) / 2;
> +
> +		int val = f(mid, args);
> +		if (val) {
> +			hi = mid;

But this line of code makes me think this function is binary partition
instead.

> +		} else {
> +			lo = mid;
> +		}

bracket style

> +	}
> +

[..snip..]

> diff --git a/reftable/basics.h b/reftable/basics.h
> new file mode 100644
> index 00000000000..6d89eb5d931
> --- /dev/null
> +++ b/reftable/basics.h
> @@ -0,0 +1,53 @@
[..snip..]
> +  find smallest index i in [0, sz) at which f(i) is true, assuming
> +  that f is ascending. Return sz if f(i) is false for all indices.
> +*/

So, this is about partitioning, not searching

> +int binsearch(int sz, int (*f)(int k, void *args), void *args);
> +
> +int block_writer_add(struct block_writer *w, struct record rec)

So, we're gonna pass a large structure into a function now?

> +{
> +	struct slice empty = { 0 };
> +	struct slice last = w->entries % w->restart_interval == 0 ? empty :
> +								    w->last_key;
> +	struct slice out = {
> +		.buf = w->buf + w->next,
> +		.len = w->block_size - w->next,
> +	};
> +
> +	struct slice start = out;
> +
> +	bool restart = false;

Do we want new type: bool?

> +	struct slice key = { 0 };
> +	int n = 0;
> +
> +	record_key(rec, &key);
> +	n = encode_key(&restart, out, last, key, record_val_type(rec));
> +	if (n < 0) {
> +		goto err;
> +	}
> +	slice_consume(&out, n);
> +
> +	n = record_encode(rec, out, w->hash_size);
> +	if (n < 0) {
> +		goto err;
> +	}
> +	slice_consume(&out, n);
> +
> +	if (block_writer_register_restart(w, start.len - out.len, restart,
> +					  key) < 0) {

Eh, naming is hard, and so does this function. It's overly verbose,
and long. I haven't go though all of this patch (because it's too
long), but I guess some of them can be made static and their name can
be shortened.

> +		goto err;
> +	}
> +
> +	slice_clear(&key);
> +	return 0;
> +
> +err:
> +	slice_clear(&key);
> +	return -1;
> +}

[..snip..]
> +		block_source_return_block(block->source, block);
> +		block->data = uncompressed.buf;
> +		block->len = dst_len; /* XXX: 4 bytes missing? */

Even this is a very lengthy patch, some bytes is missing but we
couldn't figured out.

> +		block->source = malloc_block_source();
> +		full_block_size = src_len + block_header_skip;
> +	} else if (full_block_size == 0) {
> +		full_block_size = sz;
> +	} else if (sz < full_block_size && sz < block->len &&
> +		   block->data[sz] != 0) {
> +		/* If the block is smaller than the full block size, it is
> +		   padded (data followed by '\0') or the next block is
> +		   unaligned. */
> +		full_block_size = sz;
> +	}
> +
> +	{

This cute curly braces is new to me

> +		uint16_t restart_count = get_be16(block->data + sz - 2);
> +		uint32_t restart_start = sz - 2 - 3 * restart_count;
> +
> +		byte *restart_bytes = block->data + restart_start;
> +
> +		/* transfer ownership. */
> +		br->block = *block;
> +		block->data = NULL;
> +		block->len = 0;
> +
> +		br->hash_size = hash_size;
> +		br->block_len = restart_start;
> +		br->full_block_size = full_block_size;
> +		br->header_off = header_off;
> +		br->restart_count = restart_count;
> +		br->restart_bytes = restart_bytes;
> +	}
> +
> +	return 0;
> +}
> +
> +static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
> +{
> +	return get_be24(br->restart_bytes + 3 * i);
> +}
> +
> +void block_reader_start(struct block_reader *br, struct block_iter *it)
> +{
> +	it->br = br;
> +	slice_resize(&it->last_key, 0);
> +	it->next_off = br->header_off + 4;
> +}
> +
> +struct restart_find_args {
> +	int error;
> +	struct slice key;
> +	struct block_reader *r;
> +};

I would like to see all structure declaration at top of file.

> +
> +static int restart_key_less(int idx, void *args)

idx? index? Shouldn't it be size_t?

> +{
> +	struct restart_find_args *a = (struct restart_find_args *)args;
> +	uint32_t off = block_reader_restart_offset(a->r, idx);
> +	struct slice in = {
> +		.buf = a->r->block.data + off,
> +		.len = a->r->block_len - off,

C99 designated initialisation.
We aren't ready, yet.

> +void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
> +{
> +	dest->br = src->br;
> +	dest->next_off = src->next_off;
> +	slice_copy(&dest->last_key, src->last_key);
> +}
> +
> +/* return < 0 for error, 0 for OK, > 0 for EOF. */
> +int block_iter_next(struct block_iter *it, struct record rec)

If this function is used only in this file, it should be static,
otherwise, the comment should go to header file

> diff --git a/reftable/block.h b/reftable/block.h
> new file mode 100644
> index 00000000000..62b2e0fec6d
> +/*
> +  Writes reftable blocks. The block_writer is reused across blocks to minimize
> +  allocation overhead.
> +*/
> +struct block_writer {
> +	byte *buf;
> +	uint32_t block_size;

I suppose this means 2^32 block should be enough for everyone?

> diff --git a/reftable/bytes.c b/reftable/bytes.c
> new file mode 100644
> index 00000000000..e69de29bb2d

Empty?

> diff --git a/reftable/config.h b/reftable/config.h
> new file mode 100644
> index 00000000000..40a8c178f10
> --- /dev/null
> +++ b/reftable/config.h
> @@ -0,0 +1 @@
> +/* empty */

Empty?

> diff --git a/reftable/dump.c b/reftable/dump.c
> new file mode 100644
> index 00000000000..c0065792a4f
> --- /dev/null
> +++ b/reftable/dump.c
> @@ -0,0 +1,97 @@
> +/*
> +Copyright 2020 Google LLC
> +
> +Use of this source code is governed by a BSD-style
> +license that can be found in the LICENSE file or at
> +https://developers.google.com/open-source/licenses/bsd
> +*/
> +
> +#include "system.h"
> +
> +#include "reftable.h"
> +
> +static int dump_table(const char *tablename)
> +{
> +	struct block_source src = { 0 };
> +	int err = block_source_from_file(&src, tablename);
> +	if (err < 0) {
> +		return err;
> +	}
> +
> +	struct reader *r = NULL;

-Wdeclaration-after-statement

> +	err = new_reader(&r, src, tablename);
> +	if (err < 0) {
> +		return err;
> +	}
> +
> +	{
> +		struct iterator it = { 0 };
> +		err = reader_seek_ref(r, &it, "");
> +		if (err < 0) {
> +			return err;
> +		}
> +
> +		struct ref_record ref = { 0 };
> +		while (1) {

In the same series, we see both `while (true)` and `while (1)`

> +int main(int argc, char *argv[])

Do we ship this file?
(I haven't checked change to Makefile, yet)

> +{
> +	int opt;
> +	const char *table = NULL;
> +	while ((opt = getopt(argc, argv, "t:")) != -1) {
> +		switch (opt) {
> +		case 't':
> +			table = xstrdup(optarg);
> +			break;
> +		case '?':

We uses "-h|--help" for usage.

> +			printf("usage: %s [-table tablefile]\n", argv[0]);
> +			return 2;
> +			break;

break after return

> diff --git a/reftable/iter.c b/reftable/iter.c
> new file mode 100644
> index 00000000000..747f3eae256
> --- /dev/null
> +++ b/reftable/iter.c
> @@ -0,0 +1,234 @@
> +/*
> +Copyright 2020 Google LLC
> +
> +Use of this source code is governed by a BSD-style
> +license that can be found in the LICENSE file or at
> +https://developers.google.com/open-source/licenses/bsd
> +*/
> +
> +#include "iter.h"
> +
> +#include "system.h"
> +
> +#include "block.h"
> +#include "constants.h"
> +#include "reader.h"
> +#include "reftable.h"
> +
> +bool iterator_is_null(struct reftable_iterator it)
> +{
> +	return it.ops == NULL;
> +}

Do we want this verbose?
Or it'll be use in some kind of vtable?

> +
> +static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
> +{
> +	struct filtering_ref_iterator *fri =
> +		(struct filtering_ref_iterator *)iter_arg;
> +	struct reftable_ref_record *ref =
> +		(struct reftable_ref_record *)rec.data;

We're passing a pointer, references a member of a variable in local
scope around, this will ask for trouble if callee doesn't aware about
it, and save that pointer somewhere for later access.

I really think we shouldn't merge a big patch like this into Git,
it's too hard to follow and it's harder to reasoning.

> +
> +	while (true) {
> +		int err = reftable_iterator_next_ref(fri->it, ref);
> +		if (err != 0) {
> +			return err;
> +		}
> +
> +		if (fri->double_check) {
> +			struct reftable_iterator it = { 0 };
> +
> +			int err = reftable_reader_seek_ref(fri->r, &it,
> +							   ref->ref_name);
> +			if (err == 0) {
> +				err = reftable_iterator_next_ref(it, ref);
> +			}
> +
> +			reftable_iterator_destroy(&it);
> +
> +			if (err < 0) {
> +				return err;
> +			}
> +
> +			if (err > 0) {
> +				continue;
> +			}
> +		}
> +
> +		if ((ref->target_value != NULL &&
> +		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
> +		    (ref->value != NULL &&
> +		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
> +			return 0;
> +		}
> +	}
> +}
> +
> +}
> +
> diff --git a/reftable/iter.h b/reftable/iter.h
> new file mode 100644
> index 00000000000..408b7f41a74

I stopped at this. It's too tiresome to read this patch.

Overall, I have those general comments, when I have _NOT_ read the
design:

* This patch introduce a total different coding style
* size should be in size_t, not int
* passing large structure around, and passing pointer to some member
  of that structure around, this will ask for trouble.
* Some code are un-parsable,
* I strongly agree with Dscho, we should merge this patch as subtree
  instead, or we should provide API so other party can provide plugin
  ref-backend.

-- 
Danh

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v10 09/12] Add reftable library
  2020-04-28 14:55                     ` Danh Doan
@ 2020-04-28 15:29                       ` Junio C Hamano
  2020-04-28 15:31                         ` Junio C Hamano
  2020-04-28 20:21                       ` Han-Wen Nienhuys
  1 sibling, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-04-28 15:29 UTC (permalink / raw)
  To: Danh Doan
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Han-Wen Nienhuys

Danh Doan <congdanhqx@gmail.com> writes:

>> +BSD License
>> +
>> +Copyright (c) 2020, Google LLC
>> +All rights reserved.
>
> Linking against GPLv2 code will make this code GPLv2, right?

Wrong.  A program (e.g. "git") can have various parts, some of which
is licensed under GPL, some other parts under BSD, yet some other
parts under LGPL.  Read the second paragraph of README we ship
without source.

I think you are thinking about the obligation a distributor of the
linked binary compiled from such source has, which is that they need
to make available not just the GPL part but the rest of the program
used to compile the whole, including parts licensed under BSD and
other licenses.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v10 09/12] Add reftable library
  2020-04-28 15:29                       ` Junio C Hamano
@ 2020-04-28 15:31                         ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-04-28 15:31 UTC (permalink / raw)
  To: Danh Doan
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Han-Wen Nienhuys

Junio C Hamano <gitster@pobox.com> writes:

> Danh Doan <congdanhqx@gmail.com> writes:
>
>>> +BSD License
>>> +
>>> +Copyright (c) 2020, Google LLC
>>> +All rights reserved.
>>
>> Linking against GPLv2 code will make this code GPLv2, right?
>
> Wrong.  A program (e.g. "git") can have various parts, some of which
> is licensed under GPL, some other parts under BSD, yet some other
> parts under LGPL.  Read the second paragraph of README we ship
> without source.

without --> with our

;-)

>
> I think you are thinking about the obligation a distributor of the
> linked binary compiled from such source has, which is that they need
> to make available not just the GPL part but the rest of the program
> used to compile the whole, including parts licensed under BSD and
> other licenses.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v10 09/12] Add reftable library
  2020-04-28 14:55                     ` Danh Doan
  2020-04-28 15:29                       ` Junio C Hamano
@ 2020-04-28 20:21                       ` Han-Wen Nienhuys
  2020-04-28 20:23                         ` Han-Wen Nienhuys
  1 sibling, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-04-28 20:21 UTC (permalink / raw)
  To: Danh Doan; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Apr 28, 2020 at 4:55 PM Danh Doan <congdanhqx@gmail.com> wrote:
> > +#include "system.h"
> > +
> > +void put_be24(byte *out, uint32_t i)
>
> So, we introduce a new type `byte`?

The reftable is structured as a library. The declaration for 'byte',
as well as 'bool'  is private to the library.

> > +{
> > +     out[0] = (byte)((i >> 16) & 0xff);
> > +     out[1] = (byte)((i >> 8) & 0xff);
> > +     out[2] = (byte)((i)&0xff);
>
> At my first glance, I thought that code is:
>
>         out[2] = (byte)( (i)(&0xff) )
>
> It's totally un-parsable.

I removed the parens here, in
https://github.com/google/reftable/commit/6f883714588d2658122fefd85a2e2e76401ae2be

that also changes a number of 'int's to 'size_t's.

> > +     return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
> > +            (uint32_t)(in[2]);
>
> I don't think we need this much cast. One cast can rule them all.

would you have a concrete suggestion to rewrite this? The rules for
undefined behavior and integer casts are subtle, and I know they are a
cause of concern in crypto code, and I'd rather be correct than rely
on deep knowledge of the C standard.

> > +int binsearch(int sz, int (*f)(int k, void *args), void *args)
>
> binary search?
>
> I think we want all size to be `size_t`

fixed this.

>
> > +{
> > +     int lo = 0;
> > +     int hi = sz;
> > +
> > +     /* invariant: (hi == sz) || f(hi) == true
> > +        (lo == 0 && f(0) == true) || fi(lo) == false
> > +      */
>
> Comment style

Review style

> > diff --git a/reftable/basics.h b/reftable/basics.h
> > new file mode 100644
> > index 00000000000..6d89eb5d931
> > --- /dev/null
> > +++ b/reftable/basics.h
> > @@ -0,0 +1,53 @@
> [..snip..]
> > +  find smallest index i in [0, sz) at which f(i) is true, assuming
> > +  that f is ascending. Return sz if f(i) is false for all indices.
> > +*/
>
> So, this is about partitioning, not searching

there is precedent for this to be called binary search. See for
example, https://golang.org/pkg/sort/#Search

> > +int block_writer_add(struct block_writer *w, struct record rec)
>
> So, we're gonna pass a large structure into a function now?

The record structure contains two pointers.

> > +     if (block_writer_register_restart(w, start.len - out.len, restart,
> > +                                       key) < 0) {
>
> Eh, naming is hard, and so does this function.

I can't parse this sentence.

> It's overly verbose,
> and long. I haven't go though all of this patch (because it's too
> long), but I guess some of them can be made static and their name can
> be shortened.

For uniformity all functions taking a struct Xyz* as the first
argument are called

  xyz_do_something

since this takes a block_writer, and it registers a restart, it is
called block_writer_register_restart. The naming of this is uniform
throughout the library.

> > +             block_source_return_block(block->source, block);
> > +             block->data = uncompressed.buf;
> > +             block->len = dst_len; /* XXX: 4 bytes missing? */
>
> Even this is a very lengthy patch, some bytes is missing but we
> couldn't figured out.

good point, I'll look into this again.

> > +     {
>
> This cute curly braces is new to me

Pleased to make your acquaintance!

> > +static int restart_key_less(int idx, void *args)
>
> idx? index? Shouldn't it be size_t?

fixed this.

>
> > +{
> > +     struct restart_find_args *a = (struct restart_find_args *)args;
> > +     uint32_t off = block_reader_restart_offset(a->r, idx);
> > +     struct slice in = {
> > +             .buf = a->r->block.data + off,
> > +             .len = a->r->block_len - off,
>
> C99 designated initialisation.
> We aren't ready, yet.

I'm sorry, gcc -std=c89 said this was OK.

> > +void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
> > +{
> > +     dest->br = src->br;
> > +     dest->next_off = src->next_off;
> > +     slice_copy(&dest->last_key, src->last_key);
> > +}
> > +
> > +/* return < 0 for error, 0 for OK, > 0 for EOF. */
> > +int block_iter_next(struct block_iter *it, struct record rec)
>
> If this function is used only in this file, it should be static,
> otherwise, the comment should go to header file

moved comment to header file.

> > diff --git a/reftable/block.h b/reftable/block.h
> > new file mode 100644
> > index 00000000000..62b2e0fec6d
> > +/*
> > +  Writes reftable blocks. The block_writer is reused across blocks to minimize
> > +  allocation overhead.
> > +*/
> > +struct block_writer {
> > +     byte *buf;
> > +     uint32_t block_size;
>
> I suppose this means 2^32 block should be enough for everyone?

No. 2^24-1 is the maximum size, as detailed in the specification.

> > diff --git a/reftable/bytes.c b/reftable/bytes.c
> > new file mode 100644
> > index 00000000000..e69de29bb2d
>
> Empty?

thanks, I've removed these.

> > diff --git a/reftable/config.h b/reftable/config.h
> > new file mode 100644
> > index 00000000000..40a8c178f10
> > --- /dev/null
> > +++ b/reftable/config.h
> > @@ -0,0 +1 @@
> > +/* empty */
>
> Empty?

Removed

> > +++ b/reftable/dump.c

This was moved to a different directory in upstream; I've removed it now.

> > +bool iterator_is_null(struct reftable_iterator it)
> > +{
> > +     return it.ops == NULL;
> > +}
>
> Do we want this verbose?
> Or it'll be use in some kind of vtable?

I've added a doc comment.

> > +
> > +static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
> > +{
> > +     struct filtering_ref_iterator *fri =
> > +             (struct filtering_ref_iterator *)iter_arg;
> > +     struct reftable_ref_record *ref =
> > +             (struct reftable_ref_record *)rec.data;
>
> We're passing a pointer, references a member of a variable in local
> scope around, this will ask for trouble if callee doesn't aware about
> it, and save that pointer somewhere for later access.

I don't understand your criticism; could you be more precise?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v10 09/12] Add reftable library
  2020-04-28 20:21                       ` Han-Wen Nienhuys
@ 2020-04-28 20:23                         ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-04-28 20:23 UTC (permalink / raw)
  To: Danh Doan; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Apr 28, 2020 at 10:21 PM Han-Wen Nienhuys <hanwen@google.com> wrote:
> I've added a doc comment.

I've pushed these fixes to
https://github.com/gitgitgadget/git/pull/539. Should I be sending new
patches over email for each review round?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v10 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-04-27 20:13                   ` [PATCH v10 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-04-30 21:17                     ` Emily Shaffer
  2020-05-04 18:03                       ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Emily Shaffer @ 2020-04-30 21:17 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

On Mon, Apr 27, 2020 at 08:13:28PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> 
> 
> This happens implicitly in the files/packed ref backend; making it
> explicit simplifies adding alternate ref storage backends, such as
> reftable.

As an outsider to this part of the codebase, a little more explanation
could be handy in the commit message. I found the backends you mentioned
in refs, and it seems like they're the only two, but it's not obvious
how this delta is related to those backends. Furthermore, grepping looks
like this function whose behavior is changing is being called from
somewhere else, with no change to that function (and it looks like the
callsite's callback doesn't check whether a ref begins with refs/ or
not).

All this to say - it's hard to convince myself this is a safe change,
and I'd really like to read more to understand why you made it.

> 
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  refs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index b8759116cd0..05e05579408 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1569,7 +1569,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
>  
>  int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
>  {
> -	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
> +	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
>  }
>  
>  int for_each_ref(each_ref_fn fn, void *cb_data)
> @@ -1629,8 +1629,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
>  
>  int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
>  {
> -	return do_for_each_ref(refs, "", fn, 0,
> -			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
> +	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
> +			       cb_data);
>  }
>  
>  int for_each_rawref(each_ref_fn fn, void *cb_data)
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v10 03/12] create .git/refs in files-backend.c
  2020-04-27 20:13                   ` [PATCH v10 03/12] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
@ 2020-04-30 21:24                     ` Emily Shaffer
  2020-04-30 21:49                       ` Junio C Hamano
  2020-05-04 18:10                       ` Han-Wen Nienhuys
  0 siblings, 2 replies; 409+ messages in thread
From: Emily Shaffer @ 2020-04-30 21:24 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

On Mon, Apr 27, 2020 at 08:13:29PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> 
> 
> This prepares for supporting the reftable format, which will want
> create its own file system layout in .git
> 
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  builtin/init-db.c    | 2 --
>  refs/files-backend.c | 6 ++++++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index 0b7222e7188..3b50b1aa0e5 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -251,8 +251,6 @@ static int create_default_files(const char *template_path,
>  	 * We need to create a "refs" dir in any case so that older
>  	 * versions of git can tell that this is a repository.
>  	 */
> -	safe_create_dir(git_path("refs"), 1);
> -	adjust_shared_perm(git_path("refs"));

Is the reftable completely replacing the refs/ dir? Or is the idea that
the refs/ dir is only used by the files backend? The commit message
makes it sound like it's an additional format to support, so I'm a
little confused. Why does the other currently-existing backend not need
the refs/ dir at this stage?

>  
>  	if (refs_init_db(&err))
>  		die("failed to set up refs db: %s", err.buf);
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 561c33ac8a9..ab7899a9c77 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3157,9 +3157,15 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
>  		files_downcast(ref_store, REF_STORE_WRITE, "init_db");
>  	struct strbuf sb = STRBUF_INIT;
>  
> +	files_ref_path(refs, &sb, "refs");
> +	safe_create_dir(sb.buf, 1);
> +	/* adjust permissions even if directory already exists. */
> +	adjust_shared_perm(sb.buf);
> +
>  	/*
>  	 * Create .git/refs/{heads,tags}
>  	 */
> +	strbuf_reset(&sb);
>  	files_ref_path(refs, &sb, "refs/heads");
>  	safe_create_dir(sb.buf, 1);
>  
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v10 03/12] create .git/refs in files-backend.c
  2020-04-30 21:24                     ` Emily Shaffer
@ 2020-04-30 21:49                       ` Junio C Hamano
  2020-05-04 18:10                       ` Han-Wen Nienhuys
  1 sibling, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-04-30 21:49 UTC (permalink / raw)
  To: Emily Shaffer
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Han-Wen Nienhuys

Emily Shaffer <emilyshaffer@google.com> writes:

>> -	safe_create_dir(git_path("refs"), 1);
>> -	adjust_shared_perm(git_path("refs"));
>
> Is the reftable completely replacing the refs/ dir? Or is the idea that
> the refs/ dir is only used by the files backend? The commit message
> makes it sound like it's an additional format to support, so I'm a
> little confused. Why does the other currently-existing backend not need
> the refs/ dir at this stage?

Other current backend being the files-backend, which creates it
below, isn't this a no-op at this stage?

What's more interesting is what would happen when a repo is
initialized with the reftable backend as its sole backend.  

Even then, the repository discovery code of existing versions of Git
needs to see a directory with a ".git" subdirectory, in the latter
of which there must be HEAD and objects and refs subdirectories, so
we'd need to create refs/ directory even if reftable does not use it.

This step, from that point of view, may not be necessary.  But as
long as we make sure any repository creates refs/ and objects/
directories, regardless of the ref backend, we'd be OK.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v10 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-04-30 21:17                     ` Emily Shaffer
@ 2020-05-04 18:03                       ` Han-Wen Nienhuys
  2020-05-05 18:26                         ` Pseudo ref handling (was Re: [PATCH v10 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref) Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-04 18:03 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Thu, Apr 30, 2020 at 11:17 PM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> On Mon, Apr 27, 2020 at 08:13:28PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> >
> >
> > This happens implicitly in the files/packed ref backend; making it
> > explicit simplifies adding alternate ref storage backends, such as
> > reftable.
>
> As an outsider to this part of the codebase, a little more explanation
> could be handy in the commit message. I found the backends you mentioned
> in refs, and it seems like they're the only two, but it's not obvious
> how this delta is related to those backends. Furthermore, grepping looks
> like this function whose behavior is changing is being called from
> somewhere else, with no change to that function (and it looks like the
> callsite's callback doesn't check whether a ref begins with refs/ or
> not).
>
> All this to say - it's hard to convince myself this is a safe change,
> and I'd really like to read more to understand why you made it.

I'll be the first to admit that I'm on shaky ground here. However,
given that the test suite passes, if this is breaking some behavior,
it's probably not very well tested behavior.

Here is what I know:

Git stores refs in multiple places:
- normal refs, packed: .git/packed-refs
- normal refs, loose: under refs/*/
- special refs: HEAD, ORIG_HEAD, REBASE_HEAD.

Currently, the special refs can only be read with refs_read_raw_ref().
If you iterate over the refs in the files/packed backend, you can
never find HEAD, ORIG_HEAD etc.

Reftable does not have different classes of ref storage, so this means
that the whole space of refs (including HEAD) is managed by the
reftable backend, and if you iterate over the refs, reftable will also
produce HEAD. This is not spelled out in the spec, but JGit actually
stores HEAD in reftable too, and we want to be interoperable.

Without this patch, commands like "git show-ref" will produce an entry
"HEAD", which is a regression.

With this patch, the default iteration is limited to the "refs/"
prefix, so we don't produce HEAD in the reftable backend by default.

Now, I have several questions:

* how does this interact with worktrees? It seems that there is a
special worktree namespace?
* if you are doing a rebase, and have unreachable objects in
ORIG_HEAD, REBASE_HEAD, how does git-gc ensure that these objects stay
alive? I'd think that you need to iterate over all entries (including
REBASE_HEAD and friends), but I haven't been able to understand how
that works.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v10 03/12] create .git/refs in files-backend.c
  2020-04-30 21:24                     ` Emily Shaffer
  2020-04-30 21:49                       ` Junio C Hamano
@ 2020-05-04 18:10                       ` Han-Wen Nienhuys
  1 sibling, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-04 18:10 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Thu, Apr 30, 2020 at 11:24 PM Emily Shaffer <emilyshaffer@google.com> wrote:
>
> On Mon, Apr 27, 2020 at 08:13:29PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> >
> >
> > This prepares for supporting the reftable format, which will want
> > create its own file system layout in .git
> >
> > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> > ---
> >  builtin/init-db.c    | 2 --
> >  refs/files-backend.c | 6 ++++++
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/builtin/init-db.c b/builtin/init-db.c
> > index 0b7222e7188..3b50b1aa0e5 100644
> > --- a/builtin/init-db.c
> > +++ b/builtin/init-db.c
> > @@ -251,8 +251,6 @@ static int create_default_files(const char *template_path,
> >        * We need to create a "refs" dir in any case so that older
> >        * versions of git can tell that this is a repository.
> >        */
> > -     safe_create_dir(git_path("refs"), 1);
> > -     adjust_shared_perm(git_path("refs"));
>
> Is the reftable completely replacing the refs/ dir? Or is the idea that
> the refs/ dir is only used by the files backend? The commit message
> makes it sound like it's an additional format to support, so I'm a
> little confused. Why does the other currently-existing backend not need
> the refs/ dir at this stage?

I've dropped this patch from the series; it probably was from before
the compat hack that is described in

  https://git.eclipse.org/r/#/c/157167/

for further clarification: reftable doesn't used the refs/ dir at all,
but we still want to not confuse older git versions.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v11 00/12] Reftable support git-core
  2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
                                     ` (11 preceding siblings ...)
  2020-04-27 20:13                   ` [PATCH v10 12/12] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
@ 2020-05-04 19:03                   ` Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
                                       ` (13 more replies)
  12 siblings, 14 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1

Summary 18964 tests pass 2871 tests fail

Some issues:

 * worktree support looks broken
 * submodules (eg. t1013)
 * many tests inspect .git/logs/ directly.

v14

 * Fixes for Windows (thanks, Dscho!)
 * Detect file/directory conflicts.
 * Plug some memory leaks.
 * Fixes for autocompaction

Han-Wen Nienhuys (10):
  refs.h: clarify reflog iteration order
  Iterate over the "refs/" namespace in for_each_[raw]ref
  refs: document how ref_iterator_advance_fn should handle symrefs
  Add .gitattributes for the reftable/ directory
  reftable: define version 2 of the spec to accomodate SHA256
  reftable: clarify how empty tables should be written
  Add reftable library
  Reftable support for git-core
  Add some reftable testing infrastructure
  t: use update-ref and show-ref to reading/writing refs

Johannes Schindelin (1):
  vcxproj: adjust for the reftable changes

Jonathan Nieder (1):
  reftable: file format documentation

 Documentation/Makefile                        |    1 +
 Documentation/technical/reftable.txt          | 1080 +++++++++++++++
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   27 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   56 +-
 cache.h                                       |    6 +-
 config.mak.uname                              |    2 +-
 contrib/buildsystems/Generators/Vcxproj.pm    |   11 +-
 refs.c                                        |   33 +-
 refs.h                                        |    8 +-
 refs/refs-internal.h                          |    6 +
 refs/reftable-backend.c                       | 1045 ++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   11 +
 reftable/VERSION                              |    5 +
 reftable/basics.c                             |  215 +++
 reftable/basics.h                             |   53 +
 reftable/block.c                              |  436 ++++++
 reftable/block.h                              |  126 ++
 reftable/blocksource.h                        |   22 +
 reftable/constants.h                          |   21 +
 reftable/file.c                               |   99 ++
 reftable/iter.c                               |  240 ++++
 reftable/iter.h                               |   63 +
 reftable/merged.c                             |  327 +++++
 reftable/merged.h                             |   38 +
 reftable/pq.c                                 |  115 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  758 ++++++++++
 reftable/reader.h                             |   68 +
 reftable/record.c                             | 1131 +++++++++++++++
 reftable/record.h                             |  121 ++
 reftable/refname.c                            |  215 +++
 reftable/refname.h                            |   39 +
 reftable/reftable.c                           |   91 ++
 reftable/reftable.h                           |  564 ++++++++
 reftable/slice.c                              |  225 +++
 reftable/slice.h                              |   76 +
 reftable/stack.c                              | 1229 +++++++++++++++++
 reftable/stack.h                              |   45 +
 reftable/system.h                             |   54 +
 reftable/tree.c                               |   67 +
 reftable/tree.h                               |   34 +
 reftable/update.sh                            |   24 +
 reftable/writer.c                             |  661 +++++++++
 reftable/writer.h                             |   60 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0002-gitfile.sh                            |    2 +-
 t/t0031-reftable.sh                           |  109 ++
 t/t1400-update-ref.sh                         |   32 +-
 t/t1506-rev-parse-diagnosis.sh                |    2 +-
 t/t3210-pack-refs.sh                          |    6 +
 t/t6050-replace.sh                            |    2 +-
 t/t9020-remote-svn.sh                         |    4 +-
 t/test-lib.sh                                 |    5 +
 60 files changed, 9798 insertions(+), 57 deletions(-)
 create mode 100644 Documentation/technical/reftable.txt
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100755 t/t0031-reftable.sh


base-commit: b34789c0b0d3b137f0bb516b417bd8d75e0cb306
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v11
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v11
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v10:

  1:  d3d8ed2032c =  1:  dfa5fd74f85 refs.h: clarify reflog iteration order
  2:  45fd65f72e0 =  2:  340c5c415e1 Iterate over the "refs/" namespace in for_each_[raw]ref
  3:  bc89bcd9c8c <  -:  ----------- create .git/refs in files-backend.c
  4:  fd67cdff0cd =  3:  6553285043b refs: document how ref_iterator_advance_fn should handle symrefs
  5:  5373828f854 =  4:  7dc47c7756f Add .gitattributes for the reftable/ directory
  6:  8eeed29ebbf =  5:  06fcb49e903 reftable: file format documentation
  7:  5ebc272e23d =  6:  093fa74a3d0 reftable: define version 2 of the spec to accomodate SHA256
  8:  95aae041613 =  7:  6d9031372ce reftable: clarify how empty tables should be written
  9:  59209a5ad39 !  8:  6ee6c44752c Add reftable library
     @@ reftable/README.md (new)
      
       ## reftable/VERSION (new) ##
      @@
     -+commit a6d6b31bea256044621d789df64a8159647f21bc
     ++commit 3a486f79abcd17e88e3bca62f43188a7c8b80ff3
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Mon Apr 27 20:46:21 2020 +0200
     ++Date:   Mon May 4 19:20:21 2020 +0200
      +
     -+    C: prefix error codes with REFTABLE_
     ++    C: include "cache.h" in git-core sleep_millisec()
      
       ## reftable/basics.c (new) ##
      @@
     @@ reftable/basics.c (new)
      +{
      +	out[0] = (byte)((i >> 16) & 0xff);
      +	out[1] = (byte)((i >> 8) & 0xff);
     -+	out[2] = (byte)((i)&0xff);
     ++	out[2] = (byte)(i & 0xff);
      +}
      +
      +uint32_t get_be24(byte *in)
     @@ reftable/basics.c (new)
      +void put_be16(uint8_t *out, uint16_t i)
      +{
      +	out[0] = (uint8_t)((i >> 8) & 0xff);
     -+	out[1] = (uint8_t)((i)&0xff);
     ++	out[1] = (uint8_t)(i & 0xff);
      +}
      +
     -+int binsearch(int sz, int (*f)(int k, void *args), void *args)
     ++int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
      +{
     -+	int lo = 0;
     -+	int hi = sz;
     ++	size_t lo = 0;
     ++	size_t hi = sz;
      +
      +	/* invariant: (hi == sz) || f(hi) == true
      +	   (lo == 0 && f(0) == true) || fi(lo) == false
      +	 */
      +	while (hi - lo > 1) {
     -+		int mid = lo + (hi - lo) / 2;
     ++		size_t mid = lo + (hi - lo) / 2;
      +
      +		int val = f(mid, args);
      +		if (val) {
     @@ reftable/basics.c (new)
      +
      +const char *reftable_error_str(int err)
      +{
     ++	static char buf[250];
      +	switch (err) {
      +	case REFTABLE_IO_ERROR:
      +		return "I/O error";
     @@ reftable/basics.c (new)
      +		return "misuse of the reftable API";
      +	case REFTABLE_ZLIB_ERROR:
      +		return "zlib failure";
     ++	case REFTABLE_NAME_CONFLICT:
     ++		return "file/directory conflict";
     ++	case REFTABLE_REFNAME_ERROR:
     ++		return "invalid refname";
      +	case -1:
      +		return "general error";
      +	default:
     -+		return "unknown error code";
     ++		snprintf(buf, sizeof(buf), "unknown error code %d", err);
     ++		return buf;
      +	}
      +}
      +
     -+int reftable_error_to_errno(int err) {
     ++int reftable_error_to_errno(int err)
     ++{
      +	switch (err) {
      +	case REFTABLE_IO_ERROR:
      +		return EIO;
     @@ reftable/basics.c (new)
      +	}
      +}
      +
     -+
      +void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
      +void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
      +void (*reftable_free_ptr)(void *) = &free;
     @@ reftable/basics.h (new)
      +  find smallest index i in [0, sz) at which f(i) is true, assuming
      +  that f is ascending. Return sz if f(i) is false for all indices.
      +*/
     -+int binsearch(int sz, int (*f)(int k, void *args), void *args);
     ++int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);
      +
      +/*
      +  Frees a NULL terminated array of malloced strings. The array itself is also
     @@ reftable/block.c (new)
      +			memcpy(w->buf + block_header_skip, compressed.buf,
      +			       dest_len);
      +			w->next = dest_len + block_header_skip;
     ++			slice_clear(&compressed);
      +			break;
      +		}
      +	}
     @@ reftable/block.c (new)
      +	if (typ == BLOCK_TYPE_LOG) {
      +		struct slice uncompressed = { 0 };
      +		int block_header_skip = 4 + header_off;
     -+		uLongf dst_len = sz - block_header_skip;
     ++		uLongf dst_len = sz - block_header_skip; /* total size of dest
     ++							    buffer. */
      +		uLongf src_len = block->len - block_header_skip;
      +
     ++		/* Log blocks specify the *uncompressed* size in their header.
     ++		 */
      +		slice_resize(&uncompressed, sz);
     ++
     ++		/* Copy over the block header verbatim. It's not compressed. */
      +		memcpy(uncompressed.buf, block->data, block_header_skip);
      +
     ++		/* Uncompress */
      +		if (Z_OK != uncompress_return_consumed(
      +				    uncompressed.buf + block_header_skip,
      +				    &dst_len, block->data + block_header_skip,
     @@ reftable/block.c (new)
      +			return REFTABLE_ZLIB_ERROR;
      +		}
      +
     ++		if (dst_len + block_header_skip != sz) {
     ++			return REFTABLE_FORMAT_ERROR;
     ++		}
     ++
     ++		/* We're done with the input data. */
      +		block_source_return_block(block->source, block);
      +		block->data = uncompressed.buf;
     -+		block->len = dst_len; /* XXX: 4 bytes missing? */
     ++		block->len = sz;
      +		block->source = malloc_block_source();
      +		full_block_size = src_len + block_header_skip;
      +	} else if (full_block_size == 0) {
     @@ reftable/block.c (new)
      +	struct block_reader *r;
      +};
      +
     -+static int restart_key_less(int idx, void *args)
     ++static int restart_key_less(size_t idx, void *args)
      +{
      +	struct restart_find_args *a = (struct restart_find_args *)args;
      +	uint32_t off = block_reader_restart_offset(a->r, idx);
     @@ reftable/block.c (new)
      +	slice_copy(&dest->last_key, src->last_key);
      +}
      +
     -+/* return < 0 for error, 0 for OK, > 0 for EOF. */
      +int block_iter_next(struct block_iter *it, struct record rec)
      +{
      +	if (it->next_off >= it->br->block_len) {
     @@ reftable/block.h (new)
      +int block_reader_first_key(struct block_reader *br, struct slice *key);
      +
      +void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
     ++
     ++/* return < 0 for error, 0 for OK, > 0 for EOF. */
      +int block_iter_next(struct block_iter *it, struct record rec);
      +
      +/* Seek to `want` with in the block pointed to by `it` */
     @@ reftable/blocksource.h (new)
      +
      +#endif
      
     - ## reftable/bytes.c (new) ##
     -
     - ## reftable/config.h (new) ##
     -@@
     -+/* empty */
     -
       ## reftable/constants.h (new) ##
      @@
      +/*
     @@ reftable/constants.h (new)
      +
      +#endif
      
     - ## reftable/dump.c (new) ##
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#include "system.h"
     -+
     -+#include "reftable.h"
     -+
     -+static int dump_table(const char *tablename)
     -+{
     -+	struct block_source src = { 0 };
     -+	int err = block_source_from_file(&src, tablename);
     -+	if (err < 0) {
     -+		return err;
     -+	}
     -+
     -+	struct reader *r = NULL;
     -+	err = new_reader(&r, src, tablename);
     -+	if (err < 0) {
     -+		return err;
     -+	}
     -+
     -+	{
     -+		struct iterator it = { 0 };
     -+		err = reader_seek_ref(r, &it, "");
     -+		if (err < 0) {
     -+			return err;
     -+		}
     -+
     -+		struct ref_record ref = { 0 };
     -+		while (1) {
     -+			err = iterator_next_ref(it, &ref);
     -+			if (err > 0) {
     -+				break;
     -+			}
     -+			if (err < 0) {
     -+				return err;
     -+			}
     -+			ref_record_print(&ref, 20);
     -+		}
     -+		iterator_destroy(&it);
     -+		ref_record_clear(&ref);
     -+	}
     -+
     -+	{
     -+		struct iterator it = { 0 };
     -+		err = reader_seek_log(r, &it, "");
     -+		if (err < 0) {
     -+			return err;
     -+		}
     -+		struct log_record log = { 0 };
     -+		while (1) {
     -+			err = iterator_next_log(it, &log);
     -+			if (err > 0) {
     -+				break;
     -+			}
     -+			if (err < 0) {
     -+				return err;
     -+			}
     -+			log_record_print(&log, 20);
     -+		}
     -+		iterator_destroy(&it);
     -+		log_record_clear(&log);
     -+	}
     -+	return 0;
     -+}
     -+
     -+int main(int argc, char *argv[])
     -+{
     -+	int opt;
     -+	const char *table = NULL;
     -+	while ((opt = getopt(argc, argv, "t:")) != -1) {
     -+		switch (opt) {
     -+		case 't':
     -+			table = xstrdup(optarg);
     -+			break;
     -+		case '?':
     -+			printf("usage: %s [-table tablefile]\n", argv[0]);
     -+			return 2;
     -+			break;
     -+		}
     -+	}
     -+
     -+	if (table != NULL) {
     -+		int err = dump_table(table);
     -+		if (err < 0) {
     -+			fprintf(stderr, "%s: %s: %s\n", argv[0], table,
     -+				error_str(err));
     -+			return 1;
     -+		}
     -+	}
     -+	return 0;
     -+}
     -
       ## reftable/file.c (new) ##
      @@
      +/*
     @@ reftable/file.c (new)
      +		p->size = st.st_size;
      +		p->fd = fd;
      +
     ++		assert(bs->ops == NULL);
      +		bs->ops = &file_vtable;
      +		bs->arg = p;
      +	}
      +	return 0;
      +}
      +
     -+int reftable_fd_write(void *arg, byte *data, int sz)
     ++int reftable_fd_write(void *arg, byte *data, size_t sz)
      +{
      +	int *fdp = (int *)arg;
      +	return write(*fdp, data, sz);
     @@ reftable/iter.c (new)
      +
      +void iterator_set_empty(struct reftable_iterator *it)
      +{
     ++	assert(it->ops == NULL);
      +	it->iter_arg = NULL;
      +	it->ops = &empty_vtable;
      +}
     @@ reftable/iter.c (new)
      +		(struct filtering_ref_iterator *)iter_arg;
      +	struct reftable_ref_record *ref =
      +		(struct reftable_ref_record *)rec.data;
     -+
     ++	int err = 0;
      +	while (true) {
     -+		int err = reftable_iterator_next_ref(fri->it, ref);
     ++		err = reftable_iterator_next_ref(fri->it, ref);
      +		if (err != 0) {
     -+			return err;
     ++			break;
      +		}
      +
      +		if (fri->double_check) {
      +			struct reftable_iterator it = { 0 };
      +
     -+			int err = reftable_reader_seek_ref(fri->r, &it,
     -+							   ref->ref_name);
     ++			err = reftable_reader_seek_ref(fri->r, &it,
     ++						       ref->ref_name);
      +			if (err == 0) {
      +				err = reftable_iterator_next_ref(it, ref);
      +			}
     @@ reftable/iter.c (new)
      +			reftable_iterator_destroy(&it);
      +
      +			if (err < 0) {
     -+				return err;
     ++				break;
      +			}
      +
      +			if (err > 0) {
     @@ reftable/iter.c (new)
      +			return 0;
      +		}
      +	}
     ++
     ++	reftable_ref_record_clear(ref);
     ++	return err;
      +}
      +
      +struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
     @@ reftable/iter.c (new)
      +void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
      +					  struct filtering_ref_iterator *fri)
      +{
     ++	assert(it->ops == NULL);
      +	it->iter_arg = fri;
      +	it->ops = &filtering_ref_iterator_vtable;
      +}
     @@ reftable/iter.c (new)
      +void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
      +					  struct indexed_table_ref_iter *itr)
      +{
     ++	assert(it->ops == NULL);
      +	it->iter_arg = itr;
      +	it->ops = &indexed_table_ref_iter_vtable;
      +}
     @@ reftable/iter.h (new)
      +
      +void iterator_set_empty(struct reftable_iterator *it);
      +int iterator_next(struct reftable_iterator it, struct record rec);
     ++
     ++/* Returns true for a zeroed out iterator, such as the one returned from
     ++   iterator_destroy. */
      +bool iterator_is_null(struct reftable_iterator it);
      +
      +/* iterator that produces only ref records that point to `oid` */
     @@ reftable/merged.c (new)
      +	reftable_free(mi->stack);
      +}
      +
     -+static int merged_iter_advance_subiter(struct merged_iter *mi, int idx)
     ++static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
      +{
      +	if (iterator_is_null(mi->stack[idx])) {
      +		return 0;
     @@ reftable/merged.c (new)
      +static void iterator_from_merged_iter(struct reftable_iterator *it,
      +				      struct merged_iter *mi)
      +{
     ++	assert(it->ops == NULL);
      +	it->iter_arg = mi;
      +	it->ops = &merged_iter_vtable;
      +}
     @@ reftable/merged.c (new)
      +	return mt->min;
      +}
      +
     -+static int merged_table_seek_record(struct reftable_merged_table *mt,
     -+				    struct reftable_iterator *it,
     -+				    struct record rec)
     ++int merged_table_seek_record(struct reftable_merged_table *mt,
     ++			     struct reftable_iterator *it, struct record rec)
      +{
      +	struct reftable_iterator *iters = reftable_calloc(
      +		sizeof(struct reftable_iterator) * mt->stack_len);
     @@ reftable/merged.h (new)
      +};
      +
      +void merged_table_clear(struct reftable_merged_table *mt);
     ++int merged_table_seek_record(struct reftable_merged_table *mt,
     ++			     struct reftable_iterator *it, struct record rec);
      +
      +#endif
      
     @@ reftable/reader.c (new)
      +static void iterator_from_table_iter(struct reftable_iterator *it,
      +				     struct table_iter *ti)
      +{
     ++	assert(it->ops == NULL);
      +	it->iter_arg = ti;
      +	it->ops = &table_iter_vtable;
      +}
     @@ reftable/reader.c (new)
      +	if (err == 0) {
      +		*p = rd;
      +	} else {
     ++		block_source_close(&src);
      +		reftable_free(rd);
      +	}
      +	return err;
     @@ reftable/reader.c (new)
      +	struct record got_rec = { 0 };
      +	int err = 0;
      +
     ++	/* Look through the reverse index. */
      +	record_from_obj(&want_rec, &want);
     -+
      +	err = reader_seek(r, &oit, want_rec);
      +	if (err != 0) {
     -+		return err;
     ++		goto exit;
      +	}
      +
     ++	/* read out the obj_record */
      +	record_from_obj(&got_rec, &got);
      +	err = iterator_next(oit, got_rec);
     -+	reftable_iterator_destroy(&oit);
      +	if (err < 0) {
     -+		return err;
     ++		goto exit;
      +	}
      +
      +	if (err > 0 ||
      +	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
     ++		/* didn't find it; return empty iterator */
      +		iterator_set_empty(it);
     -+		return 0;
     ++		err = 0;
     ++		goto exit;
      +	}
      +
      +	{
     @@ reftable/reader.c (new)
      +						 hash_size(r->hash_id),
      +						 got.offsets, got.offset_len);
      +		if (err < 0) {
     -+			record_clear(got_rec);
     -+			return err;
     ++			goto exit;
      +		}
      +		got.offsets = NULL;
     -+		record_clear(got_rec);
     -+
      +		iterator_from_indexed_table_ref_iter(it, itr);
      +	}
      +
     -+	return 0;
     ++exit:
     ++	reftable_iterator_destroy(&oit);
     ++	record_clear(got_rec);
     ++	return err;
      +}
      +
      +static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
     @@ reftable/record.c (new)
      +	memcpy(r->message, dest.buf, dest.len);
      +	r->message[dest.len] = 0;
      +
     ++	slice_clear(&dest);
      +	return start.len - in.len;
      +
      +error:
     @@ reftable/record.c (new)
      +
      +struct record new_record(byte typ)
      +{
     -+	struct record rec;
     ++	struct record rec = { NULL };
      +	switch (typ) {
      +	case BLOCK_TYPE_REF: {
      +		struct reftable_ref_record *r =
     @@ reftable/record.c (new)
      +
      +void record_clear(struct record rec)
      +{
     -+	return rec.ops->clear(rec.data);
     ++	rec.ops->clear(rec.data);
      +}
      +
      +bool record_is_deletion(struct record rec)
     @@ reftable/record.c (new)
      +
      +void record_from_ref(struct record *rec, struct reftable_ref_record *ref_rec)
      +{
     ++	assert(rec->ops == NULL);
      +	rec->data = ref_rec;
      +	rec->ops = &reftable_ref_record_vtable;
      +}
      +
      +void record_from_obj(struct record *rec, struct obj_record *obj_rec)
      +{
     ++	assert(rec->ops == NULL);
      +	rec->data = obj_rec;
      +	rec->ops = &obj_record_vtable;
      +}
      +
      +void record_from_index(struct record *rec, struct index_record *index_rec)
      +{
     ++	assert(rec->ops == NULL);
      +	rec->data = index_rec;
      +	rec->ops = &index_record_vtable;
      +}
      +
      +void record_from_log(struct record *rec, struct reftable_log_record *log_rec)
      +{
     ++	assert(rec->ops == NULL);
      +	rec->data = log_rec;
      +	rec->ops = &reftable_log_record_vtable;
      +}
     @@ reftable/record.h (new)
      +
      +#endif
      
     + ## reftable/refname.c (new) ##
     +@@
     ++/*
     ++  Copyright 2020 Google LLC
     ++
     ++  Use of this source code is governed by a BSD-style
     ++  license that can be found in the LICENSE file or at
     ++  https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "system.h"
     ++#include "reftable.h"
     ++#include "basics.h"
     ++#include "refname.h"
     ++#include "slice.h"
     ++
     ++struct find_arg {
     ++	char **names;
     ++	const char *want;
     ++};
     ++
     ++static int find_name(size_t k, void *arg)
     ++{
     ++	struct find_arg *f_arg = (struct find_arg *)arg;
     ++
     ++	return strcmp(f_arg->names[k], f_arg->want) >= 0;
     ++}
     ++
     ++int modification_has_ref(struct modification *mod, const char *name)
     ++{
     ++	struct reftable_ref_record ref = { 0 };
     ++	int err = 0;
     ++
     ++	if (mod->add_len > 0) {
     ++		struct find_arg arg = {
     ++			.names = mod->add,
     ++			.want = name,
     ++		};
     ++		int idx = binsearch(mod->add_len, find_name, &arg);
     ++		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
     ++			return 0;
     ++		}
     ++	}
     ++
     ++	if (mod->del_len > 0) {
     ++		struct find_arg arg = {
     ++			.names = mod->del,
     ++			.want = name,
     ++		};
     ++		int idx = binsearch(mod->del_len, find_name, &arg);
     ++		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
     ++			return 1;
     ++		}
     ++	}
     ++
     ++	err = reftable_table_read_ref(mod->tab, name, &ref);
     ++	reftable_ref_record_clear(&ref);
     ++	return err;
     ++}
     ++
     ++static void modification_clear(struct modification *mod)
     ++{
     ++	FREE_AND_NULL(mod->add);
     ++	FREE_AND_NULL(mod->del);
     ++	mod->add_len = 0;
     ++	mod->del_len = 0;
     ++}
     ++
     ++int modification_has_ref_with_prefix(struct modification *mod,
     ++				     const char *prefix)
     ++{
     ++	struct reftable_iterator it = { NULL };
     ++	struct reftable_ref_record ref = { NULL };
     ++	int err = 0;
     ++
     ++	if (mod->add_len > 0) {
     ++		struct find_arg arg = {
     ++			.names = mod->add,
     ++			.want = prefix,
     ++		};
     ++		int idx = binsearch(mod->add_len, find_name, &arg);
     ++		if (idx < mod->add_len &&
     ++		    !strncmp(prefix, mod->add[idx], strlen(prefix))) {
     ++			goto exit;
     ++		}
     ++	}
     ++
     ++	err = reftable_table_seek_ref(mod->tab, &it, prefix);
     ++	if (err) {
     ++		goto exit;
     ++	}
     ++
     ++	while (true) {
     ++		err = reftable_iterator_next_ref(it, &ref);
     ++		if (err) {
     ++			goto exit;
     ++		}
     ++
     ++		if (mod->del_len > 0) {
     ++			struct find_arg arg = {
     ++				.names = mod->del,
     ++				.want = ref.ref_name,
     ++			};
     ++			int idx = binsearch(mod->del_len, find_name, &arg);
     ++			if (idx < mod->del_len &&
     ++			    !strcmp(ref.ref_name, mod->del[idx])) {
     ++				continue;
     ++			}
     ++		}
     ++
     ++		if (strncmp(ref.ref_name, prefix, strlen(prefix))) {
     ++			err = 1;
     ++			goto exit;
     ++		}
     ++		err = 0;
     ++		goto exit;
     ++	}
     ++
     ++exit:
     ++	reftable_ref_record_clear(&ref);
     ++	reftable_iterator_destroy(&it);
     ++	return err;
     ++}
     ++
     ++int validate_ref_name(const char *name)
     ++{
     ++	while (true) {
     ++		char *next = strchr(name, '/');
     ++		if (!*name) {
     ++			return REFTABLE_REFNAME_ERROR;
     ++		}
     ++		if (!next) {
     ++			return 0;
     ++		}
     ++		if (next - name == 0 || (next - name == 1 && *name == '.') ||
     ++		    (next - name == 2 && name[0] == '.' && name[1] == '.'))
     ++			return REFTABLE_REFNAME_ERROR;
     ++		name = next + 1;
     ++	}
     ++	return 0;
     ++}
     ++
     ++int validate_ref_record_addition(struct reftable_table tab,
     ++				 struct reftable_ref_record *recs, size_t sz)
     ++{
     ++	struct modification mod = {
     ++		.tab = tab,
     ++		.add = reftable_calloc(sizeof(char *) * sz),
     ++		.del = reftable_calloc(sizeof(char *) * sz),
     ++	};
     ++	int i = 0;
     ++	int err = 0;
     ++	for (; i < sz; i++) {
     ++		if (reftable_ref_record_is_deletion(&recs[i])) {
     ++			mod.del[mod.del_len++] = recs[i].ref_name;
     ++		} else {
     ++			mod.add[mod.add_len++] = recs[i].ref_name;
     ++		}
     ++	}
     ++
     ++	err = modification_validate(&mod);
     ++	modification_clear(&mod);
     ++	return err;
     ++}
     ++
     ++static void slice_trim_component(struct slice *sl)
     ++{
     ++	while (sl->len > 0) {
     ++		bool is_slash = (sl->buf[sl->len - 1] == '/');
     ++		sl->len--;
     ++		if (is_slash)
     ++			break;
     ++	}
     ++}
     ++
     ++int modification_validate(struct modification *mod)
     ++{
     ++	struct slice slashed = { 0 };
     ++	int err = 0;
     ++	int i = 0;
     ++	for (; i < mod->add_len; i++) {
     ++		err = validate_ref_name(mod->add[i]);
     ++		if (err) {
     ++			goto exit;
     ++		}
     ++		slice_set_string(&slashed, mod->add[i]);
     ++		slice_append_string(&slashed, "/");
     ++
     ++		err = modification_has_ref_with_prefix(
     ++			mod, slice_as_string(&slashed));
     ++		if (err == 0) {
     ++			err = REFTABLE_NAME_CONFLICT;
     ++			goto exit;
     ++		}
     ++		if (err < 0) {
     ++			goto exit;
     ++		}
     ++
     ++		slice_set_string(&slashed, mod->add[i]);
     ++		while (slashed.len) {
     ++			slice_trim_component(&slashed);
     ++			err = modification_has_ref(mod,
     ++						   slice_as_string(&slashed));
     ++			if (err == 0) {
     ++				err = REFTABLE_NAME_CONFLICT;
     ++				goto exit;
     ++			}
     ++			if (err < 0) {
     ++				goto exit;
     ++			}
     ++		}
     ++	}
     ++	err = 0;
     ++exit:
     ++	slice_clear(&slashed);
     ++	return err;
     ++}
     +
     + ## reftable/refname.h (new) ##
     +@@
     ++/*
     ++  Copyright 2020 Google LLC
     ++
     ++  Use of this source code is governed by a BSD-style
     ++  license that can be found in the LICENSE file or at
     ++  https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++#ifndef REFNAME_H
     ++#define REFNAME_H
     ++
     ++struct modification {
     ++	struct reftable_table tab;
     ++
     ++	char **add;
     ++	size_t add_len;
     ++
     ++	char **del;
     ++	size_t del_len;
     ++};
     ++
     ++// -1 = error, 0 = found, 1 = not found
     ++int modification_has_ref(struct modification *mod, const char *name);
     ++
     ++// -1 = error, 0 = found, 1 = not found.
     ++int modification_has_ref_with_prefix(struct modification *mod,
     ++				     const char *prefix);
     ++
     ++// 0 = OK.
     ++int validate_ref_name(const char *name);
     ++
     ++int validate_ref_record_addition(struct reftable_table tab,
     ++				 struct reftable_ref_record *recs, size_t sz);
     ++
     ++int modification_validate(struct modification *mod);
     ++
     ++/* illegal name, or dir/file conflict */
     ++#define REFTABLE_REFNAME_ERROR -9
     ++
     ++#endif
     +
     + ## reftable/reftable.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "reftable.h"
     ++#include "record.h"
     ++#include "reader.h"
     ++#include "merged.h"
     ++
     ++struct reftable_table_vtable {
     ++	int (*seek)(void *tab, struct reftable_iterator *it, struct record);
     ++};
     ++
     ++static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
     ++				     struct record rec)
     ++{
     ++	return reader_seek((struct reftable_reader *)tab, it, rec);
     ++}
     ++
     ++static struct reftable_table_vtable reader_vtable = {
     ++	.seek = reftable_reader_seek_void,
     ++};
     ++
     ++static int reftable_merged_table_seek_void(void *tab,
     ++					   struct reftable_iterator *it,
     ++					   struct record rec)
     ++{
     ++	return merged_table_seek_record((struct reftable_merged_table *)tab, it,
     ++					rec);
     ++}
     ++
     ++static struct reftable_table_vtable merged_table_vtable = {
     ++	.seek = reftable_merged_table_seek_void,
     ++};
     ++
     ++int reftable_table_seek_ref(struct reftable_table tab,
     ++			    struct reftable_iterator *it, const char *name)
     ++{
     ++	struct reftable_ref_record ref = {
     ++		.ref_name = (char *)name,
     ++	};
     ++	struct record rec = { 0 };
     ++	record_from_ref(&rec, &ref);
     ++	return tab.ops->seek(tab.table_arg, it, rec);
     ++}
     ++
     ++void reftable_table_from_reader(struct reftable_table *tab,
     ++				struct reftable_reader *reader)
     ++{
     ++	assert(tab->ops == NULL);
     ++	tab->ops = &reader_vtable;
     ++	tab->table_arg = reader;
     ++}
     ++
     ++void reftable_table_from_merged_table(struct reftable_table *tab,
     ++				      struct reftable_merged_table *merged)
     ++{
     ++	assert(tab->ops == NULL);
     ++	tab->ops = &merged_table_vtable;
     ++	tab->table_arg = merged;
     ++}
     ++
     ++int reftable_table_read_ref(struct reftable_table tab, const char *name,
     ++			    struct reftable_ref_record *ref)
     ++{
     ++	struct reftable_iterator it = { 0 };
     ++	int err = reftable_table_seek_ref(tab, &it, name);
     ++	if (err) {
     ++		goto exit;
     ++	}
     ++
     ++	err = reftable_iterator_next_ref(it, ref);
     ++	if (err) {
     ++		goto exit;
     ++	}
     ++
     ++	if (strcmp(ref->ref_name, name) ||
     ++	    reftable_ref_record_is_deletion(ref)) {
     ++		reftable_ref_record_clear(ref);
     ++		err = 1;
     ++		goto exit;
     ++	}
     ++
     ++exit:
     ++	reftable_iterator_destroy(&it);
     ++	return err;
     ++}
     +
       ## reftable/reftable.h (new) ##
      @@
      +/*
     @@ reftable/reftable.h (new)
      +
      +	/* Wrote a table without blocks. */
      +	REFTABLE_EMPTY_TABLE_ERROR = -8,
     ++
     ++	/* Dir/file conflict. */
     ++	REFTABLE_NAME_CONFLICT = -9,
     ++
     ++	/* Illegal ref name. */
     ++	REFTABLE_REFNAME_ERROR = -10,
      +};
      +
      +/* convert the numeric error code to a string. The string should not be
     @@ reftable/reftable.h (new)
      +	 * Defaults to SHA1 if unset
      +	 */
      +	uint32_t hash_id;
     ++
     ++	/* boolean: do not check ref names for validity or dir/file conflicts.
     ++	 */
     ++	int skip_name_check;
      +};
      +
      +/* reftable_block_stats holds statistics for a single block type */
     @@ reftable/reftable.h (new)
      +
      +/* reftable_new_writer creates a new writer */
      +struct reftable_writer *
     -+reftable_new_writer(int (*writer_func)(void *, uint8_t *, int),
     ++reftable_new_writer(int (*writer_func)(void *, uint8_t *, size_t),
      +		    void *writer_arg, struct reftable_write_options *opts);
      +
      +/* write to a file descriptor. fdp should be an int* pointing to the fd. */
     -+int reftable_fd_write(void *fdp, uint8_t *data, int size);
     ++int reftable_fd_write(void *fdp, uint8_t *data, size_t size);
      +
      +/* Set the range of update indices for the records we will add.  When
      +   writing a table into a stack, the min should be at least
     @@ reftable/reftable.h (new)
      +struct reftable_reader;
      +
      +/* reftable_new_reader opens a reftable for reading. If successful, returns 0
     -+ * code and sets pp.  The name is used for creating a
     -+ * stack. Typically, it is the basename of the file.
     ++ * code and sets pp. The name is used for creating a stack. Typically, it is the
     ++ * basename of the file. The block source `src` is owned by the reader, and is
     ++ * closed on calling reftable_reader_destroy().
      + */
      +int reftable_new_reader(struct reftable_reader **pp,
     -+			struct reftable_block_source, const char *name);
     ++			struct reftable_block_source src, const char *name);
      +
      +/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
      +   in the table.  To seek to the start of the table, use name = "".
     @@ reftable/reftable.h (new)
      +void reftable_merged_table_free(struct reftable_merged_table *m);
      +
      +/****************************************************************
     ++ Generic tables
     ++
     ++ A unified API for reading tables, either merged tables, or single readers.
     ++ ****************************************************************/
     ++
     ++struct reftable_table {
     ++	struct reftable_table_vtable *ops;
     ++	void *table_arg;
     ++};
     ++
     ++int reftable_table_seek_ref(struct reftable_table tab,
     ++			    struct reftable_iterator *it, const char *name);
     ++
     ++void reftable_table_from_reader(struct reftable_table *tab,
     ++				struct reftable_reader *reader);
     ++void reftable_table_from_merged_table(struct reftable_table *tab,
     ++				      struct reftable_merged_table *table);
     ++
     ++/* convenience function to read a single ref. Returns < 0 for error, 0
     ++   for success, and 1 if ref not found. */
     ++int reftable_table_read_ref(struct reftable_table tab, const char *name,
     ++			    struct reftable_ref_record *ref);
     ++
     ++/****************************************************************
      + Mutable ref database
      +
      + The stack presents an interface to a mutable sequence of reftables.
     @@ reftable/reftable.h (new)
      +/* statistics on past compactions. */
      +struct reftable_compaction_stats {
      +	uint64_t bytes; /* total number of bytes written */
     ++	uint64_t entries_written; /* total number of entries written, including
     ++				     failures. */
      +	int attempts; /* how often we tried to compact */
      +	int failures; /* failures happen on concurrent updates */
      +};
     @@ reftable/slice.c (new)
      +	}
      +}
      +
     -+int slice_write(struct slice *b, byte *data, int sz)
     ++int slice_write(struct slice *b, byte *data, size_t sz)
      +{
      +	if (b->len + sz > b->cap) {
      +		int newcap = 2 * b->cap + 1;
     @@ reftable/slice.c (new)
      +	return sz;
      +}
      +
     -+int slice_write_void(void *b, byte *data, int sz)
     ++int slice_write_void(void *b, byte *data, size_t sz)
      +{
      +	return slice_write((struct slice *)b, data, sz);
      +}
     @@ reftable/slice.c (new)
      +void block_source_from_slice(struct reftable_block_source *bs,
      +			     struct slice *buf)
      +{
     ++	assert(bs->ops == NULL);
      +	bs->ops = &slice_vtable;
      +	bs->arg = buf;
      +}
     @@ reftable/slice.h (new)
      +int slice_compare(struct slice a, struct slice b);
      +
      +/* Append `data` to the `dest` slice.  */
     -+int slice_write(struct slice *dest, byte *data, int sz);
     ++int slice_write(struct slice *dest, byte *data, size_t sz);
      +
      +/* Append `add` to `dest. */
      +void slice_append(struct slice *dest, struct slice add);
      +
      +/* Like slice_write, but suitable for passing to reftable_new_writer
      + */
     -+int slice_write_void(void *b, byte *data, int sz);
     ++int slice_write_void(void *b, byte *data, size_t sz);
      +
      +/* Find the longest shared prefix size of `a` and `b` */
      +int common_prefix_size(struct slice a, struct slice b);
     @@ reftable/stack.c (new)
      +#include "system.h"
      +#include "merged.h"
      +#include "reader.h"
     ++#include "refname.h"
      +#include "reftable.h"
      +#include "writer.h"
      +
     @@ reftable/stack.c (new)
      +{
      +	struct reftable_stack *p =
      +		reftable_calloc(sizeof(struct reftable_stack));
     -+	struct slice list_file_name = {};
     ++	struct slice list_file_name = { 0 };
      +	int err = 0;
      +
      +	if (config.hash_id == 0) {
     @@ reftable/stack.c (new)
      +			}
      +		}
      +	}
     ++
      +exit:
      +	slice_clear(&table_path);
      +	{
     @@ reftable/stack.c (new)
      +		free_names(names);
      +		free_names(names_after);
      +
     -+		delay = delay + (delay * rand()) / RAND_MAX + 100;
     -+		usleep(delay);
     ++		delay = delay + (delay * rand()) / RAND_MAX + 1;
     ++		sleep_millisec(delay);
      +	}
      +
      +	return 0;
     @@ reftable/stack.c (new)
      +	int err = stack_try_add(st, write, arg);
      +	if (err < 0) {
      +		if (err == REFTABLE_LOCK_ERROR) {
     -+			// Ignore error return, we want to propagate
     -+			// REFTABLE_LOCK_ERROR.
     ++			/* Ignore error return, we want to propagate
     ++			   REFTABLE_LOCK_ERROR.
     ++			*/
      +			reftable_stack_reload(st);
      +		}
      +		return err;
     @@ reftable/stack.c (new)
      +void reftable_addition_close(struct reftable_addition *add)
      +{
      +	int i = 0;
     -+	struct slice nm = {};
     ++	struct slice nm = { 0 };
      +	for (i = 0; i < add->new_tables_len; i++) {
      +		slice_set_string(&nm, add->stack->list_file);
      +		slice_append_string(&nm, "/");
      +		slice_append_string(&nm, add->new_tables[i]);
      +		unlink(slice_as_string(&nm));
     -+
      +		reftable_free(add->new_tables[i]);
      +		add->new_tables[i] = NULL;
      +	}
     @@ reftable/stack.c (new)
      +
      +	free_names(add->names);
      +	add->names = NULL;
     ++	slice_clear(&nm);
      +}
      +
      +int reftable_addition_commit(struct reftable_addition *add)
     @@ reftable/stack.c (new)
      +	struct reftable_writer *wr = NULL;
      +	int err = 0;
      +	int tab_fd = 0;
     -+	uint64_t next_update_index = 0;
      +
      +	slice_resize(&next_name, 0);
     -+	format_name(&next_name, next_update_index, next_update_index);
     ++	format_name(&next_name, add->next_update_index, add->next_update_index);
      +
      +	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
      +	slice_append_string(&temp_tab_file_name, "/");
     @@ reftable/stack.c (new)
      +		goto exit;
      +	}
      +
     -+	if (wr->min_update_index < next_update_index) {
     ++	err = stack_check_addition(add->stack,
     ++				   slice_as_string(&temp_tab_file_name));
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	if (wr->min_update_index < add->next_update_index) {
      +		err = REFTABLE_API_ERROR;
      +		goto exit;
      +	}
     @@ reftable/stack.c (new)
      +	struct reftable_ref_record ref = { 0 };
      +	struct reftable_log_record log = { 0 };
      +
     ++	uint64_t entries = 0;
     ++
      +	int i = 0, j = 0;
      +	for (i = first, j = 0; i <= last; i++) {
      +		struct reftable_reader *t = st->merged->stack[i];
     @@ reftable/stack.c (new)
      +		if (err < 0) {
      +			break;
      +		}
     ++		entries++;
      +	}
     ++	reftable_iterator_destroy(&it);
      +
      +	err = reftable_merged_table_seek_log(mt, &it, "");
      +	if (err < 0) {
     @@ reftable/stack.c (new)
      +			continue;
      +		}
      +
     -+		/* XXX collect stats? */
     -+
      +		if (config != NULL && config->time > 0 &&
      +		    log.time < config->time) {
      +			continue;
     @@ reftable/stack.c (new)
      +		if (err < 0) {
      +			break;
      +		}
     ++		entries++;
      +	}
      +
      +exit:
     @@ reftable/stack.c (new)
      +		reftable_merged_table_free(mt);
      +	}
      +	reftable_ref_record_clear(&ref);
     -+
     ++	reftable_log_record_clear(&log);
     ++	st->stats.entries_written += entries;
      +	return err;
      +}
      +
     @@ reftable/stack.c (new)
      +		}
      +		goto exit;
      +	}
     ++	/* Don't want to write to the lock for now.  */
     ++	close(lock_file_fd);
     ++	lock_file_fd = 0;
     ++
      +	have_lock = true;
      +	err = stack_uptodate(st);
      +	if (err != 0) {
     @@ reftable/stack.c (new)
      +	}
      +	have_lock = false;
      +
     ++	/* Reload the stack before deleting. On windows, we can only delete the
     ++	   files after we closed them.
     ++	*/
     ++	err = reftable_stack_reload_maybe_reuse(st, first < last);
     ++
      +	{
      +		char **p = delete_on_success;
      +		while (*p) {
     @@ reftable/stack.c (new)
      +		}
      +	}
      +
     -+	err = reftable_stack_reload_maybe_reuse(st, first < last);
      +exit:
      +	free_names(delete_on_success);
      +	{
     @@ reftable/stack.c (new)
      +int fastlog2(uint64_t sz)
      +{
      +	int l = 0;
     -+	assert(sz > 0);
     ++	if (sz == 0) {
     ++		return 0;
     ++	}
      +	for (; sz; sz /= 2) {
      +		l++;
      +	}
     @@ reftable/stack.c (new)
      +	int next = 0;
      +	struct segment cur = { 0 };
      +	int i = 0;
     ++
     ++	if (n == 0) {
     ++		*seglen = 0;
     ++		return segs;
     ++	}
      +	for (i = 0; i < n; i++) {
      +		int log = fastlog2(sizes[i]);
      +		if (cur.log != log && cur.bytes > 0) {
     @@ reftable/stack.c (new)
      +		cur.end = i + 1;
      +		cur.bytes += sizes[i];
      +	}
     -+	if (next > 0) {
     -+		segs[next++] = cur;
     -+	}
     ++	segs[next++] = cur;
      +	*seglen = next;
      +	return segs;
      +}
     @@ reftable/stack.c (new)
      +	uint64_t *sizes =
      +		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
      +	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
     -+	int overhead = footer_size(version) + header_size(version) - 1;
     ++	int overhead = header_size(version) - 1;
      +	int i = 0;
      +	for (i = 0; i < st->merged->stack_len; i++) {
      +		sizes[i] = st->merged->stack[i]->size - overhead;
     @@ reftable/stack.c (new)
      +int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
      +			    struct reftable_ref_record *ref)
      +{
     ++	struct reftable_table tab = { NULL };
     ++	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
     ++	return reftable_table_read_ref(tab, refname, ref);
     ++}
     ++
     ++int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
     ++			    struct reftable_log_record *log)
     ++{
      +	struct reftable_iterator it = { 0 };
      +	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
     -+	int err = reftable_merged_table_seek_ref(mt, &it, refname);
     ++	int err = reftable_merged_table_seek_log(mt, &it, refname);
      +	if (err) {
      +		goto exit;
      +	}
      +
     -+	err = reftable_iterator_next_ref(it, ref);
     ++	err = reftable_iterator_next_log(it, log);
      +	if (err) {
      +		goto exit;
      +	}
      +
     -+	if (strcmp(ref->ref_name, refname) ||
     -+	    reftable_ref_record_is_deletion(ref)) {
     ++	if (strcmp(log->ref_name, refname) ||
     ++	    reftable_log_record_is_deletion(log)) {
      +		err = 1;
      +		goto exit;
      +	}
      +
      +exit:
     ++	if (err) {
     ++		reftable_log_record_clear(log);
     ++	}
      +	reftable_iterator_destroy(&it);
      +	return err;
      +}
      +
     -+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
     -+			    struct reftable_log_record *log)
     ++int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
      +{
     -+	struct reftable_iterator it = { 0 };
     -+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
     -+	int err = reftable_merged_table_seek_log(mt, &it, refname);
     -+	if (err) {
     ++	int err = 0;
     ++	struct reftable_block_source src = { 0 };
     ++	struct reftable_reader *rd = NULL;
     ++	struct reftable_table tab = { NULL };
     ++	struct reftable_ref_record *refs = NULL;
     ++	struct reftable_iterator it = { NULL };
     ++	int cap = 0;
     ++	int len = 0;
     ++	int i = 0;
     ++
     ++	if (st->config.skip_name_check) {
     ++		return 0;
     ++	}
     ++
     ++	err = reftable_block_source_from_file(&src, new_tab_name);
     ++	if (err < 0) {
      +		goto exit;
      +	}
      +
     -+	err = reftable_iterator_next_log(it, log);
     -+	if (err) {
     ++	err = reftable_new_reader(&rd, src, new_tab_name);
     ++	if (err < 0) {
      +		goto exit;
      +	}
      +
     -+	if (strcmp(log->ref_name, refname) ||
     -+	    reftable_log_record_is_deletion(log)) {
     -+		err = 1;
     ++	err = reftable_reader_seek_ref(rd, &it, "");
     ++	if (err > 0) {
     ++		err = 0;
      +		goto exit;
      +	}
     ++	if (err < 0) {
     ++		goto exit;
     ++	}
     ++
     ++	while (true) {
     ++		struct reftable_ref_record ref = { 0 };
     ++		err = reftable_iterator_next_ref(it, &ref);
     ++		if (err > 0) {
     ++			break;
     ++		}
     ++		if (err < 0) {
     ++			goto exit;
     ++		}
     ++
     ++		if (len >= cap) {
     ++			cap = 2 * cap + 1;
     ++			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
     ++		}
     ++
     ++		refs[len++] = ref;
     ++	}
     ++
     ++	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
     ++
     ++	err = validate_ref_record_addition(tab, refs, len);
     ++
     ++	for (i = 0; i < len; i++) {
     ++		reftable_ref_record_clear(&refs[i]);
     ++	}
      +
      +exit:
     ++	free(refs);
      +	reftable_iterator_destroy(&it);
     ++	reftable_reader_free(rd);
      +	return err;
      +}
      
     @@ reftable/stack.h (new)
      +			int first, int last,
      +			struct reftable_log_expiry_config *config);
      +int fastlog2(uint64_t sz);
     ++int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
      +
      +struct segment {
      +	int start, end;
     @@ reftable/system.h (new)
      +#if 1 /* REFTABLE_IN_GITCORE */
      +
      +#include "git-compat-util.h"
     ++#include "cache.h"
      +#include <zlib.h>
      +
      +#else
     @@ reftable/update.sh (new)
      +set -eu
      +
      +# Override this to import from somewhere else, say "../reftable".
     -+SRC=${SRC:-origin} BRANCH=${BRANCH:-origin/master}
     ++SRC=${SRC:-origin}
     ++BRANCH=${BRANCH:-master}
      +
     -+((git --git-dir reftable-repo/.git fetch ${SRC} && cd reftable-repo && git checkout ${BRANCH} ) ||
     ++((git --git-dir reftable-repo/.git fetch ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
      +   git clone https://github.com/google/reftable reftable-repo)
      +
      +cp reftable-repo/c/*.[ch] reftable/
      +cp reftable-repo/c/include/*.[ch] reftable/
      +cp reftable-repo/LICENSE reftable/
     -+git --git-dir reftable-repo/.git show --no-patch origin/master \
     ++git --git-dir reftable-repo/.git show --no-patch HEAD \
      +  > reftable/VERSION
      +
      +mv reftable/system.h reftable/system.h~
     @@ reftable/writer.c (new)
      +}
      +
      +struct reftable_writer *
     -+reftable_new_writer(int (*writer_func)(void *, byte *, int), void *writer_arg,
     -+		    struct reftable_write_options *opts)
     ++reftable_new_writer(int (*writer_func)(void *, byte *, size_t),
     ++		    void *writer_arg, struct reftable_write_options *opts)
      +{
      +	struct reftable_writer *wp =
      +		reftable_calloc(sizeof(struct reftable_writer));
     @@ reftable/writer.c (new)
      +	}
      +	w->pending_padding = 0;
      +	if (empty_table) {
     -+		// Empty tables need a header anyway.
     ++		/* Empty tables need a header anyway. */
      +		byte header[28];
      +		int n = writer_write_header(w, header);
      +		err = padded_write(w, header, n, 0);
     @@ reftable/writer.h (new)
      +#include "tree.h"
      +
      +struct reftable_writer {
     -+	int (*write)(void *, byte *, int);
     ++	int (*write)(void *, byte *, size_t);
      +	void *write_arg;
      +	int pending_padding;
      +	struct slice last_key;
 10:  be2371cd6e4 !  9:  d731e1669b7 Reftable support for git-core
     @@ Commit message
      
           * Resolve spots marked with XXX
      
     -     * Detect and prevent directory/file conflicts in naming.
     -
           * Support worktrees (t0002-gitfile "linked repo" testcase)
      
          Example use: see t/t0031-reftable.sh
     @@ Makefile: TEST_SHELL_PATH = $(SHELL_PATH)
       VCSSVN_LIB = vcs-svn/lib.a
      +REFTABLE_LIB = reftable/libreftable.a
       
     + GENERATED_H += config-list.h
       GENERATED_H += command-list.h
     - 
     -@@ Makefile: LIB_OBJS += rebase-interactive.o
     +@@ Makefile: LIB_OBJS += ref-filter.o
       LIB_OBJS += reflog-walk.o
       LIB_OBJS += refs.o
       LIB_OBJS += refs/files-backend.o
     @@ Makefile: THIRD_PARTY_SOURCES += compat/regex/%
       EXTLIBS =
       
       GIT_USER_AGENT = git/$(GIT_VERSION)
     -@@ Makefile: VCSSVN_OBJS += vcs-svn/fast_export.o
     +@@ Makefile: VCSSVN_OBJS += vcs-svn/sliding_window.o
       VCSSVN_OBJS += vcs-svn/svndiff.o
       VCSSVN_OBJS += vcs-svn/svndump.o
       
      +REFTABLE_OBJS += reftable/basics.o
      +REFTABLE_OBJS += reftable/block.o
     -+REFTABLE_OBJS += reftable/bytes.o
      +REFTABLE_OBJS += reftable/file.o
      +REFTABLE_OBJS += reftable/iter.o
      +REFTABLE_OBJS += reftable/merged.o
      +REFTABLE_OBJS += reftable/pq.o
      +REFTABLE_OBJS += reftable/reader.o
      +REFTABLE_OBJS += reftable/record.o
     ++REFTABLE_OBJS += reftable/refname.o
     ++REFTABLE_OBJS += reftable/reftable.o
      +REFTABLE_OBJS += reftable/slice.o
      +REFTABLE_OBJS += reftable/stack.o
      +REFTABLE_OBJS += reftable/tree.o
     @@ builtin/init-db.c: static int create_default_files(const char *template_path,
      +	reinit = (!access(path, R_OK) ||
      +		  readlink(path, junk, sizeof(junk) - 1) != -1);
      +
     -+        /*
     -+         * refs/heads is a file when using reftable. We can't reinitialize with
     -+         * a reftable because it will overwrite HEAD
     -+         */
     ++	/*
     ++	 * refs/heads is a file when using reftable. We can't reinitialize with
     ++	 * a reftable because it will overwrite HEAD
     ++	 */
      +	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
      +			      is_directory(git_path_buf(&buf, "refs/heads"))) {
      +		die("cannot switch ref storage format.");
     @@ builtin/init-db.c: static int create_default_files(const char *template_path,
       	/*
       	 * We need to create a "refs" dir in any case so that older
       	 * versions of git can tell that this is a repository.
     - 	 */
     --
     - 	if (refs_init_db(&err))
     - 		die("failed to set up refs db: %s", err.buf);
     - 
      @@ builtin/init-db.c: static int create_default_files(const char *template_path,
       	 * Create the default symlink from ".git/HEAD" to the "master"
       	 * branch, if it does not exist yet.
     @@ builtin/init-db.c: static int create_default_files(const char *template_path,
       	if (!reinit) {
       		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
       			exit(1);
     --	}
     --
     --	initialize_repository_version(fmt->hash_algo);
     -+	} 
     + 	}
       
     +-	initialize_repository_version(fmt->hash_algo);
      +	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
     -+        
     + 
       	/* Check filemode trustability */
       	path = git_path_buf(&buf, "config");
     - 	filemode = TEST_FILEMODE;
      @@ builtin/init-db.c: static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
       }
       
     @@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
       		BUG("attempting to get main_ref_store outside of repository");
       
      -	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
     -+	r->refs_private = ref_store_init(
     -+                r->gitdir,
     -+                r->ref_storage_format ? r->ref_storage_format : DEFAULT_REF_STORAGE,
     -+                REF_STORE_ALL_CAPS);
     ++	r->refs_private = ref_store_init(r->gitdir,
     ++					 r->ref_storage_format ?
     ++						 r->ref_storage_format :
     ++						 DEFAULT_REF_STORAGE,
     ++					 REF_STORE_ALL_CAPS);
       	return r->refs_private;
       }
       
     @@ refs/reftable-backend.c (new)
      +	unsigned int store_flags;
      +
      +	int err;
     -+        char *repo_dir;
     ++	char *repo_dir;
      +	char *reftable_dir;
      +	struct reftable_stack *stack;
      +};
     @@ refs/reftable-backend.c (new)
      +
      +	base_ref_store_init(ref_store, &refs_be_reftable);
      +	refs->store_flags = store_flags;
     -+        refs->repo_dir = xstrdup(path);
     ++	refs->repo_dir = xstrdup(path);
      +	strbuf_addf(&sb, "%s/reftable", path);
      +	refs->reftable_dir = xstrdup(sb.buf);
      +	strbuf_reset(&sb);
     @@ refs/reftable-backend.c (new)
      +	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
      +	write_file(sb.buf, "this repository uses the reftable format");
      +
     -+        return 0;
     ++	return 0;
      +}
      +
      +struct git_reftable_iterator {
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +	err = reftable_stack_read_ref(refs->stack, refname, &ref);
     -+        if (err > 0) {
     -+                errno = ENOENT;
     -+                err = -1;
     -+                goto exit;
     -+        }
     ++	if (err > 0) {
     ++		errno = ENOENT;
     ++		err = -1;
     ++		goto exit;
     ++	}
      +	if (err < 0) {
     -+                errno = reftable_error_to_errno(err);
     -+                err = -1;
     ++		errno = reftable_error_to_errno(err);
     ++		err = -1;
      +		goto exit;
      +	}
      +	if (ref.target != NULL) {
     @@ refs/reftable-backend.c (new)
      +	} else if (ref.value != NULL) {
      +		hashcpy(oid->hash, ref.value);
      +	} else {
     -+                *type |= REF_ISBROKEN;
     -+                errno = EINVAL;
     -+                err = -1;
     -+        }
     ++		*type |= REF_ISBROKEN;
     ++		errno = EINVAL;
     ++		err = -1;
     ++	}
      +exit:
      +	reftable_ref_record_clear(&ref);
      +	return err;
     @@ refs/reftable-backend.c (new)
      +	reftable_reflog_expire
      +};
      
     - ## reftable/update.sh ##
     -@@ reftable/update.sh: SRC=${SRC:-origin} BRANCH=${BRANCH:-origin/master}
     - cp reftable-repo/c/*.[ch] reftable/
     - cp reftable-repo/c/include/*.[ch] reftable/
     - cp reftable-repo/LICENSE reftable/
     --git --git-dir reftable-repo/.git show --no-patch origin/master \
     -+git --git-dir reftable-repo/.git show --no-patch ${BRANCH} \
     -   > reftable/VERSION
     - 
     - mv reftable/system.h reftable/system.h~
     -
       ## repository.c ##
      @@ repository.c: int repo_init(struct repository *repo,
       	if (worktree)
     @@ t/t0031-reftable.sh (new)
      +	git checkout -b master
      +'
      +
     ++
     ++test_expect_success 'dir/file conflict' '
     ++	initialize &&
     ++	test_commit file &&
     ++	! git branch master/forbidden
     ++'
     ++
     ++
      +test_expect_success 'do not clobber existing repo' '
      +	rm -rf .git &&
      +	git init --ref-storage=files &&
     @@ t/t0031-reftable.sh (new)
      +	test_cmp expect actual
      +'
      +
     ++
      +test_done
     ++
  -:  ----------- > 10:  513d585f0f8 vcxproj: adjust for the reftable changes
 11:  8af67b85e49 ! 11:  846fe29fa4b Add some reftable testing infrastructure
     @@ builtin/init-db.c: static const char *const init_db_usage[] = {
      
       ## refs.c ##
      @@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
     - 	if (!r->gitdir)
     - 		BUG("attempting to get main_ref_store outside of repository");
     - 
     --	r->refs_private = ref_store_init(
     --                r->gitdir,
     --                r->ref_storage_format ? r->ref_storage_format : DEFAULT_REF_STORAGE,
     --                REF_STORE_ALL_CAPS);
     -+	r->refs_private = ref_store_init(r->gitdir,
     -+					 r->ref_storage_format ?
     -+						 r->ref_storage_format :
     + 	r->refs_private = ref_store_init(r->gitdir,
     + 					 r->ref_storage_format ?
     + 						 r->ref_storage_format :
     +-						 DEFAULT_REF_STORAGE,
      +						 default_ref_storage(),
     -+					 REF_STORE_ALL_CAPS);
     + 					 REF_STORE_ALL_CAPS);
       	return r->refs_private;
       }
     - 
      @@ refs.c: struct ref_store *get_submodule_ref_store(const char *submodule)
       		goto done;
       
 12:  b02b83a92ad = 12:  a7d1d2e721c t: use update-ref and show-ref to reading/writing refs

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v11 01/12] refs.h: clarify reflog iteration order
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-05-04 19:03                     ` Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
                                       ` (12 subsequent siblings)
  13 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/refs.h b/refs.h
index a92d2c74c83..9421c5b8465 100644
--- a/refs.h
+++ b/refs.h
@@ -432,18 +432,21 @@ int delete_refs(const char *msg, struct string_list *refnames,
 int refs_delete_reflog(struct ref_store *refs, const char *refname);
 int delete_reflog(const char *refname);
 
-/* iterate over reflog entries */
+/* Iterate over reflog entries. */
 typedef int each_reflog_ent_fn(
 		struct object_id *old_oid, struct object_id *new_oid,
 		const char *committer, timestamp_t timestamp,
 		int tz, const char *msg, void *cb_data);
 
+/* Iterate in over reflog entries, oldest entry first. */
 int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
 			     each_reflog_ent_fn fn, void *cb_data);
 int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+/* Call a function for each reflog entry, oldest entry first. */
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v11 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-05-04 19:03                     ` Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 03/12] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
                                       ` (11 subsequent siblings)
  13 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 224ff66c7bb..4db27379661 100644
--- a/refs.c
+++ b/refs.c
@@ -1525,7 +1525,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1585,8 +1585,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v11 03/12] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-05-04 19:03                     ` Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 04/12] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                       ` (10 subsequent siblings)
  13 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb7..3490aac3a40 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
 
 /* Virtual function declarations for ref_iterators: */
 
+/*
+ * backend-specific implementation of ref_iterator_advance.
+ * For symrefs, the function should set REF_ISSYMREF, and it should also
+ * dereference the symref to provide the OID referent.
+ */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v11 04/12] Add .gitattributes for the reftable/ directory
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                       ` (2 preceding siblings ...)
  2020-05-04 19:03                     ` [PATCH v11 03/12] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-05-04 19:03                     ` Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 05/12] reftable: file format documentation Jonathan Nieder via GitGitGadget
                                       ` (9 subsequent siblings)
  13 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 00000000000..f44451a3795
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v11 05/12] reftable: file format documentation
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                       ` (3 preceding siblings ...)
  2020-05-04 19:03                     ` [PATCH v11 04/12] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-05-04 19:03                     ` Jonathan Nieder via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 06/12] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
                                       ` (8 subsequent siblings)
  13 siblings, 0 replies; 409+ messages in thread
From: Jonathan Nieder via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder

From: Jonathan Nieder <jrnieder@gmail.com>

Shawn Pearce explains:

Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:

- Near constant time lookup for any single reference, even when the
  repository is cold and not in process or kernel cache.
- Near constant time verification a SHA-1 is referred to by at least
  one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.

This file format spec was originally written in July, 2017 by Shawn
Pearce.  Some refinements since then were made by Shawn and by Han-Wen
Nienhuys based on experiences implementing and experimenting with the
format.  (All of this was in the context of our work at Google and
Google is happy to contribute the result to the Git project.)

Imported from JGit[1]'s current version (c217d33ff,
"Documentation/technical/reftable: improve repo layout", 2020-02-04)
of Documentation/technical/reftable.md and converted to asciidoc by
running

  pandoc -t asciidoc -f markdown reftable.md >reftable.txt

using pandoc 2.2.1.  The result required the following additional
minor changes:

- removed the [TOC] directive to add a table of contents, since
  asciidoc does not support it
- replaced git-scm.com/docs links with linkgit: directives that link
  to other pages within Git's documentation

[1] https://eclipse.googlesource.com/jgit/jgit

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/Makefile               |    1 +
 Documentation/technical/reftable.txt | 1067 ++++++++++++++++++++++++++
 2 files changed, 1068 insertions(+)
 create mode 100644 Documentation/technical/reftable.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 15d9d04f316..ecd0b340b1c 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -93,6 +93,7 @@ TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
+TECH_DOCS += technical/reftable
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/signature-format
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
new file mode 100644
index 00000000000..9fa4657d9ff
--- /dev/null
+++ b/Documentation/technical/reftable.txt
@@ -0,0 +1,1067 @@
+reftable
+--------
+
+Overview
+~~~~~~~~
+
+Problem statement
+^^^^^^^^^^^^^^^^^
+
+Some repositories contain a lot of references (e.g. android at 866k,
+rails at 31k). The existing packed-refs format takes up a lot of space
+(e.g. 62M), and does not scale with additional references. Lookup of a
+single reference requires linearly scanning the file.
+
+Atomic pushes modifying multiple references require copying the entire
+packed-refs file, which can be a considerable amount of data moved
+(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
+
+Repositories with many loose references occupy a large number of disk
+blocks from the local file system, as each reference is its own file
+storing 41 bytes (and another file for the corresponding reflog). This
+negatively affects the number of inodes available when a large number of
+repositories are stored on the same filesystem. Readers can be penalized
+due to the larger number of syscalls required to traverse and read the
+`$GIT_DIR/refs` directory.
+
+Objectives
+^^^^^^^^^^
+
+* Near constant time lookup for any single reference, even when the
+repository is cold and not in process or kernel cache.
+* Near constant time verification if a SHA-1 is referred to by at least
+one reference (for allow-tip-sha1-in-want).
+* Efficient lookup of an entire namespace, such as `refs/tags/`.
+* Support atomic push with `O(size_of_update)` operations.
+* Combine reflog storage with ref storage for small transactions.
+* Separate reflog storage for base refs and historical logs.
+
+Description
+^^^^^^^^^^^
+
+A reftable file is a portable binary file format customized for
+reference storage. References are sorted, enabling linear scans, binary
+search lookup, and range scans.
+
+Storage in the file is organized into variable sized blocks. Prefix
+compression is used within a single block to reduce disk space. Block
+size and alignment is tunable by the writer.
+
+Performance
+^^^^^^^^^^^
+
+Space used, packed-refs vs. reftable:
+
+[cols=",>,>,>,>,>",options="header",]
+|===============================================================
+|repository |packed-refs |reftable |% original |avg ref |avg obj
+|android |62.2 M |36.1 M |58.0% |33 bytes |5 bytes
+|rails |1.8 M |1.1 M |57.7% |29 bytes |4 bytes
+|git |78.7 K |48.1 K |61.0% |50 bytes |4 bytes
+|git (heads) |332 b |269 b |81.0% |33 bytes |0 bytes
+|===============================================================
+
+Scan (read 866k refs), by reference name lookup (single ref from 866k
+refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):
+
+[cols=",>,>,>,>",options="header",]
+|=========================================================
+|format |cache |scan |by name |by SHA-1
+|packed-refs |cold |402 ms |409,660.1 usec |412,535.8 usec
+|packed-refs |hot | |6,844.6 usec |20,110.1 usec
+|reftable |cold |112 ms |33.9 usec |323.2 usec
+|reftable |hot | |20.2 usec |320.8 usec
+|=========================================================
+
+Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:
+
+[cols=",>,>",options="header",]
+|================================
+|format |size |avg entry
+|$GIT_DIR/logs |173 M |1209 bytes
+|reftable |5 M |37 bytes
+|================================
+
+Details
+~~~~~~~
+
+Peeling
+^^^^^^^
+
+References stored in a reftable are peeled, a record for an annotated
+(or signed) tag records both the tag object, and the object it refers
+to.
+
+Reference name encoding
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Reference names are an uninterpreted sequence of bytes that must pass
+linkgit:git-check-ref-format[1] as a valid reference name.
+
+Key unicity
+^^^^^^^^^^^
+
+Each entry must have a unique key; repeated keys are disallowed.
+
+Network byte order
+^^^^^^^^^^^^^^^^^^
+
+All multi-byte, fixed width fields are in network byte order.
+
+Ordering
+^^^^^^^^
+
+Blocks are lexicographically ordered by their first reference.
+
+Directory/file conflicts
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format accepts both `refs/heads/foo` and
+`refs/heads/foo/bar` as distinct references.
+
+This property is useful for retaining log records in reftable, but may
+confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
+references. Users of reftable may choose to continue to reject `foo` and
+`foo/bar` type conflicts to prevent problems for peers.
+
+File format
+~~~~~~~~~~~
+
+Structure
+^^^^^^^^^
+
+A reftable file has the following high-level structure:
+
+....
+first_block {
+  header
+  first_ref_block
+}
+ref_block*
+ref_index*
+obj_block*
+obj_index*
+log_block*
+log_index*
+footer
+....
+
+A log-only file omits the `ref_block`, `ref_index`, `obj_block` and
+`obj_index` sections, containing only the file header and log block:
+
+....
+first_block {
+  header
+}
+log_block*
+log_index*
+footer
+....
+
+in a log-only file the first log block immediately follows the file
+header, without padding to block alignment.
+
+Block size
+^^^^^^^^^^
+
+The file’s block size is arbitrarily determined by the writer, and does
+not have to be a power of 2. The block size must be larger than the
+longest reference name or log entry used in the repository, as
+references cannot span blocks.
+
+Powers of two that are friendly to the virtual memory system or
+filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
+yield better compression, with a possible increased cost incurred by
+readers during access.
+
+The largest block size is `16777215` bytes (15.99 MiB).
+
+Block alignment
+^^^^^^^^^^^^^^^
+
+Writers may choose to align blocks at multiples of the block size by
+including `padding` filled with NUL bytes at the end of a block to round
+out to the chosen alignment. When alignment is used, writers must
+specify the alignment with the file header’s `block_size` field.
+
+Block alignment is not required by the file format. Unaligned files must
+set `block_size = 0` in the file header, and omit `padding`. Unaligned
+files with more than one ref block must include the link:#Ref-index[ref
+index] to support fast lookup. Readers must be able to read both aligned
+and non-aligned files.
+
+Very small files (e.g. 1 only ref block) may omit `padding` and the ref
+index to reduce total file size.
+
+Header
+^^^^^^
+
+A 24-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+....
+
+Aligned files must specify `block_size` to configure readers with the
+expected block alignment. Unaligned files must set `block_size = 0`.
+
+The `min_update_index` and `max_update_index` describe bounds for the
+`update_index` field of all log records in this file. When reftables are
+used in a stack for link:#Update-transactions[transactions], these
+fields can order the files such that the prior file’s
+`max_update_index + 1` is the next file’s `min_update_index`.
+
+First ref block
+^^^^^^^^^^^^^^^
+
+The first ref block shares the same block as the file header, and is 24
+bytes smaller than all other blocks in the file. The first block
+immediately begins after the file header, at position 24.
+
+If the first block is a log block (a log-only file), its block header
+begins immediately at position 24.
+
+Ref block format
+^^^^^^^^^^^^^^^^
+
+A ref block is written as:
+
+....
+'r'
+uint24( block_len )
+ref_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
+encodes the number of bytes in the block up to, but not including the
+optional `padding`. This is always less than or equal to the file’s
+block size. In the first ref block, `block_len` includes 24 bytes for
+the file header.
+
+The 2-byte `restart_count` stores the number of entries in the
+`restart_offset` list, which must not be empty. Readers can use
+`restart_count` to binary search between restarts before starting a
+linear scan.
+
+Exactly `restart_count` 3-byte `restart_offset` values precedes the
+`restart_count`. Offsets are relative to the start of the block and
+refer to the first byte of any `ref_record` whose name has not been
+prefix compressed. Entries in the `restart_offset` list must be sorted,
+ascending. Readers can start linear scans from any of these records.
+
+A variable number of `ref_record` fill the middle of the block,
+describing reference names and values. The format is described below.
+
+As the first ref block shares the first file block with the file header,
+all `restart_offset` in the first block are relative to the start of the
+file (position 0), and include the file header. This forces the first
+`restart_offset` to be `28`.
+
+ref record
+++++++++++
+
+A `ref_record` describes a single reference, storing both the name and
+its value(s). Records are formatted as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | value_type )
+suffix
+varint( update_index_delta )
+value?
+....
+
+The `prefix_length` field specifies how many leading bytes of the prior
+reference record’s name should be copied to obtain this reference’s
+name. This must be 0 for the first reference in any block, and also must
+be 0 for any `ref_record` whose offset is listed in the `restart_offset`
+table at the end of the block.
+
+Recovering a reference name from any `ref_record` is a simple concat:
+
+....
+this_name = prior_name[0..prefix_length] + suffix
+....
+
+The `suffix_length` value provides the number of bytes available in
+`suffix` to copy from `suffix` to complete the reference name.
+
+The `update_index` that last modified the reference can be obtained by
+adding `update_index_delta` to the `min_update_index` from the file
+header: `min_update_index + update_index_delta`.
+
+The `value` follows. Its format is determined by `value_type`, one of
+the following:
+
+* `0x0`: deletion; no value data (see transactions, below)
+* `0x1`: one 20-byte object id; value of the ref
+* `0x2`: two 20-byte object ids; value of the ref, peeled target
+* `0x3`: symbolic reference: `varint( target_len ) target`
+
+Symbolic references use `0x3`, followed by the complete name of the
+reference target. No compression is applied to the target name.
+
+Types `0x4..0x7` are reserved for future use.
+
+Ref index
+^^^^^^^^^
+
+The ref index stores the name of the last reference from every ref block
+in the file, enabling reduced disk seeks for lookups. Any reference can
+be found by searching the index, identifying the containing block, and
+searching within that block.
+
+The index may be organized into a multi-level index, where the 1st level
+index block points to additional ref index blocks (2nd level), which may
+in turn point to either additional index blocks (e.g. 3rd level) or ref
+blocks (leaf level). Disk reads required to access a ref go up with
+higher index levels. Multi-level indexes may be required to ensure no
+single index block exceeds the file format’s max block size of
+`16777215` bytes (15.99 MiB). To acheive constant O(1) disk seeks for
+lookups the index must be a single level, which is permitted to exceed
+the file’s configured block size, but not the format’s max block size of
+15.99 MiB.
+
+If present, the ref index block(s) appears after the last ref block.
+
+If there are at least 4 ref blocks, a ref index block should be written
+to improve lookup times. Cold reads using the index require 2 disk reads
+(read index, read block), and binary searching < 4 blocks also requires
+<= 2 reads. Omitting the index block from smaller files saves space.
+
+If the file is unaligned and contains more than one ref block, the ref
+index must be written.
+
+Index block format:
+
+....
+'i'
+uint24( block_len )
+index_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+The index blocks begin with `block_type = 'i'` and a 3-byte `block_len`
+which encodes the number of bytes in the block, up to but not including
+the optional `padding`.
+
+The `restart_offset` and `restart_count` fields are identical in format,
+meaning and usage as in ref blocks.
+
+To reduce the number of reads required for random access in very large
+files the index block may be larger than other blocks. However, readers
+must hold the entire index in memory to benefit from this, so it’s a
+time-space tradeoff in both file size and reader memory.
+
+Increasing the file’s block size decreases the index size. Alternatively
+a multi-level index may be used, keeping index blocks within the file’s
+block size, but increasing the number of blocks that need to be
+accessed.
+
+index record
+++++++++++++
+
+An index record describes the last entry in another block. Index records
+are written as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | 0 )
+suffix
+varint( block_position )
+....
+
+Index records use prefix compression exactly like `ref_record`.
+
+Index records store `block_position` after the suffix, specifying the
+absolute position in bytes (from the start of the file) of the block
+that ends with this reference. Readers can seek to `block_position` to
+begin reading the block header.
+
+Readers must examine the block header at `block_position` to determine
+if the next block is another level index block, or the leaf-level ref
+block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the ref index must first read the footer (below) to
+obtain `ref_index_position`. If not present, the position will be 0. The
+`ref_index_position` is for the 1st level root of the ref index.
+
+Obj block format
+^^^^^^^^^^^^^^^^
+
+Object blocks are optional. Writers may choose to omit object blocks,
+especially if readers will not use the SHA-1 to ref mapping.
+
+Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
+ref blocks containing references pointing to that object directly, or as
+the peeled value of an annotated tag. Like ref blocks, object blocks use
+the file’s standard block size. The abbrevation length is available in
+the footer as `obj_id_len`.
+
+To save space in small files, object blocks may be omitted if the ref
+index is not present, as brute force search will only need to read a few
+ref blocks. When missing, readers should brute force a linear search of
+all references to lookup by SHA-1.
+
+An object block is written as:
+
+....
+'o'
+uint24( block_len )
+obj_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Fields are identical to ref block. Binary search using the restart table
+works the same as in reference blocks.
+
+Because object identifiers are abbreviated by writers to the shortest
+unique abbreviation within the reftable, obj key lengths are variable
+between 2 and 20 bytes. Readers must compare only for common prefix
+match within an obj block or obj index.
+
+obj record
+++++++++++
+
+An `obj_record` describes a single object abbreviation, and the blocks
+containing references using that unique abbreviation:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | cnt_3 )
+suffix
+varint( cnt_large )?
+varint( position_delta )*
+....
+
+Like in reference blocks, abbreviations are prefix compressed within an
+obj block. On large reftables with many unique objects, higher block
+sizes (64k), and higher restart interval (128), a `prefix_length` of 2
+or 3 and `suffix_length` of 3 may be common in obj records (unique
+abbreviation of 5-6 raw bytes, 10-12 hex digits).
+
+Each record contains `position_count` number of positions for matching
+ref blocks. For 1-7 positions the count is stored in `cnt_3`. When
+`cnt_3 = 0` the actual count follows in a varint, `cnt_large`.
+
+The use of `cnt_3` bets most objects are pointed to by only a single
+reference, some may be pointed to by a couple of references, and very
+few (if any) are pointed to by more than 7 references.
+
+A special case exists when `cnt_3 = 0` and `cnt_large = 0`: there are no
+`position_delta`, but at least one reference starts with this
+abbreviation. A reader that needs exact reference names must scan all
+references to find which specific references have the desired object.
+Writers should use this format when the `position_delta` list would have
+overflowed the file’s block size due to a high number of references
+pointing to the same object.
+
+The first `position_delta` is the position from the start of the file.
+Additional `position_delta` entries are sorted ascending and relative to
+the prior entry, e.g. a reader would perform:
+
+....
+pos = position_delta[0]
+prior = pos
+for (j = 1; j < position_count; j++) {
+  pos = prior + position_delta[j]
+  prior = pos
+}
+....
+
+With a position in hand, a reader must linearly scan the ref block,
+starting from the first `ref_record`, testing each reference’s SHA-1s
+(for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
+SHA-1 within a single ref block is not supported by the reftable format.
+Smaller block sizes reduce the number of candidates this step must
+consider.
+
+Obj index
+^^^^^^^^^
+
+The obj index stores the abbreviation from the last entry for every obj
+block in the file, enabling reduced disk seeks for all lookups. It is
+formatted exactly the same as the ref index, but refers to obj blocks.
+
+The obj index should be present if obj blocks are present, as obj blocks
+should only be written in larger files.
+
+Readers loading the obj index must first read the footer (below) to
+obtain `obj_index_position`. If not present, the position will be 0.
+
+Log block format
+^^^^^^^^^^^^^^^^
+
+Unlike ref and obj blocks, log blocks are always unaligned.
+
+Log blocks are variable in size, and do not match the `block_size`
+specified in the file header or footer. Writers should choose an
+appropriate buffer size to prepare a log block for deflation, such as
+`2 * block_size`.
+
+A log block is written as:
+
+....
+'g'
+uint24( block_len )
+zlib_deflate {
+  log_record+
+  uint24( restart_offset )+
+  uint16( restart_count )
+}
+....
+
+Log blocks look similar to ref blocks, except `block_type = 'g'`.
+
+The 4-byte block header is followed by the deflated block contents using
+zlib deflate. The `block_len` in the header is the inflated size
+(including 4-byte block header), and should be used by readers to
+preallocate the inflation output buffer. A log block’s `block_len` may
+exceed the file’s block size.
+
+Offsets within the log block (e.g. `restart_offset`) still include the
+4-byte header. Readers may prefer prefixing the inflation output buffer
+with the 4-byte header.
+
+Within the deflate container, a variable number of `log_record` describe
+reference changes. The log record format is described below. See ref
+block format (above) for a description of `restart_offset` and
+`restart_count`.
+
+Because log blocks have no alignment or padding between blocks, readers
+must keep track of the bytes consumed by the inflater to know where the
+next log block begins.
+
+log record
+++++++++++
+
+Log record keys are structured as:
+
+....
+ref_name '\0' reverse_int64( update_index )
+....
+
+where `update_index` is the unique transaction identifier. The
+`update_index` field must be unique within the scope of a `ref_name`.
+See the update transactions section below for further details.
+
+The `reverse_int64` function inverses the value so lexographical
+ordering the network byte order encoding sorts the more recent records
+with higher `update_index` values first:
+
+....
+reverse_int64(int64 t) {
+  return 0xffffffffffffffff - t;
+}
+....
+
+Log records have a similar starting structure to ref and index records,
+utilizing the same prefix compression scheme applied to the log record
+key described above.
+
+....
+    varint( prefix_length )
+    varint( (suffix_length << 3) | log_type )
+    suffix
+    log_data {
+      old_id
+      new_id
+      varint( name_length    )  name
+      varint( email_length   )  email
+      varint( time_seconds )
+      sint16( tz_offset )
+      varint( message_length )  message
+    }?
+....
+
+Log record entries use `log_type` to indicate what follows:
+
+* `0x0`: deletion; no log data.
+* `0x1`: standard git reflog data using `log_data` above.
+
+The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
+entry from the reflog of `refs/stash` in a transaction file (below),
+without needing to rewrite larger files. Readers reading a stack of
+reflogs must treat this as a deletion.
+
+For `log_type = 0x1`, the `log_data` section follows
+linkgit:git-update-ref[1] logging and includes:
+
+* two 20-byte SHA-1s (old id, new id)
+* varint string of committer’s name
+* varint string of committer’s email
+* varint time in seconds since epoch (Jan 1, 1970)
+* 2-byte timezone offset in minutes (signed)
+* varint string of message
+
+`tz_offset` is the absolute number of minutes from GMT the committer was
+at the time of the update. For example `GMT-0800` is encoded in reftable
+as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
+
+The committer email does not contain `<` or `>`, it’s the value normally
+found between the `<>` in a git commit object header.
+
+The `message_length` may be 0, in which case there was no message
+supplied for the update.
+
+Contrary to traditional reflog (which is a file), renames are encoded as
+a combination of ref deletion and ref creation.
+
+Reading the log
++++++++++++++++
+
+Readers accessing the log must first read the footer (below) to
+determine the `log_position`. The first block of the log begins at
+`log_position` bytes since the start of the file. The `log_position` is
+not block aligned.
+
+Importing logs
+++++++++++++++
+
+When importing from `$GIT_DIR/logs` writers should globally order all
+log records roughly by timestamp while preserving file order, and assign
+unique, increasing `update_index` values for each log line. Newer log
+records get higher `update_index` values.
+
+Although an import may write only a single reftable file, the reftable
+file must span many unique `update_index`, as each log line requires its
+own `update_index` to preserve semantics.
+
+Log index
+^^^^^^^^^
+
+The log index stores the log key
+(`refname \0 reverse_int64(update_index)`) for the last log record of
+every log block in the file, supporting bounded-time lookup.
+
+A log index block must be written if 2 or more log blocks are written to
+the file. If present, the log index appears after the last log block.
+There is no padding used to align the log index to block alignment.
+
+Log index format is identical to ref index, except the keys are 9 bytes
+longer to include `'\0'` and the 8-byte `reverse_int64(update_index)`.
+Records use `block_position` to refer to the start of a log block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the log index must first read the footer (below) to
+obtain `log_index_position`. If not present, the position will be 0.
+
+Footer
+^^^^^^
+
+After the last block of the file, a file footer is written. It begins
+like the file header, but is extended with additional data.
+
+A 68-byte footer appears at the end:
+
+....
+    'REFT'
+    uint8( version_number = 1 )
+    uint24( block_size )
+    uint64( min_update_index )
+    uint64( max_update_index )
+
+    uint64( ref_index_position )
+    uint64( (obj_position << 5) | obj_id_len )
+    uint64( obj_index_position )
+
+    uint64( log_position )
+    uint64( log_index_position )
+
+    uint32( CRC-32 of above )
+....
+
+If a section is missing (e.g. ref index) the corresponding position
+field (e.g. `ref_index_position`) will be 0.
+
+* `obj_position`: byte position for the first obj block.
+* `obj_id_len`: number of bytes used to abbreviate object identifiers in
+obj blocks.
+* `log_position`: byte position for the first log block.
+* `ref_index_position`: byte position for the start of the ref index.
+* `obj_index_position`: byte position for the start of the obj index.
+* `log_index_position`: byte position for the start of the log index.
+
+Reading the footer
+++++++++++++++++++
+
+Readers must seek to `file_length - 68` to access the footer. A trusted
+external source (such as `stat(2)`) is necessary to obtain
+`file_length`. When reading the footer, readers must verify:
+
+* 4-byte magic is correct
+* 1-byte version number is recognized
+* 4-byte CRC-32 matches the other 64 bytes (including magic, and
+version)
+
+Once verified, the other fields of the footer can be accessed.
+
+Varint encoding
+^^^^^^^^^^^^^^^
+
+Varint encoding is identical to the ofs-delta encoding method used
+within pack files.
+
+Decoder works such as:
+
+....
+val = buf[ptr] & 0x7f
+while (buf[ptr] & 0x80) {
+  ptr++
+  val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
+}
+....
+
+Binary search
+^^^^^^^^^^^^^
+
+Binary search within a block is supported by the `restart_offset` fields
+at the end of the block. Readers can binary search through the restart
+table to locate between which two restart points the sought reference or
+key should appear.
+
+Each record identified by a `restart_offset` stores the complete key in
+the `suffix` field of the record, making the compare operation during
+binary search straightforward.
+
+Once a restart point lexicographically before the sought reference has
+been identified, readers can linearly scan through the following record
+entries to locate the sought record, terminating if the current record
+sorts after (and therefore the sought key is not present).
+
+Restart point selection
++++++++++++++++++++++++
+
+Writers determine the restart points at file creation. The process is
+arbitrary, but every 16 or 64 records is recommended. Every 16 may be
+more suitable for smaller block sizes (4k or 8k), every 64 for larger
+block sizes (64k).
+
+More frequent restart points reduces prefix compression and increases
+space consumed by the restart table, both of which increase file size.
+
+Less frequent restart points makes prefix compression more effective,
+decreasing overall file size, with increased penalities for readers
+walking through more records after the binary search step.
+
+A maximum of `65535` restart points per block is supported.
+
+Considerations
+~~~~~~~~~~~~~~
+
+Lightweight refs dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format assumes the vast majority of references are single
+SHA-1 valued with common prefixes, such as Gerrit Code Review’s
+`refs/changes/` namespace, GitHub’s `refs/pulls/` namespace, or many
+lightweight tags in the `refs/tags/` namespace.
+
+Annotated tags storing the peeled object cost an additional 20 bytes per
+reference.
+
+Low overhead
+^^^^^^^^^^^^
+
+A reftable with very few references (e.g. git.git with 5 heads) is 269
+bytes for reftable, vs. 332 bytes for packed-refs. This supports
+reftable scaling down for transaction logs (below).
+
+Block size
+^^^^^^^^^^
+
+For a Gerrit Code Review type repository with many change refs, larger
+block sizes (64 KiB) and less frequent restart points (every 64) yield
+better compression due to more references within the block compressing
+against the prior reference.
+
+Larger block sizes reduce the index size, as the reftable will require
+fewer blocks to store the same number of references.
+
+Minimal disk seeks
+^^^^^^^^^^^^^^^^^^
+
+Assuming the index block has been loaded into memory, binary searching
+for any single reference requires exactly 1 disk seek to load the
+containing block.
+
+Scans and lookups dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Scanning all references and lookup by name (or namespace such as
+`refs/heads/`) are the most common activities performed on repositories.
+SHA-1s are stored directly with references to optimize this use case.
+
+Logs are infrequently read
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are infrequently accessed, but can be large. Deflating log blocks
+saves disk space, with some increased penalty at read time.
+
+Logs are stored in an isolated section from refs, reducing the burden on
+reference readers that want to ignore logs. Further, historical logs can
+be isolated into log-only files.
+
+Logs are read backwards
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are frequently accessed backwards (most recent N records for master
+to answer `master@{4}`), so log records are grouped by reference, and
+sorted descending by update index.
+
+Repository format
+~~~~~~~~~~~~~~~~~
+
+Version 1
+^^^^^^^^^
+
+A repository must set its `$GIT_DIR/config` to configure reftable:
+
+....
+[core]
+    repositoryformatversion = 1
+[extensions]
+    refStorage = reftable
+....
+
+Layout
+^^^^^^
+
+A collection of reftable files are stored in the `$GIT_DIR/reftable/`
+directory:
+
+....
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+where reftable files are named by a unique name such as produced by the
+function `${min_update_index}-${max_update_index}.ref`.
+
+Log-only files use the `.log` extension, while ref-only and mixed ref
+and log files use `.ref`. extension.
+
+The stack ordering file is `$GIT_DIR/reftable/tables.list` and lists the
+current files, one per line, in order, from oldest (base) to newest
+(most recent):
+
+....
+$ cat .git/reftable/tables.list
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+Readers must read `$GIT_DIR/reftable/tables.list` to determine which
+files are relevant right now, and search through the stack in reverse
+order (last reftable is examined first).
+
+Reftable files not listed in `tables.list` may be new (and about to be
+added to the stack by the active writer), or ancient and ready to be
+pruned.
+
+Backward compatibility
+^^^^^^^^^^^^^^^^^^^^^^
+
+Older clients should continue to recognize the directory as a git
+repository so they don’t look for an enclosing repository in parent
+directories. To this end, a reftable-enabled repository must contain the
+following dummy files
+
+* `.git/HEAD`, a regular file containing `ref: refs/heads/.invalid`.
+* `.git/refs/`, a directory
+* `.git/refs/heads`, a regular file
+
+Readers
+^^^^^^^
+
+Readers can obtain a consistent snapshot of the reference space by
+following:
+
+1.  Open and read the `tables.list` file.
+2.  Open each of the reftable files that it mentions.
+3.  If any of the files is missing, goto 1.
+4.  Read from the now-open files as long as necessary.
+
+Update transactions
+^^^^^^^^^^^^^^^^^^^
+
+Although reftables are immutable, mutations are supported by writing a
+new reftable and atomically appending it to the stack:
+
+1.  Acquire `tables.list.lock`.
+2.  Read `tables.list` to determine current reftables.
+3.  Select `update_index` to be most recent file’s
+`max_update_index + 1`.
+4.  Prepare temp reftable `tmp_XXXXXX`, including log entries.
+5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
+6.  Copy `tables.list` to `tables.list.lock`, appending file from (5).
+7.  Rename `tables.list.lock` to `tables.list`.
+
+During step 4 the new file’s `min_update_index` and `max_update_index`
+are both set to the `update_index` selected by step 3. All log records
+for the transaction use the same `update_index` in their keys. This
+enables later correlation of which references were updated by the same
+transaction.
+
+Because a single `tables.list.lock` file is used to manage locking, the
+repository is single-threaded for writers. Writers may have to busy-spin
+(with backoff) around creating `tables.list.lock`, for up to an
+acceptable wait period, aborting if the repository is too busy to
+mutate. Application servers wrapped around repositories (e.g. Gerrit
+Code Review) can layer their own lock/wait queue to improve fairness to
+writers.
+
+Reference deletions
+^^^^^^^^^^^^^^^^^^^
+
+Deletion of any reference can be explicitly stored by setting the `type`
+to `0x0` and omitting the `value` field of the `ref_record`. This serves
+as a tombstone, overriding any assertions about the existence of the
+reference from earlier files in the stack.
+
+Compaction
+^^^^^^^^^^
+
+A partial stack of reftables can be compacted by merging references
+using a straightforward merge join across reftables, selecting the most
+recent value for output, and omitting deleted references that do not
+appear in remaining, lower reftables.
+
+A compacted reftable should set its `min_update_index` to the smallest
+of the input files’ `min_update_index`, and its `max_update_index`
+likewise to the largest input `max_update_index`.
+
+For sake of illustration, assume the stack currently consists of
+reftable files (from oldest to newest): A, B, C, and D. The compactor is
+going to compact B and C, leaving A and D alone.
+
+1.  Obtain lock `tables.list.lock` and read the `tables.list` file.
+2.  Obtain locks `B.lock` and `C.lock`. Ownership of these locks
+prevents other processes from trying to compact these files.
+3.  Release `tables.list.lock`.
+4.  Compact `B` and `C` into a temp file
+`${min_update_index}-${max_update_index}_XXXXXX`.
+5.  Reacquire lock `tables.list.lock`.
+6.  Verify that `B` and `C` are still in the stack, in that order. This
+should always be the case, assuming that other processes are adhering to
+the locking protocol.
+7.  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
+`${min_update_index}-${max_update_index}.ref`.
+8.  Write the new stack to `tables.list.lock`, replacing `B` and `C`
+with the file from (4).
+9.  Rename `tables.list.lock` to `tables.list`.
+10. Delete `B` and `C`, perhaps after a short sleep to avoid forcing
+readers to backtrack.
+
+This strategy permits compactions to proceed independently of updates.
+
+Each reftable (compacted or not) is uniquely identified by its name, so
+open reftables can be cached by their name.
+
+Alternatives considered
+~~~~~~~~~~~~~~~~~~~~~~~
+
+bzip packed-refs
+^^^^^^^^^^^^^^^^
+
+`bzip2` can significantly shrink a large packed-refs file (e.g. 62 MiB
+compresses to 23 MiB, 37%). However the bzip format does not support
+random access to a single reference. Readers must inflate and discard
+while performing a linear scan.
+
+Breaking packed-refs into chunks (individually compressing each chunk)
+would reduce the amount of data a reader must inflate, but still leaves
+the problem of indexing chunks to support readers efficiently locating
+the correct chunk.
+
+Given the compression achieved by reftable’s encoding, it does not seem
+necessary to add the complexity of bzip/gzip/zlib.
+
+Michael Haggerty’s alternate format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Michael Haggerty proposed
+https://public-inbox.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@mail.gmail.com/[an
+alternate] format to reftable on the Git mailing list. This format uses
+smaller chunks, without the restart table, and avoids block alignment
+with padding. Reflog entries immediately follow each ref, and are thus
+interleaved between refs.
+
+Performance testing indicates reftable is faster for lookups (51%
+faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
+larger file (+ ~3.2%, 28.3M vs 29.2M):
+
+[cols=">,>,>,>",options="header",]
+|=====================================
+|format |size |seek cold |seek hot
+|mh-alt |28.3 M |23.4 usec |11.2 usec
+|reftable |29.2 M |19.9 usec |5.4 usec
+|=====================================
+
+JGit Ketch RefTree
+^^^^^^^^^^^^^^^^^^
+
+https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg03073.html[JGit Ketch]
+proposed
+https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/[RefTree],
+an encoding of references inside Git tree objects stored as part of the
+repository’s object database.
+
+The RefTree format adds additional load on the object database storage
+layer (more loose objects, more objects in packs), and relies heavily on
+the packer’s delta compression to save space. Namespaces which are flat
+(e.g. thousands of tags in refs/tags) initially create very large loose
+objects, and so RefTree does not address the problem of copying many
+references to modify a handful.
+
+Flat namespaces are not efficiently searchable in RefTree, as tree
+objects in canonical formatting cannot be binary searched. This fails
+the need to handle a large number of references in a single namespace,
+such as GitHub’s `refs/pulls`, or a project with many tags.
+
+LMDB
+^^^^
+
+David Turner proposed
+https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/[using
+LMDB], as LMDB is lightweight (64k of runtime code) and GPL-compatible
+license.
+
+A downside of LMDB is its reliance on a single C implementation. This
+makes embedding inside JGit (a popular reimplemenation of Git)
+difficult, and hoisting onto virtual storage (for JGit DFS) virtually
+impossible.
+
+A common format that can be supported by all major Git implementations
+(git-core, JGit, libgit2) is strongly preferred.
+
+Future
+~~~~~~
+
+Longer hashes
+^^^^^^^^^^^^^
+
+Version will bump (e.g. 2) to indicate `value` uses a different object
+id length other than 20. The length could be stored in an expanded file
+header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v11 06/12] reftable: define version 2 of the spec to accomodate SHA256
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                       ` (4 preceding siblings ...)
  2020-05-04 19:03                     ` [PATCH v11 05/12] reftable: file format documentation Jonathan Nieder via GitGitGadget
@ 2020-05-04 19:03                     ` Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 07/12] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
                                       ` (7 subsequent siblings)
  13 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 50 ++++++++++++++++------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index 9fa4657d9ff..ee3f36ea851 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -193,8 +193,8 @@ and non-aligned files.
 Very small files (e.g. 1 only ref block) may omit `padding` and the ref
 index to reduce total file size.
 
-Header
-^^^^^^
+Header (version 1)
+^^^^^^^^^^^^^^^^^^
 
 A 24-byte header appears at the beginning of the file:
 
@@ -215,6 +215,24 @@ used in a stack for link:#Update-transactions[transactions], these
 fields can order the files such that the prior file’s
 `max_update_index + 1` is the next file’s `min_update_index`.
 
+Header (version 2)
+^^^^^^^^^^^^^^^^^^
+
+A 28-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+uint32( hash_id )
+....
+
+The header is identical to `version_number=1`, with the 4-byte hash ID
+("sha1" for SHA1 and "s256" for SHA-256) append to the header.
+
+
 First ref block
 ^^^^^^^^^^^^^^^
 
@@ -671,14 +689,8 @@ Footer
 After the last block of the file, a file footer is written. It begins
 like the file header, but is extended with additional data.
 
-A 68-byte footer appears at the end:
-
 ....
-    'REFT'
-    uint8( version_number = 1 )
-    uint24( block_size )
-    uint64( min_update_index )
-    uint64( max_update_index )
+    HEADER
 
     uint64( ref_index_position )
     uint64( (obj_position << 5) | obj_id_len )
@@ -701,12 +713,16 @@ obj blocks.
 * `obj_index_position`: byte position for the start of the obj index.
 * `log_index_position`: byte position for the start of the log index.
 
+The size of the footer is 68 bytes for version 1, and 72 bytes for
+version 2.
+
 Reading the footer
 ++++++++++++++++++
 
-Readers must seek to `file_length - 68` to access the footer. A trusted
-external source (such as `stat(2)`) is necessary to obtain
-`file_length`. When reading the footer, readers must verify:
+Readers must first read the file start to determine the version
+number. Then they seek to `file_length - FOOTER_LENGTH` to access the
+footer. A trusted external source (such as `stat(2)`) is necessary to
+obtain `file_length`. When reading the footer, readers must verify:
 
 * 4-byte magic is correct
 * 1-byte version number is recognized
@@ -1055,13 +1071,3 @@ impossible.
 
 A common format that can be supported by all major Git implementations
 (git-core, JGit, libgit2) is strongly preferred.
-
-Future
-~~~~~~
-
-Longer hashes
-^^^^^^^^^^^^^
-
-Version will bump (e.g. 2) to indicate `value` uses a different object
-id length other than 20. The length could be stored in an expanded file
-header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v11 07/12] reftable: clarify how empty tables should be written
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                       ` (5 preceding siblings ...)
  2020-05-04 19:03                     ` [PATCH v11 06/12] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
@ 2020-05-04 19:03                     ` Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 08/12] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                       ` (6 subsequent siblings)
  13 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The format allows for some ambiguity, as a lone footer also starts
with a valid file header. However, the current JGit code will barf on
this. This commit codifies this behavior into the standard.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index ee3f36ea851..df5cb5f11b8 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -731,6 +731,13 @@ version)
 
 Once verified, the other fields of the footer can be accessed.
 
+Empty tables
+++++++++++++
+
+A reftable may be empty. In this case, the file starts with a header
+and is immediately followed by a footer.
+
+
 Varint encoding
 ^^^^^^^^^^^^^^^
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v11 08/12] Add reftable library
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                       ` (6 preceding siblings ...)
  2020-05-04 19:03                     ` [PATCH v11 07/12] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
@ 2020-05-04 19:03                     ` Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 09/12] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                       ` (5 subsequent siblings)
  13 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. Its rationale and
specification is in the preceding commit.

This imports the upstream library as one big commit. For understanding
the code, it is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* record.{c,h} - reading and writing records

* block.{c,h} - reading and writing blocks.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 +
 reftable/README.md     |   11 +
 reftable/VERSION       |    5 +
 reftable/basics.c      |  215 +++++++
 reftable/basics.h      |   53 ++
 reftable/block.c       |  436 ++++++++++++++
 reftable/block.h       |  126 ++++
 reftable/blocksource.h |   22 +
 reftable/constants.h   |   21 +
 reftable/file.c        |   99 ++++
 reftable/iter.c        |  240 ++++++++
 reftable/iter.h        |   63 ++
 reftable/merged.c      |  327 +++++++++++
 reftable/merged.h      |   38 ++
 reftable/pq.c          |  115 ++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  758 +++++++++++++++++++++++++
 reftable/reader.h      |   68 +++
 reftable/record.c      | 1131 ++++++++++++++++++++++++++++++++++++
 reftable/record.h      |  121 ++++
 reftable/refname.c     |  215 +++++++
 reftable/refname.h     |   39 ++
 reftable/reftable.c    |   91 +++
 reftable/reftable.h    |  564 ++++++++++++++++++
 reftable/slice.c       |  225 ++++++++
 reftable/slice.h       |   76 +++
 reftable/stack.c       | 1229 ++++++++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   45 ++
 reftable/system.h      |   54 ++
 reftable/tree.c        |   67 +++
 reftable/tree.h        |   34 ++
 reftable/update.sh     |   24 +
 reftable/writer.c      |  661 +++++++++++++++++++++
 reftable/writer.h      |   60 ++
 reftable/zlib-compat.c |   92 +++
 35 files changed, 7390 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/blocksource.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..21500563fd4
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,11 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   sh reftable/update.sh
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..b947a05c089
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,5 @@
+commit 3a486f79abcd17e88e3bca62f43188a7c8b80ff3
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Mon May 4 19:20:21 2020 +0200
+
+    C: include "cache.h" in git-core sleep_millisec()
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..14c4dfaf5a6
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,215 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)(i & 0xff);
+}
+
+uint32_t get_be24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)(i & 0xff);
+}
+
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
+{
+	size_t lo = 0;
+	size_t hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		size_t mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	static char buf[250];
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return "I/O error";
+	case REFTABLE_FORMAT_ERROR:
+		return "corrupt reftable file";
+	case REFTABLE_NOT_EXIST_ERROR:
+		return "file does not exist";
+	case REFTABLE_LOCK_ERROR:
+		return "data is outdated";
+	case REFTABLE_API_ERROR:
+		return "misuse of the reftable API";
+	case REFTABLE_ZLIB_ERROR:
+		return "zlib failure";
+	case REFTABLE_NAME_CONFLICT:
+		return "file/directory conflict";
+	case REFTABLE_REFNAME_ERROR:
+		return "invalid refname";
+	case -1:
+		return "general error";
+	default:
+		snprintf(buf, sizeof(buf), "unknown error code %d", err);
+		return buf;
+	}
+}
+
+int reftable_error_to_errno(int err)
+{
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return EIO;
+	case REFTABLE_FORMAT_ERROR:
+		return EFAULT;
+	case REFTABLE_NOT_EXIST_ERROR:
+		return ENOENT;
+	case REFTABLE_LOCK_ERROR:
+		return EBUSY;
+	case REFTABLE_API_ERROR:
+		return EINVAL;
+	case REFTABLE_ZLIB_ERROR:
+		return EDOM;
+	default:
+		return ERANGE;
+	}
+}
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..1b003485c9c
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+/* Bigendian en/decoding of integers */
+
+void put_be24(byte *out, uint32_t i);
+uint32_t get_be24(byte *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);
+
+/*
+  Frees a NULL terminated array of malloced strings. The array itself is also
+  freed.
+ */
+void free_names(char **a);
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp);
+
+/* compares two NULL-terminated arrays of strings. */
+int names_equal(char **a, char **b);
+
+/* returns the array size of a NULL-terminated array of strings. */
+int names_length(char **names);
+
+/* Allocation routines; they invoke the functions set through
+ * reftable_set_alloc() */
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..968b192979a
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,436 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "blocksource.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct record rec)
+{
+	struct slice empty = { 0 };
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = { 0 };
+	int n = 0;
+
+	record_key(rec, &key);
+	n = encode_key(&restart, out, last, key, record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	n = record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	slice_clear(&key);
+	return 0;
+
+err:
+	slice_clear(&key);
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = { 0 };
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		slice_resize(&compressed, src_len);
+
+		while (1) {
+			uLongf dest_len = compressed.len;
+
+			zresult = compress2(compressed.buf, &dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				slice_resize(&compressed, 2 * compressed.len);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				slice_clear(&compressed);
+				return REFTABLE_ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed.buf,
+			       dest_len);
+			w->next = dest_len + block_header_skip;
+			slice_clear(&compressed);
+			break;
+		}
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct reftable_block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	if (!is_block_type(typ)) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = { 0 };
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip; /* total size of dest
+							    buffer. */
+		uLongf src_len = block->len - block_header_skip;
+
+		/* Log blocks specify the *uncompressed* size in their header.
+		 */
+		slice_resize(&uncompressed, sz);
+
+		/* Copy over the block header verbatim. It's not compressed. */
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		/* Uncompress */
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			slice_clear(&uncompressed);
+			return REFTABLE_ZLIB_ERROR;
+		}
+
+		if (dst_len + block_header_skip != sz) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+
+		/* We're done with the input data. */
+		block_source_return_block(block->source, block);
+		block->data = uncompressed.buf;
+		block->len = sz;
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_be16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct slice key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(size_t idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = { 0 };
+	struct slice last_key = { 0 };
+	byte unused_extra;
+	int n = decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_compare(a->key, rkey);
+		slice_clear(&rkey);
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+int block_iter_next(struct block_iter *it, struct record rec)
+{
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	{
+		struct slice in = {
+			.buf = it->br->block.data + it->next_off,
+			.len = it->br->block_len - it->next_off,
+		};
+		struct slice start = in;
+		struct slice key = { 0 };
+		byte extra;
+		int n = decode_key(&key, &extra, it->last_key, in);
+		if (n < 0) {
+			return -1;
+		}
+
+		slice_consume(&in, n);
+		n = record_decode(rec, key, extra, in, it->br->hash_size);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+
+		slice_copy(&it->last_key, key);
+		it->next_off += start.len - in.len;
+		slice_clear(&key);
+		return 0;
+	}
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = { 0 };
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	slice_clear(&it->last_key);
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		return -1;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	{
+		struct record rec = new_record(block_reader_type(br));
+		struct slice key = { 0 };
+		int result = 0;
+		int err = 0;
+		struct block_iter next = { 0 };
+		while (true) {
+			block_iter_copy_from(&next, it);
+
+			err = block_iter_next(&next, rec);
+			if (err < 0) {
+				result = -1;
+				goto exit;
+			}
+
+			record_key(rec, &key);
+			if (err > 0 || slice_compare(key, want) >= 0) {
+				result = 0;
+				goto exit;
+			}
+
+			block_iter_copy_from(it, &next);
+		}
+
+	exit:
+		slice_clear(&key);
+		slice_clear(&next.last_key);
+		record_destroy(&rec);
+
+		return result;
+	}
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	slice_clear(&bw->last_key);
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..320a2d6753b
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,126 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+/*
+  Writes reftable blocks. The block_writer is reused across blocks to minimize
+  allocation overhead.
+*/
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+
+	/* Offset ofof the global header. Nonzero in the first block only. */
+	uint32_t header_off;
+
+	/* How often to restart keys. */
+	int restart_interval;
+	int hash_size;
+
+	/* Offset of next byte to write. */
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+
+	struct slice last_key;
+	int entries;
+};
+
+/*
+  initializes the blockwriter to write `typ` entries, using `buf` as temporary
+  storage. `buf` is not owned by the block_writer. */
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+
+/*
+  returns the block type (eg. 'r' for ref records.
+*/
+byte block_writer_type(struct block_writer *bw);
+
+/* appends the record, or -1 if it doesn't fit. */
+int block_writer_add(struct block_writer *w, struct record rec);
+
+/* appends the key restarts, and compress the block if necessary. */
+int block_writer_finish(struct block_writer *w);
+
+/* clears out internally allocated block_writer members. */
+void block_writer_clear(struct block_writer *bw);
+
+/* Read a block. */
+struct block_reader {
+	/* offset of the block header; nonzero for the first block in a
+	 * reftable. */
+	uint32_t header_off;
+
+	/* the memory block */
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint16_t restart_count;
+
+	/* size of the data in the file. For log blocks, this is the compressed
+	 * size. */
+	uint32_t full_block_size;
+};
+
+/* Iterate over entries in a block */
+struct block_iter {
+	/* offset within the block of the next entry to read. */
+	uint32_t next_off;
+	struct block_reader *br;
+
+	/* key for last entry we read. */
+	struct slice last_key;
+};
+
+/* initializes a block reader */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Position `it` at start of the block */
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+
+/* Position `it` to the `want` key in the block */
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+
+/* Returns the block type (eg. 'r' for refs) */
+byte block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct record rec);
+
+/* Seek to `want` with in the block pointed to by `it` */
+int block_iter_seek(struct block_iter *it, struct slice want);
+
+/* deallocate memory for `it`. The block reader and its block is left intact. */
+void block_iter_close(struct block_iter *it);
+
+/* size of file header, depending on format version */
+int header_size(int version);
+
+/* size of file footer, depending on format version */
+int footer_size(int version);
+
+#endif
diff --git a/reftable/blocksource.h b/reftable/blocksource.h
new file mode 100644
index 00000000000..1fdf0bf4557
--- /dev/null
+++ b/reftable/blocksource.h
@@ -0,0 +1,22 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCKSOURCE_H
+#define BLOCKSOURCE_H
+
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source source);
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_return_block(struct reftable_block_source source,
+			       struct reftable_block *ret);
+void block_source_close(struct reftable_block_source source);
+
+#endif
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..5eee72c4c11
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..2bf1135a2e2
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,99 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return REFTABLE_NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	{
+		struct file_block_source *p =
+			reftable_calloc(sizeof(struct file_block_source));
+		p->size = st.st_size;
+		p->fd = fd;
+
+		assert(bs->ops == NULL);
+		bs->ops = &file_vtable;
+		bs->arg = p;
+	}
+	return 0;
+}
+
+int reftable_fd_write(void *arg, byte *data, size_t sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..145fa1393ad
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,240 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator it)
+{
+	return it.ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct record rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator it, struct record rec)
+{
+	return it.ops->next(it.iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator it,
+			       struct reftable_ref_record *ref)
+{
+	struct record rec = { 0 };
+	record_from_ref(&rec, ref);
+	return iterator_next(it, rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator it,
+			       struct reftable_log_record *log)
+{
+	struct record rec = { 0 };
+	record_from_log(&rec, log);
+	return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	slice_clear(&fri->oid);
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec.data;
+	int err = 0;
+	while (true) {
+		err = reftable_iterator_next_ref(fri->it, ref);
+		if (err != 0) {
+			break;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			err = reftable_reader_seek_ref(fri->r, &it,
+						       ref->ref_name);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				break;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+
+	reftable_ref_record_clear(ref);
+	return err;
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reader_return_block(it->r, &it->block_reader.block);
+	slice_clear(&it->oid);
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reader_return_block(it->r, &it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return REFTABLE_FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec.data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..ef1d3964baa
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,63 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct record rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator it, struct record rec);
+
+/* Returns true for a zeroed out iterator, such as the one returned from
+   iterator_destroy. */
+bool iterator_is_null(struct reftable_iterator it);
+
+/* iterator that produces only ref records that point to `oid` */
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_reader *r;
+	struct slice oid;
+	struct reftable_iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+/* iterator that produces only ref records that point to `oid`,
+   but using the object index.
+ */
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..5a1b896e0b0
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,327 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct record rec = new_record(mi->typ);
+		int err = iterator_next(mi->stack[i], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			record_destroy(&rec);
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
+{
+	if (iterator_is_null(mi->stack[idx])) {
+		return 0;
+	}
+
+	{
+		struct record rec = new_record(mi->typ);
+		struct pq_entry e = {
+			.rec = rec,
+			.index = idx,
+		};
+		int err = iterator_next(mi->stack[idx], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[idx]);
+			record_destroy(&rec);
+			return 0;
+		}
+
+		merged_iter_pqueue_add(&mi->pq, e);
+	}
+	return 0;
+}
+
+static int merged_iter_next_entry(struct merged_iter *mi, struct record rec)
+{
+	struct slice entry_key = { 0 };
+	struct pq_entry entry = { 0 };
+	int err = 0;
+
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	entry = merged_iter_pqueue_remove(&mi->pq);
+	err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+	/*
+	  One can also use reftable as datacenter-local storage, where the ref
+	  database is maintained in globally consistent database (eg.
+	  CockroachDB or Spanner). In this scenario, replication delays together
+	  with compaction may cause newer tables to contain older entries. In
+	  such a deployment, the loop below must be changed to collect all
+	  entries for the same key, and return new the newest one.
+	*/
+	record_key(entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = { 0 };
+		int err = 0, cmp = 0;
+
+		record_key(top.rec, &k);
+
+		cmp = slice_compare(k, entry_key);
+		slice_clear(&k);
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		record_clear(top.rec);
+		reftable_free(record_yield(&top.rec));
+	}
+
+	record_copy_from(rec, entry.rec, hash_size(mi->hash_id));
+	record_clear(entry.rec);
+	reftable_free(record_yield(&entry.rec));
+	slice_clear(&entry_key);
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec)
+{
+	while (true) {
+		int err = merged_iter_next_entry(mi, rec);
+		if (err == 0 && mi->suppress_deletions &&
+		    record_is_deletion(rec)) {
+			continue;
+		}
+
+		return err;
+	}
+}
+
+static int merged_iter_next_void(void *p, struct record rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id)
+{
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reftable_reader *r = stack[i];
+		if (r->hash_id != hash_id) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reftable_reader_min_update_index(r);
+		}
+
+		last_max = reftable_reader_max_update_index(r);
+	}
+
+	{
+		struct reftable_merged_table m = {
+			.stack = stack,
+			.stack_len = n,
+			.min = first_min,
+			.max = last_max,
+			.hash_id = hash_id,
+		};
+
+		*dest = reftable_calloc(sizeof(struct reftable_merged_table));
+		**dest = m;
+	}
+	return 0;
+}
+
+void reftable_merged_table_close(struct reftable_merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reftable_reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it, struct record rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = record_type(rec),
+		.hash_id = mt->hash_id,
+		.suppress_deletions = mt->suppress_deletions,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n;
+	err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..a71f36ecbae
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,38 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_reader **stack;
+	int stack_len;
+	uint32_t hash_id;
+	bool suppress_deletions;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	int stack_len;
+	byte typ;
+	bool suppress_deletions;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it, struct record rec);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..bfad96c6695
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,115 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = { 0 };
+	struct slice bk = { 0 };
+	int cmp = 0;
+	record_key(a.rec, &ak);
+	record_key(b.rec, &bk);
+
+	cmp = slice_compare(ak, bk);
+
+	slice_clear(&ak);
+	slice_clear(&bk);
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		record_clear(pq->heap[i].rec);
+		reftable_free(record_yield(&pq->heap[i].rec));
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..b585a52ee14
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..00913bc3b54
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,758 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source source)
+{
+	return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source.ops->read_block(source.arg, dest, off, size);
+	dest->source = source;
+	return result;
+}
+
+void block_source_return_block(struct reftable_block_source source,
+			       struct reftable_block *blockp)
+{
+	source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(r->source, dest, off, sz);
+}
+
+void reader_return_block(struct reftable_reader *r, struct reftable_block *p)
+{
+	block_source_return_block(r->source, p);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	int err = 0;
+	if (memcmp(f, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = REFTABLE_FORMAT_ERROR;
+			goto exit;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	{
+		uint32_t computed_crc = crc32(0, footer, f - footer);
+		uint32_t file_crc = get_be32(f);
+		f += 4;
+		if (computed_crc != file_crc) {
+			err = REFTABLE_FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	{
+		byte first_block_typ = header[header_size(r->version)];
+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+		r->ref_offsets.offset = 0;
+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+					  r->log_offsets.offset > 0);
+		r->obj_offsets.present = r->obj_offsets.offset > 0;
+	}
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	block_source_return_block(r->source, &footer);
+	block_source_return_block(r->source, &header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec.data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reader_return_block(ti->r, &ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reader_return_block(r, &block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reader_return_block(r, &block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct block_reader *brp =
+			reftable_malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec)
+{
+	if (record_type(rec) != ti->typ) {
+		return REFTABLE_API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = { 0 };
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct record rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off, byte typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = reftable_malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			byte typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct record want)
+{
+	struct record rec = new_record(record_type(want));
+	struct slice want_key = { 0 };
+	struct slice got_key = { 0 };
+	struct table_iter next = { 0 };
+	int err = -1;
+	record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_compare(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	record_destroy(&rec);
+	slice_clear(&want_key);
+	slice_clear(&got_key);
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it, struct record rec)
+{
+	struct index_record want_index = { 0 };
+	struct record want_index_rec = { 0 };
+	struct index_record index_result = { 0 };
+	struct record index_result_rec = { 0 };
+	struct table_iter index_iter = { 0 };
+	struct table_iter next = { 0 };
+	int err = 0;
+
+	record_key(rec, &want_index.last_key);
+	record_from_index(&want_index_rec, &want_index);
+	record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = REFTABLE_FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	record_clear(want_index_rec);
+	record_clear(index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it, struct record rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = { 0 };
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct record rec)
+{
+	byte typ = record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return reader_seek(r, it, rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return reader_seek(r, it, rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		block_source_close(&src);
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    byte *oid)
+{
+	struct obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct obj_record got = { 0 };
+	struct record got_rec = { 0 };
+	int err = 0;
+
+	/* Look through the reverse index. */
+	record_from_obj(&want_rec, &want);
+	err = reader_seek(r, &oit, want_rec);
+	if (err != 0) {
+		goto exit;
+	}
+
+	/* read out the obj_record */
+	record_from_obj(&got_rec, &got);
+	err = iterator_next(oit, got_rec);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		/* didn't find it; return empty iterator */
+		iterator_set_empty(it);
+		err = 0;
+		goto exit;
+	}
+
+	{
+		struct indexed_table_ref_iter *itr = NULL;
+		err = new_indexed_table_ref_iter(&itr, r, oid,
+						 hash_size(r->hash_id),
+						 got.offsets, got.offset_len);
+		if (err < 0) {
+			goto exit;
+		}
+		got.offsets = NULL;
+		iterator_from_indexed_table_ref_iter(it, itr);
+	}
+
+exit:
+	reftable_iterator_destroy(&oit);
+	record_clear(got_rec);
+	return err;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      byte *oid, int oid_len)
+{
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_calloc(sizeof(struct filtering_ref_iterator));
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	filter->r = r;
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, byte *oid,
+			     int oid_len)
+{
+	if (r->obj_offsets.present) {
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	}
+	return reftable_reader_refs_for_unindexed(r, it, oid, oid_len);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..80d8a5adbaa
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,68 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source source);
+
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_return_block(struct reftable_block_source source,
+			       struct reftable_block *ret);
+void block_source_close(struct reftable_block_source *source);
+
+/* metadata for a block type */
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+/* The state for reading a reftable file. */
+struct reftable_reader {
+	/* for convience, associate a name with the instance. */
+	char *name;
+	struct reftable_block_source source;
+
+	/* Size of the file, excluding the footer. */
+	uint64_t size;
+
+	/* 'sha1' for SHA1, 's256' for SHA-256 */
+	uint32_t hash_id;
+
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	/* Length of the OID keys in the 'o' section */
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct record rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+void reader_return_block(struct reftable_reader *r, struct reftable_block *p);
+
+/* initialize a block reader to read from `r` */
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..8b9099b1966
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1131 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = { 0 };
+	int i = 9;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	{
+		int n = sizeof(buf) - i - 1;
+		if (dest.len < n) {
+			return -1;
+		}
+		memcpy(dest.buf, &buf[i + 1], n);
+		return n;
+	}
+}
+
+int is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	slice_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	slice_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	slice_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	slice_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static void reftable_ref_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = xstrdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size(hash_id));
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size(hash_id));
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->ref_name);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static byte reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	slice_consume(&in, n);
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct slice dest = { 0 };
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+		seen_target = true;
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+static bool reftable_ref_record_is_deletion_void(const void *p)
+{
+	return reftable_ref_record_is_deletion(
+		(const struct reftable_ref_record *)p);
+}
+
+struct record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = BLOCK_TYPE_REF,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+	.is_deletion = &reftable_ref_record_is_deletion_void,
+};
+
+static void obj_record_key(const void *r, struct slice *dest)
+{
+	const struct obj_record *rec = (const struct obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	const struct obj_record *src = (const struct obj_record *)src_rec;
+
+	*ref = *src;
+	ref->hash_prefix = reftable_malloc(ref->hash_prefix_len);
+	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
+
+	{
+		int olen = ref->offset_len * sizeof(uint64_t);
+		ref->offsets = reftable_malloc(olen);
+		memcpy(ref->offsets, src->offsets, olen);
+	}
+}
+
+static void obj_record_clear(void *rec)
+{
+	struct obj_record *ref = (struct obj_record *)rec;
+	FREE_AND_NULL(ref->hash_prefix);
+	FREE_AND_NULL(ref->offsets);
+	memset(ref, 0, sizeof(struct obj_record));
+}
+
+static byte obj_record_val_type(const void *rec)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	{
+		uint64_t last = r->offsets[0];
+		int i = 0;
+		for (i = 1; i < r->offset_len; i++) {
+			int n = put_var_int(s, r->offsets[i] - last);
+			if (n < 0) {
+				return -1;
+			}
+			slice_consume(&s, n);
+			last = r->offsets[i];
+		}
+	}
+	return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct obj_record *r = (struct obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		slice_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+	slice_consume(&in, n);
+
+	{
+		uint64_t last = r->offsets[0];
+		int j = 1;
+		while (j < count) {
+			uint64_t delta = 0;
+			int n = get_var_int(&delta, in);
+			if (n < 0) {
+				return n;
+			}
+			slice_consume(&in, n);
+
+			last = r->offsets[j] = (delta + last);
+			j++;
+		}
+	}
+	return start.len - in.len;
+}
+
+static bool not_a_deletion(const void *p)
+{
+	return false;
+}
+
+struct record_vtable obj_record_vtable = {
+	.key = &obj_record_key,
+	.type = BLOCK_TYPE_OBJ,
+	.copy_from = &obj_record_copy_from,
+	.val_type = &obj_record_val_type,
+	.encode = &obj_record_encode,
+	.decode = &obj_record_decode,
+	.clear = &obj_record_clear,
+	.is_deletion = not_a_deletion,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size(hash_id));
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size(hash_id));
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static void reftable_log_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_be64(dest->buf + 1 + len, ts);
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	*dst = *src;
+	if (dst->ref_name != NULL) {
+		dst->ref_name = xstrdup(dst->ref_name);
+	}
+	if (dst->email != NULL) {
+		dst->email = xstrdup(dst->email);
+	}
+	if (dst->name != NULL) {
+		dst->name = xstrdup(dst->name);
+	}
+	if (dst->message != NULL) {
+		dst->message = xstrdup(dst->message);
+	}
+
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->ref_name);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static byte reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static byte zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r)) {
+		return 0;
+	}
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	slice_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_be16(s.buf, r->tz_offset);
+	slice_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = { 0 };
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	slice_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_be16(in.buf);
+	slice_consume(&in, 2);
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	slice_clear(&dest);
+	return start.len - in.len;
+
+error:
+	slice_clear(&dest);
+	return REFTABLE_FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+static bool reftable_log_record_is_deletion_void(const void *p)
+{
+	return reftable_log_record_is_deletion(
+		(const struct reftable_log_record *)p);
+}
+
+struct record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = BLOCK_TYPE_LOG,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+	.is_deletion = &reftable_log_record_is_deletion_void,
+};
+
+struct record new_record(byte typ)
+{
+	struct record rec = { NULL };
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct obj_record *r =
+			reftable_calloc(sizeof(struct obj_record));
+		record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct index_record *r =
+			reftable_calloc(sizeof(struct index_record));
+		record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+void *record_yield(struct record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+void record_destroy(struct record *rec)
+{
+	record_clear(*rec);
+	reftable_free(record_yield(rec));
+}
+
+static void index_record_key(const void *r, struct slice *dest)
+{
+	struct index_record *rec = (struct index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+				   int hash_size)
+{
+	struct index_record *dst = (struct index_record *)rec;
+	struct index_record *src = (struct index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec)
+{
+	struct index_record *idx = (struct index_record *)rec;
+	slice_clear(&idx->last_key);
+}
+
+static byte index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int index_record_encode(const void *rec, struct slice out, int hash_size)
+{
+	const struct index_record *r = (const struct index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+			       struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct index_record *r = (struct index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+	.key = &index_record_key,
+	.type = BLOCK_TYPE_INDEX,
+	.copy_from = &index_record_copy_from,
+	.val_type = &index_record_val_type,
+	.encode = &index_record_encode,
+	.decode = &index_record_decode,
+	.clear = &index_record_clear,
+	.is_deletion = &not_a_deletion,
+};
+
+void record_key(struct record rec, struct slice *dest)
+{
+	rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec)
+{
+	return rec.ops->type;
+}
+
+int record_encode(struct record rec, struct slice dest, int hash_size)
+{
+	return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size)
+{
+	assert(src.ops->type == rec.ops->type);
+
+	rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec)
+{
+	return rec.ops->val_type(rec.data);
+}
+
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size)
+{
+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec)
+{
+	rec.ops->clear(rec.data);
+}
+
+bool record_is_deletion(struct record rec)
+{
+	return rec.ops->is_deletion(rec.data);
+}
+
+void record_from_ref(struct record *rec, struct reftable_ref_record *ref_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = obj_rec;
+	rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = index_rec;
+	rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct reftable_log_record *log_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+struct reftable_ref_record *record_as_ref(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec.data;
+}
+
+struct reftable_log_record *record_as_log(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_LOG);
+	return (struct reftable_log_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->ref_name,
+		      ((struct reftable_ref_record *)b)->ref_name);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..011d9bc56fc
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,121 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+
+/* Methods for records. */
+struct record_vtable {
+	/* encode the key of to a byte slice. */
+	void (*key)(const void *rec, struct slice *dest);
+
+	/* The record type of ('r' for ref). */
+	byte type;
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	byte (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+
+	/* is this a tombstone? */
+	bool (*is_deletion)(const void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct record {
+	void *data;
+	struct record_vtable *ops;
+};
+
+/* returns true for recognized block types. Block start with the block type. */
+int is_block_type(byte typ);
+
+/* creates a malloced record of the given type. Dispose with record_destroy */
+struct record new_record(byte typ);
+
+extern struct record_vtable reftable_ref_record_vtable;
+
+/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
+   number of bytes written. */
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra);
+
+/* Decode into `key` and `extra` from `in` */
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in);
+
+/* index_record are used internally to speed up lookups. */
+struct index_record {
+	uint64_t offset; /* Offset of block */
+	struct slice last_key; /* Last key of the block. */
+};
+
+/* obj_record stores an object ID => ref mapping. */
+struct obj_record {
+	byte *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size);
+bool record_is_deletion(struct record rec);
+
+/* zeroes out the embedded record */
+void record_clear(struct record rec);
+
+/* clear out the record, yielding the record data that was encapsulated. */
+void *record_yield(struct record *rec);
+
+/* clear and deallocate embedded record, and zero `rec`. */
+void record_destroy(struct record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct reftable_ref_record *refrec);
+void record_from_log(struct record *rec, struct reftable_log_record *logrec);
+struct reftable_ref_record *record_as_ref(struct record ref);
+struct reftable_log_record *record_as_log(struct record ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/refname.c b/reftable/refname.c
new file mode 100644
index 00000000000..c92488924ac
--- /dev/null
+++ b/reftable/refname.c
@@ -0,0 +1,215 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+#include "refname.h"
+#include "slice.h"
+
+struct find_arg {
+	char **names;
+	const char *want;
+};
+
+static int find_name(size_t k, void *arg)
+{
+	struct find_arg *f_arg = (struct find_arg *)arg;
+
+	return strcmp(f_arg->names[k], f_arg->want) >= 0;
+}
+
+int modification_has_ref(struct modification *mod, const char *name)
+{
+	struct reftable_ref_record ref = { 0 };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = name,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
+			return 0;
+		}
+	}
+
+	if (mod->del_len > 0) {
+		struct find_arg arg = {
+			.names = mod->del,
+			.want = name,
+		};
+		int idx = binsearch(mod->del_len, find_name, &arg);
+		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
+			return 1;
+		}
+	}
+
+	err = reftable_table_read_ref(mod->tab, name, &ref);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static void modification_clear(struct modification *mod)
+{
+	FREE_AND_NULL(mod->add);
+	FREE_AND_NULL(mod->del);
+	mod->add_len = 0;
+	mod->del_len = 0;
+}
+
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix)
+{
+	struct reftable_iterator it = { NULL };
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = prefix,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len &&
+		    !strncmp(prefix, mod->add[idx], strlen(prefix))) {
+			goto exit;
+		}
+	}
+
+	err = reftable_table_seek_ref(mod->tab, &it, prefix);
+	if (err) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_ref(it, &ref);
+		if (err) {
+			goto exit;
+		}
+
+		if (mod->del_len > 0) {
+			struct find_arg arg = {
+				.names = mod->del,
+				.want = ref.ref_name,
+			};
+			int idx = binsearch(mod->del_len, find_name, &arg);
+			if (idx < mod->del_len &&
+			    !strcmp(ref.ref_name, mod->del[idx])) {
+				continue;
+			}
+		}
+
+		if (strncmp(ref.ref_name, prefix, strlen(prefix))) {
+			err = 1;
+			goto exit;
+		}
+		err = 0;
+		goto exit;
+	}
+
+exit:
+	reftable_ref_record_clear(&ref);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int validate_ref_name(const char *name)
+{
+	while (true) {
+		char *next = strchr(name, '/');
+		if (!*name) {
+			return REFTABLE_REFNAME_ERROR;
+		}
+		if (!next) {
+			return 0;
+		}
+		if (next - name == 0 || (next - name == 1 && *name == '.') ||
+		    (next - name == 2 && name[0] == '.' && name[1] == '.'))
+			return REFTABLE_REFNAME_ERROR;
+		name = next + 1;
+	}
+	return 0;
+}
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz)
+{
+	struct modification mod = {
+		.tab = tab,
+		.add = reftable_calloc(sizeof(char *) * sz),
+		.del = reftable_calloc(sizeof(char *) * sz),
+	};
+	int i = 0;
+	int err = 0;
+	for (; i < sz; i++) {
+		if (reftable_ref_record_is_deletion(&recs[i])) {
+			mod.del[mod.del_len++] = recs[i].ref_name;
+		} else {
+			mod.add[mod.add_len++] = recs[i].ref_name;
+		}
+	}
+
+	err = modification_validate(&mod);
+	modification_clear(&mod);
+	return err;
+}
+
+static void slice_trim_component(struct slice *sl)
+{
+	while (sl->len > 0) {
+		bool is_slash = (sl->buf[sl->len - 1] == '/');
+		sl->len--;
+		if (is_slash)
+			break;
+	}
+}
+
+int modification_validate(struct modification *mod)
+{
+	struct slice slashed = { 0 };
+	int err = 0;
+	int i = 0;
+	for (; i < mod->add_len; i++) {
+		err = validate_ref_name(mod->add[i]);
+		if (err) {
+			goto exit;
+		}
+		slice_set_string(&slashed, mod->add[i]);
+		slice_append_string(&slashed, "/");
+
+		err = modification_has_ref_with_prefix(
+			mod, slice_as_string(&slashed));
+		if (err == 0) {
+			err = REFTABLE_NAME_CONFLICT;
+			goto exit;
+		}
+		if (err < 0) {
+			goto exit;
+		}
+
+		slice_set_string(&slashed, mod->add[i]);
+		while (slashed.len) {
+			slice_trim_component(&slashed);
+			err = modification_has_ref(mod,
+						   slice_as_string(&slashed));
+			if (err == 0) {
+				err = REFTABLE_NAME_CONFLICT;
+				goto exit;
+			}
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+	err = 0;
+exit:
+	slice_clear(&slashed);
+	return err;
+}
diff --git a/reftable/refname.h b/reftable/refname.h
new file mode 100644
index 00000000000..e0c40b7fa9e
--- /dev/null
+++ b/reftable/refname.h
@@ -0,0 +1,39 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+#ifndef REFNAME_H
+#define REFNAME_H
+
+struct modification {
+	struct reftable_table tab;
+
+	char **add;
+	size_t add_len;
+
+	char **del;
+	size_t del_len;
+};
+
+// -1 = error, 0 = found, 1 = not found
+int modification_has_ref(struct modification *mod, const char *name);
+
+// -1 = error, 0 = found, 1 = not found.
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix);
+
+// 0 = OK.
+int validate_ref_name(const char *name);
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz);
+
+int modification_validate(struct modification *mod);
+
+/* illegal name, or dir/file conflict */
+#define REFTABLE_REFNAME_ERROR -9
+
+#endif
diff --git a/reftable/reftable.c b/reftable/reftable.c
new file mode 100644
index 00000000000..d29fbd8ee44
--- /dev/null
+++ b/reftable/reftable.c
@@ -0,0 +1,91 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+#include "record.h"
+#include "reader.h"
+#include "merged.h"
+
+struct reftable_table_vtable {
+	int (*seek)(void *tab, struct reftable_iterator *it, struct record);
+};
+
+static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
+				     struct record rec)
+{
+	return reader_seek((struct reftable_reader *)tab, it, rec);
+}
+
+static struct reftable_table_vtable reader_vtable = {
+	.seek = reftable_reader_seek_void,
+};
+
+static int reftable_merged_table_seek_void(void *tab,
+					   struct reftable_iterator *it,
+					   struct record rec)
+{
+	return merged_table_seek_record((struct reftable_merged_table *)tab, it,
+					rec);
+}
+
+static struct reftable_table_vtable merged_table_vtable = {
+	.seek = reftable_merged_table_seek_void,
+};
+
+int reftable_table_seek_ref(struct reftable_table tab,
+			    struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return tab.ops->seek(tab.table_arg, it, rec);
+}
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &reader_vtable;
+	tab->table_arg = reader;
+}
+
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *merged)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &merged_table_vtable;
+	tab->table_arg = merged;
+}
+
+int reftable_table_read_ref(struct reftable_table tab, const char *name,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	int err = reftable_table_seek_ref(tab, &it, name);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_ref(it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, name) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		reftable_ref_record_clear(ref);
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..968a8c644c2
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,564 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *ref_name;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	REFTABLE_IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	REFTABLE_FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	REFTABLE_NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	REFTABLE_LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL ref_name.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	REFTABLE_API_ERROR = -6,
+
+	/* Decompression error */
+	REFTABLE_ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	REFTABLE_EMPTY_TABLE_ERROR = -8,
+
+	/* Dir/file conflict. */
+	REFTABLE_NAME_CONFLICT = -9,
+
+	/* Illegal ref name. */
+	REFTABLE_REFNAME_ERROR = -10,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/*
+ * Convert the numeric error code to an equivalent errno code.
+ */
+int reftable_error_to_errno(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+
+	/* boolean: do not check ref names for validity or dir/file conflicts.
+	 */
+	int skip_name_check;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, uint8_t *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, uint8_t *data, size_t size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp. The name is used for creating a stack. Typically, it is the
+ * basename of the file. The block source `src` is owned by the reader, and is
+ * closed on calling reftable_reader_destroy().
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source src, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to oid */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid,
+			     int oid_len);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* closes readers for the merged tables */
+void reftable_merged_table_close(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/****************************************************************
+ Generic tables
+
+ A unified API for reading tables, either merged tables, or single readers.
+ ****************************************************************/
+
+struct reftable_table {
+	struct reftable_table_vtable *ops;
+	void *table_arg;
+};
+
+int reftable_table_seek_ref(struct reftable_table tab,
+			    struct reftable_iterator *it, const char *name);
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader);
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *table);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_table_read_ref(struct reftable_table tab, const char *name,
+			    struct reftable_ref_record *ref);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction; releases the lock if
+ * held. */
+void reftable_addition_close(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* reloads the stack if necessary. */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	uint64_t entries_written; /* total number of entries written, including
+				     failures. */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 00000000000..89e649b0d46
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,225 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+void slice_consume(struct slice *s, int n)
+{
+	s->buf += n;
+	s->len -= n;
+}
+
+byte *slice_yield(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_clear(struct slice *s)
+{
+	reftable_free(slice_yield(s));
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = { 0 };
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_write(struct slice *b, byte *data, size_t sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = reftable_realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_write_void(void *b, byte *data, size_t sz)
+{
+	return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf)
+{
+	assert(bs->ops == NULL);
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 00000000000..886f22811c4
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,76 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+/*
+  provides bounds-checked byte ranges.
+  To use, initialize as "slice x = {0};"
+ */
+struct slice {
+	int len;
+	int cap;
+	byte *buf;
+};
+
+void slice_set_string(struct slice *dest, const char *src);
+void slice_append_string(struct slice *dest, const char *src);
+/* Set length to 0, but retain buffer */
+void slice_clear(struct slice *slice);
+
+/* Return a malloced string for `src` */
+char *slice_to_string(struct slice src);
+
+/* Ensure that `buf` is \0 terminated. */
+const char *slice_as_string(struct slice *src);
+
+/* Compare slices */
+bool slice_equal(struct slice a, struct slice b);
+
+/* Return `buf`, clearing out `s` */
+byte *slice_yield(struct slice *s);
+
+/* Copy bytes */
+void slice_copy(struct slice *dest, struct slice src);
+
+/* Advance `buf` by `n`, and decrease length. A copy of the slice
+   should be kept for deallocating the slice. */
+void slice_consume(struct slice *s, int n);
+
+/* Set length of the slice to `l` */
+void slice_resize(struct slice *s, int l);
+
+/* Signed comparison */
+int slice_compare(struct slice a, struct slice b);
+
+/* Append `data` to the `dest` slice.  */
+int slice_write(struct slice *dest, byte *data, size_t sz);
+
+/* Append `add` to `dest. */
+void slice_append(struct slice *dest, struct slice add);
+
+/* Like slice_write, but suitable for passing to reftable_new_writer
+ */
+int slice_write_void(void *b, byte *data, size_t sz);
+
+/* Find the longest shared prefix size of `a` and `b` */
+int common_prefix_size(struct slice a, struct slice b);
+
+struct reftable_block_source;
+
+/* Create an in-memory block source for reading reftables */
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..0f44bd258ed
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1229 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "refname.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct slice list_file_name = { 0 };
+	int err = 0;
+
+	if (config.hash_id == 0) {
+		config.hash_id = SHA1_ID;
+	}
+
+	*dest = NULL;
+
+	slice_set_string(&list_file_name, dir);
+	slice_append_string(&list_file_name, "/tables.list");
+
+	p->list_file = slice_to_string(list_file_name);
+	slice_clear(&list_file_name);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload(p);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fd_read_lines(int fd, char ***namesp)
+{
+	off_t size = lseek(fd, 0, SEEK_END);
+	char *buf = NULL;
+	int err = 0;
+	if (size < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+	err = lseek(fd, 0, SEEK_SET);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (read(fd, buf, size) != size) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+
+exit:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	int fd = open(filename, O_RDONLY, 0644);
+	int err = 0;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return REFTABLE_IO_ERROR;
+	}
+	err = fd_read_lines(fd, namesp);
+	close(fd);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged == NULL) {
+		return;
+	}
+
+	reftable_merged_table_close(st->merged);
+	reftable_merged_table_free(st->merged);
+	st->merged = NULL;
+
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_tables =
+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
+	int new_tables_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+
+	struct slice table_path = { 0 };
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_append_string(&table_path, "/");
+			slice_append_string(&table_path, name);
+
+			err = reftable_block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = reftable_new_reader(&rd, src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
+					st->config.hash_id);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	new_merged->suppress_deletions = true;
+	st->merged = new_merged;
+
+	{
+		int i = 0;
+		for (i = 0; i < cur_len; i++) {
+			if (cur[i] != NULL) {
+				reader_close(cur[i]);
+				reftable_reader_free(cur[i]);
+			}
+		}
+	}
+
+exit:
+	slice_clear(&table_path);
+	{
+		int i = 0;
+		for (i = 0; i < new_tables_len; i++) {
+			reader_close(new_tables[i]);
+		}
+	}
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+					     bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		int err2 = 0;
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != REFTABLE_NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
+		   writer. Check if there was one by checking if the name list
+		   changed.
+		*/
+		err2 = read_lines(st->list_file, &names_after);
+		if (err2 < 0) {
+			free_names(names);
+			return err2;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return err;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 1;
+		sleep_millisec(delay);
+	}
+
+	return 0;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	return reftable_stack_reload_maybe_reuse(st, true);
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == REFTABLE_LOCK_ERROR) {
+			/* Ignore error return, we want to propagate
+			   REFTABLE_LOCK_ERROR.
+			*/
+			reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	if (!st->disable_auto_compact) {
+		return reftable_stack_auto_compact(st);
+	}
+
+	return 0;
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct slice lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	slice_set_string(&add->lock_file_name, st->list_file);
+	slice_append_string(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = REFTABLE_LOCK_ERROR;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = REFTABLE_LOCK_ERROR;
+		goto exit;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+exit:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct slice nm = { 0 };
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_set_string(&nm, add->stack->list_file);
+		slice_append_string(&nm, "/");
+		slice_append_string(&nm, add->new_tables[i]);
+		unlink(slice_as_string(&nm));
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(slice_as_string(&add->lock_file_name));
+		slice_clear(&add->lock_file_name);
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+	slice_clear(&nm);
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct slice table_list = { 0 };
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0) {
+		goto exit;
+	}
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		slice_append_string(&table_list,
+				    add->stack->merged->stack[i]->name);
+		slice_append_string(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_append_string(&table_list, add->new_tables[i]);
+		slice_append_string(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	slice_clear(&table_list);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&add->lock_file_name),
+		     add->stack->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+exit:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	*dest = reftable_malloc(sizeof(**dest));
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = { 0 };
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_addition_commit(&add);
+exit:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice tab_file_name = { 0 };
+	struct slice next_name = { 0 };
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, add->next_update_index, add->next_update_index);
+
+	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
+	slice_append_string(&temp_tab_file_name, "/");
+	slice_append(&temp_tab_file_name, next_name);
+	slice_append_string(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
+	if (tab_fd < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_writer_close(wr);
+	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto exit;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_check_addition(add->stack,
+				   slice_as_string(&temp_tab_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (wr->min_update_index < add->next_update_index) {
+		err = REFTABLE_API_ERROR;
+		goto exit;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_append_string(&next_name, ".ref");
+
+	slice_set_string(&tab_file_name, add->stack->reftable_dir);
+	slice_append_string(&tab_file_name, "/");
+	slice_append(&tab_file_name, next_name);
+
+	err = rename(slice_as_string(&temp_tab_file_name),
+		     slice_as_string(&tab_file_name));
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = slice_to_string(next_name);
+	add->new_tables_len++;
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_file_name));
+	}
+
+	slice_clear(&temp_tab_file_name);
+	slice_clear(&tab_file_name);
+	slice_clear(&next_name);
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reftable_reader_max_update_index(
+			       st->merged->stack[sz - 1]) +
+		       1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct slice next_name = { 0 };
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->merged->stack[first]),
+		    reftable_reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_append_string(temp_tab, "/");
+	slice_append(temp_tab, next_name);
+	slice_append_string(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = reftable_writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+	reftable_writer_free(wr);
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		slice_clear(temp_tab);
+	}
+	slice_clear(&next_name);
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_reader **subtabs = reftable_calloc(
+		sizeof(struct reftable_reader *) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	uint64_t entries = 0;
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr,
+				   st->merged->stack[first]->min_update_index,
+				   st->merged->stack[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto exit;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_ref(it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+	reftable_iterator_destroy(&it);
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_log(it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+	reftable_log_record_clear(&log);
+	st->stats.entries_written += entries;
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice new_table_name = { 0 };
+	struct slice lock_file_name = { 0 };
+	struct slice ref_list_contents = { 0 };
+	struct slice new_table_path = { 0 };
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_append_string(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	/* Don't want to write to the lock for now.  */
+	close(lock_file_fd);
+	lock_file_fd = 0;
+
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_file_name = { 0 };
+		struct slice subtab_lock = { 0 };
+		slice_set_string(&subtab_file_name, st->reftable_dir);
+		slice_append_string(&subtab_file_name, "/");
+		slice_append_string(&subtab_file_name,
+				    reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_file_name);
+		slice_append_string(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				} else {
+					err = REFTABLE_IO_ERROR;
+				}
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] =
+			(char *)slice_as_string(&subtab_file_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_append_string(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_append_string(&new_table_path, "/");
+
+	slice_append(&new_table_path, new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(slice_as_string(&temp_tab_file_name),
+			     slice_as_string(&new_table_path));
+		if (err < 0) {
+			err = REFTABLE_IO_ERROR;
+			goto exit;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		slice_append(&ref_list_contents, new_table_name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	/* Reload the stack before deleting. On windows, we can only delete the
+	   files after we closed them.
+	*/
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+
+	{
+		char **p = delete_on_success;
+		while (*p) {
+			if (strcmp(*p, slice_as_string(&new_table_path))) {
+				unlink(*p);
+			}
+			p++;
+		}
+	}
+
+exit:
+	free_names(delete_on_success);
+	{
+		char **p = subtable_locks;
+		while (*p) {
+			unlink(*p);
+			p++;
+		}
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	slice_clear(&new_table_name);
+	slice_clear(&new_table_path);
+	slice_clear(&ref_list_contents);
+	slice_clear(&temp_tab_file_name);
+	slice_clear(&lock_file_name);
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0) {
+		return 0;
+	}
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+
+	if (n == 0) {
+		*seglen = 0;
+		return segs;
+	}
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
+	int overhead = header_size(version) - 1;
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		sizes[i] = st->merged->stack[i]->size - overhead;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_table tab = { NULL };
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+	return reftable_table_read_ref(tab, refname, ref);
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_log(it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	if (err) {
+		reftable_log_record_clear(log);
+	}
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
+{
+	int err = 0;
+	struct reftable_block_source src = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct reftable_ref_record *refs = NULL;
+	struct reftable_iterator it = { NULL };
+	int cap = 0;
+	int len = 0;
+	int i = 0;
+
+	if (st->config.skip_name_check) {
+		return 0;
+	}
+
+	err = reftable_block_source_from_file(&src, new_tab_name);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_new_reader(&rd, src, new_tab_name);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	if (err > 0) {
+		err = 0;
+		goto exit;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		err = reftable_iterator_next_ref(it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (len >= cap) {
+			cap = 2 * cap + 1;
+			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
+		}
+
+		refs[len++] = ref;
+	}
+
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+
+	err = validate_ref_record_addition(tab, refs, len);
+
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&refs[i]);
+	}
+
+exit:
+	free(refs);
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..48924c40f4b
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,45 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+#include "system.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+	bool disable_auto_compact;
+
+	struct reftable_write_options config;
+
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..9e2c4b8d9ac
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,54 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#if 1 /* REFTABLE_IN_GITCORE */
+
+#include "git-compat-util.h"
+#include "cache.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_IN_GITCORE */
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef uint8_t byte;
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ *
+ * TODO: in git-core, this should fallback to uncompress2 if it is available.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..0341c865569
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,67 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	{
+		int res = compare(key, (*rootp)->key);
+		if (res < 0) {
+			return tree_search(key, &(*rootp)->left, compare,
+					   insert);
+		} else if (res > 0) {
+			return tree_search(key, &(*rootp)->right, compare,
+					   insert);
+		}
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..954512e9a3a
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+/* tree_node is a generic binary search tree. */
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+/* looks for `key` in `rootp` using `compare` as comparison function. If insert
+   is set, insert the key if it's not found. Else, return NULL.
+*/
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+
+/* performs an infix walk of the tree. */
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+
+/*
+  deallocates the tree nodes recursively. Keys should be deallocated separately
+  by walking over the tree. */
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100755
index 00000000000..ef32aeda515
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+set -eu
+
+# Override this to import from somewhere else, say "../reftable".
+SRC=${SRC:-origin}
+BRANCH=${BRANCH:-master}
+
+((git --git-dir reftable-repo/.git fetch ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
+   git clone https://github.com/google/reftable reftable-repo)
+
+cp reftable-repo/c/*.[ch] reftable/
+cp reftable-repo/c/include/*.[ch] reftable/
+cp reftable-repo/LICENSE reftable/
+git --git-dir reftable-repo/.git show --no-patch HEAD \
+  > reftable/VERSION
+
+mv reftable/system.h reftable/system.h~
+sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
+
+# Remove unittests and compatibility hacks we don't need here.  
+rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
+
+git add reftable/*.[ch] reftable/LICENSE reftable/VERSION 
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..bb70eba49da
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,661 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, byte *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, byte *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+			     ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = reftable_calloc(sizeof(struct obj_index_tree_node));
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w, struct record rec)
+{
+	int result = -1;
+	struct slice key = { 0 };
+	int err = 0;
+	record_key(rec, &key);
+	if (slice_compare(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	slice_clear(&key);
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return REFTABLE_API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return REFTABLE_API_ERROR;
+	}
+
+	record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = hash_size(w->opts.hash_id),
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = hash_size(w->opts.hash_id),
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	if (log->ref_name == NULL) {
+		return REFTABLE_API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	{
+		struct record rec = { 0 };
+		int err;
+		record_from_log(&rec, log);
+		err = writer_add_record(w, rec);
+		return err;
+	}
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct record rec = { 0 };
+			record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, rec);
+			assert(err == 0);
+		}
+		for (i = 0; i < idx_len; i++) {
+			slice_clear(&idx[i].last_key);
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_block_stats *bstats =
+			writer_reftable_block_stats(w, typ);
+		bstats->index_blocks =
+			w->stats.idx_stats.blocks - before_blocks;
+		bstats->index_offset = index_start;
+		bstats->max_index_level = max_level;
+	}
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct record rec = { 0 };
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	slice_clear(&entry->hash);
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	byte footer[72];
+	byte *p = footer;
+	int err = writer_finish_public_section(w);
+	int empty_table = w->next == 0;
+	if (err != 0) {
+		goto exit;
+	}
+	w->pending_padding = 0;
+	if (empty_table) {
+		/* Empty tables need a header anyway. */
+		byte header[28];
+		int n = writer_write_header(w, header);
+		err = padded_write(w, header, n, 0);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (empty_table) {
+		err = REFTABLE_EMPTY_TABLE_ERROR;
+		goto exit;
+	}
+
+exit:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	slice_clear(&w->last_key);
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		slice_clear(&w->index[i].last_key);
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index, sizeof(struct index_record) * w->index_cap);
+	}
+
+	{
+		struct index_record ir = {
+			.offset = w->next,
+		};
+		slice_copy(&ir.last_key, w->block_writer->last_key);
+		w->index[w->index_len] = ir;
+	}
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..afc15d8f31c
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, byte *, size_t);
+	void *write_arg;
+	int pending_padding;
+	struct slice last_key;
+
+	/* offset of next block to write. */
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	/* memory buffer for writing */
+	byte *block;
+
+	/* writer for the current section. NULL or points to
+	 * block_writer_data */
+	struct block_writer *block_writer;
+
+	struct block_writer block_writer_data;
+
+	/* pending index records for the current section */
+	struct index_record *index;
+	int index_len;
+	int index_cap;
+
+	/*
+	  tree for use with tsearch; used to populate the 'o' inverse OID
+	  map */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+/* finishes a block, and writes it to storage */
+int writer_flush_block(struct reftable_writer *w);
+
+/* deallocates memory related to the index */
+void writer_clear_index(struct reftable_writer *w);
+
+/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v11 09/12] Reftable support for git-core
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                       ` (7 preceding siblings ...)
  2020-05-04 19:03                     ` [PATCH v11 08/12] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-05-04 19:03                     ` Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 10/12] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
                                       ` (4 subsequent siblings)
  13 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

TODO:

 * Resolve spots marked with XXX

 * Support worktrees (t0002-gitfile "linked repo" testcase)

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   27 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   56 +-
 cache.h                                       |    6 +-
 refs.c                                        |   27 +-
 refs.h                                        |    3 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1045 +++++++++++++++++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |  109 ++
 13 files changed, 1271 insertions(+), 30 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 3d3a39fc192..f4848a382f4 100644
--- a/Makefile
+++ b/Makefile
@@ -810,6 +810,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -962,6 +963,7 @@ LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1165,7 +1167,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2360,11 +2362,29 @@ VCSSVN_OBJS += vcs-svn/sliding_window.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/refname.o
+REFTABLE_OBJS += reftable/reftable.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2505,6 +2525,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -3120,7 +3143,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index cb48a291caf..4d0cf065e4a 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1108,7 +1108,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
+		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0b7222e7188..b7053b9e370 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +189,8 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 ||
+	    !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -238,6 +240,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -247,6 +250,24 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
+	/*
+	 * refs/heads is a file when using reftable. We can't reinitialize with
+	 * a reftable because it will overwrite HEAD
+	 */
+	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
+			      is_directory(git_path_buf(&buf, "refs/heads"))) {
+		die("cannot switch ref storage format.");
+	}
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -261,15 +282,12 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
@@ -383,7 +401,8 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash, unsigned int flags)
+	    const char *template_dir, int hash, const char *ref_storage_format,
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -422,6 +441,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+	repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -450,6 +470,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -523,6 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = DEFAULT_REF_STORAGE;
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -530,15 +553,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *object_format = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
@@ -648,9 +674,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
+		       ref_storage_format, flags);
 }
diff --git a/cache.h b/cache.h
index 0f0485ecfe2..8cb884773c3 100644
--- a/cache.h
+++ b/cache.h
@@ -628,8 +628,9 @@ int path_inside_repo(const char *prefix, const char *path);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
-	    unsigned int flags);
-void initialize_repository_version(int hash_algo);
+	    const char *ref_storage_format, unsigned int flags);
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1043,6 +1044,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 4db27379661..299a5db8bf1 100644
--- a/refs.c
+++ b/refs.c
@@ -17,10 +17,16 @@
 #include "argv-array.h"
 #include "repository.h"
 
+const char *default_ref_storage(void)
+{
+	const char *test = getenv("GIT_TEST_REFTABLE");
+	return test ? "reftable" : "files";
+}
+
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1792,13 +1798,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1814,7 +1820,11 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(r->gitdir,
+					 r->ref_storage_format ?
+						 r->ref_storage_format :
+						 DEFAULT_REF_STORAGE,
+					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
 
@@ -1869,7 +1879,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1883,6 +1893,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1896,9 +1907,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index 9421c5b8465..15f6d78ee84 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* Returns the ref storage backend to use by default. */
+const char *default_ref_storage(void);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 3490aac3a40..cafe5b97376 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -661,6 +661,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..3ede7e837c1
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1045 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *repo_dir;
+	char *reftable_dir;
+	struct reftable_stack *stack;
+};
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = { NULL };
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						       unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+	refs->repo_dir = xstrdup(path);
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct strbuf sb = STRBUF_INIT;
+
+	safe_create_dir(refs->reftable_dir, 1);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	const char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct reftable_merged_table *mt = NULL;
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else {
+		mt = reftable_stack_merged_table(refs->stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->prefix = prefix;
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	(void)refs;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid;
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct ref_update *)a)->refname,
+		      ((struct ref_update *)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = reftable_stack_next_update_index(refs->stack);
+	int err = 0;
+	int i = 0;
+	struct reftable_log_record *logs =
+		calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct object_id out_oid;
+			int out_flags = 0;
+			/* Memory owned by refs_resolve_ref_unsafe, no need to
+			 * free(). */
+			const char *resolved = refs_resolve_ref_unsafe(
+				transaction->ref_store, u->refname, 0, &out_oid,
+				&out_flags);
+			struct reftable_ref_record ref = { NULL };
+			struct object_id peeled;
+			int peel_error = peel_object(&u->new_oid, &peeled);
+
+			ref.ref_name =
+				(char *)(resolved ? resolved : u->refname);
+			log->ref_name = ref.ref_name;
+
+			if (!is_null_oid(&u->new_oid)) {
+				ref.value = u->new_oid.hash;
+			}
+			ref.update_index = ts;
+			if (!peel_error) {
+				ref.target_value = peeled.hash;
+			}
+
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+exit:
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_add(refs->stack, &write_transaction_table,
+				 transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure: %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *err)
+{
+	return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	int i = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = { NULL };
+		struct reftable_ref_record current = { NULL };
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.ref_name,
+					    &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	return reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_log_record log = { NULL };
+		struct object_id new_oid;
+		struct object_id old_oid;
+		struct reftable_ref_record current = { NULL };
+		reftable_stack_read_ref(create->refs->stack, create->refname,
+					&current);
+
+		fill_reftable_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			reftable_writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_add(refs->stack, &write_create_symref_table,
+				  &arg);
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = { NULL };
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto exit;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto exit;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = { { NULL } };
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = { { NULL } };
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+exit:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	return reftable_stack_add(refs->stack, &write_rename_table, &arg);
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			/* we want the refnames that we have reflogs for, so we
+			 * skip if we've already produced this name. This could
+			 * be faster by seeking directly to
+			 * reflog@update_index==0.
+			 */
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(refs->stack);
+	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = { NULL };
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		err = reftable_iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid;
+			struct object_id new_oid;
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+	int i = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = { NULL };
+		err = reftable_iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct reftable_log_record log = { NULL };
+	struct reftable_iterator it = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		struct object_id ooid;
+		struct object_id noid;
+
+		int err = reftable_iterator_next_log(it, &log);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_read_ref(refs->stack, refname, &ref);
+	if (err > 0) {
+		errno = ENOENT;
+		err = -1;
+		goto exit;
+	}
+	if (err < 0) {
+		errno = reftable_error_to_errno(err);
+		err = -1;
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		/* XXX recurse? */
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else if (ref.value != NULL) {
+		hashcpy(oid->hash, ref.value);
+	} else {
+		*type |= REF_ISBROKEN;
+		errno = EINVAL;
+		err = -1;
+	}
+exit:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
diff --git a/repository.c b/repository.c
index 6f7f6f002b1..087760bc184 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 6534fbb7b31..f57b73f4a27 100644
--- a/repository.h
+++ b/repository.h
@@ -74,6 +74,9 @@ struct repository {
 	 */
 	struct ref_store *refs_private;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 65fe5ecefbe..2ef970f9f88 100644
--- a/setup.c
+++ b/setup.c
@@ -468,9 +468,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -559,6 +561,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1204,8 +1207,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 00000000000..7510d24a932
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+INVALID_SHA1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+initialize ()  {
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	mv .git/hooks .git/hooks-disabled
+}
+
+test_expect_success 'delete ref' '
+	initialize &&
+	test_commit file &&
+	SHA=$(git show-ref -s --verify HEAD) &&
+	test_write_lines "$SHA refs/heads/master" "$SHA refs/tags/file" >expect &&
+	git show-ref > actual &&
+	! git update-ref -d refs/tags/file $INVALID_SHA1 &&
+	test_cmp expect actual &&
+	git update-ref -d refs/tags/file $SHA  &&
+	test_write_lines "$SHA refs/heads/master" >expect &&
+	git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basic operation of reftable storage: commit, reflog, repack' '
+	initialize &&
+	test_commit file &&
+	test_write_lines refs/heads/master refs/tags/file >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect &&
+	for count in $(test_seq 1 10)
+	do
+		test_commit "number $count" file.t $count number-$count ||
+	        return 1
+	done &&
+	git pack-refs &&
+	ls -1 .git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 11 output &&
+	grep "commit (initial): file" output &&
+	grep "commit: number 10" output &&
+	git gc &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 0 output 
+'
+
+# This matches show-ref's output
+print_ref() {
+	echo "$(git rev-parse "$1") $1"
+}
+
+test_expect_success 'peeled tags are stored' '
+	initialize &&
+	test_commit file &&
+	git tag -m "annotated tag" test_tag HEAD &&
+	{
+		print_ref "refs/heads/master" &&
+		print_ref "refs/tags/file" &&
+		print_ref "refs/tags/test_tag" &&
+		print_ref "refs/tags/test_tag^{}" 
+	} >expect &&
+	git show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'show-ref works on fresh repo' '
+	initialize &&
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	>expect &&
+	! git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'checkout unborn branch' '
+	initialize &&
+	git checkout -b master
+'
+
+
+test_expect_success 'dir/file conflict' '
+	initialize &&
+	test_commit file &&
+	! git branch master/forbidden
+'
+
+
+test_expect_success 'do not clobber existing repo' '
+	rm -rf .git &&
+	git init --ref-storage=files &&
+	cat .git/HEAD > expect &&
+	test_commit file &&
+	(git init --ref-storage=reftable || true) &&
+	cat .git/HEAD > actual &&
+	test_cmp expect actual
+'
+
+
+test_done
+
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v11 10/12] vcxproj: adjust for the reftable changes
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                       ` (8 preceding siblings ...)
  2020-05-04 19:03                     ` [PATCH v11 09/12] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-05-04 19:03                     ` Johannes Schindelin via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 11/12] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
                                       ` (3 subsequent siblings)
  13 siblings, 0 replies; 409+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This allows Git to be compiled via Visual Studio again after integrating
the `hn/reftable` branch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.mak.uname                           |  2 +-
 contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index 5ad43c80b1a..484aec15ac2 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -710,7 +710,7 @@ vcxproj:
 	# Make .vcxproj files and add them
 	unset QUIET_GEN QUIET_BUILT_IN; \
 	perl contrib/buildsystems/generate -g Vcxproj
-	git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
+	git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
 
 	# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
 	(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
index 5c666f9ac03..33a08d31652 100644
--- a/contrib/buildsystems/Generators/Vcxproj.pm
+++ b/contrib/buildsystems/Generators/Vcxproj.pm
@@ -77,7 +77,7 @@ sub createProject {
     my $libs_release = "\n    ";
     my $libs_debug = "\n    ";
     if (!$static_library) {
-      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
+      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
       $libs_debug = $libs_release;
       $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
       $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
@@ -231,6 +231,7 @@ sub createProject {
 EOM
     if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
       my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
+      my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
       my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
 
       print F << "EOM";
@@ -240,6 +241,14 @@ sub createProject {
       <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
     </ProjectReference>
 EOM
+      if (!($name =~ /xdiff|libreftable/)) {
+        print F << "EOM";
+    <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
+      <Project>$uuid_libreftable</Project>
+      <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
+    </ProjectReference>
+EOM
+      }
       if (!($name =~ 'xdiff')) {
         print F << "EOM";
     <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v11 11/12] Add some reftable testing infrastructure
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                       ` (9 preceding siblings ...)
  2020-05-04 19:03                     ` [PATCH v11 10/12] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
@ 2020-05-04 19:03                     ` Han-Wen Nienhuys via GitGitGadget
  2020-05-04 19:03                     ` [PATCH v11 12/12] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
                                       ` (2 subsequent siblings)
  13 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* Add GIT_TEST_REFTABLE environment var to control default ref storage

* Add test_prerequisite REFTABLE. Skip t/t3210-pack-refs.sh for REFTABLE.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/clone.c      | 2 +-
 builtin/init-db.c    | 2 +-
 refs.c               | 6 +++---
 t/t3210-pack-refs.sh | 6 ++++++
 t/test-lib.sh        | 5 +++++
 5 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4d0cf065e4a..780c5807415 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1109,7 +1109,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
-		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
+		default_ref_storage(), INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index b7053b9e370..da5b4670c84 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -545,7 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
-	const char *ref_storage_format = DEFAULT_REF_STORAGE;
+	const char *ref_storage_format = default_ref_storage();
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
diff --git a/refs.c b/refs.c
index 299a5db8bf1..b9b3e7e7070 100644
--- a/refs.c
+++ b/refs.c
@@ -1823,7 +1823,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	r->refs_private = ref_store_init(r->gitdir,
 					 r->ref_storage_format ?
 						 r->ref_storage_format :
-						 DEFAULT_REF_STORAGE,
+						 default_ref_storage(),
 					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
@@ -1879,7 +1879,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
+	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1893,7 +1893,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
-	const char *format = DEFAULT_REF_STORAGE; /* XXX */
+	const char *format = default_ref_storage();
 	struct ref_store *refs;
 	const char *id;
 
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index f41b2afb996..edaef2c175a 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -11,6 +11,12 @@ semantic is still the same.
 '
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success 'enable reflogs' '
 	git config core.logallrefupdates true
 '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 1b221951a8e..b2b16979407 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1514,6 +1514,11 @@ FreeBSD)
 	;;
 esac
 
+if test -n "$GIT_TEST_REFTABLE"
+then
+  test_set_prereq REFTABLE
+fi
+
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v11 12/12] t: use update-ref and show-ref to reading/writing refs
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                       ` (10 preceding siblings ...)
  2020-05-04 19:03                     ` [PATCH v11 11/12] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-05-04 19:03                     ` Han-Wen Nienhuys via GitGitGadget
  2020-05-06  4:29                     ` [PATCH v11 00/12] Reftable support git-core Junio C Hamano
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
  13 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-04 19:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reading and writing .git/refs/* assumes that refs are stored in the 'files'
ref backend.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/t0002-gitfile.sh             |  2 +-
 t/t1400-update-ref.sh          | 32 ++++++++++++++++----------------
 t/t1506-rev-parse-diagnosis.sh |  2 +-
 t/t6050-replace.sh             |  2 +-
 t/t9020-remote-svn.sh          |  4 ++--
 5 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/t/t0002-gitfile.sh b/t/t0002-gitfile.sh
index 0aa9908ea12..960ed150cb5 100755
--- a/t/t0002-gitfile.sh
+++ b/t/t0002-gitfile.sh
@@ -62,7 +62,7 @@ test_expect_success 'check commit-tree' '
 '
 
 test_expect_success 'check rev-list' '
-	echo $SHA >"$REAL/HEAD" &&
+	git update-ref "HEAD" "$SHA" &&
 	test "$SHA" = "$(git rev-list HEAD)"
 '
 
diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index e1197ac8189..27171f82612 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -37,15 +37,15 @@ test_expect_success setup '
 
 test_expect_success "create $m" '
 	git update-ref $m $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "create $m with oldvalue verification" '
 	git update-ref $m $B $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "fail to delete $m with stale ref" '
 	test_must_fail git update-ref -d $m $A &&
-	test $B = "$(cat .git/$m)"
+	test $B = "$(git show-ref -s --verify $m)"
 '
 test_expect_success "delete $m" '
 	test_when_finished "rm -f .git/$m" &&
@@ -56,7 +56,7 @@ test_expect_success "delete $m" '
 test_expect_success "delete $m without oldvalue verification" '
 	test_when_finished "rm -f .git/$m" &&
 	git update-ref $m $A &&
-	test $A = $(cat .git/$m) &&
+	test $A = $(git show-ref -s --verify $m) &&
 	git update-ref -d $m &&
 	test_path_is_missing .git/$m
 '
@@ -69,15 +69,15 @@ test_expect_success "fail to create $n" '
 
 test_expect_success "create $m (by HEAD)" '
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "create $m (by HEAD) with oldvalue verification" '
 	git update-ref HEAD $B $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "fail to delete $m (by HEAD) with stale ref" '
 	test_must_fail git update-ref -d HEAD $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "delete $m (by HEAD)" '
 	test_when_finished "rm -f .git/$m" &&
@@ -178,14 +178,14 @@ test_expect_success '--no-create-reflog overrides core.logAllRefUpdates=always'
 
 test_expect_success "create $m (by HEAD)" '
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success 'pack refs' '
 	git pack-refs --all
 '
 test_expect_success "move $m (by HEAD)" '
 	git update-ref HEAD $B $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "delete $m (by HEAD) should remove both packed and loose $m" '
 	test_when_finished "rm -f .git/$m" &&
@@ -255,7 +255,7 @@ test_expect_success '(not) change HEAD with wrong SHA1' '
 '
 test_expect_success "(not) changed .git/$m" '
 	test_when_finished "rm -f .git/$m" &&
-	! test $B = $(cat .git/$m)
+	! test $B = $(git show-ref -s --verify $m)
 '
 
 rm -f .git/logs/refs/heads/master
@@ -263,19 +263,19 @@ test_expect_success "create $m (logged by touch)" '
 	test_config core.logAllRefUpdates false &&
 	GIT_COMMITTER_DATE="2005-05-26 23:30" \
 	git update-ref --create-reflog HEAD $A -m "Initial Creation" &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "update $m (logged by touch)" '
 	test_config core.logAllRefUpdates false &&
 	GIT_COMMITTER_DATE="2005-05-26 23:31" \
 	git update-ref HEAD $B $A -m "Switch" &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "set $m (logged by touch)" '
 	test_config core.logAllRefUpdates false &&
 	GIT_COMMITTER_DATE="2005-05-26 23:41" \
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 
 test_expect_success 'empty directory removal' '
@@ -319,19 +319,19 @@ test_expect_success "create $m (logged by config)" '
 	test_config core.logAllRefUpdates true &&
 	GIT_COMMITTER_DATE="2005-05-26 23:32" \
 	git update-ref HEAD $A -m "Initial Creation" &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "update $m (logged by config)" '
 	test_config core.logAllRefUpdates true &&
 	GIT_COMMITTER_DATE="2005-05-26 23:33" \
 	git update-ref HEAD'" $B $A "'-m "Switch" &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "set $m (logged by config)" '
 	test_config core.logAllRefUpdates true &&
 	GIT_COMMITTER_DATE="2005-05-26 23:43" \
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 
 cat >expect <<EOF
diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
index 52edcbdcc32..dbf690b9c1b 100755
--- a/t/t1506-rev-parse-diagnosis.sh
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -207,7 +207,7 @@ test_expect_success 'arg before dashdash must be a revision (ambiguous)' '
 	{
 		# we do not want to use rev-parse here, because
 		# we are testing it
-		cat .git/refs/heads/foobar &&
+		git show-ref -s refs/heads/foobar &&
 		printf "%s\n" --
 	} >expect &&
 	git rev-parse foobar -- >actual &&
diff --git a/t/t6050-replace.sh b/t/t6050-replace.sh
index e7e64e085dd..c80dc10b8f1 100755
--- a/t/t6050-replace.sh
+++ b/t/t6050-replace.sh
@@ -135,7 +135,7 @@ test_expect_success 'tag replaced commit' '
 test_expect_success '"git fsck" works' '
      git fsck master >fsck_master.out &&
      test_i18ngrep "dangling commit $R" fsck_master.out &&
-     test_i18ngrep "dangling tag $(cat .git/refs/tags/mytag)" fsck_master.out &&
+     test_i18ngrep "dangling tag $(git show-ref -s refs/tags/mytag)" fsck_master.out &&
      test -z "$(git fsck)"
 '
 
diff --git a/t/t9020-remote-svn.sh b/t/t9020-remote-svn.sh
index 6fca08e5e35..9fcfa969a9b 100755
--- a/t/t9020-remote-svn.sh
+++ b/t/t9020-remote-svn.sh
@@ -48,8 +48,8 @@ test_expect_success REMOTE_SVN 'simple fetch' '
 '
 
 test_debug '
-	cat .git/refs/svn/svnsim/master
-	cat .git/refs/remotes/svnsim/master
+	git show-ref -s refs/svn/svnsim/master
+	git show-ref -s refs/remotes/svnsim/master
 '
 
 test_expect_success REMOTE_SVN 'repeated fetch, nothing shall change' '
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Pseudo ref handling (was Re: [PATCH v10 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref)
  2020-05-04 18:03                       ` Han-Wen Nienhuys
@ 2020-05-05 18:26                         ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-05 18:26 UTC (permalink / raw)
  To: Emily Shaffer; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Mon, May 4, 2020 at 8:03 PM Han-Wen Nienhuys <hanwen@google.com> wrote:
> >
> > All this to say - it's hard to convince myself this is a safe change,
> > and I'd really like to read more to understand why you made it.
>
> I'll be the first to admit that I'm on shaky ground here. However,
> given that the test suite passes, if this is breaking some behavior,
> it's probably not very well tested behavior.
>
> Here is what I know:
>
> Git stores refs in multiple places:
> - normal refs, packed: .git/packed-refs
> - normal refs, loose: under refs/*/
> - special refs: HEAD, ORIG_HEAD, REBASE_HEAD.
>
> Currently, the special refs can only be read with refs_read_raw_ref().
> If you iterate over the refs in the files/packed backend, you can
> never find HEAD, ORIG_HEAD etc.

A large source of test failures of the reftable patch series is the
handling of pseudo refs (HEAD, CHERRY_PICK_HEAD etc.)

These are written as

  if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
    assert(refs == get_main_ref_store(the_repository));
    ret = write_pseudoref(refname, new_oid, old_oid, &err);
  } else {
    t = ref_store_transaction_begin(refs, &err);
    ..

write_pseudoref writes a file directly to the .git/ directory.

However, on read (for example, CHERRY_PICK_HEAD), the pseudo ref is
given to lookup_commit_reference_by_name, which in the end calls
refs_resolve_ref_unsafe(). If reftable is active, the pseudo ref isn't
read from .git/CHERRY_PICK_HEAD.

If the pseudo refs are read through a ref backend, shouldn't they be
written through a ref backend too?

This causes all rebase and all cherry-pick tests to fail with reftable.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v11 00/12] Reftable support git-core
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                       ` (11 preceding siblings ...)
  2020-05-04 19:03                     ` [PATCH v11 12/12] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
@ 2020-05-06  4:29                     ` Junio C Hamano
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
  13 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-06  4:29 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This adds the reftable library, and hooks it up as a ref backend.

I queued this to 'pu', and we are seeing

    https://github.com/git/git/runs/648285159

The ci/run-static-analysis.sh job's failure report ends like so:

    HDR reftable/record.h
    HDR reftable/refname.h
    HDR reftable/reftable.h
    HDR reftable/slice.h
    HDR reftable/stack.h
    HDR reftable/system.h
    HDR reftable/tree.h
    HDR reftable/writer.h
    HDR remote.h
In file included from reftable/refname.hcc:2:0:
./reftable/refname.h:12:24: error: field ‘tab’ has incomplete type
  struct reftable_table tab;
                        ^~~
./reftable/refname.h:32:13: error: ‘struct reftable_ref_record’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
      struct reftable_ref_record *recs, size_t sz);
             ^~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make: *** [reftable/refname.hco] Error 1
Makefile:2842: recipe for target 'reftable/refname.hco' failed
make: *** Waiting for unfinished jobs....

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v12 00/12] Reftable support git-core
  2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                       ` (12 preceding siblings ...)
  2020-05-06  4:29                     ` [PATCH v11 00/12] Reftable support git-core Junio C Hamano
@ 2020-05-07  9:59                     ` Han-Wen Nienhuys via GitGitGadget
  2020-05-07  9:59                       ` [PATCH v12 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
                                         ` (12 more replies)
  13 siblings, 13 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1

Summary 18964 tests pass 2871 tests fail

Some issues:

 * pseudorefs are broken
 * many tests inspect .git/{logs,heads}/ directly.

v15

 * fix static analysis error
 * plug some memory leaks

Han-Wen Nienhuys (10):
  refs.h: clarify reflog iteration order
  Iterate over the "refs/" namespace in for_each_[raw]ref
  refs: document how ref_iterator_advance_fn should handle symrefs
  Add .gitattributes for the reftable/ directory
  reftable: define version 2 of the spec to accomodate SHA256
  reftable: clarify how empty tables should be written
  Add reftable library
  Reftable support for git-core
  t: use update-ref and show-ref to reading/writing refs
  Add some reftable testing infrastructure

Johannes Schindelin (1):
  vcxproj: adjust for the reftable changes

Jonathan Nieder (1):
  reftable: file format documentation

 Documentation/Makefile                        |    1 +
 Documentation/technical/reftable.txt          | 1080 +++++++++++++++
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   27 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   56 +-
 cache.h                                       |    6 +-
 config.mak.uname                              |    2 +-
 contrib/buildsystems/Generators/Vcxproj.pm    |   11 +-
 refs.c                                        |   33 +-
 refs.h                                        |    8 +-
 refs/refs-internal.h                          |    6 +
 refs/reftable-backend.c                       | 1045 ++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   11 +
 reftable/VERSION                              |    5 +
 reftable/basics.c                             |  215 +++
 reftable/basics.h                             |   53 +
 reftable/block.c                              |  435 ++++++
 reftable/block.h                              |  129 ++
 reftable/constants.h                          |   21 +
 reftable/file.c                               |   99 ++
 reftable/iter.c                               |  240 ++++
 reftable/iter.h                               |   63 +
 reftable/merged.c                             |  327 +++++
 reftable/merged.h                             |   38 +
 reftable/pq.c                                 |  115 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  753 ++++++++++
 reftable/reader.h                             |   65 +
 reftable/record.c                             | 1133 +++++++++++++++
 reftable/record.h                             |  121 ++
 reftable/refname.c                            |  215 +++
 reftable/refname.h                            |   38 +
 reftable/reftable.c                           |   91 ++
 reftable/reftable.h                           |  563 ++++++++
 reftable/slice.c                              |  225 +++
 reftable/slice.h                              |   76 +
 reftable/stack.c                              | 1230 +++++++++++++++++
 reftable/stack.h                              |   45 +
 reftable/system.h                             |   54 +
 reftable/tree.c                               |   67 +
 reftable/tree.h                               |   34 +
 reftable/update.sh                            |   24 +
 reftable/writer.c                             |  661 +++++++++
 reftable/writer.h                             |   60 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0002-gitfile.sh                            |    2 +-
 t/t0031-reftable.sh                           |  109 ++
 t/t1400-update-ref.sh                         |   32 +-
 t/t1409-avoid-packing-refs.sh                 |    6 +
 t/t1506-rev-parse-diagnosis.sh                |    2 +-
 t/t3210-pack-refs.sh                          |    6 +
 t/t6050-replace.sh                            |    2 +-
 t/t9020-remote-svn.sh                         |    4 +-
 t/test-lib.sh                                 |    5 +
 60 files changed, 9777 insertions(+), 57 deletions(-)
 create mode 100644 Documentation/technical/reftable.txt
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100755 t/t0031-reftable.sh


base-commit: b34789c0b0d3b137f0bb516b417bd8d75e0cb306
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v12
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v12
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v11:

  1:  dfa5fd74f85 =  1:  dfa5fd74f85 refs.h: clarify reflog iteration order
  2:  340c5c415e1 =  2:  340c5c415e1 Iterate over the "refs/" namespace in for_each_[raw]ref
  3:  6553285043b =  3:  6553285043b refs: document how ref_iterator_advance_fn should handle symrefs
  4:  7dc47c7756f =  4:  7dc47c7756f Add .gitattributes for the reftable/ directory
  5:  06fcb49e903 =  5:  06fcb49e903 reftable: file format documentation
  6:  093fa74a3d0 =  6:  093fa74a3d0 reftable: define version 2 of the spec to accomodate SHA256
  7:  6d9031372ce =  7:  6d9031372ce reftable: clarify how empty tables should be written
  8:  6ee6c44752c !  8:  57d338c4983 Add reftable library
     @@ reftable/README.md (new)
      
       ## reftable/VERSION (new) ##
      @@
     -+commit 3a486f79abcd17e88e3bca62f43188a7c8b80ff3
     ++commit 06dd91a8377b0f920d9835b9835d0d650f928dce
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Mon May 4 19:20:21 2020 +0200
     ++Date:   Wed May 6 21:27:32 2020 +0200
      +
     -+    C: include "cache.h" in git-core sleep_millisec()
     ++    C: fix small memory leak in stack.c error path
      
       ## reftable/basics.c (new) ##
      @@
     @@ reftable/block.c (new)
      +
      +#include "system.h"
      +
     -+#include "blocksource.h"
      +#include "constants.h"
      +#include "record.h"
      +#include "reftable.h"
     @@ reftable/block.c (new)
      +		}
      +
      +		/* We're done with the input data. */
     -+		block_source_return_block(block->source, block);
     ++		reftable_block_done(block);
      +		block->data = uncompressed.buf;
      +		block->len = sz;
      +		block->source = malloc_block_source();
     @@ reftable/block.h (new)
      +/* size of file footer, depending on format version */
      +int footer_size(int version);
      +
     -+#endif
     -
     - ## reftable/blocksource.h (new) ##
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#ifndef BLOCKSOURCE_H
     -+#define BLOCKSOURCE_H
     -+
     -+#include "reftable.h"
     -+
     -+uint64_t block_source_size(struct reftable_block_source source);
     -+int block_source_read_block(struct reftable_block_source source,
     -+			    struct reftable_block *dest, uint64_t off,
     -+			    uint32_t size);
     -+void block_source_return_block(struct reftable_block_source source,
     -+			       struct reftable_block *ret);
     -+void block_source_close(struct reftable_block_source source);
     ++/* returns a block to its source. */
     ++void reftable_block_done(struct reftable_block *ret);
      +
      +#endif
      
     @@ reftable/iter.c (new)
      +		if (fri->double_check) {
      +			struct reftable_iterator it = { 0 };
      +
     -+			err = reftable_reader_seek_ref(fri->r, &it,
     -+						       ref->ref_name);
     ++			err = reftable_table_seek_ref(fri->tab, &it,
     ++						      ref->ref_name);
      +			if (err == 0) {
      +				err = reftable_iterator_next_ref(it, ref);
      +			}
     @@ reftable/iter.c (new)
      +{
      +	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
      +	block_iter_close(&it->cur);
     -+	reader_return_block(it->r, &it->block_reader.block);
     ++	reftable_block_done(&it->block_reader.block);
      +	slice_clear(&it->oid);
      +}
      +
     @@ reftable/iter.c (new)
      +		return 1;
      +	}
      +
     -+	reader_return_block(it->r, &it->block_reader.block);
     ++	reftable_block_done(&it->block_reader.block);
      +
      +	{
      +		uint64_t off = it->offsets[it->offset_idx++];
     @@ reftable/iter.h (new)
      +/* iterator that produces only ref records that point to `oid` */
      +struct filtering_ref_iterator {
      +	bool double_check;
     -+	struct reftable_reader *r;
     ++	struct reftable_table tab;
      +	struct slice oid;
      +	struct reftable_iterator it;
      +};
     @@ reftable/reader.c (new)
      +#include "reader.h"
      +
      +#include "system.h"
     -+
      +#include "block.h"
      +#include "constants.h"
      +#include "iter.h"
     @@ reftable/reader.c (new)
      +	return result;
      +}
      +
     -+void block_source_return_block(struct reftable_block_source source,
     -+			       struct reftable_block *blockp)
     ++void reftable_block_done(struct reftable_block *blockp)
      +{
     -+	source.ops->return_block(source.arg, blockp);
     ++	struct reftable_block_source source = blockp->source;
     ++	if (blockp != NULL && source.ops != NULL)
     ++		source.ops->return_block(source.arg, blockp);
      +	blockp->data = NULL;
      +	blockp->len = 0;
      +	blockp->source.ops = NULL;
     @@ reftable/reader.c (new)
      +	return block_source_read_block(r->source, dest, off, sz);
      +}
      +
     -+void reader_return_block(struct reftable_reader *r, struct reftable_block *p)
     -+{
     -+	block_source_return_block(r->source, p);
     -+}
     -+
      +uint32_t reftable_reader_hash_id(struct reftable_reader *r)
      +{
      +	return r->hash_id;
     @@ reftable/reader.c (new)
      +
      +	err = parse_footer(r, footer.data, header.data);
      +exit:
     -+	block_source_return_block(r->source, &footer);
     -+	block_source_return_block(r->source, &header);
     ++	reftable_block_done(&footer);
     ++	reftable_block_done(&header);
      +	return err;
      +}
      +
     @@ reftable/reader.c (new)
      +	if (ti->bi.br == NULL) {
      +		return;
      +	}
     -+	reader_return_block(ti->r, &ti->bi.br->block);
     ++	reftable_block_done(&ti->bi.br->block);
      +	FREE_AND_NULL(ti->bi.br);
      +
      +	ti->bi.last_key.len = 0;
     @@ reftable/reader.c (new)
      +	}
      +
      +	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
     -+		reader_return_block(r, &block);
     ++		reftable_block_done(&block);
      +		return 1;
      +	}
      +
      +	if (block_size > guess_block_size) {
     -+		reader_return_block(r, &block);
     ++		reftable_block_done(&block);
      +		err = reader_get_block(r, &block, next_off, block_size);
      +		if (err < 0) {
      +			return err;
     @@ reftable/reader.c (new)
      +
      +static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
      +					      struct reftable_iterator *it,
     -+					      byte *oid, int oid_len)
     ++					      byte *oid)
      +{
      +	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
      +	struct filtering_ref_iterator *filter = NULL;
     ++	int oid_len = hash_size(r->hash_id);
      +	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
      +	if (err < 0) {
      +		reftable_free(ti);
     @@ reftable/reader.c (new)
      +	filter = reftable_calloc(sizeof(struct filtering_ref_iterator));
      +	slice_resize(&filter->oid, oid_len);
      +	memcpy(filter->oid.buf, oid, oid_len);
     -+	filter->r = r;
     ++	reftable_table_from_reader(&filter->tab, r);
      +	filter->double_check = false;
      +	iterator_from_table_iter(&filter->it, ti);
      +
     @@ reftable/reader.c (new)
      +}
      +
      +int reftable_reader_refs_for(struct reftable_reader *r,
     -+			     struct reftable_iterator *it, byte *oid,
     -+			     int oid_len)
     ++			     struct reftable_iterator *it, byte *oid)
      +{
      +	if (r->obj_offsets.present) {
      +		return reftable_reader_refs_for_indexed(r, it, oid);
      +	}
     -+	return reftable_reader_refs_for_unindexed(r, it, oid, oid_len);
     ++	return reftable_reader_refs_for_unindexed(r, it, oid);
      +}
      +
      +uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
     @@ reftable/reader.h (new)
      +int block_source_read_block(struct reftable_block_source source,
      +			    struct reftable_block *dest, uint64_t off,
      +			    uint32_t size);
     -+void block_source_return_block(struct reftable_block_source source,
     -+			       struct reftable_block *ret);
      +void block_source_close(struct reftable_block_source *source);
      +
      +/* metadata for a block type */
     @@ reftable/reader.h (new)
      +		struct record rec);
      +void reader_close(struct reftable_reader *r);
      +const char *reader_name(struct reftable_reader *r);
     -+void reader_return_block(struct reftable_reader *r, struct reftable_block *p);
      +
      +/* initialize a block reader to read from `r` */
      +int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
     @@ reftable/record.c (new)
      +	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
      +}
      +
     ++static void obj_record_clear(void *rec)
     ++{
     ++	struct obj_record *obj = (struct obj_record *)rec;
     ++	FREE_AND_NULL(obj->hash_prefix);
     ++	FREE_AND_NULL(obj->offsets);
     ++	memset(obj, 0, sizeof(struct obj_record));
     ++}
     ++
      +static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
      +{
     -+	struct obj_record *ref = (struct obj_record *)rec;
     ++	struct obj_record *obj = (struct obj_record *)rec;
      +	const struct obj_record *src = (const struct obj_record *)src_rec;
      +
     -+	*ref = *src;
     -+	ref->hash_prefix = reftable_malloc(ref->hash_prefix_len);
     -+	memcpy(ref->hash_prefix, src->hash_prefix, ref->hash_prefix_len);
     ++	obj_record_clear(obj);
     ++	*obj = *src;
     ++	obj->hash_prefix = reftable_malloc(obj->hash_prefix_len);
     ++	memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len);
      +
      +	{
     -+		int olen = ref->offset_len * sizeof(uint64_t);
     -+		ref->offsets = reftable_malloc(olen);
     -+		memcpy(ref->offsets, src->offsets, olen);
     ++		int olen = obj->offset_len * sizeof(uint64_t);
     ++		obj->offsets = reftable_malloc(olen);
     ++		memcpy(obj->offsets, src->offsets, olen);
      +	}
      +}
      +
     -+static void obj_record_clear(void *rec)
     -+{
     -+	struct obj_record *ref = (struct obj_record *)rec;
     -+	FREE_AND_NULL(ref->hash_prefix);
     -+	FREE_AND_NULL(ref->offsets);
     -+	memset(ref, 0, sizeof(struct obj_record));
     -+}
     -+
      +static byte obj_record_val_type(const void *rec)
      +{
      +	struct obj_record *r = (struct obj_record *)rec;
     @@ reftable/record.c (new)
      +	const struct reftable_log_record *src =
      +		(const struct reftable_log_record *)src_rec;
      +
     ++	reftable_log_record_clear(dst);
      +	*dst = *src;
      +	if (dst->ref_name != NULL) {
      +		dst->ref_name = xstrdup(dst->ref_name);
     @@ reftable/refname.h (new)
      +#ifndef REFNAME_H
      +#define REFNAME_H
      +
     ++#include "reftable.h"
     ++
      +struct modification {
      +	struct reftable_table tab;
      +
     @@ reftable/refname.h (new)
      +
      +int modification_validate(struct modification *mod);
      +
     -+/* illegal name, or dir/file conflict */
     -+#define REFTABLE_REFNAME_ERROR -9
     -+
      +#endif
      
       ## reftable/reftable.c (new) ##
     @@ reftable/reftable.h (new)
      +/* closes and deallocates a reader. */
      +void reftable_reader_free(struct reftable_reader *);
      +
     -+/* return an iterator for the refs pointing to oid */
     ++/* return an iterator for the refs pointing to `oid`. */
      +int reftable_reader_refs_for(struct reftable_reader *r,
     -+			     struct reftable_iterator *it, uint8_t *oid,
     -+			     int oid_len);
     ++			     struct reftable_iterator *it, uint8_t *oid);
      +
      +/* return the max_update_index for a table */
      +uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
     @@ reftable/stack.c (new)
      +		int i = 0;
      +		for (i = 0; i < new_tables_len; i++) {
      +			reader_close(new_tables[i]);
     ++			reftable_reader_free(new_tables[i]);
      +		}
      +	}
      +	reftable_free(new_tables);
  9:  d731e1669b7 !  9:  f3bb9410038 Reftable support for git-core
     @@ Commit message
      
          For background, see the previous commit introducing the library.
      
     -    TODO:
     +    This introduces the refs/reftable-backend.c containing reftable powered ref
     +    storage backend.
     +
     +    It can be activated by passing --ref-storage=reftable to "git init".
      
     -     * Resolve spots marked with XXX
     +    TODO:
      
     -     * Support worktrees (t0002-gitfile "linked repo" testcase)
     +    * Resolve spots marked with XXX
      
          Example use: see t/t0031-reftable.sh
      
 10:  513d585f0f8 = 10:  94fcc6dca6a vcxproj: adjust for the reftable changes
 12:  a7d1d2e721c = 11:  2abcbd1af99 t: use update-ref and show-ref to reading/writing refs
 11:  846fe29fa4b ! 12:  fe9407d10b1 Add some reftable testing infrastructure
     @@ Commit message
      
          * Add test_prerequisite REFTABLE. Skip t/t3210-pack-refs.sh for REFTABLE.
      
     +    Major test failures:
     +
     +     * t9903-bash-prompt - The bash mode reads .git/HEAD directly
     +     * t1400-update-ref.sh - Reads from .git/{refs,logs} directly
     +     * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
     +     * t1405 - inspecs .git/ directly.
     +     * t1450-fsck.sh - manipulates .git/ directly to create invalid state
     +     * Rebase, cherry-pick: pseudo refs aren't written through the refs backend.
     +
     +    Other tests by decreasing brokenness:
     +
     +    t1407-worktree-ref-store.sh              - 5 of 5
     +    t1413-reflog-detach.sh                   - 7 of 7
     +    t1415-worktree-refs.sh                   - 11 of 11
     +    t3908-stash-in-worktree.sh               - 2 of 2
     +    t4207-log-decoration-colors.sh           - 2 of 2
     +    t5515-fetch-merge-logic.sh               - 17 of 17
     +    t5900-repo-selection.sh                  - 8 of 8
     +    t6016-rev-list-graph-simplify-history.sh - 12 of 12
     +    t5573-pull-verify-signatures.sh          - 15 of 16
     +    t5612-clone-refspec.sh                   - 12 of 13
     +    t5514-fetch-multiple.sh                  - 11 of 12
     +    t6030-bisect-porcelain.sh                - 64 of 71
     +    t5533-push-cas.sh                        - 15 of 17
     +    t5539-fetch-http-shallow.sh              - 7 of 8
     +    t7413-submodule-is-active.sh             - 7 of 8
     +    t2400-worktree-add.sh                    - 59 of 69
     +    t0100-previous.sh                        - 5 of 6
     +    t7419-submodule-set-branch.sh            - 5 of 6
     +    t1404-update-ref-errors.sh               - 44 of 53
     +    t6003-rev-list-topo-order.sh             - 29 of 35
     +    t1409-avoid-packing-refs.sh              - 9 of 11
     +    t5541-http-push-smart.sh                 - 31 of 38
     +    t5407-post-rewrite-hook.sh               - 13 of 16
     +    t9903-bash-prompt.sh                     - 52 of 66
     +    t1414-reflog-walk.sh                     - 9 of 12
     +    t1507-rev-parse-upstream.sh              - 21 of 28
     +    t2404-worktree-config.sh                 - 9 of 12
     +    t1505-rev-parse-last.sh                  - 5 of 7
     +    t7510-signed-commit.sh                   - 16 of 23
     +    t2018-checkout-branch.sh                 - 15 of 22
     +    (..etc)
     +
     +
     +
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
       ## builtin/clone.c ##
     @@ refs.c: struct ref_store *get_submodule_ref_store(const char *submodule)
       	const char *id;
       
      
     + ## t/t1409-avoid-packing-refs.sh ##
     +@@ t/t1409-avoid-packing-refs.sh: test_description='avoid rewriting packed-refs unnecessarily'
     + 
     + . ./test-lib.sh
     + 
     ++if test_have_prereq REFTABLE
     ++then
     ++  skip_all='skipping pack-refs tests; incompatible with reftable'
     ++  test_done
     ++fi
     ++
     + # Add an identifying mark to the packed-refs file header line. This
     + # shouldn't upset readers, and it should be omitted if the file is
     + # ever rewritten.
     +
       ## t/t3210-pack-refs.sh ##
      @@ t/t3210-pack-refs.sh: semantic is still the same.
       '

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v12 01/12] refs.h: clarify reflog iteration order
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
@ 2020-05-07  9:59                       ` Han-Wen Nienhuys via GitGitGadget
  2020-05-08 18:54                         ` Junio C Hamano
  2020-05-07  9:59                       ` [PATCH v12 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
                                         ` (11 subsequent siblings)
  12 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/refs.h b/refs.h
index a92d2c74c83..9421c5b8465 100644
--- a/refs.h
+++ b/refs.h
@@ -432,18 +432,21 @@ int delete_refs(const char *msg, struct string_list *refnames,
 int refs_delete_reflog(struct ref_store *refs, const char *refname);
 int delete_reflog(const char *refname);
 
-/* iterate over reflog entries */
+/* Iterate over reflog entries. */
 typedef int each_reflog_ent_fn(
 		struct object_id *old_oid, struct object_id *new_oid,
 		const char *committer, timestamp_t timestamp,
 		int tz, const char *msg, void *cb_data);
 
+/* Iterate in over reflog entries, oldest entry first. */
 int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
 			     each_reflog_ent_fn fn, void *cb_data);
 int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+/* Call a function for each reflog entry, oldest entry first. */
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v12 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
  2020-05-07  9:59                       ` [PATCH v12 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-05-07  9:59                       ` Han-Wen Nienhuys via GitGitGadget
  2020-05-08 18:54                         ` Junio C Hamano
  2020-05-07  9:59                       ` [PATCH v12 03/12] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
                                         ` (10 subsequent siblings)
  12 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 224ff66c7bb..4db27379661 100644
--- a/refs.c
+++ b/refs.c
@@ -1525,7 +1525,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1585,8 +1585,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v12 03/12] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
  2020-05-07  9:59                       ` [PATCH v12 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
  2020-05-07  9:59                       ` [PATCH v12 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-05-07  9:59                       ` Han-Wen Nienhuys via GitGitGadget
  2020-05-08 18:58                         ` Junio C Hamano
  2020-05-07  9:59                       ` [PATCH v12 04/12] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                         ` (9 subsequent siblings)
  12 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb7..3490aac3a40 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
 
 /* Virtual function declarations for ref_iterators: */
 
+/*
+ * backend-specific implementation of ref_iterator_advance.
+ * For symrefs, the function should set REF_ISSYMREF, and it should also
+ * dereference the symref to provide the OID referent.
+ */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v12 04/12] Add .gitattributes for the reftable/ directory
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
                                         ` (2 preceding siblings ...)
  2020-05-07  9:59                       ` [PATCH v12 03/12] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-05-07  9:59                       ` Han-Wen Nienhuys via GitGitGadget
  2020-05-07  9:59                       ` [PATCH v12 05/12] reftable: file format documentation Jonathan Nieder via GitGitGadget
                                         ` (8 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 00000000000..f44451a3795
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v12 05/12] reftable: file format documentation
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
                                         ` (3 preceding siblings ...)
  2020-05-07  9:59                       ` [PATCH v12 04/12] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-05-07  9:59                       ` Jonathan Nieder via GitGitGadget
  2020-05-07  9:59                       ` [PATCH v12 06/12] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
                                         ` (7 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Jonathan Nieder via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder

From: Jonathan Nieder <jrnieder@gmail.com>

Shawn Pearce explains:

Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:

- Near constant time lookup for any single reference, even when the
  repository is cold and not in process or kernel cache.
- Near constant time verification a SHA-1 is referred to by at least
  one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.

This file format spec was originally written in July, 2017 by Shawn
Pearce.  Some refinements since then were made by Shawn and by Han-Wen
Nienhuys based on experiences implementing and experimenting with the
format.  (All of this was in the context of our work at Google and
Google is happy to contribute the result to the Git project.)

Imported from JGit[1]'s current version (c217d33ff,
"Documentation/technical/reftable: improve repo layout", 2020-02-04)
of Documentation/technical/reftable.md and converted to asciidoc by
running

  pandoc -t asciidoc -f markdown reftable.md >reftable.txt

using pandoc 2.2.1.  The result required the following additional
minor changes:

- removed the [TOC] directive to add a table of contents, since
  asciidoc does not support it
- replaced git-scm.com/docs links with linkgit: directives that link
  to other pages within Git's documentation

[1] https://eclipse.googlesource.com/jgit/jgit

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/Makefile               |    1 +
 Documentation/technical/reftable.txt | 1067 ++++++++++++++++++++++++++
 2 files changed, 1068 insertions(+)
 create mode 100644 Documentation/technical/reftable.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 15d9d04f316..ecd0b340b1c 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -93,6 +93,7 @@ TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
+TECH_DOCS += technical/reftable
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/signature-format
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
new file mode 100644
index 00000000000..9fa4657d9ff
--- /dev/null
+++ b/Documentation/technical/reftable.txt
@@ -0,0 +1,1067 @@
+reftable
+--------
+
+Overview
+~~~~~~~~
+
+Problem statement
+^^^^^^^^^^^^^^^^^
+
+Some repositories contain a lot of references (e.g. android at 866k,
+rails at 31k). The existing packed-refs format takes up a lot of space
+(e.g. 62M), and does not scale with additional references. Lookup of a
+single reference requires linearly scanning the file.
+
+Atomic pushes modifying multiple references require copying the entire
+packed-refs file, which can be a considerable amount of data moved
+(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
+
+Repositories with many loose references occupy a large number of disk
+blocks from the local file system, as each reference is its own file
+storing 41 bytes (and another file for the corresponding reflog). This
+negatively affects the number of inodes available when a large number of
+repositories are stored on the same filesystem. Readers can be penalized
+due to the larger number of syscalls required to traverse and read the
+`$GIT_DIR/refs` directory.
+
+Objectives
+^^^^^^^^^^
+
+* Near constant time lookup for any single reference, even when the
+repository is cold and not in process or kernel cache.
+* Near constant time verification if a SHA-1 is referred to by at least
+one reference (for allow-tip-sha1-in-want).
+* Efficient lookup of an entire namespace, such as `refs/tags/`.
+* Support atomic push with `O(size_of_update)` operations.
+* Combine reflog storage with ref storage for small transactions.
+* Separate reflog storage for base refs and historical logs.
+
+Description
+^^^^^^^^^^^
+
+A reftable file is a portable binary file format customized for
+reference storage. References are sorted, enabling linear scans, binary
+search lookup, and range scans.
+
+Storage in the file is organized into variable sized blocks. Prefix
+compression is used within a single block to reduce disk space. Block
+size and alignment is tunable by the writer.
+
+Performance
+^^^^^^^^^^^
+
+Space used, packed-refs vs. reftable:
+
+[cols=",>,>,>,>,>",options="header",]
+|===============================================================
+|repository |packed-refs |reftable |% original |avg ref |avg obj
+|android |62.2 M |36.1 M |58.0% |33 bytes |5 bytes
+|rails |1.8 M |1.1 M |57.7% |29 bytes |4 bytes
+|git |78.7 K |48.1 K |61.0% |50 bytes |4 bytes
+|git (heads) |332 b |269 b |81.0% |33 bytes |0 bytes
+|===============================================================
+
+Scan (read 866k refs), by reference name lookup (single ref from 866k
+refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):
+
+[cols=",>,>,>,>",options="header",]
+|=========================================================
+|format |cache |scan |by name |by SHA-1
+|packed-refs |cold |402 ms |409,660.1 usec |412,535.8 usec
+|packed-refs |hot | |6,844.6 usec |20,110.1 usec
+|reftable |cold |112 ms |33.9 usec |323.2 usec
+|reftable |hot | |20.2 usec |320.8 usec
+|=========================================================
+
+Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:
+
+[cols=",>,>",options="header",]
+|================================
+|format |size |avg entry
+|$GIT_DIR/logs |173 M |1209 bytes
+|reftable |5 M |37 bytes
+|================================
+
+Details
+~~~~~~~
+
+Peeling
+^^^^^^^
+
+References stored in a reftable are peeled, a record for an annotated
+(or signed) tag records both the tag object, and the object it refers
+to.
+
+Reference name encoding
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Reference names are an uninterpreted sequence of bytes that must pass
+linkgit:git-check-ref-format[1] as a valid reference name.
+
+Key unicity
+^^^^^^^^^^^
+
+Each entry must have a unique key; repeated keys are disallowed.
+
+Network byte order
+^^^^^^^^^^^^^^^^^^
+
+All multi-byte, fixed width fields are in network byte order.
+
+Ordering
+^^^^^^^^
+
+Blocks are lexicographically ordered by their first reference.
+
+Directory/file conflicts
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format accepts both `refs/heads/foo` and
+`refs/heads/foo/bar` as distinct references.
+
+This property is useful for retaining log records in reftable, but may
+confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
+references. Users of reftable may choose to continue to reject `foo` and
+`foo/bar` type conflicts to prevent problems for peers.
+
+File format
+~~~~~~~~~~~
+
+Structure
+^^^^^^^^^
+
+A reftable file has the following high-level structure:
+
+....
+first_block {
+  header
+  first_ref_block
+}
+ref_block*
+ref_index*
+obj_block*
+obj_index*
+log_block*
+log_index*
+footer
+....
+
+A log-only file omits the `ref_block`, `ref_index`, `obj_block` and
+`obj_index` sections, containing only the file header and log block:
+
+....
+first_block {
+  header
+}
+log_block*
+log_index*
+footer
+....
+
+in a log-only file the first log block immediately follows the file
+header, without padding to block alignment.
+
+Block size
+^^^^^^^^^^
+
+The file’s block size is arbitrarily determined by the writer, and does
+not have to be a power of 2. The block size must be larger than the
+longest reference name or log entry used in the repository, as
+references cannot span blocks.
+
+Powers of two that are friendly to the virtual memory system or
+filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
+yield better compression, with a possible increased cost incurred by
+readers during access.
+
+The largest block size is `16777215` bytes (15.99 MiB).
+
+Block alignment
+^^^^^^^^^^^^^^^
+
+Writers may choose to align blocks at multiples of the block size by
+including `padding` filled with NUL bytes at the end of a block to round
+out to the chosen alignment. When alignment is used, writers must
+specify the alignment with the file header’s `block_size` field.
+
+Block alignment is not required by the file format. Unaligned files must
+set `block_size = 0` in the file header, and omit `padding`. Unaligned
+files with more than one ref block must include the link:#Ref-index[ref
+index] to support fast lookup. Readers must be able to read both aligned
+and non-aligned files.
+
+Very small files (e.g. 1 only ref block) may omit `padding` and the ref
+index to reduce total file size.
+
+Header
+^^^^^^
+
+A 24-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+....
+
+Aligned files must specify `block_size` to configure readers with the
+expected block alignment. Unaligned files must set `block_size = 0`.
+
+The `min_update_index` and `max_update_index` describe bounds for the
+`update_index` field of all log records in this file. When reftables are
+used in a stack for link:#Update-transactions[transactions], these
+fields can order the files such that the prior file’s
+`max_update_index + 1` is the next file’s `min_update_index`.
+
+First ref block
+^^^^^^^^^^^^^^^
+
+The first ref block shares the same block as the file header, and is 24
+bytes smaller than all other blocks in the file. The first block
+immediately begins after the file header, at position 24.
+
+If the first block is a log block (a log-only file), its block header
+begins immediately at position 24.
+
+Ref block format
+^^^^^^^^^^^^^^^^
+
+A ref block is written as:
+
+....
+'r'
+uint24( block_len )
+ref_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
+encodes the number of bytes in the block up to, but not including the
+optional `padding`. This is always less than or equal to the file’s
+block size. In the first ref block, `block_len` includes 24 bytes for
+the file header.
+
+The 2-byte `restart_count` stores the number of entries in the
+`restart_offset` list, which must not be empty. Readers can use
+`restart_count` to binary search between restarts before starting a
+linear scan.
+
+Exactly `restart_count` 3-byte `restart_offset` values precedes the
+`restart_count`. Offsets are relative to the start of the block and
+refer to the first byte of any `ref_record` whose name has not been
+prefix compressed. Entries in the `restart_offset` list must be sorted,
+ascending. Readers can start linear scans from any of these records.
+
+A variable number of `ref_record` fill the middle of the block,
+describing reference names and values. The format is described below.
+
+As the first ref block shares the first file block with the file header,
+all `restart_offset` in the first block are relative to the start of the
+file (position 0), and include the file header. This forces the first
+`restart_offset` to be `28`.
+
+ref record
+++++++++++
+
+A `ref_record` describes a single reference, storing both the name and
+its value(s). Records are formatted as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | value_type )
+suffix
+varint( update_index_delta )
+value?
+....
+
+The `prefix_length` field specifies how many leading bytes of the prior
+reference record’s name should be copied to obtain this reference’s
+name. This must be 0 for the first reference in any block, and also must
+be 0 for any `ref_record` whose offset is listed in the `restart_offset`
+table at the end of the block.
+
+Recovering a reference name from any `ref_record` is a simple concat:
+
+....
+this_name = prior_name[0..prefix_length] + suffix
+....
+
+The `suffix_length` value provides the number of bytes available in
+`suffix` to copy from `suffix` to complete the reference name.
+
+The `update_index` that last modified the reference can be obtained by
+adding `update_index_delta` to the `min_update_index` from the file
+header: `min_update_index + update_index_delta`.
+
+The `value` follows. Its format is determined by `value_type`, one of
+the following:
+
+* `0x0`: deletion; no value data (see transactions, below)
+* `0x1`: one 20-byte object id; value of the ref
+* `0x2`: two 20-byte object ids; value of the ref, peeled target
+* `0x3`: symbolic reference: `varint( target_len ) target`
+
+Symbolic references use `0x3`, followed by the complete name of the
+reference target. No compression is applied to the target name.
+
+Types `0x4..0x7` are reserved for future use.
+
+Ref index
+^^^^^^^^^
+
+The ref index stores the name of the last reference from every ref block
+in the file, enabling reduced disk seeks for lookups. Any reference can
+be found by searching the index, identifying the containing block, and
+searching within that block.
+
+The index may be organized into a multi-level index, where the 1st level
+index block points to additional ref index blocks (2nd level), which may
+in turn point to either additional index blocks (e.g. 3rd level) or ref
+blocks (leaf level). Disk reads required to access a ref go up with
+higher index levels. Multi-level indexes may be required to ensure no
+single index block exceeds the file format’s max block size of
+`16777215` bytes (15.99 MiB). To acheive constant O(1) disk seeks for
+lookups the index must be a single level, which is permitted to exceed
+the file’s configured block size, but not the format’s max block size of
+15.99 MiB.
+
+If present, the ref index block(s) appears after the last ref block.
+
+If there are at least 4 ref blocks, a ref index block should be written
+to improve lookup times. Cold reads using the index require 2 disk reads
+(read index, read block), and binary searching < 4 blocks also requires
+<= 2 reads. Omitting the index block from smaller files saves space.
+
+If the file is unaligned and contains more than one ref block, the ref
+index must be written.
+
+Index block format:
+
+....
+'i'
+uint24( block_len )
+index_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+The index blocks begin with `block_type = 'i'` and a 3-byte `block_len`
+which encodes the number of bytes in the block, up to but not including
+the optional `padding`.
+
+The `restart_offset` and `restart_count` fields are identical in format,
+meaning and usage as in ref blocks.
+
+To reduce the number of reads required for random access in very large
+files the index block may be larger than other blocks. However, readers
+must hold the entire index in memory to benefit from this, so it’s a
+time-space tradeoff in both file size and reader memory.
+
+Increasing the file’s block size decreases the index size. Alternatively
+a multi-level index may be used, keeping index blocks within the file’s
+block size, but increasing the number of blocks that need to be
+accessed.
+
+index record
+++++++++++++
+
+An index record describes the last entry in another block. Index records
+are written as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | 0 )
+suffix
+varint( block_position )
+....
+
+Index records use prefix compression exactly like `ref_record`.
+
+Index records store `block_position` after the suffix, specifying the
+absolute position in bytes (from the start of the file) of the block
+that ends with this reference. Readers can seek to `block_position` to
+begin reading the block header.
+
+Readers must examine the block header at `block_position` to determine
+if the next block is another level index block, or the leaf-level ref
+block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the ref index must first read the footer (below) to
+obtain `ref_index_position`. If not present, the position will be 0. The
+`ref_index_position` is for the 1st level root of the ref index.
+
+Obj block format
+^^^^^^^^^^^^^^^^
+
+Object blocks are optional. Writers may choose to omit object blocks,
+especially if readers will not use the SHA-1 to ref mapping.
+
+Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
+ref blocks containing references pointing to that object directly, or as
+the peeled value of an annotated tag. Like ref blocks, object blocks use
+the file’s standard block size. The abbrevation length is available in
+the footer as `obj_id_len`.
+
+To save space in small files, object blocks may be omitted if the ref
+index is not present, as brute force search will only need to read a few
+ref blocks. When missing, readers should brute force a linear search of
+all references to lookup by SHA-1.
+
+An object block is written as:
+
+....
+'o'
+uint24( block_len )
+obj_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Fields are identical to ref block. Binary search using the restart table
+works the same as in reference blocks.
+
+Because object identifiers are abbreviated by writers to the shortest
+unique abbreviation within the reftable, obj key lengths are variable
+between 2 and 20 bytes. Readers must compare only for common prefix
+match within an obj block or obj index.
+
+obj record
+++++++++++
+
+An `obj_record` describes a single object abbreviation, and the blocks
+containing references using that unique abbreviation:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | cnt_3 )
+suffix
+varint( cnt_large )?
+varint( position_delta )*
+....
+
+Like in reference blocks, abbreviations are prefix compressed within an
+obj block. On large reftables with many unique objects, higher block
+sizes (64k), and higher restart interval (128), a `prefix_length` of 2
+or 3 and `suffix_length` of 3 may be common in obj records (unique
+abbreviation of 5-6 raw bytes, 10-12 hex digits).
+
+Each record contains `position_count` number of positions for matching
+ref blocks. For 1-7 positions the count is stored in `cnt_3`. When
+`cnt_3 = 0` the actual count follows in a varint, `cnt_large`.
+
+The use of `cnt_3` bets most objects are pointed to by only a single
+reference, some may be pointed to by a couple of references, and very
+few (if any) are pointed to by more than 7 references.
+
+A special case exists when `cnt_3 = 0` and `cnt_large = 0`: there are no
+`position_delta`, but at least one reference starts with this
+abbreviation. A reader that needs exact reference names must scan all
+references to find which specific references have the desired object.
+Writers should use this format when the `position_delta` list would have
+overflowed the file’s block size due to a high number of references
+pointing to the same object.
+
+The first `position_delta` is the position from the start of the file.
+Additional `position_delta` entries are sorted ascending and relative to
+the prior entry, e.g. a reader would perform:
+
+....
+pos = position_delta[0]
+prior = pos
+for (j = 1; j < position_count; j++) {
+  pos = prior + position_delta[j]
+  prior = pos
+}
+....
+
+With a position in hand, a reader must linearly scan the ref block,
+starting from the first `ref_record`, testing each reference’s SHA-1s
+(for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
+SHA-1 within a single ref block is not supported by the reftable format.
+Smaller block sizes reduce the number of candidates this step must
+consider.
+
+Obj index
+^^^^^^^^^
+
+The obj index stores the abbreviation from the last entry for every obj
+block in the file, enabling reduced disk seeks for all lookups. It is
+formatted exactly the same as the ref index, but refers to obj blocks.
+
+The obj index should be present if obj blocks are present, as obj blocks
+should only be written in larger files.
+
+Readers loading the obj index must first read the footer (below) to
+obtain `obj_index_position`. If not present, the position will be 0.
+
+Log block format
+^^^^^^^^^^^^^^^^
+
+Unlike ref and obj blocks, log blocks are always unaligned.
+
+Log blocks are variable in size, and do not match the `block_size`
+specified in the file header or footer. Writers should choose an
+appropriate buffer size to prepare a log block for deflation, such as
+`2 * block_size`.
+
+A log block is written as:
+
+....
+'g'
+uint24( block_len )
+zlib_deflate {
+  log_record+
+  uint24( restart_offset )+
+  uint16( restart_count )
+}
+....
+
+Log blocks look similar to ref blocks, except `block_type = 'g'`.
+
+The 4-byte block header is followed by the deflated block contents using
+zlib deflate. The `block_len` in the header is the inflated size
+(including 4-byte block header), and should be used by readers to
+preallocate the inflation output buffer. A log block’s `block_len` may
+exceed the file’s block size.
+
+Offsets within the log block (e.g. `restart_offset`) still include the
+4-byte header. Readers may prefer prefixing the inflation output buffer
+with the 4-byte header.
+
+Within the deflate container, a variable number of `log_record` describe
+reference changes. The log record format is described below. See ref
+block format (above) for a description of `restart_offset` and
+`restart_count`.
+
+Because log blocks have no alignment or padding between blocks, readers
+must keep track of the bytes consumed by the inflater to know where the
+next log block begins.
+
+log record
+++++++++++
+
+Log record keys are structured as:
+
+....
+ref_name '\0' reverse_int64( update_index )
+....
+
+where `update_index` is the unique transaction identifier. The
+`update_index` field must be unique within the scope of a `ref_name`.
+See the update transactions section below for further details.
+
+The `reverse_int64` function inverses the value so lexographical
+ordering the network byte order encoding sorts the more recent records
+with higher `update_index` values first:
+
+....
+reverse_int64(int64 t) {
+  return 0xffffffffffffffff - t;
+}
+....
+
+Log records have a similar starting structure to ref and index records,
+utilizing the same prefix compression scheme applied to the log record
+key described above.
+
+....
+    varint( prefix_length )
+    varint( (suffix_length << 3) | log_type )
+    suffix
+    log_data {
+      old_id
+      new_id
+      varint( name_length    )  name
+      varint( email_length   )  email
+      varint( time_seconds )
+      sint16( tz_offset )
+      varint( message_length )  message
+    }?
+....
+
+Log record entries use `log_type` to indicate what follows:
+
+* `0x0`: deletion; no log data.
+* `0x1`: standard git reflog data using `log_data` above.
+
+The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
+entry from the reflog of `refs/stash` in a transaction file (below),
+without needing to rewrite larger files. Readers reading a stack of
+reflogs must treat this as a deletion.
+
+For `log_type = 0x1`, the `log_data` section follows
+linkgit:git-update-ref[1] logging and includes:
+
+* two 20-byte SHA-1s (old id, new id)
+* varint string of committer’s name
+* varint string of committer’s email
+* varint time in seconds since epoch (Jan 1, 1970)
+* 2-byte timezone offset in minutes (signed)
+* varint string of message
+
+`tz_offset` is the absolute number of minutes from GMT the committer was
+at the time of the update. For example `GMT-0800` is encoded in reftable
+as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
+
+The committer email does not contain `<` or `>`, it’s the value normally
+found between the `<>` in a git commit object header.
+
+The `message_length` may be 0, in which case there was no message
+supplied for the update.
+
+Contrary to traditional reflog (which is a file), renames are encoded as
+a combination of ref deletion and ref creation.
+
+Reading the log
++++++++++++++++
+
+Readers accessing the log must first read the footer (below) to
+determine the `log_position`. The first block of the log begins at
+`log_position` bytes since the start of the file. The `log_position` is
+not block aligned.
+
+Importing logs
+++++++++++++++
+
+When importing from `$GIT_DIR/logs` writers should globally order all
+log records roughly by timestamp while preserving file order, and assign
+unique, increasing `update_index` values for each log line. Newer log
+records get higher `update_index` values.
+
+Although an import may write only a single reftable file, the reftable
+file must span many unique `update_index`, as each log line requires its
+own `update_index` to preserve semantics.
+
+Log index
+^^^^^^^^^
+
+The log index stores the log key
+(`refname \0 reverse_int64(update_index)`) for the last log record of
+every log block in the file, supporting bounded-time lookup.
+
+A log index block must be written if 2 or more log blocks are written to
+the file. If present, the log index appears after the last log block.
+There is no padding used to align the log index to block alignment.
+
+Log index format is identical to ref index, except the keys are 9 bytes
+longer to include `'\0'` and the 8-byte `reverse_int64(update_index)`.
+Records use `block_position` to refer to the start of a log block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the log index must first read the footer (below) to
+obtain `log_index_position`. If not present, the position will be 0.
+
+Footer
+^^^^^^
+
+After the last block of the file, a file footer is written. It begins
+like the file header, but is extended with additional data.
+
+A 68-byte footer appears at the end:
+
+....
+    'REFT'
+    uint8( version_number = 1 )
+    uint24( block_size )
+    uint64( min_update_index )
+    uint64( max_update_index )
+
+    uint64( ref_index_position )
+    uint64( (obj_position << 5) | obj_id_len )
+    uint64( obj_index_position )
+
+    uint64( log_position )
+    uint64( log_index_position )
+
+    uint32( CRC-32 of above )
+....
+
+If a section is missing (e.g. ref index) the corresponding position
+field (e.g. `ref_index_position`) will be 0.
+
+* `obj_position`: byte position for the first obj block.
+* `obj_id_len`: number of bytes used to abbreviate object identifiers in
+obj blocks.
+* `log_position`: byte position for the first log block.
+* `ref_index_position`: byte position for the start of the ref index.
+* `obj_index_position`: byte position for the start of the obj index.
+* `log_index_position`: byte position for the start of the log index.
+
+Reading the footer
+++++++++++++++++++
+
+Readers must seek to `file_length - 68` to access the footer. A trusted
+external source (such as `stat(2)`) is necessary to obtain
+`file_length`. When reading the footer, readers must verify:
+
+* 4-byte magic is correct
+* 1-byte version number is recognized
+* 4-byte CRC-32 matches the other 64 bytes (including magic, and
+version)
+
+Once verified, the other fields of the footer can be accessed.
+
+Varint encoding
+^^^^^^^^^^^^^^^
+
+Varint encoding is identical to the ofs-delta encoding method used
+within pack files.
+
+Decoder works such as:
+
+....
+val = buf[ptr] & 0x7f
+while (buf[ptr] & 0x80) {
+  ptr++
+  val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
+}
+....
+
+Binary search
+^^^^^^^^^^^^^
+
+Binary search within a block is supported by the `restart_offset` fields
+at the end of the block. Readers can binary search through the restart
+table to locate between which two restart points the sought reference or
+key should appear.
+
+Each record identified by a `restart_offset` stores the complete key in
+the `suffix` field of the record, making the compare operation during
+binary search straightforward.
+
+Once a restart point lexicographically before the sought reference has
+been identified, readers can linearly scan through the following record
+entries to locate the sought record, terminating if the current record
+sorts after (and therefore the sought key is not present).
+
+Restart point selection
++++++++++++++++++++++++
+
+Writers determine the restart points at file creation. The process is
+arbitrary, but every 16 or 64 records is recommended. Every 16 may be
+more suitable for smaller block sizes (4k or 8k), every 64 for larger
+block sizes (64k).
+
+More frequent restart points reduces prefix compression and increases
+space consumed by the restart table, both of which increase file size.
+
+Less frequent restart points makes prefix compression more effective,
+decreasing overall file size, with increased penalities for readers
+walking through more records after the binary search step.
+
+A maximum of `65535` restart points per block is supported.
+
+Considerations
+~~~~~~~~~~~~~~
+
+Lightweight refs dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format assumes the vast majority of references are single
+SHA-1 valued with common prefixes, such as Gerrit Code Review’s
+`refs/changes/` namespace, GitHub’s `refs/pulls/` namespace, or many
+lightweight tags in the `refs/tags/` namespace.
+
+Annotated tags storing the peeled object cost an additional 20 bytes per
+reference.
+
+Low overhead
+^^^^^^^^^^^^
+
+A reftable with very few references (e.g. git.git with 5 heads) is 269
+bytes for reftable, vs. 332 bytes for packed-refs. This supports
+reftable scaling down for transaction logs (below).
+
+Block size
+^^^^^^^^^^
+
+For a Gerrit Code Review type repository with many change refs, larger
+block sizes (64 KiB) and less frequent restart points (every 64) yield
+better compression due to more references within the block compressing
+against the prior reference.
+
+Larger block sizes reduce the index size, as the reftable will require
+fewer blocks to store the same number of references.
+
+Minimal disk seeks
+^^^^^^^^^^^^^^^^^^
+
+Assuming the index block has been loaded into memory, binary searching
+for any single reference requires exactly 1 disk seek to load the
+containing block.
+
+Scans and lookups dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Scanning all references and lookup by name (or namespace such as
+`refs/heads/`) are the most common activities performed on repositories.
+SHA-1s are stored directly with references to optimize this use case.
+
+Logs are infrequently read
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are infrequently accessed, but can be large. Deflating log blocks
+saves disk space, with some increased penalty at read time.
+
+Logs are stored in an isolated section from refs, reducing the burden on
+reference readers that want to ignore logs. Further, historical logs can
+be isolated into log-only files.
+
+Logs are read backwards
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are frequently accessed backwards (most recent N records for master
+to answer `master@{4}`), so log records are grouped by reference, and
+sorted descending by update index.
+
+Repository format
+~~~~~~~~~~~~~~~~~
+
+Version 1
+^^^^^^^^^
+
+A repository must set its `$GIT_DIR/config` to configure reftable:
+
+....
+[core]
+    repositoryformatversion = 1
+[extensions]
+    refStorage = reftable
+....
+
+Layout
+^^^^^^
+
+A collection of reftable files are stored in the `$GIT_DIR/reftable/`
+directory:
+
+....
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+where reftable files are named by a unique name such as produced by the
+function `${min_update_index}-${max_update_index}.ref`.
+
+Log-only files use the `.log` extension, while ref-only and mixed ref
+and log files use `.ref`. extension.
+
+The stack ordering file is `$GIT_DIR/reftable/tables.list` and lists the
+current files, one per line, in order, from oldest (base) to newest
+(most recent):
+
+....
+$ cat .git/reftable/tables.list
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+Readers must read `$GIT_DIR/reftable/tables.list` to determine which
+files are relevant right now, and search through the stack in reverse
+order (last reftable is examined first).
+
+Reftable files not listed in `tables.list` may be new (and about to be
+added to the stack by the active writer), or ancient and ready to be
+pruned.
+
+Backward compatibility
+^^^^^^^^^^^^^^^^^^^^^^
+
+Older clients should continue to recognize the directory as a git
+repository so they don’t look for an enclosing repository in parent
+directories. To this end, a reftable-enabled repository must contain the
+following dummy files
+
+* `.git/HEAD`, a regular file containing `ref: refs/heads/.invalid`.
+* `.git/refs/`, a directory
+* `.git/refs/heads`, a regular file
+
+Readers
+^^^^^^^
+
+Readers can obtain a consistent snapshot of the reference space by
+following:
+
+1.  Open and read the `tables.list` file.
+2.  Open each of the reftable files that it mentions.
+3.  If any of the files is missing, goto 1.
+4.  Read from the now-open files as long as necessary.
+
+Update transactions
+^^^^^^^^^^^^^^^^^^^
+
+Although reftables are immutable, mutations are supported by writing a
+new reftable and atomically appending it to the stack:
+
+1.  Acquire `tables.list.lock`.
+2.  Read `tables.list` to determine current reftables.
+3.  Select `update_index` to be most recent file’s
+`max_update_index + 1`.
+4.  Prepare temp reftable `tmp_XXXXXX`, including log entries.
+5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
+6.  Copy `tables.list` to `tables.list.lock`, appending file from (5).
+7.  Rename `tables.list.lock` to `tables.list`.
+
+During step 4 the new file’s `min_update_index` and `max_update_index`
+are both set to the `update_index` selected by step 3. All log records
+for the transaction use the same `update_index` in their keys. This
+enables later correlation of which references were updated by the same
+transaction.
+
+Because a single `tables.list.lock` file is used to manage locking, the
+repository is single-threaded for writers. Writers may have to busy-spin
+(with backoff) around creating `tables.list.lock`, for up to an
+acceptable wait period, aborting if the repository is too busy to
+mutate. Application servers wrapped around repositories (e.g. Gerrit
+Code Review) can layer their own lock/wait queue to improve fairness to
+writers.
+
+Reference deletions
+^^^^^^^^^^^^^^^^^^^
+
+Deletion of any reference can be explicitly stored by setting the `type`
+to `0x0` and omitting the `value` field of the `ref_record`. This serves
+as a tombstone, overriding any assertions about the existence of the
+reference from earlier files in the stack.
+
+Compaction
+^^^^^^^^^^
+
+A partial stack of reftables can be compacted by merging references
+using a straightforward merge join across reftables, selecting the most
+recent value for output, and omitting deleted references that do not
+appear in remaining, lower reftables.
+
+A compacted reftable should set its `min_update_index` to the smallest
+of the input files’ `min_update_index`, and its `max_update_index`
+likewise to the largest input `max_update_index`.
+
+For sake of illustration, assume the stack currently consists of
+reftable files (from oldest to newest): A, B, C, and D. The compactor is
+going to compact B and C, leaving A and D alone.
+
+1.  Obtain lock `tables.list.lock` and read the `tables.list` file.
+2.  Obtain locks `B.lock` and `C.lock`. Ownership of these locks
+prevents other processes from trying to compact these files.
+3.  Release `tables.list.lock`.
+4.  Compact `B` and `C` into a temp file
+`${min_update_index}-${max_update_index}_XXXXXX`.
+5.  Reacquire lock `tables.list.lock`.
+6.  Verify that `B` and `C` are still in the stack, in that order. This
+should always be the case, assuming that other processes are adhering to
+the locking protocol.
+7.  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
+`${min_update_index}-${max_update_index}.ref`.
+8.  Write the new stack to `tables.list.lock`, replacing `B` and `C`
+with the file from (4).
+9.  Rename `tables.list.lock` to `tables.list`.
+10. Delete `B` and `C`, perhaps after a short sleep to avoid forcing
+readers to backtrack.
+
+This strategy permits compactions to proceed independently of updates.
+
+Each reftable (compacted or not) is uniquely identified by its name, so
+open reftables can be cached by their name.
+
+Alternatives considered
+~~~~~~~~~~~~~~~~~~~~~~~
+
+bzip packed-refs
+^^^^^^^^^^^^^^^^
+
+`bzip2` can significantly shrink a large packed-refs file (e.g. 62 MiB
+compresses to 23 MiB, 37%). However the bzip format does not support
+random access to a single reference. Readers must inflate and discard
+while performing a linear scan.
+
+Breaking packed-refs into chunks (individually compressing each chunk)
+would reduce the amount of data a reader must inflate, but still leaves
+the problem of indexing chunks to support readers efficiently locating
+the correct chunk.
+
+Given the compression achieved by reftable’s encoding, it does not seem
+necessary to add the complexity of bzip/gzip/zlib.
+
+Michael Haggerty’s alternate format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Michael Haggerty proposed
+https://public-inbox.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@mail.gmail.com/[an
+alternate] format to reftable on the Git mailing list. This format uses
+smaller chunks, without the restart table, and avoids block alignment
+with padding. Reflog entries immediately follow each ref, and are thus
+interleaved between refs.
+
+Performance testing indicates reftable is faster for lookups (51%
+faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
+larger file (+ ~3.2%, 28.3M vs 29.2M):
+
+[cols=">,>,>,>",options="header",]
+|=====================================
+|format |size |seek cold |seek hot
+|mh-alt |28.3 M |23.4 usec |11.2 usec
+|reftable |29.2 M |19.9 usec |5.4 usec
+|=====================================
+
+JGit Ketch RefTree
+^^^^^^^^^^^^^^^^^^
+
+https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg03073.html[JGit Ketch]
+proposed
+https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/[RefTree],
+an encoding of references inside Git tree objects stored as part of the
+repository’s object database.
+
+The RefTree format adds additional load on the object database storage
+layer (more loose objects, more objects in packs), and relies heavily on
+the packer’s delta compression to save space. Namespaces which are flat
+(e.g. thousands of tags in refs/tags) initially create very large loose
+objects, and so RefTree does not address the problem of copying many
+references to modify a handful.
+
+Flat namespaces are not efficiently searchable in RefTree, as tree
+objects in canonical formatting cannot be binary searched. This fails
+the need to handle a large number of references in a single namespace,
+such as GitHub’s `refs/pulls`, or a project with many tags.
+
+LMDB
+^^^^
+
+David Turner proposed
+https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/[using
+LMDB], as LMDB is lightweight (64k of runtime code) and GPL-compatible
+license.
+
+A downside of LMDB is its reliance on a single C implementation. This
+makes embedding inside JGit (a popular reimplemenation of Git)
+difficult, and hoisting onto virtual storage (for JGit DFS) virtually
+impossible.
+
+A common format that can be supported by all major Git implementations
+(git-core, JGit, libgit2) is strongly preferred.
+
+Future
+~~~~~~
+
+Longer hashes
+^^^^^^^^^^^^^
+
+Version will bump (e.g. 2) to indicate `value` uses a different object
+id length other than 20. The length could be stored in an expanded file
+header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v12 06/12] reftable: define version 2 of the spec to accomodate SHA256
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
                                         ` (4 preceding siblings ...)
  2020-05-07  9:59                       ` [PATCH v12 05/12] reftable: file format documentation Jonathan Nieder via GitGitGadget
@ 2020-05-07  9:59                       ` Han-Wen Nienhuys via GitGitGadget
  2020-05-08 19:59                         ` Junio C Hamano
  2020-05-07  9:59                       ` [PATCH v12 07/12] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
                                         ` (6 subsequent siblings)
  12 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 50 ++++++++++++++++------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index 9fa4657d9ff..ee3f36ea851 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -193,8 +193,8 @@ and non-aligned files.
 Very small files (e.g. 1 only ref block) may omit `padding` and the ref
 index to reduce total file size.
 
-Header
-^^^^^^
+Header (version 1)
+^^^^^^^^^^^^^^^^^^
 
 A 24-byte header appears at the beginning of the file:
 
@@ -215,6 +215,24 @@ used in a stack for link:#Update-transactions[transactions], these
 fields can order the files such that the prior file’s
 `max_update_index + 1` is the next file’s `min_update_index`.
 
+Header (version 2)
+^^^^^^^^^^^^^^^^^^
+
+A 28-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+uint32( hash_id )
+....
+
+The header is identical to `version_number=1`, with the 4-byte hash ID
+("sha1" for SHA1 and "s256" for SHA-256) append to the header.
+
+
 First ref block
 ^^^^^^^^^^^^^^^
 
@@ -671,14 +689,8 @@ Footer
 After the last block of the file, a file footer is written. It begins
 like the file header, but is extended with additional data.
 
-A 68-byte footer appears at the end:
-
 ....
-    'REFT'
-    uint8( version_number = 1 )
-    uint24( block_size )
-    uint64( min_update_index )
-    uint64( max_update_index )
+    HEADER
 
     uint64( ref_index_position )
     uint64( (obj_position << 5) | obj_id_len )
@@ -701,12 +713,16 @@ obj blocks.
 * `obj_index_position`: byte position for the start of the obj index.
 * `log_index_position`: byte position for the start of the log index.
 
+The size of the footer is 68 bytes for version 1, and 72 bytes for
+version 2.
+
 Reading the footer
 ++++++++++++++++++
 
-Readers must seek to `file_length - 68` to access the footer. A trusted
-external source (such as `stat(2)`) is necessary to obtain
-`file_length`. When reading the footer, readers must verify:
+Readers must first read the file start to determine the version
+number. Then they seek to `file_length - FOOTER_LENGTH` to access the
+footer. A trusted external source (such as `stat(2)`) is necessary to
+obtain `file_length`. When reading the footer, readers must verify:
 
 * 4-byte magic is correct
 * 1-byte version number is recognized
@@ -1055,13 +1071,3 @@ impossible.
 
 A common format that can be supported by all major Git implementations
 (git-core, JGit, libgit2) is strongly preferred.
-
-Future
-~~~~~~
-
-Longer hashes
-^^^^^^^^^^^^^
-
-Version will bump (e.g. 2) to indicate `value` uses a different object
-id length other than 20. The length could be stored in an expanded file
-header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v12 07/12] reftable: clarify how empty tables should be written
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
                                         ` (5 preceding siblings ...)
  2020-05-07  9:59                       ` [PATCH v12 06/12] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
@ 2020-05-07  9:59                       ` Han-Wen Nienhuys via GitGitGadget
  2020-05-07  9:59                       ` [PATCH v12 08/12] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                         ` (5 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The format allows for some ambiguity, as a lone footer also starts
with a valid file header. However, the current JGit code will barf on
this. This commit codifies this behavior into the standard.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index ee3f36ea851..df5cb5f11b8 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -731,6 +731,13 @@ version)
 
 Once verified, the other fields of the footer can be accessed.
 
+Empty tables
+++++++++++++
+
+A reftable may be empty. In this case, the file starts with a header
+and is immediately followed by a footer.
+
+
 Varint encoding
 ^^^^^^^^^^^^^^^
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v12 08/12] Add reftable library
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
                                         ` (6 preceding siblings ...)
  2020-05-07  9:59                       ` [PATCH v12 07/12] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
@ 2020-05-07  9:59                       ` Han-Wen Nienhuys via GitGitGadget
  2020-05-07  9:59                       ` [PATCH v12 09/12] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                         ` (4 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. Its rationale and
specification is in the preceding commit.

This imports the upstream library as one big commit. For understanding
the code, it is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* record.{c,h} - reading and writing records

* block.{c,h} - reading and writing blocks.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 +
 reftable/README.md     |   11 +
 reftable/VERSION       |    5 +
 reftable/basics.c      |  215 +++++++
 reftable/basics.h      |   53 ++
 reftable/block.c       |  435 ++++++++++++++
 reftable/block.h       |  129 +++++
 reftable/constants.h   |   21 +
 reftable/file.c        |   99 ++++
 reftable/iter.c        |  240 ++++++++
 reftable/iter.h        |   63 ++
 reftable/merged.c      |  327 +++++++++++
 reftable/merged.h      |   38 ++
 reftable/pq.c          |  115 ++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  753 ++++++++++++++++++++++++
 reftable/reader.h      |   65 +++
 reftable/record.c      | 1133 ++++++++++++++++++++++++++++++++++++
 reftable/record.h      |  121 ++++
 reftable/refname.c     |  215 +++++++
 reftable/refname.h     |   38 ++
 reftable/reftable.c    |   91 +++
 reftable/reftable.h    |  563 ++++++++++++++++++
 reftable/slice.c       |  225 ++++++++
 reftable/slice.h       |   76 +++
 reftable/stack.c       | 1230 ++++++++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   45 ++
 reftable/system.h      |   54 ++
 reftable/tree.c        |   67 +++
 reftable/tree.h        |   34 ++
 reftable/update.sh     |   24 +
 reftable/writer.c      |  661 +++++++++++++++++++++
 reftable/writer.h      |   60 ++
 reftable/zlib-compat.c |   92 +++
 34 files changed, 7363 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..21500563fd4
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,11 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   sh reftable/update.sh
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..09ceed05220
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,5 @@
+commit 06dd91a8377b0f920d9835b9835d0d650f928dce
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Wed May 6 21:27:32 2020 +0200
+
+    C: fix small memory leak in stack.c error path
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..14c4dfaf5a6
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,215 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)(i & 0xff);
+}
+
+uint32_t get_be24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)(i & 0xff);
+}
+
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
+{
+	size_t lo = 0;
+	size_t hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		size_t mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	static char buf[250];
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return "I/O error";
+	case REFTABLE_FORMAT_ERROR:
+		return "corrupt reftable file";
+	case REFTABLE_NOT_EXIST_ERROR:
+		return "file does not exist";
+	case REFTABLE_LOCK_ERROR:
+		return "data is outdated";
+	case REFTABLE_API_ERROR:
+		return "misuse of the reftable API";
+	case REFTABLE_ZLIB_ERROR:
+		return "zlib failure";
+	case REFTABLE_NAME_CONFLICT:
+		return "file/directory conflict";
+	case REFTABLE_REFNAME_ERROR:
+		return "invalid refname";
+	case -1:
+		return "general error";
+	default:
+		snprintf(buf, sizeof(buf), "unknown error code %d", err);
+		return buf;
+	}
+}
+
+int reftable_error_to_errno(int err)
+{
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return EIO;
+	case REFTABLE_FORMAT_ERROR:
+		return EFAULT;
+	case REFTABLE_NOT_EXIST_ERROR:
+		return ENOENT;
+	case REFTABLE_LOCK_ERROR:
+		return EBUSY;
+	case REFTABLE_API_ERROR:
+		return EINVAL;
+	case REFTABLE_ZLIB_ERROR:
+		return EDOM;
+	default:
+		return ERANGE;
+	}
+}
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..1b003485c9c
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+/* Bigendian en/decoding of integers */
+
+void put_be24(byte *out, uint32_t i);
+uint32_t get_be24(byte *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);
+
+/*
+  Frees a NULL terminated array of malloced strings. The array itself is also
+  freed.
+ */
+void free_names(char **a);
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp);
+
+/* compares two NULL-terminated arrays of strings. */
+int names_equal(char **a, char **b);
+
+/* returns the array size of a NULL-terminated array of strings. */
+int names_length(char **names);
+
+/* Allocation routines; they invoke the functions set through
+ * reftable_set_alloc() */
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..bd7ebda31ba
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,435 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct record rec)
+{
+	struct slice empty = { 0 };
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = { 0 };
+	int n = 0;
+
+	record_key(rec, &key);
+	n = encode_key(&restart, out, last, key, record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	n = record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	slice_clear(&key);
+	return 0;
+
+err:
+	slice_clear(&key);
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = { 0 };
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		slice_resize(&compressed, src_len);
+
+		while (1) {
+			uLongf dest_len = compressed.len;
+
+			zresult = compress2(compressed.buf, &dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				slice_resize(&compressed, 2 * compressed.len);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				slice_clear(&compressed);
+				return REFTABLE_ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed.buf,
+			       dest_len);
+			w->next = dest_len + block_header_skip;
+			slice_clear(&compressed);
+			break;
+		}
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct reftable_block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	if (!is_block_type(typ)) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = { 0 };
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip; /* total size of dest
+							    buffer. */
+		uLongf src_len = block->len - block_header_skip;
+
+		/* Log blocks specify the *uncompressed* size in their header.
+		 */
+		slice_resize(&uncompressed, sz);
+
+		/* Copy over the block header verbatim. It's not compressed. */
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		/* Uncompress */
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			slice_clear(&uncompressed);
+			return REFTABLE_ZLIB_ERROR;
+		}
+
+		if (dst_len + block_header_skip != sz) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+
+		/* We're done with the input data. */
+		reftable_block_done(block);
+		block->data = uncompressed.buf;
+		block->len = sz;
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_be16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct slice key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(size_t idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = { 0 };
+	struct slice last_key = { 0 };
+	byte unused_extra;
+	int n = decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_compare(a->key, rkey);
+		slice_clear(&rkey);
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+int block_iter_next(struct block_iter *it, struct record rec)
+{
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	{
+		struct slice in = {
+			.buf = it->br->block.data + it->next_off,
+			.len = it->br->block_len - it->next_off,
+		};
+		struct slice start = in;
+		struct slice key = { 0 };
+		byte extra;
+		int n = decode_key(&key, &extra, it->last_key, in);
+		if (n < 0) {
+			return -1;
+		}
+
+		slice_consume(&in, n);
+		n = record_decode(rec, key, extra, in, it->br->hash_size);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+
+		slice_copy(&it->last_key, key);
+		it->next_off += start.len - in.len;
+		slice_clear(&key);
+		return 0;
+	}
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = { 0 };
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	slice_clear(&it->last_key);
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		return -1;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	{
+		struct record rec = new_record(block_reader_type(br));
+		struct slice key = { 0 };
+		int result = 0;
+		int err = 0;
+		struct block_iter next = { 0 };
+		while (true) {
+			block_iter_copy_from(&next, it);
+
+			err = block_iter_next(&next, rec);
+			if (err < 0) {
+				result = -1;
+				goto exit;
+			}
+
+			record_key(rec, &key);
+			if (err > 0 || slice_compare(key, want) >= 0) {
+				result = 0;
+				goto exit;
+			}
+
+			block_iter_copy_from(it, &next);
+		}
+
+	exit:
+		slice_clear(&key);
+		slice_clear(&next.last_key);
+		record_destroy(&rec);
+
+		return result;
+	}
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	slice_clear(&bw->last_key);
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..17bee98e2bb
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,129 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+/*
+  Writes reftable blocks. The block_writer is reused across blocks to minimize
+  allocation overhead.
+*/
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+
+	/* Offset ofof the global header. Nonzero in the first block only. */
+	uint32_t header_off;
+
+	/* How often to restart keys. */
+	int restart_interval;
+	int hash_size;
+
+	/* Offset of next byte to write. */
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+
+	struct slice last_key;
+	int entries;
+};
+
+/*
+  initializes the blockwriter to write `typ` entries, using `buf` as temporary
+  storage. `buf` is not owned by the block_writer. */
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+
+/*
+  returns the block type (eg. 'r' for ref records.
+*/
+byte block_writer_type(struct block_writer *bw);
+
+/* appends the record, or -1 if it doesn't fit. */
+int block_writer_add(struct block_writer *w, struct record rec);
+
+/* appends the key restarts, and compress the block if necessary. */
+int block_writer_finish(struct block_writer *w);
+
+/* clears out internally allocated block_writer members. */
+void block_writer_clear(struct block_writer *bw);
+
+/* Read a block. */
+struct block_reader {
+	/* offset of the block header; nonzero for the first block in a
+	 * reftable. */
+	uint32_t header_off;
+
+	/* the memory block */
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint16_t restart_count;
+
+	/* size of the data in the file. For log blocks, this is the compressed
+	 * size. */
+	uint32_t full_block_size;
+};
+
+/* Iterate over entries in a block */
+struct block_iter {
+	/* offset within the block of the next entry to read. */
+	uint32_t next_off;
+	struct block_reader *br;
+
+	/* key for last entry we read. */
+	struct slice last_key;
+};
+
+/* initializes a block reader */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Position `it` at start of the block */
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+
+/* Position `it` to the `want` key in the block */
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+
+/* Returns the block type (eg. 'r' for refs) */
+byte block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct record rec);
+
+/* Seek to `want` with in the block pointed to by `it` */
+int block_iter_seek(struct block_iter *it, struct slice want);
+
+/* deallocate memory for `it`. The block reader and its block is left intact. */
+void block_iter_close(struct block_iter *it);
+
+/* size of file header, depending on format version */
+int header_size(int version);
+
+/* size of file footer, depending on format version */
+int footer_size(int version);
+
+/* returns a block to its source. */
+void reftable_block_done(struct reftable_block *ret);
+
+#endif
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..5eee72c4c11
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..2bf1135a2e2
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,99 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return REFTABLE_NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	{
+		struct file_block_source *p =
+			reftable_calloc(sizeof(struct file_block_source));
+		p->size = st.st_size;
+		p->fd = fd;
+
+		assert(bs->ops == NULL);
+		bs->ops = &file_vtable;
+		bs->arg = p;
+	}
+	return 0;
+}
+
+int reftable_fd_write(void *arg, byte *data, size_t sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..b48155e1107
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,240 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator it)
+{
+	return it.ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct record rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator it, struct record rec)
+{
+	return it.ops->next(it.iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator it,
+			       struct reftable_ref_record *ref)
+{
+	struct record rec = { 0 };
+	record_from_ref(&rec, ref);
+	return iterator_next(it, rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator it,
+			       struct reftable_log_record *log)
+{
+	struct record rec = { 0 };
+	record_from_log(&rec, log);
+	return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	slice_clear(&fri->oid);
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec.data;
+	int err = 0;
+	while (true) {
+		err = reftable_iterator_next_ref(fri->it, ref);
+		if (err != 0) {
+			break;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			err = reftable_table_seek_ref(fri->tab, &it,
+						      ref->ref_name);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				break;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+
+	reftable_ref_record_clear(ref);
+	return err;
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reftable_block_done(&it->block_reader.block);
+	slice_clear(&it->oid);
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reftable_block_done(&it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return REFTABLE_FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec.data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..5a0d12a3552
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,63 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct record rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator it, struct record rec);
+
+/* Returns true for a zeroed out iterator, such as the one returned from
+   iterator_destroy. */
+bool iterator_is_null(struct reftable_iterator it);
+
+/* iterator that produces only ref records that point to `oid` */
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_table tab;
+	struct slice oid;
+	struct reftable_iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+/* iterator that produces only ref records that point to `oid`,
+   but using the object index.
+ */
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..5a1b896e0b0
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,327 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct record rec = new_record(mi->typ);
+		int err = iterator_next(mi->stack[i], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			record_destroy(&rec);
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
+{
+	if (iterator_is_null(mi->stack[idx])) {
+		return 0;
+	}
+
+	{
+		struct record rec = new_record(mi->typ);
+		struct pq_entry e = {
+			.rec = rec,
+			.index = idx,
+		};
+		int err = iterator_next(mi->stack[idx], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[idx]);
+			record_destroy(&rec);
+			return 0;
+		}
+
+		merged_iter_pqueue_add(&mi->pq, e);
+	}
+	return 0;
+}
+
+static int merged_iter_next_entry(struct merged_iter *mi, struct record rec)
+{
+	struct slice entry_key = { 0 };
+	struct pq_entry entry = { 0 };
+	int err = 0;
+
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	entry = merged_iter_pqueue_remove(&mi->pq);
+	err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+	/*
+	  One can also use reftable as datacenter-local storage, where the ref
+	  database is maintained in globally consistent database (eg.
+	  CockroachDB or Spanner). In this scenario, replication delays together
+	  with compaction may cause newer tables to contain older entries. In
+	  such a deployment, the loop below must be changed to collect all
+	  entries for the same key, and return new the newest one.
+	*/
+	record_key(entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = { 0 };
+		int err = 0, cmp = 0;
+
+		record_key(top.rec, &k);
+
+		cmp = slice_compare(k, entry_key);
+		slice_clear(&k);
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		record_clear(top.rec);
+		reftable_free(record_yield(&top.rec));
+	}
+
+	record_copy_from(rec, entry.rec, hash_size(mi->hash_id));
+	record_clear(entry.rec);
+	reftable_free(record_yield(&entry.rec));
+	slice_clear(&entry_key);
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec)
+{
+	while (true) {
+		int err = merged_iter_next_entry(mi, rec);
+		if (err == 0 && mi->suppress_deletions &&
+		    record_is_deletion(rec)) {
+			continue;
+		}
+
+		return err;
+	}
+}
+
+static int merged_iter_next_void(void *p, struct record rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id)
+{
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reftable_reader *r = stack[i];
+		if (r->hash_id != hash_id) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reftable_reader_min_update_index(r);
+		}
+
+		last_max = reftable_reader_max_update_index(r);
+	}
+
+	{
+		struct reftable_merged_table m = {
+			.stack = stack,
+			.stack_len = n,
+			.min = first_min,
+			.max = last_max,
+			.hash_id = hash_id,
+		};
+
+		*dest = reftable_calloc(sizeof(struct reftable_merged_table));
+		**dest = m;
+	}
+	return 0;
+}
+
+void reftable_merged_table_close(struct reftable_merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reftable_reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it, struct record rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = record_type(rec),
+		.hash_id = mt->hash_id,
+		.suppress_deletions = mt->suppress_deletions,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n;
+	err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..a71f36ecbae
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,38 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_reader **stack;
+	int stack_len;
+	uint32_t hash_id;
+	bool suppress_deletions;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	int stack_len;
+	byte typ;
+	bool suppress_deletions;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it, struct record rec);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..bfad96c6695
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,115 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = { 0 };
+	struct slice bk = { 0 };
+	int cmp = 0;
+	record_key(a.rec, &ak);
+	record_key(b.rec, &bk);
+
+	cmp = slice_compare(ak, bk);
+
+	slice_clear(&ak);
+	slice_clear(&bk);
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		record_clear(pq->heap[i].rec);
+		reftable_free(record_yield(&pq->heap[i].rec));
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..b585a52ee14
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..b5744de3378
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,753 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source source)
+{
+	return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source.ops->read_block(source.arg, dest, off, size);
+	dest->source = source;
+	return result;
+}
+
+void reftable_block_done(struct reftable_block *blockp)
+{
+	struct reftable_block_source source = blockp->source;
+	if (blockp != NULL && source.ops != NULL)
+		source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(r->source, dest, off, sz);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	int err = 0;
+	if (memcmp(f, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = REFTABLE_FORMAT_ERROR;
+			goto exit;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	{
+		uint32_t computed_crc = crc32(0, footer, f - footer);
+		uint32_t file_crc = get_be32(f);
+		f += 4;
+		if (computed_crc != file_crc) {
+			err = REFTABLE_FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	{
+		byte first_block_typ = header[header_size(r->version)];
+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+		r->ref_offsets.offset = 0;
+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+					  r->log_offsets.offset > 0);
+		r->obj_offsets.present = r->obj_offsets.offset > 0;
+	}
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	reftable_block_done(&footer);
+	reftable_block_done(&header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec.data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reftable_block_done(&ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reftable_block_done(&block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reftable_block_done(&block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct block_reader *brp =
+			reftable_malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec)
+{
+	if (record_type(rec) != ti->typ) {
+		return REFTABLE_API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = { 0 };
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct record rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off, byte typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = reftable_malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			byte typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct record want)
+{
+	struct record rec = new_record(record_type(want));
+	struct slice want_key = { 0 };
+	struct slice got_key = { 0 };
+	struct table_iter next = { 0 };
+	int err = -1;
+	record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_compare(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	record_destroy(&rec);
+	slice_clear(&want_key);
+	slice_clear(&got_key);
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it, struct record rec)
+{
+	struct index_record want_index = { 0 };
+	struct record want_index_rec = { 0 };
+	struct index_record index_result = { 0 };
+	struct record index_result_rec = { 0 };
+	struct table_iter index_iter = { 0 };
+	struct table_iter next = { 0 };
+	int err = 0;
+
+	record_key(rec, &want_index.last_key);
+	record_from_index(&want_index_rec, &want_index);
+	record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = REFTABLE_FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	record_clear(want_index_rec);
+	record_clear(index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it, struct record rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = { 0 };
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct record rec)
+{
+	byte typ = record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return reader_seek(r, it, rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return reader_seek(r, it, rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		block_source_close(&src);
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    byte *oid)
+{
+	struct obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct obj_record got = { 0 };
+	struct record got_rec = { 0 };
+	int err = 0;
+
+	/* Look through the reverse index. */
+	record_from_obj(&want_rec, &want);
+	err = reader_seek(r, &oit, want_rec);
+	if (err != 0) {
+		goto exit;
+	}
+
+	/* read out the obj_record */
+	record_from_obj(&got_rec, &got);
+	err = iterator_next(oit, got_rec);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		/* didn't find it; return empty iterator */
+		iterator_set_empty(it);
+		err = 0;
+		goto exit;
+	}
+
+	{
+		struct indexed_table_ref_iter *itr = NULL;
+		err = new_indexed_table_ref_iter(&itr, r, oid,
+						 hash_size(r->hash_id),
+						 got.offsets, got.offset_len);
+		if (err < 0) {
+			goto exit;
+		}
+		got.offsets = NULL;
+		iterator_from_indexed_table_ref_iter(it, itr);
+	}
+
+exit:
+	reftable_iterator_destroy(&oit);
+	record_clear(got_rec);
+	return err;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      byte *oid)
+{
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	int oid_len = hash_size(r->hash_id);
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_calloc(sizeof(struct filtering_ref_iterator));
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	reftable_table_from_reader(&filter->tab, r);
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, byte *oid)
+{
+	if (r->obj_offsets.present) {
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	}
+	return reftable_reader_refs_for_unindexed(r, it, oid);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..fee83c6dad0
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,65 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source source);
+
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_close(struct reftable_block_source *source);
+
+/* metadata for a block type */
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+/* The state for reading a reftable file. */
+struct reftable_reader {
+	/* for convience, associate a name with the instance. */
+	char *name;
+	struct reftable_block_source source;
+
+	/* Size of the file, excluding the footer. */
+	uint64_t size;
+
+	/* 'sha1' for SHA1, 's256' for SHA-256 */
+	uint32_t hash_id;
+
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	/* Length of the OID keys in the 'o' section */
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct record rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+
+/* initialize a block reader to read from `r` */
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..2af54fdf90b
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1133 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = { 0 };
+	int i = 9;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	{
+		int n = sizeof(buf) - i - 1;
+		if (dest.len < n) {
+			return -1;
+		}
+		memcpy(dest.buf, &buf[i + 1], n);
+		return n;
+	}
+}
+
+int is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	slice_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	slice_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	slice_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	slice_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static void reftable_ref_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = xstrdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size(hash_id));
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size(hash_id));
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->ref_name);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static byte reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	slice_consume(&in, n);
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct slice dest = { 0 };
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+		seen_target = true;
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+static bool reftable_ref_record_is_deletion_void(const void *p)
+{
+	return reftable_ref_record_is_deletion(
+		(const struct reftable_ref_record *)p);
+}
+
+struct record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = BLOCK_TYPE_REF,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+	.is_deletion = &reftable_ref_record_is_deletion_void,
+};
+
+static void obj_record_key(const void *r, struct slice *dest)
+{
+	const struct obj_record *rec = (const struct obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_clear(void *rec)
+{
+	struct obj_record *obj = (struct obj_record *)rec;
+	FREE_AND_NULL(obj->hash_prefix);
+	FREE_AND_NULL(obj->offsets);
+	memset(obj, 0, sizeof(struct obj_record));
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct obj_record *obj = (struct obj_record *)rec;
+	const struct obj_record *src = (const struct obj_record *)src_rec;
+
+	obj_record_clear(obj);
+	*obj = *src;
+	obj->hash_prefix = reftable_malloc(obj->hash_prefix_len);
+	memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len);
+
+	{
+		int olen = obj->offset_len * sizeof(uint64_t);
+		obj->offsets = reftable_malloc(olen);
+		memcpy(obj->offsets, src->offsets, olen);
+	}
+}
+
+static byte obj_record_val_type(const void *rec)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	{
+		uint64_t last = r->offsets[0];
+		int i = 0;
+		for (i = 1; i < r->offset_len; i++) {
+			int n = put_var_int(s, r->offsets[i] - last);
+			if (n < 0) {
+				return -1;
+			}
+			slice_consume(&s, n);
+			last = r->offsets[i];
+		}
+	}
+	return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct obj_record *r = (struct obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		slice_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+	slice_consume(&in, n);
+
+	{
+		uint64_t last = r->offsets[0];
+		int j = 1;
+		while (j < count) {
+			uint64_t delta = 0;
+			int n = get_var_int(&delta, in);
+			if (n < 0) {
+				return n;
+			}
+			slice_consume(&in, n);
+
+			last = r->offsets[j] = (delta + last);
+			j++;
+		}
+	}
+	return start.len - in.len;
+}
+
+static bool not_a_deletion(const void *p)
+{
+	return false;
+}
+
+struct record_vtable obj_record_vtable = {
+	.key = &obj_record_key,
+	.type = BLOCK_TYPE_OBJ,
+	.copy_from = &obj_record_copy_from,
+	.val_type = &obj_record_val_type,
+	.encode = &obj_record_encode,
+	.decode = &obj_record_decode,
+	.clear = &obj_record_clear,
+	.is_deletion = not_a_deletion,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size(hash_id));
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size(hash_id));
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static void reftable_log_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_be64(dest->buf + 1 + len, ts);
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	reftable_log_record_clear(dst);
+	*dst = *src;
+	if (dst->ref_name != NULL) {
+		dst->ref_name = xstrdup(dst->ref_name);
+	}
+	if (dst->email != NULL) {
+		dst->email = xstrdup(dst->email);
+	}
+	if (dst->name != NULL) {
+		dst->name = xstrdup(dst->name);
+	}
+	if (dst->message != NULL) {
+		dst->message = xstrdup(dst->message);
+	}
+
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->ref_name);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static byte reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static byte zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r)) {
+		return 0;
+	}
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	slice_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_be16(s.buf, r->tz_offset);
+	slice_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = { 0 };
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	slice_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_be16(in.buf);
+	slice_consume(&in, 2);
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	slice_clear(&dest);
+	return start.len - in.len;
+
+error:
+	slice_clear(&dest);
+	return REFTABLE_FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+static bool reftable_log_record_is_deletion_void(const void *p)
+{
+	return reftable_log_record_is_deletion(
+		(const struct reftable_log_record *)p);
+}
+
+struct record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = BLOCK_TYPE_LOG,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+	.is_deletion = &reftable_log_record_is_deletion_void,
+};
+
+struct record new_record(byte typ)
+{
+	struct record rec = { NULL };
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct obj_record *r =
+			reftable_calloc(sizeof(struct obj_record));
+		record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct index_record *r =
+			reftable_calloc(sizeof(struct index_record));
+		record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+void *record_yield(struct record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+void record_destroy(struct record *rec)
+{
+	record_clear(*rec);
+	reftable_free(record_yield(rec));
+}
+
+static void index_record_key(const void *r, struct slice *dest)
+{
+	struct index_record *rec = (struct index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+				   int hash_size)
+{
+	struct index_record *dst = (struct index_record *)rec;
+	struct index_record *src = (struct index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec)
+{
+	struct index_record *idx = (struct index_record *)rec;
+	slice_clear(&idx->last_key);
+}
+
+static byte index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int index_record_encode(const void *rec, struct slice out, int hash_size)
+{
+	const struct index_record *r = (const struct index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+			       struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct index_record *r = (struct index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+	.key = &index_record_key,
+	.type = BLOCK_TYPE_INDEX,
+	.copy_from = &index_record_copy_from,
+	.val_type = &index_record_val_type,
+	.encode = &index_record_encode,
+	.decode = &index_record_decode,
+	.clear = &index_record_clear,
+	.is_deletion = &not_a_deletion,
+};
+
+void record_key(struct record rec, struct slice *dest)
+{
+	rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec)
+{
+	return rec.ops->type;
+}
+
+int record_encode(struct record rec, struct slice dest, int hash_size)
+{
+	return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size)
+{
+	assert(src.ops->type == rec.ops->type);
+
+	rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec)
+{
+	return rec.ops->val_type(rec.data);
+}
+
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size)
+{
+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec)
+{
+	rec.ops->clear(rec.data);
+}
+
+bool record_is_deletion(struct record rec)
+{
+	return rec.ops->is_deletion(rec.data);
+}
+
+void record_from_ref(struct record *rec, struct reftable_ref_record *ref_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = obj_rec;
+	rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = index_rec;
+	rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct reftable_log_record *log_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+struct reftable_ref_record *record_as_ref(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec.data;
+}
+
+struct reftable_log_record *record_as_log(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_LOG);
+	return (struct reftable_log_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->ref_name,
+		      ((struct reftable_ref_record *)b)->ref_name);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..011d9bc56fc
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,121 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+
+/* Methods for records. */
+struct record_vtable {
+	/* encode the key of to a byte slice. */
+	void (*key)(const void *rec, struct slice *dest);
+
+	/* The record type of ('r' for ref). */
+	byte type;
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	byte (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+
+	/* is this a tombstone? */
+	bool (*is_deletion)(const void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct record {
+	void *data;
+	struct record_vtable *ops;
+};
+
+/* returns true for recognized block types. Block start with the block type. */
+int is_block_type(byte typ);
+
+/* creates a malloced record of the given type. Dispose with record_destroy */
+struct record new_record(byte typ);
+
+extern struct record_vtable reftable_ref_record_vtable;
+
+/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
+   number of bytes written. */
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra);
+
+/* Decode into `key` and `extra` from `in` */
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in);
+
+/* index_record are used internally to speed up lookups. */
+struct index_record {
+	uint64_t offset; /* Offset of block */
+	struct slice last_key; /* Last key of the block. */
+};
+
+/* obj_record stores an object ID => ref mapping. */
+struct obj_record {
+	byte *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size);
+bool record_is_deletion(struct record rec);
+
+/* zeroes out the embedded record */
+void record_clear(struct record rec);
+
+/* clear out the record, yielding the record data that was encapsulated. */
+void *record_yield(struct record *rec);
+
+/* clear and deallocate embedded record, and zero `rec`. */
+void record_destroy(struct record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct reftable_ref_record *refrec);
+void record_from_log(struct record *rec, struct reftable_log_record *logrec);
+struct reftable_ref_record *record_as_ref(struct record ref);
+struct reftable_log_record *record_as_log(struct record ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/refname.c b/reftable/refname.c
new file mode 100644
index 00000000000..c92488924ac
--- /dev/null
+++ b/reftable/refname.c
@@ -0,0 +1,215 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+#include "refname.h"
+#include "slice.h"
+
+struct find_arg {
+	char **names;
+	const char *want;
+};
+
+static int find_name(size_t k, void *arg)
+{
+	struct find_arg *f_arg = (struct find_arg *)arg;
+
+	return strcmp(f_arg->names[k], f_arg->want) >= 0;
+}
+
+int modification_has_ref(struct modification *mod, const char *name)
+{
+	struct reftable_ref_record ref = { 0 };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = name,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
+			return 0;
+		}
+	}
+
+	if (mod->del_len > 0) {
+		struct find_arg arg = {
+			.names = mod->del,
+			.want = name,
+		};
+		int idx = binsearch(mod->del_len, find_name, &arg);
+		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
+			return 1;
+		}
+	}
+
+	err = reftable_table_read_ref(mod->tab, name, &ref);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static void modification_clear(struct modification *mod)
+{
+	FREE_AND_NULL(mod->add);
+	FREE_AND_NULL(mod->del);
+	mod->add_len = 0;
+	mod->del_len = 0;
+}
+
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix)
+{
+	struct reftable_iterator it = { NULL };
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = prefix,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len &&
+		    !strncmp(prefix, mod->add[idx], strlen(prefix))) {
+			goto exit;
+		}
+	}
+
+	err = reftable_table_seek_ref(mod->tab, &it, prefix);
+	if (err) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_ref(it, &ref);
+		if (err) {
+			goto exit;
+		}
+
+		if (mod->del_len > 0) {
+			struct find_arg arg = {
+				.names = mod->del,
+				.want = ref.ref_name,
+			};
+			int idx = binsearch(mod->del_len, find_name, &arg);
+			if (idx < mod->del_len &&
+			    !strcmp(ref.ref_name, mod->del[idx])) {
+				continue;
+			}
+		}
+
+		if (strncmp(ref.ref_name, prefix, strlen(prefix))) {
+			err = 1;
+			goto exit;
+		}
+		err = 0;
+		goto exit;
+	}
+
+exit:
+	reftable_ref_record_clear(&ref);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int validate_ref_name(const char *name)
+{
+	while (true) {
+		char *next = strchr(name, '/');
+		if (!*name) {
+			return REFTABLE_REFNAME_ERROR;
+		}
+		if (!next) {
+			return 0;
+		}
+		if (next - name == 0 || (next - name == 1 && *name == '.') ||
+		    (next - name == 2 && name[0] == '.' && name[1] == '.'))
+			return REFTABLE_REFNAME_ERROR;
+		name = next + 1;
+	}
+	return 0;
+}
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz)
+{
+	struct modification mod = {
+		.tab = tab,
+		.add = reftable_calloc(sizeof(char *) * sz),
+		.del = reftable_calloc(sizeof(char *) * sz),
+	};
+	int i = 0;
+	int err = 0;
+	for (; i < sz; i++) {
+		if (reftable_ref_record_is_deletion(&recs[i])) {
+			mod.del[mod.del_len++] = recs[i].ref_name;
+		} else {
+			mod.add[mod.add_len++] = recs[i].ref_name;
+		}
+	}
+
+	err = modification_validate(&mod);
+	modification_clear(&mod);
+	return err;
+}
+
+static void slice_trim_component(struct slice *sl)
+{
+	while (sl->len > 0) {
+		bool is_slash = (sl->buf[sl->len - 1] == '/');
+		sl->len--;
+		if (is_slash)
+			break;
+	}
+}
+
+int modification_validate(struct modification *mod)
+{
+	struct slice slashed = { 0 };
+	int err = 0;
+	int i = 0;
+	for (; i < mod->add_len; i++) {
+		err = validate_ref_name(mod->add[i]);
+		if (err) {
+			goto exit;
+		}
+		slice_set_string(&slashed, mod->add[i]);
+		slice_append_string(&slashed, "/");
+
+		err = modification_has_ref_with_prefix(
+			mod, slice_as_string(&slashed));
+		if (err == 0) {
+			err = REFTABLE_NAME_CONFLICT;
+			goto exit;
+		}
+		if (err < 0) {
+			goto exit;
+		}
+
+		slice_set_string(&slashed, mod->add[i]);
+		while (slashed.len) {
+			slice_trim_component(&slashed);
+			err = modification_has_ref(mod,
+						   slice_as_string(&slashed));
+			if (err == 0) {
+				err = REFTABLE_NAME_CONFLICT;
+				goto exit;
+			}
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+	err = 0;
+exit:
+	slice_clear(&slashed);
+	return err;
+}
diff --git a/reftable/refname.h b/reftable/refname.h
new file mode 100644
index 00000000000..f335287fcdb
--- /dev/null
+++ b/reftable/refname.h
@@ -0,0 +1,38 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+#ifndef REFNAME_H
+#define REFNAME_H
+
+#include "reftable.h"
+
+struct modification {
+	struct reftable_table tab;
+
+	char **add;
+	size_t add_len;
+
+	char **del;
+	size_t del_len;
+};
+
+// -1 = error, 0 = found, 1 = not found
+int modification_has_ref(struct modification *mod, const char *name);
+
+// -1 = error, 0 = found, 1 = not found.
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix);
+
+// 0 = OK.
+int validate_ref_name(const char *name);
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz);
+
+int modification_validate(struct modification *mod);
+
+#endif
diff --git a/reftable/reftable.c b/reftable/reftable.c
new file mode 100644
index 00000000000..d29fbd8ee44
--- /dev/null
+++ b/reftable/reftable.c
@@ -0,0 +1,91 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+#include "record.h"
+#include "reader.h"
+#include "merged.h"
+
+struct reftable_table_vtable {
+	int (*seek)(void *tab, struct reftable_iterator *it, struct record);
+};
+
+static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
+				     struct record rec)
+{
+	return reader_seek((struct reftable_reader *)tab, it, rec);
+}
+
+static struct reftable_table_vtable reader_vtable = {
+	.seek = reftable_reader_seek_void,
+};
+
+static int reftable_merged_table_seek_void(void *tab,
+					   struct reftable_iterator *it,
+					   struct record rec)
+{
+	return merged_table_seek_record((struct reftable_merged_table *)tab, it,
+					rec);
+}
+
+static struct reftable_table_vtable merged_table_vtable = {
+	.seek = reftable_merged_table_seek_void,
+};
+
+int reftable_table_seek_ref(struct reftable_table tab,
+			    struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return tab.ops->seek(tab.table_arg, it, rec);
+}
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &reader_vtable;
+	tab->table_arg = reader;
+}
+
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *merged)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &merged_table_vtable;
+	tab->table_arg = merged;
+}
+
+int reftable_table_read_ref(struct reftable_table tab, const char *name,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	int err = reftable_table_seek_ref(tab, &it, name);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_ref(it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, name) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		reftable_ref_record_clear(ref);
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..557b76f2dae
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,563 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *ref_name;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	REFTABLE_IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	REFTABLE_FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	REFTABLE_NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	REFTABLE_LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL ref_name.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	REFTABLE_API_ERROR = -6,
+
+	/* Decompression error */
+	REFTABLE_ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	REFTABLE_EMPTY_TABLE_ERROR = -8,
+
+	/* Dir/file conflict. */
+	REFTABLE_NAME_CONFLICT = -9,
+
+	/* Illegal ref name. */
+	REFTABLE_REFNAME_ERROR = -10,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/*
+ * Convert the numeric error code to an equivalent errno code.
+ */
+int reftable_error_to_errno(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+
+	/* boolean: do not check ref names for validity or dir/file conflicts.
+	 */
+	int skip_name_check;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, uint8_t *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, uint8_t *data, size_t size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp. The name is used for creating a stack. Typically, it is the
+ * basename of the file. The block source `src` is owned by the reader, and is
+ * closed on calling reftable_reader_destroy().
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source src, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to `oid`. */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* closes readers for the merged tables */
+void reftable_merged_table_close(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/****************************************************************
+ Generic tables
+
+ A unified API for reading tables, either merged tables, or single readers.
+ ****************************************************************/
+
+struct reftable_table {
+	struct reftable_table_vtable *ops;
+	void *table_arg;
+};
+
+int reftable_table_seek_ref(struct reftable_table tab,
+			    struct reftable_iterator *it, const char *name);
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader);
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *table);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_table_read_ref(struct reftable_table tab, const char *name,
+			    struct reftable_ref_record *ref);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction; releases the lock if
+ * held. */
+void reftable_addition_close(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* reloads the stack if necessary. */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	uint64_t entries_written; /* total number of entries written, including
+				     failures. */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 00000000000..89e649b0d46
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,225 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+void slice_consume(struct slice *s, int n)
+{
+	s->buf += n;
+	s->len -= n;
+}
+
+byte *slice_yield(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_clear(struct slice *s)
+{
+	reftable_free(slice_yield(s));
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = { 0 };
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_write(struct slice *b, byte *data, size_t sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = reftable_realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_write_void(void *b, byte *data, size_t sz)
+{
+	return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf)
+{
+	assert(bs->ops == NULL);
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 00000000000..886f22811c4
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,76 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+/*
+  provides bounds-checked byte ranges.
+  To use, initialize as "slice x = {0};"
+ */
+struct slice {
+	int len;
+	int cap;
+	byte *buf;
+};
+
+void slice_set_string(struct slice *dest, const char *src);
+void slice_append_string(struct slice *dest, const char *src);
+/* Set length to 0, but retain buffer */
+void slice_clear(struct slice *slice);
+
+/* Return a malloced string for `src` */
+char *slice_to_string(struct slice src);
+
+/* Ensure that `buf` is \0 terminated. */
+const char *slice_as_string(struct slice *src);
+
+/* Compare slices */
+bool slice_equal(struct slice a, struct slice b);
+
+/* Return `buf`, clearing out `s` */
+byte *slice_yield(struct slice *s);
+
+/* Copy bytes */
+void slice_copy(struct slice *dest, struct slice src);
+
+/* Advance `buf` by `n`, and decrease length. A copy of the slice
+   should be kept for deallocating the slice. */
+void slice_consume(struct slice *s, int n);
+
+/* Set length of the slice to `l` */
+void slice_resize(struct slice *s, int l);
+
+/* Signed comparison */
+int slice_compare(struct slice a, struct slice b);
+
+/* Append `data` to the `dest` slice.  */
+int slice_write(struct slice *dest, byte *data, size_t sz);
+
+/* Append `add` to `dest. */
+void slice_append(struct slice *dest, struct slice add);
+
+/* Like slice_write, but suitable for passing to reftable_new_writer
+ */
+int slice_write_void(void *b, byte *data, size_t sz);
+
+/* Find the longest shared prefix size of `a` and `b` */
+int common_prefix_size(struct slice a, struct slice b);
+
+struct reftable_block_source;
+
+/* Create an in-memory block source for reading reftables */
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..aca1f067b57
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1230 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "refname.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct slice list_file_name = { 0 };
+	int err = 0;
+
+	if (config.hash_id == 0) {
+		config.hash_id = SHA1_ID;
+	}
+
+	*dest = NULL;
+
+	slice_set_string(&list_file_name, dir);
+	slice_append_string(&list_file_name, "/tables.list");
+
+	p->list_file = slice_to_string(list_file_name);
+	slice_clear(&list_file_name);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload(p);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fd_read_lines(int fd, char ***namesp)
+{
+	off_t size = lseek(fd, 0, SEEK_END);
+	char *buf = NULL;
+	int err = 0;
+	if (size < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+	err = lseek(fd, 0, SEEK_SET);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (read(fd, buf, size) != size) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+
+exit:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	int fd = open(filename, O_RDONLY, 0644);
+	int err = 0;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return REFTABLE_IO_ERROR;
+	}
+	err = fd_read_lines(fd, namesp);
+	close(fd);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged == NULL) {
+		return;
+	}
+
+	reftable_merged_table_close(st->merged);
+	reftable_merged_table_free(st->merged);
+	st->merged = NULL;
+
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_tables =
+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
+	int new_tables_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+
+	struct slice table_path = { 0 };
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_append_string(&table_path, "/");
+			slice_append_string(&table_path, name);
+
+			err = reftable_block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = reftable_new_reader(&rd, src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
+					st->config.hash_id);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	new_merged->suppress_deletions = true;
+	st->merged = new_merged;
+
+	{
+		int i = 0;
+		for (i = 0; i < cur_len; i++) {
+			if (cur[i] != NULL) {
+				reader_close(cur[i]);
+				reftable_reader_free(cur[i]);
+			}
+		}
+	}
+
+exit:
+	slice_clear(&table_path);
+	{
+		int i = 0;
+		for (i = 0; i < new_tables_len; i++) {
+			reader_close(new_tables[i]);
+			reftable_reader_free(new_tables[i]);
+		}
+	}
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+					     bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		int err2 = 0;
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != REFTABLE_NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
+		   writer. Check if there was one by checking if the name list
+		   changed.
+		*/
+		err2 = read_lines(st->list_file, &names_after);
+		if (err2 < 0) {
+			free_names(names);
+			return err2;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return err;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 1;
+		sleep_millisec(delay);
+	}
+
+	return 0;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	return reftable_stack_reload_maybe_reuse(st, true);
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == REFTABLE_LOCK_ERROR) {
+			/* Ignore error return, we want to propagate
+			   REFTABLE_LOCK_ERROR.
+			*/
+			reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	if (!st->disable_auto_compact) {
+		return reftable_stack_auto_compact(st);
+	}
+
+	return 0;
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct slice lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	slice_set_string(&add->lock_file_name, st->list_file);
+	slice_append_string(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = REFTABLE_LOCK_ERROR;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = REFTABLE_LOCK_ERROR;
+		goto exit;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+exit:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct slice nm = { 0 };
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_set_string(&nm, add->stack->list_file);
+		slice_append_string(&nm, "/");
+		slice_append_string(&nm, add->new_tables[i]);
+		unlink(slice_as_string(&nm));
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(slice_as_string(&add->lock_file_name));
+		slice_clear(&add->lock_file_name);
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+	slice_clear(&nm);
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct slice table_list = { 0 };
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0) {
+		goto exit;
+	}
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		slice_append_string(&table_list,
+				    add->stack->merged->stack[i]->name);
+		slice_append_string(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_append_string(&table_list, add->new_tables[i]);
+		slice_append_string(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	slice_clear(&table_list);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&add->lock_file_name),
+		     add->stack->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+exit:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	*dest = reftable_malloc(sizeof(**dest));
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = { 0 };
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_addition_commit(&add);
+exit:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice tab_file_name = { 0 };
+	struct slice next_name = { 0 };
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, add->next_update_index, add->next_update_index);
+
+	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
+	slice_append_string(&temp_tab_file_name, "/");
+	slice_append(&temp_tab_file_name, next_name);
+	slice_append_string(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
+	if (tab_fd < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_writer_close(wr);
+	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto exit;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_check_addition(add->stack,
+				   slice_as_string(&temp_tab_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (wr->min_update_index < add->next_update_index) {
+		err = REFTABLE_API_ERROR;
+		goto exit;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_append_string(&next_name, ".ref");
+
+	slice_set_string(&tab_file_name, add->stack->reftable_dir);
+	slice_append_string(&tab_file_name, "/");
+	slice_append(&tab_file_name, next_name);
+
+	err = rename(slice_as_string(&temp_tab_file_name),
+		     slice_as_string(&tab_file_name));
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = slice_to_string(next_name);
+	add->new_tables_len++;
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_file_name));
+	}
+
+	slice_clear(&temp_tab_file_name);
+	slice_clear(&tab_file_name);
+	slice_clear(&next_name);
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reftable_reader_max_update_index(
+			       st->merged->stack[sz - 1]) +
+		       1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct slice next_name = { 0 };
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->merged->stack[first]),
+		    reftable_reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_append_string(temp_tab, "/");
+	slice_append(temp_tab, next_name);
+	slice_append_string(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = reftable_writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+	reftable_writer_free(wr);
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		slice_clear(temp_tab);
+	}
+	slice_clear(&next_name);
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_reader **subtabs = reftable_calloc(
+		sizeof(struct reftable_reader *) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	uint64_t entries = 0;
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr,
+				   st->merged->stack[first]->min_update_index,
+				   st->merged->stack[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto exit;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_ref(it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+	reftable_iterator_destroy(&it);
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_log(it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+	reftable_log_record_clear(&log);
+	st->stats.entries_written += entries;
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice new_table_name = { 0 };
+	struct slice lock_file_name = { 0 };
+	struct slice ref_list_contents = { 0 };
+	struct slice new_table_path = { 0 };
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_append_string(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	/* Don't want to write to the lock for now.  */
+	close(lock_file_fd);
+	lock_file_fd = 0;
+
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_file_name = { 0 };
+		struct slice subtab_lock = { 0 };
+		slice_set_string(&subtab_file_name, st->reftable_dir);
+		slice_append_string(&subtab_file_name, "/");
+		slice_append_string(&subtab_file_name,
+				    reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_file_name);
+		slice_append_string(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				} else {
+					err = REFTABLE_IO_ERROR;
+				}
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] =
+			(char *)slice_as_string(&subtab_file_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_append_string(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_append_string(&new_table_path, "/");
+
+	slice_append(&new_table_path, new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(slice_as_string(&temp_tab_file_name),
+			     slice_as_string(&new_table_path));
+		if (err < 0) {
+			err = REFTABLE_IO_ERROR;
+			goto exit;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		slice_append(&ref_list_contents, new_table_name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	/* Reload the stack before deleting. On windows, we can only delete the
+	   files after we closed them.
+	*/
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+
+	{
+		char **p = delete_on_success;
+		while (*p) {
+			if (strcmp(*p, slice_as_string(&new_table_path))) {
+				unlink(*p);
+			}
+			p++;
+		}
+	}
+
+exit:
+	free_names(delete_on_success);
+	{
+		char **p = subtable_locks;
+		while (*p) {
+			unlink(*p);
+			p++;
+		}
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	slice_clear(&new_table_name);
+	slice_clear(&new_table_path);
+	slice_clear(&ref_list_contents);
+	slice_clear(&temp_tab_file_name);
+	slice_clear(&lock_file_name);
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0) {
+		return 0;
+	}
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+
+	if (n == 0) {
+		*seglen = 0;
+		return segs;
+	}
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
+	int overhead = header_size(version) - 1;
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		sizes[i] = st->merged->stack[i]->size - overhead;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_table tab = { NULL };
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+	return reftable_table_read_ref(tab, refname, ref);
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_log(it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	if (err) {
+		reftable_log_record_clear(log);
+	}
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
+{
+	int err = 0;
+	struct reftable_block_source src = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct reftable_ref_record *refs = NULL;
+	struct reftable_iterator it = { NULL };
+	int cap = 0;
+	int len = 0;
+	int i = 0;
+
+	if (st->config.skip_name_check) {
+		return 0;
+	}
+
+	err = reftable_block_source_from_file(&src, new_tab_name);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_new_reader(&rd, src, new_tab_name);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	if (err > 0) {
+		err = 0;
+		goto exit;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		err = reftable_iterator_next_ref(it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (len >= cap) {
+			cap = 2 * cap + 1;
+			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
+		}
+
+		refs[len++] = ref;
+	}
+
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+
+	err = validate_ref_record_addition(tab, refs, len);
+
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&refs[i]);
+	}
+
+exit:
+	free(refs);
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..48924c40f4b
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,45 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+#include "system.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+	bool disable_auto_compact;
+
+	struct reftable_write_options config;
+
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..9e2c4b8d9ac
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,54 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#if 1 /* REFTABLE_IN_GITCORE */
+
+#include "git-compat-util.h"
+#include "cache.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_IN_GITCORE */
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef uint8_t byte;
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ *
+ * TODO: in git-core, this should fallback to uncompress2 if it is available.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..0341c865569
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,67 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	{
+		int res = compare(key, (*rootp)->key);
+		if (res < 0) {
+			return tree_search(key, &(*rootp)->left, compare,
+					   insert);
+		} else if (res > 0) {
+			return tree_search(key, &(*rootp)->right, compare,
+					   insert);
+		}
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..954512e9a3a
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+/* tree_node is a generic binary search tree. */
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+/* looks for `key` in `rootp` using `compare` as comparison function. If insert
+   is set, insert the key if it's not found. Else, return NULL.
+*/
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+
+/* performs an infix walk of the tree. */
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+
+/*
+  deallocates the tree nodes recursively. Keys should be deallocated separately
+  by walking over the tree. */
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100755
index 00000000000..ef32aeda515
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+set -eu
+
+# Override this to import from somewhere else, say "../reftable".
+SRC=${SRC:-origin}
+BRANCH=${BRANCH:-master}
+
+((git --git-dir reftable-repo/.git fetch ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
+   git clone https://github.com/google/reftable reftable-repo)
+
+cp reftable-repo/c/*.[ch] reftable/
+cp reftable-repo/c/include/*.[ch] reftable/
+cp reftable-repo/LICENSE reftable/
+git --git-dir reftable-repo/.git show --no-patch HEAD \
+  > reftable/VERSION
+
+mv reftable/system.h reftable/system.h~
+sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
+
+# Remove unittests and compatibility hacks we don't need here.  
+rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
+
+git add reftable/*.[ch] reftable/LICENSE reftable/VERSION 
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..bb70eba49da
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,661 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, byte *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, byte *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+			     ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = reftable_calloc(sizeof(struct obj_index_tree_node));
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w, struct record rec)
+{
+	int result = -1;
+	struct slice key = { 0 };
+	int err = 0;
+	record_key(rec, &key);
+	if (slice_compare(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	slice_clear(&key);
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return REFTABLE_API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return REFTABLE_API_ERROR;
+	}
+
+	record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = hash_size(w->opts.hash_id),
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = hash_size(w->opts.hash_id),
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	if (log->ref_name == NULL) {
+		return REFTABLE_API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	{
+		struct record rec = { 0 };
+		int err;
+		record_from_log(&rec, log);
+		err = writer_add_record(w, rec);
+		return err;
+	}
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct record rec = { 0 };
+			record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, rec);
+			assert(err == 0);
+		}
+		for (i = 0; i < idx_len; i++) {
+			slice_clear(&idx[i].last_key);
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_block_stats *bstats =
+			writer_reftable_block_stats(w, typ);
+		bstats->index_blocks =
+			w->stats.idx_stats.blocks - before_blocks;
+		bstats->index_offset = index_start;
+		bstats->max_index_level = max_level;
+	}
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct record rec = { 0 };
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	slice_clear(&entry->hash);
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	byte footer[72];
+	byte *p = footer;
+	int err = writer_finish_public_section(w);
+	int empty_table = w->next == 0;
+	if (err != 0) {
+		goto exit;
+	}
+	w->pending_padding = 0;
+	if (empty_table) {
+		/* Empty tables need a header anyway. */
+		byte header[28];
+		int n = writer_write_header(w, header);
+		err = padded_write(w, header, n, 0);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (empty_table) {
+		err = REFTABLE_EMPTY_TABLE_ERROR;
+		goto exit;
+	}
+
+exit:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	slice_clear(&w->last_key);
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		slice_clear(&w->index[i].last_key);
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index, sizeof(struct index_record) * w->index_cap);
+	}
+
+	{
+		struct index_record ir = {
+			.offset = w->next,
+		};
+		slice_copy(&ir.last_key, w->block_writer->last_key);
+		w->index[w->index_len] = ir;
+	}
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..afc15d8f31c
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, byte *, size_t);
+	void *write_arg;
+	int pending_padding;
+	struct slice last_key;
+
+	/* offset of next block to write. */
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	/* memory buffer for writing */
+	byte *block;
+
+	/* writer for the current section. NULL or points to
+	 * block_writer_data */
+	struct block_writer *block_writer;
+
+	struct block_writer block_writer_data;
+
+	/* pending index records for the current section */
+	struct index_record *index;
+	int index_len;
+	int index_cap;
+
+	/*
+	  tree for use with tsearch; used to populate the 'o' inverse OID
+	  map */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+/* finishes a block, and writes it to storage */
+int writer_flush_block(struct reftable_writer *w);
+
+/* deallocates memory related to the index */
+void writer_clear_index(struct reftable_writer *w);
+
+/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v12 09/12] Reftable support for git-core
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
                                         ` (7 preceding siblings ...)
  2020-05-07  9:59                       ` [PATCH v12 08/12] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-05-07  9:59                       ` Han-Wen Nienhuys via GitGitGadget
  2020-05-07  9:59                       ` [PATCH v12 10/12] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
                                         ` (3 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

This introduces the refs/reftable-backend.c containing reftable powered ref
storage backend.

It can be activated by passing --ref-storage=reftable to "git init".

TODO:

* Resolve spots marked with XXX

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   27 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   56 +-
 cache.h                                       |    6 +-
 refs.c                                        |   27 +-
 refs.h                                        |    3 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1045 +++++++++++++++++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |  109 ++
 13 files changed, 1271 insertions(+), 30 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 3d3a39fc192..f4848a382f4 100644
--- a/Makefile
+++ b/Makefile
@@ -810,6 +810,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -962,6 +963,7 @@ LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1165,7 +1167,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2360,11 +2362,29 @@ VCSSVN_OBJS += vcs-svn/sliding_window.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/refname.o
+REFTABLE_OBJS += reftable/reftable.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2505,6 +2525,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -3120,7 +3143,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index cb48a291caf..4d0cf065e4a 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1108,7 +1108,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
+		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0b7222e7188..b7053b9e370 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +189,8 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 ||
+	    !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -238,6 +240,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -247,6 +250,24 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
+	/*
+	 * refs/heads is a file when using reftable. We can't reinitialize with
+	 * a reftable because it will overwrite HEAD
+	 */
+	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
+			      is_directory(git_path_buf(&buf, "refs/heads"))) {
+		die("cannot switch ref storage format.");
+	}
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -261,15 +282,12 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
@@ -383,7 +401,8 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash, unsigned int flags)
+	    const char *template_dir, int hash, const char *ref_storage_format,
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -422,6 +441,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+	repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -450,6 +470,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -523,6 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = DEFAULT_REF_STORAGE;
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -530,15 +553,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *object_format = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
@@ -648,9 +674,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
+		       ref_storage_format, flags);
 }
diff --git a/cache.h b/cache.h
index 0f0485ecfe2..8cb884773c3 100644
--- a/cache.h
+++ b/cache.h
@@ -628,8 +628,9 @@ int path_inside_repo(const char *prefix, const char *path);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
-	    unsigned int flags);
-void initialize_repository_version(int hash_algo);
+	    const char *ref_storage_format, unsigned int flags);
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1043,6 +1044,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 4db27379661..299a5db8bf1 100644
--- a/refs.c
+++ b/refs.c
@@ -17,10 +17,16 @@
 #include "argv-array.h"
 #include "repository.h"
 
+const char *default_ref_storage(void)
+{
+	const char *test = getenv("GIT_TEST_REFTABLE");
+	return test ? "reftable" : "files";
+}
+
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1792,13 +1798,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1814,7 +1820,11 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(r->gitdir,
+					 r->ref_storage_format ?
+						 r->ref_storage_format :
+						 DEFAULT_REF_STORAGE,
+					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
 
@@ -1869,7 +1879,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1883,6 +1893,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1896,9 +1907,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index 9421c5b8465..15f6d78ee84 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* Returns the ref storage backend to use by default. */
+const char *default_ref_storage(void);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 3490aac3a40..cafe5b97376 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -661,6 +661,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..3ede7e837c1
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1045 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *repo_dir;
+	char *reftable_dir;
+	struct reftable_stack *stack;
+};
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = { NULL };
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						       unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+	refs->repo_dir = xstrdup(path);
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct strbuf sb = STRBUF_INIT;
+
+	safe_create_dir(refs->reftable_dir, 1);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	const char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct reftable_merged_table *mt = NULL;
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else {
+		mt = reftable_stack_merged_table(refs->stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->prefix = prefix;
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	(void)refs;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid;
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct ref_update *)a)->refname,
+		      ((struct ref_update *)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = reftable_stack_next_update_index(refs->stack);
+	int err = 0;
+	int i = 0;
+	struct reftable_log_record *logs =
+		calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct object_id out_oid;
+			int out_flags = 0;
+			/* Memory owned by refs_resolve_ref_unsafe, no need to
+			 * free(). */
+			const char *resolved = refs_resolve_ref_unsafe(
+				transaction->ref_store, u->refname, 0, &out_oid,
+				&out_flags);
+			struct reftable_ref_record ref = { NULL };
+			struct object_id peeled;
+			int peel_error = peel_object(&u->new_oid, &peeled);
+
+			ref.ref_name =
+				(char *)(resolved ? resolved : u->refname);
+			log->ref_name = ref.ref_name;
+
+			if (!is_null_oid(&u->new_oid)) {
+				ref.value = u->new_oid.hash;
+			}
+			ref.update_index = ts;
+			if (!peel_error) {
+				ref.target_value = peeled.hash;
+			}
+
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+exit:
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_add(refs->stack, &write_transaction_table,
+				 transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure: %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *err)
+{
+	return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	int i = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = { NULL };
+		struct reftable_ref_record current = { NULL };
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.ref_name,
+					    &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	return reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_log_record log = { NULL };
+		struct object_id new_oid;
+		struct object_id old_oid;
+		struct reftable_ref_record current = { NULL };
+		reftable_stack_read_ref(create->refs->stack, create->refname,
+					&current);
+
+		fill_reftable_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			reftable_writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_add(refs->stack, &write_create_symref_table,
+				  &arg);
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = { NULL };
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto exit;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto exit;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = { { NULL } };
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = { { NULL } };
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+exit:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	return reftable_stack_add(refs->stack, &write_rename_table, &arg);
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			/* we want the refnames that we have reflogs for, so we
+			 * skip if we've already produced this name. This could
+			 * be faster by seeking directly to
+			 * reflog@update_index==0.
+			 */
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(refs->stack);
+	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = { NULL };
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		err = reftable_iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid;
+			struct object_id new_oid;
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+	int i = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = { NULL };
+		err = reftable_iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct reftable_log_record log = { NULL };
+	struct reftable_iterator it = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		struct object_id ooid;
+		struct object_id noid;
+
+		int err = reftable_iterator_next_log(it, &log);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_read_ref(refs->stack, refname, &ref);
+	if (err > 0) {
+		errno = ENOENT;
+		err = -1;
+		goto exit;
+	}
+	if (err < 0) {
+		errno = reftable_error_to_errno(err);
+		err = -1;
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		/* XXX recurse? */
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else if (ref.value != NULL) {
+		hashcpy(oid->hash, ref.value);
+	} else {
+		*type |= REF_ISBROKEN;
+		errno = EINVAL;
+		err = -1;
+	}
+exit:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
diff --git a/repository.c b/repository.c
index 6f7f6f002b1..087760bc184 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 6534fbb7b31..f57b73f4a27 100644
--- a/repository.h
+++ b/repository.h
@@ -74,6 +74,9 @@ struct repository {
 	 */
 	struct ref_store *refs_private;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 65fe5ecefbe..2ef970f9f88 100644
--- a/setup.c
+++ b/setup.c
@@ -468,9 +468,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -559,6 +561,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1204,8 +1207,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 00000000000..7510d24a932
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,109 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+INVALID_SHA1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+initialize ()  {
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	mv .git/hooks .git/hooks-disabled
+}
+
+test_expect_success 'delete ref' '
+	initialize &&
+	test_commit file &&
+	SHA=$(git show-ref -s --verify HEAD) &&
+	test_write_lines "$SHA refs/heads/master" "$SHA refs/tags/file" >expect &&
+	git show-ref > actual &&
+	! git update-ref -d refs/tags/file $INVALID_SHA1 &&
+	test_cmp expect actual &&
+	git update-ref -d refs/tags/file $SHA  &&
+	test_write_lines "$SHA refs/heads/master" >expect &&
+	git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basic operation of reftable storage: commit, reflog, repack' '
+	initialize &&
+	test_commit file &&
+	test_write_lines refs/heads/master refs/tags/file >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect &&
+	for count in $(test_seq 1 10)
+	do
+		test_commit "number $count" file.t $count number-$count ||
+	        return 1
+	done &&
+	git pack-refs &&
+	ls -1 .git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 11 output &&
+	grep "commit (initial): file" output &&
+	grep "commit: number 10" output &&
+	git gc &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 0 output 
+'
+
+# This matches show-ref's output
+print_ref() {
+	echo "$(git rev-parse "$1") $1"
+}
+
+test_expect_success 'peeled tags are stored' '
+	initialize &&
+	test_commit file &&
+	git tag -m "annotated tag" test_tag HEAD &&
+	{
+		print_ref "refs/heads/master" &&
+		print_ref "refs/tags/file" &&
+		print_ref "refs/tags/test_tag" &&
+		print_ref "refs/tags/test_tag^{}" 
+	} >expect &&
+	git show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'show-ref works on fresh repo' '
+	initialize &&
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	>expect &&
+	! git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'checkout unborn branch' '
+	initialize &&
+	git checkout -b master
+'
+
+
+test_expect_success 'dir/file conflict' '
+	initialize &&
+	test_commit file &&
+	! git branch master/forbidden
+'
+
+
+test_expect_success 'do not clobber existing repo' '
+	rm -rf .git &&
+	git init --ref-storage=files &&
+	cat .git/HEAD > expect &&
+	test_commit file &&
+	(git init --ref-storage=reftable || true) &&
+	cat .git/HEAD > actual &&
+	test_cmp expect actual
+'
+
+
+test_done
+
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v12 10/12] vcxproj: adjust for the reftable changes
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
                                         ` (8 preceding siblings ...)
  2020-05-07  9:59                       ` [PATCH v12 09/12] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-05-07  9:59                       ` Johannes Schindelin via GitGitGadget
  2020-05-07  9:59                       ` [PATCH v12 11/12] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
                                         ` (2 subsequent siblings)
  12 siblings, 0 replies; 409+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This allows Git to be compiled via Visual Studio again after integrating
the `hn/reftable` branch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.mak.uname                           |  2 +-
 contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index 5ad43c80b1a..484aec15ac2 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -710,7 +710,7 @@ vcxproj:
 	# Make .vcxproj files and add them
 	unset QUIET_GEN QUIET_BUILT_IN; \
 	perl contrib/buildsystems/generate -g Vcxproj
-	git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
+	git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
 
 	# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
 	(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
index 5c666f9ac03..33a08d31652 100644
--- a/contrib/buildsystems/Generators/Vcxproj.pm
+++ b/contrib/buildsystems/Generators/Vcxproj.pm
@@ -77,7 +77,7 @@ sub createProject {
     my $libs_release = "\n    ";
     my $libs_debug = "\n    ";
     if (!$static_library) {
-      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
+      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
       $libs_debug = $libs_release;
       $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
       $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
@@ -231,6 +231,7 @@ sub createProject {
 EOM
     if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
       my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
+      my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
       my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
 
       print F << "EOM";
@@ -240,6 +241,14 @@ sub createProject {
       <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
     </ProjectReference>
 EOM
+      if (!($name =~ /xdiff|libreftable/)) {
+        print F << "EOM";
+    <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
+      <Project>$uuid_libreftable</Project>
+      <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
+    </ProjectReference>
+EOM
+      }
       if (!($name =~ 'xdiff')) {
         print F << "EOM";
     <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v12 11/12] t: use update-ref and show-ref to reading/writing refs
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
                                         ` (9 preceding siblings ...)
  2020-05-07  9:59                       ` [PATCH v12 10/12] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
@ 2020-05-07  9:59                       ` Han-Wen Nienhuys via GitGitGadget
  2020-05-07  9:59                       ` [PATCH v12 12/12] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reading and writing .git/refs/* assumes that refs are stored in the 'files'
ref backend.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/t0002-gitfile.sh             |  2 +-
 t/t1400-update-ref.sh          | 32 ++++++++++++++++----------------
 t/t1506-rev-parse-diagnosis.sh |  2 +-
 t/t6050-replace.sh             |  2 +-
 t/t9020-remote-svn.sh          |  4 ++--
 5 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/t/t0002-gitfile.sh b/t/t0002-gitfile.sh
index 0aa9908ea12..960ed150cb5 100755
--- a/t/t0002-gitfile.sh
+++ b/t/t0002-gitfile.sh
@@ -62,7 +62,7 @@ test_expect_success 'check commit-tree' '
 '
 
 test_expect_success 'check rev-list' '
-	echo $SHA >"$REAL/HEAD" &&
+	git update-ref "HEAD" "$SHA" &&
 	test "$SHA" = "$(git rev-list HEAD)"
 '
 
diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index e1197ac8189..27171f82612 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -37,15 +37,15 @@ test_expect_success setup '
 
 test_expect_success "create $m" '
 	git update-ref $m $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "create $m with oldvalue verification" '
 	git update-ref $m $B $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "fail to delete $m with stale ref" '
 	test_must_fail git update-ref -d $m $A &&
-	test $B = "$(cat .git/$m)"
+	test $B = "$(git show-ref -s --verify $m)"
 '
 test_expect_success "delete $m" '
 	test_when_finished "rm -f .git/$m" &&
@@ -56,7 +56,7 @@ test_expect_success "delete $m" '
 test_expect_success "delete $m without oldvalue verification" '
 	test_when_finished "rm -f .git/$m" &&
 	git update-ref $m $A &&
-	test $A = $(cat .git/$m) &&
+	test $A = $(git show-ref -s --verify $m) &&
 	git update-ref -d $m &&
 	test_path_is_missing .git/$m
 '
@@ -69,15 +69,15 @@ test_expect_success "fail to create $n" '
 
 test_expect_success "create $m (by HEAD)" '
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "create $m (by HEAD) with oldvalue verification" '
 	git update-ref HEAD $B $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "fail to delete $m (by HEAD) with stale ref" '
 	test_must_fail git update-ref -d HEAD $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "delete $m (by HEAD)" '
 	test_when_finished "rm -f .git/$m" &&
@@ -178,14 +178,14 @@ test_expect_success '--no-create-reflog overrides core.logAllRefUpdates=always'
 
 test_expect_success "create $m (by HEAD)" '
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success 'pack refs' '
 	git pack-refs --all
 '
 test_expect_success "move $m (by HEAD)" '
 	git update-ref HEAD $B $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "delete $m (by HEAD) should remove both packed and loose $m" '
 	test_when_finished "rm -f .git/$m" &&
@@ -255,7 +255,7 @@ test_expect_success '(not) change HEAD with wrong SHA1' '
 '
 test_expect_success "(not) changed .git/$m" '
 	test_when_finished "rm -f .git/$m" &&
-	! test $B = $(cat .git/$m)
+	! test $B = $(git show-ref -s --verify $m)
 '
 
 rm -f .git/logs/refs/heads/master
@@ -263,19 +263,19 @@ test_expect_success "create $m (logged by touch)" '
 	test_config core.logAllRefUpdates false &&
 	GIT_COMMITTER_DATE="2005-05-26 23:30" \
 	git update-ref --create-reflog HEAD $A -m "Initial Creation" &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "update $m (logged by touch)" '
 	test_config core.logAllRefUpdates false &&
 	GIT_COMMITTER_DATE="2005-05-26 23:31" \
 	git update-ref HEAD $B $A -m "Switch" &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "set $m (logged by touch)" '
 	test_config core.logAllRefUpdates false &&
 	GIT_COMMITTER_DATE="2005-05-26 23:41" \
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 
 test_expect_success 'empty directory removal' '
@@ -319,19 +319,19 @@ test_expect_success "create $m (logged by config)" '
 	test_config core.logAllRefUpdates true &&
 	GIT_COMMITTER_DATE="2005-05-26 23:32" \
 	git update-ref HEAD $A -m "Initial Creation" &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "update $m (logged by config)" '
 	test_config core.logAllRefUpdates true &&
 	GIT_COMMITTER_DATE="2005-05-26 23:33" \
 	git update-ref HEAD'" $B $A "'-m "Switch" &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "set $m (logged by config)" '
 	test_config core.logAllRefUpdates true &&
 	GIT_COMMITTER_DATE="2005-05-26 23:43" \
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 
 cat >expect <<EOF
diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
index 52edcbdcc32..dbf690b9c1b 100755
--- a/t/t1506-rev-parse-diagnosis.sh
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -207,7 +207,7 @@ test_expect_success 'arg before dashdash must be a revision (ambiguous)' '
 	{
 		# we do not want to use rev-parse here, because
 		# we are testing it
-		cat .git/refs/heads/foobar &&
+		git show-ref -s refs/heads/foobar &&
 		printf "%s\n" --
 	} >expect &&
 	git rev-parse foobar -- >actual &&
diff --git a/t/t6050-replace.sh b/t/t6050-replace.sh
index e7e64e085dd..c80dc10b8f1 100755
--- a/t/t6050-replace.sh
+++ b/t/t6050-replace.sh
@@ -135,7 +135,7 @@ test_expect_success 'tag replaced commit' '
 test_expect_success '"git fsck" works' '
      git fsck master >fsck_master.out &&
      test_i18ngrep "dangling commit $R" fsck_master.out &&
-     test_i18ngrep "dangling tag $(cat .git/refs/tags/mytag)" fsck_master.out &&
+     test_i18ngrep "dangling tag $(git show-ref -s refs/tags/mytag)" fsck_master.out &&
      test -z "$(git fsck)"
 '
 
diff --git a/t/t9020-remote-svn.sh b/t/t9020-remote-svn.sh
index 6fca08e5e35..9fcfa969a9b 100755
--- a/t/t9020-remote-svn.sh
+++ b/t/t9020-remote-svn.sh
@@ -48,8 +48,8 @@ test_expect_success REMOTE_SVN 'simple fetch' '
 '
 
 test_debug '
-	cat .git/refs/svn/svnsim/master
-	cat .git/refs/remotes/svnsim/master
+	git show-ref -s refs/svn/svnsim/master
+	git show-ref -s refs/remotes/svnsim/master
 '
 
 test_expect_success REMOTE_SVN 'repeated fetch, nothing shall change' '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v12 12/12] Add some reftable testing infrastructure
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
                                         ` (10 preceding siblings ...)
  2020-05-07  9:59                       ` [PATCH v12 11/12] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
@ 2020-05-07  9:59                       ` Han-Wen Nienhuys via GitGitGadget
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  12 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-07  9:59 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* Add GIT_TEST_REFTABLE environment var to control default ref storage

* Add test_prerequisite REFTABLE. Skip t/t3210-pack-refs.sh for REFTABLE.

Major test failures:

 * t9903-bash-prompt - The bash mode reads .git/HEAD directly
 * t1400-update-ref.sh - Reads from .git/{refs,logs} directly
 * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
 * t1405 - inspecs .git/ directly.
 * t1450-fsck.sh - manipulates .git/ directly to create invalid state
 * Rebase, cherry-pick: pseudo refs aren't written through the refs backend.

Other tests by decreasing brokenness:

t1407-worktree-ref-store.sh              - 5 of 5
t1413-reflog-detach.sh                   - 7 of 7
t1415-worktree-refs.sh                   - 11 of 11
t3908-stash-in-worktree.sh               - 2 of 2
t4207-log-decoration-colors.sh           - 2 of 2
t5515-fetch-merge-logic.sh               - 17 of 17
t5900-repo-selection.sh                  - 8 of 8
t6016-rev-list-graph-simplify-history.sh - 12 of 12
t5573-pull-verify-signatures.sh          - 15 of 16
t5612-clone-refspec.sh                   - 12 of 13
t5514-fetch-multiple.sh                  - 11 of 12
t6030-bisect-porcelain.sh                - 64 of 71
t5533-push-cas.sh                        - 15 of 17
t5539-fetch-http-shallow.sh              - 7 of 8
t7413-submodule-is-active.sh             - 7 of 8
t2400-worktree-add.sh                    - 59 of 69
t0100-previous.sh                        - 5 of 6
t7419-submodule-set-branch.sh            - 5 of 6
t1404-update-ref-errors.sh               - 44 of 53
t6003-rev-list-topo-order.sh             - 29 of 35
t1409-avoid-packing-refs.sh              - 9 of 11
t5541-http-push-smart.sh                 - 31 of 38
t5407-post-rewrite-hook.sh               - 13 of 16
t9903-bash-prompt.sh                     - 52 of 66
t1414-reflog-walk.sh                     - 9 of 12
t1507-rev-parse-upstream.sh              - 21 of 28
t2404-worktree-config.sh                 - 9 of 12
t1505-rev-parse-last.sh                  - 5 of 7
t7510-signed-commit.sh                   - 16 of 23
t2018-checkout-branch.sh                 - 15 of 22
(..etc)



Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/clone.c               | 2 +-
 builtin/init-db.c             | 2 +-
 refs.c                        | 6 +++---
 t/t1409-avoid-packing-refs.sh | 6 ++++++
 t/t3210-pack-refs.sh          | 6 ++++++
 t/test-lib.sh                 | 5 +++++
 6 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4d0cf065e4a..780c5807415 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1109,7 +1109,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
-		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
+		default_ref_storage(), INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index b7053b9e370..da5b4670c84 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -545,7 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
-	const char *ref_storage_format = DEFAULT_REF_STORAGE;
+	const char *ref_storage_format = default_ref_storage();
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
diff --git a/refs.c b/refs.c
index 299a5db8bf1..b9b3e7e7070 100644
--- a/refs.c
+++ b/refs.c
@@ -1823,7 +1823,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	r->refs_private = ref_store_init(r->gitdir,
 					 r->ref_storage_format ?
 						 r->ref_storage_format :
-						 DEFAULT_REF_STORAGE,
+						 default_ref_storage(),
 					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
@@ -1879,7 +1879,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
+	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1893,7 +1893,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
-	const char *format = DEFAULT_REF_STORAGE; /* XXX */
+	const char *format = default_ref_storage();
 	struct ref_store *refs;
 	const char *id;
 
diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
index be12fb63506..c6f78325563 100755
--- a/t/t1409-avoid-packing-refs.sh
+++ b/t/t1409-avoid-packing-refs.sh
@@ -4,6 +4,12 @@ test_description='avoid rewriting packed-refs unnecessarily'
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 # Add an identifying mark to the packed-refs file header line. This
 # shouldn't upset readers, and it should be omitted if the file is
 # ever rewritten.
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index f41b2afb996..edaef2c175a 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -11,6 +11,12 @@ semantic is still the same.
 '
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success 'enable reflogs' '
 	git config core.logallrefupdates true
 '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 1b221951a8e..b2b16979407 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1514,6 +1514,11 @@ FreeBSD)
 	;;
 esac
 
+if test -n "$GIT_TEST_REFTABLE"
+then
+  test_set_prereq REFTABLE
+fi
+
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v12 01/12] refs.h: clarify reflog iteration order
  2020-05-07  9:59                       ` [PATCH v12 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-05-08 18:54                         ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-08 18:54 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  refs.h | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/refs.h b/refs.h
> index a92d2c74c83..9421c5b8465 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -432,18 +432,21 @@ int delete_refs(const char *msg, struct string_list *refnames,
>  int refs_delete_reflog(struct ref_store *refs, const char *refname);
>  int delete_reflog(const char *refname);
>  
> -/* iterate over reflog entries */
> +/* Iterate over reflog entries. */

It is not wrong per-se, but it is more problematic that this
description is very similar to the functions that take callback
functions of this type than it begins with lowercase.

/* Callback to process a reflog entry found by the iterators (see below) */

perhaps?

>  typedef int each_reflog_ent_fn(
>  		struct object_id *old_oid, struct object_id *new_oid,
>  		const char *committer, timestamp_t timestamp,
>  		int tz, const char *msg, void *cb_data);
>  
> +/* Iterate in over reflog entries, oldest entry first. */

/* Iterate over reflog entries of refname in refs */

/* oldest entry first */
>  int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
>  			     each_reflog_ent_fn fn, void *cb_data);

/* youngest entry first */
>  int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
>  				     const char *refname,
>  				     each_reflog_ent_fn fn,
>  				     void *cb_data);
> +
> +/* Call a function for each reflog entry, oldest entry first. */

Likewise.

>  int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
>  int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v12 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-05-07  9:59                       ` [PATCH v12 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-05-08 18:54                         ` Junio C Hamano
  2020-05-11 11:41                           ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-05-08 18:54 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> This happens implicitly in the files/packed ref backend; making it
> explicit simplifies adding alternate ref storage backends, such as
> reftable.

Makes sense.



>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  refs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index 224ff66c7bb..4db27379661 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1525,7 +1525,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
>  
>  int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
>  {
> -	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
> +	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
>  }
>  
>  int for_each_ref(each_ref_fn fn, void *cb_data)
> @@ -1585,8 +1585,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
>  
>  int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
>  {
> -	return do_for_each_ref(refs, "", fn, 0,
> -			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
> +	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
> +			       cb_data);
>  }
>  
>  int for_each_rawref(each_ref_fn fn, void *cb_data)

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v12 03/12] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-05-07  9:59                       ` [PATCH v12 03/12] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-05-08 18:58                         ` Junio C Hamano
  2020-05-11 11:42                           ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-05-08 18:58 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  refs/refs-internal.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index ff2436c0fb7..3490aac3a40 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
>  
>  /* Virtual function declarations for ref_iterators: */
>  
> +/*
> + * backend-specific implementation of ref_iterator_advance.
> + * For symrefs, the function should set REF_ISSYMREF, and it should also
> + * dereference the symref to provide the OID referent.
> + */
>  typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
>  
>  typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,

Makes sense.

I wonder if I should take 1-3/12 as a separate "clean-up" series and
merge it before everything else down to 'master'?  That would reduce
the churn somewhat.

I also wonder if there are similar "clean-up" buried later in the
series. For example, would it make sense to also move 11/12 and have
it graduate early together with 1-3/12?

Thanks.


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v12 06/12] reftable: define version 2 of the spec to accomodate SHA256
  2020-05-07  9:59                       ` [PATCH v12 06/12] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
@ 2020-05-08 19:59                         ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-08 19:59 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  Documentation/technical/reftable.txt | 50 ++++++++++++++++------------
>  1 file changed, 28 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
> index 9fa4657d9ff..ee3f36ea851 100644
> --- a/Documentation/technical/reftable.txt
> +++ b/Documentation/technical/reftable.txt
> @@ -193,8 +193,8 @@ and non-aligned files.
>  Very small files (e.g. 1 only ref block) may omit `padding` and the ref

Hmph, I am seeing nbsp before '1' and am wondering where it came from.

>  index to reduce total file size.
>  
> -Header
> -^^^^^^
> +Header (version 1)
> +^^^^^^^^^^^^^^^^^^
>  
>  A 24-byte header appears at the beginning of the file:
>  
> @@ -215,6 +215,24 @@ used in a stack for link:#Update-transactions[transactions], these
>  fields can order the files such that the prior file’s
>  `max_update_index + 1` is the next file’s `min_update_index`.

Am I correct to assume that we do not plan to support a repository
with mixed set of refs, some referring to a commit with its SHA-1
object name while others using SHA-256 object name?

> +Header (version 2)
> +^^^^^^^^^^^^^^^^^^
> +
> +A 28-byte header appears at the beginning of the file:
> +
> +....
> +'REFT'
> +uint8( version_number = 1 )

Shouldn't this be 2 instead, as v1 lacked the Hash-id field?

> +uint24( block_size )
> +uint64( min_update_index )
> +uint64( max_update_index )
> +uint32( hash_id )
> +....
> +
> +The header is identical to `version_number=1`, with the 4-byte hash ID
> +("sha1" for SHA1 and "s256" for SHA-256) append to the header.

Am I correct to assume that SHA-1 repositories are encouraged to use
version 2 when the code becomes available?

>  First ref block
>  ^^^^^^^^^^^^^^^
>  
> @@ -671,14 +689,8 @@ Footer
>  After the last block of the file, a file footer is written. It begins
>  like the file header, but is extended with additional data.
>  
> -A 68-byte footer appears at the end:
> -
>  ....
> -    'REFT'
> -    uint8( version_number = 1 )
> -    uint24( block_size )
> -    uint64( min_update_index )
> -    uint64( max_update_index )
> +    HEADER
>  
>      uint64( ref_index_position )
>      uint64( (obj_position << 5) | obj_id_len )
> @@ -701,12 +713,16 @@ obj blocks.
>  * `obj_index_position`: byte position for the start of the obj index.
>  * `log_index_position`: byte position for the start of the log index.
>  
> +The size of the footer is 68 bytes for version 1, and 72 bytes for
> +version 2.
> +
>  Reading the footer
>  ++++++++++++++++++
>  
> -Readers must seek to `file_length - 68` to access the footer. A trusted
> -external source (such as `stat(2)`) is necessary to obtain
> -`file_length`. When reading the footer, readers must verify:
> +Readers must first read the file start to determine the version
> +number. Then they seek to `file_length - FOOTER_LENGTH` to access the
> +footer. A trusted external source (such as `stat(2)`) is necessary to
> +obtain `file_length`. When reading the footer, readers must verify:

In any case, the size of this patch is pleasant to see---it must be
a sign that the previous step was done well not to hardcode the
"hash size is 20 bytes" assumption all over the place and instead
used "this field holds N+m bytes where N is the size of the hash
described in the REFT header" consistently.

Nicely done.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v12 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-05-08 18:54                         ` Junio C Hamano
@ 2020-05-11 11:41                           ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-11 11:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Fri, May 8, 2020 at 8:54 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Han-Wen Nienhuys <hanwen@google.com>
> >
> > This happens implicitly in the files/packed ref backend; making it
> > explicit simplifies adding alternate ref storage backends, such as
> > reftable.
>
> Makes sense.

It makes sense, but I'm not happy with this yet. I think the more
consistent thing would be to have pseudo refs be integrated in the ref
backend completely. Then, there could be a INCLUDE_PSEUDO_REFS flag
that could be supplied to the iteration to decide whether or not to
produce these refs.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v12 03/12] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-05-08 18:58                         ` Junio C Hamano
@ 2020-05-11 11:42                           ` Han-Wen Nienhuys
  2020-05-11 14:49                             ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-11 11:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Fri, May 8, 2020 at 8:58 PM Junio C Hamano <gitster@pobox.com> wrote:
> Makes sense.
>
> I wonder if I should take 1-3/12 as a separate "clean-up" series and
> merge it before everything else down to 'master'?  That would reduce
> the churn somewhat.

That would be great. Do I need to send them separately, or do can you
cherry-pick the changes out of this series?


> I also wonder if there are similar "clean-up" buried later in the
> series. For example, would it make sense to also move 11/12 and have
> it graduate early together with 1-3/12?

yes, I'll reorder my patches to reflect this.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v12 03/12] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-05-11 11:42                           ` Han-Wen Nienhuys
@ 2020-05-11 14:49                             ` Junio C Hamano
  2020-05-11 15:11                               ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-05-11 14:49 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Johannes Schindelin, Jonathan Nieder, Jeff King,
	Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

Han-Wen Nienhuys <hanwen@google.com> writes:

> On Fri, May 8, 2020 at 8:58 PM Junio C Hamano <gitster@pobox.com> wrote:
>> Makes sense.
>>
>> I wonder if I should take 1-3/12 as a separate "clean-up" series and
>> merge it before everything else down to 'master'?  That would reduce
>> the churn somewhat.
>
> That would be great. Do I need to send them separately, or do can you
> cherry-pick the changes out of this series?

For the past several days, my tree had two topics, hn/refs-cleanup
(4 patches---1, 2, 3 and 11 from this series) and hn/reftable (the
rest) queued separately.  If everybody (including you) is happy with
the former, we can just treat it as a separate "preliminary
clean-up" topic and merge it down thru 'next' to 'master' and
hopefully we can do so before the end of this cycle, as I do not
immediately see anything controversial in them.  The other topic
builds on top of it, and it may have to get into a reviewable state,
but at least the early "good bits" split out into a separate topic
shouldn't have to wait.  Then we only need to iterate on the latter
part.

I am wondering if we can also throw the file format documentation
into the "more or less uncontroversial" pile.  There may be a lot to
dislike in the current "implementation" in reftable/*.c, especially
when viewed in the context of this project, that makes it less
appealing to review, but my understanding is that the feature as
designed by Shawn and described in the format document is already in
use in JGit, so even if the code may need a major update only to
become viewable from some reviewers' point of view, the design it
aims to implement, at least a major part of it, is more or less cast
in stone.  Unless we collectively decide that we will never support
reftable in git-core, we need to adopt the format documentation and
the way the subsystem is supposed to work as-is to be compatible,
which makes it "more or less uncontroversial".  It may need some
copyediting, though.  Also it is not clear to me if the base part
(without "hash id" and incremented format version) is the only thing
in such a "the other system already implements it, so compatibility
concerns leave little room to change the design at this point"
state, or the updated variant that allows us to support SHA-256 and
other hashes is also in that state.


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v12 03/12] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-05-11 14:49                             ` Junio C Hamano
@ 2020-05-11 15:11                               ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-11 15:11 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Schindelin, Jonathan Nieder, Jeff King,
	Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Mon, May 11, 2020 at 4:49 PM Junio C Hamano <gitster@pobox.com> wrote:
> > On Fri, May 8, 2020 at 8:58 PM Junio C Hamano <gitster@pobox.com> wrote:
> >> Makes sense.
> >>
> >> I wonder if I should take 1-3/12 as a separate "clean-up" series and
> >> merge it before everything else down to 'master'?  That would reduce
> >> the churn somewhat.
> >
> > That would be great. Do I need to send them separately, or do can you
> > cherry-pick the changes out of this series?
>
> For the past several days, my tree had two topics, hn/refs-cleanup
> (4 patches---1, 2, 3 and 11 from this series) and hn/reftable (the
> rest) queued separately.  If everybody (including you) is happy with

I think the following patches should be uncontroversial:

* refs.h: clarify reflog iteration order - I tweaked it to reflect your comments
* refs: document how ref_iterator_advance_fn should handle symrefs
* reftable: file format documentation - I tweaked it to remove nbsp,
and fixed one phrasing nit.
* reftable: clarify how empty tables should be written
* t: use update-ref and show-ref to reading/writing refs

The patch "Iterate over the "refs/" namespace in for_each_[raw]ref" is
something I want to mull over for a bit. It may be unnecessary (stay
tuned).

> I am wondering if we can also throw the file format documentation
> into the "more or less uncontroversial" pile.  There may be a lot to
> dislike in the current "implementation" in reftable/*.c, especially
> when viewed in the context of this project, that makes it less
> appealing to review, but my understanding is that the feature as
> designed by Shawn and described in the format document is already in
> use in JGit, so even if the code may need a major update only to
> become viewable from some reviewers' point of view, the design it
> aims to implement, at least a major part of it, is more or less cast
> in stone.

Correct; I support this.

>   Unless we collectively decide that we will never support
> reftable in git-core, we need to adopt the format documentation and
> the way the subsystem is supposed to work as-is to be compatible,
> which makes it "more or less uncontroversial".  It may need some
> copyediting, though.  Also it is not clear to me if the base part
> (without "hash id" and incremented format version) is the only thing
> in such a "the other system already implements it, so compatibility
> concerns leave little room to change the design at this point"
> state, or the updated variant that allows us to support SHA-256 and
> other hashes is also in that state.

JGit supports version 1 of reftable only, and JGit doesn't do SHA-256
at all, so we have a little  wiggle room with SHA-256, but the
proposal I make here is the simplest way to also support SHA-256 that
is in line with how it has been decided on in general (with a 4-byte
hash ID).

The serialization format is set in stone from Google's perspective (we
have been using it in production for 2.5 years now). The repo layout
on disk is recent (Code submitted last November, and updated in
February to reflect Peff's compatibility advice), so we could get away
with minor tweaks if we want.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v13 00/13] Reftable support git-core
  2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
                                         ` (11 preceding siblings ...)
  2020-05-07  9:59                       ` [PATCH v12 12/12] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-05-11 19:46                       ` Han-Wen Nienhuys via GitGitGadget
  2020-05-11 19:46                         ` [PATCH v13 01/13] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
                                           ` (14 more replies)
  12 siblings, 15 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend.

Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1

Summary 19055 tests pass 2765 tests fail

Some issues:

 * many tests inspect .git/{logs,heads}/ directly.
 * worktrees broken. 
 * Rebase/cherry-pick still largely broken

v16

 * handle pseudo refs.
 * various bugfixes and fixes for mem leaks.

Han-Wen Nienhuys (11):
  refs.h: clarify reflog iteration order
  t: use update-ref and show-ref to reading/writing refs
  refs: document how ref_iterator_advance_fn should handle symrefs
  reftable: clarify how empty tables should be written
  reftable: define version 2 of the spec to accomodate SHA256
  Write pseudorefs through ref backends.
  Iterate over the "refs/" namespace in for_each_[raw]ref
  Add .gitattributes for the reftable/ directory
  Add reftable library
  Reftable support for git-core
  Add some reftable testing infrastructure

Johannes Schindelin (1):
  vcxproj: adjust for the reftable changes

Jonathan Nieder (1):
  reftable: file format documentation

 Documentation/Makefile                        |    1 +
 Documentation/technical/reftable.txt          | 1083 ++++++++++++++
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   27 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   56 +-
 cache.h                                       |    6 +-
 config.mak.uname                              |    2 +-
 contrib/buildsystems/Generators/Vcxproj.pm    |   11 +-
 refs.c                                        |  148 +-
 refs.h                                        |   28 +-
 refs/files-backend.c                          |  164 ++-
 refs/packed-backend.c                         |   40 +-
 refs/refs-internal.h                          |   18 +
 refs/reftable-backend.c                       | 1218 ++++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   11 +
 reftable/VERSION                              |   14 +
 reftable/basics.c                             |  215 +++
 reftable/basics.h                             |   53 +
 reftable/block.c                              |  434 ++++++
 reftable/block.h                              |  129 ++
 reftable/constants.h                          |   21 +
 reftable/file.c                               |   99 ++
 reftable/iter.c                               |  240 ++++
 reftable/iter.h                               |   63 +
 reftable/merged.c                             |  325 +++++
 reftable/merged.h                             |   38 +
 reftable/pq.c                                 |  114 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  753 ++++++++++
 reftable/reader.h                             |   65 +
 reftable/record.c                             | 1141 +++++++++++++++
 reftable/record.h                             |  121 ++
 reftable/refname.c                            |  215 +++
 reftable/refname.h                            |   38 +
 reftable/reftable.c                           |   91 ++
 reftable/reftable.h                           |  564 ++++++++
 reftable/slice.c                              |  225 +++
 reftable/slice.h                              |   76 +
 reftable/stack.c                              | 1245 +++++++++++++++++
 reftable/stack.h                              |   48 +
 reftable/system.h                             |   54 +
 reftable/tree.c                               |   67 +
 reftable/tree.h                               |   34 +
 reftable/update.sh                            |   24 +
 reftable/writer.c                             |  665 +++++++++
 reftable/writer.h                             |   60 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0002-gitfile.sh                            |    2 +-
 t/t0031-reftable.sh                           |  120 ++
 t/t1400-update-ref.sh                         |   32 +-
 t/t1409-avoid-packing-refs.sh                 |    6 +
 t/t1506-rev-parse-diagnosis.sh                |    2 +-
 t/t3210-pack-refs.sh                          |    6 +
 t/t6050-replace.sh                            |    2 +-
 t/t9020-remote-svn.sh                         |    4 +-
 t/test-lib.sh                                 |    5 +
 62 files changed, 10201 insertions(+), 207 deletions(-)
 create mode 100644 Documentation/technical/reftable.txt
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100755 t/t0031-reftable.sh


base-commit: b994622632154fc3b17fb40a38819ad954a5fb88
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v13
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v13
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v12:

  1:  dfa5fd74f85 !  1:  8394c156eb4 refs.h: clarify reflog iteration order
     @@ refs.h: int delete_refs(const char *msg, struct string_list *refnames,
       int delete_reflog(const char *refname);
       
      -/* iterate over reflog entries */
     -+/* Iterate over reflog entries. */
     ++/* Callback to process a reflog entry found by the iteration functions (see
     ++ * below) */
       typedef int each_reflog_ent_fn(
       		struct object_id *old_oid, struct object_id *new_oid,
       		const char *committer, timestamp_t timestamp,
       		int tz, const char *msg, void *cb_data);
       
     -+/* Iterate in over reflog entries, oldest entry first. */
     ++/* Iterate in over reflog entries in the log for `refname`. */
     ++
     ++/* oldest entry first */
       int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
       			     each_reflog_ent_fn fn, void *cb_data);
     ++
     ++/* youngest entry first */
       int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
       				     const char *refname,
       				     each_reflog_ent_fn fn,
       				     void *cb_data);
      +
     -+/* Call a function for each reflog entry, oldest entry first. */
     ++/* Call a function for each reflog entry in the log for `refname`. */
     ++
     ++/* oldest entry first */
       int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
     ++
     ++/* youngest entry first */
       int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
       
     + /*
 11:  2abcbd1af99 =  2:  dbf45fe8753 t: use update-ref and show-ref to reading/writing refs
  3:  6553285043b =  3:  be083a85fb5 refs: document how ref_iterator_advance_fn should handle symrefs
  5:  06fcb49e903 !  4:  96fd9814a67 reftable: file format documentation
     @@ Documentation/technical/reftable.txt (new)
      +
      +Some repositories contain a lot of references (e.g. android at 866k,
      +rails at 31k). The existing packed-refs format takes up a lot of space
     -+(e.g. 62M), and does not scale with additional references. Lookup of a
     ++(e.g. 62M), and does not scale with additional references. Lookup of a
      +single reference requires linearly scanning the file.
      +
      +Atomic pushes modifying multiple references require copying the entire
      +packed-refs file, which can be a considerable amount of data moved
     -+(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
     ++(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
      +
      +Repositories with many loose references occupy a large number of disk
      +blocks from the local file system, as each reference is its own file
     @@ Documentation/technical/reftable.txt (new)
      +Performance
      +^^^^^^^^^^^
      +
     -+Space used, packed-refs vs. reftable:
     ++Space used, packed-refs vs. reftable:
      +
      +[cols=",>,>,>,>,>",options="header",]
      +|===============================================================
     @@ Documentation/technical/reftable.txt (new)
      +|reftable |hot | |20.2 usec |320.8 usec
      +|=========================================================
      +
     -+Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:
     ++Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:
      +
      +[cols=",>,>",options="header",]
      +|================================
     @@ Documentation/technical/reftable.txt (new)
      +index] to support fast lookup. Readers must be able to read both aligned
      +and non-aligned files.
      +
     -+Very small files (e.g. 1 only ref block) may omit `padding` and the ref
     ++Very small files (e.g. a single ref block) may omit `padding` and the ref
      +index to reduce total file size.
      +
      +Header
     @@ Documentation/technical/reftable.txt (new)
      +
      +The index may be organized into a multi-level index, where the 1st level
      +index block points to additional ref index blocks (2nd level), which may
     -+in turn point to either additional index blocks (e.g. 3rd level) or ref
     ++in turn point to either additional index blocks (e.g. 3rd level) or ref
      +blocks (leaf level). Disk reads required to access a ref go up with
      +higher index levels. Multi-level indexes may be required to ensure no
      +single index block exceeds the file format’s max block size of
     @@ Documentation/technical/reftable.txt (new)
      +
      +The first `position_delta` is the position from the start of the file.
      +Additional `position_delta` entries are sorted ascending and relative to
     -+the prior entry, e.g. a reader would perform:
     ++the prior entry, e.g. a reader would perform:
      +
      +....
      +pos = position_delta[0]
     @@ Documentation/technical/reftable.txt (new)
      +    uint32( CRC-32 of above )
      +....
      +
     -+If a section is missing (e.g. ref index) the corresponding position
     ++If a section is missing (e.g. ref index) the corresponding position
      +field (e.g. `ref_index_position`) will be 0.
      +
      +* `obj_position`: byte position for the first obj block.
     @@ Documentation/technical/reftable.txt (new)
      +Low overhead
      +^^^^^^^^^^^^
      +
     -+A reftable with very few references (e.g. git.git with 5 heads) is 269
     -+bytes for reftable, vs. 332 bytes for packed-refs. This supports
     ++A reftable with very few references (e.g. git.git with 5 heads) is 269
     ++bytes for reftable, vs. 332 bytes for packed-refs. This supports
      +reftable scaling down for transaction logs (below).
      +
      +Block size
     @@ Documentation/technical/reftable.txt (new)
      +repository is single-threaded for writers. Writers may have to busy-spin
      +(with backoff) around creating `tables.list.lock`, for up to an
      +acceptable wait period, aborting if the repository is too busy to
     -+mutate. Application servers wrapped around repositories (e.g. Gerrit
     ++mutate. Application servers wrapped around repositories (e.g. Gerrit
      +Code Review) can layer their own lock/wait queue to improve fairness to
      +writers.
      +
     @@ Documentation/technical/reftable.txt (new)
      +bzip packed-refs
      +^^^^^^^^^^^^^^^^
      +
     -+`bzip2` can significantly shrink a large packed-refs file (e.g. 62 MiB
     ++`bzip2` can significantly shrink a large packed-refs file (e.g. 62 MiB
      +compresses to 23 MiB, 37%). However the bzip format does not support
      +random access to a single reference. Readers must inflate and discard
      +while performing a linear scan.
     @@ Documentation/technical/reftable.txt (new)
      +interleaved between refs.
      +
      +Performance testing indicates reftable is faster for lookups (51%
     -+faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
     ++faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
      +larger file (+ ~3.2%, 28.3M vs 29.2M):
      +
      +[cols=">,>,>,>",options="header",]
     @@ Documentation/technical/reftable.txt (new)
      +The RefTree format adds additional load on the object database storage
      +layer (more loose objects, more objects in packs), and relies heavily on
      +the packer’s delta compression to save space. Namespaces which are flat
     -+(e.g. thousands of tags in refs/tags) initially create very large loose
     ++(e.g. thousands of tags in refs/tags) initially create very large loose
      +objects, and so RefTree does not address the problem of copying many
      +references to modify a handful.
      +
     @@ Documentation/technical/reftable.txt (new)
      +Longer hashes
      +^^^^^^^^^^^^^
      +
     -+Version will bump (e.g. 2) to indicate `value` uses a different object
     ++Version will bump (e.g. 2) to indicate `value` uses a different object
      +id length other than 20. The length could be stored in an expanded file
      +header, or hardcoded as part of the version.
  7:  6d9031372ce =  5:  7aa3f92fca0 reftable: clarify how empty tables should be written
  6:  093fa74a3d0 !  6:  1e3c8f2d3e8 reftable: define version 2 of the spec to accomodate SHA256
     @@ Metadata
       ## Commit message ##
          reftable: define version 2 of the spec to accomodate SHA256
      
     +    Version appends a hash ID to the file header, making it slightly larger.
     +
     +    This commit also changes "SHA-1" into "object ID" in many places.
     +
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
       ## Documentation/technical/reftable.txt ##
     +@@ Documentation/technical/reftable.txt: Objectives
     + 
     + * Near constant time lookup for any single reference, even when the
     + repository is cold and not in process or kernel cache.
     +-* Near constant time verification if a SHA-1 is referred to by at least
     ++* Near constant time verification if an object ID is referred to by at least
     + one reference (for allow-tip-sha1-in-want).
     + * Efficient lookup of an entire namespace, such as `refs/tags/`.
     + * Support atomic push with `O(size_of_update)` operations.
      @@ Documentation/technical/reftable.txt: and non-aligned files.
     - Very small files (e.g. 1 only ref block) may omit `padding` and the ref
     + Very small files (e.g. a single ref block) may omit `padding` and the ref
       index to reduce total file size.
       
      -Header
     @@ Documentation/technical/reftable.txt: used in a stack for link:#Update-transacti
      +
      +....
      +'REFT'
     -+uint8( version_number = 1 )
     ++uint8( version_number = 2 )
      +uint24( block_size )
      +uint64( min_update_index )
      +uint64( max_update_index )
     @@ Documentation/technical/reftable.txt: used in a stack for link:#Update-transacti
      +The header is identical to `version_number=1`, with the 4-byte hash ID
      +("sha1" for SHA1 and "s256" for SHA-256) append to the header.
      +
     ++For maximum backward compatibility, it is recommended to use version 1 when
     ++writing SHA1 reftables.
     ++
      +
       First ref block
       ^^^^^^^^^^^^^^^
       
     +@@ Documentation/technical/reftable.txt: The `value` follows. Its format is determined by `value_type`, one of
     + the following:
     + 
     + * `0x0`: deletion; no value data (see transactions, below)
     +-* `0x1`: one 20-byte object id; value of the ref
     +-* `0x2`: two 20-byte object ids; value of the ref, peeled target
     ++* `0x1`: one object id; value of the ref
     ++* `0x2`: two object ids; value of the ref, peeled target
     + * `0x3`: symbolic reference: `varint( target_len ) target`
     + 
     + Symbolic references use `0x3`, followed by the complete name of the
     +@@ Documentation/technical/reftable.txt: Obj block format
     + ^^^^^^^^^^^^^^^^
     + 
     + Object blocks are optional. Writers may choose to omit object blocks,
     +-especially if readers will not use the SHA-1 to ref mapping.
     ++especially if readers will not use the object ID to ref mapping.
     + 
     +-Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
     ++Object blocks use unique, abbreviated 2-32 byte object ID keys, mapping to
     + ref blocks containing references pointing to that object directly, or as
     + the peeled value of an annotated tag. Like ref blocks, object blocks use
     + the file’s standard block size. The abbrevation length is available in
     +@@ Documentation/technical/reftable.txt: the footer as `obj_id_len`.
     + To save space in small files, object blocks may be omitted if the ref
     + index is not present, as brute force search will only need to read a few
     + ref blocks. When missing, readers should brute force a linear search of
     +-all references to lookup by SHA-1.
     ++all references to lookup by object ID.
     + 
     + An object block is written as:
     + 
     +@@ Documentation/technical/reftable.txt: works the same as in reference blocks.
     + 
     + Because object identifiers are abbreviated by writers to the shortest
     + unique abbreviation within the reftable, obj key lengths are variable
     +-between 2 and 20 bytes. Readers must compare only for common prefix
     ++between 2 and 32 bytes. Readers must compare only for common prefix
     + match within an obj block or obj index.
     + 
     + obj record
     +@@ Documentation/technical/reftable.txt: for (j = 1; j < position_count; j++) {
     + ....
     + 
     + With a position in hand, a reader must linearly scan the ref block,
     +-starting from the first `ref_record`, testing each reference’s SHA-1s
     ++starting from the first `ref_record`, testing each reference’s object IDs
     + (for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
     +-SHA-1 within a single ref block is not supported by the reftable format.
     ++object ID within a single ref block is not supported by the reftable format.
     + Smaller block sizes reduce the number of candidates this step must
     + consider.
     + 
     +@@ Documentation/technical/reftable.txt: reflogs must treat this as a deletion.
     + For `log_type = 0x1`, the `log_data` section follows
     + linkgit:git-update-ref[1] logging and includes:
     + 
     +-* two 20-byte SHA-1s (old id, new id)
     ++* two object IDs (old id, new id)
     + * varint string of committer’s name
     + * varint string of committer’s email
     + * varint time in seconds since epoch (Jan 1, 1970)
      @@ Documentation/technical/reftable.txt: Footer
       After the last block of the file, a file footer is written. It begins
       like the file header, but is extended with additional data.
     @@ Documentation/technical/reftable.txt: obj blocks.
       
       * 4-byte magic is correct
       * 1-byte version number is recognized
     +@@ Documentation/technical/reftable.txt: Lightweight refs dominate
     + ^^^^^^^^^^^^^^^^^^^^^^^^^
     + 
     + The reftable format assumes the vast majority of references are single
     +-SHA-1 valued with common prefixes, such as Gerrit Code Review’s
     ++object IDs valued with common prefixes, such as Gerrit Code Review’s
     + `refs/changes/` namespace, GitHub’s `refs/pulls/` namespace, or many
     + lightweight tags in the `refs/tags/` namespace.
     + 
     +-Annotated tags storing the peeled object cost an additional 20 bytes per
     ++Annotated tags storing the peeled object cost an additional object ID per
     + reference.
     + 
     + Low overhead
     +@@ Documentation/technical/reftable.txt: Scans and lookups dominate
     + 
     + Scanning all references and lookup by name (or namespace such as
     + `refs/heads/`) are the most common activities performed on repositories.
     +-SHA-1s are stored directly with references to optimize this use case.
     ++Object IDs are stored directly with references to optimize this use case.
     + 
     + Logs are infrequently read
     + ^^^^^^^^^^^^^^^^^^^^^^^^^^
      @@ Documentation/technical/reftable.txt: impossible.
       
       A common format that can be supported by all major Git implementations
     @@ Documentation/technical/reftable.txt: impossible.
      -Longer hashes
      -^^^^^^^^^^^^^
      -
     --Version will bump (e.g. 2) to indicate `value` uses a different object
     +-Version will bump (e.g. 2) to indicate `value` uses a different object
      -id length other than 20. The length could be stored in an expanded file
      -header, or hardcoded as part of the version.
  -:  ----------- >  7:  2c2f94ddc0e Write pseudorefs through ref backends.
  2:  340c5c415e1 =  8:  3becaaee66a Iterate over the "refs/" namespace in for_each_[raw]ref
  4:  7dc47c7756f =  9:  a6f77965f84 Add .gitattributes for the reftable/ directory
  8:  57d338c4983 ! 10:  8103703c358 Add reftable library
     @@ reftable/README.md (new)
      
       ## reftable/VERSION (new) ##
      @@
     -+commit 06dd91a8377b0f920d9835b9835d0d650f928dce
     ++commit e74c14b66b6c15f6526c485f2e45d3f2735d359d
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Wed May 6 21:27:32 2020 +0200
     -+
     -+    C: fix small memory leak in stack.c error path
     ++Date:   Mon May 11 21:02:55 2020 +0200
     ++
     ++    C: handle out-of-date reftable stacks
     ++    
     ++    * Make reftable_stack_reload() check out-of-dateness. This makes it very cheap,
     ++      allowing it to be called often. This is useful because Git calls itself often,
     ++      which effectively requires a reload.
     ++    
     ++    * In reftable_stack_add(), check the return value of stack_uptodate(), leading
     ++      to erroneously succeeding the transaction.
     ++    
     ++    * A test that exercises the above.
      
       ## reftable/basics.c (new) ##
      @@
     @@ reftable/block.c (new)
      +
      +int block_iter_next(struct block_iter *it, struct record rec)
      +{
     ++	struct slice in = {
     ++		.buf = it->br->block.data + it->next_off,
     ++		.len = it->br->block_len - it->next_off,
     ++	};
     ++	struct slice start = in;
     ++	struct slice key = { 0 };
     ++	byte extra = 0;
     ++	int n = 0;
     ++
      +	if (it->next_off >= it->br->block_len) {
      +		return 1;
      +	}
      +
     -+	{
     -+		struct slice in = {
     -+			.buf = it->br->block.data + it->next_off,
     -+			.len = it->br->block_len - it->next_off,
     -+		};
     -+		struct slice start = in;
     -+		struct slice key = { 0 };
     -+		byte extra;
     -+		int n = decode_key(&key, &extra, it->last_key, in);
     -+		if (n < 0) {
     -+			return -1;
     -+		}
     -+
     -+		slice_consume(&in, n);
     -+		n = record_decode(rec, key, extra, in, it->br->hash_size);
     -+		if (n < 0) {
     -+			return -1;
     -+		}
     -+		slice_consume(&in, n);
     ++	n = decode_key(&key, &extra, it->last_key, in);
     ++	if (n < 0) {
     ++		return -1;
     ++	}
      +
     -+		slice_copy(&it->last_key, key);
     -+		it->next_off += start.len - in.len;
     -+		slice_clear(&key);
     -+		return 0;
     ++	slice_consume(&in, n);
     ++	n = record_decode(rec, key, extra, in, it->br->hash_size);
     ++	if (n < 0) {
     ++		return -1;
      +	}
     ++	slice_consume(&in, n);
     ++
     ++	slice_copy(&it->last_key, key);
     ++	it->next_off += start.len - in.len;
     ++	slice_clear(&key);
     ++	return 0;
      +}
      +
      +int block_reader_first_key(struct block_reader *br, struct slice *key)
     @@ reftable/block.c (new)
      +		.key = want,
      +		.r = br,
      +	};
     ++	struct record rec = new_record(block_reader_type(br));
     ++	struct slice key = { 0 };
     ++	int err = 0;
     ++	struct block_iter next = { 0 };
      +
      +	int i = binsearch(br->restart_count, &restart_key_less, &args);
      +	if (args.error) {
     -+		return -1;
     ++		err = REFTABLE_FORMAT_ERROR;
     ++		goto exit;
      +	}
      +
      +	it->br = br;
     @@ reftable/block.c (new)
      +		it->next_off = br->header_off + 4;
      +	}
      +
     -+	{
     -+		struct record rec = new_record(block_reader_type(br));
     -+		struct slice key = { 0 };
     -+		int result = 0;
     -+		int err = 0;
     -+		struct block_iter next = { 0 };
     -+		while (true) {
     -+			block_iter_copy_from(&next, it);
     -+
     -+			err = block_iter_next(&next, rec);
     -+			if (err < 0) {
     -+				result = -1;
     -+				goto exit;
     -+			}
     -+
     -+			record_key(rec, &key);
     -+			if (err > 0 || slice_compare(key, want) >= 0) {
     -+				result = 0;
     -+				goto exit;
     -+			}
     -+
     -+			block_iter_copy_from(it, &next);
     ++	/* We're looking for the last entry less/equal than the wanted key, so
     ++	   we have to go one entry too far and then back up.
     ++	*/
     ++	while (true) {
     ++		block_iter_copy_from(&next, it);
     ++		err = block_iter_next(&next, rec);
     ++		if (err < 0) {
     ++			goto exit;
      +		}
      +
     -+	exit:
     -+		slice_clear(&key);
     -+		slice_clear(&next.last_key);
     -+		record_destroy(&rec);
     ++		record_key(rec, &key);
     ++		if (err > 0 || slice_compare(key, want) >= 0) {
     ++			err = 0;
     ++			goto exit;
     ++		}
      +
     -+		return result;
     ++		block_iter_copy_from(it, &next);
      +	}
     ++
     ++exit:
     ++	slice_clear(&key);
     ++	slice_clear(&next.last_key);
     ++	record_destroy(&rec);
     ++
     ++	return err;
      +}
      +
      +void block_writer_clear(struct block_writer *bw)
     @@ reftable/merged.c (new)
      +		if (err < 0) {
      +			return err;
      +		}
     -+		record_clear(top.rec);
     -+		reftable_free(record_yield(&top.rec));
     ++		record_destroy(&top.rec);
      +	}
      +
      +	record_copy_from(rec, entry.rec, hash_size(mi->hash_id));
     -+	record_clear(entry.rec);
     -+	reftable_free(record_yield(&entry.rec));
     ++	record_destroy(&entry.rec);
      +	slice_clear(&entry_key);
      +	return 0;
      +}
     @@ reftable/pq.c (new)
      +{
      +	int i = 0;
      +	for (i = 0; i < pq->len; i++) {
     -+		record_clear(pq->heap[i].rec);
     -+		reftable_free(record_yield(&pq->heap[i].rec));
     ++		record_destroy(&pq->heap[i].rec);
      +	}
      +	FREE_AND_NULL(pq->heap);
      +	pq->len = pq->cap = 0;
     @@ reftable/record.c (new)
      +		}
      +		slice_consume(&in, n);
      +		seen_target = true;
     ++		if (r->target != NULL) {
     ++			reftable_free(r->target);
     ++		}
      +		r->target = (char *)slice_as_string(&dest);
      +	} break;
      +
     @@ reftable/record.c (new)
      +	r->update_index = (~max) - ts;
      +
      +	if (val_type == 0) {
     ++		FREE_AND_NULL(r->old_hash);
     ++		FREE_AND_NULL(r->new_hash);
     ++		FREE_AND_NULL(r->message);
     ++		FREE_AND_NULL(r->email);
     ++		FREE_AND_NULL(r->name);
      +		return 0;
      +	}
      +
     @@ reftable/refname.c (new)
      +static int find_name(size_t k, void *arg)
      +{
      +	struct find_arg *f_arg = (struct find_arg *)arg;
     -+
      +	return strcmp(f_arg->names[k], f_arg->want) >= 0;
      +}
      +
     @@ reftable/refname.c (new)
      +
      +static void modification_clear(struct modification *mod)
      +{
     ++	/* don't delete the strings themselves; they're owned by ref records.
     ++	 */
      +	FREE_AND_NULL(mod->add);
      +	FREE_AND_NULL(mod->del);
      +	mod->add_len = 0;
     @@ reftable/refname.c (new)
      +			goto exit;
      +		}
      +	}
     -+
      +	err = reftable_table_seek_ref(mod->tab, &it, prefix);
      +	if (err) {
      +		goto exit;
     @@ reftable/reftable.h (new)
      +/* Commits the transaction, releasing the lock. */
      +int reftable_addition_commit(struct reftable_addition *add);
      +
     -+/* Release all non-committed data from the transaction; releases the lock if
     -+ * held. */
     -+void reftable_addition_close(struct reftable_addition *add);
     ++/* Release all non-committed data from the transaction, and deallocate the
     ++   transaction. Releases the lock if held. */
     ++void reftable_addition_destroy(struct reftable_addition *add);
      +
      +/* add a new table to the stack. The write_table function must call
      +   reftable_writer_set_limits, add refs and return an error value. */
     @@ reftable/reftable.h (new)
      +/* frees all resources associated with the stack. */
      +void reftable_stack_destroy(struct reftable_stack *st);
      +
     -+/* reloads the stack if necessary. */
     ++/* Reloads the stack if necessary. This is very cheap to run if the stack was up
     ++ * to date */
      +int reftable_stack_reload(struct reftable_stack *st);
      +
      +/* Policy for expiring reflog entries. */
     @@ reftable/stack.c (new)
      +	p->reftable_dir = xstrdup(dir);
      +	p->config = config;
      +
     -+	err = reftable_stack_reload(p);
     ++	err = reftable_stack_reload_maybe_reuse(p, true);
      +	if (err < 0) {
      +		reftable_stack_destroy(p);
      +	} else {
     @@ reftable/stack.c (new)
      +/* Close and free the stack */
      +void reftable_stack_destroy(struct reftable_stack *st)
      +{
     -+	if (st->merged == NULL) {
     -+		return;
     ++	if (st->merged != NULL) {
     ++		reftable_merged_table_close(st->merged);
     ++		reftable_merged_table_free(st->merged);
     ++		st->merged = NULL;
      +	}
     -+
     -+	reftable_merged_table_close(st->merged);
     -+	reftable_merged_table_free(st->merged);
     -+	st->merged = NULL;
     -+
      +	FREE_AND_NULL(st->list_file);
      +	FREE_AND_NULL(st->reftable_dir);
      +	reftable_free(st);
     @@ reftable/stack.c (new)
      +	return udiff;
      +}
      +
     -+static int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
     -+					     bool reuse_open)
     ++int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
     ++				      bool reuse_open)
      +{
      +	struct timeval deadline = { 0 };
      +	int err = gettimeofday(&deadline, NULL);
     @@ reftable/stack.c (new)
      +	return 0;
      +}
      +
     -+int reftable_stack_reload(struct reftable_stack *st)
     -+{
     -+	return reftable_stack_reload_maybe_reuse(st, true);
     -+}
     -+
      +/* -1 = error
      + 0 = up to date
      + 1 = changed. */
     @@ reftable/stack.c (new)
      +	return err;
      +}
      +
     ++int reftable_stack_reload(struct reftable_stack *st)
     ++{
     ++	int err = stack_uptodate(st);
     ++	if (err > 0) {
     ++		return reftable_stack_reload_maybe_reuse(st, true);
     ++	}
     ++	return err;
     ++}
     ++
      +int reftable_stack_add(struct reftable_stack *st,
      +		       int (*write)(struct reftable_writer *wr, void *arg),
      +		       void *arg)
     @@ reftable/stack.c (new)
      +	slice_clear(&nm);
      +}
      +
     ++void reftable_addition_destroy(struct reftable_addition *add)
     ++{
     ++	if (add == NULL) {
     ++		return;
     ++	}
     ++	reftable_addition_close(add);
     ++	reftable_free(add);
     ++}
     ++
      +int reftable_addition_commit(struct reftable_addition *add)
      +{
      +	struct slice table_list = { 0 };
     @@ reftable/stack.c (new)
      +				struct reftable_stack *st)
      +{
      +	int err = 0;
     -+	*dest = reftable_malloc(sizeof(**dest));
     ++	*dest = reftable_calloc(sizeof(**dest));
      +	err = reftable_stack_init_addition(*dest, st);
      +	if (err) {
      +		reftable_free(*dest);
     @@ reftable/stack.c (new)
      +	if (err < 0) {
      +		goto exit;
      +	}
     ++	if (err > 0) {
     ++		err = REFTABLE_LOCK_ERROR;
     ++		goto exit;
     ++	}
      +
      +	err = reftable_addition_add(&add, write_table, arg);
      +	if (err < 0) {
     @@ reftable/stack.c (new)
      +	slice_append_string(&tab_file_name, "/");
      +	slice_append(&tab_file_name, next_name);
      +
     ++	/* TODO: should check destination out of paranoia */
      +	err = rename(slice_as_string(&temp_tab_file_name),
      +		     slice_as_string(&tab_file_name));
      +	if (err < 0) {
     @@ reftable/stack.c (new)
      +	if (err < 0) {
      +		goto exit;
      +	}
     -+	reftable_writer_free(wr);
      +
      +	err = close(tab_fd);
      +	tab_fd = 0;
      +
      +exit:
     ++	reftable_writer_free(wr);
      +	if (tab_fd > 0) {
      +		close(tab_fd);
      +		tab_fd = 0;
     @@ reftable/stack.c (new)
      +
      +	err = validate_ref_record_addition(tab, refs, len);
      +
     ++exit:
      +	for (i = 0; i < len; i++) {
      +		reftable_ref_record_clear(&refs[i]);
      +	}
      +
     -+exit:
      +	free(refs);
      +	reftable_iterator_destroy(&it);
      +	reftable_reader_free(rd);
     @@ reftable/stack.h (new)
      +			struct reftable_log_expiry_config *config);
      +int fastlog2(uint64_t sz);
      +int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
     ++void reftable_addition_close(struct reftable_addition *add);
     ++int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
     ++				      bool reuse_open);
      +
      +struct segment {
      +	int start, end;
     @@ reftable/writer.c (new)
      +			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
      +
      +			err = block_writer_add(w->block_writer, rec);
     -+			assert(err == 0);
     ++			if (err != 0) {
     ++				/* write into fresh block should always succeed
     ++				 */
     ++				abort();
     ++			}
      +		}
      +		for (i = 0; i < idx_len; i++) {
      +			slice_clear(&idx[i].last_key);
  9:  f3bb9410038 ! 11:  ace95b6cd88 Reftable support for git-core
     @@ refs/reftable-backend.c (new)
      +			break;
      +		}
      +
     ++		/*
     ++		   We could filter pseudo refs here explicitly, but HEAD is not
     ++		   a PSEUDOREF, but a PER_WORKTREE, b/c each worktree can have
     ++		   its own HEAD.
     ++		 */
      +		ri->base.refname = ri->ref.ref_name;
      +		if (ri->prefix != NULL &&
      +		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
     @@ refs/reftable-backend.c (new)
      +
      +static int reftable_transaction_prepare(struct ref_store *ref_store,
      +					struct ref_transaction *transaction,
     -+					struct strbuf *err)
     ++					struct strbuf *errbuf)
      +{
     -+	return 0;
     ++	/* XXX rewrite using the reftable transaction API. */
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
     ++	int err = refs->err;
     ++	if (err < 0) {
     ++		goto done;
     ++	}
     ++	err = reftable_stack_reload(refs->stack);
     ++	if (err) {
     ++		goto done;
     ++	}
     ++
     ++done:
     ++	return err;
      +}
      +
      +static int reftable_transaction_abort(struct ref_store *ref_store,
     @@ refs/reftable-backend.c (new)
      +	return reftable_transaction_commit(ref_store, transaction, err);
      +}
      +
     ++struct write_pseudoref_arg {
     ++	struct reftable_stack *stack;
     ++	const char *pseudoref;
     ++	const struct object_id *new_oid;
     ++	const struct object_id *old_oid;
     ++};
     ++
     ++static int write_pseudoref_table(struct reftable_writer *writer, void *argv)
     ++{
     ++	struct write_pseudoref_arg *arg = (struct write_pseudoref_arg *)argv;
     ++	uint64_t ts = reftable_stack_next_update_index(arg->stack);
     ++	int err = 0;
     ++	struct reftable_ref_record read_ref = { NULL };
     ++	struct reftable_ref_record write_ref = { NULL };
     ++
     ++	reftable_writer_set_limits(writer, ts, ts);
     ++	if (arg->old_oid) {
     ++		struct object_id read_oid;
     ++		err = reftable_stack_read_ref(arg->stack, arg->pseudoref,
     ++					      &read_ref);
     ++		if (err < 0)
     ++			goto done;
     ++
     ++		if ((err > 0) != is_null_oid(arg->old_oid)) {
     ++			err = REFTABLE_LOCK_ERROR;
     ++			goto done;
     ++		}
     ++
     ++		/* XXX If old_oid is set, and we have a symref? */
     ++
     ++		if (err == 0 && read_ref.value == NULL) {
     ++			err = REFTABLE_LOCK_ERROR;
     ++			goto done;
     ++		}
     ++
     ++		hashcpy(read_oid.hash, read_ref.value);
     ++		if (!oideq(arg->old_oid, &read_oid)) {
     ++			err = REFTABLE_LOCK_ERROR;
     ++			goto done;
     ++		}
     ++	}
     ++
     ++	write_ref.ref_name = (char *)arg->pseudoref;
     ++	write_ref.update_index = ts;
     ++	if (!is_null_oid(arg->new_oid))
     ++		write_ref.value = (uint8_t *)arg->new_oid->hash;
     ++
     ++	err = reftable_writer_add_ref(writer, &write_ref);
     ++done:
     ++	reftable_ref_record_clear(&read_ref);
     ++	return err;
     ++}
     ++
     ++static int reftable_write_pseudoref(struct ref_store *ref_store,
     ++				    const char *pseudoref,
     ++				    const struct object_id *oid,
     ++				    const struct object_id *old_oid,
     ++				    struct strbuf *errbuf)
     ++{
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
     ++	struct write_pseudoref_arg arg = {
     ++		.stack = refs->stack,
     ++		.pseudoref = pseudoref,
     ++		.new_oid = oid,
     ++	};
     ++	struct reftable_addition *add = NULL;
     ++	int err = refs->err;
     ++	if (err < 0) {
     ++		goto done;
     ++	}
     ++
     ++	err = reftable_stack_reload(refs->stack);
     ++	if (err) {
     ++		goto done;
     ++	}
     ++	err = reftable_stack_new_addition(&add, refs->stack);
     ++	if (err) {
     ++		goto done;
     ++	}
     ++	if (old_oid) {
     ++		struct object_id actual_old_oid;
     ++
     ++		/* XXX this is cut & paste from files-backend - should factor
     ++		 * out? */
     ++		if (read_ref(pseudoref, &actual_old_oid)) {
     ++			if (!is_null_oid(old_oid)) {
     ++				strbuf_addf(errbuf,
     ++					    _("could not read ref '%s'"),
     ++					    pseudoref);
     ++				goto done;
     ++			}
     ++		} else if (is_null_oid(old_oid)) {
     ++			strbuf_addf(errbuf, _("ref '%s' already exists"),
     ++				    pseudoref);
     ++			goto done;
     ++		} else if (!oideq(&actual_old_oid, old_oid)) {
     ++			strbuf_addf(errbuf,
     ++				    _("unexpected object ID when writing '%s'"),
     ++				    pseudoref);
     ++			goto done;
     ++		}
     ++	}
     ++
     ++	err = reftable_addition_add(add, &write_pseudoref_table, &arg);
     ++	if (err < 0) {
     ++		strbuf_addf(errbuf, "reftable: pseudoref update failure: %s",
     ++			    reftable_error_str(err));
     ++	}
     ++
     ++	err = reftable_addition_commit(add);
     ++	if (err < 0) {
     ++		strbuf_addf(errbuf, "reftable: pseudoref commit failure: %s",
     ++			    reftable_error_str(err));
     ++	}
     ++
     ++done:
     ++	reftable_addition_destroy(add);
     ++	return err;
     ++}
     ++
     ++static int reftable_delete_pseudoref(struct ref_store *ref_store,
     ++				     const char *pseudoref,
     ++				     const struct object_id *old_oid)
     ++{
     ++	struct strbuf errbuf = STRBUF_INIT;
     ++	int ret = reftable_write_pseudoref(ref_store, pseudoref, &null_oid,
     ++					   old_oid, &errbuf);
     ++	/* XXX what to do with the error message? */
     ++	strbuf_release(&errbuf);
     ++	return ret;
     ++}
     ++
      +struct write_delete_refs_arg {
      +	struct reftable_stack *stack;
      +	struct string_list *refnames;
     @@ refs/reftable-backend.c (new)
      +		.logmsg = msg,
      +		.flags = flags,
      +	};
     -+	if (refs->err < 0) {
     -+		return refs->err;
     ++	int err = refs->err;
     ++	if (err < 0) {
     ++		goto done;
      +	}
     -+
     -+	return reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
     ++	err = reftable_stack_reload(refs->stack);
     ++	if (err) {
     ++		goto done;
     ++	}
     ++	err = reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
     ++done:
     ++	return err;
      +}
      +
      +static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
     @@ refs/reftable-backend.c (new)
      +					       .refname = refname,
      +					       .target = target,
      +					       .logmsg = logmsg };
     -+	if (refs->err < 0) {
     -+		return refs->err;
     ++	int err = refs->err;
     ++	if (err < 0) {
     ++		goto done;
     ++	}
     ++	err = reftable_stack_reload(refs->stack);
     ++	if (err) {
     ++		goto done;
      +	}
     -+	return reftable_stack_add(refs->stack, &write_create_symref_table,
     -+				  &arg);
     ++	err = reftable_stack_add(refs->stack, &write_create_symref_table, &arg);
     ++done:
     ++	return err;
      +}
      +
      +struct write_rename_arg {
     @@ refs/reftable-backend.c (new)
      +		.newname = newrefname,
      +		.logmsg = logmsg,
      +	};
     -+	if (refs->err < 0) {
     -+		return refs->err;
     ++	int err = refs->err;
     ++	if (err < 0) {
     ++		goto done;
     ++	}
     ++	err = reftable_stack_reload(refs->stack);
     ++	if (err) {
     ++		goto done;
      +	}
      +
     -+	return reftable_stack_add(refs->stack, &write_rename_table, &arg);
     ++	err = reftable_stack_add(refs->stack, &write_rename_table, &arg);
     ++done:
     ++	return err;
      +}
      +
      +static int reftable_copy_ref(struct ref_store *ref_store,
     @@ refs/reftable-backend.c (new)
      +	reftable_rename_ref,
      +	reftable_copy_ref,
      +
     ++	reftable_write_pseudoref,
     ++	reftable_delete_pseudoref,
     ++
      +	reftable_ref_iterator_begin,
      +	reftable_read_raw_ref,
      +
     @@ t/t0031-reftable.sh (new)
      +	test_cmp expect actual
      +'
      +
     ++# cherry-pick uses a pseudo ref.
     ++test_expect_success 'pseudo refs' '
     ++	initialize &&
     ++	test_commit message1 file1 &&
     ++	test_commit message2 file2 &&
     ++	git branch source &&
     ++	git checkout HEAD^ &&
     ++	test_commit message3 file3 &&
     ++	git cherry-pick source &&
     ++	test -f file2
     ++'
      +
      +test_done
      +
 10:  94fcc6dca6a = 12:  b96f0712d44 vcxproj: adjust for the reftable changes
 12:  fe9407d10b1 = 13:  0e732d30b51 Add some reftable testing infrastructure

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v13 01/13] refs.h: clarify reflog iteration order
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-05-11 19:46                         ` Han-Wen Nienhuys via GitGitGadget
  2020-05-18 23:31                           ` Junio C Hamano
  2020-05-11 19:46                         ` [PATCH v13 02/13] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
                                           ` (13 subsequent siblings)
  14 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.h | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/refs.h b/refs.h
index a92d2c74c83..99ba9e331e5 100644
--- a/refs.h
+++ b/refs.h
@@ -432,19 +432,31 @@ int delete_refs(const char *msg, struct string_list *refnames,
 int refs_delete_reflog(struct ref_store *refs, const char *refname);
 int delete_reflog(const char *refname);
 
-/* iterate over reflog entries */
+/* Callback to process a reflog entry found by the iteration functions (see
+ * below) */
 typedef int each_reflog_ent_fn(
 		struct object_id *old_oid, struct object_id *new_oid,
 		const char *committer, timestamp_t timestamp,
 		int tz, const char *msg, void *cb_data);
 
+/* Iterate in over reflog entries in the log for `refname`. */
+
+/* oldest entry first */
 int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
 			     each_reflog_ent_fn fn, void *cb_data);
+
+/* youngest entry first */
 int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
 				     const char *refname,
 				     each_reflog_ent_fn fn,
 				     void *cb_data);
+
+/* Call a function for each reflog entry in the log for `refname`. */
+
+/* oldest entry first */
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
+
+/* youngest entry first */
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
 /*
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v13 02/13] t: use update-ref and show-ref to reading/writing refs
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-05-11 19:46                         ` [PATCH v13 01/13] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-05-11 19:46                         ` Han-Wen Nienhuys via GitGitGadget
  2020-05-18 23:34                           ` Junio C Hamano
  2020-05-11 19:46                         ` [PATCH v13 03/13] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
                                           ` (12 subsequent siblings)
  14 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reading and writing .git/refs/* assumes that refs are stored in the 'files'
ref backend.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/t0002-gitfile.sh             |  2 +-
 t/t1400-update-ref.sh          | 32 ++++++++++++++++----------------
 t/t1506-rev-parse-diagnosis.sh |  2 +-
 t/t6050-replace.sh             |  2 +-
 t/t9020-remote-svn.sh          |  4 ++--
 5 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/t/t0002-gitfile.sh b/t/t0002-gitfile.sh
index 0aa9908ea12..960ed150cb5 100755
--- a/t/t0002-gitfile.sh
+++ b/t/t0002-gitfile.sh
@@ -62,7 +62,7 @@ test_expect_success 'check commit-tree' '
 '
 
 test_expect_success 'check rev-list' '
-	echo $SHA >"$REAL/HEAD" &&
+	git update-ref "HEAD" "$SHA" &&
 	test "$SHA" = "$(git rev-list HEAD)"
 '
 
diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index e1197ac8189..27171f82612 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -37,15 +37,15 @@ test_expect_success setup '
 
 test_expect_success "create $m" '
 	git update-ref $m $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "create $m with oldvalue verification" '
 	git update-ref $m $B $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "fail to delete $m with stale ref" '
 	test_must_fail git update-ref -d $m $A &&
-	test $B = "$(cat .git/$m)"
+	test $B = "$(git show-ref -s --verify $m)"
 '
 test_expect_success "delete $m" '
 	test_when_finished "rm -f .git/$m" &&
@@ -56,7 +56,7 @@ test_expect_success "delete $m" '
 test_expect_success "delete $m without oldvalue verification" '
 	test_when_finished "rm -f .git/$m" &&
 	git update-ref $m $A &&
-	test $A = $(cat .git/$m) &&
+	test $A = $(git show-ref -s --verify $m) &&
 	git update-ref -d $m &&
 	test_path_is_missing .git/$m
 '
@@ -69,15 +69,15 @@ test_expect_success "fail to create $n" '
 
 test_expect_success "create $m (by HEAD)" '
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "create $m (by HEAD) with oldvalue verification" '
 	git update-ref HEAD $B $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "fail to delete $m (by HEAD) with stale ref" '
 	test_must_fail git update-ref -d HEAD $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "delete $m (by HEAD)" '
 	test_when_finished "rm -f .git/$m" &&
@@ -178,14 +178,14 @@ test_expect_success '--no-create-reflog overrides core.logAllRefUpdates=always'
 
 test_expect_success "create $m (by HEAD)" '
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success 'pack refs' '
 	git pack-refs --all
 '
 test_expect_success "move $m (by HEAD)" '
 	git update-ref HEAD $B $A &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "delete $m (by HEAD) should remove both packed and loose $m" '
 	test_when_finished "rm -f .git/$m" &&
@@ -255,7 +255,7 @@ test_expect_success '(not) change HEAD with wrong SHA1' '
 '
 test_expect_success "(not) changed .git/$m" '
 	test_when_finished "rm -f .git/$m" &&
-	! test $B = $(cat .git/$m)
+	! test $B = $(git show-ref -s --verify $m)
 '
 
 rm -f .git/logs/refs/heads/master
@@ -263,19 +263,19 @@ test_expect_success "create $m (logged by touch)" '
 	test_config core.logAllRefUpdates false &&
 	GIT_COMMITTER_DATE="2005-05-26 23:30" \
 	git update-ref --create-reflog HEAD $A -m "Initial Creation" &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "update $m (logged by touch)" '
 	test_config core.logAllRefUpdates false &&
 	GIT_COMMITTER_DATE="2005-05-26 23:31" \
 	git update-ref HEAD $B $A -m "Switch" &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "set $m (logged by touch)" '
 	test_config core.logAllRefUpdates false &&
 	GIT_COMMITTER_DATE="2005-05-26 23:41" \
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 
 test_expect_success 'empty directory removal' '
@@ -319,19 +319,19 @@ test_expect_success "create $m (logged by config)" '
 	test_config core.logAllRefUpdates true &&
 	GIT_COMMITTER_DATE="2005-05-26 23:32" \
 	git update-ref HEAD $A -m "Initial Creation" &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 test_expect_success "update $m (logged by config)" '
 	test_config core.logAllRefUpdates true &&
 	GIT_COMMITTER_DATE="2005-05-26 23:33" \
 	git update-ref HEAD'" $B $A "'-m "Switch" &&
-	test $B = $(cat .git/$m)
+	test $B = $(git show-ref -s --verify $m)
 '
 test_expect_success "set $m (logged by config)" '
 	test_config core.logAllRefUpdates true &&
 	GIT_COMMITTER_DATE="2005-05-26 23:43" \
 	git update-ref HEAD $A &&
-	test $A = $(cat .git/$m)
+	test $A = $(git show-ref -s --verify $m)
 '
 
 cat >expect <<EOF
diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
index 52edcbdcc32..dbf690b9c1b 100755
--- a/t/t1506-rev-parse-diagnosis.sh
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -207,7 +207,7 @@ test_expect_success 'arg before dashdash must be a revision (ambiguous)' '
 	{
 		# we do not want to use rev-parse here, because
 		# we are testing it
-		cat .git/refs/heads/foobar &&
+		git show-ref -s refs/heads/foobar &&
 		printf "%s\n" --
 	} >expect &&
 	git rev-parse foobar -- >actual &&
diff --git a/t/t6050-replace.sh b/t/t6050-replace.sh
index e7e64e085dd..c80dc10b8f1 100755
--- a/t/t6050-replace.sh
+++ b/t/t6050-replace.sh
@@ -135,7 +135,7 @@ test_expect_success 'tag replaced commit' '
 test_expect_success '"git fsck" works' '
      git fsck master >fsck_master.out &&
      test_i18ngrep "dangling commit $R" fsck_master.out &&
-     test_i18ngrep "dangling tag $(cat .git/refs/tags/mytag)" fsck_master.out &&
+     test_i18ngrep "dangling tag $(git show-ref -s refs/tags/mytag)" fsck_master.out &&
      test -z "$(git fsck)"
 '
 
diff --git a/t/t9020-remote-svn.sh b/t/t9020-remote-svn.sh
index 6fca08e5e35..9fcfa969a9b 100755
--- a/t/t9020-remote-svn.sh
+++ b/t/t9020-remote-svn.sh
@@ -48,8 +48,8 @@ test_expect_success REMOTE_SVN 'simple fetch' '
 '
 
 test_debug '
-	cat .git/refs/svn/svnsim/master
-	cat .git/refs/remotes/svnsim/master
+	git show-ref -s refs/svn/svnsim/master
+	git show-ref -s refs/remotes/svnsim/master
 '
 
 test_expect_success REMOTE_SVN 'repeated fetch, nothing shall change' '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v13 03/13] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-05-11 19:46                         ` [PATCH v13 01/13] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
  2020-05-11 19:46                         ` [PATCH v13 02/13] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
@ 2020-05-11 19:46                         ` Han-Wen Nienhuys via GitGitGadget
  2020-05-18 23:43                           ` Junio C Hamano
  2020-05-11 19:46                         ` [PATCH v13 04/13] reftable: file format documentation Jonathan Nieder via GitGitGadget
                                           ` (11 subsequent siblings)
  14 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/refs-internal.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index ff2436c0fb7..3490aac3a40 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
 
 /* Virtual function declarations for ref_iterators: */
 
+/*
+ * backend-specific implementation of ref_iterator_advance.
+ * For symrefs, the function should set REF_ISSYMREF, and it should also
+ * dereference the symref to provide the OID referent.
+ */
 typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
 
 typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v13 04/13] reftable: file format documentation
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (2 preceding siblings ...)
  2020-05-11 19:46                         ` [PATCH v13 03/13] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-05-11 19:46                         ` Jonathan Nieder via GitGitGadget
  2020-05-19 22:00                           ` Junio C Hamano
  2020-05-11 19:46                         ` [PATCH v13 05/13] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
                                           ` (10 subsequent siblings)
  14 siblings, 1 reply; 409+ messages in thread
From: Jonathan Nieder via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jonathan Nieder

From: Jonathan Nieder <jrnieder@gmail.com>

Shawn Pearce explains:

Some repositories contain a lot of references (e.g. android at 866k,
rails at 31k). The reftable format provides:

- Near constant time lookup for any single reference, even when the
  repository is cold and not in process or kernel cache.
- Near constant time verification a SHA-1 is referred to by at least
  one reference (for allow-tip-sha1-in-want).
- Efficient lookup of an entire namespace, such as `refs/tags/`.
- Support atomic push `O(size_of_update)` operations.
- Combine reflog storage with ref storage.

This file format spec was originally written in July, 2017 by Shawn
Pearce.  Some refinements since then were made by Shawn and by Han-Wen
Nienhuys based on experiences implementing and experimenting with the
format.  (All of this was in the context of our work at Google and
Google is happy to contribute the result to the Git project.)

Imported from JGit[1]'s current version (c217d33ff,
"Documentation/technical/reftable: improve repo layout", 2020-02-04)
of Documentation/technical/reftable.md and converted to asciidoc by
running

  pandoc -t asciidoc -f markdown reftable.md >reftable.txt

using pandoc 2.2.1.  The result required the following additional
minor changes:

- removed the [TOC] directive to add a table of contents, since
  asciidoc does not support it
- replaced git-scm.com/docs links with linkgit: directives that link
  to other pages within Git's documentation

[1] https://eclipse.googlesource.com/jgit/jgit

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 Documentation/Makefile               |    1 +
 Documentation/technical/reftable.txt | 1067 ++++++++++++++++++++++++++
 2 files changed, 1068 insertions(+)
 create mode 100644 Documentation/technical/reftable.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 15d9d04f316..ecd0b340b1c 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -93,6 +93,7 @@ TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
 TECH_DOCS += technical/racy-git
+TECH_DOCS += technical/reftable
 TECH_DOCS += technical/send-pack-pipeline
 TECH_DOCS += technical/shallow
 TECH_DOCS += technical/signature-format
diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
new file mode 100644
index 00000000000..8bad9ade256
--- /dev/null
+++ b/Documentation/technical/reftable.txt
@@ -0,0 +1,1067 @@
+reftable
+--------
+
+Overview
+~~~~~~~~
+
+Problem statement
+^^^^^^^^^^^^^^^^^
+
+Some repositories contain a lot of references (e.g. android at 866k,
+rails at 31k). The existing packed-refs format takes up a lot of space
+(e.g. 62M), and does not scale with additional references. Lookup of a
+single reference requires linearly scanning the file.
+
+Atomic pushes modifying multiple references require copying the entire
+packed-refs file, which can be a considerable amount of data moved
+(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
+
+Repositories with many loose references occupy a large number of disk
+blocks from the local file system, as each reference is its own file
+storing 41 bytes (and another file for the corresponding reflog). This
+negatively affects the number of inodes available when a large number of
+repositories are stored on the same filesystem. Readers can be penalized
+due to the larger number of syscalls required to traverse and read the
+`$GIT_DIR/refs` directory.
+
+Objectives
+^^^^^^^^^^
+
+* Near constant time lookup for any single reference, even when the
+repository is cold and not in process or kernel cache.
+* Near constant time verification if a SHA-1 is referred to by at least
+one reference (for allow-tip-sha1-in-want).
+* Efficient lookup of an entire namespace, such as `refs/tags/`.
+* Support atomic push with `O(size_of_update)` operations.
+* Combine reflog storage with ref storage for small transactions.
+* Separate reflog storage for base refs and historical logs.
+
+Description
+^^^^^^^^^^^
+
+A reftable file is a portable binary file format customized for
+reference storage. References are sorted, enabling linear scans, binary
+search lookup, and range scans.
+
+Storage in the file is organized into variable sized blocks. Prefix
+compression is used within a single block to reduce disk space. Block
+size and alignment is tunable by the writer.
+
+Performance
+^^^^^^^^^^^
+
+Space used, packed-refs vs. reftable:
+
+[cols=",>,>,>,>,>",options="header",]
+|===============================================================
+|repository |packed-refs |reftable |% original |avg ref |avg obj
+|android |62.2 M |36.1 M |58.0% |33 bytes |5 bytes
+|rails |1.8 M |1.1 M |57.7% |29 bytes |4 bytes
+|git |78.7 K |48.1 K |61.0% |50 bytes |4 bytes
+|git (heads) |332 b |269 b |81.0% |33 bytes |0 bytes
+|===============================================================
+
+Scan (read 866k refs), by reference name lookup (single ref from 866k
+refs), and by SHA-1 lookup (refs with that SHA-1, from 866k refs):
+
+[cols=",>,>,>,>",options="header",]
+|=========================================================
+|format |cache |scan |by name |by SHA-1
+|packed-refs |cold |402 ms |409,660.1 usec |412,535.8 usec
+|packed-refs |hot | |6,844.6 usec |20,110.1 usec
+|reftable |cold |112 ms |33.9 usec |323.2 usec
+|reftable |hot | |20.2 usec |320.8 usec
+|=========================================================
+
+Space used for 149,932 log entries for 43,061 refs, reflog vs. reftable:
+
+[cols=",>,>",options="header",]
+|================================
+|format |size |avg entry
+|$GIT_DIR/logs |173 M |1209 bytes
+|reftable |5 M |37 bytes
+|================================
+
+Details
+~~~~~~~
+
+Peeling
+^^^^^^^
+
+References stored in a reftable are peeled, a record for an annotated
+(or signed) tag records both the tag object, and the object it refers
+to.
+
+Reference name encoding
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Reference names are an uninterpreted sequence of bytes that must pass
+linkgit:git-check-ref-format[1] as a valid reference name.
+
+Key unicity
+^^^^^^^^^^^
+
+Each entry must have a unique key; repeated keys are disallowed.
+
+Network byte order
+^^^^^^^^^^^^^^^^^^
+
+All multi-byte, fixed width fields are in network byte order.
+
+Ordering
+^^^^^^^^
+
+Blocks are lexicographically ordered by their first reference.
+
+Directory/file conflicts
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format accepts both `refs/heads/foo` and
+`refs/heads/foo/bar` as distinct references.
+
+This property is useful for retaining log records in reftable, but may
+confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
+references. Users of reftable may choose to continue to reject `foo` and
+`foo/bar` type conflicts to prevent problems for peers.
+
+File format
+~~~~~~~~~~~
+
+Structure
+^^^^^^^^^
+
+A reftable file has the following high-level structure:
+
+....
+first_block {
+  header
+  first_ref_block
+}
+ref_block*
+ref_index*
+obj_block*
+obj_index*
+log_block*
+log_index*
+footer
+....
+
+A log-only file omits the `ref_block`, `ref_index`, `obj_block` and
+`obj_index` sections, containing only the file header and log block:
+
+....
+first_block {
+  header
+}
+log_block*
+log_index*
+footer
+....
+
+in a log-only file the first log block immediately follows the file
+header, without padding to block alignment.
+
+Block size
+^^^^^^^^^^
+
+The file’s block size is arbitrarily determined by the writer, and does
+not have to be a power of 2. The block size must be larger than the
+longest reference name or log entry used in the repository, as
+references cannot span blocks.
+
+Powers of two that are friendly to the virtual memory system or
+filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
+yield better compression, with a possible increased cost incurred by
+readers during access.
+
+The largest block size is `16777215` bytes (15.99 MiB).
+
+Block alignment
+^^^^^^^^^^^^^^^
+
+Writers may choose to align blocks at multiples of the block size by
+including `padding` filled with NUL bytes at the end of a block to round
+out to the chosen alignment. When alignment is used, writers must
+specify the alignment with the file header’s `block_size` field.
+
+Block alignment is not required by the file format. Unaligned files must
+set `block_size = 0` in the file header, and omit `padding`. Unaligned
+files with more than one ref block must include the link:#Ref-index[ref
+index] to support fast lookup. Readers must be able to read both aligned
+and non-aligned files.
+
+Very small files (e.g. a single ref block) may omit `padding` and the ref
+index to reduce total file size.
+
+Header
+^^^^^^
+
+A 24-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 1 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+....
+
+Aligned files must specify `block_size` to configure readers with the
+expected block alignment. Unaligned files must set `block_size = 0`.
+
+The `min_update_index` and `max_update_index` describe bounds for the
+`update_index` field of all log records in this file. When reftables are
+used in a stack for link:#Update-transactions[transactions], these
+fields can order the files such that the prior file’s
+`max_update_index + 1` is the next file’s `min_update_index`.
+
+First ref block
+^^^^^^^^^^^^^^^
+
+The first ref block shares the same block as the file header, and is 24
+bytes smaller than all other blocks in the file. The first block
+immediately begins after the file header, at position 24.
+
+If the first block is a log block (a log-only file), its block header
+begins immediately at position 24.
+
+Ref block format
+^^^^^^^^^^^^^^^^
+
+A ref block is written as:
+
+....
+'r'
+uint24( block_len )
+ref_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
+encodes the number of bytes in the block up to, but not including the
+optional `padding`. This is always less than or equal to the file’s
+block size. In the first ref block, `block_len` includes 24 bytes for
+the file header.
+
+The 2-byte `restart_count` stores the number of entries in the
+`restart_offset` list, which must not be empty. Readers can use
+`restart_count` to binary search between restarts before starting a
+linear scan.
+
+Exactly `restart_count` 3-byte `restart_offset` values precedes the
+`restart_count`. Offsets are relative to the start of the block and
+refer to the first byte of any `ref_record` whose name has not been
+prefix compressed. Entries in the `restart_offset` list must be sorted,
+ascending. Readers can start linear scans from any of these records.
+
+A variable number of `ref_record` fill the middle of the block,
+describing reference names and values. The format is described below.
+
+As the first ref block shares the first file block with the file header,
+all `restart_offset` in the first block are relative to the start of the
+file (position 0), and include the file header. This forces the first
+`restart_offset` to be `28`.
+
+ref record
+++++++++++
+
+A `ref_record` describes a single reference, storing both the name and
+its value(s). Records are formatted as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | value_type )
+suffix
+varint( update_index_delta )
+value?
+....
+
+The `prefix_length` field specifies how many leading bytes of the prior
+reference record’s name should be copied to obtain this reference’s
+name. This must be 0 for the first reference in any block, and also must
+be 0 for any `ref_record` whose offset is listed in the `restart_offset`
+table at the end of the block.
+
+Recovering a reference name from any `ref_record` is a simple concat:
+
+....
+this_name = prior_name[0..prefix_length] + suffix
+....
+
+The `suffix_length` value provides the number of bytes available in
+`suffix` to copy from `suffix` to complete the reference name.
+
+The `update_index` that last modified the reference can be obtained by
+adding `update_index_delta` to the `min_update_index` from the file
+header: `min_update_index + update_index_delta`.
+
+The `value` follows. Its format is determined by `value_type`, one of
+the following:
+
+* `0x0`: deletion; no value data (see transactions, below)
+* `0x1`: one 20-byte object id; value of the ref
+* `0x2`: two 20-byte object ids; value of the ref, peeled target
+* `0x3`: symbolic reference: `varint( target_len ) target`
+
+Symbolic references use `0x3`, followed by the complete name of the
+reference target. No compression is applied to the target name.
+
+Types `0x4..0x7` are reserved for future use.
+
+Ref index
+^^^^^^^^^
+
+The ref index stores the name of the last reference from every ref block
+in the file, enabling reduced disk seeks for lookups. Any reference can
+be found by searching the index, identifying the containing block, and
+searching within that block.
+
+The index may be organized into a multi-level index, where the 1st level
+index block points to additional ref index blocks (2nd level), which may
+in turn point to either additional index blocks (e.g. 3rd level) or ref
+blocks (leaf level). Disk reads required to access a ref go up with
+higher index levels. Multi-level indexes may be required to ensure no
+single index block exceeds the file format’s max block size of
+`16777215` bytes (15.99 MiB). To acheive constant O(1) disk seeks for
+lookups the index must be a single level, which is permitted to exceed
+the file’s configured block size, but not the format’s max block size of
+15.99 MiB.
+
+If present, the ref index block(s) appears after the last ref block.
+
+If there are at least 4 ref blocks, a ref index block should be written
+to improve lookup times. Cold reads using the index require 2 disk reads
+(read index, read block), and binary searching < 4 blocks also requires
+<= 2 reads. Omitting the index block from smaller files saves space.
+
+If the file is unaligned and contains more than one ref block, the ref
+index must be written.
+
+Index block format:
+
+....
+'i'
+uint24( block_len )
+index_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+The index blocks begin with `block_type = 'i'` and a 3-byte `block_len`
+which encodes the number of bytes in the block, up to but not including
+the optional `padding`.
+
+The `restart_offset` and `restart_count` fields are identical in format,
+meaning and usage as in ref blocks.
+
+To reduce the number of reads required for random access in very large
+files the index block may be larger than other blocks. However, readers
+must hold the entire index in memory to benefit from this, so it’s a
+time-space tradeoff in both file size and reader memory.
+
+Increasing the file’s block size decreases the index size. Alternatively
+a multi-level index may be used, keeping index blocks within the file’s
+block size, but increasing the number of blocks that need to be
+accessed.
+
+index record
+++++++++++++
+
+An index record describes the last entry in another block. Index records
+are written as:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | 0 )
+suffix
+varint( block_position )
+....
+
+Index records use prefix compression exactly like `ref_record`.
+
+Index records store `block_position` after the suffix, specifying the
+absolute position in bytes (from the start of the file) of the block
+that ends with this reference. Readers can seek to `block_position` to
+begin reading the block header.
+
+Readers must examine the block header at `block_position` to determine
+if the next block is another level index block, or the leaf-level ref
+block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the ref index must first read the footer (below) to
+obtain `ref_index_position`. If not present, the position will be 0. The
+`ref_index_position` is for the 1st level root of the ref index.
+
+Obj block format
+^^^^^^^^^^^^^^^^
+
+Object blocks are optional. Writers may choose to omit object blocks,
+especially if readers will not use the SHA-1 to ref mapping.
+
+Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
+ref blocks containing references pointing to that object directly, or as
+the peeled value of an annotated tag. Like ref blocks, object blocks use
+the file’s standard block size. The abbrevation length is available in
+the footer as `obj_id_len`.
+
+To save space in small files, object blocks may be omitted if the ref
+index is not present, as brute force search will only need to read a few
+ref blocks. When missing, readers should brute force a linear search of
+all references to lookup by SHA-1.
+
+An object block is written as:
+
+....
+'o'
+uint24( block_len )
+obj_record+
+uint24( restart_offset )+
+uint16( restart_count )
+
+padding?
+....
+
+Fields are identical to ref block. Binary search using the restart table
+works the same as in reference blocks.
+
+Because object identifiers are abbreviated by writers to the shortest
+unique abbreviation within the reftable, obj key lengths are variable
+between 2 and 20 bytes. Readers must compare only for common prefix
+match within an obj block or obj index.
+
+obj record
+++++++++++
+
+An `obj_record` describes a single object abbreviation, and the blocks
+containing references using that unique abbreviation:
+
+....
+varint( prefix_length )
+varint( (suffix_length << 3) | cnt_3 )
+suffix
+varint( cnt_large )?
+varint( position_delta )*
+....
+
+Like in reference blocks, abbreviations are prefix compressed within an
+obj block. On large reftables with many unique objects, higher block
+sizes (64k), and higher restart interval (128), a `prefix_length` of 2
+or 3 and `suffix_length` of 3 may be common in obj records (unique
+abbreviation of 5-6 raw bytes, 10-12 hex digits).
+
+Each record contains `position_count` number of positions for matching
+ref blocks. For 1-7 positions the count is stored in `cnt_3`. When
+`cnt_3 = 0` the actual count follows in a varint, `cnt_large`.
+
+The use of `cnt_3` bets most objects are pointed to by only a single
+reference, some may be pointed to by a couple of references, and very
+few (if any) are pointed to by more than 7 references.
+
+A special case exists when `cnt_3 = 0` and `cnt_large = 0`: there are no
+`position_delta`, but at least one reference starts with this
+abbreviation. A reader that needs exact reference names must scan all
+references to find which specific references have the desired object.
+Writers should use this format when the `position_delta` list would have
+overflowed the file’s block size due to a high number of references
+pointing to the same object.
+
+The first `position_delta` is the position from the start of the file.
+Additional `position_delta` entries are sorted ascending and relative to
+the prior entry, e.g. a reader would perform:
+
+....
+pos = position_delta[0]
+prior = pos
+for (j = 1; j < position_count; j++) {
+  pos = prior + position_delta[j]
+  prior = pos
+}
+....
+
+With a position in hand, a reader must linearly scan the ref block,
+starting from the first `ref_record`, testing each reference’s SHA-1s
+(for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
+SHA-1 within a single ref block is not supported by the reftable format.
+Smaller block sizes reduce the number of candidates this step must
+consider.
+
+Obj index
+^^^^^^^^^
+
+The obj index stores the abbreviation from the last entry for every obj
+block in the file, enabling reduced disk seeks for all lookups. It is
+formatted exactly the same as the ref index, but refers to obj blocks.
+
+The obj index should be present if obj blocks are present, as obj blocks
+should only be written in larger files.
+
+Readers loading the obj index must first read the footer (below) to
+obtain `obj_index_position`. If not present, the position will be 0.
+
+Log block format
+^^^^^^^^^^^^^^^^
+
+Unlike ref and obj blocks, log blocks are always unaligned.
+
+Log blocks are variable in size, and do not match the `block_size`
+specified in the file header or footer. Writers should choose an
+appropriate buffer size to prepare a log block for deflation, such as
+`2 * block_size`.
+
+A log block is written as:
+
+....
+'g'
+uint24( block_len )
+zlib_deflate {
+  log_record+
+  uint24( restart_offset )+
+  uint16( restart_count )
+}
+....
+
+Log blocks look similar to ref blocks, except `block_type = 'g'`.
+
+The 4-byte block header is followed by the deflated block contents using
+zlib deflate. The `block_len` in the header is the inflated size
+(including 4-byte block header), and should be used by readers to
+preallocate the inflation output buffer. A log block’s `block_len` may
+exceed the file’s block size.
+
+Offsets within the log block (e.g. `restart_offset`) still include the
+4-byte header. Readers may prefer prefixing the inflation output buffer
+with the 4-byte header.
+
+Within the deflate container, a variable number of `log_record` describe
+reference changes. The log record format is described below. See ref
+block format (above) for a description of `restart_offset` and
+`restart_count`.
+
+Because log blocks have no alignment or padding between blocks, readers
+must keep track of the bytes consumed by the inflater to know where the
+next log block begins.
+
+log record
+++++++++++
+
+Log record keys are structured as:
+
+....
+ref_name '\0' reverse_int64( update_index )
+....
+
+where `update_index` is the unique transaction identifier. The
+`update_index` field must be unique within the scope of a `ref_name`.
+See the update transactions section below for further details.
+
+The `reverse_int64` function inverses the value so lexographical
+ordering the network byte order encoding sorts the more recent records
+with higher `update_index` values first:
+
+....
+reverse_int64(int64 t) {
+  return 0xffffffffffffffff - t;
+}
+....
+
+Log records have a similar starting structure to ref and index records,
+utilizing the same prefix compression scheme applied to the log record
+key described above.
+
+....
+    varint( prefix_length )
+    varint( (suffix_length << 3) | log_type )
+    suffix
+    log_data {
+      old_id
+      new_id
+      varint( name_length    )  name
+      varint( email_length   )  email
+      varint( time_seconds )
+      sint16( tz_offset )
+      varint( message_length )  message
+    }?
+....
+
+Log record entries use `log_type` to indicate what follows:
+
+* `0x0`: deletion; no log data.
+* `0x1`: standard git reflog data using `log_data` above.
+
+The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
+entry from the reflog of `refs/stash` in a transaction file (below),
+without needing to rewrite larger files. Readers reading a stack of
+reflogs must treat this as a deletion.
+
+For `log_type = 0x1`, the `log_data` section follows
+linkgit:git-update-ref[1] logging and includes:
+
+* two 20-byte SHA-1s (old id, new id)
+* varint string of committer’s name
+* varint string of committer’s email
+* varint time in seconds since epoch (Jan 1, 1970)
+* 2-byte timezone offset in minutes (signed)
+* varint string of message
+
+`tz_offset` is the absolute number of minutes from GMT the committer was
+at the time of the update. For example `GMT-0800` is encoded in reftable
+as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
+
+The committer email does not contain `<` or `>`, it’s the value normally
+found between the `<>` in a git commit object header.
+
+The `message_length` may be 0, in which case there was no message
+supplied for the update.
+
+Contrary to traditional reflog (which is a file), renames are encoded as
+a combination of ref deletion and ref creation.
+
+Reading the log
++++++++++++++++
+
+Readers accessing the log must first read the footer (below) to
+determine the `log_position`. The first block of the log begins at
+`log_position` bytes since the start of the file. The `log_position` is
+not block aligned.
+
+Importing logs
+++++++++++++++
+
+When importing from `$GIT_DIR/logs` writers should globally order all
+log records roughly by timestamp while preserving file order, and assign
+unique, increasing `update_index` values for each log line. Newer log
+records get higher `update_index` values.
+
+Although an import may write only a single reftable file, the reftable
+file must span many unique `update_index`, as each log line requires its
+own `update_index` to preserve semantics.
+
+Log index
+^^^^^^^^^
+
+The log index stores the log key
+(`refname \0 reverse_int64(update_index)`) for the last log record of
+every log block in the file, supporting bounded-time lookup.
+
+A log index block must be written if 2 or more log blocks are written to
+the file. If present, the log index appears after the last log block.
+There is no padding used to align the log index to block alignment.
+
+Log index format is identical to ref index, except the keys are 9 bytes
+longer to include `'\0'` and the 8-byte `reverse_int64(update_index)`.
+Records use `block_position` to refer to the start of a log block.
+
+Reading the index
++++++++++++++++++
+
+Readers loading the log index must first read the footer (below) to
+obtain `log_index_position`. If not present, the position will be 0.
+
+Footer
+^^^^^^
+
+After the last block of the file, a file footer is written. It begins
+like the file header, but is extended with additional data.
+
+A 68-byte footer appears at the end:
+
+....
+    'REFT'
+    uint8( version_number = 1 )
+    uint24( block_size )
+    uint64( min_update_index )
+    uint64( max_update_index )
+
+    uint64( ref_index_position )
+    uint64( (obj_position << 5) | obj_id_len )
+    uint64( obj_index_position )
+
+    uint64( log_position )
+    uint64( log_index_position )
+
+    uint32( CRC-32 of above )
+....
+
+If a section is missing (e.g. ref index) the corresponding position
+field (e.g. `ref_index_position`) will be 0.
+
+* `obj_position`: byte position for the first obj block.
+* `obj_id_len`: number of bytes used to abbreviate object identifiers in
+obj blocks.
+* `log_position`: byte position for the first log block.
+* `ref_index_position`: byte position for the start of the ref index.
+* `obj_index_position`: byte position for the start of the obj index.
+* `log_index_position`: byte position for the start of the log index.
+
+Reading the footer
+++++++++++++++++++
+
+Readers must seek to `file_length - 68` to access the footer. A trusted
+external source (such as `stat(2)`) is necessary to obtain
+`file_length`. When reading the footer, readers must verify:
+
+* 4-byte magic is correct
+* 1-byte version number is recognized
+* 4-byte CRC-32 matches the other 64 bytes (including magic, and
+version)
+
+Once verified, the other fields of the footer can be accessed.
+
+Varint encoding
+^^^^^^^^^^^^^^^
+
+Varint encoding is identical to the ofs-delta encoding method used
+within pack files.
+
+Decoder works such as:
+
+....
+val = buf[ptr] & 0x7f
+while (buf[ptr] & 0x80) {
+  ptr++
+  val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
+}
+....
+
+Binary search
+^^^^^^^^^^^^^
+
+Binary search within a block is supported by the `restart_offset` fields
+at the end of the block. Readers can binary search through the restart
+table to locate between which two restart points the sought reference or
+key should appear.
+
+Each record identified by a `restart_offset` stores the complete key in
+the `suffix` field of the record, making the compare operation during
+binary search straightforward.
+
+Once a restart point lexicographically before the sought reference has
+been identified, readers can linearly scan through the following record
+entries to locate the sought record, terminating if the current record
+sorts after (and therefore the sought key is not present).
+
+Restart point selection
++++++++++++++++++++++++
+
+Writers determine the restart points at file creation. The process is
+arbitrary, but every 16 or 64 records is recommended. Every 16 may be
+more suitable for smaller block sizes (4k or 8k), every 64 for larger
+block sizes (64k).
+
+More frequent restart points reduces prefix compression and increases
+space consumed by the restart table, both of which increase file size.
+
+Less frequent restart points makes prefix compression more effective,
+decreasing overall file size, with increased penalities for readers
+walking through more records after the binary search step.
+
+A maximum of `65535` restart points per block is supported.
+
+Considerations
+~~~~~~~~~~~~~~
+
+Lightweight refs dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The reftable format assumes the vast majority of references are single
+SHA-1 valued with common prefixes, such as Gerrit Code Review’s
+`refs/changes/` namespace, GitHub’s `refs/pulls/` namespace, or many
+lightweight tags in the `refs/tags/` namespace.
+
+Annotated tags storing the peeled object cost an additional 20 bytes per
+reference.
+
+Low overhead
+^^^^^^^^^^^^
+
+A reftable with very few references (e.g. git.git with 5 heads) is 269
+bytes for reftable, vs. 332 bytes for packed-refs. This supports
+reftable scaling down for transaction logs (below).
+
+Block size
+^^^^^^^^^^
+
+For a Gerrit Code Review type repository with many change refs, larger
+block sizes (64 KiB) and less frequent restart points (every 64) yield
+better compression due to more references within the block compressing
+against the prior reference.
+
+Larger block sizes reduce the index size, as the reftable will require
+fewer blocks to store the same number of references.
+
+Minimal disk seeks
+^^^^^^^^^^^^^^^^^^
+
+Assuming the index block has been loaded into memory, binary searching
+for any single reference requires exactly 1 disk seek to load the
+containing block.
+
+Scans and lookups dominate
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Scanning all references and lookup by name (or namespace such as
+`refs/heads/`) are the most common activities performed on repositories.
+SHA-1s are stored directly with references to optimize this use case.
+
+Logs are infrequently read
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are infrequently accessed, but can be large. Deflating log blocks
+saves disk space, with some increased penalty at read time.
+
+Logs are stored in an isolated section from refs, reducing the burden on
+reference readers that want to ignore logs. Further, historical logs can
+be isolated into log-only files.
+
+Logs are read backwards
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Logs are frequently accessed backwards (most recent N records for master
+to answer `master@{4}`), so log records are grouped by reference, and
+sorted descending by update index.
+
+Repository format
+~~~~~~~~~~~~~~~~~
+
+Version 1
+^^^^^^^^^
+
+A repository must set its `$GIT_DIR/config` to configure reftable:
+
+....
+[core]
+    repositoryformatversion = 1
+[extensions]
+    refStorage = reftable
+....
+
+Layout
+^^^^^^
+
+A collection of reftable files are stored in the `$GIT_DIR/reftable/`
+directory:
+
+....
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+where reftable files are named by a unique name such as produced by the
+function `${min_update_index}-${max_update_index}.ref`.
+
+Log-only files use the `.log` extension, while ref-only and mixed ref
+and log files use `.ref`. extension.
+
+The stack ordering file is `$GIT_DIR/reftable/tables.list` and lists the
+current files, one per line, in order, from oldest (base) to newest
+(most recent):
+
+....
+$ cat .git/reftable/tables.list
+00000001-00000001.log
+00000002-00000002.ref
+00000003-00000003.ref
+....
+
+Readers must read `$GIT_DIR/reftable/tables.list` to determine which
+files are relevant right now, and search through the stack in reverse
+order (last reftable is examined first).
+
+Reftable files not listed in `tables.list` may be new (and about to be
+added to the stack by the active writer), or ancient and ready to be
+pruned.
+
+Backward compatibility
+^^^^^^^^^^^^^^^^^^^^^^
+
+Older clients should continue to recognize the directory as a git
+repository so they don’t look for an enclosing repository in parent
+directories. To this end, a reftable-enabled repository must contain the
+following dummy files
+
+* `.git/HEAD`, a regular file containing `ref: refs/heads/.invalid`.
+* `.git/refs/`, a directory
+* `.git/refs/heads`, a regular file
+
+Readers
+^^^^^^^
+
+Readers can obtain a consistent snapshot of the reference space by
+following:
+
+1.  Open and read the `tables.list` file.
+2.  Open each of the reftable files that it mentions.
+3.  If any of the files is missing, goto 1.
+4.  Read from the now-open files as long as necessary.
+
+Update transactions
+^^^^^^^^^^^^^^^^^^^
+
+Although reftables are immutable, mutations are supported by writing a
+new reftable and atomically appending it to the stack:
+
+1.  Acquire `tables.list.lock`.
+2.  Read `tables.list` to determine current reftables.
+3.  Select `update_index` to be most recent file’s
+`max_update_index + 1`.
+4.  Prepare temp reftable `tmp_XXXXXX`, including log entries.
+5.  Rename `tmp_XXXXXX` to `${update_index}-${update_index}.ref`.
+6.  Copy `tables.list` to `tables.list.lock`, appending file from (5).
+7.  Rename `tables.list.lock` to `tables.list`.
+
+During step 4 the new file’s `min_update_index` and `max_update_index`
+are both set to the `update_index` selected by step 3. All log records
+for the transaction use the same `update_index` in their keys. This
+enables later correlation of which references were updated by the same
+transaction.
+
+Because a single `tables.list.lock` file is used to manage locking, the
+repository is single-threaded for writers. Writers may have to busy-spin
+(with backoff) around creating `tables.list.lock`, for up to an
+acceptable wait period, aborting if the repository is too busy to
+mutate. Application servers wrapped around repositories (e.g. Gerrit
+Code Review) can layer their own lock/wait queue to improve fairness to
+writers.
+
+Reference deletions
+^^^^^^^^^^^^^^^^^^^
+
+Deletion of any reference can be explicitly stored by setting the `type`
+to `0x0` and omitting the `value` field of the `ref_record`. This serves
+as a tombstone, overriding any assertions about the existence of the
+reference from earlier files in the stack.
+
+Compaction
+^^^^^^^^^^
+
+A partial stack of reftables can be compacted by merging references
+using a straightforward merge join across reftables, selecting the most
+recent value for output, and omitting deleted references that do not
+appear in remaining, lower reftables.
+
+A compacted reftable should set its `min_update_index` to the smallest
+of the input files’ `min_update_index`, and its `max_update_index`
+likewise to the largest input `max_update_index`.
+
+For sake of illustration, assume the stack currently consists of
+reftable files (from oldest to newest): A, B, C, and D. The compactor is
+going to compact B and C, leaving A and D alone.
+
+1.  Obtain lock `tables.list.lock` and read the `tables.list` file.
+2.  Obtain locks `B.lock` and `C.lock`. Ownership of these locks
+prevents other processes from trying to compact these files.
+3.  Release `tables.list.lock`.
+4.  Compact `B` and `C` into a temp file
+`${min_update_index}-${max_update_index}_XXXXXX`.
+5.  Reacquire lock `tables.list.lock`.
+6.  Verify that `B` and `C` are still in the stack, in that order. This
+should always be the case, assuming that other processes are adhering to
+the locking protocol.
+7.  Rename `${min_update_index}-${max_update_index}_XXXXXX` to
+`${min_update_index}-${max_update_index}.ref`.
+8.  Write the new stack to `tables.list.lock`, replacing `B` and `C`
+with the file from (4).
+9.  Rename `tables.list.lock` to `tables.list`.
+10. Delete `B` and `C`, perhaps after a short sleep to avoid forcing
+readers to backtrack.
+
+This strategy permits compactions to proceed independently of updates.
+
+Each reftable (compacted or not) is uniquely identified by its name, so
+open reftables can be cached by their name.
+
+Alternatives considered
+~~~~~~~~~~~~~~~~~~~~~~~
+
+bzip packed-refs
+^^^^^^^^^^^^^^^^
+
+`bzip2` can significantly shrink a large packed-refs file (e.g. 62 MiB
+compresses to 23 MiB, 37%). However the bzip format does not support
+random access to a single reference. Readers must inflate and discard
+while performing a linear scan.
+
+Breaking packed-refs into chunks (individually compressing each chunk)
+would reduce the amount of data a reader must inflate, but still leaves
+the problem of indexing chunks to support readers efficiently locating
+the correct chunk.
+
+Given the compression achieved by reftable’s encoding, it does not seem
+necessary to add the complexity of bzip/gzip/zlib.
+
+Michael Haggerty’s alternate format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Michael Haggerty proposed
+https://public-inbox.org/git/CAMy9T_HCnyc1g8XWOOWhe7nN0aEFyyBskV2aOMb_fe+wGvEJ7A@mail.gmail.com/[an
+alternate] format to reftable on the Git mailing list. This format uses
+smaller chunks, without the restart table, and avoids block alignment
+with padding. Reflog entries immediately follow each ref, and are thus
+interleaved between refs.
+
+Performance testing indicates reftable is faster for lookups (51%
+faster, 11.2 usec vs. 5.4 usec), although reftable produces a slightly
+larger file (+ ~3.2%, 28.3M vs 29.2M):
+
+[cols=">,>,>,>",options="header",]
+|=====================================
+|format |size |seek cold |seek hot
+|mh-alt |28.3 M |23.4 usec |11.2 usec
+|reftable |29.2 M |19.9 usec |5.4 usec
+|=====================================
+
+JGit Ketch RefTree
+^^^^^^^^^^^^^^^^^^
+
+https://dev.eclipse.org/mhonarc/lists/jgit-dev/msg03073.html[JGit Ketch]
+proposed
+https://public-inbox.org/git/CAJo=hJvnAPNAdDcAAwAvU9C4RVeQdoS3Ev9WTguHx4fD0V_nOg@mail.gmail.com/[RefTree],
+an encoding of references inside Git tree objects stored as part of the
+repository’s object database.
+
+The RefTree format adds additional load on the object database storage
+layer (more loose objects, more objects in packs), and relies heavily on
+the packer’s delta compression to save space. Namespaces which are flat
+(e.g. thousands of tags in refs/tags) initially create very large loose
+objects, and so RefTree does not address the problem of copying many
+references to modify a handful.
+
+Flat namespaces are not efficiently searchable in RefTree, as tree
+objects in canonical formatting cannot be binary searched. This fails
+the need to handle a large number of references in a single namespace,
+such as GitHub’s `refs/pulls`, or a project with many tags.
+
+LMDB
+^^^^
+
+David Turner proposed
+https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/[using
+LMDB], as LMDB is lightweight (64k of runtime code) and GPL-compatible
+license.
+
+A downside of LMDB is its reliance on a single C implementation. This
+makes embedding inside JGit (a popular reimplemenation of Git)
+difficult, and hoisting onto virtual storage (for JGit DFS) virtually
+impossible.
+
+A common format that can be supported by all major Git implementations
+(git-core, JGit, libgit2) is strongly preferred.
+
+Future
+~~~~~~
+
+Longer hashes
+^^^^^^^^^^^^^
+
+Version will bump (e.g. 2) to indicate `value` uses a different object
+id length other than 20. The length could be stored in an expanded file
+header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v13 05/13] reftable: clarify how empty tables should be written
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (3 preceding siblings ...)
  2020-05-11 19:46                         ` [PATCH v13 04/13] reftable: file format documentation Jonathan Nieder via GitGitGadget
@ 2020-05-11 19:46                         ` Han-Wen Nienhuys via GitGitGadget
  2020-05-19 22:01                           ` Junio C Hamano
  2020-05-11 19:46                         ` [PATCH v13 06/13] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
                                           ` (9 subsequent siblings)
  14 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The format allows for some ambiguity, as a lone footer also starts
with a valid file header. However, the current JGit code will barf on
this. This commit codifies this behavior into the standard.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index 8bad9ade256..6223538d64e 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -715,6 +715,13 @@ version)
 
 Once verified, the other fields of the footer can be accessed.
 
+Empty tables
+++++++++++++
+
+A reftable may be empty. In this case, the file starts with a header
+and is immediately followed by a footer.
+
+
 Varint encoding
 ^^^^^^^^^^^^^^^
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v13 06/13] reftable: define version 2 of the spec to accomodate SHA256
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (4 preceding siblings ...)
  2020-05-11 19:46                         ` [PATCH v13 05/13] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
@ 2020-05-11 19:46                         ` Han-Wen Nienhuys via GitGitGadget
  2020-05-19 22:32                           ` Junio C Hamano
  2020-05-11 19:46                         ` [PATCH v13 07/13] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
                                           ` (8 subsequent siblings)
  14 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Version appends a hash ID to the file header, making it slightly larger.

This commit also changes "SHA-1" into "object ID" in many places.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/technical/reftable.txt | 79 ++++++++++++++++------------
 1 file changed, 44 insertions(+), 35 deletions(-)

diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
index 6223538d64e..1464c4e7437 100644
--- a/Documentation/technical/reftable.txt
+++ b/Documentation/technical/reftable.txt
@@ -29,7 +29,7 @@ Objectives
 
 * Near constant time lookup for any single reference, even when the
 repository is cold and not in process or kernel cache.
-* Near constant time verification if a SHA-1 is referred to by at least
+* Near constant time verification if an object ID is referred to by at least
 one reference (for allow-tip-sha1-in-want).
 * Efficient lookup of an entire namespace, such as `refs/tags/`.
 * Support atomic push with `O(size_of_update)` operations.
@@ -193,8 +193,8 @@ and non-aligned files.
 Very small files (e.g. a single ref block) may omit `padding` and the ref
 index to reduce total file size.
 
-Header
-^^^^^^
+Header (version 1)
+^^^^^^^^^^^^^^^^^^
 
 A 24-byte header appears at the beginning of the file:
 
@@ -215,6 +215,27 @@ used in a stack for link:#Update-transactions[transactions], these
 fields can order the files such that the prior file’s
 `max_update_index + 1` is the next file’s `min_update_index`.
 
+Header (version 2)
+^^^^^^^^^^^^^^^^^^
+
+A 28-byte header appears at the beginning of the file:
+
+....
+'REFT'
+uint8( version_number = 2 )
+uint24( block_size )
+uint64( min_update_index )
+uint64( max_update_index )
+uint32( hash_id )
+....
+
+The header is identical to `version_number=1`, with the 4-byte hash ID
+("sha1" for SHA1 and "s256" for SHA-256) append to the header.
+
+For maximum backward compatibility, it is recommended to use version 1 when
+writing SHA1 reftables.
+
+
 First ref block
 ^^^^^^^^^^^^^^^
 
@@ -302,8 +323,8 @@ The `value` follows. Its format is determined by `value_type`, one of
 the following:
 
 * `0x0`: deletion; no value data (see transactions, below)
-* `0x1`: one 20-byte object id; value of the ref
-* `0x2`: two 20-byte object ids; value of the ref, peeled target
+* `0x1`: one object id; value of the ref
+* `0x2`: two object ids; value of the ref, peeled target
 * `0x3`: symbolic reference: `varint( target_len ) target`
 
 Symbolic references use `0x3`, followed by the complete name of the
@@ -404,9 +425,9 @@ Obj block format
 ^^^^^^^^^^^^^^^^
 
 Object blocks are optional. Writers may choose to omit object blocks,
-especially if readers will not use the SHA-1 to ref mapping.
+especially if readers will not use the object ID to ref mapping.
 
-Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
+Object blocks use unique, abbreviated 2-32 byte object ID keys, mapping to
 ref blocks containing references pointing to that object directly, or as
 the peeled value of an annotated tag. Like ref blocks, object blocks use
 the file’s standard block size. The abbrevation length is available in
@@ -415,7 +436,7 @@ the footer as `obj_id_len`.
 To save space in small files, object blocks may be omitted if the ref
 index is not present, as brute force search will only need to read a few
 ref blocks. When missing, readers should brute force a linear search of
-all references to lookup by SHA-1.
+all references to lookup by object ID.
 
 An object block is written as:
 
@@ -434,7 +455,7 @@ works the same as in reference blocks.
 
 Because object identifiers are abbreviated by writers to the shortest
 unique abbreviation within the reftable, obj key lengths are variable
-between 2 and 20 bytes. Readers must compare only for common prefix
+between 2 and 32 bytes. Readers must compare only for common prefix
 match within an obj block or obj index.
 
 obj record
@@ -487,9 +508,9 @@ for (j = 1; j < position_count; j++) {
 ....
 
 With a position in hand, a reader must linearly scan the ref block,
-starting from the first `ref_record`, testing each reference’s SHA-1s
+starting from the first `ref_record`, testing each reference’s object IDs
 (for `value_type = 0x1` or `0x2`) for full equality. Faster searching by
-SHA-1 within a single ref block is not supported by the reftable format.
+object ID within a single ref block is not supported by the reftable format.
 Smaller block sizes reduce the number of candidates this step must
 consider.
 
@@ -604,7 +625,7 @@ reflogs must treat this as a deletion.
 For `log_type = 0x1`, the `log_data` section follows
 linkgit:git-update-ref[1] logging and includes:
 
-* two 20-byte SHA-1s (old id, new id)
+* two object IDs (old id, new id)
 * varint string of committer’s name
 * varint string of committer’s email
 * varint time in seconds since epoch (Jan 1, 1970)
@@ -671,14 +692,8 @@ Footer
 After the last block of the file, a file footer is written. It begins
 like the file header, but is extended with additional data.
 
-A 68-byte footer appears at the end:
-
 ....
-    'REFT'
-    uint8( version_number = 1 )
-    uint24( block_size )
-    uint64( min_update_index )
-    uint64( max_update_index )
+    HEADER
 
     uint64( ref_index_position )
     uint64( (obj_position << 5) | obj_id_len )
@@ -701,12 +716,16 @@ obj blocks.
 * `obj_index_position`: byte position for the start of the obj index.
 * `log_index_position`: byte position for the start of the log index.
 
+The size of the footer is 68 bytes for version 1, and 72 bytes for
+version 2.
+
 Reading the footer
 ++++++++++++++++++
 
-Readers must seek to `file_length - 68` to access the footer. A trusted
-external source (such as `stat(2)`) is necessary to obtain
-`file_length`. When reading the footer, readers must verify:
+Readers must first read the file start to determine the version
+number. Then they seek to `file_length - FOOTER_LENGTH` to access the
+footer. A trusted external source (such as `stat(2)`) is necessary to
+obtain `file_length`. When reading the footer, readers must verify:
 
 * 4-byte magic is correct
 * 1-byte version number is recognized
@@ -779,11 +798,11 @@ Lightweight refs dominate
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The reftable format assumes the vast majority of references are single
-SHA-1 valued with common prefixes, such as Gerrit Code Review’s
+object IDs valued with common prefixes, such as Gerrit Code Review’s
 `refs/changes/` namespace, GitHub’s `refs/pulls/` namespace, or many
 lightweight tags in the `refs/tags/` namespace.
 
-Annotated tags storing the peeled object cost an additional 20 bytes per
+Annotated tags storing the peeled object cost an additional object ID per
 reference.
 
 Low overhead
@@ -816,7 +835,7 @@ Scans and lookups dominate
 
 Scanning all references and lookup by name (or namespace such as
 `refs/heads/`) are the most common activities performed on repositories.
-SHA-1s are stored directly with references to optimize this use case.
+Object IDs are stored directly with references to optimize this use case.
 
 Logs are infrequently read
 ^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -1062,13 +1081,3 @@ impossible.
 
 A common format that can be supported by all major Git implementations
 (git-core, JGit, libgit2) is strongly preferred.
-
-Future
-~~~~~~
-
-Longer hashes
-^^^^^^^^^^^^^
-
-Version will bump (e.g. 2) to indicate `value` uses a different object
-id length other than 20. The length could be stored in an expanded file
-header, or hardcoded as part of the version.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v13 07/13] Write pseudorefs through ref backends.
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (5 preceding siblings ...)
  2020-05-11 19:46                         ` [PATCH v13 06/13] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
@ 2020-05-11 19:46                         ` Han-Wen Nienhuys via GitGitGadget
  2020-05-12 10:22                           ` Phillip Wood
  2020-05-11 19:46                         ` [PATCH v13 08/13] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
                                           ` (7 subsequent siblings)
  14 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Pseudorefs store transient data in in the repository. Examples are HEAD,
CHERRY_PICK_HEAD, etc.

These refs have always been read through the ref backends, but they were written
in a one-off routine that wrote a object ID or symref directly wrote into
.git/<pseudo_ref_name>.

This causes problems when introducing a new ref storage backend. To remedy this,
extend the ref backend implementation with a write_pseudoref_fn and
update_pseudoref_fn.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c                | 115 +++++------------------------
 refs.h                |  11 +++
 refs/files-backend.c  | 164 +++++++++++++++++++++++++++++++++++-------
 refs/packed-backend.c |  40 ++++-------
 refs/refs-internal.h  |  12 ++++
 5 files changed, 192 insertions(+), 150 deletions(-)

diff --git a/refs.c b/refs.c
index 224ff66c7bb..fc72211a36f 100644
--- a/refs.c
+++ b/refs.c
@@ -739,101 +739,6 @@ long get_files_ref_lock_timeout_ms(void)
 	return timeout_ms;
 }
 
-static int write_pseudoref(const char *pseudoref, const struct object_id *oid,
-			   const struct object_id *old_oid, struct strbuf *err)
-{
-	const char *filename;
-	int fd;
-	struct lock_file lock = LOCK_INIT;
-	struct strbuf buf = STRBUF_INIT;
-	int ret = -1;
-
-	if (!oid)
-		return 0;
-
-	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
-
-	filename = git_path("%s", pseudoref);
-	fd = hold_lock_file_for_update_timeout(&lock, filename, 0,
-					       get_files_ref_lock_timeout_ms());
-	if (fd < 0) {
-		strbuf_addf(err, _("could not open '%s' for writing: %s"),
-			    filename, strerror(errno));
-		goto done;
-	}
-
-	if (old_oid) {
-		struct object_id actual_old_oid;
-
-		if (read_ref(pseudoref, &actual_old_oid)) {
-			if (!is_null_oid(old_oid)) {
-				strbuf_addf(err, _("could not read ref '%s'"),
-					    pseudoref);
-				rollback_lock_file(&lock);
-				goto done;
-			}
-		} else if (is_null_oid(old_oid)) {
-			strbuf_addf(err, _("ref '%s' already exists"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		} else if (!oideq(&actual_old_oid, old_oid)) {
-			strbuf_addf(err, _("unexpected object ID when writing '%s'"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		}
-	}
-
-	if (write_in_full(fd, buf.buf, buf.len) < 0) {
-		strbuf_addf(err, _("could not write to '%s'"), filename);
-		rollback_lock_file(&lock);
-		goto done;
-	}
-
-	commit_lock_file(&lock);
-	ret = 0;
-done:
-	strbuf_release(&buf);
-	return ret;
-}
-
-static int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
-{
-	const char *filename;
-
-	filename = git_path("%s", pseudoref);
-
-	if (old_oid && !is_null_oid(old_oid)) {
-		struct lock_file lock = LOCK_INIT;
-		int fd;
-		struct object_id actual_old_oid;
-
-		fd = hold_lock_file_for_update_timeout(
-				&lock, filename, 0,
-				get_files_ref_lock_timeout_ms());
-		if (fd < 0) {
-			error_errno(_("could not open '%s' for writing"),
-				    filename);
-			return -1;
-		}
-		if (read_ref(pseudoref, &actual_old_oid))
-			die(_("could not read ref '%s'"), pseudoref);
-		if (!oideq(&actual_old_oid, old_oid)) {
-			error(_("unexpected object ID when deleting '%s'"),
-			      pseudoref);
-			rollback_lock_file(&lock);
-			return -1;
-		}
-
-		unlink(filename);
-		rollback_lock_file(&lock);
-	} else {
-		unlink(filename);
-	}
-
-	return 0;
-}
 
 int refs_delete_ref(struct ref_store *refs, const char *msg,
 		    const char *refname,
@@ -844,8 +749,7 @@ int refs_delete_ref(struct ref_store *refs, const char *msg,
 	struct strbuf err = STRBUF_INIT;
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
-		assert(refs == get_main_ref_store(the_repository));
-		return delete_pseudoref(refname, old_oid);
+		return ref_store_delete_pseudoref(refs, refname, old_oid);
 	}
 
 	transaction = ref_store_transaction_begin(refs, &err);
@@ -1172,7 +1076,8 @@ int refs_update_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		ret = write_pseudoref(refname, new_oid, old_oid, &err);
+		ret = ref_store_write_pseudoref(refs, refname, new_oid, old_oid,
+						&err);
 	} else {
 		t = ref_store_transaction_begin(refs, &err);
 		if (!t ||
@@ -1441,6 +1346,20 @@ int head_ref(each_ref_fn fn, void *cb_data)
 	return refs_head_ref(get_main_ref_store(the_repository), fn, cb_data);
 }
 
+int ref_store_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			      const struct object_id *oid,
+			      const struct object_id *old_oid,
+			      struct strbuf *err)
+{
+	return refs->be->write_pseudoref(refs, pseudoref, oid, old_oid, err);
+}
+
+int ref_store_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			       const struct object_id *old_oid)
+{
+	return refs->be->delete_pseudoref(refs, pseudoref, old_oid);
+}
+
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
 		const char *prefix, int trim, int flags)
diff --git a/refs.h b/refs.h
index 99ba9e331e5..d1d9361441b 100644
--- a/refs.h
+++ b/refs.h
@@ -728,6 +728,17 @@ int update_ref(const char *msg, const char *refname,
 	       const struct object_id *new_oid, const struct object_id *old_oid,
 	       unsigned int flags, enum action_on_err onerr);
 
+/* Pseudorefs (eg. HEAD, CHERRY_PICK_HEAD) have a separate routines for updating
+   and deletion as they cannot take part in normal transactional updates.
+   Pseudorefs should only be written for the main repository.
+*/
+int ref_store_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			      const struct object_id *oid,
+			      const struct object_id *old_oid,
+			      struct strbuf *err);
+int ref_store_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			       const struct object_id *old_oid);
+
 int parse_hide_refs_config(const char *var, const char *value, const char *);
 
 /*
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6516c7bc8c8..53b891190be 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -731,6 +731,115 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	return ret;
 }
 
+static int files_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "write_pseudoref");
+	int fd;
+	struct lock_file lock = LOCK_INIT;
+	struct strbuf filename = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = -1;
+
+	if (!oid)
+		return 0;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+	fd = hold_lock_file_for_update_timeout(&lock, filename.buf, 0,
+					       get_files_ref_lock_timeout_ms());
+	if (fd < 0) {
+		strbuf_addf(err, _("could not open '%s' for writing: %s"),
+			    buf.buf, strerror(errno));
+		goto done;
+	}
+
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(err, _("could not read ref '%s'"),
+					    pseudoref);
+				rollback_lock_file(&lock);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(err, _("ref '%s' already exists"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(err,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+	}
+
+	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
+	if (write_in_full(fd, buf.buf, buf.len) < 0) {
+		strbuf_addf(err, _("could not write to '%s'"), filename.buf);
+		rollback_lock_file(&lock);
+		goto done;
+	}
+
+	commit_lock_file(&lock);
+	ret = 0;
+done:
+	strbuf_release(&buf);
+	strbuf_release(&filename);
+	return ret;
+}
+
+static int files_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "delete_pseudoref");
+	struct strbuf filename = STRBUF_INIT;
+	int ret = -1;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+
+	if (old_oid && !is_null_oid(old_oid)) {
+		struct lock_file lock = LOCK_INIT;
+		int fd;
+		struct object_id actual_old_oid;
+
+		fd = hold_lock_file_for_update_timeout(
+			&lock, filename.buf, 0,
+			get_files_ref_lock_timeout_ms());
+		if (fd < 0) {
+			error_errno(_("could not open '%s' for writing"),
+				    filename.buf);
+			goto done;
+		}
+		if (read_ref(pseudoref, &actual_old_oid))
+			die(_("could not read ref '%s'"), pseudoref);
+		if (!oideq(&actual_old_oid, old_oid)) {
+			error(_("unexpected object ID when deleting '%s'"),
+			      pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+
+		unlink(filename.buf);
+		rollback_lock_file(&lock);
+	} else {
+		unlink(filename.buf);
+	}
+	ret = 0;
+done:
+	strbuf_release(&filename);
+	return ret;
+}
+
 struct files_ref_iterator {
 	struct ref_iterator base;
 
@@ -3173,30 +3282,31 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
 	return 0;
 }
 
-struct ref_storage_be refs_be_files = {
-	NULL,
-	"files",
-	files_ref_store_create,
-	files_init_db,
-	files_transaction_prepare,
-	files_transaction_finish,
-	files_transaction_abort,
-	files_initial_transaction_commit,
-
-	files_pack_refs,
-	files_create_symref,
-	files_delete_refs,
-	files_rename_ref,
-	files_copy_ref,
-
-	files_ref_iterator_begin,
-	files_read_raw_ref,
-
-	files_reflog_iterator_begin,
-	files_for_each_reflog_ent,
-	files_for_each_reflog_ent_reverse,
-	files_reflog_exists,
-	files_create_reflog,
-	files_delete_reflog,
-	files_reflog_expire
-};
+struct ref_storage_be refs_be_files = { NULL,
+					"files",
+					files_ref_store_create,
+					files_init_db,
+					files_transaction_prepare,
+					files_transaction_finish,
+					files_transaction_abort,
+					files_initial_transaction_commit,
+
+					files_pack_refs,
+					files_create_symref,
+					files_delete_refs,
+					files_rename_ref,
+					files_copy_ref,
+
+					files_write_pseudoref,
+					files_delete_pseudoref,
+
+					files_ref_iterator_begin,
+					files_read_raw_ref,
+
+					files_reflog_iterator_begin,
+					files_for_each_reflog_ent,
+					files_for_each_reflog_ent_reverse,
+					files_reflog_exists,
+					files_create_reflog,
+					files_delete_reflog,
+					files_reflog_expire };
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 4458a0f69cc..8f9b85f0b0c 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1641,29 +1641,19 @@ static int packed_reflog_expire(struct ref_store *ref_store,
 }
 
 struct ref_storage_be refs_be_packed = {
-	NULL,
-	"packed",
-	packed_ref_store_create,
-	packed_init_db,
-	packed_transaction_prepare,
-	packed_transaction_finish,
-	packed_transaction_abort,
-	packed_initial_transaction_commit,
-
-	packed_pack_refs,
-	packed_create_symref,
-	packed_delete_refs,
-	packed_rename_ref,
-	packed_copy_ref,
-
-	packed_ref_iterator_begin,
-	packed_read_raw_ref,
-
-	packed_reflog_iterator_begin,
-	packed_for_each_reflog_ent,
-	packed_for_each_reflog_ent_reverse,
-	packed_reflog_exists,
-	packed_create_reflog,
-	packed_delete_reflog,
-	packed_reflog_expire
+	NULL, "packed", packed_ref_store_create, packed_init_db,
+	packed_transaction_prepare, packed_transaction_finish,
+	packed_transaction_abort, packed_initial_transaction_commit,
+
+	packed_pack_refs, packed_create_symref, packed_delete_refs,
+	packed_rename_ref, packed_copy_ref,
+
+	/* XXX */
+	NULL, NULL,
+
+	packed_ref_iterator_begin, packed_read_raw_ref,
+
+	packed_reflog_iterator_begin, packed_for_each_reflog_ent,
+	packed_for_each_reflog_ent_reverse, packed_reflog_exists,
+	packed_create_reflog, packed_delete_reflog, packed_reflog_expire
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 3490aac3a40..dabe18baea1 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -549,6 +549,15 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
 			  const char *oldref, const char *newref,
 			  const char *logmsg);
 
+typedef int write_pseudoref_fn(struct ref_store *ref_store,
+			       const char *pseudoref,
+			       const struct object_id *oid,
+			       const struct object_id *old_oid,
+			       struct strbuf *err);
+typedef int delete_pseudoref_fn(struct ref_store *ref_store,
+				const char *pseudoref,
+				const struct object_id *old_oid);
+
 /*
  * Iterate over the references in `ref_store` whose names start with
  * `prefix`. `prefix` is matched as a literal string, without regard
@@ -648,6 +657,9 @@ struct ref_storage_be {
 	rename_ref_fn *rename_ref;
 	copy_ref_fn *copy_ref;
 
+	write_pseudoref_fn *write_pseudoref;
+	delete_pseudoref_fn *delete_pseudoref;
+
 	ref_iterator_begin_fn *iterator_begin;
 	read_raw_ref_fn *read_raw_ref;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v13 08/13] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (6 preceding siblings ...)
  2020-05-11 19:46                         ` [PATCH v13 07/13] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
@ 2020-05-11 19:46                         ` Han-Wen Nienhuys via GitGitGadget
  2020-05-11 19:46                         ` [PATCH v13 09/13] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                           ` (6 subsequent siblings)
  14 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index fc72211a36f..79c63d60dcb 100644
--- a/refs.c
+++ b/refs.c
@@ -1444,7 +1444,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1504,8 +1504,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v13 09/13] Add .gitattributes for the reftable/ directory
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (7 preceding siblings ...)
  2020-05-11 19:46                         ` [PATCH v13 08/13] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-05-11 19:46                         ` Han-Wen Nienhuys via GitGitGadget
  2020-05-11 19:46                         ` [PATCH v13 10/13] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                           ` (5 subsequent siblings)
  14 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 00000000000..f44451a3795
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v13 10/13] Add reftable library
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (8 preceding siblings ...)
  2020-05-11 19:46                         ` [PATCH v13 09/13] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-05-11 19:46                         ` Han-Wen Nienhuys via GitGitGadget
  2020-05-11 19:46                         ` [PATCH v13 11/13] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (4 subsequent siblings)
  14 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. Its rationale and
specification is in the preceding commit.

This imports the upstream library as one big commit. For understanding
the code, it is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* record.{c,h} - reading and writing records

* block.{c,h} - reading and writing blocks.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 +
 reftable/README.md     |   11 +
 reftable/VERSION       |   14 +
 reftable/basics.c      |  215 +++++++
 reftable/basics.h      |   53 ++
 reftable/block.c       |  434 ++++++++++++++
 reftable/block.h       |  129 +++++
 reftable/constants.h   |   21 +
 reftable/file.c        |   99 ++++
 reftable/iter.c        |  240 ++++++++
 reftable/iter.h        |   63 ++
 reftable/merged.c      |  325 +++++++++++
 reftable/merged.h      |   38 ++
 reftable/pq.c          |  114 ++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  753 ++++++++++++++++++++++++
 reftable/reader.h      |   65 +++
 reftable/record.c      | 1141 ++++++++++++++++++++++++++++++++++++
 reftable/record.h      |  121 ++++
 reftable/refname.c     |  215 +++++++
 reftable/refname.h     |   38 ++
 reftable/reftable.c    |   91 +++
 reftable/reftable.h    |  564 ++++++++++++++++++
 reftable/slice.c       |  225 ++++++++
 reftable/slice.h       |   76 +++
 reftable/stack.c       | 1245 ++++++++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   48 ++
 reftable/system.h      |   54 ++
 reftable/tree.c        |   67 +++
 reftable/tree.h        |   34 ++
 reftable/update.sh     |   24 +
 reftable/writer.c      |  665 +++++++++++++++++++++
 reftable/writer.h      |   60 ++
 reftable/zlib-compat.c |   92 +++
 34 files changed, 7399 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..21500563fd4
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,11 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   sh reftable/update.sh
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..8c003b88a7a
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,14 @@
+commit e74c14b66b6c15f6526c485f2e45d3f2735d359d
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Mon May 11 21:02:55 2020 +0200
+
+    C: handle out-of-date reftable stacks
+    
+    * Make reftable_stack_reload() check out-of-dateness. This makes it very cheap,
+      allowing it to be called often. This is useful because Git calls itself often,
+      which effectively requires a reload.
+    
+    * In reftable_stack_add(), check the return value of stack_uptodate(), leading
+      to erroneously succeeding the transaction.
+    
+    * A test that exercises the above.
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..14c4dfaf5a6
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,215 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)(i & 0xff);
+}
+
+uint32_t get_be24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)(i & 0xff);
+}
+
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
+{
+	size_t lo = 0;
+	size_t hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		size_t mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	static char buf[250];
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return "I/O error";
+	case REFTABLE_FORMAT_ERROR:
+		return "corrupt reftable file";
+	case REFTABLE_NOT_EXIST_ERROR:
+		return "file does not exist";
+	case REFTABLE_LOCK_ERROR:
+		return "data is outdated";
+	case REFTABLE_API_ERROR:
+		return "misuse of the reftable API";
+	case REFTABLE_ZLIB_ERROR:
+		return "zlib failure";
+	case REFTABLE_NAME_CONFLICT:
+		return "file/directory conflict";
+	case REFTABLE_REFNAME_ERROR:
+		return "invalid refname";
+	case -1:
+		return "general error";
+	default:
+		snprintf(buf, sizeof(buf), "unknown error code %d", err);
+		return buf;
+	}
+}
+
+int reftable_error_to_errno(int err)
+{
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return EIO;
+	case REFTABLE_FORMAT_ERROR:
+		return EFAULT;
+	case REFTABLE_NOT_EXIST_ERROR:
+		return ENOENT;
+	case REFTABLE_LOCK_ERROR:
+		return EBUSY;
+	case REFTABLE_API_ERROR:
+		return EINVAL;
+	case REFTABLE_ZLIB_ERROR:
+		return EDOM;
+	default:
+		return ERANGE;
+	}
+}
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..1b003485c9c
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+/* Bigendian en/decoding of integers */
+
+void put_be24(byte *out, uint32_t i);
+uint32_t get_be24(byte *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);
+
+/*
+  Frees a NULL terminated array of malloced strings. The array itself is also
+  freed.
+ */
+void free_names(char **a);
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp);
+
+/* compares two NULL-terminated arrays of strings. */
+int names_equal(char **a, char **b);
+
+/* returns the array size of a NULL-terminated array of strings. */
+int names_length(char **names);
+
+/* Allocation routines; they invoke the functions set through
+ * reftable_set_alloc() */
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..c782c628a71
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,434 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct record rec)
+{
+	struct slice empty = { 0 };
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = { 0 };
+	int n = 0;
+
+	record_key(rec, &key);
+	n = encode_key(&restart, out, last, key, record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	n = record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	slice_clear(&key);
+	return 0;
+
+err:
+	slice_clear(&key);
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = { 0 };
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		slice_resize(&compressed, src_len);
+
+		while (1) {
+			uLongf dest_len = compressed.len;
+
+			zresult = compress2(compressed.buf, &dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				slice_resize(&compressed, 2 * compressed.len);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				slice_clear(&compressed);
+				return REFTABLE_ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed.buf,
+			       dest_len);
+			w->next = dest_len + block_header_skip;
+			slice_clear(&compressed);
+			break;
+		}
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct reftable_block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	if (!is_block_type(typ)) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = { 0 };
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip; /* total size of dest
+							    buffer. */
+		uLongf src_len = block->len - block_header_skip;
+
+		/* Log blocks specify the *uncompressed* size in their header.
+		 */
+		slice_resize(&uncompressed, sz);
+
+		/* Copy over the block header verbatim. It's not compressed. */
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		/* Uncompress */
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			slice_clear(&uncompressed);
+			return REFTABLE_ZLIB_ERROR;
+		}
+
+		if (dst_len + block_header_skip != sz) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+
+		/* We're done with the input data. */
+		reftable_block_done(block);
+		block->data = uncompressed.buf;
+		block->len = sz;
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_be16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct slice key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(size_t idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = { 0 };
+	struct slice last_key = { 0 };
+	byte unused_extra;
+	int n = decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_compare(a->key, rkey);
+		slice_clear(&rkey);
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+int block_iter_next(struct block_iter *it, struct record rec)
+{
+	struct slice in = {
+		.buf = it->br->block.data + it->next_off,
+		.len = it->br->block_len - it->next_off,
+	};
+	struct slice start = in;
+	struct slice key = { 0 };
+	byte extra = 0;
+	int n = 0;
+
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	n = decode_key(&key, &extra, it->last_key, in);
+	if (n < 0) {
+		return -1;
+	}
+
+	slice_consume(&in, n);
+	n = record_decode(rec, key, extra, in, it->br->hash_size);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	slice_copy(&it->last_key, key);
+	it->next_off += start.len - in.len;
+	slice_clear(&key);
+	return 0;
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = { 0 };
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	slice_clear(&it->last_key);
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+	struct record rec = new_record(block_reader_type(br));
+	struct slice key = { 0 };
+	int err = 0;
+	struct block_iter next = { 0 };
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	/* We're looking for the last entry less/equal than the wanted key, so
+	   we have to go one entry too far and then back up.
+	*/
+	while (true) {
+		block_iter_copy_from(&next, it);
+		err = block_iter_next(&next, rec);
+		if (err < 0) {
+			goto exit;
+		}
+
+		record_key(rec, &key);
+		if (err > 0 || slice_compare(key, want) >= 0) {
+			err = 0;
+			goto exit;
+		}
+
+		block_iter_copy_from(it, &next);
+	}
+
+exit:
+	slice_clear(&key);
+	slice_clear(&next.last_key);
+	record_destroy(&rec);
+
+	return err;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	slice_clear(&bw->last_key);
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..17bee98e2bb
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,129 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+/*
+  Writes reftable blocks. The block_writer is reused across blocks to minimize
+  allocation overhead.
+*/
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+
+	/* Offset ofof the global header. Nonzero in the first block only. */
+	uint32_t header_off;
+
+	/* How often to restart keys. */
+	int restart_interval;
+	int hash_size;
+
+	/* Offset of next byte to write. */
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+
+	struct slice last_key;
+	int entries;
+};
+
+/*
+  initializes the blockwriter to write `typ` entries, using `buf` as temporary
+  storage. `buf` is not owned by the block_writer. */
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+
+/*
+  returns the block type (eg. 'r' for ref records.
+*/
+byte block_writer_type(struct block_writer *bw);
+
+/* appends the record, or -1 if it doesn't fit. */
+int block_writer_add(struct block_writer *w, struct record rec);
+
+/* appends the key restarts, and compress the block if necessary. */
+int block_writer_finish(struct block_writer *w);
+
+/* clears out internally allocated block_writer members. */
+void block_writer_clear(struct block_writer *bw);
+
+/* Read a block. */
+struct block_reader {
+	/* offset of the block header; nonzero for the first block in a
+	 * reftable. */
+	uint32_t header_off;
+
+	/* the memory block */
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint16_t restart_count;
+
+	/* size of the data in the file. For log blocks, this is the compressed
+	 * size. */
+	uint32_t full_block_size;
+};
+
+/* Iterate over entries in a block */
+struct block_iter {
+	/* offset within the block of the next entry to read. */
+	uint32_t next_off;
+	struct block_reader *br;
+
+	/* key for last entry we read. */
+	struct slice last_key;
+};
+
+/* initializes a block reader */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Position `it` at start of the block */
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+
+/* Position `it` to the `want` key in the block */
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+
+/* Returns the block type (eg. 'r' for refs) */
+byte block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct record rec);
+
+/* Seek to `want` with in the block pointed to by `it` */
+int block_iter_seek(struct block_iter *it, struct slice want);
+
+/* deallocate memory for `it`. The block reader and its block is left intact. */
+void block_iter_close(struct block_iter *it);
+
+/* size of file header, depending on format version */
+int header_size(int version);
+
+/* size of file footer, depending on format version */
+int footer_size(int version);
+
+/* returns a block to its source. */
+void reftable_block_done(struct reftable_block *ret);
+
+#endif
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..5eee72c4c11
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..2bf1135a2e2
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,99 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return REFTABLE_NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	{
+		struct file_block_source *p =
+			reftable_calloc(sizeof(struct file_block_source));
+		p->size = st.st_size;
+		p->fd = fd;
+
+		assert(bs->ops == NULL);
+		bs->ops = &file_vtable;
+		bs->arg = p;
+	}
+	return 0;
+}
+
+int reftable_fd_write(void *arg, byte *data, size_t sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..b48155e1107
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,240 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator it)
+{
+	return it.ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct record rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator it, struct record rec)
+{
+	return it.ops->next(it.iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator it,
+			       struct reftable_ref_record *ref)
+{
+	struct record rec = { 0 };
+	record_from_ref(&rec, ref);
+	return iterator_next(it, rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator it,
+			       struct reftable_log_record *log)
+{
+	struct record rec = { 0 };
+	record_from_log(&rec, log);
+	return iterator_next(it, rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	slice_clear(&fri->oid);
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec.data;
+	int err = 0;
+	while (true) {
+		err = reftable_iterator_next_ref(fri->it, ref);
+		if (err != 0) {
+			break;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			err = reftable_table_seek_ref(fri->tab, &it,
+						      ref->ref_name);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				break;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+
+	reftable_ref_record_clear(ref);
+	return err;
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reftable_block_done(&it->block_reader.block);
+	slice_clear(&it->oid);
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reftable_block_done(&it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return REFTABLE_FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct record rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec.data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..5a0d12a3552
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,63 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct record rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator it, struct record rec);
+
+/* Returns true for a zeroed out iterator, such as the one returned from
+   iterator_destroy. */
+bool iterator_is_null(struct reftable_iterator it);
+
+/* iterator that produces only ref records that point to `oid` */
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_table tab;
+	struct slice oid;
+	struct reftable_iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+/* iterator that produces only ref records that point to `oid`,
+   but using the object index.
+ */
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..ff8940c9237
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,325 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct record rec = new_record(mi->typ);
+		int err = iterator_next(mi->stack[i], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			record_destroy(&rec);
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
+{
+	if (iterator_is_null(mi->stack[idx])) {
+		return 0;
+	}
+
+	{
+		struct record rec = new_record(mi->typ);
+		struct pq_entry e = {
+			.rec = rec,
+			.index = idx,
+		};
+		int err = iterator_next(mi->stack[idx], rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[idx]);
+			record_destroy(&rec);
+			return 0;
+		}
+
+		merged_iter_pqueue_add(&mi->pq, e);
+	}
+	return 0;
+}
+
+static int merged_iter_next_entry(struct merged_iter *mi, struct record rec)
+{
+	struct slice entry_key = { 0 };
+	struct pq_entry entry = { 0 };
+	int err = 0;
+
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	entry = merged_iter_pqueue_remove(&mi->pq);
+	err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+	/*
+	  One can also use reftable as datacenter-local storage, where the ref
+	  database is maintained in globally consistent database (eg.
+	  CockroachDB or Spanner). In this scenario, replication delays together
+	  with compaction may cause newer tables to contain older entries. In
+	  such a deployment, the loop below must be changed to collect all
+	  entries for the same key, and return new the newest one.
+	*/
+	record_key(entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = { 0 };
+		int err = 0, cmp = 0;
+
+		record_key(top.rec, &k);
+
+		cmp = slice_compare(k, entry_key);
+		slice_clear(&k);
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		record_destroy(&top.rec);
+	}
+
+	record_copy_from(rec, entry.rec, hash_size(mi->hash_id));
+	record_destroy(&entry.rec);
+	slice_clear(&entry_key);
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct record rec)
+{
+	while (true) {
+		int err = merged_iter_next_entry(mi, rec);
+		if (err == 0 && mi->suppress_deletions &&
+		    record_is_deletion(rec)) {
+			continue;
+		}
+
+		return err;
+	}
+}
+
+static int merged_iter_next_void(void *p, struct record rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id)
+{
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reftable_reader *r = stack[i];
+		if (r->hash_id != hash_id) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reftable_reader_min_update_index(r);
+		}
+
+		last_max = reftable_reader_max_update_index(r);
+	}
+
+	{
+		struct reftable_merged_table m = {
+			.stack = stack,
+			.stack_len = n,
+			.min = first_min,
+			.max = last_max,
+			.hash_id = hash_id,
+		};
+
+		*dest = reftable_calloc(sizeof(struct reftable_merged_table));
+		**dest = m;
+	}
+	return 0;
+}
+
+void reftable_merged_table_close(struct reftable_merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reftable_reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it, struct record rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = record_type(rec),
+		.hash_id = mt->hash_id,
+		.suppress_deletions = mt->suppress_deletions,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n;
+	err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..a71f36ecbae
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,38 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_reader **stack;
+	int stack_len;
+	uint32_t hash_id;
+	bool suppress_deletions;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	int stack_len;
+	byte typ;
+	bool suppress_deletions;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it, struct record rec);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..9e0bd3ca85b
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,114 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = { 0 };
+	struct slice bk = { 0 };
+	int cmp = 0;
+	record_key(a.rec, &ak);
+	record_key(b.rec, &bk);
+
+	cmp = slice_compare(ak, bk);
+
+	slice_clear(&ak);
+	slice_clear(&bk);
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		record_destroy(&pq->heap[i].rec);
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..b585a52ee14
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..b5744de3378
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,753 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source source)
+{
+	return source.ops->size(source.arg);
+}
+
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source.ops->read_block(source.arg, dest, off, size);
+	dest->source = source;
+	return result;
+}
+
+void reftable_block_done(struct reftable_block *blockp)
+{
+	struct reftable_block_source source = blockp->source;
+	if (blockp != NULL && source.ops != NULL)
+		source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(r->source, dest, off, sz);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	int err = 0;
+	if (memcmp(f, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = REFTABLE_FORMAT_ERROR;
+			goto exit;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	{
+		uint32_t computed_crc = crc32(0, footer, f - footer);
+		uint32_t file_crc = get_be32(f);
+		f += 4;
+		if (computed_crc != file_crc) {
+			err = REFTABLE_FORMAT_ERROR;
+			goto exit;
+		}
+	}
+
+	{
+		byte first_block_typ = header[header_size(r->version)];
+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+		r->ref_offsets.offset = 0;
+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+					  r->log_offsets.offset > 0);
+		r->obj_offsets.present = r->obj_offsets.offset > 0;
+	}
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	reftable_block_done(&footer);
+	reftable_block_done(&header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec.data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reftable_block_done(&ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reftable_block_done(&block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reftable_block_done(&block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct block_reader *brp =
+			reftable_malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct record rec)
+{
+	if (record_type(rec) != ti->typ) {
+		return REFTABLE_API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = { 0 };
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct record rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off, byte typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = reftable_malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			byte typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct record want)
+{
+	struct record rec = new_record(record_type(want));
+	struct slice want_key = { 0 };
+	struct slice got_key = { 0 };
+	struct table_iter next = { 0 };
+	int err = -1;
+	record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_compare(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	record_destroy(&rec);
+	slice_clear(&want_key);
+	slice_clear(&got_key);
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it, struct record rec)
+{
+	struct index_record want_index = { 0 };
+	struct record want_index_rec = { 0 };
+	struct index_record index_result = { 0 };
+	struct record index_result_rec = { 0 };
+	struct table_iter index_iter = { 0 };
+	struct table_iter next = { 0 };
+	int err = 0;
+
+	record_key(rec, &want_index.last_key);
+	record_from_index(&want_index_rec, &want_index);
+	record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = REFTABLE_FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	record_clear(want_index_rec);
+	record_clear(index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it, struct record rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = { 0 };
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct record rec)
+{
+	byte typ = record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return reader_seek(r, it, rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct record rec = { 0 };
+	record_from_log(&rec, &log);
+	return reader_seek(r, it, rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		block_source_close(&src);
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    byte *oid)
+{
+	struct obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct obj_record got = { 0 };
+	struct record got_rec = { 0 };
+	int err = 0;
+
+	/* Look through the reverse index. */
+	record_from_obj(&want_rec, &want);
+	err = reader_seek(r, &oit, want_rec);
+	if (err != 0) {
+		goto exit;
+	}
+
+	/* read out the obj_record */
+	record_from_obj(&got_rec, &got);
+	err = iterator_next(oit, got_rec);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		/* didn't find it; return empty iterator */
+		iterator_set_empty(it);
+		err = 0;
+		goto exit;
+	}
+
+	{
+		struct indexed_table_ref_iter *itr = NULL;
+		err = new_indexed_table_ref_iter(&itr, r, oid,
+						 hash_size(r->hash_id),
+						 got.offsets, got.offset_len);
+		if (err < 0) {
+			goto exit;
+		}
+		got.offsets = NULL;
+		iterator_from_indexed_table_ref_iter(it, itr);
+	}
+
+exit:
+	reftable_iterator_destroy(&oit);
+	record_clear(got_rec);
+	return err;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      byte *oid)
+{
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	int oid_len = hash_size(r->hash_id);
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_calloc(sizeof(struct filtering_ref_iterator));
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	reftable_table_from_reader(&filter->tab, r);
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, byte *oid)
+{
+	if (r->obj_offsets.present) {
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	}
+	return reftable_reader_refs_for_unindexed(r, it, oid);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..fee83c6dad0
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,65 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source source);
+
+int block_source_read_block(struct reftable_block_source source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_close(struct reftable_block_source *source);
+
+/* metadata for a block type */
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+/* The state for reading a reftable file. */
+struct reftable_reader {
+	/* for convience, associate a name with the instance. */
+	char *name;
+	struct reftable_block_source source;
+
+	/* Size of the file, excluding the footer. */
+	uint64_t size;
+
+	/* 'sha1' for SHA1, 's256' for SHA-256 */
+	uint32_t hash_id;
+
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	/* Length of the OID keys in the 'o' section */
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct record rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+
+/* initialize a block reader to read from `r` */
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..8ab572b40ed
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1141 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = { 0 };
+	int i = 9;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	{
+		int n = sizeof(buf) - i - 1;
+		if (dest.len < n) {
+			return -1;
+		}
+		memcpy(dest.buf, &buf[i + 1], n);
+		return n;
+	}
+}
+
+int is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	slice_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	slice_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	slice_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	slice_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static void reftable_ref_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = xstrdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size(hash_id));
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size(hash_id));
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->ref_name);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static byte reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	slice_consume(&in, n);
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct slice dest = { 0 };
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+		seen_target = true;
+		if (r->target != NULL) {
+			reftable_free(r->target);
+		}
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+static bool reftable_ref_record_is_deletion_void(const void *p)
+{
+	return reftable_ref_record_is_deletion(
+		(const struct reftable_ref_record *)p);
+}
+
+struct record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = BLOCK_TYPE_REF,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+	.is_deletion = &reftable_ref_record_is_deletion_void,
+};
+
+static void obj_record_key(const void *r, struct slice *dest)
+{
+	const struct obj_record *rec = (const struct obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void obj_record_clear(void *rec)
+{
+	struct obj_record *obj = (struct obj_record *)rec;
+	FREE_AND_NULL(obj->hash_prefix);
+	FREE_AND_NULL(obj->offsets);
+	memset(obj, 0, sizeof(struct obj_record));
+}
+
+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
+{
+	struct obj_record *obj = (struct obj_record *)rec;
+	const struct obj_record *src = (const struct obj_record *)src_rec;
+
+	obj_record_clear(obj);
+	*obj = *src;
+	obj->hash_prefix = reftable_malloc(obj->hash_prefix_len);
+	memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len);
+
+	{
+		int olen = obj->offset_len * sizeof(uint64_t);
+		obj->offsets = reftable_malloc(olen);
+		memcpy(obj->offsets, src->offsets, olen);
+	}
+}
+
+static byte obj_record_val_type(const void *rec)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
+{
+	struct obj_record *r = (struct obj_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	{
+		uint64_t last = r->offsets[0];
+		int i = 0;
+		for (i = 1; i < r->offset_len; i++) {
+			int n = put_var_int(s, r->offsets[i] - last);
+			if (n < 0) {
+				return -1;
+			}
+			slice_consume(&s, n);
+			last = r->offsets[i];
+		}
+	}
+	return start.len - s.len;
+}
+
+static int obj_record_decode(void *rec, struct slice key, byte val_type,
+			     struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct obj_record *r = (struct obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		slice_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+	slice_consume(&in, n);
+
+	{
+		uint64_t last = r->offsets[0];
+		int j = 1;
+		while (j < count) {
+			uint64_t delta = 0;
+			int n = get_var_int(&delta, in);
+			if (n < 0) {
+				return n;
+			}
+			slice_consume(&in, n);
+
+			last = r->offsets[j] = (delta + last);
+			j++;
+		}
+	}
+	return start.len - in.len;
+}
+
+static bool not_a_deletion(const void *p)
+{
+	return false;
+}
+
+struct record_vtable obj_record_vtable = {
+	.key = &obj_record_key,
+	.type = BLOCK_TYPE_OBJ,
+	.copy_from = &obj_record_copy_from,
+	.val_type = &obj_record_val_type,
+	.encode = &obj_record_encode,
+	.decode = &obj_record_decode,
+	.clear = &obj_record_clear,
+	.is_deletion = not_a_deletion,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size(hash_id));
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size(hash_id));
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static void reftable_log_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_be64(dest->buf + 1 + len, ts);
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	reftable_log_record_clear(dst);
+	*dst = *src;
+	if (dst->ref_name != NULL) {
+		dst->ref_name = xstrdup(dst->ref_name);
+	}
+	if (dst->email != NULL) {
+		dst->email = xstrdup(dst->email);
+	}
+	if (dst->name != NULL) {
+		dst->name = xstrdup(dst->name);
+	}
+	if (dst->message != NULL) {
+		dst->message = xstrdup(dst->message);
+	}
+
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->ref_name);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static byte reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static byte zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r)) {
+		return 0;
+	}
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	slice_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_be16(s.buf, r->tz_offset);
+	slice_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = { 0 };
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		FREE_AND_NULL(r->old_hash);
+		FREE_AND_NULL(r->new_hash);
+		FREE_AND_NULL(r->message);
+		FREE_AND_NULL(r->email);
+		FREE_AND_NULL(r->name);
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	slice_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_be16(in.buf);
+	slice_consume(&in, 2);
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	slice_clear(&dest);
+	return start.len - in.len;
+
+error:
+	slice_clear(&dest);
+	return REFTABLE_FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+static bool reftable_log_record_is_deletion_void(const void *p)
+{
+	return reftable_log_record_is_deletion(
+		(const struct reftable_log_record *)p);
+}
+
+struct record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = BLOCK_TYPE_LOG,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+	.is_deletion = &reftable_log_record_is_deletion_void,
+};
+
+struct record new_record(byte typ)
+{
+	struct record rec = { NULL };
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct obj_record *r =
+			reftable_calloc(sizeof(struct obj_record));
+		record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct index_record *r =
+			reftable_calloc(sizeof(struct index_record));
+		record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+void *record_yield(struct record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+void record_destroy(struct record *rec)
+{
+	record_clear(*rec);
+	reftable_free(record_yield(rec));
+}
+
+static void index_record_key(const void *r, struct slice *dest)
+{
+	struct index_record *rec = (struct index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void index_record_copy_from(void *rec, const void *src_rec,
+				   int hash_size)
+{
+	struct index_record *dst = (struct index_record *)rec;
+	struct index_record *src = (struct index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void index_record_clear(void *rec)
+{
+	struct index_record *idx = (struct index_record *)rec;
+	slice_clear(&idx->last_key);
+}
+
+static byte index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int index_record_encode(const void *rec, struct slice out, int hash_size)
+{
+	const struct index_record *r = (const struct index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int index_record_decode(void *rec, struct slice key, byte val_type,
+			       struct slice in, int hash_size)
+{
+	struct slice start = in;
+	struct index_record *r = (struct index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct record_vtable index_record_vtable = {
+	.key = &index_record_key,
+	.type = BLOCK_TYPE_INDEX,
+	.copy_from = &index_record_copy_from,
+	.val_type = &index_record_val_type,
+	.encode = &index_record_encode,
+	.decode = &index_record_decode,
+	.clear = &index_record_clear,
+	.is_deletion = &not_a_deletion,
+};
+
+void record_key(struct record rec, struct slice *dest)
+{
+	rec.ops->key(rec.data, dest);
+}
+
+byte record_type(struct record rec)
+{
+	return rec.ops->type;
+}
+
+int record_encode(struct record rec, struct slice dest, int hash_size)
+{
+	return rec.ops->encode(rec.data, dest, hash_size);
+}
+
+void record_copy_from(struct record rec, struct record src, int hash_size)
+{
+	assert(src.ops->type == rec.ops->type);
+
+	rec.ops->copy_from(rec.data, src.data, hash_size);
+}
+
+byte record_val_type(struct record rec)
+{
+	return rec.ops->val_type(rec.data);
+}
+
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size)
+{
+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
+}
+
+void record_clear(struct record rec)
+{
+	rec.ops->clear(rec.data);
+}
+
+bool record_is_deletion(struct record rec)
+{
+	return rec.ops->is_deletion(rec.data);
+}
+
+void record_from_ref(struct record *rec, struct reftable_ref_record *ref_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = obj_rec;
+	rec->ops = &obj_record_vtable;
+}
+
+void record_from_index(struct record *rec, struct index_record *index_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = index_rec;
+	rec->ops = &index_record_vtable;
+}
+
+void record_from_log(struct record *rec, struct reftable_log_record *log_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+struct reftable_ref_record *record_as_ref(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec.data;
+}
+
+struct reftable_log_record *record_as_log(struct record rec)
+{
+	assert(record_type(rec) == BLOCK_TYPE_LOG);
+	return (struct reftable_log_record *)rec.data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->ref_name,
+		      ((struct reftable_ref_record *)b)->ref_name);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..011d9bc56fc
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,121 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+
+/* Methods for records. */
+struct record_vtable {
+	/* encode the key of to a byte slice. */
+	void (*key)(const void *rec, struct slice *dest);
+
+	/* The record type of ('r' for ref). */
+	byte type;
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	byte (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+
+	/* is this a tombstone? */
+	bool (*is_deletion)(const void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct record {
+	void *data;
+	struct record_vtable *ops;
+};
+
+/* returns true for recognized block types. Block start with the block type. */
+int is_block_type(byte typ);
+
+/* creates a malloced record of the given type. Dispose with record_destroy */
+struct record new_record(byte typ);
+
+extern struct record_vtable reftable_ref_record_vtable;
+
+/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
+   number of bytes written. */
+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
+	       struct slice key, byte extra);
+
+/* Decode into `key` and `extra` from `in` */
+int decode_key(struct slice *key, byte *extra, struct slice last_key,
+	       struct slice in);
+
+/* index_record are used internally to speed up lookups. */
+struct index_record {
+	uint64_t offset; /* Offset of block */
+	struct slice last_key; /* Last key of the block. */
+};
+
+/* obj_record stores an object ID => ref mapping. */
+struct obj_record {
+	byte *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void record_key(struct record rec, struct slice *dest);
+byte record_type(struct record rec);
+void record_copy_from(struct record rec, struct record src, int hash_size);
+byte record_val_type(struct record rec);
+int record_encode(struct record rec, struct slice dest, int hash_size);
+int record_decode(struct record rec, struct slice key, byte extra,
+		  struct slice src, int hash_size);
+bool record_is_deletion(struct record rec);
+
+/* zeroes out the embedded record */
+void record_clear(struct record rec);
+
+/* clear out the record, yielding the record data that was encapsulated. */
+void *record_yield(struct record *rec);
+
+/* clear and deallocate embedded record, and zero `rec`. */
+void record_destroy(struct record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+void record_from_obj(struct record *rec, struct obj_record *objrec);
+void record_from_index(struct record *rec, struct index_record *idxrec);
+void record_from_ref(struct record *rec, struct reftable_ref_record *refrec);
+void record_from_log(struct record *rec, struct reftable_log_record *logrec);
+struct reftable_ref_record *record_as_ref(struct record ref);
+struct reftable_log_record *record_as_log(struct record ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/refname.c b/reftable/refname.c
new file mode 100644
index 00000000000..92964cb9514
--- /dev/null
+++ b/reftable/refname.c
@@ -0,0 +1,215 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+#include "refname.h"
+#include "slice.h"
+
+struct find_arg {
+	char **names;
+	const char *want;
+};
+
+static int find_name(size_t k, void *arg)
+{
+	struct find_arg *f_arg = (struct find_arg *)arg;
+	return strcmp(f_arg->names[k], f_arg->want) >= 0;
+}
+
+int modification_has_ref(struct modification *mod, const char *name)
+{
+	struct reftable_ref_record ref = { 0 };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = name,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
+			return 0;
+		}
+	}
+
+	if (mod->del_len > 0) {
+		struct find_arg arg = {
+			.names = mod->del,
+			.want = name,
+		};
+		int idx = binsearch(mod->del_len, find_name, &arg);
+		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
+			return 1;
+		}
+	}
+
+	err = reftable_table_read_ref(mod->tab, name, &ref);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static void modification_clear(struct modification *mod)
+{
+	/* don't delete the strings themselves; they're owned by ref records.
+	 */
+	FREE_AND_NULL(mod->add);
+	FREE_AND_NULL(mod->del);
+	mod->add_len = 0;
+	mod->del_len = 0;
+}
+
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix)
+{
+	struct reftable_iterator it = { NULL };
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = prefix,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len &&
+		    !strncmp(prefix, mod->add[idx], strlen(prefix))) {
+			goto exit;
+		}
+	}
+	err = reftable_table_seek_ref(mod->tab, &it, prefix);
+	if (err) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_ref(it, &ref);
+		if (err) {
+			goto exit;
+		}
+
+		if (mod->del_len > 0) {
+			struct find_arg arg = {
+				.names = mod->del,
+				.want = ref.ref_name,
+			};
+			int idx = binsearch(mod->del_len, find_name, &arg);
+			if (idx < mod->del_len &&
+			    !strcmp(ref.ref_name, mod->del[idx])) {
+				continue;
+			}
+		}
+
+		if (strncmp(ref.ref_name, prefix, strlen(prefix))) {
+			err = 1;
+			goto exit;
+		}
+		err = 0;
+		goto exit;
+	}
+
+exit:
+	reftable_ref_record_clear(&ref);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int validate_ref_name(const char *name)
+{
+	while (true) {
+		char *next = strchr(name, '/');
+		if (!*name) {
+			return REFTABLE_REFNAME_ERROR;
+		}
+		if (!next) {
+			return 0;
+		}
+		if (next - name == 0 || (next - name == 1 && *name == '.') ||
+		    (next - name == 2 && name[0] == '.' && name[1] == '.'))
+			return REFTABLE_REFNAME_ERROR;
+		name = next + 1;
+	}
+	return 0;
+}
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz)
+{
+	struct modification mod = {
+		.tab = tab,
+		.add = reftable_calloc(sizeof(char *) * sz),
+		.del = reftable_calloc(sizeof(char *) * sz),
+	};
+	int i = 0;
+	int err = 0;
+	for (; i < sz; i++) {
+		if (reftable_ref_record_is_deletion(&recs[i])) {
+			mod.del[mod.del_len++] = recs[i].ref_name;
+		} else {
+			mod.add[mod.add_len++] = recs[i].ref_name;
+		}
+	}
+
+	err = modification_validate(&mod);
+	modification_clear(&mod);
+	return err;
+}
+
+static void slice_trim_component(struct slice *sl)
+{
+	while (sl->len > 0) {
+		bool is_slash = (sl->buf[sl->len - 1] == '/');
+		sl->len--;
+		if (is_slash)
+			break;
+	}
+}
+
+int modification_validate(struct modification *mod)
+{
+	struct slice slashed = { 0 };
+	int err = 0;
+	int i = 0;
+	for (; i < mod->add_len; i++) {
+		err = validate_ref_name(mod->add[i]);
+		if (err) {
+			goto exit;
+		}
+		slice_set_string(&slashed, mod->add[i]);
+		slice_append_string(&slashed, "/");
+
+		err = modification_has_ref_with_prefix(
+			mod, slice_as_string(&slashed));
+		if (err == 0) {
+			err = REFTABLE_NAME_CONFLICT;
+			goto exit;
+		}
+		if (err < 0) {
+			goto exit;
+		}
+
+		slice_set_string(&slashed, mod->add[i]);
+		while (slashed.len) {
+			slice_trim_component(&slashed);
+			err = modification_has_ref(mod,
+						   slice_as_string(&slashed));
+			if (err == 0) {
+				err = REFTABLE_NAME_CONFLICT;
+				goto exit;
+			}
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+	err = 0;
+exit:
+	slice_clear(&slashed);
+	return err;
+}
diff --git a/reftable/refname.h b/reftable/refname.h
new file mode 100644
index 00000000000..f335287fcdb
--- /dev/null
+++ b/reftable/refname.h
@@ -0,0 +1,38 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+#ifndef REFNAME_H
+#define REFNAME_H
+
+#include "reftable.h"
+
+struct modification {
+	struct reftable_table tab;
+
+	char **add;
+	size_t add_len;
+
+	char **del;
+	size_t del_len;
+};
+
+// -1 = error, 0 = found, 1 = not found
+int modification_has_ref(struct modification *mod, const char *name);
+
+// -1 = error, 0 = found, 1 = not found.
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix);
+
+// 0 = OK.
+int validate_ref_name(const char *name);
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz);
+
+int modification_validate(struct modification *mod);
+
+#endif
diff --git a/reftable/reftable.c b/reftable/reftable.c
new file mode 100644
index 00000000000..d29fbd8ee44
--- /dev/null
+++ b/reftable/reftable.c
@@ -0,0 +1,91 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+#include "record.h"
+#include "reader.h"
+#include "merged.h"
+
+struct reftable_table_vtable {
+	int (*seek)(void *tab, struct reftable_iterator *it, struct record);
+};
+
+static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
+				     struct record rec)
+{
+	return reader_seek((struct reftable_reader *)tab, it, rec);
+}
+
+static struct reftable_table_vtable reader_vtable = {
+	.seek = reftable_reader_seek_void,
+};
+
+static int reftable_merged_table_seek_void(void *tab,
+					   struct reftable_iterator *it,
+					   struct record rec)
+{
+	return merged_table_seek_record((struct reftable_merged_table *)tab, it,
+					rec);
+}
+
+static struct reftable_table_vtable merged_table_vtable = {
+	.seek = reftable_merged_table_seek_void,
+};
+
+int reftable_table_seek_ref(struct reftable_table tab,
+			    struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct record rec = { 0 };
+	record_from_ref(&rec, &ref);
+	return tab.ops->seek(tab.table_arg, it, rec);
+}
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &reader_vtable;
+	tab->table_arg = reader;
+}
+
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *merged)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &merged_table_vtable;
+	tab->table_arg = merged;
+}
+
+int reftable_table_read_ref(struct reftable_table tab, const char *name,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	int err = reftable_table_seek_ref(tab, &it, name);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_ref(it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, name) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		reftable_ref_record_clear(ref);
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..028ed150de1
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,564 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *ref_name;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	REFTABLE_IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	REFTABLE_FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	REFTABLE_NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	REFTABLE_LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL ref_name.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	REFTABLE_API_ERROR = -6,
+
+	/* Decompression error */
+	REFTABLE_ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	REFTABLE_EMPTY_TABLE_ERROR = -8,
+
+	/* Dir/file conflict. */
+	REFTABLE_NAME_CONFLICT = -9,
+
+	/* Illegal ref name. */
+	REFTABLE_REFNAME_ERROR = -10,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/*
+ * Convert the numeric error code to an equivalent errno code.
+ */
+int reftable_error_to_errno(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+
+	/* boolean: do not check ref names for validity or dir/file conflicts.
+	 */
+	int skip_name_check;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, uint8_t *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, uint8_t *data, size_t size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp. The name is used for creating a stack. Typically, it is the
+ * basename of the file. The block source `src` is owned by the reader, and is
+ * closed on calling reftable_reader_destroy().
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source src, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to `oid`. */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* closes readers for the merged tables */
+void reftable_merged_table_close(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/****************************************************************
+ Generic tables
+
+ A unified API for reading tables, either merged tables, or single readers.
+ ****************************************************************/
+
+struct reftable_table {
+	struct reftable_table_vtable *ops;
+	void *table_arg;
+};
+
+int reftable_table_seek_ref(struct reftable_table tab,
+			    struct reftable_iterator *it, const char *name);
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader);
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *table);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_table_read_ref(struct reftable_table tab, const char *name,
+			    struct reftable_ref_record *ref);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction, and deallocate the
+   transaction. Releases the lock if held. */
+void reftable_addition_destroy(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* Reloads the stack if necessary. This is very cheap to run if the stack was up
+ * to date */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	uint64_t entries_written; /* total number of entries written, including
+				     failures. */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 00000000000..89e649b0d46
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,225 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_append_string(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_append(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+void slice_consume(struct slice *s, int n)
+{
+	s->buf += n;
+	s->len -= n;
+}
+
+byte *slice_yield(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_clear(struct slice *s)
+{
+	reftable_free(slice_yield(s));
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = { 0 };
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_yield(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_compare(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_write(struct slice *b, byte *data, size_t sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = reftable_realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_write_void(void *b, byte *data, size_t sz)
+{
+	return slice_write((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf)
+{
+	assert(bs->ops == NULL);
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 00000000000..886f22811c4
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,76 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+/*
+  provides bounds-checked byte ranges.
+  To use, initialize as "slice x = {0};"
+ */
+struct slice {
+	int len;
+	int cap;
+	byte *buf;
+};
+
+void slice_set_string(struct slice *dest, const char *src);
+void slice_append_string(struct slice *dest, const char *src);
+/* Set length to 0, but retain buffer */
+void slice_clear(struct slice *slice);
+
+/* Return a malloced string for `src` */
+char *slice_to_string(struct slice src);
+
+/* Ensure that `buf` is \0 terminated. */
+const char *slice_as_string(struct slice *src);
+
+/* Compare slices */
+bool slice_equal(struct slice a, struct slice b);
+
+/* Return `buf`, clearing out `s` */
+byte *slice_yield(struct slice *s);
+
+/* Copy bytes */
+void slice_copy(struct slice *dest, struct slice src);
+
+/* Advance `buf` by `n`, and decrease length. A copy of the slice
+   should be kept for deallocating the slice. */
+void slice_consume(struct slice *s, int n);
+
+/* Set length of the slice to `l` */
+void slice_resize(struct slice *s, int l);
+
+/* Signed comparison */
+int slice_compare(struct slice a, struct slice b);
+
+/* Append `data` to the `dest` slice.  */
+int slice_write(struct slice *dest, byte *data, size_t sz);
+
+/* Append `add` to `dest. */
+void slice_append(struct slice *dest, struct slice add);
+
+/* Like slice_write, but suitable for passing to reftable_new_writer
+ */
+int slice_write_void(void *b, byte *data, size_t sz);
+
+/* Find the longest shared prefix size of `a` and `b` */
+int common_prefix_size(struct slice a, struct slice b);
+
+struct reftable_block_source;
+
+/* Create an in-memory block source for reading reftables */
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..85b92f50197
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1245 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "refname.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct slice list_file_name = { 0 };
+	int err = 0;
+
+	if (config.hash_id == 0) {
+		config.hash_id = SHA1_ID;
+	}
+
+	*dest = NULL;
+
+	slice_set_string(&list_file_name, dir);
+	slice_append_string(&list_file_name, "/tables.list");
+
+	p->list_file = slice_to_string(list_file_name);
+	slice_clear(&list_file_name);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload_maybe_reuse(p, true);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fd_read_lines(int fd, char ***namesp)
+{
+	off_t size = lseek(fd, 0, SEEK_END);
+	char *buf = NULL;
+	int err = 0;
+	if (size < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+	err = lseek(fd, 0, SEEK_SET);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (read(fd, buf, size) != size) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+
+exit:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	int fd = open(filename, O_RDONLY, 0644);
+	int err = 0;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return REFTABLE_IO_ERROR;
+	}
+	err = fd_read_lines(fd, namesp);
+	close(fd);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged != NULL) {
+		reftable_merged_table_close(st->merged);
+		reftable_merged_table_free(st->merged);
+		st->merged = NULL;
+	}
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_tables =
+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
+	int new_tables_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+
+	struct slice table_path = { 0 };
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_append_string(&table_path, "/");
+			slice_append_string(&table_path, name);
+
+			err = reftable_block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = reftable_new_reader(&rd, src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
+					st->config.hash_id);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	new_merged->suppress_deletions = true;
+	st->merged = new_merged;
+
+	{
+		int i = 0;
+		for (i = 0; i < cur_len; i++) {
+			if (cur[i] != NULL) {
+				reader_close(cur[i]);
+				reftable_reader_free(cur[i]);
+			}
+		}
+	}
+
+exit:
+	slice_clear(&table_path);
+	{
+		int i = 0;
+		for (i = 0; i < new_tables_len; i++) {
+			reader_close(new_tables[i]);
+			reftable_reader_free(new_tables[i]);
+		}
+	}
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		int err2 = 0;
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != REFTABLE_NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
+		   writer. Check if there was one by checking if the name list
+		   changed.
+		*/
+		err2 = read_lines(st->list_file, &names_after);
+		if (err2 < 0) {
+			free_names(names);
+			return err2;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return err;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 1;
+		sleep_millisec(delay);
+	}
+
+	return 0;
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	int err = stack_uptodate(st);
+	if (err > 0) {
+		return reftable_stack_reload_maybe_reuse(st, true);
+	}
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == REFTABLE_LOCK_ERROR) {
+			/* Ignore error return, we want to propagate
+			   REFTABLE_LOCK_ERROR.
+			*/
+			reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	if (!st->disable_auto_compact) {
+		return reftable_stack_auto_compact(st);
+	}
+
+	return 0;
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct slice lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	slice_set_string(&add->lock_file_name, st->list_file);
+	slice_append_string(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = REFTABLE_LOCK_ERROR;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = REFTABLE_LOCK_ERROR;
+		goto exit;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+exit:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct slice nm = { 0 };
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_set_string(&nm, add->stack->list_file);
+		slice_append_string(&nm, "/");
+		slice_append_string(&nm, add->new_tables[i]);
+		unlink(slice_as_string(&nm));
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(slice_as_string(&add->lock_file_name));
+		slice_clear(&add->lock_file_name);
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+	slice_clear(&nm);
+}
+
+void reftable_addition_destroy(struct reftable_addition *add)
+{
+	if (add == NULL) {
+		return;
+	}
+	reftable_addition_close(add);
+	reftable_free(add);
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct slice table_list = { 0 };
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0) {
+		goto exit;
+	}
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		slice_append_string(&table_list,
+				    add->stack->merged->stack[i]->name);
+		slice_append_string(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_append_string(&table_list, add->new_tables[i]);
+		slice_append_string(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	slice_clear(&table_list);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&add->lock_file_name),
+		     add->stack->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+exit:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	*dest = reftable_calloc(sizeof(**dest));
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = { 0 };
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0) {
+		goto exit;
+	}
+	if (err > 0) {
+		err = REFTABLE_LOCK_ERROR;
+		goto exit;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_addition_commit(&add);
+exit:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice tab_file_name = { 0 };
+	struct slice next_name = { 0 };
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, add->next_update_index, add->next_update_index);
+
+	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
+	slice_append_string(&temp_tab_file_name, "/");
+	slice_append(&temp_tab_file_name, next_name);
+	slice_append_string(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
+	if (tab_fd < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_writer_close(wr);
+	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto exit;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_check_addition(add->stack,
+				   slice_as_string(&temp_tab_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (wr->min_update_index < add->next_update_index) {
+		err = REFTABLE_API_ERROR;
+		goto exit;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_append_string(&next_name, ".ref");
+
+	slice_set_string(&tab_file_name, add->stack->reftable_dir);
+	slice_append_string(&tab_file_name, "/");
+	slice_append(&tab_file_name, next_name);
+
+	/* TODO: should check destination out of paranoia */
+	err = rename(slice_as_string(&temp_tab_file_name),
+		     slice_as_string(&tab_file_name));
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = slice_to_string(next_name);
+	add->new_tables_len++;
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_file_name));
+	}
+
+	slice_clear(&temp_tab_file_name);
+	slice_clear(&tab_file_name);
+	slice_clear(&next_name);
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reftable_reader_max_update_index(
+			       st->merged->stack[sz - 1]) +
+		       1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct slice next_name = { 0 };
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->merged->stack[first]),
+		    reftable_reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_append_string(temp_tab, "/");
+	slice_append(temp_tab, next_name);
+	slice_append_string(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = reftable_writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	reftable_writer_free(wr);
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		slice_clear(temp_tab);
+	}
+	slice_clear(&next_name);
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_reader **subtabs = reftable_calloc(
+		sizeof(struct reftable_reader *) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	uint64_t entries = 0;
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr,
+				   st->merged->stack[first]->min_update_index,
+				   st->merged->stack[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto exit;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_ref(it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+	reftable_iterator_destroy(&it);
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_log(it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+	reftable_log_record_clear(&log);
+	st->stats.entries_written += entries;
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice new_table_name = { 0 };
+	struct slice lock_file_name = { 0 };
+	struct slice ref_list_contents = { 0 };
+	struct slice new_table_path = { 0 };
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_append_string(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	/* Don't want to write to the lock for now.  */
+	close(lock_file_fd);
+	lock_file_fd = 0;
+
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_file_name = { 0 };
+		struct slice subtab_lock = { 0 };
+		slice_set_string(&subtab_file_name, st->reftable_dir);
+		slice_append_string(&subtab_file_name, "/");
+		slice_append_string(&subtab_file_name,
+				    reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_file_name);
+		slice_append_string(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				} else {
+					err = REFTABLE_IO_ERROR;
+				}
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] =
+			(char *)slice_as_string(&subtab_file_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_append_string(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_append_string(&new_table_path, "/");
+
+	slice_append(&new_table_path, new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(slice_as_string(&temp_tab_file_name),
+			     slice_as_string(&new_table_path));
+		if (err < 0) {
+			err = REFTABLE_IO_ERROR;
+			goto exit;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		slice_append(&ref_list_contents, new_table_name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_append_string(&ref_list_contents,
+				    st->merged->stack[i]->name);
+		slice_append_string(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	/* Reload the stack before deleting. On windows, we can only delete the
+	   files after we closed them.
+	*/
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+
+	{
+		char **p = delete_on_success;
+		while (*p) {
+			if (strcmp(*p, slice_as_string(&new_table_path))) {
+				unlink(*p);
+			}
+			p++;
+		}
+	}
+
+exit:
+	free_names(delete_on_success);
+	{
+		char **p = subtable_locks;
+		while (*p) {
+			unlink(*p);
+			p++;
+		}
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	slice_clear(&new_table_name);
+	slice_clear(&new_table_path);
+	slice_clear(&ref_list_contents);
+	slice_clear(&temp_tab_file_name);
+	slice_clear(&lock_file_name);
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0) {
+		return 0;
+	}
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+
+	if (n == 0) {
+		*seglen = 0;
+		return segs;
+	}
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
+	int overhead = header_size(version) - 1;
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		sizes[i] = st->merged->stack[i]->size - overhead;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_table tab = { NULL };
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+	return reftable_table_read_ref(tab, refname, ref);
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_log(it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	if (err) {
+		reftable_log_record_clear(log);
+	}
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
+{
+	int err = 0;
+	struct reftable_block_source src = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct reftable_ref_record *refs = NULL;
+	struct reftable_iterator it = { NULL };
+	int cap = 0;
+	int len = 0;
+	int i = 0;
+
+	if (st->config.skip_name_check) {
+		return 0;
+	}
+
+	err = reftable_block_source_from_file(&src, new_tab_name);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_new_reader(&rd, src, new_tab_name);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	if (err > 0) {
+		err = 0;
+		goto exit;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		err = reftable_iterator_next_ref(it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (len >= cap) {
+			cap = 2 * cap + 1;
+			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
+		}
+
+		refs[len++] = ref;
+	}
+
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+
+	err = validate_ref_record_addition(tab, refs, len);
+
+exit:
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&refs[i]);
+	}
+
+	free(refs);
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..60bca3b3e58
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,48 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+#include "system.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+	bool disable_auto_compact;
+
+	struct reftable_write_options config;
+
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
+void reftable_addition_close(struct reftable_addition *add);
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..9e2c4b8d9ac
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,54 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#if 1 /* REFTABLE_IN_GITCORE */
+
+#include "git-compat-util.h"
+#include "cache.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_IN_GITCORE */
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef uint8_t byte;
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ *
+ * TODO: in git-core, this should fallback to uncompress2 if it is available.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..0341c865569
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,67 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	{
+		int res = compare(key, (*rootp)->key);
+		if (res < 0) {
+			return tree_search(key, &(*rootp)->left, compare,
+					   insert);
+		} else if (res > 0) {
+			return tree_search(key, &(*rootp)->right, compare,
+					   insert);
+		}
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..954512e9a3a
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+/* tree_node is a generic binary search tree. */
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+/* looks for `key` in `rootp` using `compare` as comparison function. If insert
+   is set, insert the key if it's not found. Else, return NULL.
+*/
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+
+/* performs an infix walk of the tree. */
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+
+/*
+  deallocates the tree nodes recursively. Keys should be deallocated separately
+  by walking over the tree. */
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100755
index 00000000000..ef32aeda515
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+set -eu
+
+# Override this to import from somewhere else, say "../reftable".
+SRC=${SRC:-origin}
+BRANCH=${BRANCH:-master}
+
+((git --git-dir reftable-repo/.git fetch ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
+   git clone https://github.com/google/reftable reftable-repo)
+
+cp reftable-repo/c/*.[ch] reftable/
+cp reftable-repo/c/include/*.[ch] reftable/
+cp reftable-repo/LICENSE reftable/
+git --git-dir reftable-repo/.git show --no-patch HEAD \
+  > reftable/VERSION
+
+mv reftable/system.h reftable/system.h~
+sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
+
+# Remove unittests and compatibility hacks we don't need here.  
+rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
+
+git add reftable/*.[ch] reftable/LICENSE reftable/VERSION 
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..a3db8e5ba44
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,665 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, byte *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, byte *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
+			     ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = reftable_calloc(sizeof(struct obj_index_tree_node));
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w, struct record rec)
+{
+	int result = -1;
+	struct slice key = { 0 };
+	int err = 0;
+	record_key(rec, &key);
+	if (slice_compare(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	slice_clear(&key);
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return REFTABLE_API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return REFTABLE_API_ERROR;
+	}
+
+	record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = hash_size(w->opts.hash_id),
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = hash_size(w->opts.hash_id),
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	if (log->ref_name == NULL) {
+		return REFTABLE_API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	{
+		struct record rec = { 0 };
+		int err;
+		record_from_log(&rec, log);
+		err = writer_add_record(w, rec);
+		return err;
+	}
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct record rec = { 0 };
+			record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, rec);
+			if (err != 0) {
+				/* write into fresh block should always succeed
+				 */
+				abort();
+			}
+		}
+		for (i = 0; i < idx_len; i++) {
+			slice_clear(&idx[i].last_key);
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_block_stats *bstats =
+			writer_reftable_block_stats(w, typ);
+		bstats->index_blocks =
+			w->stats.idx_stats.blocks - before_blocks;
+		bstats->index_offset = index_start;
+		bstats->max_index_level = max_level;
+	}
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct record rec = { 0 };
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	slice_clear(&entry->hash);
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	byte footer[72];
+	byte *p = footer;
+	int err = writer_finish_public_section(w);
+	int empty_table = w->next == 0;
+	if (err != 0) {
+		goto exit;
+	}
+	w->pending_padding = 0;
+	if (empty_table) {
+		/* Empty tables need a header anyway. */
+		byte header[28];
+		int n = writer_write_header(w, header);
+		err = padded_write(w, header, n, 0);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (empty_table) {
+		err = REFTABLE_EMPTY_TABLE_ERROR;
+		goto exit;
+	}
+
+exit:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	slice_clear(&w->last_key);
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		slice_clear(&w->index[i].last_key);
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index, sizeof(struct index_record) * w->index_cap);
+	}
+
+	{
+		struct index_record ir = {
+			.offset = w->next,
+		};
+		slice_copy(&ir.last_key, w->block_writer->last_key);
+		w->index[w->index_len] = ir;
+	}
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..afc15d8f31c
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, byte *, size_t);
+	void *write_arg;
+	int pending_padding;
+	struct slice last_key;
+
+	/* offset of next block to write. */
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	/* memory buffer for writing */
+	byte *block;
+
+	/* writer for the current section. NULL or points to
+	 * block_writer_data */
+	struct block_writer *block_writer;
+
+	struct block_writer block_writer_data;
+
+	/* pending index records for the current section */
+	struct index_record *index;
+	int index_len;
+	int index_cap;
+
+	/*
+	  tree for use with tsearch; used to populate the 'o' inverse OID
+	  map */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+/* finishes a block, and writes it to storage */
+int writer_flush_block(struct reftable_writer *w);
+
+/* deallocates memory related to the index */
+void writer_clear_index(struct reftable_writer *w);
+
+/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v13 11/13] Reftable support for git-core
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (9 preceding siblings ...)
  2020-05-11 19:46                         ` [PATCH v13 10/13] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-05-11 19:46                         ` Han-Wen Nienhuys via GitGitGadget
  2020-05-13 19:55                           ` Junio C Hamano
  2020-05-11 19:46                         ` [PATCH v13 12/13] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
                                           ` (3 subsequent siblings)
  14 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

This introduces the refs/reftable-backend.c containing reftable powered ref
storage backend.

It can be activated by passing --ref-storage=reftable to "git init".

TODO:

* Resolve spots marked with XXX

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   27 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   56 +-
 cache.h                                       |    6 +-
 refs.c                                        |   27 +-
 refs.h                                        |    3 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1218 +++++++++++++++++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |  120 ++
 13 files changed, 1455 insertions(+), 30 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 3d3a39fc192..f4848a382f4 100644
--- a/Makefile
+++ b/Makefile
@@ -810,6 +810,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -962,6 +963,7 @@ LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1165,7 +1167,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2360,11 +2362,29 @@ VCSSVN_OBJS += vcs-svn/sliding_window.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/refname.o
+REFTABLE_OBJS += reftable/reftable.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2505,6 +2525,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -3120,7 +3143,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index cb48a291caf..4d0cf065e4a 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1108,7 +1108,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
+		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0b7222e7188..b7053b9e370 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +189,8 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 ||
+	    !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -238,6 +240,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -247,6 +250,24 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
+	/*
+	 * refs/heads is a file when using reftable. We can't reinitialize with
+	 * a reftable because it will overwrite HEAD
+	 */
+	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
+			      is_directory(git_path_buf(&buf, "refs/heads"))) {
+		die("cannot switch ref storage format.");
+	}
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -261,15 +282,12 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
@@ -383,7 +401,8 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash, unsigned int flags)
+	    const char *template_dir, int hash, const char *ref_storage_format,
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -422,6 +441,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+	repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -450,6 +470,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -523,6 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = DEFAULT_REF_STORAGE;
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -530,15 +553,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *object_format = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
@@ -648,9 +674,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
+		       ref_storage_format, flags);
 }
diff --git a/cache.h b/cache.h
index 0f0485ecfe2..8cb884773c3 100644
--- a/cache.h
+++ b/cache.h
@@ -628,8 +628,9 @@ int path_inside_repo(const char *prefix, const char *path);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
-	    unsigned int flags);
-void initialize_repository_version(int hash_algo);
+	    const char *ref_storage_format, unsigned int flags);
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1043,6 +1044,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 79c63d60dcb..e8751415a9e 100644
--- a/refs.c
+++ b/refs.c
@@ -17,10 +17,16 @@
 #include "argv-array.h"
 #include "repository.h"
 
+const char *default_ref_storage(void)
+{
+	const char *test = getenv("GIT_TEST_REFTABLE");
+	return test ? "reftable" : "files";
+}
+
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1711,13 +1717,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1733,7 +1739,11 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(r->gitdir,
+					 r->ref_storage_format ?
+						 r->ref_storage_format :
+						 DEFAULT_REF_STORAGE,
+					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
 
@@ -1788,7 +1798,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1802,6 +1812,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1815,9 +1826,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index d1d9361441b..a167f60734d 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* Returns the ref storage backend to use by default. */
+const char *default_ref_storage(void);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index dabe18baea1..4daab6f0f0c 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -673,6 +673,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..c3ceb9438ad
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1218 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *repo_dir;
+	char *reftable_dir;
+	struct reftable_stack *stack;
+};
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = { NULL };
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						       unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+	refs->repo_dir = xstrdup(path);
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct strbuf sb = STRBUF_INIT;
+
+	safe_create_dir(refs->reftable_dir, 1);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	const char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		/*
+		   We could filter pseudo refs here explicitly, but HEAD is not
+		   a PSEUDOREF, but a PER_WORKTREE, b/c each worktree can have
+		   its own HEAD.
+		 */
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct reftable_merged_table *mt = NULL;
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else {
+		mt = reftable_stack_merged_table(refs->stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->prefix = prefix;
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *errbuf)
+{
+	/* XXX rewrite using the reftable transaction API. */
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+done:
+	return err;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	(void)refs;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid;
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp(((struct ref_update *)a)->refname,
+		      ((struct ref_update *)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = reftable_stack_next_update_index(refs->stack);
+	int err = 0;
+	int i = 0;
+	struct reftable_log_record *logs =
+		calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct object_id out_oid;
+			int out_flags = 0;
+			/* Memory owned by refs_resolve_ref_unsafe, no need to
+			 * free(). */
+			const char *resolved = refs_resolve_ref_unsafe(
+				transaction->ref_store, u->refname, 0, &out_oid,
+				&out_flags);
+			struct reftable_ref_record ref = { NULL };
+			struct object_id peeled;
+			int peel_error = peel_object(&u->new_oid, &peeled);
+
+			ref.ref_name =
+				(char *)(resolved ? resolved : u->refname);
+			log->ref_name = ref.ref_name;
+
+			if (!is_null_oid(&u->new_oid)) {
+				ref.value = u->new_oid.hash;
+			}
+			ref.update_index = ts;
+			if (!peel_error) {
+				ref.target_value = peeled.hash;
+			}
+
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+exit:
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_commit(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_add(refs->stack, &write_transaction_table,
+				 transaction);
+	if (err < 0) {
+		strbuf_addf(errmsg, "reftable: transaction failure: %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+
+	return 0;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *err)
+{
+	return reftable_transaction_commit(ref_store, transaction, err);
+}
+
+struct write_pseudoref_arg {
+	struct reftable_stack *stack;
+	const char *pseudoref;
+	const struct object_id *new_oid;
+	const struct object_id *old_oid;
+};
+
+static int write_pseudoref_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_pseudoref_arg *arg = (struct write_pseudoref_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	struct reftable_ref_record read_ref = { NULL };
+	struct reftable_ref_record write_ref = { NULL };
+
+	reftable_writer_set_limits(writer, ts, ts);
+	if (arg->old_oid) {
+		struct object_id read_oid;
+		err = reftable_stack_read_ref(arg->stack, arg->pseudoref,
+					      &read_ref);
+		if (err < 0)
+			goto done;
+
+		if ((err > 0) != is_null_oid(arg->old_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		/* XXX If old_oid is set, and we have a symref? */
+
+		if (err == 0 && read_ref.value == NULL) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		hashcpy(read_oid.hash, read_ref.value);
+		if (!oideq(arg->old_oid, &read_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+	}
+
+	write_ref.ref_name = (char *)arg->pseudoref;
+	write_ref.update_index = ts;
+	if (!is_null_oid(arg->new_oid))
+		write_ref.value = (uint8_t *)arg->new_oid->hash;
+
+	err = reftable_writer_add_ref(writer, &write_ref);
+done:
+	reftable_ref_record_clear(&read_ref);
+	return err;
+}
+
+static int reftable_write_pseudoref(struct ref_store *ref_store,
+				    const char *pseudoref,
+				    const struct object_id *oid,
+				    const struct object_id *old_oid,
+				    struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_pseudoref_arg arg = {
+		.stack = refs->stack,
+		.pseudoref = pseudoref,
+		.new_oid = oid,
+	};
+	struct reftable_addition *add = NULL;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_new_addition(&add, refs->stack);
+	if (err) {
+		goto done;
+	}
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		/* XXX this is cut & paste from files-backend - should factor
+		 * out? */
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(errbuf,
+					    _("could not read ref '%s'"),
+					    pseudoref);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(errbuf, _("ref '%s' already exists"),
+				    pseudoref);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(errbuf,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			goto done;
+		}
+	}
+
+	err = reftable_addition_add(add, &write_pseudoref_table, &arg);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref update failure: %s",
+			    reftable_error_str(err));
+	}
+
+	err = reftable_addition_commit(add);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref commit failure: %s",
+			    reftable_error_str(err));
+	}
+
+done:
+	reftable_addition_destroy(add);
+	return err;
+}
+
+static int reftable_delete_pseudoref(struct ref_store *ref_store,
+				     const char *pseudoref,
+				     const struct object_id *old_oid)
+{
+	struct strbuf errbuf = STRBUF_INIT;
+	int ret = reftable_write_pseudoref(ref_store, pseudoref, &null_oid,
+					   old_oid, &errbuf);
+	/* XXX what to do with the error message? */
+	strbuf_release(&errbuf);
+	return ret;
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	int i = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = { NULL };
+		struct reftable_ref_record current = { NULL };
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.ref_name,
+					    &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
+done:
+	return err;
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_log_record log = { NULL };
+		struct object_id new_oid;
+		struct object_id old_oid;
+		struct reftable_ref_record current = { NULL };
+		reftable_stack_read_ref(create->refs->stack, create->refname,
+					&current);
+
+		fill_reftable_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			reftable_writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(refs->stack, &write_create_symref_table, &arg);
+done:
+	return err;
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = { NULL };
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto exit;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto exit;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = { { NULL } };
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = { { NULL } };
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto exit;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+exit:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_add(refs->stack, &write_rename_table, &arg);
+done:
+	return err;
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			/* we want the refnames that we have reflogs for, so we
+			 * skip if we've already produced this name. This could
+			 * be faster by seeking directly to
+			 * reflog@update_index==0.
+			 */
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(refs->stack);
+	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = { NULL };
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		err = reftable_iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid;
+			struct object_id new_oid;
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+	int i = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = { NULL };
+		err = reftable_iterator_next_log(it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct reftable_log_record log = { NULL };
+	struct reftable_iterator it = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		struct object_id ooid;
+		struct object_id noid;
+
+		int err = reftable_iterator_next_log(it, &log);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_read_ref(refs->stack, refname, &ref);
+	if (err > 0) {
+		errno = ENOENT;
+		err = -1;
+		goto exit;
+	}
+	if (err < 0) {
+		errno = reftable_error_to_errno(err);
+		err = -1;
+		goto exit;
+	}
+	if (ref.target != NULL) {
+		/* XXX recurse? */
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else if (ref.value != NULL) {
+		hashcpy(oid->hash, ref.value);
+	} else {
+		*type |= REF_ISBROKEN;
+		errno = EINVAL;
+		err = -1;
+	}
+exit:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_write_pseudoref,
+	reftable_delete_pseudoref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire
+};
diff --git a/repository.c b/repository.c
index 6f7f6f002b1..087760bc184 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 6534fbb7b31..f57b73f4a27 100644
--- a/repository.h
+++ b/repository.h
@@ -74,6 +74,9 @@ struct repository {
 	 */
 	struct ref_store *refs_private;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 65fe5ecefbe..2ef970f9f88 100644
--- a/setup.c
+++ b/setup.c
@@ -468,9 +468,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -559,6 +561,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1204,8 +1207,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 00000000000..c476873ef16
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,120 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+INVALID_SHA1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+initialize ()  {
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	mv .git/hooks .git/hooks-disabled
+}
+
+test_expect_success 'delete ref' '
+	initialize &&
+	test_commit file &&
+	SHA=$(git show-ref -s --verify HEAD) &&
+	test_write_lines "$SHA refs/heads/master" "$SHA refs/tags/file" >expect &&
+	git show-ref > actual &&
+	! git update-ref -d refs/tags/file $INVALID_SHA1 &&
+	test_cmp expect actual &&
+	git update-ref -d refs/tags/file $SHA  &&
+	test_write_lines "$SHA refs/heads/master" >expect &&
+	git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basic operation of reftable storage: commit, reflog, repack' '
+	initialize &&
+	test_commit file &&
+	test_write_lines refs/heads/master refs/tags/file >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect &&
+	for count in $(test_seq 1 10)
+	do
+		test_commit "number $count" file.t $count number-$count ||
+	        return 1
+	done &&
+	git pack-refs &&
+	ls -1 .git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 11 output &&
+	grep "commit (initial): file" output &&
+	grep "commit: number 10" output &&
+	git gc &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 0 output 
+'
+
+# This matches show-ref's output
+print_ref() {
+	echo "$(git rev-parse "$1") $1"
+}
+
+test_expect_success 'peeled tags are stored' '
+	initialize &&
+	test_commit file &&
+	git tag -m "annotated tag" test_tag HEAD &&
+	{
+		print_ref "refs/heads/master" &&
+		print_ref "refs/tags/file" &&
+		print_ref "refs/tags/test_tag" &&
+		print_ref "refs/tags/test_tag^{}" 
+	} >expect &&
+	git show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'show-ref works on fresh repo' '
+	initialize &&
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	>expect &&
+	! git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'checkout unborn branch' '
+	initialize &&
+	git checkout -b master
+'
+
+
+test_expect_success 'dir/file conflict' '
+	initialize &&
+	test_commit file &&
+	! git branch master/forbidden
+'
+
+
+test_expect_success 'do not clobber existing repo' '
+	rm -rf .git &&
+	git init --ref-storage=files &&
+	cat .git/HEAD > expect &&
+	test_commit file &&
+	(git init --ref-storage=reftable || true) &&
+	cat .git/HEAD > actual &&
+	test_cmp expect actual
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'pseudo refs' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+	git cherry-pick source &&
+	test -f file2
+'
+
+test_done
+
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v13 12/13] vcxproj: adjust for the reftable changes
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (10 preceding siblings ...)
  2020-05-11 19:46                         ` [PATCH v13 11/13] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-05-11 19:46                         ` Johannes Schindelin via GitGitGadget
  2020-05-11 19:46                         ` [PATCH v13 13/13] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
                                           ` (2 subsequent siblings)
  14 siblings, 0 replies; 409+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This allows Git to be compiled via Visual Studio again after integrating
the `hn/reftable` branch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.mak.uname                           |  2 +-
 contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index 5ad43c80b1a..484aec15ac2 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -710,7 +710,7 @@ vcxproj:
 	# Make .vcxproj files and add them
 	unset QUIET_GEN QUIET_BUILT_IN; \
 	perl contrib/buildsystems/generate -g Vcxproj
-	git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
+	git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
 
 	# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
 	(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
index 5c666f9ac03..33a08d31652 100644
--- a/contrib/buildsystems/Generators/Vcxproj.pm
+++ b/contrib/buildsystems/Generators/Vcxproj.pm
@@ -77,7 +77,7 @@ sub createProject {
     my $libs_release = "\n    ";
     my $libs_debug = "\n    ";
     if (!$static_library) {
-      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
+      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
       $libs_debug = $libs_release;
       $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
       $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
@@ -231,6 +231,7 @@ sub createProject {
 EOM
     if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
       my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
+      my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
       my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
 
       print F << "EOM";
@@ -240,6 +241,14 @@ sub createProject {
       <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
     </ProjectReference>
 EOM
+      if (!($name =~ /xdiff|libreftable/)) {
+        print F << "EOM";
+    <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
+      <Project>$uuid_libreftable</Project>
+      <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
+    </ProjectReference>
+EOM
+      }
       if (!($name =~ 'xdiff')) {
         print F << "EOM";
     <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v13 13/13] Add some reftable testing infrastructure
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (11 preceding siblings ...)
  2020-05-11 19:46                         ` [PATCH v13 12/13] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
@ 2020-05-11 19:46                         ` Han-Wen Nienhuys via GitGitGadget
  2020-05-13 19:57                           ` Junio C Hamano
  2020-05-12  0:41                         ` [PATCH v13 00/13] Reftable support git-core Junio C Hamano
  2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
  14 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-11 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* Add GIT_TEST_REFTABLE environment var to control default ref storage

* Add test_prerequisite REFTABLE. Skip t/t3210-pack-refs.sh for REFTABLE.

Major test failures:

 * t9903-bash-prompt - The bash mode reads .git/HEAD directly
 * t1400-update-ref.sh - Reads from .git/{refs,logs} directly
 * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
 * t1405 - inspecs .git/ directly.
 * t1450-fsck.sh - manipulates .git/ directly to create invalid state
 * Rebase, cherry-pick: pseudo refs aren't written through the refs backend.

Other tests by decreasing brokenness:

t1407-worktree-ref-store.sh              - 5 of 5
t1413-reflog-detach.sh                   - 7 of 7
t1415-worktree-refs.sh                   - 11 of 11
t3908-stash-in-worktree.sh               - 2 of 2
t4207-log-decoration-colors.sh           - 2 of 2
t5515-fetch-merge-logic.sh               - 17 of 17
t5900-repo-selection.sh                  - 8 of 8
t6016-rev-list-graph-simplify-history.sh - 12 of 12
t5573-pull-verify-signatures.sh          - 15 of 16
t5612-clone-refspec.sh                   - 12 of 13
t5514-fetch-multiple.sh                  - 11 of 12
t6030-bisect-porcelain.sh                - 64 of 71
t5533-push-cas.sh                        - 15 of 17
t5539-fetch-http-shallow.sh              - 7 of 8
t7413-submodule-is-active.sh             - 7 of 8
t2400-worktree-add.sh                    - 59 of 69
t0100-previous.sh                        - 5 of 6
t7419-submodule-set-branch.sh            - 5 of 6
t1404-update-ref-errors.sh               - 44 of 53
t6003-rev-list-topo-order.sh             - 29 of 35
t1409-avoid-packing-refs.sh              - 9 of 11
t5541-http-push-smart.sh                 - 31 of 38
t5407-post-rewrite-hook.sh               - 13 of 16
t9903-bash-prompt.sh                     - 52 of 66
t1414-reflog-walk.sh                     - 9 of 12
t1507-rev-parse-upstream.sh              - 21 of 28
t2404-worktree-config.sh                 - 9 of 12
t1505-rev-parse-last.sh                  - 5 of 7
t7510-signed-commit.sh                   - 16 of 23
t2018-checkout-branch.sh                 - 15 of 22
(..etc)



Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/clone.c               | 2 +-
 builtin/init-db.c             | 2 +-
 refs.c                        | 6 +++---
 t/t1409-avoid-packing-refs.sh | 6 ++++++
 t/t3210-pack-refs.sh          | 6 ++++++
 t/test-lib.sh                 | 5 +++++
 6 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4d0cf065e4a..780c5807415 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1109,7 +1109,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
-		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
+		default_ref_storage(), INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index b7053b9e370..da5b4670c84 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -545,7 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
-	const char *ref_storage_format = DEFAULT_REF_STORAGE;
+	const char *ref_storage_format = default_ref_storage();
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
diff --git a/refs.c b/refs.c
index e8751415a9e..70c11b05391 100644
--- a/refs.c
+++ b/refs.c
@@ -1742,7 +1742,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	r->refs_private = ref_store_init(r->gitdir,
 					 r->ref_storage_format ?
 						 r->ref_storage_format :
-						 DEFAULT_REF_STORAGE,
+						 default_ref_storage(),
 					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
@@ -1798,7 +1798,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
+	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1812,7 +1812,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
-	const char *format = DEFAULT_REF_STORAGE; /* XXX */
+	const char *format = default_ref_storage();
 	struct ref_store *refs;
 	const char *id;
 
diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
index be12fb63506..c6f78325563 100755
--- a/t/t1409-avoid-packing-refs.sh
+++ b/t/t1409-avoid-packing-refs.sh
@@ -4,6 +4,12 @@ test_description='avoid rewriting packed-refs unnecessarily'
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 # Add an identifying mark to the packed-refs file header line. This
 # shouldn't upset readers, and it should be omitted if the file is
 # ever rewritten.
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index f41b2afb996..edaef2c175a 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -11,6 +11,12 @@ semantic is still the same.
 '
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success 'enable reflogs' '
 	git config core.logallrefupdates true
 '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index baf94546da1..21740e2c13d 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1516,6 +1516,11 @@ FreeBSD)
 	;;
 esac
 
+if test -n "$GIT_TEST_REFTABLE"
+then
+  test_set_prereq REFTABLE
+fi
+
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 00/13] Reftable support git-core
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (12 preceding siblings ...)
  2020-05-11 19:46                         ` [PATCH v13 13/13] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-05-12  0:41                         ` Junio C Hamano
  2020-05-12  7:49                           ` Han-Wen Nienhuys
  2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
  14 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-05-12  0:41 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget
  Cc: git, Han-Wen Nienhuys, Jeff King, Jonathan Nieder,
	brian m. carlson, Johannes Schindelin, Michael Haggerty,
	Patrick Steinhardt

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Han-Wen Nienhuys (11):
>   refs.h: clarify reflog iteration order
>   t: use update-ref and show-ref to reading/writing refs
>   refs: document how ref_iterator_advance_fn should handle symrefs
>   reftable: clarify how empty tables should be written
>   reftable: define version 2 of the spec to accomodate SHA256
>   Write pseudorefs through ref backends.
>   Iterate over the "refs/" namespace in for_each_[raw]ref
>   Add .gitattributes for the reftable/ directory
>   Add reftable library
>   Reftable support for git-core
>   Add some reftable testing infrastructure
>
> Johannes Schindelin (1):
>   vcxproj: adjust for the reftable changes
>
> Jonathan Nieder (1):
>   reftable: file format documentation

I often wish we had something better than shortlog when writing
a cover letter for a multi-author series like this.

Anyway.

As I already said earlier today [*1*], I think some parts of the
series can be reviewed before the "dense code" part, so let's draw
the line between patch 8 and 9 (or patch 7 and 8) and try to see if
we can get the parts before the line reviewed adequately and merge
them to at lesat 'next' during this cycle, which will close in 3
weeks.

  1/13 refs.h: clarify reflog iteration order
  2/13 t: use update-ref and show-ref to reading/writing refs
  3/13 refs: document how ref_iterator_advance_fn should handle symrefs
  4/13 reftable: file format documentation
  5/13 reftable: clarify how empty tables should be written
  6/13 reftable: define version 2 of the spec to accomodate SHA256
  7/13 Write pseudorefs through ref backends.
  8/13 Iterate over the "refs/" namespace in for_each_[raw]ref
  9/13 Add .gitattributes for the reftable/ directory
 10/13 Add reftable library
 11/13 Reftable support for git-core
 12/13 vcxproj: adjust for the reftable changes
 13/13 Add some reftable testing infrastructure

Thanks.

(ps) I took the Cc: list from a message by JNeider earlier today.



[Reference]

*1* https://lore.kernel.org/git/xmqqwo5isf09.fsf@gitster.c.googlers.com/

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 00/13] Reftable support git-core
  2020-05-12  0:41                         ` [PATCH v13 00/13] Reftable support git-core Junio C Hamano
@ 2020-05-12  7:49                           ` Han-Wen Nienhuys
  2020-05-13 21:21                             ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-12  7:49 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Jeff King, Jonathan Nieder, brian m. carlson,
	Johannes Schindelin, Michael Haggerty, Patrick Steinhardt

On Tue, May 12, 2020 at 2:42 AM Junio C Hamano <gitster@pobox.com> wrote:
> As I already said earlier today [*1*], I think some parts of the
> series can be reviewed before the "dense code" part, so let's draw
> the line between patch 8 and 9 (or patch 7 and 8) and try to see if
> we can get the parts before the line reviewed adequately and merge
> them to at lesat 'next' during this cycle, which will close in 3
> weeks.

I concocted 7 (write pseudorefs) yesterday in an hour or so. It looks
like it passes tests, but I think it could do with more scrutiny. I
propose to merge 1-6, and leave 7-8 until I figure out how pseudorefs
and worktree refs really work.

>   1/13 refs.h: clarify reflog iteration order
>   2/13 t: use update-ref and show-ref to reading/writing refs
>   3/13 refs: document how ref_iterator_advance_fn should handle symrefs
>   4/13 reftable: file format documentation
>   5/13 reftable: clarify how empty tables should be written
>   6/13 reftable: define version 2 of the spec to accomodate SHA256
>   7/13 Write pseudorefs through ref backends.
>   8/13 Iterate over the "refs/" namespace in for_each_[raw]ref
>   9/13 Add .gitattributes for the reftable/ directory
>  10/13 Add reftable library
>  11/13 Reftable support for git-core
>  12/13 vcxproj: adjust for the reftable changes
>  13/13 Add some reftable testing infrastructure

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 07/13] Write pseudorefs through ref backends.
  2020-05-11 19:46                         ` [PATCH v13 07/13] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
@ 2020-05-12 10:22                           ` Phillip Wood
  2020-05-12 16:48                             ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Phillip Wood @ 2020-05-12 10:22 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget, git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

Hi Han-Wen

Thanks for working on the reftable implementation. I've had a look at
this patch but bear in mind I'm not familiar with the refs code

On 11/05/2020 20:46, Han-Wen Nienhuys via GitGitGadget wrote:
> From: Han-Wen Nienhuys <hanwen@google.com>
> 
> Pseudorefs store transient data in in the repository. Examples are HEAD,
> CHERRY_PICK_HEAD, etc.
> 
> These refs have always been read through the ref backends, but they were written
> in a one-off routine that wrote a object ID or symref directly wrote into
> .git/<pseudo_ref_name>.
> 
> This causes problems when introducing a new ref storage backend. To remedy this,
> extend the ref backend implementation with a write_pseudoref_fn and
> update_pseudoref_fn.
> 
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  refs.c                | 115 +++++------------------------
>  refs.h                |  11 +++
>  refs/files-backend.c  | 164 +++++++++++++++++++++++++++++++++++-------
>  refs/packed-backend.c |  40 ++++-------
>  refs/refs-internal.h  |  12 ++++
>  5 files changed, 192 insertions(+), 150 deletions(-)
> 
> diff --git a/refs.c b/refs.c
> index 224ff66c7bb..fc72211a36f 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -739,101 +739,6 @@ long get_files_ref_lock_timeout_ms(void)
>  	return timeout_ms;
>  }
>  
> -static int write_pseudoref(const char *pseudoref, const struct object_id *oid,
> -			   const struct object_id *old_oid, struct strbuf *err)
> -{
> -	const char *filename;
> -	int fd;
> -	struct lock_file lock = LOCK_INIT;
> -	struct strbuf buf = STRBUF_INIT;
> -	int ret = -1;
> -
> -	if (!oid)
> -		return 0;
> -
> -	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
> -
> -	filename = git_path("%s", pseudoref);
> -	fd = hold_lock_file_for_update_timeout(&lock, filename, 0,
> -					       get_files_ref_lock_timeout_ms());
> -	if (fd < 0) {
> -		strbuf_addf(err, _("could not open '%s' for writing: %s"),
> -			    filename, strerror(errno));
> -		goto done;
> -	}
> -
> -	if (old_oid) {
> -		struct object_id actual_old_oid;
> -
> -		if (read_ref(pseudoref, &actual_old_oid)) {
> -			if (!is_null_oid(old_oid)) {
> -				strbuf_addf(err, _("could not read ref '%s'"),
> -					    pseudoref);
> -				rollback_lock_file(&lock);
> -				goto done;
> -			}
> -		} else if (is_null_oid(old_oid)) {
> -			strbuf_addf(err, _("ref '%s' already exists"),
> -				    pseudoref);
> -			rollback_lock_file(&lock);
> -			goto done;
> -		} else if (!oideq(&actual_old_oid, old_oid)) {
> -			strbuf_addf(err, _("unexpected object ID when writing '%s'"),
> -				    pseudoref);
> -			rollback_lock_file(&lock);
> -			goto done;
> -		}
> -	}
> -
> -	if (write_in_full(fd, buf.buf, buf.len) < 0) {
> -		strbuf_addf(err, _("could not write to '%s'"), filename);
> -		rollback_lock_file(&lock);
> -		goto done;
> -	}
> -
> -	commit_lock_file(&lock);
> -	ret = 0;
> -done:
> -	strbuf_release(&buf);
> -	return ret;
> -}
> -
> -static int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
> -{
> -	const char *filename;
> -
> -	filename = git_path("%s", pseudoref);
> -
> -	if (old_oid && !is_null_oid(old_oid)) {
> -		struct lock_file lock = LOCK_INIT;
> -		int fd;
> -		struct object_id actual_old_oid;
> -
> -		fd = hold_lock_file_for_update_timeout(
> -				&lock, filename, 0,
> -				get_files_ref_lock_timeout_ms());
> -		if (fd < 0) {
> -			error_errno(_("could not open '%s' for writing"),
> -				    filename);
> -			return -1;
> -		}
> -		if (read_ref(pseudoref, &actual_old_oid))
> -			die(_("could not read ref '%s'"), pseudoref);
> -		if (!oideq(&actual_old_oid, old_oid)) {
> -			error(_("unexpected object ID when deleting '%s'"),
> -			      pseudoref);
> -			rollback_lock_file(&lock);
> -			return -1;
> -		}
> -
> -		unlink(filename);
> -		rollback_lock_file(&lock);
> -	} else {
> -		unlink(filename);
> -	}
> -
> -	return 0;
> -}
>  
>  int refs_delete_ref(struct ref_store *refs, const char *msg,
>  		    const char *refname,
> @@ -844,8 +749,7 @@ int refs_delete_ref(struct ref_store *refs, const char *msg,
>  	struct strbuf err = STRBUF_INIT;
>  
>  	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
> -		assert(refs == get_main_ref_store(the_repository));

I don't think you meant to delete this line, the equivalent line is left
untouched in the next hunk and there are comments in refs.h saying this
should only me called on the main repository

> -		return delete_pseudoref(refname, old_oid);
> +		return ref_store_delete_pseudoref(refs, refname, old_oid);
>  	}
>  
>  	transaction = ref_store_transaction_begin(refs, &err);
> @@ -1172,7 +1076,8 @@ int refs_update_ref(struct ref_store *refs, const char *msg,
>  
>  	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
>  		assert(refs == get_main_ref_store(the_repository));
> -		ret = write_pseudoref(refname, new_oid, old_oid, &err);
> +		ret = ref_store_write_pseudoref(refs, refname, new_oid, old_oid,
> +						&err);
>  	} else {
>  		t = ref_store_transaction_begin(refs, &err);
>  		if (!t ||
> @@ -1441,6 +1346,20 @@ int head_ref(each_ref_fn fn, void *cb_data)
>  	return refs_head_ref(get_main_ref_store(the_repository), fn, cb_data);
>  }
>  
> +int ref_store_write_pseudoref(struct ref_store *refs, const char *pseudoref,
> +			      const struct object_id *oid,
> +			      const struct object_id *old_oid,
> +			      struct strbuf *err)
> +{
> +	return refs->be->write_pseudoref(refs, pseudoref, oid, old_oid, err);
> +}
> +
> +int ref_store_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
> +			       const struct object_id *old_oid)
> +{
> +	return refs->be->delete_pseudoref(refs, pseudoref, old_oid);
> +}
> +
>  struct ref_iterator *refs_ref_iterator_begin(
>  		struct ref_store *refs,
>  		const char *prefix, int trim, int flags)
> diff --git a/refs.h b/refs.h
> index 99ba9e331e5..d1d9361441b 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -728,6 +728,17 @@ int update_ref(const char *msg, const char *refname,
>  	       const struct object_id *new_oid, const struct object_id *old_oid,
>  	       unsigned int flags, enum action_on_err onerr);
>  
> +/* Pseudorefs (eg. HEAD, CHERRY_PICK_HEAD) have a separate routines for updating
> +   and deletion as they cannot take part in normal transactional updates.
> +   Pseudorefs should only be written for the main repository.
> +*/
> +int ref_store_write_pseudoref(struct ref_store *refs, const char *pseudoref,
> +			      const struct object_id *oid,
> +			      const struct object_id *old_oid,
> +			      struct strbuf *err);
> +int ref_store_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
> +			       const struct object_id *old_oid);
> +
>  int parse_hide_refs_config(const char *var, const char *value, const char *);
>  
>  /*
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 6516c7bc8c8..53b891190be 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -731,6 +731,115 @@ static int lock_raw_ref(struct files_ref_store *refs,
>  	return ret;
>  }
>  
> +static int files_write_pseudoref(struct ref_store *ref_store,
> +				 const char *pseudoref,
> +				 const struct object_id *oid,
> +				 const struct object_id *old_oid,
> +				 struct strbuf *err)
> +{
> +	struct files_ref_store *refs =
> +		files_downcast(ref_store, REF_STORE_READ, "write_pseudoref");
> +	int fd;
> +	struct lock_file lock = LOCK_INIT;
> +	struct strbuf filename = STRBUF_INIT;
> +	struct strbuf buf = STRBUF_INIT;
> +	int ret = -1;
> +
> +	if (!oid)
> +		return 0;
> +
> +	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);

The original used git_path(), this change looks correct to an uninformed
reviewer. As far as I can see the only changes to this function are to
accommodate the strbuf which is cleanup up correctly.

> +	fd = hold_lock_file_for_update_timeout(&lock, filename.buf, 0,
> +					       get_files_ref_lock_timeout_ms());
> +	if (fd < 0) {
> +		strbuf_addf(err, _("could not open '%s' for writing: %s"),
> +			    buf.buf, strerror(errno));
> +		goto done;
> +	}
> +
> +	if (old_oid) {
> +		struct object_id actual_old_oid;
> +
> +		if (read_ref(pseudoref, &actual_old_oid)) {
> +			if (!is_null_oid(old_oid)) {
> +				strbuf_addf(err, _("could not read ref '%s'"),
> +					    pseudoref);
> +				rollback_lock_file(&lock);
> +				goto done;
> +			}
> +		} else if (is_null_oid(old_oid)) {
> +			strbuf_addf(err, _("ref '%s' already exists"),
> +				    pseudoref);
> +			rollback_lock_file(&lock);
> +			goto done;
> +		} else if (!oideq(&actual_old_oid, old_oid)) {
> +			strbuf_addf(err,
> +				    _("unexpected object ID when writing '%s'"),
> +				    pseudoref);
> +			rollback_lock_file(&lock);
> +			goto done;
> +		}
> +	}
> +
> +	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
> +	if (write_in_full(fd, buf.buf, buf.len) < 0) {
> +		strbuf_addf(err, _("could not write to '%s'"), filename.buf);
> +		rollback_lock_file(&lock);
> +		goto done;
> +	}
> +
> +	commit_lock_file(&lock);
> +	ret = 0;
> +done:
> +	strbuf_release(&buf);
> +	strbuf_release(&filename);
> +	return ret;
> +}
> +
> +static int files_delete_pseudoref(struct ref_store *ref_store,
> +				  const char *pseudoref,
> +				  const struct object_id *old_oid)
> +{
> +	struct files_ref_store *refs =
> +		files_downcast(ref_store, REF_STORE_READ, "delete_pseudoref");
> +	struct strbuf filename = STRBUF_INIT;
> +	int ret = -1;
> +
> +	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);

Same comment as above, the memory management looks good.

> +
> +	if (old_oid && !is_null_oid(old_oid)) {
> +		struct lock_file lock = LOCK_INIT;
> +		int fd;
> +		struct object_id actual_old_oid;
> +
> +		fd = hold_lock_file_for_update_timeout(
> +			&lock, filename.buf, 0,
> +			get_files_ref_lock_timeout_ms());
> +		if (fd < 0) {
> +			error_errno(_("could not open '%s' for writing"),
> +				    filename.buf);
> +			goto done;
> +		}
> +		if (read_ref(pseudoref, &actual_old_oid))
> +			die(_("could not read ref '%s'"), pseudoref);
> +		if (!oideq(&actual_old_oid, old_oid)) {
> +			error(_("unexpected object ID when deleting '%s'"),
> +			      pseudoref);
> +			rollback_lock_file(&lock);
> +			goto done;
> +		}
> +
> +		unlink(filename.buf);
> +		rollback_lock_file(&lock);
> +	} else {
> +		unlink(filename.buf);
> +	}
> +	ret = 0;
> +done:
> +	strbuf_release(&filename);
> +	return ret;
> +}
> +
>  struct files_ref_iterator {
>  	struct ref_iterator base;
>  
> @@ -3173,30 +3282,31 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
>  	return 0;
>  }
>  
> -struct ref_storage_be refs_be_files = {
> -	NULL,
> -	"files",
> -	files_ref_store_create,
> -	files_init_db,
> -	files_transaction_prepare,
> -	files_transaction_finish,
> -	files_transaction_abort,
> -	files_initial_transaction_commit,
> -
> -	files_pack_refs,
> -	files_create_symref,
> -	files_delete_refs,
> -	files_rename_ref,
> -	files_copy_ref,
> -
> -	files_ref_iterator_begin,
> -	files_read_raw_ref,
> -
> -	files_reflog_iterator_begin,
> -	files_for_each_reflog_ent,
> -	files_for_each_reflog_ent_reverse,
> -	files_reflog_exists,
> -	files_create_reflog,
> -	files_delete_reflog,
> -	files_reflog_expire
> -};
> +struct ref_storage_be refs_be_files = { NULL,
> +					"files",
> +					files_ref_store_create,
> +					files_init_db,
> +					files_transaction_prepare,
> +					files_transaction_finish,
> +					files_transaction_abort,
> +					files_initial_transaction_commit,
> +
> +					files_pack_refs,
> +					files_create_symref,
> +					files_delete_refs,
> +					files_rename_ref,
> +					files_copy_ref,
> +
> +					files_write_pseudoref,
> +					files_delete_pseudoref,
> +
> +					files_ref_iterator_begin,
> +					files_read_raw_ref,
> +
> +					files_reflog_iterator_begin,
> +					files_for_each_reflog_ent,
> +					files_for_each_reflog_ent_reverse,
> +					files_reflog_exists,
> +					files_create_reflog,
> +					files_delete_reflog,
> +					files_reflog_expire };

The formatting has gone haywire

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 4458a0f69cc..8f9b85f0b0c 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1641,29 +1641,19 @@ static int packed_reflog_expire(struct ref_store *ref_store,
>  }
>  
>  struct ref_storage_be refs_be_packed = {
> -	NULL,
> -	"packed",
> -	packed_ref_store_create,
> -	packed_init_db,
> -	packed_transaction_prepare,
> -	packed_transaction_finish,
> -	packed_transaction_abort,
> -	packed_initial_transaction_commit,
> -
> -	packed_pack_refs,
> -	packed_create_symref,
> -	packed_delete_refs,
> -	packed_rename_ref,
> -	packed_copy_ref,
> -
> -	packed_ref_iterator_begin,
> -	packed_read_raw_ref,
> -
> -	packed_reflog_iterator_begin,
> -	packed_for_each_reflog_ent,
> -	packed_for_each_reflog_ent_reverse,
> -	packed_reflog_exists,
> -	packed_create_reflog,
> -	packed_delete_reflog,
> -	packed_reflog_expire
> +	NULL, "packed", packed_ref_store_create, packed_init_db,
> +	packed_transaction_prepare, packed_transaction_finish,
> +	packed_transaction_abort, packed_initial_transaction_commit,
> +
> +	packed_pack_refs, packed_create_symref, packed_delete_refs,
> +	packed_rename_ref, packed_copy_ref,
> +
> +	/* XXX */
> +	NULL, NULL,

Should the wrappers above that invoke these virtual functions check they
are non-null before dereferencing them? It would be better to die with
BUG() than segfault.

> +
> +	packed_ref_iterator_begin, packed_read_raw_ref,
> +
> +	packed_reflog_iterator_begin, packed_for_each_reflog_ent,
> +	packed_for_each_reflog_ent_reverse, packed_reflog_exists,
> +	packed_create_reflog, packed_delete_reflog, packed_reflog_expire

Formatting again

I think this patch basically works but I'm concerned by the potential
NULL pointer dereference. While it's unfair to judge a patch by it's
formatting the changes to the formatting of existing code and the
dropped assertion rightly or wrongly gave me the impression lack of
attention which does make me concerned that there are other more serious
unintentional changes in the rest of the series.

Best Wishes

Phillip

>  };
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index 3490aac3a40..dabe18baea1 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -549,6 +549,15 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
>  			  const char *oldref, const char *newref,
>  			  const char *logmsg);
>  
> +typedef int write_pseudoref_fn(struct ref_store *ref_store,
> +			       const char *pseudoref,
> +			       const struct object_id *oid,
> +			       const struct object_id *old_oid,
> +			       struct strbuf *err);
> +typedef int delete_pseudoref_fn(struct ref_store *ref_store,
> +				const char *pseudoref,
> +				const struct object_id *old_oid);
> +
>  /*
>   * Iterate over the references in `ref_store` whose names start with
>   * `prefix`. `prefix` is matched as a literal string, without regard
> @@ -648,6 +657,9 @@ struct ref_storage_be {
>  	rename_ref_fn *rename_ref;
>  	copy_ref_fn *copy_ref;
>  
> +	write_pseudoref_fn *write_pseudoref;
> +	delete_pseudoref_fn *delete_pseudoref;
> +
>  	ref_iterator_begin_fn *iterator_begin;
>  	read_raw_ref_fn *read_raw_ref;
>  
> 


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 07/13] Write pseudorefs through ref backends.
  2020-05-12 10:22                           ` Phillip Wood
@ 2020-05-12 16:48                             ` Han-Wen Nienhuys
  2020-05-13 10:06                               ` Phillip Wood
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-12 16:48 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, May 12, 2020 at 12:22 PM Phillip Wood <phillip.wood123@gmail.com> wrote:
> >       if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
> > -             assert(refs == get_main_ref_store(the_repository));
>
> I don't think you meant to delete this line, the equivalent line is left
> untouched in the next hunk and there are comments in refs.h saying this
> should only me called on the main repository

yep. - Whoops. Fixed.

> > -struct ref_storage_be refs_be_files = {
> > -     NULL,
> > -     "files",
> > +struct ref_storage_be refs_be_files = { NULL,
> > +                                     "files",
> > +                                     files_ref_store_create,

> The formatting has gone haywire

This was clang-format's automatic reformatting. I've looked a bit
closer, and it was reformatting because the initializer list was
missing the ',' on the final entry. I've added that now.

> > +     NULL, NULL,
>
> Should the wrappers above that invoke these virtual functions check they
> are non-null before dereferencing them? It would be better to die with
> BUG() than segfault.

Done.

> I think this patch basically works but I'm concerned by the potential
> NULL pointer dereference. While it's unfair to judge a patch by it's
> formatting the changes to the formatting of existing code and the
> dropped assertion rightly or wrongly gave me the impression lack of
> attention which does make me concerned that there are other more serious
> unintentional changes in the rest of the series.

I prefer leaving formatting up to automated tooling. They're better at
following mechanical rules precisely.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 07/13] Write pseudorefs through ref backends.
  2020-05-12 16:48                             ` Han-Wen Nienhuys
@ 2020-05-13 10:06                               ` Phillip Wood
  2020-05-13 18:10                                 ` Phillip Wood
  0 siblings, 1 reply; 409+ messages in thread
From: Phillip Wood @ 2020-05-13 10:06 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On 12/05/2020 17:48, Han-Wen Nienhuys wrote:
> On Tue, May 12, 2020 at 12:22 PM Phillip Wood <phillip.wood123@gmail.com> wrote:
>>>       if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
>>> -             assert(refs == get_main_ref_store(the_repository));
>>
>> I don't think you meant to delete this line, the equivalent line is left
>> untouched in the next hunk and there are comments in refs.h saying this
>> should only me called on the main repository
> 
> yep. - Whoops. Fixed.
> 
>>> -struct ref_storage_be refs_be_files = {
>>> -     NULL,
>>> -     "files",
>>> +struct ref_storage_be refs_be_files = { NULL,
>>> +                                     "files",
>>> +                                     files_ref_store_create,
> 
>> The formatting has gone haywire
> 
> This was clang-format's automatic reformatting. I've looked a bit
> closer, and it was reformatting because the initializer list was
> missing the ',' on the final entry. I've added that now.

It sounds like we need to look at the clang format rules we use, I think
the original code is correctly formatted in this case (we've only
allowed trailing commas relatively recently), it shouldn't need changing
(which clutters up the patch with unrelated changes) just to satisfy
clang-format.

>>> +     NULL, NULL,
>>
>> Should the wrappers above that invoke these virtual functions check they
>> are non-null before dereferencing them? It would be better to die with
>> BUG() than segfault.
> 
> Done.

Great, I did wonder after I had sent the email if it would be better to
implement the BUG() in the virtual functions in the packed-refs backend
to avoid the check overhead in the other backends but it's probably not
worth worrying about.

> 
>> I think this patch basically works but I'm concerned by the potential
>> NULL pointer dereference. While it's unfair to judge a patch by it's
>> formatting the changes to the formatting of existing code and the
>> dropped assertion rightly or wrongly gave me the impression lack of
>> attention which does make me concerned that there are other more serious
>> unintentional changes in the rest of the series.
> 
> I prefer leaving formatting up to automated tooling. They're better at
> following mechanical rules precisely.

Right but in this case the changes completely obscured the important
changes. What really raised a flag for we was that two definitions of
the same structure where reformatted in completely different ways - I do
use clang format but I also sanity check the results before submitting a
patch.

I was looking at our use of git_path_cherry_pick_head() in sequencer.c
(I mostly work on rebase) the other day, there are quite a few places we
rely on CHERRY_PICK_HEAD/REVERT being a file, I suspect we probably do
the same in wt-status.c and builtin/commit.c. The uses are generally
    file_exists(git_path_cherry_pick_head())
which I think we could just change to use get_oid()
We also unlink() it in several places which we could replace with
delete_ref() but we blindly call unlink() in some places so the ref
might not exist.

In the sequencer we always use the refs api when handling REBASE_HEAD, I
haven't got round to checking builtin/rebase.c and builtin/am.c for that
yet.

Is there a plan to handle FETCH_HEAD with reftable? git rev-parse will
parse the first oid in the file which is handy if you've only fetched a
single ref, but the file itself contains several lines like

2407200be2a37bfca57ea1a9474318822fec49b0		branch 'git-alias' of
ssh://pi/home/pi/git

It might be special cased in the refs code already I haven't checked

MERGE_HEAD also contains multiple lines for octopus merges but I'm not
sure if people really need to run rev-parse on that.

Apologies if the FETCH_HEAD and MERGE_HEAD points have already been
covered elsewhere, this thread has got quite long.

Best Wishes

Phillip

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 07/13] Write pseudorefs through ref backends.
  2020-05-13 10:06                               ` Phillip Wood
@ 2020-05-13 18:10                                 ` Phillip Wood
  0 siblings, 0 replies; 409+ messages in thread
From: Phillip Wood @ 2020-05-13 18:10 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On 13/05/2020 11:06, Phillip Wood wrote:
> On 12/05/2020 17:48, Han-Wen Nienhuys wrote:
[...]
>>>> -struct ref_storage_be refs_be_files = {
>>>> -     NULL,
>>>> -     "files",
>>>> +struct ref_storage_be refs_be_files = { NULL,
>>>> +                                     "files",
>>>> +                                     files_ref_store_create,
>>
>>> The formatting has gone haywire
>>
>> This was clang-format's automatic reformatting. I've looked a bit
>> closer, and it was reformatting because the initializer list was
>> missing the ',' on the final entry. I've added that now.
> 
> It sounds like we need to look at the clang format rules we use, I think
> the original code is correctly formatted in this case (we've only
> allowed trailing commas relatively recently), it shouldn't need changing
> (which clutters up the patch with unrelated changes) just to satisfy
> clang-format.

I spent some time trying to fix the clang-format rules this afternoon
and got nowhere. Searching the web revealed plenty of questions but no
useful answers beyond "add a trailing comma" and it's not even clear to
me if that works all time. It is a shame clang-format does not seem to
support a fairly common way of formatting structure initializers. That
the formatting changes so much based on the presence or absence of the
trailing comma is really confusing and unhelpful.

Best Wishes

Phillip

> 
>>>> +     NULL, NULL,
>>>
>>> Should the wrappers above that invoke these virtual functions check they
>>> are non-null before dereferencing them? It would be better to die with
>>> BUG() than segfault.
>>
>> Done.
> 
> Great, I did wonder after I had sent the email if it would be better to
> implement the BUG() in the virtual functions in the packed-refs backend
> to avoid the check overhead in the other backends but it's probably not
> worth worrying about.
> 
>>
>>> I think this patch basically works but I'm concerned by the potential
>>> NULL pointer dereference. While it's unfair to judge a patch by it's
>>> formatting the changes to the formatting of existing code and the
>>> dropped assertion rightly or wrongly gave me the impression lack of
>>> attention which does make me concerned that there are other more serious
>>> unintentional changes in the rest of the series.
>>
>> I prefer leaving formatting up to automated tooling. They're better at
>> following mechanical rules precisely.
> 
> Right but in this case the changes completely obscured the important
> changes. What really raised a flag for we was that two definitions of
> the same structure where reformatted in completely different ways - I do
> use clang format but I also sanity check the results before submitting a
> patch.
> 
> I was looking at our use of git_path_cherry_pick_head() in sequencer.c
> (I mostly work on rebase) the other day, there are quite a few places we
> rely on CHERRY_PICK_HEAD/REVERT being a file, I suspect we probably do
> the same in wt-status.c and builtin/commit.c. The uses are generally
>     file_exists(git_path_cherry_pick_head())
> which I think we could just change to use get_oid()
> We also unlink() it in several places which we could replace with
> delete_ref() but we blindly call unlink() in some places so the ref
> might not exist.
> 
> In the sequencer we always use the refs api when handling REBASE_HEAD, I
> haven't got round to checking builtin/rebase.c and builtin/am.c for that
> yet.
> 
> Is there a plan to handle FETCH_HEAD with reftable? git rev-parse will
> parse the first oid in the file which is handy if you've only fetched a
> single ref, but the file itself contains several lines like
> 
> 2407200be2a37bfca57ea1a9474318822fec49b0		branch 'git-alias' of
> ssh://pi/home/pi/git
> 
> It might be special cased in the refs code already I haven't checked
> 
> MERGE_HEAD also contains multiple lines for octopus merges but I'm not
> sure if people really need to run rev-parse on that.
> 
> Apologies if the FETCH_HEAD and MERGE_HEAD points have already been
> covered elsewhere, this thread has got quite long.
> 
> Best Wishes
> 
> Phillip
> 


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 11/13] Reftable support for git-core
  2020-05-11 19:46                         ` [PATCH v13 11/13] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-05-13 19:55                           ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-13 19:55 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/builtin/clone.c b/builtin/clone.c
> index cb48a291caf..4d0cf065e4a 100644
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -1108,7 +1108,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
>  		}
>  	}
>  
> -	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
> +	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
> +		DEFAULT_REF_STORAGE, INIT_DB_QUIET);

Where does this symbol come from?

In addition, this will be overwritten in a later step to
default_ref_storage(); perhaps in this step we should use that
instead from the beginning?


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 13/13] Add some reftable testing infrastructure
  2020-05-11 19:46                         ` [PATCH v13 13/13] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-05-13 19:57                           ` Junio C Hamano
  2020-05-19 13:54                             ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-05-13 19:57 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index b7053b9e370..da5b4670c84 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -545,7 +545,7 @@ static const char *const init_db_usage[] = {
>  int cmd_init_db(int argc, const char **argv, const char *prefix)
>  {
>  	const char *git_dir;
> -	const char *ref_storage_format = DEFAULT_REF_STORAGE;
> +	const char *ref_storage_format = default_ref_storage();
>  	const char *real_git_dir = NULL;
>  	const char *work_tree;
>  	const char *template_dir = NULL;
> diff --git a/refs.c b/refs.c
> index e8751415a9e..70c11b05391 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1742,7 +1742,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
>  	r->refs_private = ref_store_init(r->gitdir,
>  					 r->ref_storage_format ?
>  						 r->ref_storage_format :
> -						 DEFAULT_REF_STORAGE,
> +						 default_ref_storage(),
>  					 REF_STORE_ALL_CAPS);
>  	return r->refs_private;
>  }

Would it make sense to let NULL stand for "we use the default,
whatever it is" so that anybody outside the implementation of the
refs API does not have to care what the default is, and make
ref_store_init() function the only thing that knows what it is?


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 00/13] Reftable support git-core
  2020-05-12  7:49                           ` Han-Wen Nienhuys
@ 2020-05-13 21:21                             ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-13 21:21 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Jeff King, Jonathan Nieder, brian m. carlson,
	Johannes Schindelin, Michael Haggerty, Patrick Steinhardt

Han-Wen Nienhuys <hanwen@google.com> writes:

> On Tue, May 12, 2020 at 2:42 AM Junio C Hamano <gitster@pobox.com> wrote:
>> As I already said earlier today [*1*], I think some parts of the
>> series can be reviewed before the "dense code" part, so let's draw
>> the line between patch 8 and 9 (or patch 7 and 8) and try to see if
>> we can get the parts before the line reviewed adequately and merge
>> them to at lesat 'next' during this cycle, which will close in 3
>> weeks.
>
> I concocted 7 (write pseudorefs) yesterday in an hour or so. It looks
> like it passes tests, but I think it could do with more scrutiny. I
> propose to merge 1-6, and leave 7-8 until I figure out how pseudorefs
> and worktree refs really work.

OK.  Today's pushout has the boundary at that spot between 6 and 7.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v14 0/9] Reftable support git-core
  2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                           ` (13 preceding siblings ...)
  2020-05-12  0:41                         ` [PATCH v13 00/13] Reftable support git-core Junio C Hamano
@ 2020-05-18 20:31                         ` Han-Wen Nienhuys via GitGitGadget
  2020-05-18 20:31                           ` [PATCH v14 1/9] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
                                             ` (9 more replies)
  14 siblings, 10 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-18 20:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend. Based on
hn/refs-cleanup.

Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1

Summary 18904 tests pass 2750 tests fail

Some issues:

 * many tests inspect .git/{logs,heads}/ directly.
 * worktrees broken. 
 * Rebase/cherry-pick still largely broken

v16

 * handle pseudo refs.
 * various bugfixes and fixes for mem leaks.

v17

 * many style tweaks in reftable/
 * fix rebase
 * GIT_DEBUG_REFS support.

Han-Wen Nienhuys (8):
  Write pseudorefs through ref backends.
  Move REF_LOG_ONLY to refs-internal.h
  Iterate over the "refs/" namespace in for_each_[raw]ref
  Add .gitattributes for the reftable/ directory
  Add reftable library
  Reftable support for git-core
  Add GIT_DEBUG_REFS debugging mechanism
  Add reftable testing infrastructure

Johannes Schindelin (1):
  vcxproj: adjust for the reftable changes

 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   28 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   56 +-
 cache.h                                       |    6 +-
 config.mak.uname                              |    2 +-
 contrib/buildsystems/Generators/Vcxproj.pm    |   11 +-
 refs.c                                        |  150 +-
 refs.h                                        |   14 +
 refs/debug.c                                  |  309 ++++
 refs/files-backend.c                          |  121 +-
 refs/packed-backend.c                         |   21 +-
 refs/refs-internal.h                          |   26 +
 refs/reftable-backend.c                       | 1313 +++++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   11 +
 reftable/VERSION                              |    5 +
 reftable/basics.c                             |  215 +++
 reftable/basics.h                             |   53 +
 reftable/block.c                              |  435 ++++++
 reftable/block.h                              |  129 ++
 reftable/constants.h                          |   21 +
 reftable/file.c                               |   97 ++
 reftable/iter.c                               |  241 +++
 reftable/iter.h                               |   63 +
 reftable/merged.c                             |  327 ++++
 reftable/merged.h                             |   39 +
 reftable/pq.c                                 |  114 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  754 ++++++++++
 reftable/reader.h                             |   65 +
 reftable/record.c                             | 1154 +++++++++++++++
 reftable/record.h                             |  128 ++
 reftable/refname.c                            |  215 +++
 reftable/refname.h                            |   38 +
 reftable/reftable.c                           |   92 ++
 reftable/reftable.h                           |  564 +++++++
 reftable/slice.c                              |  225 +++
 reftable/slice.h                              |   77 +
 reftable/stack.c                              | 1234 ++++++++++++++++
 reftable/stack.h                              |   48 +
 reftable/system.h                             |   54 +
 reftable/tree.c                               |   64 +
 reftable/tree.h                               |   34 +
 reftable/update.sh                            |   24 +
 reftable/writer.c                             |  659 +++++++++
 reftable/writer.h                             |   60 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |  136 ++
 t/t0033-debug-refs.sh                         |   18 +
 t/t1409-avoid-packing-refs.sh                 |    6 +
 t/t1450-fsck.sh                               |    6 +
 t/t3210-pack-refs.sh                          |    6 +
 t/t9903-bash-prompt.sh                        |    6 +
 t/test-lib.sh                                 |    5 +
 59 files changed, 9523 insertions(+), 141 deletions(-)
 create mode 100644 refs/debug.c
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100755 t/t0031-reftable.sh
 create mode 100755 t/t0033-debug-refs.sh


base-commit: 369e06977b0442dc153312a6e7fb2ecd8ff540c5
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v14
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v14
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v13:

  1:  8394c156eb4 <  -:  ----------- refs.h: clarify reflog iteration order
  2:  dbf45fe8753 <  -:  ----------- t: use update-ref and show-ref to reading/writing refs
  3:  be083a85fb5 <  -:  ----------- refs: document how ref_iterator_advance_fn should handle symrefs
  4:  96fd9814a67 <  -:  ----------- reftable: file format documentation
  5:  7aa3f92fca0 <  -:  ----------- reftable: clarify how empty tables should be written
  6:  1e3c8f2d3e8 <  -:  ----------- reftable: define version 2 of the spec to accomodate SHA256
  7:  2c2f94ddc0e !  1:  46d04f6740e Write pseudorefs through ref backends.
     @@ refs.c: long get_files_ref_lock_timeout_ms(void)
       int refs_delete_ref(struct ref_store *refs, const char *msg,
       		    const char *refname,
      @@ refs.c: int refs_delete_ref(struct ref_store *refs, const char *msg,
     - 	struct strbuf err = STRBUF_INIT;
       
       	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
     --		assert(refs == get_main_ref_store(the_repository));
     + 		assert(refs == get_main_ref_store(the_repository));
      -		return delete_pseudoref(refname, old_oid);
      +		return ref_store_delete_pseudoref(refs, refname, old_oid);
       	}
     @@ refs/files-backend.c: static int lock_raw_ref(struct files_ref_store *refs,
       struct files_ref_iterator {
       	struct ref_iterator base;
       
     -@@ refs/files-backend.c: static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
     - 	return 0;
     - }
     +@@ refs/files-backend.c: struct ref_storage_be refs_be_files = {
     + 	files_rename_ref,
     + 	files_copy_ref,
       
     --struct ref_storage_be refs_be_files = {
     --	NULL,
     --	"files",
     --	files_ref_store_create,
     --	files_init_db,
     --	files_transaction_prepare,
     --	files_transaction_finish,
     --	files_transaction_abort,
     --	files_initial_transaction_commit,
     --
     --	files_pack_refs,
     --	files_create_symref,
     --	files_delete_refs,
     --	files_rename_ref,
     --	files_copy_ref,
     --
     --	files_ref_iterator_begin,
     --	files_read_raw_ref,
     --
     --	files_reflog_iterator_begin,
     --	files_for_each_reflog_ent,
     --	files_for_each_reflog_ent_reverse,
     --	files_reflog_exists,
     --	files_create_reflog,
     --	files_delete_reflog,
     --	files_reflog_expire
     --};
     -+struct ref_storage_be refs_be_files = { NULL,
     -+					"files",
     -+					files_ref_store_create,
     -+					files_init_db,
     -+					files_transaction_prepare,
     -+					files_transaction_finish,
     -+					files_transaction_abort,
     -+					files_initial_transaction_commit,
     -+
     -+					files_pack_refs,
     -+					files_create_symref,
     -+					files_delete_refs,
     -+					files_rename_ref,
     -+					files_copy_ref,
     ++	files_write_pseudoref,
     ++	files_delete_pseudoref,
      +
     -+					files_write_pseudoref,
     -+					files_delete_pseudoref,
     -+
     -+					files_ref_iterator_begin,
     -+					files_read_raw_ref,
     -+
     -+					files_reflog_iterator_begin,
     -+					files_for_each_reflog_ent,
     -+					files_for_each_reflog_ent_reverse,
     -+					files_reflog_exists,
     -+					files_create_reflog,
     -+					files_delete_reflog,
     -+					files_reflog_expire };
     + 	files_ref_iterator_begin,
     + 	files_read_raw_ref,
     + 
     +@@ refs/files-backend.c: struct ref_storage_be refs_be_files = {
     + 	files_reflog_exists,
     + 	files_create_reflog,
     + 	files_delete_reflog,
     +-	files_reflog_expire
     ++	files_reflog_expire,
     + };
      
       ## refs/packed-backend.c ##
     -@@ refs/packed-backend.c: static int packed_reflog_expire(struct ref_store *ref_store,
     +@@ refs/packed-backend.c: static int packed_copy_ref(struct ref_store *ref_store,
     + 	BUG("packed reference store does not support copying references");
       }
       
     - struct ref_storage_be refs_be_packed = {
     --	NULL,
     --	"packed",
     --	packed_ref_store_create,
     --	packed_init_db,
     --	packed_transaction_prepare,
     --	packed_transaction_finish,
     --	packed_transaction_abort,
     --	packed_initial_transaction_commit,
     --
     --	packed_pack_refs,
     --	packed_create_symref,
     --	packed_delete_refs,
     --	packed_rename_ref,
     --	packed_copy_ref,
     --
     --	packed_ref_iterator_begin,
     --	packed_read_raw_ref,
     --
     --	packed_reflog_iterator_begin,
     --	packed_for_each_reflog_ent,
     --	packed_for_each_reflog_ent_reverse,
     --	packed_reflog_exists,
     --	packed_create_reflog,
     --	packed_delete_reflog,
     --	packed_reflog_expire
     -+	NULL, "packed", packed_ref_store_create, packed_init_db,
     -+	packed_transaction_prepare, packed_transaction_finish,
     -+	packed_transaction_abort, packed_initial_transaction_commit,
     -+
     -+	packed_pack_refs, packed_create_symref, packed_delete_refs,
     -+	packed_rename_ref, packed_copy_ref,
     ++static int packed_write_pseudoref(struct ref_store *ref_store,
     ++				  const char *pseudoref,
     ++				  const struct object_id *oid,
     ++				  const struct object_id *old_oid,
     ++				  struct strbuf *err)
     ++{
     ++	BUG("packed reference store does not support writing pseudo-references");
     ++}
      +
     -+	/* XXX */
     -+	NULL, NULL,
     ++static int packed_delete_pseudoref(struct ref_store *ref_store,
     ++				   const char *pseudoref,
     ++				   const struct object_id *old_oid)
     ++{
     ++	BUG("packed reference store does not support deleting pseudo-references");
     ++}
      +
     -+	packed_ref_iterator_begin, packed_read_raw_ref,
     + static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_store)
     + {
     + 	return empty_ref_iterator_begin();
     +@@ refs/packed-backend.c: struct ref_storage_be refs_be_packed = {
     + 	packed_rename_ref,
     + 	packed_copy_ref,
     + 
     ++	packed_write_pseudoref,
     ++	packed_delete_pseudoref,
      +
     -+	packed_reflog_iterator_begin, packed_for_each_reflog_ent,
     -+	packed_for_each_reflog_ent_reverse, packed_reflog_exists,
     -+	packed_create_reflog, packed_delete_reflog, packed_reflog_expire
     + 	packed_ref_iterator_begin,
     + 	packed_read_raw_ref,
     + 
     +@@ refs/packed-backend.c: struct ref_storage_be refs_be_packed = {
     + 	packed_reflog_exists,
     + 	packed_create_reflog,
     + 	packed_delete_reflog,
     +-	packed_reflog_expire
     ++	packed_reflog_expire,
       };
      
       ## refs/refs-internal.h ##
  -:  ----------- >  2:  c650f7e4345 Move REF_LOG_ONLY to refs-internal.h
  8:  3becaaee66a =  3:  0c953fce52a Iterate over the "refs/" namespace in for_each_[raw]ref
  9:  a6f77965f84 =  4:  206b7d329f8 Add .gitattributes for the reftable/ directory
 10:  8103703c358 !  5:  9a8e504a1d0 Add reftable library
     @@ reftable/README.md (new)
      
       ## reftable/VERSION (new) ##
      @@
     -+commit e74c14b66b6c15f6526c485f2e45d3f2735d359d
     ++commit bad78b6de70700933a2f93ebadaa37a5e852d9ea
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Mon May 11 21:02:55 2020 +0200
     -+
     -+    C: handle out-of-date reftable stacks
     -+    
     -+    * Make reftable_stack_reload() check out-of-dateness. This makes it very cheap,
     -+      allowing it to be called often. This is useful because Git calls itself often,
     -+      which effectively requires a reload.
     -+    
     -+    * In reftable_stack_add(), check the return value of stack_uptodate(), leading
     -+      to erroneously succeeding the transaction.
     -+    
     -+    * A test that exercises the above.
     ++Date:   Mon May 18 20:12:57 2020 +0200
     ++
     ++    C: get rid of inner blocks for local scoping
      
       ## reftable/basics.c (new) ##
      @@
     @@ reftable/block.c (new)
      +	return bw->buf[bw->header_off];
      +}
      +
     -+/* adds the record to the block. Returns -1 if it does not fit, 0 on
     ++/* adds the reftable_record to the block. Returns -1 if it does not fit, 0 on
      +   success */
     -+int block_writer_add(struct block_writer *w, struct record rec)
     ++int block_writer_add(struct block_writer *w, struct reftable_record *rec)
      +{
      +	struct slice empty = { 0 };
      +	struct slice last = w->entries % w->restart_interval == 0 ? empty :
     @@ reftable/block.c (new)
      +	struct slice key = { 0 };
      +	int n = 0;
      +
     -+	record_key(rec, &key);
     -+	n = encode_key(&restart, out, last, key, record_val_type(rec));
     ++	reftable_record_key(rec, &key);
     ++	n = reftable_encode_key(&restart, out, last, key,
     ++				reftable_record_val_type(rec));
      +	if (n < 0) {
      +		goto err;
      +	}
      +	slice_consume(&out, n);
      +
     -+	n = record_encode(rec, out, w->hash_size);
     ++	n = reftable_record_encode(rec, out, w->hash_size);
      +	if (n < 0) {
      +		goto err;
      +	}
     @@ reftable/block.c (new)
      +		goto err;
      +	}
      +
     -+	slice_clear(&key);
     ++	slice_release(&key);
      +	return 0;
      +
      +err:
     -+	slice_clear(&key);
     ++	slice_release(&key);
      +	return -1;
      +}
      +
     @@ reftable/block.c (new)
      +			}
      +
      +			if (Z_OK != zresult) {
     -+				slice_clear(&compressed);
     ++				slice_release(&compressed);
      +				return REFTABLE_ZLIB_ERROR;
      +			}
      +
      +			memcpy(w->buf + block_header_skip, compressed.buf,
      +			       dest_len);
      +			w->next = dest_len + block_header_skip;
     -+			slice_clear(&compressed);
     ++			slice_release(&compressed);
      +			break;
      +		}
      +	}
     @@ reftable/block.c (new)
      +	byte typ = block->data[header_off];
      +	uint32_t sz = get_be24(block->data + header_off + 1);
      +
     -+	if (!is_block_type(typ)) {
     ++	if (!reftable_is_block_type(typ)) {
      +		return REFTABLE_FORMAT_ERROR;
      +	}
      +
     @@ reftable/block.c (new)
      +				    uncompressed.buf + block_header_skip,
      +				    &dst_len, block->data + block_header_skip,
      +				    &src_len)) {
     -+			slice_clear(&uncompressed);
     ++			slice_release(&uncompressed);
      +			return REFTABLE_ZLIB_ERROR;
      +		}
      +
     @@ reftable/block.c (new)
      +	struct slice rkey = { 0 };
      +	struct slice last_key = { 0 };
      +	byte unused_extra;
     -+	int n = decode_key(&rkey, &unused_extra, last_key, in);
     ++	int n = reftable_decode_key(&rkey, &unused_extra, last_key, in);
      +	if (n < 0) {
      +		a->error = 1;
      +		return -1;
      +	}
      +
      +	{
     -+		int result = slice_compare(a->key, rkey);
     -+		slice_clear(&rkey);
     ++		int result = slice_cmp(a->key, rkey);
     ++		slice_release(&rkey);
      +		return result;
      +	}
      +}
     @@ reftable/block.c (new)
      +	slice_copy(&dest->last_key, src->last_key);
      +}
      +
     -+int block_iter_next(struct block_iter *it, struct record rec)
     ++int block_iter_next(struct block_iter *it, struct reftable_record *rec)
      +{
      +	struct slice in = {
      +		.buf = it->br->block.data + it->next_off,
     @@ reftable/block.c (new)
      +		return 1;
      +	}
      +
     -+	n = decode_key(&key, &extra, it->last_key, in);
     ++	n = reftable_decode_key(&key, &extra, it->last_key, in);
      +	if (n < 0) {
      +		return -1;
      +	}
      +
      +	slice_consume(&in, n);
     -+	n = record_decode(rec, key, extra, in, it->br->hash_size);
     ++	n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
      +	if (n < 0) {
      +		return -1;
      +	}
     @@ reftable/block.c (new)
      +
      +	slice_copy(&it->last_key, key);
      +	it->next_off += start.len - in.len;
     -+	slice_clear(&key);
     ++	slice_release(&key);
      +	return 0;
      +}
      +
     @@ reftable/block.c (new)
      +	};
      +
      +	byte extra = 0;
     -+	int n = decode_key(key, &extra, empty, in);
     ++	int n = reftable_decode_key(key, &extra, empty, in);
      +	if (n < 0) {
      +		return n;
      +	}
     @@ reftable/block.c (new)
      +
      +void block_iter_close(struct block_iter *it)
      +{
     -+	slice_clear(&it->last_key);
     ++	slice_release(&it->last_key);
      +}
      +
      +int block_reader_seek(struct block_reader *br, struct block_iter *it,
     @@ reftable/block.c (new)
      +		.key = want,
      +		.r = br,
      +	};
     -+	struct record rec = new_record(block_reader_type(br));
     ++	struct reftable_record rec = reftable_new_record(block_reader_type(br));
      +	struct slice key = { 0 };
      +	int err = 0;
      +	struct block_iter next = { 0 };
     @@ reftable/block.c (new)
      +	*/
      +	while (true) {
      +		block_iter_copy_from(&next, it);
     -+		err = block_iter_next(&next, rec);
     ++		err = block_iter_next(&next, &rec);
      +		if (err < 0) {
      +			goto exit;
      +		}
      +
     -+		record_key(rec, &key);
     -+		if (err > 0 || slice_compare(key, want) >= 0) {
     ++		reftable_record_key(&rec, &key);
     ++		if (err > 0 || slice_cmp(key, want) >= 0) {
      +			err = 0;
      +			goto exit;
      +		}
     @@ reftable/block.c (new)
      +	}
      +
      +exit:
     -+	slice_clear(&key);
     -+	slice_clear(&next.last_key);
     -+	record_destroy(&rec);
     ++	slice_release(&key);
     ++	slice_release(&next.last_key);
     ++	reftable_record_destroy(&rec);
      +
      +	return err;
      +}
     @@ reftable/block.c (new)
      +void block_writer_clear(struct block_writer *bw)
      +{
      +	FREE_AND_NULL(bw->restarts);
     -+	slice_clear(&bw->last_key);
     ++	slice_release(&bw->last_key);
      +	/* the block is not owned. */
      +}
      
     @@ reftable/block.h (new)
      +byte block_writer_type(struct block_writer *bw);
      +
      +/* appends the record, or -1 if it doesn't fit. */
     -+int block_writer_add(struct block_writer *w, struct record rec);
     ++int block_writer_add(struct block_writer *w, struct reftable_record *rec);
      +
      +/* appends the key restarts, and compress the block if necessary. */
      +int block_writer_finish(struct block_writer *w);
     @@ reftable/block.h (new)
      +void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
      +
      +/* return < 0 for error, 0 for OK, > 0 for EOF. */
     -+int block_iter_next(struct block_iter *it, struct record rec);
     ++int block_iter_next(struct block_iter *it, struct reftable_record *rec);
      +
      +/* Seek to `want` with in the block pointed to by `it` */
      +int block_iter_seek(struct block_iter *it, struct slice want);
     @@ reftable/file.c (new)
      +	struct stat st = { 0 };
      +	int err = 0;
      +	int fd = open(name, O_RDONLY);
     ++	struct file_block_source *p = NULL;
      +	if (fd < 0) {
      +		if (errno == ENOENT) {
      +			return REFTABLE_NOT_EXIST_ERROR;
     @@ reftable/file.c (new)
      +		return -1;
      +	}
      +
     -+	{
     -+		struct file_block_source *p =
     -+			reftable_calloc(sizeof(struct file_block_source));
     -+		p->size = st.st_size;
     -+		p->fd = fd;
     ++	p = reftable_calloc(sizeof(struct file_block_source));
     ++	p->size = st.st_size;
     ++	p->fd = fd;
      +
     -+		assert(bs->ops == NULL);
     -+		bs->ops = &file_vtable;
     -+		bs->arg = p;
     -+	}
     ++	assert(bs->ops == NULL);
     ++	bs->ops = &file_vtable;
     ++	bs->arg = p;
      +	return 0;
      +}
      +
     @@ reftable/iter.c (new)
      +#include "reader.h"
      +#include "reftable.h"
      +
     -+bool iterator_is_null(struct reftable_iterator it)
     ++bool iterator_is_null(struct reftable_iterator *it)
      +{
     -+	return it.ops == NULL;
     ++	return it->ops == NULL;
      +}
      +
     -+static int empty_iterator_next(void *arg, struct record rec)
     ++static int empty_iterator_next(void *arg, struct reftable_record *rec)
      +{
      +	return 1;
      +}
     @@ reftable/iter.c (new)
      +	it->ops = &empty_vtable;
      +}
      +
     -+int iterator_next(struct reftable_iterator it, struct record rec)
     ++int iterator_next(struct reftable_iterator *it, struct reftable_record *rec)
      +{
     -+	return it.ops->next(it.iter_arg, rec);
     ++	return it->ops->next(it->iter_arg, rec);
      +}
      +
      +void reftable_iterator_destroy(struct reftable_iterator *it)
     @@ reftable/iter.c (new)
      +	FREE_AND_NULL(it->iter_arg);
      +}
      +
     -+int reftable_iterator_next_ref(struct reftable_iterator it,
     ++int reftable_iterator_next_ref(struct reftable_iterator *it,
      +			       struct reftable_ref_record *ref)
      +{
     -+	struct record rec = { 0 };
     -+	record_from_ref(&rec, ref);
     -+	return iterator_next(it, rec);
     ++	struct reftable_record rec = { 0 };
     ++	reftable_record_from_ref(&rec, ref);
     ++	return iterator_next(it, &rec);
      +}
      +
     -+int reftable_iterator_next_log(struct reftable_iterator it,
     ++int reftable_iterator_next_log(struct reftable_iterator *it,
      +			       struct reftable_log_record *log)
      +{
     -+	struct record rec = { 0 };
     -+	record_from_log(&rec, log);
     -+	return iterator_next(it, rec);
     ++	struct reftable_record rec = { 0 };
     ++	reftable_record_from_log(&rec, log);
     ++	return iterator_next(it, &rec);
      +}
      +
      +static void filtering_ref_iterator_close(void *iter_arg)
      +{
      +	struct filtering_ref_iterator *fri =
      +		(struct filtering_ref_iterator *)iter_arg;
     -+	slice_clear(&fri->oid);
     ++	slice_release(&fri->oid);
      +	reftable_iterator_destroy(&fri->it);
      +}
      +
     -+static int filtering_ref_iterator_next(void *iter_arg, struct record rec)
     ++static int filtering_ref_iterator_next(void *iter_arg,
     ++				       struct reftable_record *rec)
      +{
      +	struct filtering_ref_iterator *fri =
      +		(struct filtering_ref_iterator *)iter_arg;
      +	struct reftable_ref_record *ref =
     -+		(struct reftable_ref_record *)rec.data;
     ++		(struct reftable_ref_record *)rec->data;
      +	int err = 0;
      +	while (true) {
     -+		err = reftable_iterator_next_ref(fri->it, ref);
     ++		err = reftable_iterator_next_ref(&fri->it, ref);
      +		if (err != 0) {
      +			break;
      +		}
     @@ reftable/iter.c (new)
      +		if (fri->double_check) {
      +			struct reftable_iterator it = { 0 };
      +
     -+			err = reftable_table_seek_ref(fri->tab, &it,
     ++			err = reftable_table_seek_ref(&fri->tab, &it,
      +						      ref->ref_name);
      +			if (err == 0) {
     -+				err = reftable_iterator_next_ref(it, ref);
     ++				err = reftable_iterator_next_ref(&it, ref);
      +			}
      +
      +			reftable_iterator_destroy(&it);
     @@ reftable/iter.c (new)
      +	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
      +	block_iter_close(&it->cur);
      +	reftable_block_done(&it->block_reader.block);
     -+	slice_clear(&it->oid);
     ++	slice_release(&it->oid);
      +}
      +
      +static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
     @@ reftable/iter.c (new)
      +	return 0;
      +}
      +
     -+static int indexed_table_ref_iter_next(void *p, struct record rec)
     ++static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec)
      +{
      +	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
      +	struct reftable_ref_record *ref =
     -+		(struct reftable_ref_record *)rec.data;
     ++		(struct reftable_ref_record *)rec->data;
      +
      +	while (true) {
      +		int err = block_iter_next(&it->cur, rec);
     @@ reftable/iter.h (new)
      +#include "slice.h"
      +
      +struct reftable_iterator_vtable {
     -+	int (*next)(void *iter_arg, struct record rec);
     ++	int (*next)(void *iter_arg, struct reftable_record *rec);
      +	void (*close)(void *iter_arg);
      +};
      +
      +void iterator_set_empty(struct reftable_iterator *it);
     -+int iterator_next(struct reftable_iterator it, struct record rec);
     ++int iterator_next(struct reftable_iterator *it, struct reftable_record *rec);
      +
      +/* Returns true for a zeroed out iterator, such as the one returned from
      +   iterator_destroy. */
     -+bool iterator_is_null(struct reftable_iterator it);
     ++bool iterator_is_null(struct reftable_iterator *it);
      +
      +/* iterator that produces only ref records that point to `oid` */
      +struct filtering_ref_iterator {
     @@ reftable/merged.c (new)
      +{
      +	int i = 0;
      +	for (i = 0; i < mi->stack_len; i++) {
     -+		struct record rec = new_record(mi->typ);
     -+		int err = iterator_next(mi->stack[i], rec);
     ++		struct reftable_record rec = reftable_new_record(mi->typ);
     ++		int err = iterator_next(&mi->stack[i], &rec);
      +		if (err < 0) {
      +			return err;
      +		}
      +
      +		if (err > 0) {
      +			reftable_iterator_destroy(&mi->stack[i]);
     -+			record_destroy(&rec);
     ++			reftable_record_destroy(&rec);
      +		} else {
      +			struct pq_entry e = {
      +				.rec = rec,
     @@ reftable/merged.c (new)
      +	reftable_free(mi->stack);
      +}
      +
     -+static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
     ++static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
     ++					       size_t idx)
      +{
     -+	if (iterator_is_null(mi->stack[idx])) {
     -+		return 0;
     ++	struct reftable_record rec = reftable_new_record(mi->typ);
     ++	struct pq_entry e = {
     ++		.rec = rec,
     ++		.index = idx,
     ++	};
     ++	int err = iterator_next(&mi->stack[idx], &rec);
     ++	if (err < 0) {
     ++		return err;
      +	}
      +
     -+	{
     -+		struct record rec = new_record(mi->typ);
     -+		struct pq_entry e = {
     -+			.rec = rec,
     -+			.index = idx,
     -+		};
     -+		int err = iterator_next(mi->stack[idx], rec);
     -+		if (err < 0) {
     -+			return err;
     -+		}
     ++	if (err > 0) {
     ++		reftable_iterator_destroy(&mi->stack[idx]);
     ++		reftable_record_destroy(&rec);
     ++		return 0;
     ++	}
      +
     -+		if (err > 0) {
     -+			reftable_iterator_destroy(&mi->stack[idx]);
     -+			record_destroy(&rec);
     -+			return 0;
     -+		}
     ++	merged_iter_pqueue_add(&mi->pq, e);
     ++	return 0;
     ++}
      +
     -+		merged_iter_pqueue_add(&mi->pq, e);
     ++static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
     ++{
     ++	if (iterator_is_null(&mi->stack[idx])) {
     ++		return 0;
      +	}
     -+	return 0;
     ++	return merged_iter_advance_nonnull_subiter(mi, idx);
      +}
      +
     -+static int merged_iter_next_entry(struct merged_iter *mi, struct record rec)
     ++static int merged_iter_next_entry(struct merged_iter *mi,
     ++				  struct reftable_record *rec)
      +{
      +	struct slice entry_key = { 0 };
      +	struct pq_entry entry = { 0 };
     @@ reftable/merged.c (new)
      +	  such a deployment, the loop below must be changed to collect all
      +	  entries for the same key, and return new the newest one.
      +	*/
     -+	record_key(entry.rec, &entry_key);
     ++	reftable_record_key(&entry.rec, &entry_key);
      +	while (!merged_iter_pqueue_is_empty(mi->pq)) {
      +		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
      +		struct slice k = { 0 };
      +		int err = 0, cmp = 0;
      +
     -+		record_key(top.rec, &k);
     ++		reftable_record_key(&top.rec, &k);
      +
     -+		cmp = slice_compare(k, entry_key);
     -+		slice_clear(&k);
     ++		cmp = slice_cmp(k, entry_key);
     ++		slice_release(&k);
      +
      +		if (cmp > 0) {
      +			break;
     @@ reftable/merged.c (new)
      +		if (err < 0) {
      +			return err;
      +		}
     -+		record_destroy(&top.rec);
     ++		reftable_record_destroy(&top.rec);
      +	}
      +
     -+	record_copy_from(rec, entry.rec, hash_size(mi->hash_id));
     -+	record_destroy(&entry.rec);
     -+	slice_clear(&entry_key);
     ++	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
     ++	reftable_record_destroy(&entry.rec);
     ++	slice_release(&entry_key);
      +	return 0;
      +}
      +
     -+static int merged_iter_next(struct merged_iter *mi, struct record rec)
     ++static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
      +{
      +	while (true) {
      +		int err = merged_iter_next_entry(mi, rec);
      +		if (err == 0 && mi->suppress_deletions &&
     -+		    record_is_deletion(rec)) {
     ++		    reftable_record_is_deletion(rec)) {
      +			continue;
      +		}
      +
     @@ reftable/merged.c (new)
      +	}
      +}
      +
     -+static int merged_iter_next_void(void *p, struct record rec)
     ++static int merged_iter_next_void(void *p, struct reftable_record *rec)
      +{
      +	struct merged_iter *mi = (struct merged_iter *)p;
      +	if (merged_iter_pqueue_is_empty(mi->pq)) {
     @@ reftable/merged.c (new)
      +			      struct reftable_reader **stack, int n,
      +			      uint32_t hash_id)
      +{
     ++	struct reftable_merged_table *m = NULL;
      +	uint64_t last_max = 0;
      +	uint64_t first_min = 0;
      +	int i = 0;
     @@ reftable/merged.c (new)
      +		last_max = reftable_reader_max_update_index(r);
      +	}
      +
     -+	{
     -+		struct reftable_merged_table m = {
     -+			.stack = stack,
     -+			.stack_len = n,
     -+			.min = first_min,
     -+			.max = last_max,
     -+			.hash_id = hash_id,
     -+		};
     -+
     -+		*dest = reftable_calloc(sizeof(struct reftable_merged_table));
     -+		**dest = m;
     -+	}
     ++	m = (struct reftable_merged_table *)reftable_calloc(
     ++		sizeof(struct reftable_merged_table));
     ++	m->stack = stack;
     ++	m->stack_len = n;
     ++	m->min = first_min;
     ++	m->max = last_max;
     ++	m->hash_id = hash_id;
     ++	*dest = m;
      +	return 0;
      +}
      +
     @@ reftable/merged.c (new)
      +}
      +
      +int merged_table_seek_record(struct reftable_merged_table *mt,
     -+			     struct reftable_iterator *it, struct record rec)
     ++			     struct reftable_iterator *it,
     ++			     struct reftable_record *rec)
      +{
      +	struct reftable_iterator *iters = reftable_calloc(
      +		sizeof(struct reftable_iterator) * mt->stack_len);
      +	struct merged_iter merged = {
      +		.stack = iters,
     -+		.typ = record_type(rec),
     ++		.typ = reftable_record_type(rec),
      +		.hash_id = mt->hash_id,
      +		.suppress_deletions = mt->suppress_deletions,
      +	};
     @@ reftable/merged.c (new)
      +	struct reftable_ref_record ref = {
      +		.ref_name = (char *)name,
      +	};
     -+	struct record rec = { 0 };
     -+	record_from_ref(&rec, &ref);
     -+	return merged_table_seek_record(mt, it, rec);
     ++	struct reftable_record rec = { 0 };
     ++	reftable_record_from_ref(&rec, &ref);
     ++	return merged_table_seek_record(mt, it, &rec);
      +}
      +
      +int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
     @@ reftable/merged.c (new)
      +		.ref_name = (char *)name,
      +		.update_index = update_index,
      +	};
     -+	struct record rec = { 0 };
     -+	record_from_log(&rec, &log);
     -+	return merged_table_seek_record(mt, it, rec);
     ++	struct reftable_record rec = { 0 };
     ++	reftable_record_from_log(&rec, &log);
     ++	return merged_table_seek_record(mt, it, &rec);
      +}
      +
      +int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
     @@ reftable/merged.h (new)
      +
      +void merged_table_clear(struct reftable_merged_table *mt);
      +int merged_table_seek_record(struct reftable_merged_table *mt,
     -+			     struct reftable_iterator *it, struct record rec);
     ++			     struct reftable_iterator *it,
     ++			     struct reftable_record *rec);
      +
      +#endif
      
     @@ reftable/pq.c (new)
      +	struct slice ak = { 0 };
      +	struct slice bk = { 0 };
      +	int cmp = 0;
     -+	record_key(a.rec, &ak);
     -+	record_key(b.rec, &bk);
     ++	reftable_record_key(&a.rec, &ak);
     ++	reftable_record_key(&b.rec, &bk);
      +
     -+	cmp = slice_compare(ak, bk);
     ++	cmp = slice_cmp(ak, bk);
      +
     -+	slice_clear(&ak);
     -+	slice_clear(&bk);
     ++	slice_release(&ak);
     ++	slice_release(&bk);
      +
      +	if (cmp == 0) {
      +		return a.index > b.index;
     @@ reftable/pq.c (new)
      +{
      +	int i = 0;
      +	for (i = 0; i < pq->len; i++) {
     -+		record_destroy(&pq->heap[i].rec);
     ++		reftable_record_destroy(&pq->heap[i].rec);
      +	}
      +	FREE_AND_NULL(pq->heap);
      +	pq->len = pq->cap = 0;
     @@ reftable/pq.h (new)
      +
      +struct pq_entry {
      +	int index;
     -+	struct record rec;
     ++	struct reftable_record rec;
      +};
      +
      +int pq_less(struct pq_entry a, struct pq_entry b);
     @@ reftable/reader.c (new)
      +#include "reftable.h"
      +#include "tree.h"
      +
     -+uint64_t block_source_size(struct reftable_block_source source)
     ++uint64_t block_source_size(struct reftable_block_source *source)
      +{
     -+	return source.ops->size(source.arg);
     ++	return source->ops->size(source->arg);
      +}
      +
     -+int block_source_read_block(struct reftable_block_source source,
     ++int block_source_read_block(struct reftable_block_source *source,
      +			    struct reftable_block *dest, uint64_t off,
      +			    uint32_t size)
      +{
     -+	int result = source.ops->read_block(source.arg, dest, off, size);
     -+	dest->source = source;
     ++	int result = source->ops->read_block(source->arg, dest, off, size);
     ++	dest->source = *source;
      +	return result;
      +}
      +
     @@ reftable/reader.c (new)
      +		sz = r->size - off;
      +	}
      +
     -+	return block_source_read_block(r->source, dest, off, sz);
     ++	return block_source_read_block(&r->source, dest, off, sz);
      +}
      +
      +uint32_t reftable_reader_hash_id(struct reftable_reader *r)
     @@ reftable/reader.c (new)
      +static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
      +{
      +	byte *f = footer;
     ++	byte first_block_typ;
      +	int err = 0;
     ++	uint32_t computed_crc;
     ++	uint32_t file_crc;
     ++
      +	if (memcmp(f, "REFT", 4)) {
      +		err = REFTABLE_FORMAT_ERROR;
      +		goto exit;
     @@ reftable/reader.c (new)
      +	r->log_offsets.index_offset = get_be64(f);
      +	f += 8;
      +
     -+	{
     -+		uint32_t computed_crc = crc32(0, footer, f - footer);
     -+		uint32_t file_crc = get_be32(f);
     -+		f += 4;
     -+		if (computed_crc != file_crc) {
     -+			err = REFTABLE_FORMAT_ERROR;
     -+			goto exit;
     -+		}
     ++	computed_crc = crc32(0, footer, f - footer);
     ++	file_crc = get_be32(f);
     ++	f += 4;
     ++	if (computed_crc != file_crc) {
     ++		err = REFTABLE_FORMAT_ERROR;
     ++		goto exit;
      +	}
      +
     -+	{
     -+		byte first_block_typ = header[header_size(r->version)];
     -+		r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
     -+		r->ref_offsets.offset = 0;
     -+		r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
     -+					  r->log_offsets.offset > 0);
     -+		r->obj_offsets.present = r->obj_offsets.offset > 0;
     -+	}
     ++	first_block_typ = header[header_size(r->version)];
     ++	r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
     ++	r->ref_offsets.offset = 0;
     ++	r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
     ++				  r->log_offsets.offset > 0);
     ++	r->obj_offsets.present = r->obj_offsets.offset > 0;
      +	err = 0;
      +exit:
      +	return err;
      +}
      +
     -+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
     ++int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
      +		const char *name)
      +{
      +	struct reftable_block footer = { 0 };
     @@ reftable/reader.c (new)
      +	}
      +
      +	r->size = block_source_size(source) - footer_size(r->version);
     -+	r->source = source;
     ++	r->source = *source;
      +	r->name = xstrdup(name);
      +	r->hash_id = 0;
      +
     @@ reftable/reader.c (new)
      +	block_iter_copy_from(&dest->bi, &src->bi);
      +}
      +
     -+static int table_iter_next_in_block(struct table_iter *ti, struct record rec)
     ++static int table_iter_next_in_block(struct table_iter *ti,
     ++				    struct reftable_record *rec)
      +{
      +	int res = block_iter_next(&ti->bi, rec);
     -+	if (res == 0 && record_type(rec) == BLOCK_TYPE_REF) {
     -+		((struct reftable_ref_record *)rec.data)->update_index +=
     ++	if (res == 0 && reftable_record_type(rec) == BLOCK_TYPE_REF) {
     ++		((struct reftable_ref_record *)rec->data)->update_index +=
      +			ti->r->min_update_index;
      +	}
      +
     @@ reftable/reader.c (new)
      +	}
      +
      +	*typ = data[0];
     -+	if (is_block_type(*typ)) {
     ++	if (reftable_is_block_type(*typ)) {
      +		result = get_be24(data + 1);
      +	}
      +	return result;
     @@ reftable/reader.c (new)
      +	return 0;
      +}
      +
     -+static int table_iter_next(struct table_iter *ti, struct record rec)
     ++static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
      +{
     -+	if (record_type(rec) != ti->typ) {
     ++	if (reftable_record_type(rec) != ti->typ) {
      +		return REFTABLE_API_ERROR;
      +	}
      +
     @@ reftable/reader.c (new)
      +	}
      +}
      +
     -+static int table_iter_next_void(void *ti, struct record rec)
     ++static int table_iter_next_void(void *ti, struct reftable_record *rec)
      +{
      +	return table_iter_next((struct table_iter *)ti, rec);
      +}
     @@ reftable/reader.c (new)
      +}
      +
      +static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
     -+			      struct record want)
     ++			      struct reftable_record *want)
      +{
     -+	struct record rec = new_record(record_type(want));
     ++	struct reftable_record rec =
     ++		reftable_new_record(reftable_record_type(want));
      +	struct slice want_key = { 0 };
      +	struct slice got_key = { 0 };
      +	struct table_iter next = { 0 };
      +	int err = -1;
     -+	record_key(want, &want_key);
     ++	reftable_record_key(want, &want_key);
      +
      +	while (true) {
      +		err = table_iter_next_block(&next, ti);
     @@ reftable/reader.c (new)
      +			goto exit;
      +		}
      +		{
     -+			int cmp = slice_compare(got_key, want_key);
     ++			int cmp = slice_cmp(got_key, want_key);
      +			if (cmp > 0) {
      +				table_iter_block_done(&next);
      +				break;
     @@ reftable/reader.c (new)
      +
      +exit:
      +	block_iter_close(&next.bi);
     -+	record_destroy(&rec);
     -+	slice_clear(&want_key);
     -+	slice_clear(&got_key);
     ++	reftable_record_destroy(&rec);
     ++	slice_release(&want_key);
     ++	slice_release(&got_key);
      +	return err;
      +}
      +
      +static int reader_seek_indexed(struct reftable_reader *r,
     -+			       struct reftable_iterator *it, struct record rec)
     ++			       struct reftable_iterator *it,
     ++			       struct reftable_record *rec)
      +{
     -+	struct index_record want_index = { 0 };
     -+	struct record want_index_rec = { 0 };
     -+	struct index_record index_result = { 0 };
     -+	struct record index_result_rec = { 0 };
     ++	struct reftable_index_record want_index = { 0 };
     ++	struct reftable_record want_index_rec = { 0 };
     ++	struct reftable_index_record index_result = { 0 };
     ++	struct reftable_record index_result_rec = { 0 };
      +	struct table_iter index_iter = { 0 };
      +	struct table_iter next = { 0 };
      +	int err = 0;
      +
     -+	record_key(rec, &want_index.last_key);
     -+	record_from_index(&want_index_rec, &want_index);
     -+	record_from_index(&index_result_rec, &index_result);
     ++	reftable_record_key(rec, &want_index.last_key);
     ++	reftable_record_from_index(&want_index_rec, &want_index);
     ++	reftable_record_from_index(&index_result_rec, &index_result);
      +
     -+	err = reader_start(r, &index_iter, record_type(rec), true);
     ++	err = reader_start(r, &index_iter, reftable_record_type(rec), true);
      +	if (err < 0) {
      +		goto exit;
      +	}
      +
     -+	err = reader_seek_linear(r, &index_iter, want_index_rec);
     ++	err = reader_seek_linear(r, &index_iter, &want_index_rec);
      +	while (true) {
     -+		err = table_iter_next(&index_iter, index_result_rec);
     ++		err = table_iter_next(&index_iter, &index_result_rec);
      +		table_iter_block_done(&index_iter);
      +		if (err != 0) {
      +			goto exit;
     @@ reftable/reader.c (new)
      +			goto exit;
      +		}
      +
     -+		if (next.typ == record_type(rec)) {
     ++		if (next.typ == reftable_record_type(rec)) {
      +			err = 0;
      +			break;
      +		}
     @@ reftable/reader.c (new)
      +exit:
      +	block_iter_close(&next.bi);
      +	table_iter_close(&index_iter);
     -+	record_clear(want_index_rec);
     -+	record_clear(index_result_rec);
     ++	reftable_record_clear(&want_index_rec);
     ++	reftable_record_clear(&index_result_rec);
      +	return err;
      +}
      +
      +static int reader_seek_internal(struct reftable_reader *r,
     -+				struct reftable_iterator *it, struct record rec)
     ++				struct reftable_iterator *it,
     ++				struct reftable_record *rec)
      +{
      +	struct reftable_reader_offsets *offs =
     -+		reader_offsets_for(r, record_type(rec));
     ++		reader_offsets_for(r, reftable_record_type(rec));
      +	uint64_t idx = offs->index_offset;
      +	struct table_iter ti = { 0 };
      +	int err = 0;
     @@ reftable/reader.c (new)
      +		return reader_seek_indexed(r, it, rec);
      +	}
      +
     -+	err = reader_start(r, &ti, record_type(rec), false);
     ++	err = reader_start(r, &ti, reftable_record_type(rec), false);
      +	if (err < 0) {
      +		return err;
      +	}
     @@ reftable/reader.c (new)
      +}
      +
      +int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
     -+		struct record rec)
     ++		struct reftable_record *rec)
      +{
     -+	byte typ = record_type(rec);
     ++	byte typ = reftable_record_type(rec);
      +
      +	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
      +	if (!offs->present) {
     @@ reftable/reader.c (new)
      +	struct reftable_ref_record ref = {
      +		.ref_name = (char *)name,
      +	};
     -+	struct record rec = { 0 };
     -+	record_from_ref(&rec, &ref);
     -+	return reader_seek(r, it, rec);
     ++	struct reftable_record rec = { 0 };
     ++	reftable_record_from_ref(&rec, &ref);
     ++	return reader_seek(r, it, &rec);
      +}
      +
      +int reftable_reader_seek_log_at(struct reftable_reader *r,
     @@ reftable/reader.c (new)
      +		.ref_name = (char *)name,
      +		.update_index = update_index,
      +	};
     -+	struct record rec = { 0 };
     -+	record_from_log(&rec, &log);
     -+	return reader_seek(r, it, rec);
     ++	struct reftable_record rec = { 0 };
     ++	reftable_record_from_log(&rec, &log);
     ++	return reader_seek(r, it, &rec);
      +}
      +
      +int reftable_reader_seek_log(struct reftable_reader *r,
     @@ reftable/reader.c (new)
      +}
      +
      +int reftable_new_reader(struct reftable_reader **p,
     -+			struct reftable_block_source src, char const *name)
     ++			struct reftable_block_source *src, char const *name)
      +{
      +	struct reftable_reader *rd =
      +		reftable_calloc(sizeof(struct reftable_reader));
     @@ reftable/reader.c (new)
      +	if (err == 0) {
      +		*p = rd;
      +	} else {
     -+		block_source_close(&src);
     ++		block_source_close(src);
      +		reftable_free(rd);
      +	}
      +	return err;
     @@ reftable/reader.c (new)
      +					    struct reftable_iterator *it,
      +					    byte *oid)
      +{
     -+	struct obj_record want = {
     ++	struct reftable_obj_record want = {
      +		.hash_prefix = oid,
      +		.hash_prefix_len = r->object_id_len,
      +	};
     -+	struct record want_rec = { 0 };
     ++	struct reftable_record want_rec = { 0 };
      +	struct reftable_iterator oit = { 0 };
     -+	struct obj_record got = { 0 };
     -+	struct record got_rec = { 0 };
     ++	struct reftable_obj_record got = { 0 };
     ++	struct reftable_record got_rec = { 0 };
      +	int err = 0;
     ++	struct indexed_table_ref_iter *itr = NULL;
      +
      +	/* Look through the reverse index. */
     -+	record_from_obj(&want_rec, &want);
     -+	err = reader_seek(r, &oit, want_rec);
     ++	reftable_record_from_obj(&want_rec, &want);
     ++	err = reader_seek(r, &oit, &want_rec);
      +	if (err != 0) {
      +		goto exit;
      +	}
      +
     -+	/* read out the obj_record */
     -+	record_from_obj(&got_rec, &got);
     -+	err = iterator_next(oit, got_rec);
     ++	/* read out the reftable_obj_record */
     ++	reftable_record_from_obj(&got_rec, &got);
     ++	err = iterator_next(&oit, &got_rec);
      +	if (err < 0) {
      +		goto exit;
      +	}
     @@ reftable/reader.c (new)
      +		goto exit;
      +	}
      +
     -+	{
     -+		struct indexed_table_ref_iter *itr = NULL;
     -+		err = new_indexed_table_ref_iter(&itr, r, oid,
     -+						 hash_size(r->hash_id),
     -+						 got.offsets, got.offset_len);
     -+		if (err < 0) {
     -+			goto exit;
     -+		}
     -+		got.offsets = NULL;
     -+		iterator_from_indexed_table_ref_iter(it, itr);
     ++	err = new_indexed_table_ref_iter(&itr, r, oid, hash_size(r->hash_id),
     ++					 got.offsets, got.offset_len);
     ++	if (err < 0) {
     ++		goto exit;
      +	}
     ++	got.offsets = NULL;
     ++	iterator_from_indexed_table_ref_iter(it, itr);
      +
      +exit:
      +	reftable_iterator_destroy(&oit);
     -+	record_clear(got_rec);
     ++	reftable_record_clear(&got_rec);
      +	return err;
      +}
      +
     @@ reftable/reader.h (new)
      +#include "record.h"
      +#include "reftable.h"
      +
     -+uint64_t block_source_size(struct reftable_block_source source);
     ++uint64_t block_source_size(struct reftable_block_source *source);
      +
     -+int block_source_read_block(struct reftable_block_source source,
     ++int block_source_read_block(struct reftable_block_source *source,
      +			    struct reftable_block *dest, uint64_t off,
      +			    uint32_t size);
      +void block_source_close(struct reftable_block_source *source);
     @@ reftable/reader.h (new)
      +	struct reftable_reader_offsets log_offsets;
      +};
      +
     -+int init_reader(struct reftable_reader *r, struct reftable_block_source source,
     ++int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
      +		const char *name);
      +int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
     -+		struct record rec);
     ++		struct reftable_record *rec);
      +void reader_close(struct reftable_reader *r);
      +const char *reader_name(struct reftable_reader *r);
      +
     @@ reftable/record.c (new)
      +{
      +	byte buf[10] = { 0 };
      +	int i = 9;
     ++	int n = 0;
      +	buf[i] = (byte)(val & 0x7f);
      +	i--;
      +	while (true) {
     @@ reftable/record.c (new)
      +		i--;
      +	}
      +
     -+	{
     -+		int n = sizeof(buf) - i - 1;
     -+		if (dest.len < n) {
     -+			return -1;
     -+		}
     -+		memcpy(dest.buf, &buf[i + 1], n);
     -+		return n;
     ++	n = sizeof(buf) - i - 1;
     ++	if (dest.len < n) {
     ++		return -1;
      +	}
     ++	memcpy(dest.buf, &buf[i + 1], n);
     ++	return n;
      +}
      +
     -+int is_block_type(byte typ)
     ++int reftable_is_block_type(byte typ)
      +{
      +	switch (typ) {
      +	case BLOCK_TYPE_REF:
     @@ reftable/record.c (new)
      +	return start.len - s.len;
      +}
      +
     -+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
     -+	       struct slice key, byte extra)
     ++int reftable_encode_key(bool *restart, struct slice dest, struct slice prev_key,
     ++			struct slice key, byte extra)
      +{
      +	struct slice start = dest;
      +	int prefix_len = common_prefix_size(prev_key, key);
     @@ reftable/record.c (new)
      +	return start.len - dest.len;
      +}
      +
     -+int decode_key(struct slice *key, byte *extra, struct slice last_key,
     -+	       struct slice in)
     ++int reftable_decode_key(struct slice *key, byte *extra, struct slice last_key,
     ++			struct slice in)
      +{
      +	int start_len = in.len;
      +	uint64_t prefix_len = 0;
     @@ reftable/record.c (new)
      +		(const struct reftable_ref_record *)p);
      +}
      +
     -+struct record_vtable reftable_ref_record_vtable = {
     ++struct reftable_record_vtable reftable_ref_record_vtable = {
      +	.key = &reftable_ref_record_key,
      +	.type = BLOCK_TYPE_REF,
      +	.copy_from = &reftable_ref_record_copy_from,
     @@ reftable/record.c (new)
      +	.is_deletion = &reftable_ref_record_is_deletion_void,
      +};
      +
     -+static void obj_record_key(const void *r, struct slice *dest)
     ++static void reftable_obj_record_key(const void *r, struct slice *dest)
      +{
     -+	const struct obj_record *rec = (const struct obj_record *)r;
     ++	const struct reftable_obj_record *rec =
     ++		(const struct reftable_obj_record *)r;
      +	slice_resize(dest, rec->hash_prefix_len);
      +	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
      +}
      +
     -+static void obj_record_clear(void *rec)
     ++static void reftable_obj_record_clear(void *rec)
      +{
     -+	struct obj_record *obj = (struct obj_record *)rec;
     ++	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
      +	FREE_AND_NULL(obj->hash_prefix);
      +	FREE_AND_NULL(obj->offsets);
     -+	memset(obj, 0, sizeof(struct obj_record));
     ++	memset(obj, 0, sizeof(struct reftable_obj_record));
      +}
      +
     -+static void obj_record_copy_from(void *rec, const void *src_rec, int hash_size)
     ++static void reftable_obj_record_copy_from(void *rec, const void *src_rec,
     ++					  int hash_size)
      +{
     -+	struct obj_record *obj = (struct obj_record *)rec;
     -+	const struct obj_record *src = (const struct obj_record *)src_rec;
     ++	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
     ++	const struct reftable_obj_record *src =
     ++		(const struct reftable_obj_record *)src_rec;
      +
     -+	obj_record_clear(obj);
     ++	reftable_obj_record_clear(obj);
      +	*obj = *src;
      +	obj->hash_prefix = reftable_malloc(obj->hash_prefix_len);
      +	memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len);
     @@ reftable/record.c (new)
      +	}
      +}
      +
     -+static byte obj_record_val_type(const void *rec)
     ++static byte reftable_obj_record_val_type(const void *rec)
      +{
     -+	struct obj_record *r = (struct obj_record *)rec;
     ++	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
      +	if (r->offset_len > 0 && r->offset_len < 8) {
      +		return r->offset_len;
      +	}
      +	return 0;
      +}
      +
     -+static int obj_record_encode(const void *rec, struct slice s, int hash_size)
     ++static int reftable_obj_record_encode(const void *rec, struct slice s,
     ++				      int hash_size)
      +{
     -+	struct obj_record *r = (struct obj_record *)rec;
     ++	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
      +	struct slice start = s;
     ++	int i = 0;
      +	int n = 0;
     ++	uint64_t last = 0;
      +	if (r->offset_len == 0 || r->offset_len >= 8) {
      +		n = put_var_int(s, r->offset_len);
      +		if (n < 0) {
     @@ reftable/record.c (new)
      +	}
      +	slice_consume(&s, n);
      +
     -+	{
     -+		uint64_t last = r->offsets[0];
     -+		int i = 0;
     -+		for (i = 1; i < r->offset_len; i++) {
     -+			int n = put_var_int(s, r->offsets[i] - last);
     -+			if (n < 0) {
     -+				return -1;
     -+			}
     -+			slice_consume(&s, n);
     -+			last = r->offsets[i];
     ++	last = r->offsets[0];
     ++	for (i = 1; i < r->offset_len; i++) {
     ++		int n = put_var_int(s, r->offsets[i] - last);
     ++		if (n < 0) {
     ++			return -1;
      +		}
     ++		slice_consume(&s, n);
     ++		last = r->offsets[i];
      +	}
      +	return start.len - s.len;
      +}
      +
     -+static int obj_record_decode(void *rec, struct slice key, byte val_type,
     -+			     struct slice in, int hash_size)
     ++static int reftable_obj_record_decode(void *rec, struct slice key,
     ++				      byte val_type, struct slice in,
     ++				      int hash_size)
      +{
      +	struct slice start = in;
     -+	struct obj_record *r = (struct obj_record *)rec;
     ++	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
      +	uint64_t count = val_type;
      +	int n = 0;
     ++	uint64_t last;
     ++	int j;
      +	r->hash_prefix = reftable_malloc(key.len);
      +	memcpy(r->hash_prefix, key.buf, key.len);
      +	r->hash_prefix_len = key.len;
     @@ reftable/record.c (new)
      +	}
      +	slice_consume(&in, n);
      +
     -+	{
     -+		uint64_t last = r->offsets[0];
     -+		int j = 1;
     -+		while (j < count) {
     -+			uint64_t delta = 0;
     -+			int n = get_var_int(&delta, in);
     -+			if (n < 0) {
     -+				return n;
     -+			}
     -+			slice_consume(&in, n);
     -+
     -+			last = r->offsets[j] = (delta + last);
     -+			j++;
     ++	last = r->offsets[0];
     ++	j = 1;
     ++	while (j < count) {
     ++		uint64_t delta = 0;
     ++		int n = get_var_int(&delta, in);
     ++		if (n < 0) {
     ++			return n;
      +		}
     ++		slice_consume(&in, n);
     ++
     ++		last = r->offsets[j] = (delta + last);
     ++		j++;
      +	}
      +	return start.len - in.len;
      +}
     @@ reftable/record.c (new)
      +	return false;
      +}
      +
     -+struct record_vtable obj_record_vtable = {
     -+	.key = &obj_record_key,
     ++struct reftable_record_vtable reftable_obj_record_vtable = {
     ++	.key = &reftable_obj_record_key,
      +	.type = BLOCK_TYPE_OBJ,
     -+	.copy_from = &obj_record_copy_from,
     -+	.val_type = &obj_record_val_type,
     -+	.encode = &obj_record_encode,
     -+	.decode = &obj_record_decode,
     -+	.clear = &obj_record_clear,
     ++	.copy_from = &reftable_obj_record_copy_from,
     ++	.val_type = &reftable_obj_record_val_type,
     ++	.encode = &reftable_obj_record_encode,
     ++	.decode = &reftable_obj_record_decode,
     ++	.clear = &reftable_obj_record_clear,
      +	.is_deletion = not_a_deletion,
      +};
      +
     @@ reftable/record.c (new)
      +	memcpy(r->message, dest.buf, dest.len);
      +	r->message[dest.len] = 0;
      +
     -+	slice_clear(&dest);
     ++	slice_release(&dest);
      +	return start.len - in.len;
      +
      +error:
     -+	slice_clear(&dest);
     ++	slice_release(&dest);
      +	return REFTABLE_FORMAT_ERROR;
      +}
      +
     @@ reftable/record.c (new)
      +		(const struct reftable_log_record *)p);
      +}
      +
     -+struct record_vtable reftable_log_record_vtable = {
     ++struct reftable_record_vtable reftable_log_record_vtable = {
      +	.key = &reftable_log_record_key,
      +	.type = BLOCK_TYPE_LOG,
      +	.copy_from = &reftable_log_record_copy_from,
     @@ reftable/record.c (new)
      +	.is_deletion = &reftable_log_record_is_deletion_void,
      +};
      +
     -+struct record new_record(byte typ)
     ++struct reftable_record reftable_new_record(byte typ)
      +{
     -+	struct record rec = { NULL };
     ++	struct reftable_record rec = { NULL };
      +	switch (typ) {
      +	case BLOCK_TYPE_REF: {
      +		struct reftable_ref_record *r =
      +			reftable_calloc(sizeof(struct reftable_ref_record));
     -+		record_from_ref(&rec, r);
     ++		reftable_record_from_ref(&rec, r);
      +		return rec;
      +	}
      +
      +	case BLOCK_TYPE_OBJ: {
     -+		struct obj_record *r =
     -+			reftable_calloc(sizeof(struct obj_record));
     -+		record_from_obj(&rec, r);
     ++		struct reftable_obj_record *r =
     ++			reftable_calloc(sizeof(struct reftable_obj_record));
     ++		reftable_record_from_obj(&rec, r);
      +		return rec;
      +	}
      +	case BLOCK_TYPE_LOG: {
      +		struct reftable_log_record *r =
      +			reftable_calloc(sizeof(struct reftable_log_record));
     -+		record_from_log(&rec, r);
     ++		reftable_record_from_log(&rec, r);
      +		return rec;
      +	}
      +	case BLOCK_TYPE_INDEX: {
     -+		struct index_record *r =
     -+			reftable_calloc(sizeof(struct index_record));
     -+		record_from_index(&rec, r);
     ++		struct reftable_index_record *r =
     ++			reftable_calloc(sizeof(struct reftable_index_record));
     ++		reftable_record_from_index(&rec, r);
      +		return rec;
      +	}
      +	}
     @@ reftable/record.c (new)
      +	return rec;
      +}
      +
     -+void *record_yield(struct record *rec)
     ++void *reftable_record_yield(struct reftable_record *rec)
      +{
      +	void *p = rec->data;
      +	rec->data = NULL;
      +	return p;
      +}
      +
     -+void record_destroy(struct record *rec)
     ++void reftable_record_destroy(struct reftable_record *rec)
      +{
     -+	record_clear(*rec);
     -+	reftable_free(record_yield(rec));
     ++	reftable_record_clear(rec);
     ++	reftable_free(reftable_record_yield(rec));
      +}
      +
     -+static void index_record_key(const void *r, struct slice *dest)
     ++static void reftable_index_record_key(const void *r, struct slice *dest)
      +{
     -+	struct index_record *rec = (struct index_record *)r;
     ++	struct reftable_index_record *rec = (struct reftable_index_record *)r;
      +	slice_copy(dest, rec->last_key);
      +}
      +
     -+static void index_record_copy_from(void *rec, const void *src_rec,
     -+				   int hash_size)
     ++static void reftable_index_record_copy_from(void *rec, const void *src_rec,
     ++					    int hash_size)
      +{
     -+	struct index_record *dst = (struct index_record *)rec;
     -+	struct index_record *src = (struct index_record *)src_rec;
     ++	struct reftable_index_record *dst = (struct reftable_index_record *)rec;
     ++	struct reftable_index_record *src =
     ++		(struct reftable_index_record *)src_rec;
      +
      +	slice_copy(&dst->last_key, src->last_key);
      +	dst->offset = src->offset;
      +}
      +
     -+static void index_record_clear(void *rec)
     ++static void reftable_index_record_clear(void *rec)
      +{
     -+	struct index_record *idx = (struct index_record *)rec;
     -+	slice_clear(&idx->last_key);
     ++	struct reftable_index_record *idx = (struct reftable_index_record *)rec;
     ++	slice_release(&idx->last_key);
      +}
      +
     -+static byte index_record_val_type(const void *rec)
     ++static byte reftable_index_record_val_type(const void *rec)
      +{
      +	return 0;
      +}
      +
     -+static int index_record_encode(const void *rec, struct slice out, int hash_size)
     ++static int reftable_index_record_encode(const void *rec, struct slice out,
     ++					int hash_size)
      +{
     -+	const struct index_record *r = (const struct index_record *)rec;
     ++	const struct reftable_index_record *r =
     ++		(const struct reftable_index_record *)rec;
      +	struct slice start = out;
      +
      +	int n = put_var_int(out, r->offset);
     @@ reftable/record.c (new)
      +	return start.len - out.len;
      +}
      +
     -+static int index_record_decode(void *rec, struct slice key, byte val_type,
     -+			       struct slice in, int hash_size)
     ++static int reftable_index_record_decode(void *rec, struct slice key,
     ++					byte val_type, struct slice in,
     ++					int hash_size)
      +{
      +	struct slice start = in;
     -+	struct index_record *r = (struct index_record *)rec;
     ++	struct reftable_index_record *r = (struct reftable_index_record *)rec;
      +	int n = 0;
      +
      +	slice_copy(&r->last_key, key);
     @@ reftable/record.c (new)
      +	return start.len - in.len;
      +}
      +
     -+struct record_vtable index_record_vtable = {
     -+	.key = &index_record_key,
     ++struct reftable_record_vtable reftable_index_record_vtable = {
     ++	.key = &reftable_index_record_key,
      +	.type = BLOCK_TYPE_INDEX,
     -+	.copy_from = &index_record_copy_from,
     -+	.val_type = &index_record_val_type,
     -+	.encode = &index_record_encode,
     -+	.decode = &index_record_decode,
     -+	.clear = &index_record_clear,
     ++	.copy_from = &reftable_index_record_copy_from,
     ++	.val_type = &reftable_index_record_val_type,
     ++	.encode = &reftable_index_record_encode,
     ++	.decode = &reftable_index_record_decode,
     ++	.clear = &reftable_index_record_clear,
      +	.is_deletion = &not_a_deletion,
      +};
      +
     -+void record_key(struct record rec, struct slice *dest)
     ++void reftable_record_key(struct reftable_record *rec, struct slice *dest)
      +{
     -+	rec.ops->key(rec.data, dest);
     ++	rec->ops->key(rec->data, dest);
      +}
      +
     -+byte record_type(struct record rec)
     ++byte reftable_record_type(struct reftable_record *rec)
      +{
     -+	return rec.ops->type;
     ++	return rec->ops->type;
      +}
      +
     -+int record_encode(struct record rec, struct slice dest, int hash_size)
     ++int reftable_record_encode(struct reftable_record *rec, struct slice dest,
     ++			   int hash_size)
      +{
     -+	return rec.ops->encode(rec.data, dest, hash_size);
     ++	return rec->ops->encode(rec->data, dest, hash_size);
      +}
      +
     -+void record_copy_from(struct record rec, struct record src, int hash_size)
     ++void reftable_record_copy_from(struct reftable_record *rec,
     ++			       struct reftable_record *src, int hash_size)
      +{
     -+	assert(src.ops->type == rec.ops->type);
     ++	assert(src->ops->type == rec->ops->type);
      +
     -+	rec.ops->copy_from(rec.data, src.data, hash_size);
     ++	rec->ops->copy_from(rec->data, src->data, hash_size);
      +}
      +
     -+byte record_val_type(struct record rec)
     ++byte reftable_record_val_type(struct reftable_record *rec)
      +{
     -+	return rec.ops->val_type(rec.data);
     ++	return rec->ops->val_type(rec->data);
      +}
      +
     -+int record_decode(struct record rec, struct slice key, byte extra,
     -+		  struct slice src, int hash_size)
     ++int reftable_record_decode(struct reftable_record *rec, struct slice key,
     ++			   byte extra, struct slice src, int hash_size)
      +{
     -+	return rec.ops->decode(rec.data, key, extra, src, hash_size);
     ++	return rec->ops->decode(rec->data, key, extra, src, hash_size);
      +}
      +
     -+void record_clear(struct record rec)
     ++void reftable_record_clear(struct reftable_record *rec)
      +{
     -+	rec.ops->clear(rec.data);
     ++	rec->ops->clear(rec->data);
      +}
      +
     -+bool record_is_deletion(struct record rec)
     ++bool reftable_record_is_deletion(struct reftable_record *rec)
      +{
     -+	return rec.ops->is_deletion(rec.data);
     ++	return rec->ops->is_deletion(rec->data);
      +}
      +
     -+void record_from_ref(struct record *rec, struct reftable_ref_record *ref_rec)
     ++void reftable_record_from_ref(struct reftable_record *rec,
     ++			      struct reftable_ref_record *ref_rec)
      +{
      +	assert(rec->ops == NULL);
      +	rec->data = ref_rec;
      +	rec->ops = &reftable_ref_record_vtable;
      +}
      +
     -+void record_from_obj(struct record *rec, struct obj_record *obj_rec)
     ++void reftable_record_from_obj(struct reftable_record *rec,
     ++			      struct reftable_obj_record *obj_rec)
      +{
      +	assert(rec->ops == NULL);
      +	rec->data = obj_rec;
     -+	rec->ops = &obj_record_vtable;
     ++	rec->ops = &reftable_obj_record_vtable;
      +}
      +
     -+void record_from_index(struct record *rec, struct index_record *index_rec)
     ++void reftable_record_from_index(struct reftable_record *rec,
     ++				struct reftable_index_record *index_rec)
      +{
      +	assert(rec->ops == NULL);
      +	rec->data = index_rec;
     -+	rec->ops = &index_record_vtable;
     ++	rec->ops = &reftable_index_record_vtable;
      +}
      +
     -+void record_from_log(struct record *rec, struct reftable_log_record *log_rec)
     ++void reftable_record_from_log(struct reftable_record *rec,
     ++			      struct reftable_log_record *log_rec)
      +{
      +	assert(rec->ops == NULL);
      +	rec->data = log_rec;
      +	rec->ops = &reftable_log_record_vtable;
      +}
      +
     -+struct reftable_ref_record *record_as_ref(struct record rec)
     ++struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *rec)
      +{
     -+	assert(record_type(rec) == BLOCK_TYPE_REF);
     -+	return (struct reftable_ref_record *)rec.data;
     ++	assert(reftable_record_type(rec) == BLOCK_TYPE_REF);
     ++	return (struct reftable_ref_record *)rec->data;
      +}
      +
     -+struct reftable_log_record *record_as_log(struct record rec)
     ++struct reftable_log_record *reftable_record_as_log(struct reftable_record *rec)
      +{
     -+	assert(record_type(rec) == BLOCK_TYPE_LOG);
     -+	return (struct reftable_log_record *)rec.data;
     ++	assert(reftable_record_type(rec) == BLOCK_TYPE_LOG);
     ++	return (struct reftable_log_record *)rec->data;
      +}
      +
      +static bool hash_equal(byte *a, byte *b, int hash_size)
     @@ reftable/record.h (new)
      +int put_var_int(struct slice dest, uint64_t val);
      +
      +/* Methods for records. */
     -+struct record_vtable {
     ++struct reftable_record_vtable {
      +	/* encode the key of to a byte slice. */
      +	void (*key)(const void *rec, struct slice *dest);
      +
     @@ reftable/record.h (new)
      +};
      +
      +/* record is a generic wrapper for different types of records. */
     -+struct record {
     ++struct reftable_record {
      +	void *data;
     -+	struct record_vtable *ops;
     ++	struct reftable_record_vtable *ops;
      +};
      +
      +/* returns true for recognized block types. Block start with the block type. */
     -+int is_block_type(byte typ);
     ++int reftable_is_block_type(byte typ);
      +
      +/* creates a malloced record of the given type. Dispose with record_destroy */
     -+struct record new_record(byte typ);
     ++struct reftable_record reftable_new_record(byte typ);
      +
     -+extern struct record_vtable reftable_ref_record_vtable;
     ++extern struct reftable_record_vtable reftable_ref_record_vtable;
      +
      +/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
      +   number of bytes written. */
     -+int encode_key(bool *restart, struct slice dest, struct slice prev_key,
     -+	       struct slice key, byte extra);
     ++int reftable_encode_key(bool *restart, struct slice dest, struct slice prev_key,
     ++			struct slice key, byte extra);
      +
      +/* Decode into `key` and `extra` from `in` */
     -+int decode_key(struct slice *key, byte *extra, struct slice last_key,
     -+	       struct slice in);
     ++int reftable_decode_key(struct slice *key, byte *extra, struct slice last_key,
     ++			struct slice in);
      +
     -+/* index_record are used internally to speed up lookups. */
     -+struct index_record {
     ++/* reftable_index_record are used internally to speed up lookups. */
     ++struct reftable_index_record {
      +	uint64_t offset; /* Offset of block */
      +	struct slice last_key; /* Last key of the block. */
      +};
      +
     -+/* obj_record stores an object ID => ref mapping. */
     -+struct obj_record {
     ++/* reftable_obj_record stores an object ID => ref mapping. */
     ++struct reftable_obj_record {
      +	byte *hash_prefix; /* leading bytes of the object ID */
      +	int hash_prefix_len; /* number of leading bytes. Constant
      +			      * across a single table. */
     @@ reftable/record.h (new)
      +
      +/* see struct record_vtable */
      +
     -+void record_key(struct record rec, struct slice *dest);
     -+byte record_type(struct record rec);
     -+void record_copy_from(struct record rec, struct record src, int hash_size);
     -+byte record_val_type(struct record rec);
     -+int record_encode(struct record rec, struct slice dest, int hash_size);
     -+int record_decode(struct record rec, struct slice key, byte extra,
     -+		  struct slice src, int hash_size);
     -+bool record_is_deletion(struct record rec);
     ++void reftable_record_key(struct reftable_record *rec, struct slice *dest);
     ++byte reftable_record_type(struct reftable_record *rec);
     ++void reftable_record_copy_from(struct reftable_record *rec,
     ++			       struct reftable_record *src, int hash_size);
     ++byte reftable_record_val_type(struct reftable_record *rec);
     ++int reftable_record_encode(struct reftable_record *rec, struct slice dest,
     ++			   int hash_size);
     ++int reftable_record_decode(struct reftable_record *rec, struct slice key,
     ++			   byte extra, struct slice src, int hash_size);
     ++bool reftable_record_is_deletion(struct reftable_record *rec);
      +
      +/* zeroes out the embedded record */
     -+void record_clear(struct record rec);
     ++void reftable_record_clear(struct reftable_record *rec);
      +
     -+/* clear out the record, yielding the record data that was encapsulated. */
     -+void *record_yield(struct record *rec);
     ++/* clear out the record, yielding the reftable_record data that was
     ++ * encapsulated. */
     ++void *reftable_record_yield(struct reftable_record *rec);
      +
      +/* clear and deallocate embedded record, and zero `rec`. */
     -+void record_destroy(struct record *rec);
     ++void reftable_record_destroy(struct reftable_record *rec);
      +
      +/* initialize generic records from concrete records. The generic record should
      + * be zeroed out. */
     -+void record_from_obj(struct record *rec, struct obj_record *objrec);
     -+void record_from_index(struct record *rec, struct index_record *idxrec);
     -+void record_from_ref(struct record *rec, struct reftable_ref_record *refrec);
     -+void record_from_log(struct record *rec, struct reftable_log_record *logrec);
     -+struct reftable_ref_record *record_as_ref(struct record ref);
     -+struct reftable_log_record *record_as_log(struct record ref);
     ++void reftable_record_from_obj(struct reftable_record *rec,
     ++			      struct reftable_obj_record *objrec);
     ++void reftable_record_from_index(struct reftable_record *rec,
     ++				struct reftable_index_record *idxrec);
     ++void reftable_record_from_ref(struct reftable_record *rec,
     ++			      struct reftable_ref_record *refrec);
     ++void reftable_record_from_log(struct reftable_record *rec,
     ++			      struct reftable_log_record *logrec);
     ++struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *ref);
     ++struct reftable_log_record *reftable_record_as_log(struct reftable_record *ref);
      +
      +/* for qsort. */
      +int reftable_ref_record_compare_name(const void *a, const void *b);
     @@ reftable/refname.c (new)
      +		}
      +	}
      +
     -+	err = reftable_table_read_ref(mod->tab, name, &ref);
     ++	err = reftable_table_read_ref(&mod->tab, name, &ref);
      +	reftable_ref_record_clear(&ref);
      +	return err;
      +}
     @@ reftable/refname.c (new)
      +			goto exit;
      +		}
      +	}
     -+	err = reftable_table_seek_ref(mod->tab, &it, prefix);
     ++	err = reftable_table_seek_ref(&mod->tab, &it, prefix);
      +	if (err) {
      +		goto exit;
      +	}
      +
      +	while (true) {
     -+		err = reftable_iterator_next_ref(it, &ref);
     ++		err = reftable_iterator_next_ref(&it, &ref);
      +		if (err) {
      +			goto exit;
      +		}
     @@ reftable/refname.c (new)
      +			goto exit;
      +		}
      +		slice_set_string(&slashed, mod->add[i]);
     -+		slice_append_string(&slashed, "/");
     ++		slice_addstr(&slashed, "/");
      +
      +		err = modification_has_ref_with_prefix(
      +			mod, slice_as_string(&slashed));
     @@ reftable/refname.c (new)
      +	}
      +	err = 0;
      +exit:
     -+	slice_clear(&slashed);
     ++	slice_release(&slashed);
      +	return err;
      +}
      
     @@ reftable/reftable.c (new)
      +#include "merged.h"
      +
      +struct reftable_table_vtable {
     -+	int (*seek)(void *tab, struct reftable_iterator *it, struct record);
     ++	int (*seek)(void *tab, struct reftable_iterator *it,
     ++		    struct reftable_record *);
      +};
      +
      +static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
     -+				     struct record rec)
     ++				     struct reftable_record *rec)
      +{
      +	return reader_seek((struct reftable_reader *)tab, it, rec);
      +}
     @@ reftable/reftable.c (new)
      +
      +static int reftable_merged_table_seek_void(void *tab,
      +					   struct reftable_iterator *it,
     -+					   struct record rec)
     ++					   struct reftable_record *rec)
      +{
      +	return merged_table_seek_record((struct reftable_merged_table *)tab, it,
      +					rec);
     @@ reftable/reftable.c (new)
      +	.seek = reftable_merged_table_seek_void,
      +};
      +
     -+int reftable_table_seek_ref(struct reftable_table tab,
     ++int reftable_table_seek_ref(struct reftable_table *tab,
      +			    struct reftable_iterator *it, const char *name)
      +{
      +	struct reftable_ref_record ref = {
      +		.ref_name = (char *)name,
      +	};
     -+	struct record rec = { 0 };
     -+	record_from_ref(&rec, &ref);
     -+	return tab.ops->seek(tab.table_arg, it, rec);
     ++	struct reftable_record rec = { 0 };
     ++	reftable_record_from_ref(&rec, &ref);
     ++	return tab->ops->seek(tab->table_arg, it, &rec);
      +}
      +
      +void reftable_table_from_reader(struct reftable_table *tab,
     @@ reftable/reftable.c (new)
      +	tab->table_arg = merged;
      +}
      +
     -+int reftable_table_read_ref(struct reftable_table tab, const char *name,
     ++int reftable_table_read_ref(struct reftable_table *tab, const char *name,
      +			    struct reftable_ref_record *ref)
      +{
      +	struct reftable_iterator it = { 0 };
     @@ reftable/reftable.c (new)
      +		goto exit;
      +	}
      +
     -+	err = reftable_iterator_next_ref(it, ref);
     ++	err = reftable_iterator_next_ref(&it, ref);
      +	if (err) {
      +		goto exit;
      +	}
     @@ reftable/reftable.h (new)
      +/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
      +   end of iteration.
      +*/
     -+int reftable_iterator_next_ref(struct reftable_iterator it,
     ++int reftable_iterator_next_ref(struct reftable_iterator *it,
      +			       struct reftable_ref_record *ref);
      +
      +/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
      +   end of iteration.
      +*/
     -+int reftable_iterator_next_log(struct reftable_iterator it,
     ++int reftable_iterator_next_log(struct reftable_iterator *it,
      +			       struct reftable_log_record *log);
      +
      +/* releases resources associated with an iterator. */
     @@ reftable/reftable.h (new)
      + * closed on calling reftable_reader_destroy().
      + */
      +int reftable_new_reader(struct reftable_reader **pp,
     -+			struct reftable_block_source src, const char *name);
     ++			struct reftable_block_source *src, const char *name);
      +
      +/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
      +   in the table.  To seek to the start of the table, use name = "".
     @@ reftable/reftable.h (new)
      +   example:
      +
      +   struct reftable_reader *r = NULL;
     -+   int err = reftable_new_reader(&r, src, "filename");
     ++   int err = reftable_new_reader(&r, &src, "filename");
      +   if (err < 0) { ... }
      +   struct reftable_iterator it  = {0};
      +   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
      +   if (err < 0) { ... }
      +   struct reftable_ref_record ref  = {0};
      +   while (1) {
     -+     err = reftable_iterator_next_ref(it, &ref);
     ++     err = reftable_iterator_next_ref(&it, &ref);
      +     if (err > 0) {
      +       break;
      +     }
     @@ reftable/reftable.h (new)
      +	void *table_arg;
      +};
      +
     -+int reftable_table_seek_ref(struct reftable_table tab,
     ++int reftable_table_seek_ref(struct reftable_table *tab,
      +			    struct reftable_iterator *it, const char *name);
      +
      +void reftable_table_from_reader(struct reftable_table *tab,
     @@ reftable/reftable.h (new)
      +
      +/* convenience function to read a single ref. Returns < 0 for error, 0
      +   for success, and 1 if ref not found. */
     -+int reftable_table_read_ref(struct reftable_table tab, const char *name,
     ++int reftable_table_read_ref(struct reftable_table *tab, const char *name,
      +			    struct reftable_ref_record *ref);
      +
      +/****************************************************************
     @@ reftable/slice.c (new)
      +	s->len = l;
      +}
      +
     -+void slice_append_string(struct slice *d, const char *s)
     ++void slice_addstr(struct slice *d, const char *s)
      +{
      +	int l1 = d->len;
      +	int l2 = strlen(s);
     @@ reftable/slice.c (new)
      +	memcpy(d->buf + l1, s, l2);
      +}
      +
     -+void slice_append(struct slice *s, struct slice a)
     ++void slice_addbuf(struct slice *s, struct slice a)
      +{
      +	int end = s->len;
      +	slice_resize(s, s->len + a.len);
     @@ reftable/slice.c (new)
      +	s->len -= n;
      +}
      +
     -+byte *slice_yield(struct slice *s)
     ++byte *slice_detach(struct slice *s)
      +{
      +	byte *p = s->buf;
      +	s->buf = NULL;
     @@ reftable/slice.c (new)
      +	return p;
      +}
      +
     -+void slice_clear(struct slice *s)
     ++void slice_release(struct slice *s)
      +{
     -+	reftable_free(slice_yield(s));
     ++	reftable_free(slice_detach(s));
      +}
      +
      +void slice_copy(struct slice *dest, struct slice src)
     @@ reftable/slice.c (new)
      +	slice_resize(&s, in.len + 1);
      +	s.buf[in.len] = 0;
      +	memcpy(s.buf, in.buf, in.len);
     -+	return (char *)slice_yield(&s);
     ++	return (char *)slice_detach(&s);
      +}
      +
      +bool slice_equal(struct slice a, struct slice b)
     @@ reftable/slice.c (new)
      +	return memcmp(a.buf, b.buf, a.len) == 0;
      +}
      +
     -+int slice_compare(struct slice a, struct slice b)
     ++int slice_cmp(struct slice a, struct slice b)
      +{
      +	int min = a.len < b.len ? a.len : b.len;
      +	int res = memcmp(a.buf, b.buf, min);
     @@ reftable/slice.c (new)
      +	}
      +}
      +
     -+int slice_write(struct slice *b, byte *data, size_t sz)
     ++int slice_add(struct slice *b, byte *data, size_t sz)
      +{
      +	if (b->len + sz > b->cap) {
      +		int newcap = 2 * b->cap + 1;
     @@ reftable/slice.c (new)
      +	return sz;
      +}
      +
     -+int slice_write_void(void *b, byte *data, size_t sz)
     ++int slice_add_void(void *b, byte *data, size_t sz)
      +{
     -+	return slice_write((struct slice *)b, data, sz);
     ++	return slice_add((struct slice *)b, data, sz);
      +}
      +
      +static uint64_t slice_size(void *b)
     @@ reftable/slice.h (new)
      +};
      +
      +void slice_set_string(struct slice *dest, const char *src);
     -+void slice_append_string(struct slice *dest, const char *src);
     -+/* Set length to 0, but retain buffer */
     -+void slice_clear(struct slice *slice);
     ++void slice_addstr(struct slice *dest, const char *src);
     ++
     ++/* Deallocate and clear slice */
     ++void slice_release(struct slice *slice);
      +
      +/* Return a malloced string for `src` */
      +char *slice_to_string(struct slice src);
     @@ reftable/slice.h (new)
      +bool slice_equal(struct slice a, struct slice b);
      +
      +/* Return `buf`, clearing out `s` */
     -+byte *slice_yield(struct slice *s);
     ++byte *slice_detach(struct slice *s);
      +
      +/* Copy bytes */
      +void slice_copy(struct slice *dest, struct slice src);
     @@ reftable/slice.h (new)
      +void slice_resize(struct slice *s, int l);
      +
      +/* Signed comparison */
     -+int slice_compare(struct slice a, struct slice b);
     ++int slice_cmp(struct slice a, struct slice b);
      +
      +/* Append `data` to the `dest` slice.  */
     -+int slice_write(struct slice *dest, byte *data, size_t sz);
     ++int slice_add(struct slice *dest, byte *data, size_t sz);
      +
      +/* Append `add` to `dest. */
     -+void slice_append(struct slice *dest, struct slice add);
     ++void slice_addbuf(struct slice *dest, struct slice add);
      +
     -+/* Like slice_write, but suitable for passing to reftable_new_writer
     ++/* Like slice_add, but suitable for passing to reftable_new_writer
      + */
     -+int slice_write_void(void *b, byte *data, size_t sz);
     ++int slice_add_void(void *b, byte *data, size_t sz);
      +
      +/* Find the longest shared prefix size of `a` and `b` */
      +int common_prefix_size(struct slice a, struct slice b);
     @@ reftable/stack.c (new)
      +	*dest = NULL;
      +
      +	slice_set_string(&list_file_name, dir);
     -+	slice_append_string(&list_file_name, "/tables.list");
     ++	slice_addstr(&list_file_name, "/tables.list");
      +
      +	p->list_file = slice_to_string(list_file_name);
     -+	slice_clear(&list_file_name);
     ++	slice_release(&list_file_name);
      +	p->reftable_dir = xstrdup(dir);
      +	p->config = config;
      +
     @@ reftable/stack.c (new)
      +		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
      +	int new_tables_len = 0;
      +	struct reftable_merged_table *new_merged = NULL;
     -+
     ++	int i;
      +	struct slice table_path = { 0 };
      +
      +	while (*names) {
     @@ reftable/stack.c (new)
      +		if (rd == NULL) {
      +			struct reftable_block_source src = { 0 };
      +			slice_set_string(&table_path, st->reftable_dir);
     -+			slice_append_string(&table_path, "/");
     -+			slice_append_string(&table_path, name);
     ++			slice_addstr(&table_path, "/");
     ++			slice_addstr(&table_path, name);
      +
      +			err = reftable_block_source_from_file(
      +				&src, slice_as_string(&table_path));
     @@ reftable/stack.c (new)
      +				goto exit;
      +			}
      +
     -+			err = reftable_new_reader(&rd, src, name);
     ++			err = reftable_new_reader(&rd, &src, name);
      +			if (err < 0) {
      +				goto exit;
      +			}
     @@ reftable/stack.c (new)
      +	new_merged->suppress_deletions = true;
      +	st->merged = new_merged;
      +
     -+	{
     -+		int i = 0;
     -+		for (i = 0; i < cur_len; i++) {
     -+			if (cur[i] != NULL) {
     -+				reader_close(cur[i]);
     -+				reftable_reader_free(cur[i]);
     -+			}
     ++	for (i = 0; i < cur_len; i++) {
     ++		if (cur[i] != NULL) {
     ++			reader_close(cur[i]);
     ++			reftable_reader_free(cur[i]);
      +		}
      +	}
      +
      +exit:
     -+	slice_clear(&table_path);
     -+	{
     -+		int i = 0;
     -+		for (i = 0; i < new_tables_len; i++) {
     -+			reader_close(new_tables[i]);
     -+			reftable_reader_free(new_tables[i]);
     -+		}
     ++	slice_release(&table_path);
     ++	for (i = 0; i < new_tables_len; i++) {
     ++		reader_close(new_tables[i]);
     ++		reftable_reader_free(new_tables[i]);
      +	}
      +	reftable_free(new_tables);
      +	reftable_free(cur);
     @@ reftable/stack.c (new)
      +	add->stack = st;
      +
      +	slice_set_string(&add->lock_file_name, st->list_file);
     -+	slice_append_string(&add->lock_file_name, ".lock");
     ++	slice_addstr(&add->lock_file_name, ".lock");
      +
      +	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
      +				 O_EXCL | O_CREAT | O_WRONLY, 0644);
     @@ reftable/stack.c (new)
      +	struct slice nm = { 0 };
      +	for (i = 0; i < add->new_tables_len; i++) {
      +		slice_set_string(&nm, add->stack->list_file);
     -+		slice_append_string(&nm, "/");
     -+		slice_append_string(&nm, add->new_tables[i]);
     ++		slice_addstr(&nm, "/");
     ++		slice_addstr(&nm, add->new_tables[i]);
      +		unlink(slice_as_string(&nm));
      +		reftable_free(add->new_tables[i]);
      +		add->new_tables[i] = NULL;
     @@ reftable/stack.c (new)
      +	}
      +	if (add->lock_file_name.len > 0) {
      +		unlink(slice_as_string(&add->lock_file_name));
     -+		slice_clear(&add->lock_file_name);
     ++		slice_release(&add->lock_file_name);
      +	}
      +
      +	free_names(add->names);
      +	add->names = NULL;
     -+	slice_clear(&nm);
     ++	slice_release(&nm);
      +}
      +
      +void reftable_addition_destroy(struct reftable_addition *add)
     @@ reftable/stack.c (new)
      +	}
      +
      +	for (i = 0; i < add->stack->merged->stack_len; i++) {
     -+		slice_append_string(&table_list,
     -+				    add->stack->merged->stack[i]->name);
     -+		slice_append_string(&table_list, "\n");
     ++		slice_addstr(&table_list, add->stack->merged->stack[i]->name);
     ++		slice_addstr(&table_list, "\n");
      +	}
      +	for (i = 0; i < add->new_tables_len; i++) {
     -+		slice_append_string(&table_list, add->new_tables[i]);
     -+		slice_append_string(&table_list, "\n");
     ++		slice_addstr(&table_list, add->new_tables[i]);
     ++		slice_addstr(&table_list, "\n");
      +	}
      +
      +	err = write(add->lock_file_fd, table_list.buf, table_list.len);
     -+	slice_clear(&table_list);
     ++	slice_release(&table_list);
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
      +		goto exit;
     @@ reftable/stack.c (new)
      +	format_name(&next_name, add->next_update_index, add->next_update_index);
      +
      +	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
     -+	slice_append_string(&temp_tab_file_name, "/");
     -+	slice_append(&temp_tab_file_name, next_name);
     -+	slice_append_string(&temp_tab_file_name, ".temp.XXXXXX");
     ++	slice_addstr(&temp_tab_file_name, "/");
     ++	slice_addbuf(&temp_tab_file_name, next_name);
     ++	slice_addstr(&temp_tab_file_name, ".temp.XXXXXX");
      +
      +	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
      +	if (tab_fd < 0) {
     @@ reftable/stack.c (new)
      +	}
      +
      +	format_name(&next_name, wr->min_update_index, wr->max_update_index);
     -+	slice_append_string(&next_name, ".ref");
     ++	slice_addstr(&next_name, ".ref");
      +
      +	slice_set_string(&tab_file_name, add->stack->reftable_dir);
     -+	slice_append_string(&tab_file_name, "/");
     -+	slice_append(&tab_file_name, next_name);
     ++	slice_addstr(&tab_file_name, "/");
     ++	slice_addbuf(&tab_file_name, next_name);
      +
      +	/* TODO: should check destination out of paranoia */
      +	err = rename(slice_as_string(&temp_tab_file_name),
     @@ reftable/stack.c (new)
      +		unlink(slice_as_string(&temp_tab_file_name));
      +	}
      +
     -+	slice_clear(&temp_tab_file_name);
     -+	slice_clear(&tab_file_name);
     -+	slice_clear(&next_name);
     ++	slice_release(&temp_tab_file_name);
     ++	slice_release(&tab_file_name);
     ++	slice_release(&next_name);
      +	reftable_writer_free(wr);
      +	return err;
      +}
     @@ reftable/stack.c (new)
      +		    reftable_reader_max_update_index(st->merged->stack[first]));
      +
      +	slice_set_string(temp_tab, st->reftable_dir);
     -+	slice_append_string(temp_tab, "/");
     -+	slice_append(temp_tab, next_name);
     -+	slice_append_string(temp_tab, ".temp.XXXXXX");
     ++	slice_addstr(temp_tab, "/");
     ++	slice_addbuf(temp_tab, next_name);
     ++	slice_addstr(temp_tab, ".temp.XXXXXX");
      +
      +	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
      +	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
     @@ reftable/stack.c (new)
      +	}
      +	if (err != 0 && temp_tab->len > 0) {
      +		unlink(slice_as_string(temp_tab));
     -+		slice_clear(temp_tab);
     ++		slice_release(temp_tab);
      +	}
     -+	slice_clear(&next_name);
     ++	slice_release(&next_name);
      +	return err;
      +}
      +
     @@ reftable/stack.c (new)
      +	}
      +
      +	while (true) {
     -+		err = reftable_iterator_next_ref(it, &ref);
     ++		err = reftable_iterator_next_ref(&it, &ref);
      +		if (err > 0) {
      +			err = 0;
      +			break;
     @@ reftable/stack.c (new)
      +	}
      +
      +	while (true) {
     -+		err = reftable_iterator_next_log(it, &log);
     ++		err = reftable_iterator_next_log(&it, &log);
      +		if (err > 0) {
      +			err = 0;
      +			break;
     @@ reftable/stack.c (new)
      +	bool have_lock = false;
      +	int lock_file_fd = 0;
      +	int compact_count = last - first + 1;
     ++	char **listp = NULL;
      +	char **delete_on_success =
      +		reftable_calloc(sizeof(char *) * (compact_count + 1));
      +	char **subtable_locks =
     @@ reftable/stack.c (new)
      +	st->stats.attempts++;
      +
      +	slice_set_string(&lock_file_name, st->list_file);
     -+	slice_append_string(&lock_file_name, ".lock");
     ++	slice_addstr(&lock_file_name, ".lock");
      +
      +	lock_file_fd = open(slice_as_string(&lock_file_name),
      +			    O_EXCL | O_CREAT | O_WRONLY, 0644);
     @@ reftable/stack.c (new)
      +		struct slice subtab_file_name = { 0 };
      +		struct slice subtab_lock = { 0 };
      +		slice_set_string(&subtab_file_name, st->reftable_dir);
     -+		slice_append_string(&subtab_file_name, "/");
     -+		slice_append_string(&subtab_file_name,
     -+				    reader_name(st->merged->stack[i]));
     ++		slice_addstr(&subtab_file_name, "/");
     ++		slice_addstr(&subtab_file_name,
     ++			     reader_name(st->merged->stack[i]));
      +
      +		slice_copy(&subtab_lock, subtab_file_name);
     -+		slice_append_string(&subtab_lock, ".lock");
     ++		slice_addstr(&subtab_lock, ".lock");
      +
      +		{
      +			int sublock_file_fd =
     @@ reftable/stack.c (new)
      +
      +	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
      +		    st->merged->stack[last]->max_update_index);
     -+	slice_append_string(&new_table_name, ".ref");
     ++	slice_addstr(&new_table_name, ".ref");
      +
      +	slice_set_string(&new_table_path, st->reftable_dir);
     -+	slice_append_string(&new_table_path, "/");
     ++	slice_addstr(&new_table_path, "/");
      +
     -+	slice_append(&new_table_path, new_table_name);
     ++	slice_addbuf(&new_table_path, new_table_name);
      +
      +	if (!is_empty_table) {
      +		err = rename(slice_as_string(&temp_tab_file_name),
     @@ reftable/stack.c (new)
      +	}
      +
      +	for (i = 0; i < first; i++) {
     -+		slice_append_string(&ref_list_contents,
     -+				    st->merged->stack[i]->name);
     -+		slice_append_string(&ref_list_contents, "\n");
     ++		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
     ++		slice_addstr(&ref_list_contents, "\n");
      +	}
      +	if (!is_empty_table) {
     -+		slice_append(&ref_list_contents, new_table_name);
     -+		slice_append_string(&ref_list_contents, "\n");
     ++		slice_addbuf(&ref_list_contents, new_table_name);
     ++		slice_addstr(&ref_list_contents, "\n");
      +	}
      +	for (i = last + 1; i < st->merged->stack_len; i++) {
     -+		slice_append_string(&ref_list_contents,
     -+				    st->merged->stack[i]->name);
     -+		slice_append_string(&ref_list_contents, "\n");
     ++		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
     ++		slice_addstr(&ref_list_contents, "\n");
      +	}
      +
      +	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
     @@ reftable/stack.c (new)
      +	*/
      +	err = reftable_stack_reload_maybe_reuse(st, first < last);
      +
     -+	{
     -+		char **p = delete_on_success;
     -+		while (*p) {
     -+			if (strcmp(*p, slice_as_string(&new_table_path))) {
     -+				unlink(*p);
     -+			}
     -+			p++;
     ++	listp = delete_on_success;
     ++	while (*listp) {
     ++		if (strcmp(*listp, slice_as_string(&new_table_path))) {
     ++			unlink(*listp);
      +		}
     ++		listp++;
      +	}
      +
      +exit:
      +	free_names(delete_on_success);
     -+	{
     -+		char **p = subtable_locks;
     -+		while (*p) {
     -+			unlink(*p);
     -+			p++;
     -+		}
     ++
     ++	listp = subtable_locks;
     ++	while (*listp) {
     ++		unlink(*listp);
     ++		listp++;
      +	}
      +	free_names(subtable_locks);
      +	if (lock_file_fd > 0) {
     @@ reftable/stack.c (new)
      +	if (have_lock) {
      +		unlink(slice_as_string(&lock_file_name));
      +	}
     -+	slice_clear(&new_table_name);
     -+	slice_clear(&new_table_path);
     -+	slice_clear(&ref_list_contents);
     -+	slice_clear(&temp_tab_file_name);
     -+	slice_clear(&lock_file_name);
     ++	slice_release(&new_table_name);
     ++	slice_release(&new_table_path);
     ++	slice_release(&ref_list_contents);
     ++	slice_release(&temp_tab_file_name);
     ++	slice_release(&lock_file_name);
      +	return err;
      +}
      +
     @@ reftable/stack.c (new)
      +{
      +	struct reftable_table tab = { NULL };
      +	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
     -+	return reftable_table_read_ref(tab, refname, ref);
     ++	return reftable_table_read_ref(&tab, refname, ref);
      +}
      +
      +int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
     @@ reftable/stack.c (new)
      +		goto exit;
      +	}
      +
     -+	err = reftable_iterator_next_log(it, log);
     ++	err = reftable_iterator_next_log(&it, log);
      +	if (err) {
      +		goto exit;
      +	}
     @@ reftable/stack.c (new)
      +		goto exit;
      +	}
      +
     -+	err = reftable_new_reader(&rd, src, new_tab_name);
     ++	err = reftable_new_reader(&rd, &src, new_tab_name);
      +	if (err < 0) {
      +		goto exit;
      +	}
     @@ reftable/stack.c (new)
      +
      +	while (true) {
      +		struct reftable_ref_record ref = { 0 };
     -+		err = reftable_iterator_next_ref(it, &ref);
     ++		err = reftable_iterator_next_ref(&it, &ref);
      +		if (err > 0) {
      +			break;
      +		}
     @@ reftable/tree.c (new)
      +			      int (*compare)(const void *, const void *),
      +			      int insert)
      +{
     ++	int res;
      +	if (*rootp == NULL) {
      +		if (!insert) {
      +			return NULL;
     @@ reftable/tree.c (new)
      +		}
      +	}
      +
     -+	{
     -+		int res = compare(key, (*rootp)->key);
     -+		if (res < 0) {
     -+			return tree_search(key, &(*rootp)->left, compare,
     -+					   insert);
     -+		} else if (res > 0) {
     -+			return tree_search(key, &(*rootp)->right, compare,
     -+					   insert);
     -+		}
     ++	res = compare(key, (*rootp)->key);
     ++	if (res < 0) {
     ++		return tree_search(key, &(*rootp)->left, compare, insert);
     ++	} else if (res > 0) {
     ++		return tree_search(key, &(*rootp)->right, compare, insert);
      +	}
      +	return *rootp;
      +}
     @@ reftable/writer.c (new)
      +
      +static int obj_index_tree_node_compare(const void *a, const void *b)
      +{
     -+	return slice_compare(((const struct obj_index_tree_node *)a)->hash,
     -+			     ((const struct obj_index_tree_node *)b)->hash);
     ++	return slice_cmp(((const struct obj_index_tree_node *)a)->hash,
     ++			 ((const struct obj_index_tree_node *)b)->hash);
      +}
      +
      +static void writer_index_hash(struct reftable_writer *w, struct slice hash)
     @@ reftable/writer.c (new)
      +	key->offsets[key->offset_len++] = off;
      +}
      +
     -+static int writer_add_record(struct reftable_writer *w, struct record rec)
     ++static int writer_add_record(struct reftable_writer *w,
     ++			     struct reftable_record *rec)
      +{
      +	int result = -1;
      +	struct slice key = { 0 };
      +	int err = 0;
     -+	record_key(rec, &key);
     -+	if (slice_compare(w->last_key, key) >= 0) {
     ++	reftable_record_key(rec, &key);
     ++	if (slice_cmp(w->last_key, key) >= 0) {
      +		goto exit;
      +	}
      +
      +	slice_copy(&w->last_key, key);
      +	if (w->block_writer == NULL) {
     -+		writer_reinit_block_writer(w, record_type(rec));
     ++		writer_reinit_block_writer(w, reftable_record_type(rec));
      +	}
      +
     -+	assert(block_writer_type(w->block_writer) == record_type(rec));
     ++	assert(block_writer_type(w->block_writer) == reftable_record_type(rec));
      +
      +	if (block_writer_add(w->block_writer, rec) == 0) {
      +		result = 0;
     @@ reftable/writer.c (new)
      +		goto exit;
      +	}
      +
     -+	writer_reinit_block_writer(w, record_type(rec));
     ++	writer_reinit_block_writer(w, reftable_record_type(rec));
      +	err = block_writer_add(w->block_writer, rec);
      +	if (err < 0) {
      +		result = err;
     @@ reftable/writer.c (new)
      +
      +	result = 0;
      +exit:
     -+	slice_clear(&key);
     ++	slice_release(&key);
      +	return result;
      +}
      +
      +int reftable_writer_add_ref(struct reftable_writer *w,
      +			    struct reftable_ref_record *ref)
      +{
     -+	struct record rec = { 0 };
     ++	struct reftable_record rec = { 0 };
      +	struct reftable_ref_record copy = *ref;
      +	int err = 0;
      +
     @@ reftable/writer.c (new)
      +		return REFTABLE_API_ERROR;
      +	}
      +
     -+	record_from_ref(&rec, &copy);
     ++	reftable_record_from_ref(&rec, &copy);
      +	copy.update_index -= w->min_update_index;
     -+	err = writer_add_record(w, rec);
     ++	err = writer_add_record(w, &rec);
      +	if (err < 0) {
      +		return err;
      +	}
     @@ reftable/writer.c (new)
      +int reftable_writer_add_log(struct reftable_writer *w,
      +			    struct reftable_log_record *log)
      +{
     ++	struct reftable_record rec = { 0 };
     ++	int err;
      +	if (log->ref_name == NULL) {
      +		return REFTABLE_API_ERROR;
      +	}
     @@ reftable/writer.c (new)
      +	w->next -= w->pending_padding;
      +	w->pending_padding = 0;
      +
     -+	{
     -+		struct record rec = { 0 };
     -+		int err;
     -+		record_from_log(&rec, log);
     -+		err = writer_add_record(w, rec);
     -+		return err;
     -+	}
     ++	reftable_record_from_log(&rec, log);
     ++	err = writer_add_record(w, &rec);
     ++	return err;
      +}
      +
      +int reftable_writer_add_logs(struct reftable_writer *w,
     @@ reftable/writer.c (new)
      +	int before_blocks = w->stats.idx_stats.blocks;
      +	int err = writer_flush_block(w);
      +	int i = 0;
     ++	struct reftable_block_stats *bstats = NULL;
      +	if (err < 0) {
      +		return err;
      +	}
      +
      +	while (w->index_len > threshold) {
     -+		struct index_record *idx = NULL;
     ++		struct reftable_index_record *idx = NULL;
      +		int idx_len = 0;
      +
      +		max_level++;
     @@ reftable/writer.c (new)
      +		w->index_len = 0;
      +		w->index_cap = 0;
      +		for (i = 0; i < idx_len; i++) {
     -+			struct record rec = { 0 };
     -+			record_from_index(&rec, idx + i);
     -+			if (block_writer_add(w->block_writer, rec) == 0) {
     ++			struct reftable_record rec = { 0 };
     ++			reftable_record_from_index(&rec, idx + i);
     ++			if (block_writer_add(w->block_writer, &rec) == 0) {
      +				continue;
      +			}
      +
     @@ reftable/writer.c (new)
      +
      +			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
      +
     -+			err = block_writer_add(w->block_writer, rec);
     ++			err = block_writer_add(w->block_writer, &rec);
      +			if (err != 0) {
      +				/* write into fresh block should always succeed
      +				 */
     @@ reftable/writer.c (new)
      +			}
      +		}
      +		for (i = 0; i < idx_len; i++) {
     -+			slice_clear(&idx[i].last_key);
     ++			slice_release(&idx[i].last_key);
      +		}
      +		reftable_free(idx);
      +	}
     @@ reftable/writer.c (new)
      +		return err;
      +	}
      +
     -+	{
     -+		struct reftable_block_stats *bstats =
     -+			writer_reftable_block_stats(w, typ);
     -+		bstats->index_blocks =
     -+			w->stats.idx_stats.blocks - before_blocks;
     -+		bstats->index_offset = index_start;
     -+		bstats->max_index_level = max_level;
     -+	}
     ++	bstats = writer_reftable_block_stats(w, typ);
     ++	bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
     ++	bstats->index_offset = index_start;
     ++	bstats->max_index_level = max_level;
      +
      +	/* Reinit lastKey, as the next section can start with any key. */
      +	w->last_key.len = 0;
     @@ reftable/writer.c (new)
      +{
      +	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
      +	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
     -+	struct obj_record obj_rec = {
     ++	struct reftable_obj_record obj_rec = {
      +		.hash_prefix = entry->hash.buf,
      +		.hash_prefix_len = arg->w->stats.object_id_len,
      +		.offsets = entry->offsets,
      +		.offset_len = entry->offset_len,
      +	};
     -+	struct record rec = { 0 };
     ++	struct reftable_record rec = { 0 };
      +	if (arg->err < 0) {
      +		goto exit;
      +	}
      +
     -+	record_from_obj(&rec, &obj_rec);
     -+	arg->err = block_writer_add(arg->w->block_writer, rec);
     ++	reftable_record_from_obj(&rec, &obj_rec);
     ++	arg->err = block_writer_add(arg->w->block_writer, &rec);
      +	if (arg->err == 0) {
      +		goto exit;
      +	}
     @@ reftable/writer.c (new)
      +	}
      +
      +	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
     -+	arg->err = block_writer_add(arg->w->block_writer, rec);
     ++	arg->err = block_writer_add(arg->w->block_writer, &rec);
      +	if (arg->err == 0) {
      +		goto exit;
      +	}
      +	obj_rec.offset_len = 0;
     -+	arg->err = block_writer_add(arg->w->block_writer, rec);
     ++	arg->err = block_writer_add(arg->w->block_writer, &rec);
      +
      +	/* Should be able to write into a fresh block. */
      +	assert(arg->err == 0);
     @@ reftable/writer.c (new)
      +	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
      +
      +	FREE_AND_NULL(entry->offsets);
     -+	slice_clear(&entry->hash);
     ++	slice_release(&entry->hash);
      +	reftable_free(entry);
      +}
      +
     @@ reftable/writer.c (new)
      +	/* free up memory. */
      +	block_writer_clear(&w->block_writer_data);
      +	writer_clear_index(w);
     -+	slice_clear(&w->last_key);
     ++	slice_release(&w->last_key);
      +	return err;
      +}
      +
     @@ reftable/writer.c (new)
      +{
      +	int i = 0;
      +	for (i = 0; i < w->index_len; i++) {
     -+		slice_clear(&w->index[i].last_key);
     ++		slice_release(&w->index[i].last_key);
      +	}
      +
      +	FREE_AND_NULL(w->index);
     @@ reftable/writer.c (new)
      +	int raw_bytes = block_writer_finish(w->block_writer);
      +	int padding = 0;
      +	int err = 0;
     ++	struct reftable_index_record ir = { 0 };
      +	if (raw_bytes < 0) {
      +		return raw_bytes;
      +	}
     @@ reftable/writer.c (new)
      +	if (w->index_cap == w->index_len) {
      +		w->index_cap = 2 * w->index_cap + 1;
      +		w->index = reftable_realloc(
     -+			w->index, sizeof(struct index_record) * w->index_cap);
     ++			w->index,
     ++			sizeof(struct reftable_index_record) * w->index_cap);
      +	}
      +
     -+	{
     -+		struct index_record ir = {
     -+			.offset = w->next,
     -+		};
     -+		slice_copy(&ir.last_key, w->block_writer->last_key);
     -+		w->index[w->index_len] = ir;
     -+	}
     ++	ir.offset = w->next;
     ++	slice_copy(&ir.last_key, w->block_writer->last_key);
     ++	w->index[w->index_len] = ir;
      +
      +	w->index_len++;
      +	w->next += padding + raw_bytes;
     @@ reftable/writer.h (new)
      +	struct block_writer block_writer_data;
      +
      +	/* pending index records for the current section */
     -+	struct index_record *index;
     ++	struct reftable_index_record *index;
      +	int index_len;
      +	int index_cap;
      +
 11:  ace95b6cd88 !  6:  865c2c4567a Reftable support for git-core
     @@ refs/reftable-backend.c (new)
      +	struct reftable_stack *stack;
      +};
      +
     ++static int reftable_read_raw_ref(struct ref_store *ref_store,
     ++				 const char *refname, struct object_id *oid,
     ++				 struct strbuf *referent, unsigned int *type);
     ++
      +static void clear_reftable_log_record(struct reftable_log_record *log)
      +{
      +	log->old_hash = NULL;
     @@ refs/reftable-backend.c (new)
      +	struct git_reftable_iterator *ri =
      +		(struct git_reftable_iterator *)ref_iterator;
      +	while (ri->err == 0) {
     -+		ri->err = reftable_iterator_next_ref(ri->iter, &ri->ref);
     ++		ri->err = reftable_iterator_next_ref(&ri->iter, &ri->ref);
      +		if (ri->err) {
      +			break;
      +		}
     @@ refs/reftable-backend.c (new)
      +	return &ri->base;
      +}
      +
     ++static int fixup_symrefs(struct ref_store *ref_store,
     ++			 struct ref_transaction *transaction)
     ++{
     ++	struct strbuf referent = STRBUF_INIT;
     ++	int i = 0;
     ++	int err = 0;
     ++
     ++	for (i = 0; i < transaction->nr; i++) {
     ++		struct ref_update *update = transaction->updates[i];
     ++		struct object_id old_oid;
     ++
     ++		err = reftable_read_raw_ref(ref_store, update->refname,
     ++					    &old_oid, &referent,
     ++					    /* mutate input, like
     ++					       files-backend.c */
     ++					    &update->type);
     ++		if (err < 0 && errno == ENOENT &&
     ++		    is_null_oid(&update->old_oid)) {
     ++			err = 0;
     ++		}
     ++		if (err < 0)
     ++			goto done;
     ++
     ++		if (!(update->type & REF_ISSYMREF))
     ++			continue;
     ++
     ++		if (update->flags & REF_NO_DEREF) {
     ++			/* what should happen here? See files-backend.c
     ++			 * lock_ref_for_update. */
     ++		} else {
     ++			/*
     ++			  If we are updating a symref (eg. HEAD), we should also
     ++			  update the branch that the symref points to.
     ++
     ++			  This is generic functionality, and would be better
     ++			  done in refs.c, but the current implementation is
     ++			  intertwined with the locking in files-backend.c.
     ++			*/
     ++			int new_flags = update->flags;
     ++			struct ref_update *new_update = NULL;
     ++
     ++			/* if this is an update for HEAD, should also record a
     ++			   log entry for HEAD? See files-backend.c,
     ++			   split_head_update()
     ++			*/
     ++			new_update = ref_transaction_add_update(
     ++				transaction, referent.buf, new_flags,
     ++				&update->new_oid, &update->old_oid,
     ++				update->msg);
     ++			new_update->parent_update = update;
     ++
     ++			/* files-backend sets REF_LOG_ONLY here. */
     ++			update->flags |= REF_NO_DEREF | REF_LOG_ONLY;
     ++			update->flags &= ~REF_HAVE_OLD;
     ++		}
     ++	}
     ++
     ++done:
     ++	strbuf_release(&referent);
     ++	return err;
     ++}
     ++
      +static int reftable_transaction_prepare(struct ref_store *ref_store,
      +					struct ref_transaction *transaction,
      +					struct strbuf *errbuf)
      +{
     -+	/* XXX rewrite using the reftable transaction API. */
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_addition *add = NULL;
      +	int err = refs->err;
      +	if (err < 0) {
      +		goto done;
      +	}
     ++
      +	err = reftable_stack_reload(refs->stack);
      +	if (err) {
      +		goto done;
      +	}
      +
     ++	err = reftable_stack_new_addition(&add, refs->stack);
     ++	if (err) {
     ++		goto done;
     ++	}
     ++
     ++	err = fixup_symrefs(ref_store, transaction);
     ++	if (err) {
     ++		goto done;
     ++	}
     ++
     ++	transaction->backend_data = add;
     ++	transaction->state = REF_TRANSACTION_PREPARED;
     ++
      +done:
     ++	if (err < 0) {
     ++		transaction->state = REF_TRANSACTION_CLOSED;
     ++		strbuf_addf(errbuf, "reftable: transaction prepare: %s",
     ++			    reftable_error_str(err));
     ++	}
     ++
      +	return err;
      +}
      +
     @@ refs/reftable-backend.c (new)
      +				      struct ref_transaction *transaction,
      +				      struct strbuf *err)
      +{
     -+	struct git_reftable_ref_store *refs =
     -+		(struct git_reftable_ref_store *)ref_store;
     -+	(void)refs;
     ++	struct reftable_addition *add =
     ++		(struct reftable_addition *)transaction->backend_data;
     ++	reftable_addition_destroy(add);
     ++	transaction->backend_data = NULL;
      +	return 0;
      +}
      +
     @@ refs/reftable-backend.c (new)
      +
      +static int ref_update_cmp(const void *a, const void *b)
      +{
     -+	return strcmp(((struct ref_update *)a)->refname,
     -+		      ((struct ref_update *)b)->refname);
     ++	return strcmp((*(struct ref_update **)a)->refname,
     ++		      (*(struct ref_update **)b)->refname);
      +}
      +
      +static int write_transaction_table(struct reftable_writer *writer, void *arg)
     @@ refs/reftable-backend.c (new)
      +	QSORT(sorted, transaction->nr, ref_update_cmp);
      +	reftable_writer_set_limits(writer, ts, ts);
      +
     -+	for (i = 0; i < transaction->nr; i++) {
     -+		struct ref_update *u = sorted[i];
     -+		if (u->flags & REF_HAVE_OLD) {
     -+			err = reftable_check_old_oid(transaction->ref_store,
     -+						     u->refname, &u->old_oid);
     -+			if (err < 0) {
     -+				goto exit;
     -+			}
     -+		}
     -+	}
      +
      +	for (i = 0; i < transaction->nr; i++) {
      +		struct ref_update *u = sorted[i];
     @@ refs/reftable-backend.c (new)
      +		log->update_index = ts;
      +		log->message = u->msg;
      +
     ++		if (u->flags & REF_LOG_ONLY) {
     ++			continue;
     ++		}
     ++
      +		if (u->flags & REF_HAVE_NEW) {
     -+			struct object_id out_oid;
     -+			int out_flags = 0;
     -+			/* Memory owned by refs_resolve_ref_unsafe, no need to
     -+			 * free(). */
     -+			const char *resolved = refs_resolve_ref_unsafe(
     -+				transaction->ref_store, u->refname, 0, &out_oid,
     -+				&out_flags);
      +			struct reftable_ref_record ref = { NULL };
      +			struct object_id peeled;
     -+			int peel_error = peel_object(&u->new_oid, &peeled);
      +
     -+			ref.ref_name =
     -+				(char *)(resolved ? resolved : u->refname);
     -+			log->ref_name = ref.ref_name;
     ++			int peel_error = peel_object(&u->new_oid, &peeled);
     ++			ref.ref_name = (char *)u->refname;
      +
      +			if (!is_null_oid(&u->new_oid)) {
      +				ref.value = u->new_oid.hash;
     @@ refs/reftable-backend.c (new)
      +
      +			err = reftable_writer_add_ref(writer, &ref);
      +			if (err < 0) {
     -+				goto exit;
     ++				goto done;
      +			}
      +		}
      +	}
     @@ refs/reftable-backend.c (new)
      +		err = reftable_writer_add_log(writer, &logs[i]);
      +		clear_reftable_log_record(&logs[i]);
      +		if (err < 0) {
     -+			goto exit;
     ++			goto done;
      +		}
      +	}
      +
     -+exit:
     ++done:
      +	free(logs);
      +	free(sorted);
      +	return err;
      +}
      +
     -+static int reftable_transaction_commit(struct ref_store *ref_store,
     ++static int reftable_transaction_finish(struct ref_store *ref_store,
      +				       struct ref_transaction *transaction,
      +				       struct strbuf *errmsg)
      +{
     -+	struct git_reftable_ref_store *refs =
     -+		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_addition *add =
     ++		(struct reftable_addition *)transaction->backend_data;
      +	int err = 0;
     -+	if (refs->err < 0) {
     -+		return refs->err;
     ++	int i;
     ++
     ++	for (i = 0; i < transaction->nr; i++) {
     ++		struct ref_update *u = transaction->updates[i];
     ++		if (u->flags & REF_HAVE_OLD) {
     ++			err = reftable_check_old_oid(transaction->ref_store,
     ++						     u->refname, &u->old_oid);
     ++			if (err < 0) {
     ++				goto done;
     ++			}
     ++		}
      +	}
      +
     -+	err = reftable_stack_add(refs->stack, &write_transaction_table,
     -+				 transaction);
     ++	err = reftable_addition_add(add, &write_transaction_table, transaction);
      +	if (err < 0) {
     ++		goto done;
     ++	}
     ++
     ++	err = reftable_addition_commit(add);
     ++
     ++done:
     ++	reftable_addition_destroy(add);
     ++	transaction->state = REF_TRANSACTION_CLOSED;
     ++	transaction->backend_data = NULL;
     ++	if (err) {
      +		strbuf_addf(errmsg, "reftable: transaction failure: %s",
      +			    reftable_error_str(err));
      +		return -1;
      +	}
     -+
     -+	return 0;
     ++	return err;
      +}
      +
     -+static int reftable_transaction_finish(struct ref_store *ref_store,
     -+				       struct ref_transaction *transaction,
     -+				       struct strbuf *err)
     ++static int
     ++reftable_transaction_initial_commit(struct ref_store *ref_store,
     ++				    struct ref_transaction *transaction,
     ++				    struct strbuf *errmsg)
      +{
     -+	return reftable_transaction_commit(ref_store, transaction, err);
     ++	return reftable_transaction_finish(ref_store, transaction, errmsg);
      +}
      +
      +struct write_pseudoref_arg {
     @@ refs/reftable-backend.c (new)
      +	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
      +
      +	if (err) {
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	/* XXX do ref renames overwrite the target? */
      +	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	free(ref.ref_name);
     @@ refs/reftable-backend.c (new)
      +
      +		err = reftable_writer_add_refs(writer, todo, 2);
      +		if (err < 0) {
     -+			goto exit;
     ++			goto done;
      +		}
      +	}
      +
     @@ refs/reftable-backend.c (new)
      +		clear_reftable_log_record(&todo[1]);
      +
      +		if (err < 0) {
     -+			goto exit;
     ++			goto done;
      +		}
      +
      +	} else {
      +		/* XXX symrefs? */
      +	}
      +
     -+exit:
     ++done:
      +	reftable_ref_record_clear(&ref);
      +	return err;
      +}
     @@ refs/reftable-backend.c (new)
      +		(struct reftable_reflog_ref_iterator *)ref_iterator;
      +
      +	while (1) {
     -+		int err = reftable_iterator_next_log(ri->iter, &ri->log);
     ++		int err = reftable_iterator_next_log(&ri->iter, &ri->log);
      +		if (err > 0) {
      +			return ITER_DONE;
      +		}
     @@ refs/reftable-backend.c (new)
      +	mt = reftable_stack_merged_table(refs->stack);
      +	err = reftable_merged_table_seek_log(mt, &it, refname);
      +	while (err == 0) {
     -+		err = reftable_iterator_next_log(it, &log);
     ++		err = reftable_iterator_next_log(&it, &log);
      +		if (err != 0) {
      +			break;
      +		}
     @@ refs/reftable-backend.c (new)
      +
      +	while (err == 0) {
      +		struct reftable_log_record log = { NULL };
     -+		err = reftable_iterator_next_log(it, &log);
     ++		err = reftable_iterator_next_log(&it, &log);
      +		if (err != 0) {
      +			break;
      +		}
     @@ refs/reftable-backend.c (new)
      +	if (refs->err < 0) {
      +		return refs->err;
      +	}
     ++	err = reftable_stack_reload(refs->stack);
     ++	if (err) {
     ++		goto done;
     ++	}
      +
      +	mt = reftable_stack_merged_table(refs->stack);
      +	err = reftable_merged_table_seek_log(mt, &it, refname);
      +	if (err < 0) {
     -+		return err;
     ++		goto done;
      +	}
      +
      +	while (1) {
      +		struct object_id ooid;
      +		struct object_id noid;
      +
     -+		int err = reftable_iterator_next_log(it, &log);
     ++		int err = reftable_iterator_next_log(&it, &log);
      +		if (err < 0) {
     -+			return err;
     ++			goto done;
      +		}
      +
      +		if (err > 0 || strcmp(log.ref_name, refname)) {
     @@ refs/reftable-backend.c (new)
      +			add_log_tombstone(&arg, refname, log.update_index);
      +		}
      +	}
     ++	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
     ++
     ++done:
      +	reftable_log_record_clear(&log);
      +	reftable_iterator_destroy(&it);
     -+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
      +	clear_log_tombstones(&arg);
      +	return err;
      +}
     @@ refs/reftable-backend.c (new)
      +	if (err > 0) {
      +		errno = ENOENT;
      +		err = -1;
     -+		goto exit;
     ++		goto done;
      +	}
      +	if (err < 0) {
      +		errno = reftable_error_to_errno(err);
      +		err = -1;
     -+		goto exit;
     ++		goto done;
      +	}
      +	if (ref.target != NULL) {
     -+		/* XXX recurse? */
      +		strbuf_reset(referent);
      +		strbuf_addstr(referent, ref.target);
      +		*type |= REF_ISSYMREF;
     @@ refs/reftable-backend.c (new)
      +		errno = EINVAL;
      +		err = -1;
      +	}
     -+exit:
     ++done:
      +	reftable_ref_record_clear(&ref);
      +	return err;
      +}
     @@ refs/reftable-backend.c (new)
      +	reftable_transaction_prepare,
      +	reftable_transaction_finish,
      +	reftable_transaction_abort,
     -+	reftable_transaction_commit,
     ++	reftable_transaction_initial_commit,
      +
      +	reftable_pack_refs,
      +	reftable_create_symref,
     @@ refs/reftable-backend.c (new)
      +	reftable_reflog_exists,
      +	reftable_create_reflog,
      +	reftable_delete_reflog,
     -+	reftable_reflog_expire
     ++	reftable_reflog_expire,
      +};
      
       ## repository.c ##
     @@ t/t0031-reftable.sh (new)
      +	test_cmp expect actual
      +'
      +
     -+test_expect_success 'basic operation of reftable storage: commit, reflog, repack' '
     ++test_expect_success 'basic operation of reftable storage: commit, show-ref' '
      +	initialize &&
      +	test_commit file &&
      +	test_write_lines refs/heads/master refs/tags/file >expect &&
      +	git show-ref &&
      +	git show-ref | cut -f2 -d" " > actual &&
     -+	test_cmp actual expect &&
     ++	test_cmp actual expect
     ++'
     ++
     ++test_expect_success 'reflog, repack' '
     ++	initialize &&
      +	for count in $(test_seq 1 10)
      +	do
      +		test_commit "number $count" file.t $count number-$count ||
     @@ t/t0031-reftable.sh (new)
      +	ls -1 .git/reftable >table-files &&
      +	test_line_count = 2 table-files &&
      +	git reflog refs/heads/master >output &&
     -+	test_line_count = 11 output &&
     -+	grep "commit (initial): file" output &&
     ++	test_line_count = 10 output &&
     ++	grep "commit (initial): number 1" output &&
      +	grep "commit: number 10" output &&
      +	git gc &&
      +	git reflog refs/heads/master >output &&
     @@ t/t0031-reftable.sh (new)
      +	test -f file2
      +'
      +
     ++# cherry-pick uses a pseudo ref.
     ++test_expect_success 'rebase' '
     ++	initialize &&
     ++	test_commit message1 file1 &&
     ++	test_commit message2 file2 &&
     ++	git branch source &&
     ++	git checkout HEAD^ &&
     ++	test_commit message3 file3 &&
     ++ 	git rebase source &&
     ++	test -f file2
     ++'
     ++
      +test_done
      +
  -:  ----------- >  7:  6b248d5fdb4 Add GIT_DEBUG_REFS debugging mechanism
 12:  b96f0712d44 =  8:  54102355ce7 vcxproj: adjust for the reftable changes
 13:  0e732d30b51 !  9:  7764ebf0956 Add some reftable testing infrastructure
     @@ Metadata
      Author: Han-Wen Nienhuys <hanwen@google.com>
      
       ## Commit message ##
     -    Add some reftable testing infrastructure
     +    Add reftable testing infrastructure
      
          * Add GIT_TEST_REFTABLE environment var to control default ref storage
      
     -    * Add test_prerequisite REFTABLE. Skip t/t3210-pack-refs.sh for REFTABLE.
     +    * Add test_prerequisite REFTABLE.
     +
     +    * Skip some tests that are incompatible:
     +
     +      * t3210-pack-refs.sh - does not apply
     +      * t9903-bash-prompt - The bash mode reads .git/HEAD directly
     +      * t1450-fsck.sh - manipulates .git/ directly to create invalid state
      
          Major test failures:
      
     -     * t9903-bash-prompt - The bash mode reads .git/HEAD directly
           * t1400-update-ref.sh - Reads from .git/{refs,logs} directly
           * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
           * t1405 - inspecs .git/ directly.
     -     * t1450-fsck.sh - manipulates .git/ directly to create invalid state
     -     * Rebase, cherry-pick: pseudo refs aren't written through the refs backend.
     -
     -    Other tests by decreasing brokenness:
     -
     -    t1407-worktree-ref-store.sh              - 5 of 5
     -    t1413-reflog-detach.sh                   - 7 of 7
     -    t1415-worktree-refs.sh                   - 11 of 11
     -    t3908-stash-in-worktree.sh               - 2 of 2
     -    t4207-log-decoration-colors.sh           - 2 of 2
     -    t5515-fetch-merge-logic.sh               - 17 of 17
     -    t5900-repo-selection.sh                  - 8 of 8
     -    t6016-rev-list-graph-simplify-history.sh - 12 of 12
     -    t5573-pull-verify-signatures.sh          - 15 of 16
     -    t5612-clone-refspec.sh                   - 12 of 13
     -    t5514-fetch-multiple.sh                  - 11 of 12
     -    t6030-bisect-porcelain.sh                - 64 of 71
     -    t5533-push-cas.sh                        - 15 of 17
     -    t5539-fetch-http-shallow.sh              - 7 of 8
     -    t7413-submodule-is-active.sh             - 7 of 8
     -    t2400-worktree-add.sh                    - 59 of 69
     -    t0100-previous.sh                        - 5 of 6
     -    t7419-submodule-set-branch.sh            - 5 of 6
     -    t1404-update-ref-errors.sh               - 44 of 53
     -    t6003-rev-list-topo-order.sh             - 29 of 35
     -    t1409-avoid-packing-refs.sh              - 9 of 11
     -    t5541-http-push-smart.sh                 - 31 of 38
     -    t5407-post-rewrite-hook.sh               - 13 of 16
     -    t9903-bash-prompt.sh                     - 52 of 66
     -    t1414-reflog-walk.sh                     - 9 of 12
     -    t1507-rev-parse-upstream.sh              - 21 of 28
     -    t2404-worktree-config.sh                 - 9 of 12
     -    t1505-rev-parse-last.sh                  - 5 of 7
     -    t7510-signed-commit.sh                   - 16 of 23
     -    t2018-checkout-branch.sh                 - 15 of 22
     -    (..etc)
     -
      
     +    t6030-bisect-porcelain.sh                - 62 of 72
     +    t2400-worktree-add.sh                    - 58 of 69
     +    t3200-branch.sh                          - 58 of 145
     +    t7406-submodule-update.sh                - 54 of 54
     +    t5601-clone.sh                           - 51 of 105
     +    t9903-bash-prompt.sh                     - 50 of 66
     +    t1404-update-ref-errors.sh               - 44 of 53
     +    t5510-fetch.sh                           - 40 of 171
     +    t7400-submodule-basic.sh                 - 38 of 111
     +    t3514-cherry-pick-revert-gpg.sh          - 36 of 36
     +    ..
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
     @@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
      -						 DEFAULT_REF_STORAGE,
      +						 default_ref_storage(),
       					 REF_STORE_ALL_CAPS);
     - 	return r->refs_private;
     - }
     + 	if (getenv("GIT_DEBUG_REFS")) {
     + 		r->refs_private = debug_wrap(r->refs_private);
      @@ refs.c: struct ref_store *get_submodule_ref_store(const char *submodule)
       		goto done;
       
     @@ t/t1409-avoid-packing-refs.sh: test_description='avoid rewriting packed-refs unn
       # shouldn't upset readers, and it should be omitted if the file is
       # ever rewritten.
      
     + ## t/t1450-fsck.sh ##
     +@@ t/t1450-fsck.sh: test_description='git fsck random collection of tests
     + 
     + . ./test-lib.sh
     + 
     ++if test_have_prereq REFTABLE
     ++then
     ++  skip_all='skipping tests; incompatible with reftable'
     ++  test_done
     ++fi
     ++
     + test_expect_success setup '
     + 	test_oid_init &&
     + 	git config gc.auto 0 &&
     +
       ## t/t3210-pack-refs.sh ##
      @@ t/t3210-pack-refs.sh: semantic is still the same.
       '
     @@ t/t3210-pack-refs.sh: semantic is still the same.
       	git config core.logallrefupdates true
       '
      
     + ## t/t9903-bash-prompt.sh ##
     +@@ t/t9903-bash-prompt.sh: test_description='test git-specific bash prompt functions'
     + 
     + . ./lib-bash.sh
     + 
     ++if test_have_prereq REFTABLE
     ++then
     ++  skip_all='skipping tests; incompatible with reftable'
     ++  test_done
     ++fi
     ++
     + . "$GIT_BUILD_DIR/contrib/completion/git-prompt.sh"
     + 
     + actual="$TRASH_DIRECTORY/actual"
     +
       ## t/test-lib.sh ##
      @@ t/test-lib.sh: FreeBSD)
       	;;

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v14 1/9] Write pseudorefs through ref backends.
  2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
@ 2020-05-18 20:31                           ` Han-Wen Nienhuys via GitGitGadget
  2020-05-18 20:31                           ` [PATCH v14 2/9] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
                                             ` (8 subsequent siblings)
  9 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-18 20:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Pseudorefs store transient data in in the repository. Examples are HEAD,
CHERRY_PICK_HEAD, etc.

These refs have always been read through the ref backends, but they were written
in a one-off routine that wrote a object ID or symref directly wrote into
.git/<pseudo_ref_name>.

This causes problems when introducing a new ref storage backend. To remedy this,
extend the ref backend implementation with a write_pseudoref_fn and
update_pseudoref_fn.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c                | 114 +++++++-----------------------------------
 refs.h                |  11 ++++
 refs/files-backend.c  | 114 +++++++++++++++++++++++++++++++++++++++++-
 refs/packed-backend.c |  21 +++++++-
 refs/refs-internal.h  |  12 +++++
 5 files changed, 173 insertions(+), 99 deletions(-)

diff --git a/refs.c b/refs.c
index 224ff66c7bb..4acc22373e4 100644
--- a/refs.c
+++ b/refs.c
@@ -739,101 +739,6 @@ long get_files_ref_lock_timeout_ms(void)
 	return timeout_ms;
 }
 
-static int write_pseudoref(const char *pseudoref, const struct object_id *oid,
-			   const struct object_id *old_oid, struct strbuf *err)
-{
-	const char *filename;
-	int fd;
-	struct lock_file lock = LOCK_INIT;
-	struct strbuf buf = STRBUF_INIT;
-	int ret = -1;
-
-	if (!oid)
-		return 0;
-
-	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
-
-	filename = git_path("%s", pseudoref);
-	fd = hold_lock_file_for_update_timeout(&lock, filename, 0,
-					       get_files_ref_lock_timeout_ms());
-	if (fd < 0) {
-		strbuf_addf(err, _("could not open '%s' for writing: %s"),
-			    filename, strerror(errno));
-		goto done;
-	}
-
-	if (old_oid) {
-		struct object_id actual_old_oid;
-
-		if (read_ref(pseudoref, &actual_old_oid)) {
-			if (!is_null_oid(old_oid)) {
-				strbuf_addf(err, _("could not read ref '%s'"),
-					    pseudoref);
-				rollback_lock_file(&lock);
-				goto done;
-			}
-		} else if (is_null_oid(old_oid)) {
-			strbuf_addf(err, _("ref '%s' already exists"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		} else if (!oideq(&actual_old_oid, old_oid)) {
-			strbuf_addf(err, _("unexpected object ID when writing '%s'"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		}
-	}
-
-	if (write_in_full(fd, buf.buf, buf.len) < 0) {
-		strbuf_addf(err, _("could not write to '%s'"), filename);
-		rollback_lock_file(&lock);
-		goto done;
-	}
-
-	commit_lock_file(&lock);
-	ret = 0;
-done:
-	strbuf_release(&buf);
-	return ret;
-}
-
-static int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
-{
-	const char *filename;
-
-	filename = git_path("%s", pseudoref);
-
-	if (old_oid && !is_null_oid(old_oid)) {
-		struct lock_file lock = LOCK_INIT;
-		int fd;
-		struct object_id actual_old_oid;
-
-		fd = hold_lock_file_for_update_timeout(
-				&lock, filename, 0,
-				get_files_ref_lock_timeout_ms());
-		if (fd < 0) {
-			error_errno(_("could not open '%s' for writing"),
-				    filename);
-			return -1;
-		}
-		if (read_ref(pseudoref, &actual_old_oid))
-			die(_("could not read ref '%s'"), pseudoref);
-		if (!oideq(&actual_old_oid, old_oid)) {
-			error(_("unexpected object ID when deleting '%s'"),
-			      pseudoref);
-			rollback_lock_file(&lock);
-			return -1;
-		}
-
-		unlink(filename);
-		rollback_lock_file(&lock);
-	} else {
-		unlink(filename);
-	}
-
-	return 0;
-}
 
 int refs_delete_ref(struct ref_store *refs, const char *msg,
 		    const char *refname,
@@ -845,7 +750,7 @@ int refs_delete_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		return delete_pseudoref(refname, old_oid);
+		return ref_store_delete_pseudoref(refs, refname, old_oid);
 	}
 
 	transaction = ref_store_transaction_begin(refs, &err);
@@ -1172,7 +1077,8 @@ int refs_update_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		ret = write_pseudoref(refname, new_oid, old_oid, &err);
+		ret = ref_store_write_pseudoref(refs, refname, new_oid, old_oid,
+						&err);
 	} else {
 		t = ref_store_transaction_begin(refs, &err);
 		if (!t ||
@@ -1441,6 +1347,20 @@ int head_ref(each_ref_fn fn, void *cb_data)
 	return refs_head_ref(get_main_ref_store(the_repository), fn, cb_data);
 }
 
+int ref_store_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			      const struct object_id *oid,
+			      const struct object_id *old_oid,
+			      struct strbuf *err)
+{
+	return refs->be->write_pseudoref(refs, pseudoref, oid, old_oid, err);
+}
+
+int ref_store_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			       const struct object_id *old_oid)
+{
+	return refs->be->delete_pseudoref(refs, pseudoref, old_oid);
+}
+
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
 		const char *prefix, int trim, int flags)
diff --git a/refs.h b/refs.h
index 99ba9e331e5..d1d9361441b 100644
--- a/refs.h
+++ b/refs.h
@@ -728,6 +728,17 @@ int update_ref(const char *msg, const char *refname,
 	       const struct object_id *new_oid, const struct object_id *old_oid,
 	       unsigned int flags, enum action_on_err onerr);
 
+/* Pseudorefs (eg. HEAD, CHERRY_PICK_HEAD) have a separate routines for updating
+   and deletion as they cannot take part in normal transactional updates.
+   Pseudorefs should only be written for the main repository.
+*/
+int ref_store_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			      const struct object_id *oid,
+			      const struct object_id *old_oid,
+			      struct strbuf *err);
+int ref_store_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			       const struct object_id *old_oid);
+
 int parse_hide_refs_config(const char *var, const char *value, const char *);
 
 /*
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6516c7bc8c8..df7553f4cc3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -731,6 +731,115 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	return ret;
 }
 
+static int files_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "write_pseudoref");
+	int fd;
+	struct lock_file lock = LOCK_INIT;
+	struct strbuf filename = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = -1;
+
+	if (!oid)
+		return 0;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+	fd = hold_lock_file_for_update_timeout(&lock, filename.buf, 0,
+					       get_files_ref_lock_timeout_ms());
+	if (fd < 0) {
+		strbuf_addf(err, _("could not open '%s' for writing: %s"),
+			    buf.buf, strerror(errno));
+		goto done;
+	}
+
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(err, _("could not read ref '%s'"),
+					    pseudoref);
+				rollback_lock_file(&lock);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(err, _("ref '%s' already exists"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(err,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+	}
+
+	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
+	if (write_in_full(fd, buf.buf, buf.len) < 0) {
+		strbuf_addf(err, _("could not write to '%s'"), filename.buf);
+		rollback_lock_file(&lock);
+		goto done;
+	}
+
+	commit_lock_file(&lock);
+	ret = 0;
+done:
+	strbuf_release(&buf);
+	strbuf_release(&filename);
+	return ret;
+}
+
+static int files_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "delete_pseudoref");
+	struct strbuf filename = STRBUF_INIT;
+	int ret = -1;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+
+	if (old_oid && !is_null_oid(old_oid)) {
+		struct lock_file lock = LOCK_INIT;
+		int fd;
+		struct object_id actual_old_oid;
+
+		fd = hold_lock_file_for_update_timeout(
+			&lock, filename.buf, 0,
+			get_files_ref_lock_timeout_ms());
+		if (fd < 0) {
+			error_errno(_("could not open '%s' for writing"),
+				    filename.buf);
+			goto done;
+		}
+		if (read_ref(pseudoref, &actual_old_oid))
+			die(_("could not read ref '%s'"), pseudoref);
+		if (!oideq(&actual_old_oid, old_oid)) {
+			error(_("unexpected object ID when deleting '%s'"),
+			      pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+
+		unlink(filename.buf);
+		rollback_lock_file(&lock);
+	} else {
+		unlink(filename.buf);
+	}
+	ret = 0;
+done:
+	strbuf_release(&filename);
+	return ret;
+}
+
 struct files_ref_iterator {
 	struct ref_iterator base;
 
@@ -3189,6 +3298,9 @@ struct ref_storage_be refs_be_files = {
 	files_rename_ref,
 	files_copy_ref,
 
+	files_write_pseudoref,
+	files_delete_pseudoref,
+
 	files_ref_iterator_begin,
 	files_read_raw_ref,
 
@@ -3198,5 +3310,5 @@ struct ref_storage_be refs_be_files = {
 	files_reflog_exists,
 	files_create_reflog,
 	files_delete_reflog,
-	files_reflog_expire
+	files_reflog_expire,
 };
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 4458a0f69cc..08e8253a893 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1590,6 +1590,22 @@ static int packed_copy_ref(struct ref_store *ref_store,
 	BUG("packed reference store does not support copying references");
 }
 
+static int packed_write_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *oid,
+				  const struct object_id *old_oid,
+				  struct strbuf *err)
+{
+	BUG("packed reference store does not support writing pseudo-references");
+}
+
+static int packed_delete_pseudoref(struct ref_store *ref_store,
+				   const char *pseudoref,
+				   const struct object_id *old_oid)
+{
+	BUG("packed reference store does not support deleting pseudo-references");
+}
+
 static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_store)
 {
 	return empty_ref_iterator_begin();
@@ -1656,6 +1672,9 @@ struct ref_storage_be refs_be_packed = {
 	packed_rename_ref,
 	packed_copy_ref,
 
+	packed_write_pseudoref,
+	packed_delete_pseudoref,
+
 	packed_ref_iterator_begin,
 	packed_read_raw_ref,
 
@@ -1665,5 +1684,5 @@ struct ref_storage_be refs_be_packed = {
 	packed_reflog_exists,
 	packed_create_reflog,
 	packed_delete_reflog,
-	packed_reflog_expire
+	packed_reflog_expire,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 3490aac3a40..dabe18baea1 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -549,6 +549,15 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
 			  const char *oldref, const char *newref,
 			  const char *logmsg);
 
+typedef int write_pseudoref_fn(struct ref_store *ref_store,
+			       const char *pseudoref,
+			       const struct object_id *oid,
+			       const struct object_id *old_oid,
+			       struct strbuf *err);
+typedef int delete_pseudoref_fn(struct ref_store *ref_store,
+				const char *pseudoref,
+				const struct object_id *old_oid);
+
 /*
  * Iterate over the references in `ref_store` whose names start with
  * `prefix`. `prefix` is matched as a literal string, without regard
@@ -648,6 +657,9 @@ struct ref_storage_be {
 	rename_ref_fn *rename_ref;
 	copy_ref_fn *copy_ref;
 
+	write_pseudoref_fn *write_pseudoref;
+	delete_pseudoref_fn *delete_pseudoref;
+
 	ref_iterator_begin_fn *iterator_begin;
 	read_raw_ref_fn *read_raw_ref;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v14 2/9] Move REF_LOG_ONLY to refs-internal.h
  2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
  2020-05-18 20:31                           ` [PATCH v14 1/9] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
@ 2020-05-18 20:31                           ` Han-Wen Nienhuys via GitGitGadget
  2020-05-18 20:31                           ` [PATCH v14 3/9] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
                                             ` (7 subsequent siblings)
  9 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-18 20:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

REF_LOG_ONLY is used in the transaction preparation: if a symref is involved in
a transaction, the referent of the symref should be updated, and the symref
itself should only be updated in the reflog. Other ref backends will need to
duplicate this logic.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/files-backend.c | 7 -------
 refs/refs-internal.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index df7553f4cc3..141b6b08816 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -38,13 +38,6 @@
  */
 #define REF_NEEDS_COMMIT (1 << 6)
 
-/*
- * Used as a flag in ref_update::flags when we want to log a ref
- * update but not actually perform it.  This is used when a symbolic
- * ref update is split up.
- */
-#define REF_LOG_ONLY (1 << 7)
-
 /*
  * Used as a flag in ref_update::flags when the ref_update was via an
  * update to HEAD.
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index dabe18baea1..51c96ebd485 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -31,6 +31,13 @@ struct ref_transaction;
  */
 #define REF_HAVE_OLD (1 << 3)
 
+/*
+ * Used as a flag in ref_update::flags when we want to log a ref
+ * update but not actually perform it.  This is used when a symbolic
+ * ref update is split up.
+ */
+#define REF_LOG_ONLY (1 << 7)
+
 /*
  * Return the length of time to retry acquiring a loose reference lock
  * before giving up, in milliseconds:
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v14 3/9] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
  2020-05-18 20:31                           ` [PATCH v14 1/9] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
  2020-05-18 20:31                           ` [PATCH v14 2/9] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
@ 2020-05-18 20:31                           ` Han-Wen Nienhuys via GitGitGadget
  2020-05-18 20:31                           ` [PATCH v14 4/9] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                             ` (6 subsequent siblings)
  9 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-18 20:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 4acc22373e4..dfafd04b9f9 100644
--- a/refs.c
+++ b/refs.c
@@ -1445,7 +1445,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1505,8 +1505,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v14 4/9] Add .gitattributes for the reftable/ directory
  2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
                                             ` (2 preceding siblings ...)
  2020-05-18 20:31                           ` [PATCH v14 3/9] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-05-18 20:31                           ` Han-Wen Nienhuys via GitGitGadget
  2020-05-18 20:31                           ` [PATCH v14 5/9] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                             ` (5 subsequent siblings)
  9 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-18 20:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 00000000000..f44451a3795
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v14 5/9] Add reftable library
  2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
                                             ` (3 preceding siblings ...)
  2020-05-18 20:31                           ` [PATCH v14 4/9] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-05-18 20:31                           ` Han-Wen Nienhuys via GitGitGadget
  2020-05-18 20:31                           ` [PATCH v14 6/9] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                             ` (4 subsequent siblings)
  9 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-18 20:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. Its rationale and
specification is in the preceding commit.

This imports the upstream library as one big commit. For understanding
the code, it is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* record.{c,h} - reading and writing records

* block.{c,h} - reading and writing blocks.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 +
 reftable/README.md     |   11 +
 reftable/VERSION       |    5 +
 reftable/basics.c      |  215 +++++++
 reftable/basics.h      |   53 ++
 reftable/block.c       |  435 ++++++++++++++
 reftable/block.h       |  129 +++++
 reftable/constants.h   |   21 +
 reftable/file.c        |   97 ++++
 reftable/iter.c        |  241 ++++++++
 reftable/iter.h        |   63 ++
 reftable/merged.c      |  327 +++++++++++
 reftable/merged.h      |   39 ++
 reftable/pq.c          |  114 ++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  754 ++++++++++++++++++++++++
 reftable/reader.h      |   65 +++
 reftable/record.c      | 1154 +++++++++++++++++++++++++++++++++++++
 reftable/record.h      |  128 +++++
 reftable/refname.c     |  215 +++++++
 reftable/refname.h     |   38 ++
 reftable/reftable.c    |   92 +++
 reftable/reftable.h    |  564 ++++++++++++++++++
 reftable/slice.c       |  225 ++++++++
 reftable/slice.h       |   77 +++
 reftable/stack.c       | 1234 ++++++++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   48 ++
 reftable/system.h      |   54 ++
 reftable/tree.c        |   64 +++
 reftable/tree.h        |   34 ++
 reftable/update.sh     |   24 +
 reftable/writer.c      |  659 +++++++++++++++++++++
 reftable/writer.h      |   60 ++
 reftable/zlib-compat.c |   92 +++
 34 files changed, 7396 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..21500563fd4
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,11 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   sh reftable/update.sh
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..02ba07c199a
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,5 @@
+commit bad78b6de70700933a2f93ebadaa37a5e852d9ea
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Mon May 18 20:12:57 2020 +0200
+
+    C: get rid of inner blocks for local scoping
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..14c4dfaf5a6
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,215 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)(i & 0xff);
+}
+
+uint32_t get_be24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)(i & 0xff);
+}
+
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
+{
+	size_t lo = 0;
+	size_t hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		size_t mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	static char buf[250];
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return "I/O error";
+	case REFTABLE_FORMAT_ERROR:
+		return "corrupt reftable file";
+	case REFTABLE_NOT_EXIST_ERROR:
+		return "file does not exist";
+	case REFTABLE_LOCK_ERROR:
+		return "data is outdated";
+	case REFTABLE_API_ERROR:
+		return "misuse of the reftable API";
+	case REFTABLE_ZLIB_ERROR:
+		return "zlib failure";
+	case REFTABLE_NAME_CONFLICT:
+		return "file/directory conflict";
+	case REFTABLE_REFNAME_ERROR:
+		return "invalid refname";
+	case -1:
+		return "general error";
+	default:
+		snprintf(buf, sizeof(buf), "unknown error code %d", err);
+		return buf;
+	}
+}
+
+int reftable_error_to_errno(int err)
+{
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return EIO;
+	case REFTABLE_FORMAT_ERROR:
+		return EFAULT;
+	case REFTABLE_NOT_EXIST_ERROR:
+		return ENOENT;
+	case REFTABLE_LOCK_ERROR:
+		return EBUSY;
+	case REFTABLE_API_ERROR:
+		return EINVAL;
+	case REFTABLE_ZLIB_ERROR:
+		return EDOM;
+	default:
+		return ERANGE;
+	}
+}
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..1b003485c9c
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+/* Bigendian en/decoding of integers */
+
+void put_be24(byte *out, uint32_t i);
+uint32_t get_be24(byte *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);
+
+/*
+  Frees a NULL terminated array of malloced strings. The array itself is also
+  freed.
+ */
+void free_names(char **a);
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp);
+
+/* compares two NULL-terminated arrays of strings. */
+int names_equal(char **a, char **b);
+
+/* returns the array size of a NULL-terminated array of strings. */
+int names_length(char **names);
+
+/* Allocation routines; they invoke the functions set through
+ * reftable_set_alloc() */
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..fc6fc29f0cb
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,435 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the reftable_record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec)
+{
+	struct slice empty = { 0 };
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = { 0 };
+	int n = 0;
+
+	reftable_record_key(rec, &key);
+	n = reftable_encode_key(&restart, out, last, key,
+				reftable_record_val_type(rec));
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	n = reftable_record_encode(rec, out, w->hash_size);
+	if (n < 0) {
+		goto err;
+	}
+	slice_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0) {
+		goto err;
+	}
+
+	slice_release(&key);
+	return 0;
+
+err:
+	slice_release(&key);
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next) {
+		return -1;
+	}
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = { 0 };
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		slice_resize(&compressed, src_len);
+
+		while (1) {
+			uLongf dest_len = compressed.len;
+
+			zresult = compress2(compressed.buf, &dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				slice_resize(&compressed, 2 * compressed.len);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				slice_release(&compressed);
+				return REFTABLE_ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed.buf,
+			       dest_len);
+			w->next = dest_len + block_header_skip;
+			slice_release(&compressed);
+			break;
+		}
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct reftable_block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	if (!reftable_is_block_type(typ)) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = { 0 };
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip; /* total size of dest
+							    buffer. */
+		uLongf src_len = block->len - block_header_skip;
+
+		/* Log blocks specify the *uncompressed* size in their header.
+		 */
+		slice_resize(&uncompressed, sz);
+
+		/* Copy over the block header verbatim. It's not compressed. */
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		/* Uncompress */
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			slice_release(&uncompressed);
+			return REFTABLE_ZLIB_ERROR;
+		}
+
+		if (dst_len + block_header_skip != sz) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+
+		/* We're done with the input data. */
+		reftable_block_done(block);
+		block->data = uncompressed.buf;
+		block->len = sz;
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	{
+		uint16_t restart_count = get_be16(block->data + sz - 2);
+		uint32_t restart_start = sz - 2 - 3 * restart_count;
+
+		byte *restart_bytes = block->data + restart_start;
+
+		/* transfer ownership. */
+		br->block = *block;
+		block->data = NULL;
+		block->len = 0;
+
+		br->hash_size = hash_size;
+		br->block_len = restart_start;
+		br->full_block_size = full_block_size;
+		br->header_off = header_off;
+		br->restart_count = restart_count;
+		br->restart_bytes = restart_bytes;
+	}
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct slice key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(size_t idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = { 0 };
+	struct slice last_key = { 0 };
+	byte unused_extra;
+	int n = reftable_decode_key(&rkey, &unused_extra, last_key, in);
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	{
+		int result = slice_cmp(a->key, rkey);
+		slice_release(&rkey);
+		return result;
+	}
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+int block_iter_next(struct block_iter *it, struct reftable_record *rec)
+{
+	struct slice in = {
+		.buf = it->br->block.data + it->next_off,
+		.len = it->br->block_len - it->next_off,
+	};
+	struct slice start = in;
+	struct slice key = { 0 };
+	byte extra = 0;
+	int n = 0;
+
+	if (it->next_off >= it->br->block_len) {
+		return 1;
+	}
+
+	n = reftable_decode_key(&key, &extra, it->last_key, in);
+	if (n < 0) {
+		return -1;
+	}
+
+	slice_consume(&in, n);
+	n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	slice_copy(&it->last_key, key);
+	it->next_off += start.len - in.len;
+	slice_release(&key);
+	return 0;
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = { 0 };
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	byte extra = 0;
+	int n = reftable_decode_key(key, &extra, empty, in);
+	if (n < 0) {
+		return n;
+	}
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	slice_release(&it->last_key);
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+	struct reftable_record rec = reftable_new_record(block_reader_type(br));
+	struct slice key = { 0 };
+	int err = 0;
+	struct block_iter next = { 0 };
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	/* We're looking for the last entry less/equal than the wanted key, so
+	   we have to go one entry too far and then back up.
+	*/
+	while (true) {
+		block_iter_copy_from(&next, it);
+		err = block_iter_next(&next, &rec);
+		if (err < 0) {
+			goto exit;
+		}
+
+		reftable_record_key(&rec, &key);
+		if (err > 0 || slice_cmp(key, want) >= 0) {
+			err = 0;
+			goto exit;
+		}
+
+		block_iter_copy_from(it, &next);
+	}
+
+exit:
+	slice_release(&key);
+	slice_release(&next.last_key);
+	reftable_record_destroy(&rec);
+
+	return err;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	slice_release(&bw->last_key);
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..fe2350f3cdb
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,129 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+/*
+  Writes reftable blocks. The block_writer is reused across blocks to minimize
+  allocation overhead.
+*/
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+
+	/* Offset ofof the global header. Nonzero in the first block only. */
+	uint32_t header_off;
+
+	/* How often to restart keys. */
+	int restart_interval;
+	int hash_size;
+
+	/* Offset of next byte to write. */
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+
+	struct slice last_key;
+	int entries;
+};
+
+/*
+  initializes the blockwriter to write `typ` entries, using `buf` as temporary
+  storage. `buf` is not owned by the block_writer. */
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+
+/*
+  returns the block type (eg. 'r' for ref records.
+*/
+byte block_writer_type(struct block_writer *bw);
+
+/* appends the record, or -1 if it doesn't fit. */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec);
+
+/* appends the key restarts, and compress the block if necessary. */
+int block_writer_finish(struct block_writer *w);
+
+/* clears out internally allocated block_writer members. */
+void block_writer_clear(struct block_writer *bw);
+
+/* Read a block. */
+struct block_reader {
+	/* offset of the block header; nonzero for the first block in a
+	 * reftable. */
+	uint32_t header_off;
+
+	/* the memory block */
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint16_t restart_count;
+
+	/* size of the data in the file. For log blocks, this is the compressed
+	 * size. */
+	uint32_t full_block_size;
+};
+
+/* Iterate over entries in a block */
+struct block_iter {
+	/* offset within the block of the next entry to read. */
+	uint32_t next_off;
+	struct block_reader *br;
+
+	/* key for last entry we read. */
+	struct slice last_key;
+};
+
+/* initializes a block reader */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Position `it` at start of the block */
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+
+/* Position `it` to the `want` key in the block */
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+
+/* Returns the block type (eg. 'r' for refs) */
+byte block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct reftable_record *rec);
+
+/* Seek to `want` with in the block pointed to by `it` */
+int block_iter_seek(struct block_iter *it, struct slice want);
+
+/* deallocate memory for `it`. The block reader and its block is left intact. */
+void block_iter_close(struct block_iter *it);
+
+/* size of file header, depending on format version */
+int header_size(int version);
+
+/* size of file footer, depending on format version */
+int footer_size(int version);
+
+/* returns a block to its source. */
+void reftable_block_done(struct reftable_block *ret);
+
+#endif
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..5eee72c4c11
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..56b0b3ca3f8
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size) {
+		return -1;
+	}
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	struct file_block_source *p = NULL;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return REFTABLE_NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0) {
+		return -1;
+	}
+
+	p = reftable_calloc(sizeof(struct file_block_source));
+	p->size = st.st_size;
+	p->fd = fd;
+
+	assert(bs->ops == NULL);
+	bs->ops = &file_vtable;
+	bs->arg = p;
+	return 0;
+}
+
+int reftable_fd_write(void *arg, byte *data, size_t sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..d4ad750d822
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,241 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator *it)
+{
+	return it->ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct reftable_record *rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec)
+{
+	return it->ops->next(it->iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, ref);
+	return iterator_next(it, &rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, log);
+	return iterator_next(it, &rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	slice_release(&fri->oid);
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg,
+				       struct reftable_record *rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+	int err = 0;
+	while (true) {
+		err = reftable_iterator_next_ref(&fri->it, ref);
+		if (err != 0) {
+			break;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			err = reftable_table_seek_ref(&fri->tab, &it,
+						      ref->ref_name);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(&it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				break;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+
+	reftable_ref_record_clear(ref);
+	return err;
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reftable_block_done(&it->block_reader.block);
+	slice_release(&it->oid);
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reftable_block_done(&it->block_reader.block);
+
+	{
+		uint64_t off = it->offsets[it->offset_idx++];
+		int err = reader_init_block_reader(it->r, &it->block_reader,
+						   off, BLOCK_TYPE_REF);
+		if (err < 0) {
+			return err;
+		}
+		if (err > 0) {
+			/* indexed block does not exist. */
+			return REFTABLE_FORMAT_ERROR;
+		}
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..a608a0fa38b
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,63 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct reftable_record *rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec);
+
+/* Returns true for a zeroed out iterator, such as the one returned from
+   iterator_destroy. */
+bool iterator_is_null(struct reftable_iterator *it);
+
+/* iterator that produces only ref records that point to `oid` */
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_table tab;
+	struct slice oid;
+	struct reftable_iterator it;
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+/* iterator that produces only ref records that point to `oid`,
+   but using the object index.
+ */
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..55684bc624d
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,327 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct reftable_record rec = reftable_new_record(mi->typ);
+		int err = iterator_next(&mi->stack[i], &rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			reftable_record_destroy(&rec);
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
+					       size_t idx)
+{
+	struct reftable_record rec = reftable_new_record(mi->typ);
+	struct pq_entry e = {
+		.rec = rec,
+		.index = idx,
+	};
+	int err = iterator_next(&mi->stack[idx], &rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (err > 0) {
+		reftable_iterator_destroy(&mi->stack[idx]);
+		reftable_record_destroy(&rec);
+		return 0;
+	}
+
+	merged_iter_pqueue_add(&mi->pq, e);
+	return 0;
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
+{
+	if (iterator_is_null(&mi->stack[idx])) {
+		return 0;
+	}
+	return merged_iter_advance_nonnull_subiter(mi, idx);
+}
+
+static int merged_iter_next_entry(struct merged_iter *mi,
+				  struct reftable_record *rec)
+{
+	struct slice entry_key = { 0 };
+	struct pq_entry entry = { 0 };
+	int err = 0;
+
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	entry = merged_iter_pqueue_remove(&mi->pq);
+	err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0) {
+		return err;
+	}
+
+	/*
+	  One can also use reftable as datacenter-local storage, where the ref
+	  database is maintained in globally consistent database (eg.
+	  CockroachDB or Spanner). In this scenario, replication delays together
+	  with compaction may cause newer tables to contain older entries. In
+	  such a deployment, the loop below must be changed to collect all
+	  entries for the same key, and return new the newest one.
+	*/
+	reftable_record_key(&entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = { 0 };
+		int err = 0, cmp = 0;
+
+		reftable_record_key(&top.rec, &k);
+
+		cmp = slice_cmp(k, entry_key);
+		slice_release(&k);
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		reftable_record_destroy(&top.rec);
+	}
+
+	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+	reftable_record_destroy(&entry.rec);
+	slice_release(&entry_key);
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
+{
+	while (true) {
+		int err = merged_iter_next_entry(mi, rec);
+		if (err == 0 && mi->suppress_deletions &&
+		    reftable_record_is_deletion(rec)) {
+			continue;
+		}
+
+		return err;
+	}
+}
+
+static int merged_iter_next_void(void *p, struct reftable_record *rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq)) {
+		return 1;
+	}
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id)
+{
+	struct reftable_merged_table *m = NULL;
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reftable_reader *r = stack[i];
+		if (r->hash_id != hash_id) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reftable_reader_min_update_index(r);
+		}
+
+		last_max = reftable_reader_max_update_index(r);
+	}
+
+	m = (struct reftable_merged_table *)reftable_calloc(
+		sizeof(struct reftable_merged_table));
+	m->stack = stack;
+	m->stack_len = n;
+	m->min = first_min;
+	m->max = last_max;
+	m->hash_id = hash_id;
+	*dest = m;
+	return 0;
+}
+
+void reftable_merged_table_close(struct reftable_merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reftable_reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = reftable_record_type(rec),
+		.hash_id = mt->hash_id,
+		.suppress_deletions = mt->suppress_deletions,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n;
+	err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	}
+
+	{
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..4e03642bdbe
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_reader **stack;
+	int stack_len;
+	uint32_t hash_id;
+	bool suppress_deletions;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	int stack_len;
+	byte typ;
+	bool suppress_deletions;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..fd4e92b957c
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,114 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = { 0 };
+	struct slice bk = { 0 };
+	int cmp = 0;
+	reftable_record_key(&a.rec, &ak);
+	reftable_record_key(&b.rec, &bk);
+
+	cmp = slice_cmp(ak, bk);
+
+	slice_release(&ak);
+	slice_release(&bk);
+
+	if (cmp == 0) {
+		return a.index > b.index;
+	}
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		reftable_record_destroy(&pq->heap[i].rec);
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..43c0974f162
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct reftable_record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..ab66806b4a1
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,754 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source *source)
+{
+	return source->ops->size(source->arg);
+}
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source->ops->read_block(source->arg, dest, off, size);
+	dest->source = *source;
+	return result;
+}
+
+void reftable_block_done(struct reftable_block *blockp)
+{
+	struct reftable_block_source source = blockp->source;
+	if (blockp != NULL && source.ops != NULL)
+		source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size) {
+		return 0;
+	}
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(&r->source, dest, off, sz);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	byte first_block_typ;
+	int err = 0;
+	uint32_t computed_crc;
+	uint32_t file_crc;
+
+	if (memcmp(f, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = REFTABLE_FORMAT_ERROR;
+			goto exit;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	computed_crc = crc32(0, footer, f - footer);
+	file_crc = get_be32(f);
+	f += 4;
+	if (computed_crc != file_crc) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	first_block_typ = header[header_size(r->version)];
+	r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+	r->ref_offsets.offset = 0;
+	r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+				  r->log_offsets.offset > 0);
+	r->obj_offsets.present = r->obj_offsets.offset > 0;
+	err = 0;
+exit:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto exit;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = *source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+exit:
+	reftable_block_done(&footer);
+	reftable_block_done(&header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti,
+				    struct reftable_record *rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && reftable_record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec->data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reftable_block_done(&ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (reftable_is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size) {
+		return 1;
+	}
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0) {
+		return err;
+	}
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0) {
+		return block_size;
+	}
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reftable_block_done(&block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reftable_block_done(&block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0) {
+		return err;
+	}
+
+	{
+		struct block_reader *brp =
+			reftable_malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
+{
+	if (reftable_record_type(rec) != ti->typ) {
+		return REFTABLE_API_ERROR;
+	}
+
+	while (true) {
+		struct table_iter next = { 0 };
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct reftable_record *rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off, byte typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0) {
+		return err;
+	}
+
+	brp = reftable_malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			byte typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct reftable_record *want)
+{
+	struct reftable_record rec =
+		reftable_new_record(reftable_record_type(want));
+	struct slice want_key = { 0 };
+	struct slice got_key = { 0 };
+	struct table_iter next = { 0 };
+	int err = -1;
+	reftable_record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0) {
+			goto exit;
+		}
+		{
+			int cmp = slice_cmp(got_key, want_key);
+			if (cmp > 0) {
+				table_iter_block_done(&next);
+				break;
+			}
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0) {
+		goto exit;
+	}
+	err = 0;
+
+exit:
+	block_iter_close(&next.bi);
+	reftable_record_destroy(&rec);
+	slice_release(&want_key);
+	slice_release(&got_key);
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it,
+			       struct reftable_record *rec)
+{
+	struct reftable_index_record want_index = { 0 };
+	struct reftable_record want_index_rec = { 0 };
+	struct reftable_index_record index_result = { 0 };
+	struct reftable_record index_result_rec = { 0 };
+	struct table_iter index_iter = { 0 };
+	struct table_iter next = { 0 };
+	int err = 0;
+
+	reftable_record_key(rec, &want_index.last_key);
+	reftable_record_from_index(&want_index_rec, &want_index);
+	reftable_record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, reftable_record_type(rec), true);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reader_seek_linear(r, &index_iter, &want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, &index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0) {
+			goto exit;
+		}
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (next.typ == reftable_record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = REFTABLE_FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+exit:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	reftable_record_clear(&want_index_rec);
+	reftable_record_clear(&index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it,
+				struct reftable_record *rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, reftable_record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = { 0 };
+	int err = 0;
+	if (idx > 0) {
+		return reader_seek_indexed(r, it, rec);
+	}
+
+	err = reader_start(r, &ti, reftable_record_type(rec), false);
+	if (err < 0) {
+		return err;
+	}
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec)
+{
+	byte typ = reftable_record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source *src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		block_source_close(src);
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    byte *oid)
+{
+	struct reftable_obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct reftable_record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct reftable_obj_record got = { 0 };
+	struct reftable_record got_rec = { 0 };
+	int err = 0;
+	struct indexed_table_ref_iter *itr = NULL;
+
+	/* Look through the reverse index. */
+	reftable_record_from_obj(&want_rec, &want);
+	err = reader_seek(r, &oit, &want_rec);
+	if (err != 0) {
+		goto exit;
+	}
+
+	/* read out the reftable_obj_record */
+	reftable_record_from_obj(&got_rec, &got);
+	err = iterator_next(&oit, &got_rec);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		/* didn't find it; return empty iterator */
+		iterator_set_empty(it);
+		err = 0;
+		goto exit;
+	}
+
+	err = new_indexed_table_ref_iter(&itr, r, oid, hash_size(r->hash_id),
+					 got.offsets, got.offset_len);
+	if (err < 0) {
+		goto exit;
+	}
+	got.offsets = NULL;
+	iterator_from_indexed_table_ref_iter(it, itr);
+
+exit:
+	reftable_iterator_destroy(&oit);
+	reftable_record_clear(&got_rec);
+	return err;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      byte *oid)
+{
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	int oid_len = hash_size(r->hash_id);
+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_calloc(sizeof(struct filtering_ref_iterator));
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	reftable_table_from_reader(&filter->tab, r);
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, byte *oid)
+{
+	if (r->obj_offsets.present) {
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	}
+	return reftable_reader_refs_for_unindexed(r, it, oid);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..cc87aa49ea5
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,65 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source *source);
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_close(struct reftable_block_source *source);
+
+/* metadata for a block type */
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+/* The state for reading a reftable file. */
+struct reftable_reader {
+	/* for convience, associate a name with the instance. */
+	char *name;
+	struct reftable_block_source source;
+
+	/* Size of the file, excluding the footer. */
+	uint64_t size;
+
+	/* 'sha1' for SHA1, 's256' for SHA-256 */
+	uint32_t hash_id;
+
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	/* Length of the OID keys in the 'o' section */
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+
+/* initialize a block reader to read from `r` */
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..083c7d92052
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1154 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0) {
+		return -1;
+	}
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = { 0 };
+	int i = 9;
+	int n = 0;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	n = sizeof(buf) - i - 1;
+	if (dest.len < n) {
+		return -1;
+	}
+	memcpy(dest.buf, &buf[i + 1], n);
+	return n;
+}
+
+int reftable_is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+	if (in.len < tsize) {
+		return -1;
+	}
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	slice_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+	if (s.len < l) {
+		return -1;
+	}
+	memcpy(s.buf, str, l);
+	slice_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int reftable_encode_key(bool *restart, struct slice dest, struct slice prev_key,
+			struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&dest, n);
+
+	if (dest.len < suffix_len) {
+		return -1;
+	}
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	slice_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int reftable_decode_key(struct slice *key, byte *extra, struct slice last_key,
+			struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	if (prefix_len > last_key.len) {
+		return -1;
+	}
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0) {
+		return -1;
+	}
+	slice_consume(&in, n);
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len) {
+		return -1;
+	}
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	slice_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static void reftable_ref_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = xstrdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9) {
+		return '0' + c;
+	}
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size(hash_id));
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size(hash_id));
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->ref_name);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static byte reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL) {
+		return 3;
+	}
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0) {
+		return n;
+	}
+	assert(hash_size > 0);
+
+	slice_consume(&in, n);
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct slice dest = { 0 };
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+		seen_target = true;
+		if (r->target != NULL) {
+			reftable_free(r->target);
+		}
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+static bool reftable_ref_record_is_deletion_void(const void *p)
+{
+	return reftable_ref_record_is_deletion(
+		(const struct reftable_ref_record *)p);
+}
+
+struct reftable_record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = BLOCK_TYPE_REF,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+	.is_deletion = &reftable_ref_record_is_deletion_void,
+};
+
+static void reftable_obj_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_obj_record *rec =
+		(const struct reftable_obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void reftable_obj_record_clear(void *rec)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	FREE_AND_NULL(obj->hash_prefix);
+	FREE_AND_NULL(obj->offsets);
+	memset(obj, 0, sizeof(struct reftable_obj_record));
+}
+
+static void reftable_obj_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	const struct reftable_obj_record *src =
+		(const struct reftable_obj_record *)src_rec;
+
+	reftable_obj_record_clear(obj);
+	*obj = *src;
+	obj->hash_prefix = reftable_malloc(obj->hash_prefix_len);
+	memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len);
+
+	{
+		int olen = obj->offset_len * sizeof(uint64_t);
+		obj->offsets = reftable_malloc(olen);
+		memcpy(obj->offsets, src->offsets, olen);
+	}
+}
+
+static byte reftable_obj_record_val_type(const void *rec)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8) {
+		return r->offset_len;
+	}
+	return 0;
+}
+
+static int reftable_obj_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	struct slice start = s;
+	int i = 0;
+	int n = 0;
+	uint64_t last = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+	if (r->offset_len == 0) {
+		return start.len - s.len;
+	}
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	last = r->offsets[0];
+	for (i = 1; i < r->offset_len; i++) {
+		int n = put_var_int(s, r->offsets[i] - last);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+		last = r->offsets[i];
+	}
+	return start.len - s.len;
+}
+
+static int reftable_obj_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	uint64_t last;
+	int j;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		slice_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0) {
+		return start.len - in.len;
+	}
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0) {
+		return n;
+	}
+	slice_consume(&in, n);
+
+	last = r->offsets[0];
+	j = 1;
+	while (j < count) {
+		uint64_t delta = 0;
+		int n = get_var_int(&delta, in);
+		if (n < 0) {
+			return n;
+		}
+		slice_consume(&in, n);
+
+		last = r->offsets[j] = (delta + last);
+		j++;
+	}
+	return start.len - in.len;
+}
+
+static bool not_a_deletion(const void *p)
+{
+	return false;
+}
+
+struct reftable_record_vtable reftable_obj_record_vtable = {
+	.key = &reftable_obj_record_key,
+	.type = BLOCK_TYPE_OBJ,
+	.copy_from = &reftable_obj_record_copy_from,
+	.val_type = &reftable_obj_record_val_type,
+	.encode = &reftable_obj_record_encode,
+	.decode = &reftable_obj_record_decode,
+	.clear = &reftable_obj_record_clear,
+	.is_deletion = not_a_deletion,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size(hash_id));
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size(hash_id));
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static void reftable_log_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_be64(dest->buf + 1 + len, ts);
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	reftable_log_record_clear(dst);
+	*dst = *src;
+	if (dst->ref_name != NULL) {
+		dst->ref_name = xstrdup(dst->ref_name);
+	}
+	if (dst->email != NULL) {
+		dst->email = xstrdup(dst->email);
+	}
+	if (dst->name != NULL) {
+		dst->name = xstrdup(dst->name);
+	}
+	if (dst->message != NULL) {
+		dst->message = xstrdup(dst->message);
+	}
+
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->ref_name);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static byte reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static byte zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r)) {
+		return 0;
+	}
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size) {
+		return -1;
+	}
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	slice_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	n = put_var_int(s, r->time);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	if (s.len < 2) {
+		return -1;
+	}
+
+	put_be16(s.buf, r->tz_offset);
+	slice_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0) {
+		return -1;
+	}
+	slice_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = { 0 };
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		FREE_AND_NULL(r->old_hash);
+		FREE_AND_NULL(r->new_hash);
+		FREE_AND_NULL(r->message);
+		FREE_AND_NULL(r->email);
+		FREE_AND_NULL(r->name);
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size) {
+		return REFTABLE_FORMAT_ERROR;
+	}
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	slice_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2) {
+		goto error;
+	}
+
+	r->tz_offset = get_be16(in.buf);
+	slice_consume(&in, 2);
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0) {
+		goto error;
+	}
+	slice_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	slice_release(&dest);
+	return start.len - in.len;
+
+error:
+	slice_release(&dest);
+	return REFTABLE_FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL) {
+		a = empty;
+	}
+	if (b == NULL) {
+		b = empty;
+	}
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL) {
+		a = zero;
+	}
+	if (b == NULL) {
+		b = zero;
+	}
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+static bool reftable_log_record_is_deletion_void(const void *p)
+{
+	return reftable_log_record_is_deletion(
+		(const struct reftable_log_record *)p);
+}
+
+struct reftable_record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = BLOCK_TYPE_LOG,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+	.is_deletion = &reftable_log_record_is_deletion_void,
+};
+
+struct reftable_record reftable_new_record(byte typ)
+{
+	struct reftable_record rec = { NULL };
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		reftable_record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct reftable_obj_record *r =
+			reftable_calloc(sizeof(struct reftable_obj_record));
+		reftable_record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		reftable_record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct reftable_index_record *r =
+			reftable_calloc(sizeof(struct reftable_index_record));
+		reftable_record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+void *reftable_record_yield(struct reftable_record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+void reftable_record_destroy(struct reftable_record *rec)
+{
+	reftable_record_clear(rec);
+	reftable_free(reftable_record_yield(rec));
+}
+
+static void reftable_index_record_key(const void *r, struct slice *dest)
+{
+	struct reftable_index_record *rec = (struct reftable_index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void reftable_index_record_copy_from(void *rec, const void *src_rec,
+					    int hash_size)
+{
+	struct reftable_index_record *dst = (struct reftable_index_record *)rec;
+	struct reftable_index_record *src =
+		(struct reftable_index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void reftable_index_record_clear(void *rec)
+{
+	struct reftable_index_record *idx = (struct reftable_index_record *)rec;
+	slice_release(&idx->last_key);
+}
+
+static byte reftable_index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int reftable_index_record_encode(const void *rec, struct slice out,
+					int hash_size)
+{
+	const struct reftable_index_record *r =
+		(const struct reftable_index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int reftable_index_record_decode(void *rec, struct slice key,
+					byte val_type, struct slice in,
+					int hash_size)
+{
+	struct slice start = in;
+	struct reftable_index_record *r = (struct reftable_index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0) {
+		return n;
+	}
+
+	slice_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct reftable_record_vtable reftable_index_record_vtable = {
+	.key = &reftable_index_record_key,
+	.type = BLOCK_TYPE_INDEX,
+	.copy_from = &reftable_index_record_copy_from,
+	.val_type = &reftable_index_record_val_type,
+	.encode = &reftable_index_record_encode,
+	.decode = &reftable_index_record_decode,
+	.clear = &reftable_index_record_clear,
+	.is_deletion = &not_a_deletion,
+};
+
+void reftable_record_key(struct reftable_record *rec, struct slice *dest)
+{
+	rec->ops->key(rec->data, dest);
+}
+
+byte reftable_record_type(struct reftable_record *rec)
+{
+	return rec->ops->type;
+}
+
+int reftable_record_encode(struct reftable_record *rec, struct slice dest,
+			   int hash_size)
+{
+	return rec->ops->encode(rec->data, dest, hash_size);
+}
+
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size)
+{
+	assert(src->ops->type == rec->ops->type);
+
+	rec->ops->copy_from(rec->data, src->data, hash_size);
+}
+
+byte reftable_record_val_type(struct reftable_record *rec)
+{
+	return rec->ops->val_type(rec->data);
+}
+
+int reftable_record_decode(struct reftable_record *rec, struct slice key,
+			   byte extra, struct slice src, int hash_size)
+{
+	return rec->ops->decode(rec->data, key, extra, src, hash_size);
+}
+
+void reftable_record_clear(struct reftable_record *rec)
+{
+	rec->ops->clear(rec->data);
+}
+
+bool reftable_record_is_deletion(struct reftable_record *rec)
+{
+	return rec->ops->is_deletion(rec->data);
+}
+
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *ref_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *obj_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = obj_rec;
+	rec->ops = &reftable_obj_record_vtable;
+}
+
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *index_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = index_rec;
+	rec->ops = &reftable_index_record_vtable;
+}
+
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *log_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec->data;
+}
+
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_LOG);
+	return (struct reftable_log_record *)rec->data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL) {
+		return !memcmp(a, b, hash_size);
+	}
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL) {
+		return 0 == strcmp(a, b);
+	}
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->ref_name,
+		      ((struct reftable_ref_record *)b)->ref_name);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp) {
+		return cmp;
+	}
+	if (la->update_index > lb->update_index) {
+		return -1;
+	}
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..dca26eda2dc
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,128 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+
+/* Methods for records. */
+struct reftable_record_vtable {
+	/* encode the key of to a byte slice. */
+	void (*key)(const void *rec, struct slice *dest);
+
+	/* The record type of ('r' for ref). */
+	byte type;
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	byte (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+
+	/* is this a tombstone? */
+	bool (*is_deletion)(const void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct reftable_record {
+	void *data;
+	struct reftable_record_vtable *ops;
+};
+
+/* returns true for recognized block types. Block start with the block type. */
+int reftable_is_block_type(byte typ);
+
+/* creates a malloced record of the given type. Dispose with record_destroy */
+struct reftable_record reftable_new_record(byte typ);
+
+extern struct reftable_record_vtable reftable_ref_record_vtable;
+
+/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
+   number of bytes written. */
+int reftable_encode_key(bool *restart, struct slice dest, struct slice prev_key,
+			struct slice key, byte extra);
+
+/* Decode into `key` and `extra` from `in` */
+int reftable_decode_key(struct slice *key, byte *extra, struct slice last_key,
+			struct slice in);
+
+/* reftable_index_record are used internally to speed up lookups. */
+struct reftable_index_record {
+	uint64_t offset; /* Offset of block */
+	struct slice last_key; /* Last key of the block. */
+};
+
+/* reftable_obj_record stores an object ID => ref mapping. */
+struct reftable_obj_record {
+	byte *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void reftable_record_key(struct reftable_record *rec, struct slice *dest);
+byte reftable_record_type(struct reftable_record *rec);
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size);
+byte reftable_record_val_type(struct reftable_record *rec);
+int reftable_record_encode(struct reftable_record *rec, struct slice dest,
+			   int hash_size);
+int reftable_record_decode(struct reftable_record *rec, struct slice key,
+			   byte extra, struct slice src, int hash_size);
+bool reftable_record_is_deletion(struct reftable_record *rec);
+
+/* zeroes out the embedded record */
+void reftable_record_clear(struct reftable_record *rec);
+
+/* clear out the record, yielding the reftable_record data that was
+ * encapsulated. */
+void *reftable_record_yield(struct reftable_record *rec);
+
+/* clear and deallocate embedded record, and zero `rec`. */
+void reftable_record_destroy(struct reftable_record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *objrec);
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *idxrec);
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *refrec);
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *logrec);
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *ref);
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/refname.c b/reftable/refname.c
new file mode 100644
index 00000000000..a4d026942e8
--- /dev/null
+++ b/reftable/refname.c
@@ -0,0 +1,215 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+#include "refname.h"
+#include "slice.h"
+
+struct find_arg {
+	char **names;
+	const char *want;
+};
+
+static int find_name(size_t k, void *arg)
+{
+	struct find_arg *f_arg = (struct find_arg *)arg;
+	return strcmp(f_arg->names[k], f_arg->want) >= 0;
+}
+
+int modification_has_ref(struct modification *mod, const char *name)
+{
+	struct reftable_ref_record ref = { 0 };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = name,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
+			return 0;
+		}
+	}
+
+	if (mod->del_len > 0) {
+		struct find_arg arg = {
+			.names = mod->del,
+			.want = name,
+		};
+		int idx = binsearch(mod->del_len, find_name, &arg);
+		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
+			return 1;
+		}
+	}
+
+	err = reftable_table_read_ref(&mod->tab, name, &ref);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static void modification_clear(struct modification *mod)
+{
+	/* don't delete the strings themselves; they're owned by ref records.
+	 */
+	FREE_AND_NULL(mod->add);
+	FREE_AND_NULL(mod->del);
+	mod->add_len = 0;
+	mod->del_len = 0;
+}
+
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix)
+{
+	struct reftable_iterator it = { NULL };
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = prefix,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len &&
+		    !strncmp(prefix, mod->add[idx], strlen(prefix))) {
+			goto exit;
+		}
+	}
+	err = reftable_table_seek_ref(&mod->tab, &it, prefix);
+	if (err) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err) {
+			goto exit;
+		}
+
+		if (mod->del_len > 0) {
+			struct find_arg arg = {
+				.names = mod->del,
+				.want = ref.ref_name,
+			};
+			int idx = binsearch(mod->del_len, find_name, &arg);
+			if (idx < mod->del_len &&
+			    !strcmp(ref.ref_name, mod->del[idx])) {
+				continue;
+			}
+		}
+
+		if (strncmp(ref.ref_name, prefix, strlen(prefix))) {
+			err = 1;
+			goto exit;
+		}
+		err = 0;
+		goto exit;
+	}
+
+exit:
+	reftable_ref_record_clear(&ref);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int validate_ref_name(const char *name)
+{
+	while (true) {
+		char *next = strchr(name, '/');
+		if (!*name) {
+			return REFTABLE_REFNAME_ERROR;
+		}
+		if (!next) {
+			return 0;
+		}
+		if (next - name == 0 || (next - name == 1 && *name == '.') ||
+		    (next - name == 2 && name[0] == '.' && name[1] == '.'))
+			return REFTABLE_REFNAME_ERROR;
+		name = next + 1;
+	}
+	return 0;
+}
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz)
+{
+	struct modification mod = {
+		.tab = tab,
+		.add = reftable_calloc(sizeof(char *) * sz),
+		.del = reftable_calloc(sizeof(char *) * sz),
+	};
+	int i = 0;
+	int err = 0;
+	for (; i < sz; i++) {
+		if (reftable_ref_record_is_deletion(&recs[i])) {
+			mod.del[mod.del_len++] = recs[i].ref_name;
+		} else {
+			mod.add[mod.add_len++] = recs[i].ref_name;
+		}
+	}
+
+	err = modification_validate(&mod);
+	modification_clear(&mod);
+	return err;
+}
+
+static void slice_trim_component(struct slice *sl)
+{
+	while (sl->len > 0) {
+		bool is_slash = (sl->buf[sl->len - 1] == '/');
+		sl->len--;
+		if (is_slash)
+			break;
+	}
+}
+
+int modification_validate(struct modification *mod)
+{
+	struct slice slashed = { 0 };
+	int err = 0;
+	int i = 0;
+	for (; i < mod->add_len; i++) {
+		err = validate_ref_name(mod->add[i]);
+		if (err) {
+			goto exit;
+		}
+		slice_set_string(&slashed, mod->add[i]);
+		slice_addstr(&slashed, "/");
+
+		err = modification_has_ref_with_prefix(
+			mod, slice_as_string(&slashed));
+		if (err == 0) {
+			err = REFTABLE_NAME_CONFLICT;
+			goto exit;
+		}
+		if (err < 0) {
+			goto exit;
+		}
+
+		slice_set_string(&slashed, mod->add[i]);
+		while (slashed.len) {
+			slice_trim_component(&slashed);
+			err = modification_has_ref(mod,
+						   slice_as_string(&slashed));
+			if (err == 0) {
+				err = REFTABLE_NAME_CONFLICT;
+				goto exit;
+			}
+			if (err < 0) {
+				goto exit;
+			}
+		}
+	}
+	err = 0;
+exit:
+	slice_release(&slashed);
+	return err;
+}
diff --git a/reftable/refname.h b/reftable/refname.h
new file mode 100644
index 00000000000..f335287fcdb
--- /dev/null
+++ b/reftable/refname.h
@@ -0,0 +1,38 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+#ifndef REFNAME_H
+#define REFNAME_H
+
+#include "reftable.h"
+
+struct modification {
+	struct reftable_table tab;
+
+	char **add;
+	size_t add_len;
+
+	char **del;
+	size_t del_len;
+};
+
+// -1 = error, 0 = found, 1 = not found
+int modification_has_ref(struct modification *mod, const char *name);
+
+// -1 = error, 0 = found, 1 = not found.
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix);
+
+// 0 = OK.
+int validate_ref_name(const char *name);
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz);
+
+int modification_validate(struct modification *mod);
+
+#endif
diff --git a/reftable/reftable.c b/reftable/reftable.c
new file mode 100644
index 00000000000..f385ce0bd83
--- /dev/null
+++ b/reftable/reftable.c
@@ -0,0 +1,92 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+#include "record.h"
+#include "reader.h"
+#include "merged.h"
+
+struct reftable_table_vtable {
+	int (*seek)(void *tab, struct reftable_iterator *it,
+		    struct reftable_record *);
+};
+
+static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
+				     struct reftable_record *rec)
+{
+	return reader_seek((struct reftable_reader *)tab, it, rec);
+}
+
+static struct reftable_table_vtable reader_vtable = {
+	.seek = reftable_reader_seek_void,
+};
+
+static int reftable_merged_table_seek_void(void *tab,
+					   struct reftable_iterator *it,
+					   struct reftable_record *rec)
+{
+	return merged_table_seek_record((struct reftable_merged_table *)tab, it,
+					rec);
+}
+
+static struct reftable_table_vtable merged_table_vtable = {
+	.seek = reftable_merged_table_seek_void,
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return tab->ops->seek(tab->table_arg, it, &rec);
+}
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &reader_vtable;
+	tab->table_arg = reader;
+}
+
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *merged)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &merged_table_vtable;
+	tab->table_arg = merged;
+}
+
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	int err = reftable_table_seek_ref(tab, &it, name);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_ref(&it, ref);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(ref->ref_name, name) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		reftable_ref_record_clear(ref);
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..86f2dcf33d4
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,564 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *ref_name;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	REFTABLE_IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	REFTABLE_FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	REFTABLE_NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	REFTABLE_LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL ref_name.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	REFTABLE_API_ERROR = -6,
+
+	/* Decompression error */
+	REFTABLE_ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	REFTABLE_EMPTY_TABLE_ERROR = -8,
+
+	/* Dir/file conflict. */
+	REFTABLE_NAME_CONFLICT = -9,
+
+	/* Illegal ref name. */
+	REFTABLE_REFNAME_ERROR = -10,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/*
+ * Convert the numeric error code to an equivalent errno code.
+ */
+int reftable_error_to_errno(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+
+	/* boolean: do not check ref names for validity or dir/file conflicts.
+	 */
+	int skip_name_check;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, uint8_t *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, uint8_t *data, size_t size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp. The name is used for creating a stack. Typically, it is the
+ * basename of the file. The block source `src` is owned by the reader, and is
+ * closed on calling reftable_reader_destroy().
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source *src, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, &src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(&it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to `oid`. */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* closes readers for the merged tables */
+void reftable_merged_table_close(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/****************************************************************
+ Generic tables
+
+ A unified API for reading tables, either merged tables, or single readers.
+ ****************************************************************/
+
+struct reftable_table {
+	struct reftable_table_vtable *ops;
+	void *table_arg;
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name);
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader);
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *table);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction, and deallocate the
+   transaction. Releases the lock if held. */
+void reftable_addition_destroy(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* Reloads the stack if necessary. This is very cheap to run if the stack was up
+ * to date */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	uint64_t entries_written; /* total number of entries written, including
+				     failures. */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 00000000000..b13a2fe09d1
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,225 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+
+	{
+		int l = strlen(str);
+		l++; /* \0 */
+		slice_resize(s, l);
+		memcpy(s->buf, str, l);
+		s->len = l - 1;
+	}
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_addstr(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_addbuf(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+void slice_consume(struct slice *s, int n)
+{
+	s->buf += n;
+	s->len -= n;
+}
+
+byte *slice_detach(struct slice *s)
+{
+	byte *p = s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_release(struct slice *s)
+{
+	reftable_free(slice_detach(s));
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = { 0 };
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_detach(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	if (a.len != b.len) {
+		return 0;
+	}
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_cmp(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	if (res != 0) {
+		return res;
+	}
+	if (a.len < b.len) {
+		return -1;
+	} else if (a.len > b.len) {
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
+int slice_add(struct slice *b, byte *data, size_t sz)
+{
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = reftable_realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_add_void(void *b, byte *data, size_t sz)
+{
+	return slice_add((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf)
+{
+	assert(bs->ops == NULL);
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 00000000000..1880730365b
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,77 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+/*
+  provides bounds-checked byte ranges.
+  To use, initialize as "slice x = {0};"
+ */
+struct slice {
+	int len;
+	int cap;
+	byte *buf;
+};
+
+void slice_set_string(struct slice *dest, const char *src);
+void slice_addstr(struct slice *dest, const char *src);
+
+/* Deallocate and clear slice */
+void slice_release(struct slice *slice);
+
+/* Return a malloced string for `src` */
+char *slice_to_string(struct slice src);
+
+/* Ensure that `buf` is \0 terminated. */
+const char *slice_as_string(struct slice *src);
+
+/* Compare slices */
+bool slice_equal(struct slice a, struct slice b);
+
+/* Return `buf`, clearing out `s` */
+byte *slice_detach(struct slice *s);
+
+/* Copy bytes */
+void slice_copy(struct slice *dest, struct slice src);
+
+/* Advance `buf` by `n`, and decrease length. A copy of the slice
+   should be kept for deallocating the slice. */
+void slice_consume(struct slice *s, int n);
+
+/* Set length of the slice to `l` */
+void slice_resize(struct slice *s, int l);
+
+/* Signed comparison */
+int slice_cmp(struct slice a, struct slice b);
+
+/* Append `data` to the `dest` slice.  */
+int slice_add(struct slice *dest, byte *data, size_t sz);
+
+/* Append `add` to `dest. */
+void slice_addbuf(struct slice *dest, struct slice add);
+
+/* Like slice_add, but suitable for passing to reftable_new_writer
+ */
+int slice_add_void(void *b, byte *data, size_t sz);
+
+/* Find the longest shared prefix size of `a` and `b` */
+int common_prefix_size(struct slice a, struct slice b);
+
+struct reftable_block_source;
+
+/* Create an in-memory block source for reading reftables */
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..aff7f61c124
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1234 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "refname.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct slice list_file_name = { 0 };
+	int err = 0;
+
+	if (config.hash_id == 0) {
+		config.hash_id = SHA1_ID;
+	}
+
+	*dest = NULL;
+
+	slice_set_string(&list_file_name, dir);
+	slice_addstr(&list_file_name, "/tables.list");
+
+	p->list_file = slice_to_string(list_file_name);
+	slice_release(&list_file_name);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload_maybe_reuse(p, true);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fd_read_lines(int fd, char ***namesp)
+{
+	off_t size = lseek(fd, 0, SEEK_END);
+	char *buf = NULL;
+	int err = 0;
+	if (size < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+	err = lseek(fd, 0, SEEK_SET);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (read(fd, buf, size) != size) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+
+exit:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	int fd = open(filename, O_RDONLY, 0644);
+	int err = 0;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return REFTABLE_IO_ERROR;
+	}
+	err = fd_read_lines(fd, namesp);
+	close(fd);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged != NULL) {
+		reftable_merged_table_close(st->merged);
+		reftable_merged_table_free(st->merged);
+		st->merged = NULL;
+	}
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_tables =
+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
+	int new_tables_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+	int i;
+	struct slice table_path = { 0 };
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_addstr(&table_path, "/");
+			slice_addstr(&table_path, name);
+
+			err = reftable_block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0) {
+				goto exit;
+			}
+
+			err = reftable_new_reader(&rd, &src, name);
+			if (err < 0) {
+				goto exit;
+			}
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
+					st->config.hash_id);
+	if (err < 0) {
+		goto exit;
+	}
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	new_merged->suppress_deletions = true;
+	st->merged = new_merged;
+
+	for (i = 0; i < cur_len; i++) {
+		if (cur[i] != NULL) {
+			reader_close(cur[i]);
+			reftable_reader_free(cur[i]);
+		}
+	}
+
+exit:
+	slice_release(&table_path);
+	for (i = 0; i < new_tables_len; i++) {
+		reader_close(new_tables[i]);
+		reftable_reader_free(new_tables[i]);
+	}
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0) {
+		return diff;
+	}
+
+	return udiff;
+}
+
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		int err2 = 0;
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != REFTABLE_NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
+		   writer. Check if there was one by checking if the name list
+		   changed.
+		*/
+		err2 = read_lines(st->list_file, &names_after);
+		if (err2 < 0) {
+			free_names(names);
+			return err2;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return err;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 1;
+		sleep_millisec(delay);
+	}
+
+	return 0;
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0) {
+		return err;
+	}
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto exit;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto exit;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	int err = stack_uptodate(st);
+	if (err > 0) {
+		return reftable_stack_reload_maybe_reuse(st, true);
+	}
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == REFTABLE_LOCK_ERROR) {
+			/* Ignore error return, we want to propagate
+			   REFTABLE_LOCK_ERROR.
+			*/
+			reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	if (!st->disable_auto_compact) {
+		return reftable_stack_auto_compact(st);
+	}
+
+	return 0;
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct slice lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	slice_set_string(&add->lock_file_name, st->list_file);
+	slice_addstr(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = REFTABLE_LOCK_ERROR;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	err = stack_uptodate(st);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (err > 1) {
+		err = REFTABLE_LOCK_ERROR;
+		goto exit;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+exit:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct slice nm = { 0 };
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_set_string(&nm, add->stack->list_file);
+		slice_addstr(&nm, "/");
+		slice_addstr(&nm, add->new_tables[i]);
+		unlink(slice_as_string(&nm));
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(slice_as_string(&add->lock_file_name));
+		slice_release(&add->lock_file_name);
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+	slice_release(&nm);
+}
+
+void reftable_addition_destroy(struct reftable_addition *add)
+{
+	if (add == NULL) {
+		return;
+	}
+	reftable_addition_close(add);
+	reftable_free(add);
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct slice table_list = { 0 };
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0) {
+		goto exit;
+	}
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		slice_addstr(&table_list, add->stack->merged->stack[i]->name);
+		slice_addstr(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_addstr(&table_list, add->new_tables[i]);
+		slice_addstr(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	slice_release(&table_list);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&add->lock_file_name),
+		     add->stack->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+exit:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	*dest = reftable_calloc(sizeof(**dest));
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = { 0 };
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0) {
+		goto exit;
+	}
+	if (err > 0) {
+		err = REFTABLE_LOCK_ERROR;
+		goto exit;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_addition_commit(&add);
+exit:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice tab_file_name = { 0 };
+	struct slice next_name = { 0 };
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, add->next_update_index, add->next_update_index);
+
+	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
+	slice_addstr(&temp_tab_file_name, "/");
+	slice_addbuf(&temp_tab_file_name, next_name);
+	slice_addstr(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
+	if (tab_fd < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_writer_close(wr);
+	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto exit;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	err = stack_check_addition(add->stack,
+				   slice_as_string(&temp_tab_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (wr->min_update_index < add->next_update_index) {
+		err = REFTABLE_API_ERROR;
+		goto exit;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_addstr(&next_name, ".ref");
+
+	slice_set_string(&tab_file_name, add->stack->reftable_dir);
+	slice_addstr(&tab_file_name, "/");
+	slice_addbuf(&tab_file_name, next_name);
+
+	/* TODO: should check destination out of paranoia */
+	err = rename(slice_as_string(&temp_tab_file_name),
+		     slice_as_string(&tab_file_name));
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto exit;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = slice_to_string(next_name);
+	add->new_tables_len++;
+exit:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_file_name));
+	}
+
+	slice_release(&temp_tab_file_name);
+	slice_release(&tab_file_name);
+	slice_release(&next_name);
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0) {
+		return reftable_reader_max_update_index(
+			       st->merged->stack[sz - 1]) +
+		       1;
+	}
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct slice next_name = { 0 };
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->merged->stack[first]),
+		    reftable_reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_addstr(temp_tab, "/");
+	slice_addbuf(temp_tab, next_name);
+	slice_addstr(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0) {
+		goto exit;
+	}
+	err = reftable_writer_close(wr);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+exit:
+	reftable_writer_free(wr);
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		slice_release(temp_tab);
+	}
+	slice_release(&next_name);
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_reader **subtabs = reftable_calloc(
+		sizeof(struct reftable_reader *) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	uint64_t entries = 0;
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr,
+				   st->merged->stack[first]->min_update_index,
+				   st->merged->stack[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto exit;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+	reftable_iterator_destroy(&it);
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+
+exit:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+	reftable_log_record_clear(&log);
+	st->stats.entries_written += entries;
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct slice temp_tab_file_name = { 0 };
+	struct slice new_table_name = { 0 };
+	struct slice lock_file_name = { 0 };
+	struct slice ref_list_contents = { 0 };
+	struct slice new_table_path = { 0 };
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **listp = NULL;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto exit;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_addstr(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	/* Don't want to write to the lock for now.  */
+	close(lock_file_fd);
+	lock_file_fd = 0;
+
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0) {
+		goto exit;
+	}
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_file_name = { 0 };
+		struct slice subtab_lock = { 0 };
+		slice_set_string(&subtab_file_name, st->reftable_dir);
+		slice_addstr(&subtab_file_name, "/");
+		slice_addstr(&subtab_file_name,
+			     reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_file_name);
+		slice_addstr(&subtab_lock, ".lock");
+
+		{
+			int sublock_file_fd =
+				open(slice_as_string(&subtab_lock),
+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
+			if (sublock_file_fd > 0) {
+				close(sublock_file_fd);
+			} else if (sublock_file_fd < 0) {
+				if (errno == EEXIST) {
+					err = 1;
+				} else {
+					err = REFTABLE_IO_ERROR;
+				}
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] =
+			(char *)slice_as_string(&subtab_file_name);
+		j++;
+
+		if (err != 0) {
+			goto exit;
+		}
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0) {
+		goto exit;
+	}
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto exit;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_addstr(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_addstr(&new_table_path, "/");
+
+	slice_addbuf(&new_table_path, new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(slice_as_string(&temp_tab_file_name),
+			     slice_as_string(&new_table_path));
+		if (err < 0) {
+			err = REFTABLE_IO_ERROR;
+			goto exit;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
+		slice_addstr(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		slice_addbuf(&ref_list_contents, new_table_name);
+		slice_addstr(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
+		slice_addstr(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto exit;
+	}
+	have_lock = false;
+
+	/* Reload the stack before deleting. On windows, we can only delete the
+	   files after we closed them.
+	*/
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+
+	listp = delete_on_success;
+	while (*listp) {
+		if (strcmp(*listp, slice_as_string(&new_table_path))) {
+			unlink(*listp);
+		}
+		listp++;
+	}
+
+exit:
+	free_names(delete_on_success);
+
+	listp = subtable_locks;
+	while (*listp) {
+		unlink(*listp);
+		listp++;
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	slice_release(&new_table_name);
+	slice_release(&new_table_path);
+	slice_release(&ref_list_contents);
+	slice_release(&temp_tab_file_name);
+	slice_release(&lock_file_name);
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0) {
+		return 0;
+	}
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+
+	if (n == 0) {
+		*seglen = 0;
+		return segs;
+	}
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
+	int overhead = header_size(version) - 1;
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		sizes[i] = st->merged->stack[i]->size - overhead;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0) {
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+	}
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_table tab = { NULL };
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+	return reftable_table_read_ref(&tab, refname, ref);
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto exit;
+	}
+
+	err = reftable_iterator_next_log(&it, log);
+	if (err) {
+		goto exit;
+	}
+
+	if (strcmp(log->ref_name, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto exit;
+	}
+
+exit:
+	if (err) {
+		reftable_log_record_clear(log);
+	}
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
+{
+	int err = 0;
+	struct reftable_block_source src = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct reftable_ref_record *refs = NULL;
+	struct reftable_iterator it = { NULL };
+	int cap = 0;
+	int len = 0;
+	int i = 0;
+
+	if (st->config.skip_name_check) {
+		return 0;
+	}
+
+	err = reftable_block_source_from_file(&src, new_tab_name);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_new_reader(&rd, &src, new_tab_name);
+	if (err < 0) {
+		goto exit;
+	}
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	if (err > 0) {
+		err = 0;
+		goto exit;
+	}
+	if (err < 0) {
+		goto exit;
+	}
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			goto exit;
+		}
+
+		if (len >= cap) {
+			cap = 2 * cap + 1;
+			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
+		}
+
+		refs[len++] = ref;
+	}
+
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+
+	err = validate_ref_record_addition(tab, refs, len);
+
+exit:
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&refs[i]);
+	}
+
+	free(refs);
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..60bca3b3e58
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,48 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+#include "system.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+	bool disable_auto_compact;
+
+	struct reftable_write_options config;
+
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
+void reftable_addition_close(struct reftable_addition *add);
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..9e2c4b8d9ac
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,54 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#if 1 /* REFTABLE_IN_GITCORE */
+
+#include "git-compat-util.h"
+#include "cache.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_IN_GITCORE */
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef uint8_t byte;
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ *
+ * TODO: in git-core, this should fallback to uncompress2 if it is available.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..cea1073dbdd
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,64 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	int res;
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	res = compare(key, (*rootp)->key);
+	if (res < 0) {
+		return tree_search(key, &(*rootp)->left, compare, insert);
+	} else if (res > 0) {
+		return tree_search(key, &(*rootp)->right, compare, insert);
+	}
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..954512e9a3a
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+/* tree_node is a generic binary search tree. */
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+/* looks for `key` in `rootp` using `compare` as comparison function. If insert
+   is set, insert the key if it's not found. Else, return NULL.
+*/
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+
+/* performs an infix walk of the tree. */
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+
+/*
+  deallocates the tree nodes recursively. Keys should be deallocated separately
+  by walking over the tree. */
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100755
index 00000000000..ef32aeda515
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+set -eu
+
+# Override this to import from somewhere else, say "../reftable".
+SRC=${SRC:-origin}
+BRANCH=${BRANCH:-master}
+
+((git --git-dir reftable-repo/.git fetch ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
+   git clone https://github.com/google/reftable reftable-repo)
+
+cp reftable-repo/c/*.[ch] reftable/
+cp reftable-repo/c/include/*.[ch] reftable/
+cp reftable-repo/LICENSE reftable/
+git --git-dir reftable-repo/.git show --no-patch HEAD \
+  > reftable/VERSION
+
+mv reftable/system.h reftable/system.h~
+sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
+
+# Remove unittests and compatibility hacks we don't need here.  
+rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
+
+git add reftable/*.[ch] reftable/LICENSE reftable/VERSION 
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..71b75166896
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,659 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, byte *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0) {
+			return n;
+		}
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0) {
+		return n;
+	}
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, byte *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_cmp(((const struct obj_index_tree_node *)a)->hash,
+			 ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		key = reftable_calloc(sizeof(struct obj_index_tree_node));
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w,
+			     struct reftable_record *rec)
+{
+	int result = -1;
+	struct slice key = { 0 };
+	int err = 0;
+	reftable_record_key(rec, &key);
+	if (slice_cmp(w->last_key, key) >= 0) {
+		goto exit;
+	}
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, reftable_record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == reftable_record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto exit;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	writer_reinit_block_writer(w, reftable_record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto exit;
+	}
+
+	result = 0;
+exit:
+	slice_release(&key);
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL) {
+		return REFTABLE_API_ERROR;
+	}
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index) {
+		return REFTABLE_API_ERROR;
+	}
+
+	reftable_record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, &rec);
+	if (err < 0) {
+		return err;
+	}
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = hash_size(w->opts.hash_id),
+		};
+
+		writer_index_hash(w, h);
+	}
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = hash_size(w->opts.hash_id),
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	int err;
+	if (log->ref_name == NULL) {
+		return REFTABLE_API_ERROR;
+	}
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	reftable_record_from_log(&rec, log);
+	err = writer_add_record(w, &rec);
+	return err;
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	struct reftable_block_stats *bstats = NULL;
+	if (err < 0) {
+		return err;
+	}
+
+	while (w->index_len > threshold) {
+		struct reftable_index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct reftable_record rec = { 0 };
+			reftable_record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, &rec) == 0) {
+				continue;
+			}
+
+			{
+				int err = writer_flush_block(w);
+				if (err < 0) {
+					return err;
+				}
+			}
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, &rec);
+			if (err != 0) {
+				/* write into fresh block should always succeed
+				 */
+				abort();
+			}
+		}
+		for (i = 0; i < idx_len; i++) {
+			slice_release(&idx[i].last_key);
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		return err;
+	}
+
+	bstats = writer_reftable_block_stats(w, typ);
+	bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
+	bstats->index_offset = index_start;
+	bstats->max_index_level = max_level;
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct reftable_obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct reftable_record rec = { 0 };
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	reftable_record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0) {
+		goto exit;
+	}
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0) {
+		goto exit;
+	}
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+exit:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	slice_release(&entry->hash);
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0) {
+		return closure.err;
+	}
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0) {
+		return err;
+	}
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	byte footer[72];
+	byte *p = footer;
+	int err = writer_finish_public_section(w);
+	int empty_table = w->next == 0;
+	if (err != 0) {
+		goto exit;
+	}
+	w->pending_padding = 0;
+	if (empty_table) {
+		/* Empty tables need a header anyway. */
+		byte header[28];
+		int n = writer_write_header(w, header);
+		err = padded_write(w, header, n, 0);
+		if (err < 0) {
+			goto exit;
+		}
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0) {
+		goto exit;
+	}
+
+	if (empty_table) {
+		err = REFTABLE_EMPTY_TABLE_ERROR;
+		goto exit;
+	}
+
+exit:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	slice_release(&w->last_key);
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		slice_release(&w->index[i].last_key);
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	struct reftable_index_record ir = { 0 };
+	if (raw_bytes < 0) {
+		return raw_bytes;
+	}
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0) {
+		return err;
+	}
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index,
+			sizeof(struct reftable_index_record) * w->index_cap);
+	}
+
+	ir.offset = w->next;
+	slice_copy(&ir.last_key, w->block_writer->last_key);
+	w->index[w->index_len] = ir;
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL) {
+		return 0;
+	}
+	if (w->block_writer->entries == 0) {
+		return 0;
+	}
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..c56d67f7116
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, byte *, size_t);
+	void *write_arg;
+	int pending_padding;
+	struct slice last_key;
+
+	/* offset of next block to write. */
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	/* memory buffer for writing */
+	byte *block;
+
+	/* writer for the current section. NULL or points to
+	 * block_writer_data */
+	struct block_writer *block_writer;
+
+	struct block_writer block_writer_data;
+
+	/* pending index records for the current section */
+	struct reftable_index_record *index;
+	int index_len;
+	int index_cap;
+
+	/*
+	  tree for use with tsearch; used to populate the 'o' inverse OID
+	  map */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+/* finishes a block, and writes it to storage */
+int writer_flush_block(struct reftable_writer *w);
+
+/* deallocates memory related to the index */
+void writer_clear_index(struct reftable_writer *w);
+
+/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v14 6/9] Reftable support for git-core
  2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
                                             ` (4 preceding siblings ...)
  2020-05-18 20:31                           ` [PATCH v14 5/9] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-05-18 20:31                           ` Han-Wen Nienhuys via GitGitGadget
  2020-05-18 20:31                           ` [PATCH v14 7/9] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
                                             ` (3 subsequent siblings)
  9 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-18 20:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

This introduces the refs/reftable-backend.c containing reftable powered ref
storage backend.

It can be activated by passing --ref-storage=reftable to "git init".

TODO:

* Resolve spots marked with XXX

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   27 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   56 +-
 cache.h                                       |    6 +-
 refs.c                                        |   27 +-
 refs.h                                        |    3 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1313 +++++++++++++++++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |  136 ++
 13 files changed, 1566 insertions(+), 30 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 3d3a39fc192..f4848a382f4 100644
--- a/Makefile
+++ b/Makefile
@@ -810,6 +810,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -962,6 +963,7 @@ LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1165,7 +1167,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2360,11 +2362,29 @@ VCSSVN_OBJS += vcs-svn/sliding_window.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/refname.o
+REFTABLE_OBJS += reftable/reftable.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2505,6 +2525,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -3120,7 +3143,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index cb48a291caf..4d0cf065e4a 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1108,7 +1108,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
+		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0b7222e7188..b7053b9e370 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +189,8 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 ||
+	    !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -238,6 +240,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -247,6 +250,24 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
+	/*
+	 * refs/heads is a file when using reftable. We can't reinitialize with
+	 * a reftable because it will overwrite HEAD
+	 */
+	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
+			      is_directory(git_path_buf(&buf, "refs/heads"))) {
+		die("cannot switch ref storage format.");
+	}
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -261,15 +282,12 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
@@ -383,7 +401,8 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash, unsigned int flags)
+	    const char *template_dir, int hash, const char *ref_storage_format,
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -422,6 +441,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+	repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -450,6 +470,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -523,6 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = DEFAULT_REF_STORAGE;
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -530,15 +553,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *object_format = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
@@ -648,9 +674,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
+		       ref_storage_format, flags);
 }
diff --git a/cache.h b/cache.h
index 0f0485ecfe2..8cb884773c3 100644
--- a/cache.h
+++ b/cache.h
@@ -628,8 +628,9 @@ int path_inside_repo(const char *prefix, const char *path);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
-	    unsigned int flags);
-void initialize_repository_version(int hash_algo);
+	    const char *ref_storage_format, unsigned int flags);
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1043,6 +1044,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index dfafd04b9f9..8eb64a33d6e 100644
--- a/refs.c
+++ b/refs.c
@@ -17,10 +17,16 @@
 #include "argv-array.h"
 #include "repository.h"
 
+const char *default_ref_storage(void)
+{
+	const char *test = getenv("GIT_TEST_REFTABLE");
+	return test ? "reftable" : "files";
+}
+
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1712,13 +1718,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1734,7 +1740,11 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(r->gitdir,
+					 r->ref_storage_format ?
+						 r->ref_storage_format :
+						 DEFAULT_REF_STORAGE,
+					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
 
@@ -1789,7 +1799,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1803,6 +1813,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1816,9 +1827,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index d1d9361441b..a167f60734d 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* Returns the ref storage backend to use by default. */
+const char *default_ref_storage(void);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 51c96ebd485..de2ead363f5 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -680,6 +680,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..4721a896784
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1313 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *repo_dir;
+	char *reftable_dir;
+	struct reftable_stack *stack;
+};
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type);
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = { NULL };
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						       unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+	refs->repo_dir = xstrdup(path);
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct strbuf sb = STRBUF_INIT;
+
+	safe_create_dir(refs->reftable_dir, 1);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	const char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(&ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		/*
+		   We could filter pseudo refs here explicitly, but HEAD is not
+		   a PSEUDOREF, but a PER_WORKTREE, b/c each worktree can have
+		   its own HEAD.
+		 */
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct reftable_merged_table *mt = NULL;
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else {
+		mt = reftable_stack_merged_table(refs->stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->prefix = prefix;
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int fixup_symrefs(struct ref_store *ref_store,
+			 struct ref_transaction *transaction)
+{
+	struct strbuf referent = STRBUF_INIT;
+	int i = 0;
+	int err = 0;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+		struct object_id old_oid;
+
+		err = reftable_read_raw_ref(ref_store, update->refname,
+					    &old_oid, &referent,
+					    /* mutate input, like
+					       files-backend.c */
+					    &update->type);
+		if (err < 0 && errno == ENOENT &&
+		    is_null_oid(&update->old_oid)) {
+			err = 0;
+		}
+		if (err < 0)
+			goto done;
+
+		if (!(update->type & REF_ISSYMREF))
+			continue;
+
+		if (update->flags & REF_NO_DEREF) {
+			/* what should happen here? See files-backend.c
+			 * lock_ref_for_update. */
+		} else {
+			/*
+			  If we are updating a symref (eg. HEAD), we should also
+			  update the branch that the symref points to.
+
+			  This is generic functionality, and would be better
+			  done in refs.c, but the current implementation is
+			  intertwined with the locking in files-backend.c.
+			*/
+			int new_flags = update->flags;
+			struct ref_update *new_update = NULL;
+
+			/* if this is an update for HEAD, should also record a
+			   log entry for HEAD? See files-backend.c,
+			   split_head_update()
+			*/
+			new_update = ref_transaction_add_update(
+				transaction, referent.buf, new_flags,
+				&update->new_oid, &update->old_oid,
+				update->msg);
+			new_update->parent_update = update;
+
+			/* files-backend sets REF_LOG_ONLY here. */
+			update->flags |= REF_NO_DEREF | REF_LOG_ONLY;
+			update->flags &= ~REF_HAVE_OLD;
+		}
+	}
+
+done:
+	strbuf_release(&referent);
+	return err;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_addition *add = NULL;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_new_addition(&add, refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = fixup_symrefs(ref_store, transaction);
+	if (err) {
+		goto done;
+	}
+
+	transaction->backend_data = add;
+	transaction->state = REF_TRANSACTION_PREPARED;
+
+done:
+	if (err < 0) {
+		transaction->state = REF_TRANSACTION_CLOSED;
+		strbuf_addf(errbuf, "reftable: transaction prepare: %s",
+			    reftable_error_str(err));
+	}
+
+	return err;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	reftable_addition_destroy(add);
+	transaction->backend_data = NULL;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid;
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp((*(struct ref_update **)a)->refname,
+		      (*(struct ref_update **)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = reftable_stack_next_update_index(refs->stack);
+	int err = 0;
+	int i = 0;
+	struct reftable_log_record *logs =
+		calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_LOG_ONLY) {
+			continue;
+		}
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct reftable_ref_record ref = { NULL };
+			struct object_id peeled;
+
+			int peel_error = peel_object(&u->new_oid, &peeled);
+			ref.ref_name = (char *)u->refname;
+
+			if (!is_null_oid(&u->new_oid)) {
+				ref.value = u->new_oid.hash;
+			}
+			ref.update_index = ts;
+			if (!peel_error) {
+				ref.target_value = peeled.hash;
+			}
+
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+done:
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	int err = 0;
+	int i;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+
+	err = reftable_addition_add(add, &write_transaction_table, transaction);
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_addition_commit(add);
+
+done:
+	reftable_addition_destroy(add);
+	transaction->state = REF_TRANSACTION_CLOSED;
+	transaction->backend_data = NULL;
+	if (err) {
+		strbuf_addf(errmsg, "reftable: transaction failure: %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+	return err;
+}
+
+static int
+reftable_transaction_initial_commit(struct ref_store *ref_store,
+				    struct ref_transaction *transaction,
+				    struct strbuf *errmsg)
+{
+	return reftable_transaction_finish(ref_store, transaction, errmsg);
+}
+
+struct write_pseudoref_arg {
+	struct reftable_stack *stack;
+	const char *pseudoref;
+	const struct object_id *new_oid;
+	const struct object_id *old_oid;
+};
+
+static int write_pseudoref_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_pseudoref_arg *arg = (struct write_pseudoref_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	struct reftable_ref_record read_ref = { NULL };
+	struct reftable_ref_record write_ref = { NULL };
+
+	reftable_writer_set_limits(writer, ts, ts);
+	if (arg->old_oid) {
+		struct object_id read_oid;
+		err = reftable_stack_read_ref(arg->stack, arg->pseudoref,
+					      &read_ref);
+		if (err < 0)
+			goto done;
+
+		if ((err > 0) != is_null_oid(arg->old_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		/* XXX If old_oid is set, and we have a symref? */
+
+		if (err == 0 && read_ref.value == NULL) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		hashcpy(read_oid.hash, read_ref.value);
+		if (!oideq(arg->old_oid, &read_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+	}
+
+	write_ref.ref_name = (char *)arg->pseudoref;
+	write_ref.update_index = ts;
+	if (!is_null_oid(arg->new_oid))
+		write_ref.value = (uint8_t *)arg->new_oid->hash;
+
+	err = reftable_writer_add_ref(writer, &write_ref);
+done:
+	reftable_ref_record_clear(&read_ref);
+	return err;
+}
+
+static int reftable_write_pseudoref(struct ref_store *ref_store,
+				    const char *pseudoref,
+				    const struct object_id *oid,
+				    const struct object_id *old_oid,
+				    struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_pseudoref_arg arg = {
+		.stack = refs->stack,
+		.pseudoref = pseudoref,
+		.new_oid = oid,
+	};
+	struct reftable_addition *add = NULL;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_new_addition(&add, refs->stack);
+	if (err) {
+		goto done;
+	}
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		/* XXX this is cut & paste from files-backend - should factor
+		 * out? */
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(errbuf,
+					    _("could not read ref '%s'"),
+					    pseudoref);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(errbuf, _("ref '%s' already exists"),
+				    pseudoref);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(errbuf,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			goto done;
+		}
+	}
+
+	err = reftable_addition_add(add, &write_pseudoref_table, &arg);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref update failure: %s",
+			    reftable_error_str(err));
+	}
+
+	err = reftable_addition_commit(add);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref commit failure: %s",
+			    reftable_error_str(err));
+	}
+
+done:
+	reftable_addition_destroy(add);
+	return err;
+}
+
+static int reftable_delete_pseudoref(struct ref_store *ref_store,
+				     const char *pseudoref,
+				     const struct object_id *old_oid)
+{
+	struct strbuf errbuf = STRBUF_INIT;
+	int ret = reftable_write_pseudoref(ref_store, pseudoref, &null_oid,
+					   old_oid, &errbuf);
+	/* XXX what to do with the error message? */
+	strbuf_release(&errbuf);
+	return ret;
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	int i = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = { NULL };
+		struct reftable_ref_record current = { NULL };
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.ref_name,
+					    &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
+done:
+	return err;
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_log_record log = { NULL };
+		struct object_id new_oid;
+		struct object_id old_oid;
+		struct reftable_ref_record current = { NULL };
+		reftable_stack_read_ref(create->refs->stack, create->refname,
+					&current);
+
+		fill_reftable_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			reftable_writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(refs->stack, &write_create_symref_table, &arg);
+done:
+	return err;
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = { NULL };
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto done;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto done;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = { { NULL } };
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = { { NULL } };
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto done;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+done:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_add(refs->stack, &write_rename_table, &arg);
+done:
+	return err;
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(&ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			/* we want the refnames that we have reflogs for, so we
+			 * skip if we've already produced this name. This could
+			 * be faster by seeking directly to
+			 * reflog@update_index==0.
+			 */
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(refs->stack);
+	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = { NULL };
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid;
+			struct object_id new_oid;
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+	int i = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = { NULL };
+		err = reftable_iterator_next_log(&it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct reftable_log_record log = { NULL };
+	struct reftable_iterator it = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		goto done;
+	}
+
+	while (1) {
+		struct object_id ooid;
+		struct object_id noid;
+
+		int err = reftable_iterator_next_log(&it, &log);
+		if (err < 0) {
+			goto done;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+
+done:
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_read_ref(refs->stack, refname, &ref);
+	if (err > 0) {
+		errno = ENOENT;
+		err = -1;
+		goto done;
+	}
+	if (err < 0) {
+		errno = reftable_error_to_errno(err);
+		err = -1;
+		goto done;
+	}
+	if (ref.target != NULL) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else if (ref.value != NULL) {
+		hashcpy(oid->hash, ref.value);
+	} else {
+		*type |= REF_ISBROKEN;
+		errno = EINVAL;
+		err = -1;
+	}
+done:
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_initial_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_write_pseudoref,
+	reftable_delete_pseudoref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire,
+};
diff --git a/repository.c b/repository.c
index 6f7f6f002b1..087760bc184 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 6534fbb7b31..f57b73f4a27 100644
--- a/repository.h
+++ b/repository.h
@@ -74,6 +74,9 @@ struct repository {
 	 */
 	struct ref_store *refs_private;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 65fe5ecefbe..2ef970f9f88 100644
--- a/setup.c
+++ b/setup.c
@@ -468,9 +468,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -559,6 +561,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1204,8 +1207,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 00000000000..0f1e621025f
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,136 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+INVALID_SHA1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+initialize ()  {
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	mv .git/hooks .git/hooks-disabled
+}
+
+test_expect_success 'delete ref' '
+	initialize &&
+	test_commit file &&
+	SHA=$(git show-ref -s --verify HEAD) &&
+	test_write_lines "$SHA refs/heads/master" "$SHA refs/tags/file" >expect &&
+	git show-ref > actual &&
+	! git update-ref -d refs/tags/file $INVALID_SHA1 &&
+	test_cmp expect actual &&
+	git update-ref -d refs/tags/file $SHA  &&
+	test_write_lines "$SHA refs/heads/master" >expect &&
+	git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'basic operation of reftable storage: commit, show-ref' '
+	initialize &&
+	test_commit file &&
+	test_write_lines refs/heads/master refs/tags/file >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect
+'
+
+test_expect_success 'reflog, repack' '
+	initialize &&
+	for count in $(test_seq 1 10)
+	do
+		test_commit "number $count" file.t $count number-$count ||
+	        return 1
+	done &&
+	git pack-refs &&
+	ls -1 .git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 10 output &&
+	grep "commit (initial): number 1" output &&
+	grep "commit: number 10" output &&
+	git gc &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 0 output 
+'
+
+# This matches show-ref's output
+print_ref() {
+	echo "$(git rev-parse "$1") $1"
+}
+
+test_expect_success 'peeled tags are stored' '
+	initialize &&
+	test_commit file &&
+	git tag -m "annotated tag" test_tag HEAD &&
+	{
+		print_ref "refs/heads/master" &&
+		print_ref "refs/tags/file" &&
+		print_ref "refs/tags/test_tag" &&
+		print_ref "refs/tags/test_tag^{}" 
+	} >expect &&
+	git show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'show-ref works on fresh repo' '
+	initialize &&
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	>expect &&
+	! git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'checkout unborn branch' '
+	initialize &&
+	git checkout -b master
+'
+
+
+test_expect_success 'dir/file conflict' '
+	initialize &&
+	test_commit file &&
+	! git branch master/forbidden
+'
+
+
+test_expect_success 'do not clobber existing repo' '
+	rm -rf .git &&
+	git init --ref-storage=files &&
+	cat .git/HEAD > expect &&
+	test_commit file &&
+	(git init --ref-storage=reftable || true) &&
+	cat .git/HEAD > actual &&
+	test_cmp expect actual
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'pseudo refs' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+	git cherry-pick source &&
+	test -f file2
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'rebase' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+ 	git rebase source &&
+	test -f file2
+'
+
+test_done
+
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v14 7/9] Add GIT_DEBUG_REFS debugging mechanism
  2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
                                             ` (5 preceding siblings ...)
  2020-05-18 20:31                           ` [PATCH v14 6/9] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-05-18 20:31                           ` Han-Wen Nienhuys via GitGitGadget
  2020-05-18 20:31                           ` [PATCH v14 8/9] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
                                             ` (2 subsequent siblings)
  9 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-18 20:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

When set in the environment, GIT_DEBUG_REFS makes git print operations and
results as they flow through the ref storage backend. This helps debug
discrepancies between different ref backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile              |   1 +
 refs.c                |   3 +
 refs/debug.c          | 309 ++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h  |   6 +
 t/t0033-debug-refs.sh |  18 +++
 5 files changed, 337 insertions(+)
 create mode 100644 refs/debug.c
 create mode 100755 t/t0033-debug-refs.sh

diff --git a/Makefile b/Makefile
index f4848a382f4..cec61455184 100644
--- a/Makefile
+++ b/Makefile
@@ -962,6 +962,7 @@ LIB_OBJS += rebase.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
+LIB_OBJS += refs/debug.o
 LIB_OBJS += refs/files-backend.o
 LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
diff --git a/refs.c b/refs.c
index 8eb64a33d6e..9b456a46712 100644
--- a/refs.c
+++ b/refs.c
@@ -1745,6 +1745,9 @@ struct ref_store *get_main_ref_store(struct repository *r)
 						 r->ref_storage_format :
 						 DEFAULT_REF_STORAGE,
 					 REF_STORE_ALL_CAPS);
+	if (getenv("GIT_DEBUG_REFS")) {
+		r->refs_private = debug_wrap(r->refs_private);
+	}
 	return r->refs_private;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
new file mode 100644
index 00000000000..7c07ec92137
--- /dev/null
+++ b/refs/debug.c
@@ -0,0 +1,309 @@
+
+#include "refs-internal.h"
+
+struct debug_ref_store {
+	struct ref_store base;
+	struct ref_store *refs;
+};
+
+extern struct ref_storage_be refs_be_debug;
+struct ref_store *debug_wrap(struct ref_store *store);
+
+struct ref_store *debug_wrap(struct ref_store *store)
+{
+	struct debug_ref_store *res = malloc(sizeof(struct debug_ref_store));
+	res->refs = store;
+	base_ref_store_init((struct ref_store *)res, &refs_be_debug);
+	return (struct ref_store *)res;
+}
+
+static int debug_init_db(struct ref_store *refs, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res = drefs->refs->be->init_db(drefs->refs, err);
+	return res;
+}
+
+static int debug_transaction_prepare(struct ref_store *refs,
+				     struct ref_transaction *transaction,
+				     struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_prepare(drefs->refs, transaction,
+						   err);
+	return res;
+}
+
+static void print_update(int i, const char *refname,
+			 const struct object_id *old_oid,
+			 const struct object_id *new_oid, unsigned int flags,
+			 unsigned int type)
+{
+	char o[200] = "null";
+	char n[200] = "null";
+	if (old_oid)
+		oid_to_hex_r(o, old_oid);
+	if (new_oid)
+		oid_to_hex_r(n, new_oid);
+
+	type &= 0xf; /* see refs.h REF_* */
+	flags &= REF_HAVE_NEW | REF_HAVE_OLD | REF_NO_DEREF |
+		 REF_FORCE_CREATE_REFLOG | REF_LOG_ONLY;
+	printf("%d: %s %s -> %s (F=0x%x, T=0x%x)\n", i, refname, o, n, flags,
+	       type);
+}
+
+static void print_transaction(struct ref_transaction *transaction)
+{
+	printf("transaction {\n");
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		print_update(i, u->refname, &u->old_oid, &u->new_oid, u->flags,
+			     u->type);
+	}
+	printf("}\n");
+}
+
+static int debug_transaction_finish(struct ref_store *refs,
+				    struct ref_transaction *transaction,
+				    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_finish(drefs->refs, transaction,
+						  err);
+	print_transaction(transaction);
+	printf("finish: %d\n", res);
+	return res;
+}
+
+static int debug_transaction_abort(struct ref_store *refs,
+				   struct ref_transaction *transaction,
+				   struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_abort(drefs->refs, transaction, err);
+	return res;
+}
+
+static int debug_initial_transaction_commit(struct ref_store *refs,
+					    struct ref_transaction *transaction,
+					    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->initial_transaction_commit(drefs->refs,
+							  transaction, err);
+	return res;
+}
+
+static int debug_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->pack_refs(drefs->refs, flags);
+	return res;
+}
+
+static int debug_create_symref(struct ref_store *ref_store,
+			       const char *ref_target,
+			       const char *refs_heads_master,
+			       const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_symref(drefs->refs, ref_target,
+						 refs_heads_master, logmsg);
+	return res;
+}
+
+static int debug_delete_refs(struct ref_store *ref_store, const char *msg,
+			     struct string_list *refnames, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->delete_refs(drefs->refs, msg, refnames, flags);
+	return res;
+}
+static int debug_rename_ref(struct ref_store *ref_store, const char *oldref,
+			    const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->rename_ref(drefs->refs, oldref, newref,
+					      logmsg);
+	return res;
+}
+static int debug_copy_ref(struct ref_store *ref_store, const char *oldref,
+			  const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->copy_ref(drefs->refs, oldref, newref, logmsg);
+	return res;
+}
+
+static int debug_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->write_pseudoref(drefs->refs, pseudoref, oid,
+						   old_oid, err);
+	char o[100] = "null";
+	char n[100] = "null";
+	if (oid)
+		oid_to_hex_r(o, oid);
+	if (old_oid)
+		oid_to_hex_r(n, old_oid);
+	printf("write_pseudoref: %s, %s => %s, err %s: %d\n", pseudoref, o, n,
+	       err->buf, res);
+	return res;
+}
+
+static int debug_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_pseudoref(drefs->refs, pseudoref,
+						    old_oid);
+	char hex[100] = "null";
+	if (old_oid)
+		oid_to_hex_r(hex, old_oid);
+	printf("delete_pseudoref: %s (%s): %d\n", pseudoref, hex, res);
+	return res;
+}
+
+static struct ref_iterator *
+debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			 unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
+	return res;
+}
+
+static int debug_read_raw_ref(struct ref_store *ref_store, const char *refname,
+			      struct object_id *oid, struct strbuf *referent,
+			      unsigned int *type)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = 0;
+
+	oidcpy(oid, &null_oid);
+	res = drefs->refs->be->read_raw_ref(drefs->refs, refname, oid, referent,
+					    type);
+
+	if (res == 0) {
+		printf("read_raw_ref: %s: %s (=> %s) type %x: %d\n", refname,
+		       oid_to_hex(oid), referent->buf, *type, res);
+	} else {
+		printf("read_raw_ref: %s err %d\n", refname, res);
+	}
+	return res;
+}
+
+static struct ref_iterator *
+debug_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->reflog_iterator_begin(drefs->refs);
+	return res;
+}
+
+static int debug_for_each_reflog_ent(struct ref_store *ref_store,
+				     const char *refname, each_reflog_ent_fn fn,
+				     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->for_each_reflog_ent(drefs->refs, refname, fn,
+						       cb_data);
+	return res;
+}
+static int debug_for_each_reflog_ent_reverse(struct ref_store *ref_store,
+					     const char *refname,
+					     each_reflog_ent_fn fn,
+					     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->for_each_reflog_ent_reverse(
+		drefs->refs, refname, fn, cb_data);
+	return res;
+}
+
+static int debug_reflog_exists(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_exists(drefs->refs, refname);
+	return res;
+}
+
+static int debug_create_reflog(struct ref_store *ref_store, const char *refname,
+			       int force_create, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_reflog(drefs->refs, refname,
+						 force_create, err);
+	return res;
+}
+
+static int debug_delete_reflog(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_reflog(drefs->refs, refname);
+	return res;
+}
+
+static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
+			       const struct object_id *oid, unsigned int flags,
+			       reflog_expiry_prepare_fn prepare_fn,
+			       reflog_expiry_should_prune_fn should_prune_fn,
+			       reflog_expiry_cleanup_fn cleanup_fn,
+			       void *policy_cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_expire(drefs->refs, refname, oid,
+						 flags, prepare_fn,
+						 should_prune_fn, cleanup_fn,
+						 policy_cb_data);
+	return res;
+}
+
+struct ref_storage_be refs_be_debug = {
+	NULL,
+	"debug",
+	NULL,
+	debug_init_db,
+	debug_transaction_prepare,
+	debug_transaction_finish,
+	debug_transaction_abort,
+	debug_initial_transaction_commit,
+
+	debug_pack_refs,
+	debug_create_symref,
+	debug_delete_refs,
+	debug_rename_ref,
+	debug_copy_ref,
+
+	debug_write_pseudoref,
+	debug_delete_pseudoref,
+
+	debug_ref_iterator_begin,
+	debug_read_raw_ref,
+
+	debug_reflog_iterator_begin,
+	debug_for_each_reflog_ent,
+	debug_for_each_reflog_ent_reverse,
+	debug_reflog_exists,
+	debug_create_reflog,
+	debug_delete_reflog,
+	debug_reflog_expire,
+};
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index de2ead363f5..c4abab3d179 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -700,4 +700,10 @@ struct ref_store {
 void base_ref_store_init(struct ref_store *refs,
 			 const struct ref_storage_be *be);
 
+/*
+ * Print out ref operations as they occur. Useful for debugging alternate ref
+ * backends.
+ */
+struct ref_store *debug_wrap(struct ref_store *store);
+
 #endif /* REFS_REFS_INTERNAL_H */
diff --git a/t/t0033-debug-refs.sh b/t/t0033-debug-refs.sh
new file mode 100755
index 00000000000..96266a3ee3a
--- /dev/null
+++ b/t/t0033-debug-refs.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='cross-check reftable with files, using GIT_DEBUG_REFS output'
+
+. ./test-lib.sh
+
+test_expect_success 'GIT_DEBUG_REFS' '
+	git init --ref-storage=files files &&
+	git init --ref-storage=reftable reftable &&
+	(cd files && GIT_DEBUG_REFS=1 test_commit message file) > files.txt && 
+	(cd reftable && GIT_DEBUG_REFS=1 test_commit message file) > reftable.txt &&
+	test_cmp files.txt reftable.txt
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v14 8/9] vcxproj: adjust for the reftable changes
  2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
                                             ` (6 preceding siblings ...)
  2020-05-18 20:31                           ` [PATCH v14 7/9] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
@ 2020-05-18 20:31                           ` Johannes Schindelin via GitGitGadget
  2020-05-18 20:31                           ` [PATCH v14 9/9] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  9 siblings, 0 replies; 409+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-05-18 20:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This allows Git to be compiled via Visual Studio again after integrating
the `hn/reftable` branch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.mak.uname                           |  2 +-
 contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index 5ad43c80b1a..484aec15ac2 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -710,7 +710,7 @@ vcxproj:
 	# Make .vcxproj files and add them
 	unset QUIET_GEN QUIET_BUILT_IN; \
 	perl contrib/buildsystems/generate -g Vcxproj
-	git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
+	git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
 
 	# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
 	(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
index 5c666f9ac03..33a08d31652 100644
--- a/contrib/buildsystems/Generators/Vcxproj.pm
+++ b/contrib/buildsystems/Generators/Vcxproj.pm
@@ -77,7 +77,7 @@ sub createProject {
     my $libs_release = "\n    ";
     my $libs_debug = "\n    ";
     if (!$static_library) {
-      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
+      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
       $libs_debug = $libs_release;
       $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
       $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
@@ -231,6 +231,7 @@ sub createProject {
 EOM
     if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
       my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
+      my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
       my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
 
       print F << "EOM";
@@ -240,6 +241,14 @@ sub createProject {
       <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
     </ProjectReference>
 EOM
+      if (!($name =~ /xdiff|libreftable/)) {
+        print F << "EOM";
+    <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
+      <Project>$uuid_libreftable</Project>
+      <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
+    </ProjectReference>
+EOM
+      }
       if (!($name =~ 'xdiff')) {
         print F << "EOM";
     <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v14 9/9] Add reftable testing infrastructure
  2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
                                             ` (7 preceding siblings ...)
  2020-05-18 20:31                           ` [PATCH v14 8/9] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
@ 2020-05-18 20:31                           ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  9 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-18 20:31 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* Add GIT_TEST_REFTABLE environment var to control default ref storage

* Add test_prerequisite REFTABLE.

* Skip some tests that are incompatible:

  * t3210-pack-refs.sh - does not apply
  * t9903-bash-prompt - The bash mode reads .git/HEAD directly
  * t1450-fsck.sh - manipulates .git/ directly to create invalid state

Major test failures:

 * t1400-update-ref.sh - Reads from .git/{refs,logs} directly
 * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
 * t1405 - inspecs .git/ directly.

t6030-bisect-porcelain.sh                - 62 of 72
t2400-worktree-add.sh                    - 58 of 69
t3200-branch.sh                          - 58 of 145
t7406-submodule-update.sh                - 54 of 54
t5601-clone.sh                           - 51 of 105
t9903-bash-prompt.sh                     - 50 of 66
t1404-update-ref-errors.sh               - 44 of 53
t5510-fetch.sh                           - 40 of 171
t7400-submodule-basic.sh                 - 38 of 111
t3514-cherry-pick-revert-gpg.sh          - 36 of 36
..

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/clone.c               | 2 +-
 builtin/init-db.c             | 2 +-
 refs.c                        | 6 +++---
 t/t1409-avoid-packing-refs.sh | 6 ++++++
 t/t1450-fsck.sh               | 6 ++++++
 t/t3210-pack-refs.sh          | 6 ++++++
 t/t9903-bash-prompt.sh        | 6 ++++++
 t/test-lib.sh                 | 5 +++++
 8 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4d0cf065e4a..780c5807415 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1109,7 +1109,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
-		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
+		default_ref_storage(), INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index b7053b9e370..da5b4670c84 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -545,7 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
-	const char *ref_storage_format = DEFAULT_REF_STORAGE;
+	const char *ref_storage_format = default_ref_storage();
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
diff --git a/refs.c b/refs.c
index 9b456a46712..6886330da41 100644
--- a/refs.c
+++ b/refs.c
@@ -1743,7 +1743,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	r->refs_private = ref_store_init(r->gitdir,
 					 r->ref_storage_format ?
 						 r->ref_storage_format :
-						 DEFAULT_REF_STORAGE,
+						 default_ref_storage(),
 					 REF_STORE_ALL_CAPS);
 	if (getenv("GIT_DEBUG_REFS")) {
 		r->refs_private = debug_wrap(r->refs_private);
@@ -1802,7 +1802,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
+	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1816,7 +1816,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
-	const char *format = DEFAULT_REF_STORAGE; /* XXX */
+	const char *format = default_ref_storage();
 	struct ref_store *refs;
 	const char *id;
 
diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
index be12fb63506..c6f78325563 100755
--- a/t/t1409-avoid-packing-refs.sh
+++ b/t/t1409-avoid-packing-refs.sh
@@ -4,6 +4,12 @@ test_description='avoid rewriting packed-refs unnecessarily'
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 # Add an identifying mark to the packed-refs file header line. This
 # shouldn't upset readers, and it should be omitted if the file is
 # ever rewritten.
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 449ebc5657c..9f605433033 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -8,6 +8,12 @@ test_description='git fsck random collection of tests
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success setup '
 	test_oid_init &&
 	git config gc.auto 0 &&
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index f41b2afb996..edaef2c175a 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -11,6 +11,12 @@ semantic is still the same.
 '
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success 'enable reflogs' '
 	git config core.logallrefupdates true
 '
diff --git a/t/t9903-bash-prompt.sh b/t/t9903-bash-prompt.sh
index ab5da2cabc4..5deaaf968b0 100755
--- a/t/t9903-bash-prompt.sh
+++ b/t/t9903-bash-prompt.sh
@@ -7,6 +7,12 @@ test_description='test git-specific bash prompt functions'
 
 . ./lib-bash.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping tests; incompatible with reftable'
+  test_done
+fi
+
 . "$GIT_BUILD_DIR/contrib/completion/git-prompt.sh"
 
 actual="$TRASH_DIRECTORY/actual"
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 1b221951a8e..b2b16979407 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1514,6 +1514,11 @@ FreeBSD)
 	;;
 esac
 
+if test -n "$GIT_TEST_REFTABLE"
+then
+  test_set_prereq REFTABLE
+fi
+
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 01/13] refs.h: clarify reflog iteration order
  2020-05-11 19:46                         ` [PATCH v13 01/13] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
@ 2020-05-18 23:31                           ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-18 23:31 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  refs.h | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)

I think the split of the topics was a good first move; now we need
to make sure that we need no more work on the early part.

> diff --git a/refs.h b/refs.h
> index a92d2c74c83..99ba9e331e5 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -432,19 +432,31 @@ int delete_refs(const char *msg, struct string_list *refnames,
>  int refs_delete_reflog(struct ref_store *refs, const char *refname);
>  int delete_reflog(const char *refname);
>  
> -/* iterate over reflog entries */
> +/* Callback to process a reflog entry found by the iteration functions (see
> + * below) */

/*
 * A multi-line comment in our codebase
 * look like this, with the opening slash-asterisk
 * and closing asterisk-slash occupying their own
 * lines.
 */
>  typedef int each_reflog_ent_fn(
>  		struct object_id *old_oid, struct object_id *new_oid,
>  		const char *committer, timestamp_t timestamp,
>  		int tz, const char *msg, void *cb_data);
>  
> +/* Iterate in over reflog entries in the log for `refname`. */

"in over"???  Should this just be "iterate over reflog entries in ..."?

> +
> +/* oldest entry first */
>  int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
>  			     each_reflog_ent_fn fn, void *cb_data);
> +
> +/* youngest entry first */
>  int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
>  				     const char *refname,
>  				     each_reflog_ent_fn fn,
>  				     void *cb_data);
> +
> +/* Call a function for each reflog entry in the log for `refname`. */
> +
> +/* oldest entry first */
>  int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
> +
> +/* youngest entry first */
>  int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);

The only difference betwen refs_for_each_reflog_ent() and
for_each_reflog_ent() is that the former takes an arbitrary
ref-store while the latter works on the default ref-store, no?  They
should otherwise be identical, so describing one as "Iterate over
reflog entries" while describing the other as "Call a function for
each" feel misleading.



^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 02/13] t: use update-ref and show-ref to reading/writing refs
  2020-05-11 19:46                         ` [PATCH v13 02/13] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
@ 2020-05-18 23:34                           ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-18 23:34 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> Reading and writing .git/refs/* assumes that refs are stored in the 'files'
> ref backend.
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  t/t0002-gitfile.sh             |  2 +-
>  t/t1400-update-ref.sh          | 32 ++++++++++++++++----------------
>  t/t1506-rev-parse-diagnosis.sh |  2 +-
>  t/t6050-replace.sh             |  2 +-
>  t/t9020-remote-svn.sh          |  4 ++--
>  5 files changed, 21 insertions(+), 21 deletions(-)
>
> diff --git a/t/t0002-gitfile.sh b/t/t0002-gitfile.sh
> index 0aa9908ea12..960ed150cb5 100755
> --- a/t/t0002-gitfile.sh
> +++ b/t/t0002-gitfile.sh
> @@ -62,7 +62,7 @@ test_expect_success 'check commit-tree' '
>  '
>  
>  test_expect_success 'check rev-list' '
> -	echo $SHA >"$REAL/HEAD" &&
> +	git update-ref "HEAD" "$SHA" &&
>  	test "$SHA" = "$(git rev-list HEAD)"
>  '
>  
> diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
> index e1197ac8189..27171f82612 100755
> --- a/t/t1400-update-ref.sh
> +++ b/t/t1400-update-ref.sh
> @@ -37,15 +37,15 @@ test_expect_success setup '
>  
>  test_expect_success "create $m" '
>  	git update-ref $m $A &&
> -	test $A = $(cat .git/$m)
> +	test $A = $(git show-ref -s --verify $m)

All of the above looks good.  These were written long time ago, and
we should have cleaned them up much earlier, but we didn't.

> diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
> index 52edcbdcc32..dbf690b9c1b 100755
> --- a/t/t1506-rev-parse-diagnosis.sh
> +++ b/t/t1506-rev-parse-diagnosis.sh
> @@ -207,7 +207,7 @@ test_expect_success 'arg before dashdash must be a revision (ambiguous)' '
>  	{
>  		# we do not want to use rev-parse here, because
>  		# we are testing it
> -		cat .git/refs/heads/foobar &&
> +		git show-ref -s refs/heads/foobar &&
>  		printf "%s\n" --

Likewise.

>  	} >expect &&
>  	git rev-parse foobar -- >actual &&
> diff --git a/t/t6050-replace.sh b/t/t6050-replace.sh
> index e7e64e085dd..c80dc10b8f1 100755
> --- a/t/t6050-replace.sh
> +++ b/t/t6050-replace.sh
> @@ -135,7 +135,7 @@ test_expect_success 'tag replaced commit' '
>  test_expect_success '"git fsck" works' '
>       git fsck master >fsck_master.out &&
>       test_i18ngrep "dangling commit $R" fsck_master.out &&
> -     test_i18ngrep "dangling tag $(cat .git/refs/tags/mytag)" fsck_master.out &&
> +     test_i18ngrep "dangling tag $(git show-ref -s refs/tags/mytag)" fsck_master.out &&

Likewise; this has no excuse of being old code ;-)

>       test -z "$(git fsck)"
>  '
>  
> diff --git a/t/t9020-remote-svn.sh b/t/t9020-remote-svn.sh
> index 6fca08e5e35..9fcfa969a9b 100755
> --- a/t/t9020-remote-svn.sh
> +++ b/t/t9020-remote-svn.sh
> @@ -48,8 +48,8 @@ test_expect_success REMOTE_SVN 'simple fetch' '
>  '
>  
>  test_debug '
> -	cat .git/refs/svn/svnsim/master
> -	cat .git/refs/remotes/svnsim/master
> +	git show-ref -s refs/svn/svnsim/master
> +	git show-ref -s refs/remotes/svnsim/master
>  '

Looks good.

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 03/13] refs: document how ref_iterator_advance_fn should handle symrefs
  2020-05-11 19:46                         ` [PATCH v13 03/13] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
@ 2020-05-18 23:43                           ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-18 23:43 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  refs/refs-internal.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index ff2436c0fb7..3490aac3a40 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -438,6 +438,11 @@ void base_ref_iterator_free(struct ref_iterator *iter);
>  
>  /* Virtual function declarations for ref_iterators: */
>  
> +/*
> + * backend-specific implementation of ref_iterator_advance.
> + * For symrefs, the function should set REF_ISSYMREF, and it should also
> + * dereference the symref to provide the OID referent.
> + */
>  typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);

Shouldn't we also talk about the need for the backend to yield
broken refs with ISBROKEN (iow, you are not allowed to skip---let
the caller make the decision to skip) etc.?

Other than that, this is a good additional piece of information to
tell the implementors of new backend.

Thanks.





>  
>  typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 13/13] Add some reftable testing infrastructure
  2020-05-13 19:57                           ` Junio C Hamano
@ 2020-05-19 13:54                             ` Han-Wen Nienhuys
  2020-05-19 15:21                               ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-19 13:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wed, May 13, 2020 at 9:57 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > diff --git a/builtin/init-db.c b/builtin/init-db.c
> > index b7053b9e370..da5b4670c84 100644
> > --- a/builtin/init-db.c
> > +++ b/builtin/init-db.c
> > @@ -545,7 +545,7 @@ static const char *const init_db_usage[] = {
> >  int cmd_init_db(int argc, const char **argv, const char *prefix)
> >  {
> >       const char *git_dir;
> > -     const char *ref_storage_format = DEFAULT_REF_STORAGE;
> > +     const char *ref_storage_format = default_ref_storage();
> >       const char *real_git_dir = NULL;
> >       const char *work_tree;
> >       const char *template_dir = NULL;
> > diff --git a/refs.c b/refs.c
> > index e8751415a9e..70c11b05391 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -1742,7 +1742,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
> >       r->refs_private = ref_store_init(r->gitdir,
> >                                        r->ref_storage_format ?
> >                                                r->ref_storage_format :
> > -                                              DEFAULT_REF_STORAGE,
> > +                                              default_ref_storage(),
> >                                        REF_STORE_ALL_CAPS);
> >       return r->refs_private;
> >  }
>
> Would it make sense to let NULL stand for "we use the default,
> whatever it is" so that anybody outside the implementation of the
> refs API does not have to care what the default is, and make
> ref_store_init() function the only thing that knows what it is?

Is this your strong recommendation in the form of a question? :)

If it is just a question: maybe. I don't know enough of Git to say for sure.

How do submodules decide on other settings, such as the hash ID?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 13/13] Add some reftable testing infrastructure
  2020-05-19 13:54                             ` Han-Wen Nienhuys
@ 2020-05-19 15:21                               ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-19 15:21 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	brian m. carlson

Han-Wen Nienhuys <hanwen@google.com> writes:

> On Wed, May 13, 2020 at 9:57 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> > --- a/refs.c
>> > +++ b/refs.c
>> > @@ -1742,7 +1742,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
>> >       r->refs_private = ref_store_init(r->gitdir,
>> >                                        r->ref_storage_format ?
>> >                                                r->ref_storage_format :
>> > -                                              DEFAULT_REF_STORAGE,
>> > +                                              default_ref_storage(),
>> >                                        REF_STORE_ALL_CAPS);
>> >       return r->refs_private;
>> >  }
>>
>> Would it make sense to let NULL stand for "we use the default,
>> whatever it is" so that anybody outside the implementation of the
>> refs API does not have to care what the default is, and make
>> ref_store_init() function the only thing that knows what it is?
>
> Is this your strong recommendation in the form of a question? :)
>
> If it is just a question: maybe. I don't know enough of Git to say for sure.

I try not to ask a rhetorical question without marking it explicitly
as such.  The question "instead of having to sprinkle ?: all over
the place on the callers' side, would it make it better or worse to
give a 'not decided' default to the ref_storage_format field?" came
to mind while reading the series.

> How do submodules decide on other settings, such as the hash ID?

I am not sure the issue of "what default to use" is limited to the
submodule case (e.g. "git init" and "git clone" starts from having
no "current repository" and they need to make the decision among
available choices).  As I've seen this series and SHA-256 series
conflict in the "init" area, it probably is time for me to introduce
you to Brian (cc'ed) to pick his brain ;-).

Thanks.


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 04/13] reftable: file format documentation
  2020-05-11 19:46                         ` [PATCH v13 04/13] reftable: file format documentation Jonathan Nieder via GitGitGadget
@ 2020-05-19 22:00                           ` Junio C Hamano
  2020-05-20 16:06                             ` Han-Wen Nienhuys
  2020-05-20 18:52                             ` Jonathan Nieder
  0 siblings, 2 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-19 22:00 UTC (permalink / raw)
  To: Jonathan Nieder via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Jonathan Nieder

"Jonathan Nieder via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jonathan Nieder <jrnieder@gmail.com>
>
> Shawn Pearce explains:
>
> Some repositories contain a lot of references (e.g. android at 866k,
> rails at 31k). The reftable format provides:
>
> - Near constant time lookup for any single reference, even when the
>   repository is cold and not in process or kernel cache.
> - Near constant time verification a SHA-1 is referred to by at least
>   one reference (for allow-tip-sha1-in-want).

Not quite grammatical sentence?  Perhaps "if" after "verification?

> - Efficient lookup of an entire namespace, such as `refs/tags/`.
> - Support atomic push `O(size_of_update)` operations.
> - Combine reflog storage with ref storage.
>
> This file format spec was originally written in July, 2017 by Shawn
> Pearce.  Some refinements since then were made by Shawn and by Han-Wen
> Nienhuys based on experiences implementing and experimenting with the
> format.  (All of this was in the context of our work at Google and
> Google is happy to contribute the result to the Git project.)
>
> Imported from JGit[1]'s current version (c217d33ff,
> "Documentation/technical/reftable: improve repo layout", 2020-02-04)
> of Documentation/technical/reftable.md and converted to asciidoc by
> running
>
>   pandoc -t asciidoc -f markdown reftable.md >reftable.txt
>
> using pandoc 2.2.1.  The result required the following additional
> minor changes:
>
> - removed the [TOC] directive to add a table of contents, since
>   asciidoc does not support it
> - replaced git-scm.com/docs links with linkgit: directives that link
>   to other pages within Git's documentation

There are many 

	’

funny-quotes where we would prefer to place vanilla single quotes,
which may also need to be corrected in the conversion toolchain.

Typoes pointed out below may probably be from the original where
they should be corrected.

> diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
> new file mode 100644
> index 00000000000..8bad9ade256
> --- /dev/null
> +++ b/Documentation/technical/reftable.txt
> @@ -0,0 +1,1067 @@
> +reftable
> +--------
> +
> +Overview
> +~~~~~~~~
> +
> +Problem statement
> +^^^^^^^^^^^^^^^^^
> +
> +Some repositories contain a lot of references (e.g. android at 866k,

Let's not use &nbsp; here after "e.g.".  I see a normal space after "e.g."
a few lines below.

> +rails at 31k). The existing packed-refs format takes up a lot of space
> +(e.g. 62M), and does not scale with additional references. Lookup of a
> +single reference requires linearly scanning the file.
> +Atomic pushes modifying multiple references require copying the entire
> +packed-refs file, which can be a considerable amount of data moved
> +(e.g. 62M in, 62M out) for even small transactions (2 refs modified).
> +
> +Repositories with many loose references occupy a large number of disk
> +blocks from the local file system, as each reference is its own file
> +storing 41 bytes (and another file for the corresponding reflog). This
> +negatively affects the number of inodes available when a large number of
> +repositories are stored on the same filesystem. Readers can be penalized
> +due to the larger number of syscalls required to traverse and read the
> +`$GIT_DIR/refs` directory.

Another downside is that we cannot arrange atomic updates to
multiple refs over loose refs, even though the "lookup of a single
reference does not require linear scan" unlike packed-refs, (as long
as the filesystem does its job).  Worth mentioning?

> +
> +Objectives
> +^^^^^^^^^^
> +
> +* Near constant time lookup for any single reference, even when the
> +repository is cold and not in process or kernel cache.
> +* Near constant time verification if a SHA-1 is referred to by at least
> +one reference (for allow-tip-sha1-in-want).
> +* Efficient lookup of an entire namespace, such as `refs/tags/`.

Does this "lookup" refer to "do we have anything in refs/tags/
hierarchy?" or "enumerate all refs under refs/tags/ hierarchy?"

If the latter, perhaps s/lookup of/iteration over/

> +* Support atomic push with `O(size_of_update)` operations.
> +* Combine reflog storage with ref storage for small transactions.
> +* Separate reflog storage for base refs and historical logs.

> +Details
> +~~~~~~~
> +
> +Peeling
> +^^^^^^^
> +
> +References stored in a reftable are peeled, a record for an annotated
> +(or signed) tag records both the tag object, and the object it refers
> +to.

OK.  Peeled results are recorded in packed-refs file because quite
often when we use a tag object, what we actually want to access is
the commit object it points at.  We do so here for the same reason?

Not a rhetorical question, but if it invites a question from a
reader, it may deserve to be described before readers ask it.

> +Reference name encoding
> +^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Reference names are an uninterpreted sequence of bytes that must pass
> +linkgit:git-check-ref-format[1] as a valid reference name.

OK.  We want to be able to express any reference that we allow in
the current backends.

> +Key unicity
> +^^^^^^^^^^^
> +
> +Each entry must have a unique key; repeated keys are disallowed.
> +
> +Network byte order
> +^^^^^^^^^^^^^^^^^^
> +
> +All multi-byte, fixed width fields are in network byte order.
> +
> +Ordering
> +^^^^^^^^
> +
> +Blocks are lexicographically ordered by their first reference.

Key and Block are not explained until quite later, so these two
among the above three are "Huh?" to readers at this point during
their first read, but it probably cannot be helped.  Let's read on.

> +Directory/file conflicts
> +^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +The reftable format accepts both `refs/heads/foo` and
> +`refs/heads/foo/bar` as distinct references.
> +
> +This property is useful for retaining log records in reftable, but may
> +confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
> +references. Users of reftable may choose to continue to reject `foo` and
> +`foo/bar` type conflicts to prevent problems for peers.

Here "users" refer to things like "git-core", "jgit", "libgit2", etc.?

Let's say we have these two "conflicting" branches and want to
interoperate with existing versions of Git (e.g. a "git ls-remote"
client requests us to show what we have).  We could either show
"refs/heads/foo" with its object name, or "refs/heads/foo/bar" with
its object name, but not both.

"users ... may choose" implies that it is up to the implementation
of reftable user which one to show, so given a single repository,
"jgit" may show "refs/heads/foo" while "libgit2" may choose to show
the other one.

I am not sure if that is desirable---I suspect that we want to
record which one needs to be chosen so that these "D/F conflicts
disallowing" users can make consistent choices, but I dunno.

> +Block size
> +^^^^^^^^^^
> +
> +The file’s block size is arbitrarily determined by the writer, and does
> +not have to be a power of 2. The block size must be larger than the
> +longest reference name or log entry used in the repository, as
> +references cannot span blocks.
> +
> +Powers of two that are friendly to the virtual memory system or
> +filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
> +yield better compression, with a possible increased cost incurred by
> +readers during access.
> +
> +The largest block size is `16777215` bytes (15.99 MiB).

The number being "(2**24) - 1" might be as significant as "15.99
MiB" to readers.  As we recommend, and the users would find it
natural, to use powers of two, the largest block size in practice
would be 8 MiB?

> +Ref block format
> +^^^^^^^^^^^^^^^^
> +
> +A ref block is written as:
> +
> +....
> +'r'
> +uint24( block_len )
> +ref_record+
> +uint24( restart_offset )+
> +uint16( restart_count )
> +
> +padding?
> +....
> +
> +Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
> +encodes the number of bytes in the block up to, but not including the
> +optional `padding`. This is always less than or equal to the file’s
> +block size. In the first ref block, `block_len` includes 24 bytes for
> +the file header.
> +
> +The 2-byte `restart_count` stores the number of entries in the
> +`restart_offset` list, which must not be empty. Readers can use
> +`restart_count` to binary search between restarts before starting a
> +linear scan.
> +
> +Exactly `restart_count` 3-byte `restart_offset` values precedes the
> +`restart_count`. Offsets are relative to the start of the block and
> +refer to the first byte of any `ref_record` whose name has not been
> +prefix compressed. Entries in the `restart_offset` list must be sorted,
> +ascending. Readers can start linear scans from any of these records.

So the algorithm to find a record in a single block would be

 - see how big the block_len is by reading the first four bytes;

 - look at the last 16-bit word to see how many restart_offset
   entries there are;

 - bisect using restart_offset array to see where to start a linear
   scan in the ref_record array

 - linear scan the range between two adjacent offsets in
   restart_offset array.

(this is a mental exercise to make sure the information given here
is sufficient, which I think it is).

> +A variable number of `ref_record` fill the middle of the block,
> +describing reference names and values. The format is described below.
> +
> +As the first ref block shares the first file block with the file header,
> +all `restart_offset` in the first block are relative to the start of the
> +file (position 0), and include the file header. This forces the first
> +`restart_offset` to be `28`.

OK.

> +ref record
> +++++++++++
> +
> +A `ref_record` describes a single reference, storing both the name and
> +its value(s). Records are formatted as:
> +
> +....
> +varint( prefix_length )
> +varint( (suffix_length << 3) | value_type )
> +suffix
> +varint( update_index_delta )
> +value?
> +....
> +
> +The `prefix_length` field specifies how many leading bytes of the prior
> +reference record’s name should be copied to obtain this reference’s
> +name. This must be 0 for the first reference in any block, and also must
> +be 0 for any `ref_record` whose offset is listed in the `restart_offset`
> +table at the end of the block.

OK.  That's quite similar to how v4 index format shortens pathnames.
We have encode_varint() helper (in varint.c), that is used from the
index v4 code (in read-cache.c) and also untracked cache code (in
dir.c).

The OFS_DELTA codepaths, both for encoding (in packfile.c) and for
decoding (in builtin/pack-objects.c), uses the same algorithm but
open codes it without using helper functions from varint.c (could
this become leftoverbit?  I dunno).

The document clarifies that the chosen variant of varint
representation much later, but it may want to be moved up close to
where "we use network byte order" etc. are declared.

> +Recovering a reference name from any `ref_record` is a simple concat:
> +
> +....
> +this_name = prior_name[0..prefix_length] + suffix
> +....
> +
> +The `suffix_length` value provides the number of bytes available in
> +`suffix` to copy from `suffix` to complete the reference name.

It is interesting that suffix is *not* a simple NUL terminated
string.  This DOES allow a NUL byte in a refname, but because we
upfront declared that only the refnames allowed are the ones that
pass check-ref-format, that would not be an advantage the use of
varint to encode the suffix seeks.  Then what is it?  

I guess the answer is that a lot of time, the suffix length is
shorter than 16 bytes (= 128/8), so we can store 3-bit value_type
for free in such a case.  Is that worth a mention?

> +The `update_index` that last modified the reference can be obtained by
> +adding `update_index_delta` to the `min_update_index` from the file
> +header: `min_update_index + update_index_delta`.

At this point, we can infer the following from we have learned so
far by reading the document:

 - there is a quantity called "update index" attached to each ref
   record;

 - each file knows the range of update indices used in it; and

 - when files are chained together, the range of update indices do
   not overlap---all indices in one file are strictly larger or
   smaller than those in another file.

But we haven't learned at all what "update index" is and how it is
used, which makes it a frustrating read.  We probably should give a
mention of what it is here (a brief description at the same level of
detail as say "monotonically increasing counter that is used as a
transaction id when any ref update is made"), even though we will go
into much deeper details in a later section.

> +The `value` follows. Its format is determined by `value_type`, one of
> +the following:
> +
> +* `0x0`: deletion; no value data (see transactions, below)
> +* `0x1`: one 20-byte object id; value of the ref
> +* `0x2`: two 20-byte object ids; value of the ref, peeled target
> +* `0x3`: symbolic reference: `varint( target_len ) target`

We probably should write these as "one (binary) object name", and
"two (binary) object names", without hardwiring the number of bytes
needed to represent an object name.

> +Symbolic references use `0x3`, followed by the complete name of the
> +reference target. No compression is applied to the target name.

Is there a place in the file format where an incomplete name can be
stored?  If not, I think it makes it easier to read if we drop
"complete" from the sentence.

> +Types `0x4..0x7` are reserved for future use.
> +
> +Ref index
> +^^^^^^^^^
> +
> +The ref index stores the name of the last reference from every ref block
> +in the file, enabling reduced disk seeks for lookups. Any reference can
> +be found by searching the index, identifying the containing block, and
> +searching within that block.
> +
> +The index may be organized into a multi-level index, where the 1st level
> +index block points to additional ref index blocks (2nd level), which may
> +in turn point to either additional index blocks (e.g. 3rd level) or ref
> +blocks (leaf level). Disk reads required to access a ref go up with
> +higher index levels. Multi-level indexes may be required to ensure no
> +single index block exceeds the file format’s max block size of
> +`16777215` bytes (15.99 MiB). To acheive constant O(1) disk seeks for

achieve

> +lookups the index must be a single level, which is permitted to exceed
> +the file’s configured block size, but not the format’s max block size of
> +15.99 MiB.


> +Obj block format
> +^^^^^^^^^^^^^^^^
> +
> +Object blocks are optional. Writers may choose to omit object blocks,
> +especially if readers will not use the SHA-1 to ref mapping.

"the object name to ref mapping".

> +Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to

Likewise.  "unique prefix of object names no less than 2 bytes" or
somesuch to futureproof "2-20 byte SHA-1".

> +ref blocks containing references pointing to that object directly, or as
> +the peeled value of an annotated tag. Like ref blocks, object blocks use
> +the file’s standard block size. The abbrevation length is available in

abbreviation

> +the footer as `obj_id_len`.
> +
> +To save space in small files, object blocks may be omitted if the ref
> +index is not present, as brute force search will only need to read a few
> +ref blocks. When missing, readers should brute force a linear search of
> +all references to lookup by SHA-1.
> +
> +An object block is written as:
> +
> +....
> +'o'
> +uint24( block_len )
> +obj_record+
> +uint24( restart_offset )+
> +uint16( restart_count )
> +
> +padding?
> +....
> +
> +Fields are identical to ref block. Binary search using the restart table
> +works the same as in reference blocks.
> +
> +Because object identifiers are abbreviated by writers to the shortest
> +unique abbreviation within the reftable, obj key lengths are variable
> +between 2 and 20 bytes. Readers must compare only for common prefix

Futureproof "2 and 20" similarly.

> +match within an obj block or obj index.

> +Log block format
> +^^^^^^^^^^^^^^^^
> +
> +Unlike ref and obj blocks, log blocks are always unaligned.
> +
> +Log blocks are variable in size, and do not match the `block_size`
> +specified in the file header or footer. Writers should choose an
> +appropriate buffer size to prepare a log block for deflation, such as
> +`2 * block_size`.

I can guess the reason behind this design decision, but the readers
may not be able to.  Should we write it down here, or would it make
too much irrelevant details?

> +A log block is written as:
> +
> +....
> +'g'
> +uint24( block_len )
> +zlib_deflate {
> +  log_record+
> +  uint24( restart_offset )+
> +  uint16( restart_count )
> +}
> +....
> +
> +Log blocks look similar to ref blocks, except `block_type = 'g'`.
> +
> +The 4-byte block header is followed by the deflated block contents using
> +zlib deflate. The `block_len` in the header is the inflated size
> +(including 4-byte block header), and should be used by readers to
> +preallocate the inflation output buffer. A log block’s `block_len` may
> +exceed the file’s block size.
> +
> +Offsets within the log block (e.g. `restart_offset`) still include the
> +4-byte header. Readers may prefer prefixing the inflation output buffer
> +with the 4-byte header.
> +
> +Within the deflate container, a variable number of `log_record` describe
> +reference changes. The log record format is described below. See ref
> +block format (above) for a description of `restart_offset` and
> +`restart_count`.
> +
> +Because log blocks have no alignment or padding between blocks, readers
> +must keep track of the bytes consumed by the inflater to know where the
> +next log block begins.
> +
> +log record
> +++++++++++
> +
> +Log record keys are structured as:
> +
> +....
> +ref_name '\0' reverse_int64( update_index )
> +....
> +
> +where `update_index` is the unique transaction identifier. The
> +`update_index` field must be unique within the scope of a `ref_name`.
> +See the update transactions section below for further details.
> +
> +The `reverse_int64` function inverses the value so lexographical

lexicographical

> +ordering the network byte order encoding sorts the more recent records
> +with higher `update_index` values first:
> +
> +....
> +reverse_int64(int64 t) {
> +  return 0xffffffffffffffff - t;
> +}
> +....

Rationale?  It may be to ease the iteration over reflog i.e. "log
-g" that wants to go from the youngest to older---isn't it worth
mentioning?

> +Log records have a similar starting structure to ref and index records,
> +utilizing the same prefix compression scheme applied to the log record
> +key described above.
> +
> +....
> +    varint( prefix_length )
> +    varint( (suffix_length << 3) | log_type )
> +    suffix
> +    log_data {
> +      old_id
> +      new_id
> +      varint( name_length    )  name
> +      varint( email_length   )  email
> +      varint( time_seconds )
> +      sint16( tz_offset )
> +      varint( message_length )  message
> +    }?
> +....
> +
> +Log record entries use `log_type` to indicate what follows:
> +
> +* `0x0`: deletion; no log data.
> +* `0x1`: standard git reflog data using `log_data` above.
> +
> +The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
> +entry from the reflog of `refs/stash` in a transaction file (below),
> +without needing to rewrite larger files. Readers reading a stack of
> +reflogs must treat this as a deletion.
> +
> +For `log_type = 0x1`, the `log_data` section follows
> +linkgit:git-update-ref[1] logging and includes:
> +
> +* two 20-byte SHA-1s (old id, new id)

"two (binary) object names (old name, new name)" for futureproof out
of SHA-1 world.

> +* varint string of committer’s name
> +* varint string of committer’s email
> +* varint time in seconds since epoch (Jan 1, 1970)
> +* 2-byte timezone offset in minutes (signed)

We use minus eight hundred for "GMT-0800" internally, but this would
use -480, which makes more sense ;-)

> +* varint string of message
> +
> +`tz_offset` is the absolute number of minutes from GMT the committer was
> +at the time of the update. For example `GMT-0800` is encoded in reftable
> +as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
> +
> +The committer email does not contain `<` or `>`, it’s the value normally
> +found between the `<>` in a git commit object header.

Saving two precious bytes?

This is a tangent but in a repository at hosting provider, whose
primary (and often the only) source of updates are by end-user
pushing into it, if reflogs are enabled, whose name and email are
recorded in the logs?  The committer or tagger of the object that
sits at the tip of the ref after the update?  What happens when a
blob is pushed to update a ref?  Or would it be just a single "user"
that represents the "server operator"?

We know in a non-bare repository an individual contributor works on
typically records only one <name, email> in the reflog: the user who
works in it.

What I am trying to get at is if it makes more sense to have a small
table of unique <name, email> pairs used in the file and have
log_data record a single varint that is the index into that
"committer ident" table.  I would suspect that it would give us
significantly more gain than mere <> two bytes per log_data entry.

> +The `message_length` may be 0, in which case there was no message
> +supplied for the update.
> +
> +Contrary to traditional reflog (which is a file), renames are encoded as
> +a combination of ref deletion and ref creation.

Yay?  How does the deletion record look like?  The new object name
being 0*hashlength?  I didn't see it defined in the description (and
I am guessing that log_type of 0x0 is *NOT* used for that purpose).

So, NEEDSWORK: describe how "creation of a ref" and "deletion of a ref"
appears in a log as a log record entry.

> +Footer
> +^^^^^^
> +
> +After the last block of the file, a file footer is written. It begins
> +like the file header, but is extended with additional data.
> +
> +A 68-byte footer appears at the end:
> +
> +....
> +    'REFT'
> +    uint8( version_number = 1 )
> +    uint24( block_size )
> +    uint64( min_update_index )
> +    uint64( max_update_index )
> +
> +    uint64( ref_index_position )
> +    uint64( (obj_position << 5) | obj_id_len )
> +    uint64( obj_index_position )
> +
> +    uint64( log_position )
> +    uint64( log_index_position )
> +
> +    uint32( CRC-32 of above )
> +....
> +
> +If a section is missing (e.g. ref index) the corresponding position
> +field (e.g. `ref_index_position`) will be 0.
> +
> +* `obj_position`: byte position for the first obj block.
> +* `obj_id_len`: number of bytes used to abbreviate object identifiers in
> +obj blocks.

Should we write "this can be up to 31" somewhere?  It is more than
enough for SHA-1 and not quite sufficient for SHA-256 (unless we say
"we store obj_id_len-1 here")?

> +* `log_position`: byte position for the first log block.
> +* `ref_index_position`: byte position for the start of the ref index.
> +* `obj_index_position`: byte position for the start of the obj index.
> +* `log_index_position`: byte position for the start of the log index.


> +Varint encoding
> +^^^^^^^^^^^^^^^
> +
> +Varint encoding is identical to the ofs-delta encoding method used
> +within pack files.
> +
> +Decoder works such as:
> +
> +....
> +val = buf[ptr] & 0x7f
> +while (buf[ptr] & 0x80) {
> +  ptr++
> +  val = ((val + 1) << 7) | (buf[ptr] & 0x7f)
> +}
> +....

As already said, I think this should be given upfront next to where
we declare that we use network byte order.

> +Restart point selection
> ++++++++++++++++++++++++
> +
> +Writers determine the restart points at file creation. The process is
> +arbitrary, but every 16 or 64 records is recommended. Every 16 may be
> +more suitable for smaller block sizes (4k or 8k), every 64 for larger
> +block sizes (64k).
> +
> +More frequent restart points reduces prefix compression and increases
> +space consumed by the restart table, both of which increase file size.
> +
> +Less frequent restart points makes prefix compression more effective,
> +decreasing overall file size, with increased penalities for readers

penalties

> +walking through more records after the binary search step.
> +
> +A maximum of `65535` restart points per block is supported.


> +LMDB
> +^^^^
> +
> +David Turner proposed
> +https://public-inbox.org/git/1455772670-21142-26-git-send-email-dturner@twopensource.com/[using
> +LMDB], as LMDB is lightweight (64k of runtime code) and GPL-compatible
> +license.
> +
> +A downside of LMDB is its reliance on a single C implementation. This
> +makes embedding inside JGit (a popular reimplemenation of Git)

reimplementation

> +difficult, and hoisting onto virtual storage (for JGit DFS) virtually
> +impossible.
> +
> +A common format that can be supported by all major Git implementations
> +(git-core, JGit, libgit2) is strongly preferred.
> +
> +Future
> +~~~~~~
> +
> +Longer hashes
> +^^^^^^^^^^^^^
> +
> +Version will bump (e.g. 2) to indicate `value` uses a different object
> +id length other than 20. The length could be stored in an expanded file
> +header, or hardcoded as part of the version.


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 05/13] reftable: clarify how empty tables should be written
  2020-05-11 19:46                         ` [PATCH v13 05/13] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
@ 2020-05-19 22:01                           ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-19 22:01 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> The format allows for some ambiguity, as a lone footer also starts
> with a valid file header. However, the current JGit code will barf on
> this. This commit codifies this behavior into the standard.

Nice to see the documentation being careful.

>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  Documentation/technical/reftable.txt | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
> index 8bad9ade256..6223538d64e 100644
> --- a/Documentation/technical/reftable.txt
> +++ b/Documentation/technical/reftable.txt
> @@ -715,6 +715,13 @@ version)
>  
>  Once verified, the other fields of the footer can be accessed.
>  
> +Empty tables
> +++++++++++++
> +
> +A reftable may be empty. In this case, the file starts with a header
> +and is immediately followed by a footer.
> +
> +
>  Varint encoding
>  ^^^^^^^^^^^^^^^

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 06/13] reftable: define version 2 of the spec to accomodate SHA256
  2020-05-11 19:46                         ` [PATCH v13 06/13] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
@ 2020-05-19 22:32                           ` Junio C Hamano
  2020-05-20 12:38                             ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-05-19 22:32 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> Version appends a hash ID to the file header, making it slightly larger.
>
> This commit also changes "SHA-1" into "object ID" in many places.
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  Documentation/technical/reftable.txt | 79 ++++++++++++++++------------
>  1 file changed, 44 insertions(+), 35 deletions(-)
>
> diff --git a/Documentation/technical/reftable.txt b/Documentation/technical/reftable.txt
> index 6223538d64e..1464c4e7437 100644
> --- a/Documentation/technical/reftable.txt
> +++ b/Documentation/technical/reftable.txt
> @@ -29,7 +29,7 @@ Objectives
>  
>  * Near constant time lookup for any single reference, even when the
>  repository is cold and not in process or kernel cache.
> -* Near constant time verification if a SHA-1 is referred to by at least
> +* Near constant time verification if an object ID is referred to by at least
>  one reference (for allow-tip-sha1-in-want).

Good.  These are called "object names", though.

> @@ -193,8 +193,8 @@ and non-aligned files.
>  Very small files (e.g. a single ref block) may omit `padding` and the ref
>  index to reduce total file size.
>  
> -Header
> -^^^^^^
> +Header (version 1)
> +^^^^^^^^^^^^^^^^^^
>  
>  A 24-byte header appears at the beginning of the file:
>  
> @@ -215,6 +215,27 @@ used in a stack for link:#Update-transactions[transactions], these
>  fields can order the files such that the prior file’s
>  `max_update_index + 1` is the next file’s `min_update_index`.
>  
> +Header (version 2)
> +^^^^^^^^^^^^^^^^^^
> +
> +A 28-byte header appears at the beginning of the file:
> +
> +....
> +'REFT'
> +uint8( version_number = 2 )
> +uint24( block_size )
> +uint64( min_update_index )
> +uint64( max_update_index )
> +uint32( hash_id )
> +....
> +
> +The header is identical to `version_number=1`, with the 4-byte hash ID
> +("sha1" for SHA1 and "s256" for SHA-256) append to the header.
> +
> +For maximum backward compatibility, it is recommended to use version 1 when
> +writing SHA1 reftables.
> +
> +
>  First ref block
>  ^^^^^^^^^^^^^^^
>  
> @@ -302,8 +323,8 @@ The `value` follows. Its format is determined by `value_type`, one of
>  the following:
>  
>  * `0x0`: deletion; no value data (see transactions, below)
> -* `0x1`: one 20-byte object id; value of the ref
> -* `0x2`: two 20-byte object ids; value of the ref, peeled target
> +* `0x1`: one object id; value of the ref
> +* `0x2`: two object ids; value of the ref, peeled target

Ah, OK, I pointed out these future-proofing for the previous step,
but as long as the end result is written in a hash-algorithm
agnostic way, it is OK.  Again these are called "object names",
though.

>  * `0x3`: symbolic reference: `varint( target_len ) target`

> @@ -434,7 +455,7 @@ works the same as in reference blocks.
>  
>  Because object identifiers are abbreviated by writers to the shortest
>  unique abbreviation within the reftable, obj key lengths are variable
> -between 2 and 20 bytes. Readers must compare only for common prefix
> +between 2 and 32 bytes. Readers must compare only for common prefix

Is it allowed for a reftable file whose hash_id field says "sha1" to
use more than 20 bytes of obj key?  Phrasing it like "unique prefix
of object name, no shorter than 2 bytes" would avoid the problem, I
would think.

This version also adds more


	’

apostrophes, where we would prefer to place vanilla single quotes,
which may need to be corrected in the conversion toolchain.

I did not see any new typo introduced in this step.

Thanks.



^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 06/13] reftable: define version 2 of the spec to accomodate SHA256
  2020-05-19 22:32                           ` Junio C Hamano
@ 2020-05-20 12:38                             ` Han-Wen Nienhuys
  2020-05-20 14:40                               ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-20 12:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wed, May 20, 2020 at 12:32 AM Junio C Hamano <gitster@pobox.com> wrote:
> > -* Near constant time verification if a SHA-1 is referred to by at least
> > +* Near constant time verification if an object ID is referred to by at least
> >  one reference (for allow-tip-sha1-in-want).
>
> Good.  These are called "object names", though.
>

Fixed throughout.

All of the source code (libgit2, Jgit, git itself) calls this object
ID (git_oid, ObjectId, struct object_id respectively), so it's a bit
surprising.


> >  * `0x3`: symbolic reference: `varint( target_len ) target`
>
> > @@ -434,7 +455,7 @@ works the same as in reference blocks.
> >
> >  Because object identifiers are abbreviated by writers to the shortest
> >  unique abbreviation within the reftable, obj key lengths are variable
> > -between 2 and 20 bytes. Readers must compare only for common prefix
> > +between 2 and 32 bytes. Readers must compare only for common prefix
>
> Is it allowed for a reftable file whose hash_id field says "sha1" to
> use more than 20 bytes of obj key?  Phrasing it like "unique prefix
> of object name, no shorter than 2 bytes" would avoid the problem, I
> would think.

rephrased.

> This version also adds more
>
>
>         ’

curious. I have no idea where they came from. Fixed now.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 06/13] reftable: define version 2 of the spec to accomodate SHA256
  2020-05-20 12:38                             ` Han-Wen Nienhuys
@ 2020-05-20 14:40                               ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-20 14:40 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

Han-Wen Nienhuys <hanwen@google.com> writes:

>> This version also adds more
>>
>>
>>         ’
>
> curious. I have no idea where they came from. Fixed now.

The version 1 format in 04/13 has the same; it may already be in the
original document, or the conversion process via pandoc transformed
a plain single quote intelligently?


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 04/13] reftable: file format documentation
  2020-05-19 22:00                           ` Junio C Hamano
@ 2020-05-20 16:06                             ` Han-Wen Nienhuys
  2020-05-20 17:20                               ` Han-Wen Nienhuys
  2020-05-20 18:52                             ` Jonathan Nieder
  1 sibling, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-20 16:06 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder via GitGitGadget, git, Han-Wen Nienhuys, Jonathan Nieder

On Wed, May 20, 2020 at 12:00 AM Junio C Hamano <gitster@pobox.com> wrote:
> > Some repositories contain a lot of references (e.g. android at 866k,
> > rails at 31k). The reftable format provides:
> >
> > - Near constant time lookup for any single reference, even when the
> >   repository is cold and not in process or kernel cache.
> > - Near constant time verification a SHA-1 is referred to by at least
> >   one reference (for allow-tip-sha1-in-want).
>
> Not quite grammatical sentence?  Perhaps "if" after "verification?

Done.

> > - Efficient lookup of an entire namespace, such as `refs/tags/`.
> > - Support atomic push `O(size_of_update)` operations.
> > - Combine reflog storage with ref storage.
> >
> > This file format spec was originally written in July, 2017 by Shawn
> > Pearce.  Some refinements since then were made by Shawn and by Han-Wen
> > Nienhuys based on experiences implementing and experimenting with the
> > format.  (All of this was in the context of our work at Google and
> > Google is happy to contribute the result to the Git project.)
> >
> > Imported from JGit[1]'s current version (c217d33ff,
> > "Documentation/technical/reftable: improve repo layout", 2020-02-04)
> > of Documentation/technical/reftable.md and converted to asciidoc by
> > running
> >
> >   pandoc -t asciidoc -f markdown reftable.md >reftable.txt
> >
> > using pandoc 2.2.1.  The result required the following additional
> > minor changes:
> >
> > - removed the [TOC] directive to add a table of contents, since
> >   asciidoc does not support it
> > - replaced git-scm.com/docs links with linkgit: directives that link
> >   to other pages within Git's documentation
>
> There are many
>
>         ’

fixed.

> Typoes pointed out below may probably be from the original where
> they should be corrected.

fixed.
> Let's not use &nbsp; here after "e.g.".  I see a normal space after "e.g."
> a few lines below.

fixed.

> > +rails at 31k). The existing packed-refs format takes up a lot of space
> > +(e.g. 62M), and does not scale with additional references. Lookup of a
> > +single reference requires linearly scanning the file.
> > +
> > +Atomic pushes modifying multiple references require copying the entire
> > +packed-refs file, which can be a considerable amount of data moved
> > +(e.g. 62M in, 62M out) for even small transactions (2 refs modified).

(*)

> > +Repositories with many loose references occupy a large number of disk
> > +blocks from the local file system, as each reference is its own file
> > +storing 41 bytes (and another file for the corresponding reflog). This
> > +negatively affects the number of inodes available when a large number of
> > +repositories are stored on the same filesystem. Readers can be penalized
> > +due to the larger number of syscalls required to traverse and read the
> > +`$GIT_DIR/refs` directory.
>
> Another downside is that we cannot arrange atomic updates to
> multiple refs over loose refs, even though the "lookup of a single
> reference does not require linear scan" unlike packed-refs, (as long
> as the filesystem does its job).  Worth mentioning?

that's already there at (*), no?

> > +* Near constant time lookup for any single reference, even when the
> > +repository is cold and not in process or kernel cache.
> > +* Near constant time verification if a SHA-1 is referred to by at least
> > +one reference (for allow-tip-sha1-in-want).
> > +* Efficient lookup of an entire namespace, such as `refs/tags/`.
>
> Does this "lookup" refer to "do we have anything in refs/tags/
> hierarchy?" or "enumerate all refs under refs/tags/ hierarchy?"
>
> If the latter, perhaps s/lookup of/iteration over/

changed to "efficient enumeration".

> > +Peeling
> > +^^^^^^^
> > +
> > +References stored in a reftable are peeled, a record for an annotated
> > +(or signed) tag records both the tag object, and the object it refers
> > +to.
>
> OK.  Peeled results are recorded in packed-refs file because quite
> often when we use a tag object, what we actually want to access is
> the commit object it points at.  We do so here for the same reason?

I think so. Added a reference to packed-refs

> > +Directory/file conflicts
> > +^^^^^^^^^^^^^^^^^^^^^^^^
> > +
> > +The reftable format accepts both `refs/heads/foo` and
> > +`refs/heads/foo/bar` as distinct references.
> > +
> > +This property is useful for retaining log records in reftable, but may
> > +confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
> > +references. Users of reftable may choose to continue to reject `foo` and
> > +`foo/bar` type conflicts to prevent problems for peers.
>
> Here "users" refer to things like "git-core", "jgit", "libgit2", etc.?
>
> Let's say we have these two "conflicting" branches and want to
> interoperate with existing versions of Git (e.g. a "git ls-remote"
> client requests us to show what we have).  We could either show
> "refs/heads/foo" with its object name, or "refs/heads/foo/bar" with
> its object name, but not both.
>
> "users ... may choose" implies that it is up to the implementation
> of reftable user which one to show, so given a single repository,
> "jgit" may show "refs/heads/foo" while "libgit2" may choose to show
> the other one.
>
> I am not sure if that is desirable---I suspect that we want to
> record which one needs to be chosen so that these "D/F conflicts
> disallowing" users can make consistent choices, but I dunno.

I think it's beyond the scope of the document. In reftable the
library, I added an option to validate for file/dir conflicts.

(Google's deployment doesn't do the validation completely correctly.
We had a ref ending in '/' for example.)

>
> > +Block size
> > +^^^^^^^^^^
> > +
> > +The file’s block size is arbitrarily determined by the writer, and does
> > +not have to be a power of 2. The block size must be larger than the
> > +longest reference name or log entry used in the repository, as
> > +references cannot span blocks.
> > +
> > +Powers of two that are friendly to the virtual memory system or
> > +filesystem (such as 4k or 8k) are recommended. Larger sizes (64k) can
> > +yield better compression, with a possible increased cost incurred by
> > +readers during access.
> > +
> > +The largest block size is `16777215` bytes (15.99 MiB).
>
> The number being "(2**24) - 1" might be as significant as "15.99
> MiB" to readers.  As we recommend, and the users would find it
> natural, to use powers of two, the largest block size in practice
> would be 8 MiB?

Beyond 4096 bytes (size of a page on x86), I don't think there is a
need for powers of two. What makes for a good block size would also be
decided by the characteristics of the underlying storage system. At
google, we use 64k, IIRC.


> > +Ref block format
> > +^^^^^^^^^^^^^^^^
> > +
> > +A ref block is written as:
> > +
> > +....
> > +'r'
> > +uint24( block_len )
> > +ref_record+
> > +uint24( restart_offset )+
> > +uint16( restart_count )
> > +
> > +padding?
> > +....
> > +
> > +Blocks begin with `block_type = 'r'` and a 3-byte `block_len` which
> > +encodes the number of bytes in the block up to, but not including the
> > +optional `padding`. This is always less than or equal to the file’s
> > +block size. In the first ref block, `block_len` includes 24 bytes for
> > +the file header.
> > +
> > +The 2-byte `restart_count` stores the number of entries in the
> > +`restart_offset` list, which must not be empty. Readers can use
> > +`restart_count` to binary search between restarts before starting a
> > +linear scan.
> > +
> > +Exactly `restart_count` 3-byte `restart_offset` values precedes the
> > +`restart_count`. Offsets are relative to the start of the block and
> > +refer to the first byte of any `ref_record` whose name has not been
> > +prefix compressed. Entries in the `restart_offset` list must be sorted,
> > +ascending. Readers can start linear scans from any of these records.
>
> So the algorithm to find a record in a single block would be
>
>  - see how big the block_len is by reading the first four bytes;
>
>  - look at the last 16-bit word to see how many restart_offset
>    entries there are;
>
>  - bisect using restart_offset array to see where to start a linear
>    scan in the ref_record array
>
>  - linear scan the range between two adjacent offsets in
>    restart_offset array.

yes, that works. In practice, you find the last restart before the
key, and then iterate until you find a key that is >= the wanted key.
There is no real need to find the next restart.

> The document clarifies that the chosen variant of varint
> representation much later, but it may want to be moved up close to
> where "we use network byte order" etc. are declared.

moved.

> > +Recovering a reference name from any `ref_record` is a simple concat:
> > +
> > +....
> > +this_name = prior_name[0..prefix_length] + suffix
> > +....
> > +
> > +The `suffix_length` value provides the number of bytes available in
> > +`suffix` to copy from `suffix` to complete the reference name.
>
> It is interesting that suffix is *not* a simple NUL terminated
> string.  This DOES allow a NUL byte in a refname, but because we
> upfront declared that only the refnames allowed are the ones that
> pass check-ref-format, that would not be an advantage the use of
> varint to encode the suffix seeks.  Then what is it?

reflog records encode update_index in the key, which may lead to keys
with zero bytes. So you can't rely on zero termination.

> I guess the answer is that a lot of time, the suffix length is
> shorter than 16 bytes (= 128/8), so we can store 3-bit value_type
> for free in such a case.  Is that worth a mention?

Here, and in other places, I would like to avoid annotating rationales
in detail.  First of all, this was Shawn's spec; if I go and rewrite
whole sections, the authorship becomes muddled. Second, a lot of the
"why" will invite questions how to do it differently to achieve the
same objectives (see below, where you muse about '<' and '>'), and
that's a past station given that it's in JGit. Finally, all these
edits cause extra work with conflict resolution in follow-up changes
in the series. If we want to editorialize more, I'd rather do it in a
follow up series.


> > +The `update_index` that last modified the reference can be obtained by
> > +adding `update_index_delta` to the `min_update_index` from the file
> > +header: `min_update_index + update_index_delta`.
>
> At this point, we can infer the following from we have learned so
> far by reading the document:
>
>  - there is a quantity called "update index" attached to each ref
>    record;
>
>  - each file knows the range of update indices used in it; and
>
>
> But we haven't learned at all what "update index" is and how it is
> used, which makes it a frustrating read.  We probably should give a
> mention of what it is here (a brief description at the same level of
> detail as say "monotonically increasing counter that is used as a
> transaction id when any ref update is made"), even though we will go
> into much deeper details in a later section.

I think update_index is poorly named. I think it's supposed to be the
index into an ordered array of updates.

A better name would be 'logical timestamp'. It is a timestamp, because
we use it to determine which update came first, but it is 'logical'
because it can't depend on actual clocks, because of synchronization
problems.  Inside Google, I think we use microsecond Spanner
timestamps of the database transaction on the global ref database.

In a local setting, it's good enough to use a sequential number, which
is determined by incrementing the last update_index.

>  - when files are chained together, the range of update indices do
>    not overlap---all indices in one file are strictly larger or
>    smaller than those in another file.

That's what I thought initially too , but it is a dangerous
assumption. See reftable/merged.c,  merged_iter_next_entry() for a
comment.

At Google, in the datacenter deployment, update indices of different
reftable can overlap. This can happen when compaction and replication
delay interact.

> We probably should write these as "one (binary) object name", and
> "two (binary) object names", without hardwiring the number of bytes
> needed to represent an object name.

This is addressed in the follow-up change for SHA256. Let's leave this alone.

> > +Symbolic references use `0x3`, followed by the complete name of the
> > +reference target. No compression is applied to the target name.
>
> Is there a place in the file format where an incomplete name can be
> stored?  If not, I think it makes it easier to read if we drop
> "complete" from the sentence.

All of the keys prefix compressed, so most refnames are incomplete.

> > +Types `0x4..0x7` are reserved for future use.
> > +
> > +Ref index
> > +^^^^^^^^^
> > +
> > +The ref index stores the name of the last reference from every ref block
> > +in the file, enabling reduced disk seeks for lookups. Any reference can
> > +be found by searching the index, identifying the containing block, and
> > +searching within that block.
> > +
> > +The index may be organized into a multi-level index, where the 1st level
> > +index block points to additional ref index blocks (2nd level), which may
> > +in turn point to either additional index blocks (e.g. 3rd level) or ref
> > +blocks (leaf level). Disk reads required to access a ref go up with
> > +higher index levels. Multi-level indexes may be required to ensure no
> > +single index block exceeds the file format’s max block size of
> > +`16777215` bytes (15.99 MiB). To acheive constant O(1) disk seeks for
>
> achieve
>
> > +lookups the index must be a single level, which is permitted to exceed
> > +the file’s configured block size, but not the format’s max block size of
> > +15.99 MiB.
>
>
> > +Obj block format
> > +^^^^^^^^^^^^^^^^
> > +
> > +Object blocks are optional. Writers may choose to omit object blocks,
> > +especially if readers will not use the SHA-1 to ref mapping.
>
> "the object name to ref mapping".
>
> > +Object blocks use unique, abbreviated 2-20 byte SHA-1 keys, mapping to
>
> Likewise.  "unique prefix of object names no less than 2 bytes" or
> somesuch to futureproof "2-20 byte SHA-1".
>
> > +ref blocks containing references pointing to that object directly, or as
> > +the peeled value of an annotated tag. Like ref blocks, object blocks use
> > +the file’s standard block size. The abbrevation length is available in
>
> abbreviation
>
> > +the footer as `obj_id_len`.
> > +
> > +To save space in small files, object blocks may be omitted if the ref
> > +index is not present, as brute force search will only need to read a few
> > +ref blocks. When missing, readers should brute force a linear search of
> > +all references to lookup by SHA-1.
> > +
> > +An object block is written as:
> > +
> > +....
> > +'o'
> > +uint24( block_len )
> > +obj_record+
> > +uint24( restart_offset )+
> > +uint16( restart_count )
> > +
> > +padding?
> > +....
> > +
> > +Fields are identical to ref block. Binary search using the restart table
> > +works the same as in reference blocks.
> > +
> > +Because object identifiers are abbreviated by writers to the shortest
> > +unique abbreviation within the reftable, obj key lengths are variable
> > +between 2 and 20 bytes. Readers must compare only for common prefix
>
> Futureproof "2 and 20" similarly.
>
> > +match within an obj block or obj index.
>
> > +Log block format
> > +^^^^^^^^^^^^^^^^
> > +
> > +Unlike ref and obj blocks, log blocks are always unaligned.
> > +
> > +Log blocks are variable in size, and do not match the `block_size`
> > +specified in the file header or footer. Writers should choose an
> > +appropriate buffer size to prepare a log block for deflation, such as
> > +`2 * block_size`.
>
> I can guess the reason behind this design decision, but the readers
> may not be able to.  Should we write it down here, or would it make
> too much irrelevant details?

I honestly don't know about the reason. reflogs will compress well
(because there is lots of ascii text), but it makes the format
needlessly complex IMO. But it's there, so we can't do much about it
now.

> > +A log block is written as:
> > +
> > +....
> > +'g'
> > +uint24( block_len )
> > +zlib_deflate {
> > +  log_record+
> > +  uint24( restart_offset )+
> > +  uint16( restart_count )
> > +}
> > +....
> > +
> > +Log blocks look similar to ref blocks, except `block_type = 'g'`.
> > +
> > +The 4-byte block header is followed by the deflated block contents using
> > +zlib deflate. The `block_len` in the header is the inflated size
> > +(including 4-byte block header), and should be used by readers to
> > +preallocate the inflation output buffer. A log block’s `block_len` may
> > +exceed the file’s block size.
> > +
> > +Offsets within the log block (e.g. `restart_offset`) still include the
> > +4-byte header. Readers may prefer prefixing the inflation output buffer
> > +with the 4-byte header.
> > +
> > +Within the deflate container, a variable number of `log_record` describe
> > +reference changes. The log record format is described below. See ref
> > +block format (above) for a description of `restart_offset` and
> > +`restart_count`.
> > +
> > +Because log blocks have no alignment or padding between blocks, readers
> > +must keep track of the bytes consumed by the inflater to know where the
> > +next log block begins.
> > +
> > +log record
> > +++++++++++
> > +
> > +Log record keys are structured as:
> > +
> > +....
> > +ref_name '\0' reverse_int64( update_index )
> > +....
> > +
> > +where `update_index` is the unique transaction identifier. The
> > +`update_index` field must be unique within the scope of a `ref_name`.
> > +See the update transactions section below for further details.
> > +
> > +The `reverse_int64` function inverses the value so lexographical
>
> lexicographical
>
> > +ordering the network byte order encoding sorts the more recent records
> > +with higher `update_index` values first:
> > +
> > +....
> > +reverse_int64(int64 t) {
> > +  return 0xffffffffffffffff - t;
> > +}
> > +....
>
> Rationale?  It may be to ease the iteration over reflog i.e. "log
> -g" that wants to go from the youngest to older---isn't it worth
> mentioning?
>
> > +Log records have a similar starting structure to ref and index records,
> > +utilizing the same prefix compression scheme applied to the log record
> > +key described above.
> > +
> > +....
> > +    varint( prefix_length )
> > +    varint( (suffix_length << 3) | log_type )
> > +    suffix
> > +    log_data {
> > +      old_id
> > +      new_id
> > +      varint( name_length    )  name
> > +      varint( email_length   )  email
> > +      varint( time_seconds )
> > +      sint16( tz_offset )
> > +      varint( message_length )  message
> > +    }?
> > +....
> > +
> > +Log record entries use `log_type` to indicate what follows:
> > +
> > +* `0x0`: deletion; no log data.
> > +* `0x1`: standard git reflog data using `log_data` above.
> > +
> > +The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
> > +entry from the reflog of `refs/stash` in a transaction file (below),
> > +without needing to rewrite larger files. Readers reading a stack of
> > +reflogs must treat this as a deletion.
> > +
> > +For `log_type = 0x1`, the `log_data` section follows
> > +linkgit:git-update-ref[1] logging and includes:
> > +
> > +* two 20-byte SHA-1s (old id, new id)
>
> "two (binary) object names (old name, new name)" for futureproof out
> of SHA-1 world.
>
> > +* varint string of committer’s name
> > +* varint string of committer’s email
> > +* varint time in seconds since epoch (Jan 1, 1970)
> > +* 2-byte timezone offset in minutes (signed)
>
> We use minus eight hundred for "GMT-0800" internally, but this would
> use -480, which makes more sense ;-)
>
> > +* varint string of message
> > +
> > +`tz_offset` is the absolute number of minutes from GMT the committer was
> > +at the time of the update. For example `GMT-0800` is encoded in reftable
> > +as `sint16(-480)` and `GMT+0230` is `sint16(150)`.
> > +
> > +The committer email does not contain `<` or `>`, it’s the value normally
> > +found between the `<>` in a git commit object header.
>
> Saving two precious bytes?

If you store reflogs in ascii files, both for convention, and  to
separate the name from the email. Internally, it's much more
convenient to not have the '<' '>'. For any practical application of
the email address (Say, you want to send out an email to the user),
you'll need to strip off the < >

> This is a tangent but in a repository at hosting provider, whose
> primary (and often the only) source of updates are by end-user
> pushing into it, if reflogs are enabled, whose name and email are
> recorded in the logs?  The committer or tagger of the object that
> sits at the tip of the ref after the update?  What happens when a
> blob is pushed to update a ref?  Or would it be just a single "user"
> that represents the "server operator"?

That's up to the caller to decide. I think JGit uses the committer
identity, but I'd have to check.

> We know in a non-bare repository an individual contributor works on
> typically records only one <name, email> in the reflog: the user who
> works in it.
>
> What I am trying to get at is if it makes more sense to have a small
> table of unique <name, email> pairs used in the file and have
> log_data record a single varint that is the index into that
> "committer ident" table.  I would suspect that it would give us
> significantly more gain than mere <> two bytes per log_data entry.

gain in what dimension? Space-wise, the zlib compression will remove
all slack anyway.

> > +The `message_length` may be 0, in which case there was no message
> > +supplied for the update.
> > +
> > +Contrary to traditional reflog (which is a file), renames are encoded as
> > +a combination of ref deletion and ref creation.
>
> Yay?  How does the deletion record look like?  The new object name
> being 0*hashlength?  I didn't see it defined in the description (and
> I am guessing that log_type of 0x0 is *NOT* used for that purpose).

quoting the spec:

"Log record entries use `log_type` to indicate what follows:

* `0x0`: deletion; no log data."

> So, NEEDSWORK: describe how "creation of a ref" and "deletion of a ref"
> appears in a log as a log record entry.

the creation would be the appearance of a reflog record for the ref.
You'd have to search back in time to decide if a reflog record it was
an update to an existing record or a creation.

> > +....
>
> As already said, I think this should be given upfront next to where
> we declare that we use network byte order.

I moved this

--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 04/13] reftable: file format documentation
  2020-05-20 16:06                             ` Han-Wen Nienhuys
@ 2020-05-20 17:20                               ` Han-Wen Nienhuys
  2020-05-20 17:25                                 ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-20 17:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder via GitGitGadget, git, Han-Wen Nienhuys, Jonathan Nieder

On Wed, May 20, 2020 at 6:06 PM Han-Wen Nienhuys <hanwen@google.com> wrote:
> > > +The `message_length` may be 0, in which case there was no message
> > > +supplied for the update.
> > > +
> > > +Contrary to traditional reflog (which is a file), renames are encoded as
> > > +a combination of ref deletion and ref creation.
> >
> > Yay?  How does the deletion record look like?  The new object name
> > being 0*hashlength?  I didn't see it defined in the description (and
> > I am guessing that log_type of 0x0 is *NOT* used for that purpose).
>
> quoting the spec:
>
> "Log record entries use `log_type` to indicate what follows:
>
> * `0x0`: deletion; no log data."
>
> > So, NEEDSWORK: describe how "creation of a ref" and "deletion of a ref"
> > appears in a log as a log record entry.
>
> the creation would be the appearance of a reflog record for the ref.
> You'd have to search back in time to decide if a reflog record it was
> an update to an existing record or a creation.

Correction. This is one of the things that confused me earlier: reflog
entries for creating and deleting branches look like this

    000000000000 -> xxxxx      (create)
    xxxxxx.. xx        -> 00000 .. (delete)

respectively. When the rename happens, we can signal that the deletion
and addition are linked, because they occur at the same update_index.

The deletion records for logs (type=0x0) remove a reflog entry at a
specific (earlier used) update_index. So you could have the following
situation:

0x0001.ref : reflog "refs/heads/master" @ update_index=0x0001,
new=xxx, old=yyy ...

and then a subsequent table could specify

0x0020.ref : reflog "refs/heads/master" @ update_index=0x0001 (type=0x0)

which would hide the earlier reflog entry.

Jonathan Nieder said that this is used for git-stash, but I have never
understood why this is necessary, and would love to clarify this
better.

I'll clarify the explanation to reflect this.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 04/13] reftable: file format documentation
  2020-05-20 17:20                               ` Han-Wen Nienhuys
@ 2020-05-20 17:25                                 ` Han-Wen Nienhuys
  2020-05-20 17:33                                   ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-05-20 17:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jonathan Nieder via GitGitGadget, git, Han-Wen Nienhuys, Jonathan Nieder

On Wed, May 20, 2020 at 7:20 PM Han-Wen Nienhuys <hanwen@google.com> wrote:
> Jonathan Nieder said that this is used for git-stash, but I have never
> understood why this is necessary, and would love to clarify this
> better.

The doc says this:

"The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
entry from the reflog of `refs/stash` in a transaction file (below),
without needing to rewrite larger files. Readers reading a stack of
reflogs must treat this as a deletion."

I should probably look at the code for git-stash to see how this plays out.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 04/13] reftable: file format documentation
  2020-05-20 17:25                                 ` Han-Wen Nienhuys
@ 2020-05-20 17:33                                   ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-20 17:33 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Jonathan Nieder via GitGitGadget, git, Han-Wen Nienhuys, Jonathan Nieder

Han-Wen Nienhuys <hanwen@google.com> writes:

> On Wed, May 20, 2020 at 7:20 PM Han-Wen Nienhuys <hanwen@google.com> wrote:
>> Jonathan Nieder said that this is used for git-stash, but I have never
>> understood why this is necessary, and would love to clarify this
>> better.
>
> The doc says this:
>
> "The `log_type = 0x0` is mostly useful for `git stash drop`, removing an
> entry from the reflog of `refs/stash` in a transaction file (below),
> without needing to rewrite larger files. Readers reading a stack of
> reflogs must treat this as a deletion."

Yup, I saw that and that is where my "I am guessing that log_type of
0x0 is *NOT* used for that purpose)." in my review comment came from.

The "git stash" itself is an abuse of the reflog mechanism, but its
"drop" subcommand is probably the worst offender.  It wants to remove
any arbitrary reflog entry in the middle of a reflog, not just topmost
ones (pop) or bottommost ones (expire).

> I should probably look at the code for git-stash to see how this plays out.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v13 04/13] reftable: file format documentation
  2020-05-19 22:00                           ` Junio C Hamano
  2020-05-20 16:06                             ` Han-Wen Nienhuys
@ 2020-05-20 18:52                             ` Jonathan Nieder
  1 sibling, 0 replies; 409+ messages in thread
From: Jonathan Nieder @ 2020-05-20 18:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jonathan Nieder via GitGitGadget, git, Han-Wen Nienhuys

Junio C Hamano wrote:
> Han-wen wrote:

>> From: Jonathan Nieder <jrnieder@gmail.com>
>>
>> Shawn Pearce explains:
>>
>> Some repositories contain a lot of references (e.g. android at 866k,
>> rails at 31k). The reftable format provides:
>>
>> - Near constant time lookup for any single reference, even when the
>>   repository is cold and not in process or kernel cache.
>> - Near constant time verification a SHA-1 is referred to by at least
>>   one reference (for allow-tip-sha1-in-want).
>
> Not quite grammatical sentence?  Perhaps "if" after "verification?

Good catch, thanks.

[...]
>> using pandoc 2.2.1.  The result required the following additional
>> minor changes:
>>
>> - removed the [TOC] directive to add a table of contents, since
>>   asciidoc does not support it
>> - replaced git-scm.com/docs links with linkgit: directives that link
>>   to other pages within Git's documentation
>
> There are many
>
> 	’
>
> funny-quotes where we would prefer to place vanilla single quotes,
> which may also need to be corrected in the conversion toolchain.

Looks like Han-Wen is taking care of this (thanks!).

> Typoes pointed out below may probably be from the original where
> they should be corrected.

I'm happy to do one final update the doc in JGit to match what we end
up with and then replace it with a pointer to Git's copy once that
lands.

[...]
>> +Repositories with many loose references occupy a large number of disk
>> +blocks from the local file system, as each reference is its own file
>> +storing 41 bytes (and another file for the corresponding reflog). This
>> +negatively affects the number of inodes available when a large number of
>> +repositories are stored on the same filesystem. Readers can be penalized
>> +due to the larger number of syscalls required to traverse and read the
>> +`$GIT_DIR/refs` directory.
>
> Another downside is that we cannot arrange atomic updates to
> multiple refs over loose refs, even though the "lookup of a single
> reference does not require linear scan" unlike packed-refs, (as long
> as the filesystem does its job).  Worth mentioning?

Yes, this was another major part of the motivation (avoiding the
complication of the "atomic" multi-ref updates to packed-refs that Git
and JGit had to learn).

[...]
>> +References stored in a reftable are peeled, a record for an annotated
>> +(or signed) tag records both the tag object, and the object it refers
>> +to.
>
> OK.  Peeled results are recorded in packed-refs file because quite
> often when we use a tag object, what we actually want to access is
> the commit object it points at.  We do so here for the same reason?
>
> Not a rhetorical question, but if it invites a question from a
> reader, it may deserve to be described before readers ask it.

For a single tag ref, peeling to a commit is not very expensive.  But
for batch lookups e.g. when serving a response to an ls-remote
request, it adds up, and having the peeled results recorded helps.

[...]
>> +Directory/file conflicts
>> +^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +The reftable format accepts both `refs/heads/foo` and
>> +`refs/heads/foo/bar` as distinct references.
>> +
>> +This property is useful for retaining log records in reftable, but may
>> +confuse versions of Git using `$GIT_DIR/refs` directory tree to maintain
>> +references. Users of reftable may choose to continue to reject `foo` and
>> +`foo/bar` type conflicts to prevent problems for peers.
[...]
> "users ... may choose" implies that it is up to the implementation
> of reftable user which one to show, so given a single repository,
> "jgit" may show "refs/heads/foo" while "libgit2" may choose to show
> the other one.
>
> I am not sure if that is desirable---I suspect that we want to
> record which one needs to be chosen so that these "D/F conflicts
> disallowing" users can make consistent choices, but I dunno.

Yes, I think it would be better to explicitly say that Git will continue
to reject D/F conflicts for refs (*not* reflogs) even though the format
can support them in principle.

If we choose to permit them some day in the future, I believe that would
be a separate repository format extension and protocol capability to
avoid confusing old versions of Git.

[...]
>> +Symbolic references use `0x3`, followed by the complete name of the
>> +reference target. No compression is applied to the target name.
>
> Is there a place in the file format where an incomplete name can be
> stored?  If not, I think it makes it easier to read if we drop
> "complete" from the sentence.

The sentence about "no compression" covers the lack of prefix encoding,
so I suppose I agree.

Might make sense to say "full name" to convey that we're talking about
rev-parse --symbolic-full-name, not a relative path like symlinks
support.

[...]
>> +Log block format
>> +^^^^^^^^^^^^^^^^
>> +
>> +Unlike ref and obj blocks, log blocks are always unaligned.
>> +
>> +Log blocks are variable in size, and do not match the `block_size`
>> +specified in the file header or footer. Writers should choose an
>> +appropriate buffer size to prepare a log block for deflation, such as
>> +`2 * block_size`.
>
> I can guess the reason behind this design decision, but the readers
> may not be able to.  Should we write it down here, or would it make
> too much irrelevant details?

I don't have a strong opinion.  It sounds like Han-Wen sees something to
explain there, so I suppose it would be nice to spell out.

(My take: reflog lookups are not on the critical path for most
operations; especially, random accesses do not need to be fast.  From a
performance perspective, the best we can do is to compress them well to
decrease I/O cost, hence there's not much value to alignment.)

[...]
> This is a tangent but in a repository at hosting provider, whose
> primary (and often the only) source of updates are by end-user
> pushing into it, if reflogs are enabled, whose name and email are
> recorded in the logs?  The committer or tagger of the object that
> sits at the tip of the ref after the update?  What happens when a
> blob is pushed to update a ref?  Or would it be just a single "user"
> that represents the "server operator"?

The latter, "server operator" (GIT_COMMITTER_IDENT at the server).

Committer in commit objects is forgeable, hence wouldn't be very
useful here.

> We know in a non-bare repository an individual contributor works on
> typically records only one <name, email> in the reflog: the user who
> works in it.
>
> What I am trying to get at is if it makes more sense to have a small
> table of unique <name, email> pairs used in the file and have
> log_data record a single varint that is the index into that
> "committer ident" table.  I would suspect that it would give us
> significantly more gain than mere <> two bytes per log_data entry.

That's true, and a good idea for the next rev of the format.

[...]
>> +A 68-byte footer appears at the end:
>> +
>> +....
>> +    'REFT'
>> +    uint8( version_number = 1 )
>> +    uint24( block_size )
>> +    uint64( min_update_index )
>> +    uint64( max_update_index )
>> +
>> +    uint64( ref_index_position )
>> +    uint64( (obj_position << 5) | obj_id_len )
[...]
>> +* `obj_id_len`: number of bytes used to abbreviate object identifiers in
>> +obj blocks.
>
> Should we write "this can be up to 31" somewhere?  It is more than
> enough for SHA-1 and not quite sufficient for SHA-256 (unless we say
> "we store obj_id_len-1 here")?

Oh!  I'll take a closer look and then follow up.

Thanks for looking it over,
Jonathan

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v15 00/13] Reftable support git-core
  2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
                                             ` (8 preceding siblings ...)
  2020-05-18 20:31                           ` [PATCH v14 9/9] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                           ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 01/13] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
                                               ` (15 more replies)
  9 siblings, 16 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend. Based on
hn/refs-cleanup.

Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1

Summary 20479 tests pass 1299 tests fail

Some issues:

 * many tests inspect .git/{logs,heads}/ directly.
 * worktrees broken. 
 * Rebase/cherry-pick still largely broken

v17

 * many style tweaks in reftable/
 * fix rebase
 * GIT_DEBUG_REFS support.

v18

 * fix pseudo ref usage

Han-Wen Nienhuys (12):
  Write pseudorefs through ref backends.
  Make refs_ref_exists public
  Treat BISECT_HEAD as a pseudo ref
  Treat CHERRY_PICK_HEAD as a pseudo ref
  Treat REVERT_HEAD as a pseudo ref
  Move REF_LOG_ONLY to refs-internal.h
  Iterate over the "refs/" namespace in for_each_[raw]ref
  Add .gitattributes for the reftable/ directory
  Add reftable library
  Reftable support for git-core
  Add GIT_DEBUG_REFS debugging mechanism
  Add reftable testing infrastructure

Johannes Schindelin (1):
  vcxproj: adjust for the reftable changes

 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   28 +-
 builtin/bisect--helper.c                      |    3 +-
 builtin/clone.c                               |    3 +-
 builtin/commit.c                              |   34 +-
 builtin/init-db.c                             |   56 +-
 builtin/merge.c                               |    2 +-
 cache.h                                       |    6 +-
 config.mak.uname                              |    2 +-
 contrib/buildsystems/Generators/Vcxproj.pm    |   11 +-
 git-bisect.sh                                 |    4 +-
 path.c                                        |    2 -
 path.h                                        |    9 +-
 refs.c                                        |  157 +-
 refs.h                                        |   16 +
 refs/debug.c                                  |  309 ++++
 refs/files-backend.c                          |  121 +-
 refs/packed-backend.c                         |   21 +-
 refs/refs-internal.h                          |   26 +
 refs/reftable-backend.c                       | 1332 +++++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   11 +
 reftable/VERSION                              |    8 +
 reftable/basics.c                             |  215 +++
 reftable/basics.h                             |   53 +
 reftable/block.c                              |  433 ++++++
 reftable/block.h                              |  129 ++
 reftable/constants.h                          |   21 +
 reftable/file.c                               |   95 ++
 reftable/iter.c                               |  243 +++
 reftable/iter.h                               |   72 +
 reftable/merged.c                             |  320 ++++
 reftable/merged.h                             |   39 +
 reftable/pq.c                                 |  113 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  742 +++++++++
 reftable/reader.h                             |   65 +
 reftable/record.c                             | 1113 ++++++++++++++
 reftable/record.h                             |  128 ++
 reftable/refname.c                            |  209 +++
 reftable/refname.h                            |   38 +
 reftable/reftable.c                           |   90 ++
 reftable/reftable.h                           |  564 +++++++
 reftable/slice.c                              |  247 +++
 reftable/slice.h                              |   87 ++
 reftable/stack.c                              | 1207 +++++++++++++++
 reftable/stack.h                              |   48 +
 reftable/system.h                             |   54 +
 reftable/tree.c                               |   63 +
 reftable/tree.h                               |   34 +
 reftable/update.sh                            |   24 +
 reftable/writer.c                             |  644 ++++++++
 reftable/writer.h                             |   60 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 sequencer.c                                   |   56 +-
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |  142 ++
 t/t0033-debug-refs.sh                         |   18 +
 t/t1409-avoid-packing-refs.sh                 |    6 +
 t/t1450-fsck.sh                               |    6 +
 t/t3210-pack-refs.sh                          |    6 +
 t/t9903-bash-prompt.sh                        |    6 +
 t/test-lib.sh                                 |    5 +
 wt-status.c                                   |    6 +-
 67 files changed, 9550 insertions(+), 194 deletions(-)
 create mode 100644 refs/debug.c
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100755 t/t0031-reftable.sh
 create mode 100755 t/t0033-debug-refs.sh


base-commit: aada2199e11bff14ee91d5d590c2e3d3eba5e148
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v15
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v15
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v14:

  1:  46d04f6740e !  1:  9bb64f748dc Write pseudorefs through ref backends.
     @@ Commit message
          CHERRY_PICK_HEAD, etc.
      
          These refs have always been read through the ref backends, but they were written
     -    in a one-off routine that wrote a object ID or symref directly wrote into
     +    in a one-off routine that wrote an object ID or symref directly into
          .git/<pseudo_ref_name>.
      
          This causes problems when introducing a new ref storage backend. To remedy this,
     @@ Commit message
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
       ## refs.c ##
     +@@ refs.c: int ref_exists(const char *refname)
     + 	return refs_ref_exists(get_main_ref_store(the_repository), refname);
     + }
     + 
     ++int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
     ++{
     ++	return refs_delete_pseudoref(get_main_ref_store(the_repository),
     ++				     pseudoref, old_oid);
     ++}
     ++
     + static int filter_refs(const char *refname, const struct object_id *oid,
     + 			   int flags, void *data)
     + {
      @@ refs.c: long get_files_ref_lock_timeout_ms(void)
       	return timeout_ms;
       }
     @@ refs.c: int refs_delete_ref(struct ref_store *refs, const char *msg,
       	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
       		assert(refs == get_main_ref_store(the_repository));
      -		return delete_pseudoref(refname, old_oid);
     -+		return ref_store_delete_pseudoref(refs, refname, old_oid);
     ++		return refs_delete_pseudoref(refs, refname, old_oid);
       	}
       
       	transaction = ref_store_transaction_begin(refs, &err);
     @@ refs.c: int refs_update_ref(struct ref_store *refs, const char *msg,
       	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
       		assert(refs == get_main_ref_store(the_repository));
      -		ret = write_pseudoref(refname, new_oid, old_oid, &err);
     -+		ret = ref_store_write_pseudoref(refs, refname, new_oid, old_oid,
     -+						&err);
     ++		ret = refs_write_pseudoref(refs, refname, new_oid, old_oid,
     ++					   &err);
       	} else {
       		t = ref_store_transaction_begin(refs, &err);
       		if (!t ||
     @@ refs.c: int head_ref(each_ref_fn fn, void *cb_data)
       	return refs_head_ref(get_main_ref_store(the_repository), fn, cb_data);
       }
       
     -+int ref_store_write_pseudoref(struct ref_store *refs, const char *pseudoref,
     -+			      const struct object_id *oid,
     -+			      const struct object_id *old_oid,
     -+			      struct strbuf *err)
     ++int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
     ++			 const struct object_id *oid,
     ++			 const struct object_id *old_oid, struct strbuf *err)
      +{
      +	return refs->be->write_pseudoref(refs, pseudoref, oid, old_oid, err);
      +}
      +
     -+int ref_store_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
     -+			       const struct object_id *old_oid)
     ++int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
     ++			  const struct object_id *old_oid)
      +{
      +	return refs->be->delete_pseudoref(refs, pseudoref, old_oid);
      +}
     @@ refs.h: int update_ref(const char *msg, const char *refname,
      +   and deletion as they cannot take part in normal transactional updates.
      +   Pseudorefs should only be written for the main repository.
      +*/
     -+int ref_store_write_pseudoref(struct ref_store *refs, const char *pseudoref,
     -+			      const struct object_id *oid,
     -+			      const struct object_id *old_oid,
     -+			      struct strbuf *err);
     -+int ref_store_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
     -+			       const struct object_id *old_oid);
     ++int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
     ++			 const struct object_id *oid,
     ++			 const struct object_id *old_oid, struct strbuf *err);
     ++int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
     ++			  const struct object_id *old_oid);
     ++int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid);
      +
       int parse_hide_refs_config(const char *var, const char *value, const char *);
       
  -:  ----------- >  2:  d489806907b Make refs_ref_exists public
  -:  ----------- >  3:  60a82678a1b Treat BISECT_HEAD as a pseudo ref
  -:  ----------- >  4:  1a925f5671a Treat CHERRY_PICK_HEAD as a pseudo ref
  -:  ----------- >  5:  8f2ebf65b90 Treat REVERT_HEAD as a pseudo ref
  2:  c650f7e4345 =  6:  e0e1e3a558d Move REF_LOG_ONLY to refs-internal.h
  3:  0c953fce52a =  7:  27a980d1d9d Iterate over the "refs/" namespace in for_each_[raw]ref
  4:  206b7d329f8 =  8:  dcbd000e7f7 Add .gitattributes for the reftable/ directory
  5:  9a8e504a1d0 !  9:  718b646a54e Add reftable library
     @@ reftable/README.md (new)
      
       ## reftable/VERSION (new) ##
      @@
     -+commit bad78b6de70700933a2f93ebadaa37a5e852d9ea
     ++commit fe15c65dd95b6a13de92b4baf711e50338ef20ae
      +Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Mon May 18 20:12:57 2020 +0200
     ++Date:   Thu May 28 19:28:24 2020 +0200
      +
     -+    C: get rid of inner blocks for local scoping
     ++    C: add 'canary' to struct slice
     ++    
     ++    This enforces that all slices are initialized explicitly with
     ++    SLICE_INIT, for future compatibility with git's strbuf.
      
       ## reftable/basics.c (new) ##
      @@
     @@ reftable/block.c (new)
      +	bw->entries = 0;
      +	bw->restart_len = 0;
      +	bw->last_key.len = 0;
     ++	bw->last_key.canary = SLICE_CANARY;
      +}
      +
      +byte block_writer_type(struct block_writer *bw)
     @@ reftable/block.c (new)
      +   success */
      +int block_writer_add(struct block_writer *w, struct reftable_record *rec)
      +{
     -+	struct slice empty = { 0 };
     ++	struct slice empty = SLICE_INIT;
      +	struct slice last = w->entries % w->restart_interval == 0 ? empty :
      +								    w->last_key;
      +	struct slice out = {
      +		.buf = w->buf + w->next,
      +		.len = w->block_size - w->next,
     ++		.canary = SLICE_CANARY,
      +	};
      +
      +	struct slice start = out;
      +
      +	bool restart = false;
     -+	struct slice key = { 0 };
     ++	struct slice key = SLICE_INIT;
      +	int n = 0;
      +
      +	reftable_record_key(rec, &key);
      +	n = reftable_encode_key(&restart, out, last, key,
      +				reftable_record_val_type(rec));
     -+	if (n < 0) {
     -+		goto err;
     -+	}
     ++	if (n < 0)
     ++		goto done;
      +	slice_consume(&out, n);
      +
      +	n = reftable_record_encode(rec, out, w->hash_size);
     -+	if (n < 0) {
     -+		goto err;
     -+	}
     ++	if (n < 0)
     ++		goto done;
      +	slice_consume(&out, n);
      +
      +	if (block_writer_register_restart(w, start.len - out.len, restart,
     -+					  key) < 0) {
     -+		goto err;
     -+	}
     ++					  key) < 0)
     ++		goto done;
      +
      +	slice_release(&key);
      +	return 0;
      +
     -+err:
     ++done:
      +	slice_release(&key);
      +	return -1;
      +}
     @@ reftable/block.c (new)
      +	if (restart) {
      +		rlen++;
      +	}
     -+	if (2 + 3 * rlen + n > w->block_size - w->next) {
     ++	if (2 + 3 * rlen + n > w->block_size - w->next)
      +		return -1;
     -+	}
      +	if (restart) {
      +		if (w->restart_len == w->restart_cap) {
      +			w->restart_cap = w->restart_cap * 2 + 1;
     @@ reftable/block.c (new)
      +	}
      +
      +	w->next += n;
     ++
      +	slice_copy(&w->last_key, key);
      +	w->entries++;
      +	return 0;
     @@ reftable/block.c (new)
      +
      +	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
      +		int block_header_skip = 4 + w->header_off;
     -+		struct slice compressed = { 0 };
     ++		struct slice compressed = SLICE_INIT;
      +		int zresult = 0;
      +		uLongf src_len = w->next - block_header_skip;
      +		slice_resize(&compressed, src_len);
     @@ reftable/block.c (new)
      +	byte typ = block->data[header_off];
      +	uint32_t sz = get_be24(block->data + header_off + 1);
      +
     -+	if (!reftable_is_block_type(typ)) {
     ++	uint16_t restart_count = 0;
     ++	uint32_t restart_start = 0;
     ++	byte *restart_bytes = NULL;
     ++
     ++	if (!reftable_is_block_type(typ))
      +		return REFTABLE_FORMAT_ERROR;
     -+	}
      +
      +	if (typ == BLOCK_TYPE_LOG) {
     -+		struct slice uncompressed = { 0 };
     ++		struct slice uncompressed = SLICE_INIT;
      +		int block_header_skip = 4 + header_off;
      +		uLongf dst_len = sz - block_header_skip; /* total size of dest
      +							    buffer. */
     @@ reftable/block.c (new)
      +			return REFTABLE_ZLIB_ERROR;
      +		}
      +
     -+		if (dst_len + block_header_skip != sz) {
     ++		if (dst_len + block_header_skip != sz)
      +			return REFTABLE_FORMAT_ERROR;
     -+		}
      +
      +		/* We're done with the input data. */
      +		reftable_block_done(block);
     @@ reftable/block.c (new)
      +		full_block_size = sz;
      +	}
      +
     -+	{
     -+		uint16_t restart_count = get_be16(block->data + sz - 2);
     -+		uint32_t restart_start = sz - 2 - 3 * restart_count;
     ++	restart_count = get_be16(block->data + sz - 2);
     ++	restart_start = sz - 2 - 3 * restart_count;
     ++	restart_bytes = block->data + restart_start;
      +
     -+		byte *restart_bytes = block->data + restart_start;
     ++	/* transfer ownership. */
     ++	br->block = *block;
     ++	block->data = NULL;
     ++	block->len = 0;
      +
     -+		/* transfer ownership. */
     -+		br->block = *block;
     -+		block->data = NULL;
     -+		block->len = 0;
     -+
     -+		br->hash_size = hash_size;
     -+		br->block_len = restart_start;
     -+		br->full_block_size = full_block_size;
     -+		br->header_off = header_off;
     -+		br->restart_count = restart_count;
     -+		br->restart_bytes = restart_bytes;
     -+	}
     ++	br->hash_size = hash_size;
     ++	br->block_len = restart_start;
     ++	br->full_block_size = full_block_size;
     ++	br->header_off = header_off;
     ++	br->restart_count = restart_count;
     ++	br->restart_bytes = restart_bytes;
      +
      +	return 0;
      +}
     @@ reftable/block.c (new)
      +	struct slice in = {
      +		.buf = a->r->block.data + off,
      +		.len = a->r->block_len - off,
     ++		.canary = SLICE_CANARY,
      +	};
      +
      +	/* the restart key is verbatim in the block, so this could avoid the
      +	   alloc for decoding the key */
     -+	struct slice rkey = { 0 };
     -+	struct slice last_key = { 0 };
     ++	struct slice rkey = SLICE_INIT;
     ++	struct slice last_key = SLICE_INIT;
      +	byte unused_extra;
      +	int n = reftable_decode_key(&rkey, &unused_extra, last_key, in);
     ++	int result;
      +	if (n < 0) {
      +		a->error = 1;
      +		return -1;
      +	}
      +
     -+	{
     -+		int result = slice_cmp(a->key, rkey);
     -+		slice_release(&rkey);
     -+		return result;
     -+	}
     ++	result = slice_cmp(a->key, rkey);
     ++	slice_release(&rkey);
     ++	return result;
      +}
      +
      +void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
     @@ reftable/block.c (new)
      +	struct slice in = {
      +		.buf = it->br->block.data + it->next_off,
      +		.len = it->br->block_len - it->next_off,
     ++		.canary = SLICE_CANARY,
      +	};
      +	struct slice start = in;
     -+	struct slice key = { 0 };
     ++	struct slice key = SLICE_INIT;
      +	byte extra = 0;
      +	int n = 0;
      +
     -+	if (it->next_off >= it->br->block_len) {
     ++	if (it->next_off >= it->br->block_len)
      +		return 1;
     -+	}
      +
      +	n = reftable_decode_key(&key, &extra, it->last_key, in);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return -1;
     -+	}
      +
      +	slice_consume(&in, n);
      +	n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return -1;
     -+	}
      +	slice_consume(&in, n);
      +
      +	slice_copy(&it->last_key, key);
     @@ reftable/block.c (new)
      +
      +int block_reader_first_key(struct block_reader *br, struct slice *key)
      +{
     -+	struct slice empty = { 0 };
     ++	struct slice empty = SLICE_INIT;
      +	int off = br->header_off + 4;
      +	struct slice in = {
      +		.buf = br->block.data + off,
      +		.len = br->block_len - off,
     ++		.canary = SLICE_CANARY,
      +	};
      +
      +	byte extra = 0;
      +	int n = reftable_decode_key(key, &extra, empty, in);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return n;
     -+	}
     ++
      +	return 0;
      +}
      +
     @@ reftable/block.c (new)
      +		.r = br,
      +	};
      +	struct reftable_record rec = reftable_new_record(block_reader_type(br));
     -+	struct slice key = { 0 };
     ++	struct slice key = SLICE_INIT;
      +	int err = 0;
     -+	struct block_iter next = { 0 };
     ++	struct block_iter next = {
     ++		.last_key = SLICE_INIT,
     ++	};
      +
      +	int i = binsearch(br->restart_count, &restart_key_less, &args);
      +	if (args.error) {
      +		err = REFTABLE_FORMAT_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	it->br = br;
     @@ reftable/block.c (new)
      +	while (true) {
      +		block_iter_copy_from(&next, it);
      +		err = block_iter_next(&next, &rec);
     -+		if (err < 0) {
     -+			goto exit;
     -+		}
     ++		if (err < 0)
     ++			goto done;
      +
      +		reftable_record_key(&rec, &key);
      +		if (err > 0 || slice_cmp(key, want) >= 0) {
      +			err = 0;
     -+			goto exit;
     ++			goto done;
      +		}
      +
      +		block_iter_copy_from(it, &next);
      +	}
      +
     -+exit:
     ++done:
      +	slice_release(&key);
      +	slice_release(&next.last_key);
      +	reftable_record_destroy(&rec);
     @@ reftable/block.h (new)
      +	struct slice last_key;
      +};
      +
     -+/* initializes a block reader */
     ++/* initializes a block reader. */
      +int block_reader_init(struct block_reader *br, struct reftable_block *bl,
      +		      uint32_t header_off, uint32_t table_block_size,
      +		      int hash_size);
     @@ reftable/file.c (new)
      +	struct file_block_source *b = (struct file_block_source *)v;
      +	assert(off + size <= b->size);
      +	dest->data = reftable_malloc(size);
     -+	if (pread(b->fd, dest->data, size, off) != size) {
     ++	if (pread(b->fd, dest->data, size, off) != size)
      +		return -1;
     -+	}
      +	dest->len = size;
      +	return size;
      +}
     @@ reftable/file.c (new)
      +	}
      +
      +	err = fstat(fd, &st);
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return -1;
     -+	}
      +
      +	p = reftable_calloc(sizeof(struct file_block_source));
      +	p->size = st.st_size;
     @@ reftable/iter.c (new)
      +
      +static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
      +{
     ++	uint64_t off;
     ++	int err = 0;
      +	if (it->offset_idx == it->offset_len) {
      +		it->finished = true;
      +		return 1;
     @@ reftable/iter.c (new)
      +
      +	reftable_block_done(&it->block_reader.block);
      +
     -+	{
     -+		uint64_t off = it->offsets[it->offset_idx++];
     -+		int err = reader_init_block_reader(it->r, &it->block_reader,
     -+						   off, BLOCK_TYPE_REF);
     -+		if (err < 0) {
     -+			return err;
     -+		}
     -+		if (err > 0) {
     -+			/* indexed block does not exist. */
     -+			return REFTABLE_FORMAT_ERROR;
     -+		}
     ++	off = it->offsets[it->offset_idx++];
     ++	err = reader_init_block_reader(it->r, &it->block_reader, off,
     ++				       BLOCK_TYPE_REF);
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++	if (err > 0) {
     ++		/* indexed block does not exist. */
     ++		return REFTABLE_FORMAT_ERROR;
      +	}
      +	block_reader_start(&it->block_reader, &it->cur);
      +	return 0;
     @@ reftable/iter.c (new)
      +			       struct reftable_reader *r, byte *oid,
      +			       int oid_len, uint64_t *offsets, int offset_len)
      +{
     ++	struct indexed_table_ref_iter empty = INDEXED_TABLE_REF_ITER_INIT;
      +	struct indexed_table_ref_iter *itr =
      +		reftable_calloc(sizeof(struct indexed_table_ref_iter));
      +	int err = 0;
      +
     ++	*itr = empty;
      +	itr->r = r;
      +	slice_resize(&itr->oid, oid_len);
      +	memcpy(itr->oid.buf, oid, oid_len);
     @@ reftable/iter.h (new)
      +	struct slice oid;
      +	struct reftable_iterator it;
      +};
     ++#define FILTERING_REF_ITERATOR_INIT \
     ++	{                           \
     ++		.oid = SLICE_INIT   \
     ++	}
      +
      +void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
      +					  struct filtering_ref_iterator *);
     @@ reftable/iter.h (new)
      +	bool finished;
      +};
      +
     ++#define INDEXED_TABLE_REF_ITER_INIT                                   \
     ++	{                                                             \
     ++		.cur = { .last_key = SLICE_INIT }, .oid = SLICE_INIT, \
     ++	}
     ++
      +void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
      +					  struct indexed_table_ref_iter *itr);
      +int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
     @@ reftable/merged.c (new)
      +		.index = idx,
      +	};
      +	int err = iterator_next(&mi->stack[idx], &rec);
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return err;
     -+	}
      +
      +	if (err > 0) {
      +		reftable_iterator_destroy(&mi->stack[idx]);
     @@ reftable/merged.c (new)
      +
      +static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
      +{
     -+	if (iterator_is_null(&mi->stack[idx])) {
     ++	if (iterator_is_null(&mi->stack[idx]))
      +		return 0;
     -+	}
      +	return merged_iter_advance_nonnull_subiter(mi, idx);
      +}
      +
      +static int merged_iter_next_entry(struct merged_iter *mi,
      +				  struct reftable_record *rec)
      +{
     -+	struct slice entry_key = { 0 };
     ++	struct slice entry_key = SLICE_INIT;
      +	struct pq_entry entry = { 0 };
      +	int err = 0;
      +
     -+	if (merged_iter_pqueue_is_empty(mi->pq)) {
     ++	if (merged_iter_pqueue_is_empty(mi->pq))
      +		return 1;
     -+	}
      +
      +	entry = merged_iter_pqueue_remove(&mi->pq);
      +	err = merged_iter_advance_subiter(mi, entry.index);
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return err;
     -+	}
      +
      +	/*
      +	  One can also use reftable as datacenter-local storage, where the ref
     @@ reftable/merged.c (new)
      +	reftable_record_key(&entry.rec, &entry_key);
      +	while (!merged_iter_pqueue_is_empty(mi->pq)) {
      +		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
     -+		struct slice k = { 0 };
     ++		struct slice k = SLICE_INIT;
      +		int err = 0, cmp = 0;
      +
      +		reftable_record_key(&top.rec, &k);
     @@ reftable/merged.c (new)
      +static int merged_iter_next_void(void *p, struct reftable_record *rec)
      +{
      +	struct merged_iter *mi = (struct merged_iter *)p;
     -+	if (merged_iter_pqueue_is_empty(mi->pq)) {
     ++	if (merged_iter_pqueue_is_empty(mi->pq))
      +		return 1;
     -+	}
      +
      +	return merged_iter_next(mi, rec);
      +}
     @@ reftable/merged.c (new)
      +	if (err < 0) {
      +		merged_iter_close(&merged);
      +		return err;
     -+	}
     -+
     -+	{
     ++	} else {
      +		struct merged_iter *p =
      +			reftable_malloc(sizeof(struct merged_iter));
      +		*p = merged;
     @@ reftable/pq.c (new)
      +
      +int pq_less(struct pq_entry a, struct pq_entry b)
      +{
     -+	struct slice ak = { 0 };
     -+	struct slice bk = { 0 };
     ++	struct slice ak = SLICE_INIT;
     ++	struct slice bk = SLICE_INIT;
      +	int cmp = 0;
      +	reftable_record_key(&a.rec, &ak);
      +	reftable_record_key(&b.rec, &bk);
     @@ reftable/pq.c (new)
      +	slice_release(&ak);
      +	slice_release(&bk);
      +
     -+	if (cmp == 0) {
     ++	if (cmp == 0)
      +		return a.index > b.index;
     -+	}
      +
      +	return cmp < 0;
      +}
     @@ reftable/reader.c (new)
      +			    struct reftable_block *dest, uint64_t off,
      +			    uint32_t sz)
      +{
     -+	if (off >= r->size) {
     ++	if (off >= r->size)
      +		return 0;
     -+	}
      +
      +	if (off + sz > r->size) {
      +		sz = r->size - off;
     @@ reftable/reader.c (new)
      +
      +	if (memcmp(f, "REFT", 4)) {
      +		err = REFTABLE_FORMAT_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +	f += 4;
      +
      +	if (memcmp(footer, header, header_size(r->version))) {
      +		err = REFTABLE_FORMAT_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	f++;
     @@ reftable/reader.c (new)
      +			break;
      +		default:
      +			err = REFTABLE_FORMAT_ERROR;
     -+			goto exit;
     ++			goto done;
      +		}
      +		f += 4;
      +	}
     @@ reftable/reader.c (new)
      +	f += 4;
      +	if (computed_crc != file_crc) {
      +		err = REFTABLE_FORMAT_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	first_block_typ = header[header_size(r->version)];
     @@ reftable/reader.c (new)
      +				  r->log_offsets.offset > 0);
      +	r->obj_offsets.present = r->obj_offsets.offset > 0;
      +	err = 0;
     -+exit:
     ++done:
      +	return err;
      +}
      +
     @@ reftable/reader.c (new)
      +	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
      +	if (err != header_size(2) + 1) {
      +		err = REFTABLE_IO_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	if (memcmp(header.data, "REFT", 4)) {
      +		err = REFTABLE_FORMAT_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +	r->version = header.data[4];
      +	if (r->version != 1 && r->version != 2) {
      +		err = REFTABLE_FORMAT_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	r->size = block_source_size(source) - footer_size(r->version);
     @@ reftable/reader.c (new)
      +				      footer_size(r->version));
      +	if (err != footer_size(r->version)) {
      +		err = REFTABLE_IO_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	err = parse_footer(r, footer.data, header.data);
     -+exit:
     ++done:
      +	reftable_block_done(&footer);
      +	reftable_block_done(&header);
      +	return err;
     @@ reftable/reader.c (new)
      +	struct block_iter bi;
      +	bool finished;
      +};
     ++#define TABLE_ITER_INIT                         \
     ++	{                                       \
     ++		.bi = {.last_key = SLICE_INIT } \
     ++	}
      +
      +static void table_iter_copy_from(struct table_iter *dest,
      +				 struct table_iter *src)
     @@ reftable/reader.c (new)
      +	uint32_t header_off = next_off ? 0 : header_size(r->version);
      +	int32_t block_size = 0;
      +
     -+	if (next_off >= r->size) {
     ++	if (next_off >= r->size)
      +		return 1;
     -+	}
      +
      +	err = reader_get_block(r, &block, next_off, guess_block_size);
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return err;
     -+	}
      +
      +	block_size = extract_block_size(block.data, &block_typ, next_off,
      +					r->version);
     -+	if (block_size < 0) {
     ++	if (block_size < 0)
      +		return block_size;
     -+	}
      +
      +	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
      +		reftable_block_done(&block);
     @@ reftable/reader.c (new)
      +		dest->finished = true;
      +		return 1;
      +	}
     -+	if (err != 0) {
     ++	if (err != 0)
      +		return err;
     -+	}
     -+
     -+	{
     ++	else {
      +		struct block_reader *brp =
      +			reftable_malloc(sizeof(struct block_reader));
      +		*brp = br;
     @@ reftable/reader.c (new)
      +
      +static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
      +{
     -+	if (reftable_record_type(rec) != ti->typ) {
     ++	if (reftable_record_type(rec) != ti->typ)
      +		return REFTABLE_API_ERROR;
     -+	}
      +
      +	while (true) {
     -+		struct table_iter next = { 0 };
     ++		struct table_iter next = TABLE_ITER_INIT;
      +		int err = 0;
      +		if (ti->finished) {
      +			return 1;
     @@ reftable/reader.c (new)
      +	struct block_reader *brp = NULL;
      +
      +	int err = reader_init_block_reader(r, &br, off, typ);
     -+	if (err != 0) {
     ++	if (err != 0)
      +		return err;
     -+	}
      +
      +	brp = reftable_malloc(sizeof(struct block_reader));
      +	*brp = br;
     @@ reftable/reader.c (new)
      +{
      +	struct reftable_record rec =
      +		reftable_new_record(reftable_record_type(want));
     -+	struct slice want_key = { 0 };
     -+	struct slice got_key = { 0 };
     -+	struct table_iter next = { 0 };
     ++	struct slice want_key = SLICE_INIT;
     ++	struct slice got_key = SLICE_INIT;
     ++	struct table_iter next = TABLE_ITER_INIT;
      +	int err = -1;
     ++
      +	reftable_record_key(want, &want_key);
      +
      +	while (true) {
      +		err = table_iter_next_block(&next, ti);
     -+		if (err < 0) {
     -+			goto exit;
     -+		}
     ++		if (err < 0)
     ++			goto done;
      +
      +		if (err > 0) {
      +			break;
      +		}
      +
      +		err = block_reader_first_key(next.bi.br, &got_key);
     -+		if (err < 0) {
     -+			goto exit;
     -+		}
     -+		{
     -+			int cmp = slice_cmp(got_key, want_key);
     -+			if (cmp > 0) {
     -+				table_iter_block_done(&next);
     -+				break;
     -+			}
     ++		if (err < 0)
     ++			goto done;
     ++
     ++		if (slice_cmp(got_key, want_key) > 0) {
     ++			table_iter_block_done(&next);
     ++			break;
      +		}
      +
      +		table_iter_block_done(ti);
     @@ reftable/reader.c (new)
      +	}
      +
      +	err = block_iter_seek(&ti->bi, want_key);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +	err = 0;
      +
     -+exit:
     ++done:
      +	block_iter_close(&next.bi);
      +	reftable_record_destroy(&rec);
      +	slice_release(&want_key);
     @@ reftable/reader.c (new)
      +			       struct reftable_iterator *it,
      +			       struct reftable_record *rec)
      +{
     -+	struct reftable_index_record want_index = { 0 };
     ++	struct reftable_index_record want_index = { .last_key = SLICE_INIT };
      +	struct reftable_record want_index_rec = { 0 };
     -+	struct reftable_index_record index_result = { 0 };
     ++	struct reftable_index_record index_result = { .last_key = SLICE_INIT };
      +	struct reftable_record index_result_rec = { 0 };
     -+	struct table_iter index_iter = { 0 };
     -+	struct table_iter next = { 0 };
     ++	struct table_iter index_iter = TABLE_ITER_INIT;
     ++	struct table_iter next = TABLE_ITER_INIT;
      +	int err = 0;
      +
      +	reftable_record_key(rec, &want_index.last_key);
     @@ reftable/reader.c (new)
      +	reftable_record_from_index(&index_result_rec, &index_result);
      +
      +	err = reader_start(r, &index_iter, reftable_record_type(rec), true);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	err = reader_seek_linear(r, &index_iter, &want_index_rec);
      +	while (true) {
      +		err = table_iter_next(&index_iter, &index_result_rec);
      +		table_iter_block_done(&index_iter);
     -+		if (err != 0) {
     -+			goto exit;
     -+		}
     ++		if (err != 0)
     ++			goto done;
      +
      +		err = reader_table_iter_at(r, &next, index_result.offset, 0);
     -+		if (err != 0) {
     -+			goto exit;
     -+		}
     ++		if (err != 0)
     ++			goto done;
      +
      +		err = block_iter_seek(&next.bi, want_index.last_key);
     -+		if (err < 0) {
     -+			goto exit;
     -+		}
     ++		if (err < 0)
     ++			goto done;
      +
      +		if (next.typ == reftable_record_type(rec)) {
      +			err = 0;
     @@ reftable/reader.c (new)
      +	}
      +
      +	if (err == 0) {
     ++		struct table_iter empty = TABLE_ITER_INIT;
      +		struct table_iter *malloced =
      +			reftable_calloc(sizeof(struct table_iter));
     ++		*malloced = empty;
      +		table_iter_copy_from(malloced, &next);
      +		iterator_from_table_iter(it, malloced);
      +	}
     -+exit:
     ++done:
      +	block_iter_close(&next.bi);
      +	table_iter_close(&index_iter);
      +	reftable_record_clear(&want_index_rec);
     @@ reftable/reader.c (new)
      +	struct reftable_reader_offsets *offs =
      +		reader_offsets_for(r, reftable_record_type(rec));
      +	uint64_t idx = offs->index_offset;
     -+	struct table_iter ti = { 0 };
     ++	struct table_iter ti = TABLE_ITER_INIT;
      +	int err = 0;
     -+	if (idx > 0) {
     ++	if (idx > 0)
      +		return reader_seek_indexed(r, it, rec);
     -+	}
      +
      +	err = reader_start(r, &ti, reftable_record_type(rec), false);
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return err;
     -+	}
      +	err = reader_seek_linear(r, &ti, rec);
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return err;
     -+	}
     -+
     -+	{
     ++	else {
      +		struct table_iter *p =
      +			reftable_malloc(sizeof(struct table_iter));
      +		*p = ti;
     @@ reftable/reader.c (new)
      +	/* Look through the reverse index. */
      +	reftable_record_from_obj(&want_rec, &want);
      +	err = reader_seek(r, &oit, &want_rec);
     -+	if (err != 0) {
     -+		goto exit;
     -+	}
     ++	if (err != 0)
     ++		goto done;
      +
      +	/* read out the reftable_obj_record */
      +	reftable_record_from_obj(&got_rec, &got);
      +	err = iterator_next(&oit, &got_rec);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	if (err > 0 ||
      +	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
      +		/* didn't find it; return empty iterator */
      +		iterator_set_empty(it);
      +		err = 0;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	err = new_indexed_table_ref_iter(&itr, r, oid, hash_size(r->hash_id),
      +					 got.offsets, got.offset_len);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +	got.offsets = NULL;
      +	iterator_from_indexed_table_ref_iter(it, itr);
      +
     -+exit:
     ++done:
      +	reftable_iterator_destroy(&oit);
      +	reftable_record_clear(&got_rec);
      +	return err;
     @@ reftable/reader.c (new)
      +					      struct reftable_iterator *it,
      +					      byte *oid)
      +{
     ++	struct table_iter ti_empty = TABLE_ITER_INIT;
      +	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
      +	struct filtering_ref_iterator *filter = NULL;
     ++	struct filtering_ref_iterator empty = FILTERING_REF_ITERATOR_INIT;
      +	int oid_len = hash_size(r->hash_id);
     -+	int err = reader_start(r, ti, BLOCK_TYPE_REF, false);
     ++	int err;
     ++
     ++	*ti = ti_empty;
     ++	err = reader_start(r, ti, BLOCK_TYPE_REF, false);
      +	if (err < 0) {
      +		reftable_free(ti);
      +		return err;
      +	}
      +
     -+	filter = reftable_calloc(sizeof(struct filtering_ref_iterator));
     ++	filter = reftable_malloc(sizeof(struct filtering_ref_iterator));
     ++	*filter = empty;
      +	slice_resize(&filter->oid, oid_len);
      +	memcpy(filter->oid.buf, oid, oid_len);
      +	reftable_table_from_reader(&filter->tab, r);
     @@ reftable/reader.c (new)
      +int reftable_reader_refs_for(struct reftable_reader *r,
      +			     struct reftable_iterator *it, byte *oid)
      +{
     -+	if (r->obj_offsets.present) {
     ++	if (r->obj_offsets.present)
      +		return reftable_reader_refs_for_indexed(r, it, oid);
     -+	}
      +	return reftable_reader_refs_for_unindexed(r, it, oid);
      +}
      +
     @@ reftable/record.c (new)
      +	int ptr = 0;
      +	uint64_t val;
      +
     -+	if (in.len == 0) {
     ++	if (in.len == 0)
      +		return -1;
     -+	}
      +	val = in.buf[ptr] & 0x7f;
      +
      +	while (in.buf[ptr] & 0x80) {
     @@ reftable/record.c (new)
      +	}
      +
      +	n = sizeof(buf) - i - 1;
     -+	if (dest.len < n) {
     ++	if (dest.len < n)
      +		return -1;
     -+	}
      +	memcpy(dest.buf, &buf[i + 1], n);
      +	return n;
      +}
     @@ reftable/record.c (new)
      +	int start_len = in.len;
      +	uint64_t tsize = 0;
      +	int n = get_var_int(&tsize, in);
     -+	if (n <= 0) {
     ++	if (n <= 0)
      +		return -1;
     -+	}
      +	slice_consume(&in, n);
     -+	if (in.len < tsize) {
     ++	if (in.len < tsize)
      +		return -1;
     -+	}
      +
      +	slice_resize(dest, tsize + 1);
      +	dest->buf[tsize] = 0;
     @@ reftable/record.c (new)
      +	struct slice start = s;
      +	int l = strlen(str);
      +	int n = put_var_int(s, l);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return -1;
     -+	}
      +	slice_consume(&s, n);
     -+	if (s.len < l) {
     ++	if (s.len < l)
      +		return -1;
     -+	}
      +	memcpy(s.buf, str, l);
      +	slice_consume(&s, l);
      +
     @@ reftable/record.c (new)
      +	int prefix_len = common_prefix_size(prev_key, key);
      +	uint64_t suffix_len = key.len - prefix_len;
      +	int n = put_var_int(dest, (uint64_t)prefix_len);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return -1;
     -+	}
      +	slice_consume(&dest, n);
      +
      +	*restart = (prefix_len == 0);
      +
      +	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return -1;
     -+	}
      +	slice_consume(&dest, n);
      +
     -+	if (dest.len < suffix_len) {
     ++	if (dest.len < suffix_len)
      +		return -1;
     -+	}
      +	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
      +	slice_consume(&dest, suffix_len);
      +
     @@ reftable/record.c (new)
      +	uint64_t prefix_len = 0;
      +	uint64_t suffix_len = 0;
      +	int n = get_var_int(&prefix_len, in);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return -1;
     -+	}
      +	slice_consume(&in, n);
      +
     -+	if (prefix_len > last_key.len) {
     ++	if (prefix_len > last_key.len)
      +		return -1;
     -+	}
      +
      +	n = get_var_int(&suffix_len, in);
     -+	if (n <= 0) {
     ++	if (n <= 0)
      +		return -1;
     -+	}
      +	slice_consume(&in, n);
      +
      +	*extra = (byte)(suffix_len & 0x7);
      +	suffix_len >>= 3;
      +
     -+	if (in.len < suffix_len) {
     ++	if (in.len < suffix_len)
      +		return -1;
     -+	}
      +
      +	slice_resize(key, suffix_len + prefix_len);
      +	memcpy(key->buf, last_key.buf, prefix_len);
     @@ reftable/record.c (new)
      +
      +static char hexdigit(int c)
      +{
     -+	if (c <= 9) {
     ++	if (c <= 9)
      +		return '0' + c;
     -+	}
      +	return 'a' + (c - 10);
      +}
      +
     @@ reftable/record.c (new)
      +		} else {
      +			return 1;
      +		}
     -+	} else if (r->target != NULL) {
     ++	} else if (r->target != NULL)
      +		return 3;
     -+	}
      +	return 0;
      +}
      +
     @@ reftable/record.c (new)
      +	struct slice start = s;
      +	int n = put_var_int(s, r->update_index);
      +	assert(hash_size > 0);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return -1;
     -+	}
      +	slice_consume(&s, n);
      +
      +	if (r->value != NULL) {
     @@ reftable/record.c (new)
      +	bool seen_target = false;
      +
      +	int n = get_var_int(&r->update_index, in);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return n;
     -+	}
      +	assert(hash_size > 0);
      +
      +	slice_consume(&in, n);
     @@ reftable/record.c (new)
      +		slice_consume(&in, hash_size);
      +		break;
      +	case 3: {
     -+		struct slice dest = { 0 };
     ++		struct slice dest = SLICE_INIT;
      +		int n = decode_string(&dest, in);
      +		if (n < 0) {
      +			return -1;
     @@ reftable/record.c (new)
      +	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
      +	const struct reftable_obj_record *src =
      +		(const struct reftable_obj_record *)src_rec;
     ++	int olen;
      +
      +	reftable_obj_record_clear(obj);
      +	*obj = *src;
      +	obj->hash_prefix = reftable_malloc(obj->hash_prefix_len);
      +	memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len);
      +
     -+	{
     -+		int olen = obj->offset_len * sizeof(uint64_t);
     -+		obj->offsets = reftable_malloc(olen);
     -+		memcpy(obj->offsets, src->offsets, olen);
     -+	}
     ++	olen = obj->offset_len * sizeof(uint64_t);
     ++	obj->offsets = reftable_malloc(olen);
     ++	memcpy(obj->offsets, src->offsets, olen);
      +}
      +
      +static byte reftable_obj_record_val_type(const void *rec)
      +{
      +	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
     -+	if (r->offset_len > 0 && r->offset_len < 8) {
     ++	if (r->offset_len > 0 && r->offset_len < 8)
      +		return r->offset_len;
     -+	}
      +	return 0;
      +}
      +
     @@ reftable/record.c (new)
      +		}
      +		slice_consume(&s, n);
      +	}
     -+	if (r->offset_len == 0) {
     ++	if (r->offset_len == 0)
      +		return start.len - s.len;
     -+	}
      +	n = put_var_int(s, r->offsets[0]);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return -1;
     -+	}
      +	slice_consume(&s, n);
      +
      +	last = r->offsets[0];
     @@ reftable/record.c (new)
      +
      +	r->offsets = NULL;
      +	r->offset_len = 0;
     -+	if (count == 0) {
     ++	if (count == 0)
      +		return start.len - in.len;
     -+	}
      +
      +	r->offsets = reftable_malloc(count * sizeof(uint64_t));
      +	r->offset_len = count;
      +
      +	n = get_var_int(&r->offsets[0], in);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return n;
     -+	}
      +	slice_consume(&in, n);
      +
      +	last = r->offsets[0];
     @@ reftable/record.c (new)
      +	int n = 0;
      +	byte *oldh = r->old_hash;
      +	byte *newh = r->new_hash;
     -+	if (reftable_log_record_is_deletion(r)) {
     ++	if (reftable_log_record_is_deletion(r))
      +		return 0;
     -+	}
      +
      +	if (oldh == NULL) {
      +		oldh = zero;
     @@ reftable/record.c (new)
      +		newh = zero;
      +	}
      +
     -+	if (s.len < 2 * hash_size) {
     ++	if (s.len < 2 * hash_size)
      +		return -1;
     -+	}
      +
      +	memcpy(s.buf, oldh, hash_size);
      +	memcpy(s.buf + hash_size, newh, hash_size);
      +	slice_consume(&s, 2 * hash_size);
      +
      +	n = encode_string(r->name ? r->name : "", s);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return -1;
     -+	}
      +	slice_consume(&s, n);
      +
      +	n = encode_string(r->email ? r->email : "", s);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return -1;
     -+	}
      +	slice_consume(&s, n);
      +
      +	n = put_var_int(s, r->time);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return -1;
     -+	}
      +	slice_consume(&s, n);
      +
     -+	if (s.len < 2) {
     ++	if (s.len < 2)
      +		return -1;
     -+	}
      +
      +	put_be16(s.buf, r->tz_offset);
      +	slice_consume(&s, 2);
      +
      +	n = encode_string(r->message ? r->message : "", s);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return -1;
     -+	}
      +	slice_consume(&s, n);
      +
      +	return start.len - s.len;
     @@ reftable/record.c (new)
      +	struct reftable_log_record *r = (struct reftable_log_record *)rec;
      +	uint64_t max = 0;
      +	uint64_t ts = 0;
     -+	struct slice dest = { 0 };
     ++	struct slice dest = SLICE_INIT;
      +	int n;
      +
     -+	if (key.len <= 9 || key.buf[key.len - 9] != 0) {
     ++	if (key.len <= 9 || key.buf[key.len - 9] != 0)
      +		return REFTABLE_FORMAT_ERROR;
     -+	}
      +
      +	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
      +	memcpy(r->ref_name, key.buf, key.len - 8);
     @@ reftable/record.c (new)
      +		return 0;
      +	}
      +
     -+	if (in.len < 2 * hash_size) {
     ++	if (in.len < 2 * hash_size)
      +		return REFTABLE_FORMAT_ERROR;
     -+	}
      +
      +	r->old_hash = reftable_realloc(r->old_hash, hash_size);
      +	r->new_hash = reftable_realloc(r->new_hash, hash_size);
     @@ reftable/record.c (new)
      +	slice_consume(&in, 2 * hash_size);
      +
      +	n = decode_string(&dest, in);
     -+	if (n < 0) {
     -+		goto error;
     -+	}
     ++	if (n < 0)
     ++		goto done;
      +	slice_consume(&in, n);
      +
      +	r->name = reftable_realloc(r->name, dest.len + 1);
     @@ reftable/record.c (new)
      +
      +	slice_resize(&dest, 0);
      +	n = decode_string(&dest, in);
     -+	if (n < 0) {
     -+		goto error;
     -+	}
     ++	if (n < 0)
     ++		goto done;
      +	slice_consume(&in, n);
      +
      +	r->email = reftable_realloc(r->email, dest.len + 1);
     @@ reftable/record.c (new)
      +
      +	ts = 0;
      +	n = get_var_int(&ts, in);
     -+	if (n < 0) {
     -+		goto error;
     -+	}
     ++	if (n < 0)
     ++		goto done;
      +	slice_consume(&in, n);
      +	r->time = ts;
     -+	if (in.len < 2) {
     -+		goto error;
     -+	}
     ++	if (in.len < 2)
     ++		goto done;
      +
      +	r->tz_offset = get_be16(in.buf);
      +	slice_consume(&in, 2);
      +
      +	slice_resize(&dest, 0);
      +	n = decode_string(&dest, in);
     -+	if (n < 0) {
     -+		goto error;
     -+	}
     ++	if (n < 0)
     ++		goto done;
      +	slice_consume(&in, n);
      +
      +	r->message = reftable_realloc(r->message, dest.len + 1);
     @@ reftable/record.c (new)
      +	slice_release(&dest);
      +	return start.len - in.len;
      +
     -+error:
     ++done:
      +	slice_release(&dest);
      +	return REFTABLE_FORMAT_ERROR;
      +}
     @@ reftable/record.c (new)
      +static bool null_streq(char *a, char *b)
      +{
      +	char *empty = "";
     -+	if (a == NULL) {
     ++	if (a == NULL)
      +		a = empty;
     -+	}
     -+	if (b == NULL) {
     ++
     ++	if (b == NULL)
      +		b = empty;
     -+	}
     ++
      +	return 0 == strcmp(a, b);
      +}
      +
      +static bool zero_hash_eq(byte *a, byte *b, int sz)
      +{
     -+	if (a == NULL) {
     ++	if (a == NULL)
      +		a = zero;
     -+	}
     -+	if (b == NULL) {
     ++
     ++	if (b == NULL)
      +		b = zero;
     -+	}
     ++
      +	return !memcmp(a, b, sz);
      +}
      +
     @@ reftable/record.c (new)
      +		return rec;
      +	}
      +	case BLOCK_TYPE_INDEX: {
     ++		struct reftable_index_record empty = { .last_key = SLICE_INIT };
      +		struct reftable_index_record *r =
      +			reftable_calloc(sizeof(struct reftable_index_record));
     ++		*r = empty;
      +		reftable_record_from_index(&rec, r);
      +		return rec;
      +	}
     @@ reftable/record.c (new)
      +	struct slice start = out;
      +
      +	int n = put_var_int(out, r->offset);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return n;
     -+	}
      +
      +	slice_consume(&out, n);
      +
     @@ reftable/record.c (new)
      +	slice_copy(&r->last_key, key);
      +
      +	n = get_var_int(&r->offset, in);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return n;
     -+	}
      +
      +	slice_consume(&in, n);
      +	return start.len - in.len;
     @@ reftable/record.c (new)
      +
      +static bool hash_equal(byte *a, byte *b, int hash_size)
      +{
     -+	if (a != NULL && b != NULL) {
     ++	if (a != NULL && b != NULL)
      +		return !memcmp(a, b, hash_size);
     -+	}
      +
      +	return a == b;
      +}
      +
      +static bool str_equal(char *a, char *b)
      +{
     -+	if (a != NULL && b != NULL) {
     ++	if (a != NULL && b != NULL)
      +		return 0 == strcmp(a, b);
     -+	}
      +
      +	return a == b;
      +}
     @@ reftable/record.c (new)
      +	struct reftable_log_record *lb = (struct reftable_log_record *)b;
      +
      +	int cmp = strcmp(la->ref_name, lb->ref_name);
     -+	if (cmp) {
     ++	if (cmp)
      +		return cmp;
     -+	}
     -+	if (la->update_index > lb->update_index) {
     ++	if (la->update_index > lb->update_index)
      +		return -1;
     -+	}
      +	return (la->update_index < lb->update_index) ? 1 : 0;
      +}
      +
     @@ reftable/refname.c (new)
      +		};
      +		int idx = binsearch(mod->add_len, find_name, &arg);
      +		if (idx < mod->add_len &&
     -+		    !strncmp(prefix, mod->add[idx], strlen(prefix))) {
     -+			goto exit;
     -+		}
     ++		    !strncmp(prefix, mod->add[idx], strlen(prefix)))
     ++			goto done;
      +	}
      +	err = reftable_table_seek_ref(&mod->tab, &it, prefix);
     -+	if (err) {
     -+		goto exit;
     -+	}
     ++	if (err)
     ++		goto done;
      +
      +	while (true) {
      +		err = reftable_iterator_next_ref(&it, &ref);
     -+		if (err) {
     -+			goto exit;
     -+		}
     ++		if (err)
     ++			goto done;
      +
      +		if (mod->del_len > 0) {
      +			struct find_arg arg = {
     @@ reftable/refname.c (new)
      +
      +		if (strncmp(ref.ref_name, prefix, strlen(prefix))) {
      +			err = 1;
     -+			goto exit;
     ++			goto done;
      +		}
      +		err = 0;
     -+		goto exit;
     ++		goto done;
      +	}
      +
     -+exit:
     ++done:
      +	reftable_ref_record_clear(&ref);
      +	reftable_iterator_destroy(&it);
      +	return err;
     @@ reftable/refname.c (new)
      +
      +int modification_validate(struct modification *mod)
      +{
     -+	struct slice slashed = { 0 };
     ++	struct slice slashed = SLICE_INIT;
      +	int err = 0;
      +	int i = 0;
      +	for (; i < mod->add_len; i++) {
      +		err = validate_ref_name(mod->add[i]);
     -+		if (err) {
     -+			goto exit;
     -+		}
     ++		if (err)
     ++			goto done;
      +		slice_set_string(&slashed, mod->add[i]);
      +		slice_addstr(&slashed, "/");
      +
     @@ reftable/refname.c (new)
      +			mod, slice_as_string(&slashed));
      +		if (err == 0) {
      +			err = REFTABLE_NAME_CONFLICT;
     -+			goto exit;
     -+		}
     -+		if (err < 0) {
     -+			goto exit;
     ++			goto done;
      +		}
     ++		if (err < 0)
     ++			goto done;
      +
      +		slice_set_string(&slashed, mod->add[i]);
      +		while (slashed.len) {
     @@ reftable/refname.c (new)
      +						   slice_as_string(&slashed));
      +			if (err == 0) {
      +				err = REFTABLE_NAME_CONFLICT;
     -+				goto exit;
     -+			}
     -+			if (err < 0) {
     -+				goto exit;
     ++				goto done;
      +			}
     ++			if (err < 0)
     ++				goto done;
      +		}
      +	}
      +	err = 0;
     -+exit:
     ++done:
      +	slice_release(&slashed);
      +	return err;
      +}
     @@ reftable/reftable.c (new)
      +{
      +	struct reftable_iterator it = { 0 };
      +	int err = reftable_table_seek_ref(tab, &it, name);
     -+	if (err) {
     -+		goto exit;
     -+	}
     ++	if (err)
     ++		goto done;
      +
      +	err = reftable_iterator_next_ref(&it, ref);
     -+	if (err) {
     -+		goto exit;
     -+	}
     ++	if (err)
     ++		goto done;
      +
      +	if (strcmp(ref->ref_name, name) ||
      +	    reftable_ref_record_is_deletion(ref)) {
      +		reftable_ref_record_clear(ref);
      +		err = 1;
     -+		goto exit;
     ++		goto done;
      +	}
      +
     -+exit:
     ++done:
      +	reftable_iterator_destroy(&it);
      +	return err;
      +}
     @@ reftable/slice.c (new)
      +
      +#include "reftable.h"
      +
     ++struct slice reftable_empty_slice = SLICE_INIT;
     ++
      +void slice_set_string(struct slice *s, const char *str)
      +{
     ++	int l;
      +	if (str == NULL) {
      +		s->len = 0;
      +		return;
      +	}
     ++	assert(s->canary == SLICE_CANARY);
      +
     -+	{
     -+		int l = strlen(str);
     -+		l++; /* \0 */
     -+		slice_resize(s, l);
     -+		memcpy(s->buf, str, l);
     -+		s->len = l - 1;
     -+	}
     ++	l = strlen(str);
     ++	l++; /* \0 */
     ++	slice_resize(s, l);
     ++	memcpy(s->buf, str, l);
     ++	s->len = l - 1;
     ++}
     ++
     ++void slice_init(struct slice *s)
     ++{
     ++	struct slice empty = SLICE_INIT;
     ++	*s = empty;
      +}
      +
      +void slice_resize(struct slice *s, int l)
      +{
     ++	assert(s->canary == SLICE_CANARY);
      +	if (s->cap < l) {
      +		int c = s->cap * 2;
      +		if (c < l) {
     @@ reftable/slice.c (new)
      +{
      +	int l1 = d->len;
      +	int l2 = strlen(s);
     ++	assert(d->canary == SLICE_CANARY);
      +
      +	slice_resize(d, l2 + l1);
      +	memcpy(d->buf + l1, s, l2);
     @@ reftable/slice.c (new)
      +void slice_addbuf(struct slice *s, struct slice a)
      +{
      +	int end = s->len;
     ++	assert(s->canary == SLICE_CANARY);
      +	slice_resize(s, s->len + a.len);
      +	memcpy(s->buf + end, a.buf, a.len);
      +}
      +
      +void slice_consume(struct slice *s, int n)
      +{
     ++	assert(s->canary == SLICE_CANARY);
      +	s->buf += n;
      +	s->len -= n;
      +}
     @@ reftable/slice.c (new)
      +byte *slice_detach(struct slice *s)
      +{
      +	byte *p = s->buf;
     ++	assert(s->canary == SLICE_CANARY);
      +	s->buf = NULL;
      +	s->cap = 0;
      +	s->len = 0;
     @@ reftable/slice.c (new)
      +
      +void slice_release(struct slice *s)
      +{
     ++	assert(s->canary == SLICE_CANARY);
      +	reftable_free(slice_detach(s));
      +}
      +
      +void slice_copy(struct slice *dest, struct slice src)
      +{
     ++	assert(dest->canary == SLICE_CANARY);
     ++	assert(src.canary == SLICE_CANARY);
      +	slice_resize(dest, src.len);
      +	memcpy(dest->buf, src.buf, src.len);
      +}
     @@ reftable/slice.c (new)
      +   a \0 is added at the end. */
      +const char *slice_as_string(struct slice *s)
      +{
     ++	assert(s->canary == SLICE_CANARY);
      +	if (s->cap == s->len) {
      +		int l = s->len;
      +		slice_resize(s, l + 1);
     @@ reftable/slice.c (new)
      +/* return a newly malloced string for this slice */
      +char *slice_to_string(struct slice in)
      +{
     -+	struct slice s = { 0 };
     ++	struct slice s = SLICE_INIT;
     ++	assert(in.canary == SLICE_CANARY);
      +	slice_resize(&s, in.len + 1);
      +	s.buf[in.len] = 0;
      +	memcpy(s.buf, in.buf, in.len);
     @@ reftable/slice.c (new)
      +
      +bool slice_equal(struct slice a, struct slice b)
      +{
     -+	if (a.len != b.len) {
     ++	assert(a.canary == SLICE_CANARY);
     ++	assert(b.canary == SLICE_CANARY);
     ++	if (a.len != b.len)
      +		return 0;
     -+	}
      +	return memcmp(a.buf, b.buf, a.len) == 0;
      +}
      +
     @@ reftable/slice.c (new)
      +{
      +	int min = a.len < b.len ? a.len : b.len;
      +	int res = memcmp(a.buf, b.buf, min);
     -+	if (res != 0) {
     ++	assert(a.canary == SLICE_CANARY);
     ++	assert(b.canary == SLICE_CANARY);
     ++	if (res != 0)
      +		return res;
     -+	}
     -+	if (a.len < b.len) {
     ++	if (a.len < b.len)
      +		return -1;
     -+	} else if (a.len > b.len) {
     ++	else if (a.len > b.len)
      +		return 1;
     -+	} else {
     ++	else
      +		return 0;
     -+	}
      +}
      +
      +int slice_add(struct slice *b, byte *data, size_t sz)
      +{
     ++	assert(b->canary == SLICE_CANARY);
      +	if (b->len + sz > b->cap) {
      +		int newcap = 2 * b->cap + 1;
      +		if (newcap < b->len + sz) {
     @@ reftable/slice.c (new)
      +int common_prefix_size(struct slice a, struct slice b)
      +{
      +	int p = 0;
     ++	assert(a.canary == SLICE_CANARY);
     ++	assert(b.canary == SLICE_CANARY);
      +	while (p < a.len && p < b.len) {
      +		if (a.buf[p] != b.buf[p]) {
      +			break;
     @@ reftable/slice.h (new)
      +	int len;
      +	int cap;
      +	byte *buf;
     ++	byte canary;
      +};
     ++#define SLICE_CANARY 0x42
     ++#define SLICE_INIT                       \
     ++	{                                \
     ++		0, 0, NULL, SLICE_CANARY \
     ++	}
     ++extern struct slice reftable_empty_slice;
      +
      +void slice_set_string(struct slice *dest, const char *src);
      +void slice_addstr(struct slice *dest, const char *src);
     @@ reftable/slice.h (new)
      +/* Return a malloced string for `src` */
      +char *slice_to_string(struct slice src);
      +
     ++/* Initializes a slice. Accepts a slice with random garbage. */
     ++void slice_init(struct slice *slice);
     ++
      +/* Ensure that `buf` is \0 terminated. */
      +const char *slice_as_string(struct slice *src);
      +
     @@ reftable/stack.c (new)
      +{
      +	struct reftable_stack *p =
      +		reftable_calloc(sizeof(struct reftable_stack));
     -+	struct slice list_file_name = { 0 };
     ++	struct slice list_file_name = SLICE_INIT;
      +	int err = 0;
      +
      +	if (config.hash_id == 0) {
     @@ reftable/stack.c (new)
      +	int err = 0;
      +	if (size < 0) {
      +		err = REFTABLE_IO_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +	err = lseek(fd, 0, SEEK_SET);
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	buf = reftable_malloc(size + 1);
      +	if (read(fd, buf, size) != size) {
      +		err = REFTABLE_IO_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +	buf[size] = 0;
      +
      +	parse_names(buf, size, namesp);
      +
     -+exit:
     ++done:
      +	reftable_free(buf);
      +	return err;
      +}
     @@ reftable/stack.c (new)
      +	int new_tables_len = 0;
      +	struct reftable_merged_table *new_merged = NULL;
      +	int i;
     -+	struct slice table_path = { 0 };
     ++	struct slice table_path = SLICE_INIT;
      +
      +	while (*names) {
      +		struct reftable_reader *rd = NULL;
     @@ reftable/stack.c (new)
      +
      +			err = reftable_block_source_from_file(
      +				&src, slice_as_string(&table_path));
     -+			if (err < 0) {
     -+				goto exit;
     -+			}
     ++			if (err < 0)
     ++				goto done;
      +
      +			err = reftable_new_reader(&rd, &src, name);
     -+			if (err < 0) {
     -+				goto exit;
     -+			}
     ++			if (err < 0)
     ++				goto done;
      +		}
      +
      +		new_tables[new_tables_len++] = rd;
     @@ reftable/stack.c (new)
      +	/* success! */
      +	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
      +					st->config.hash_id);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	new_tables = NULL;
      +	new_tables_len = 0;
     @@ reftable/stack.c (new)
      +		}
      +	}
      +
     -+exit:
     ++done:
      +	slice_release(&table_path);
      +	for (i = 0; i < new_tables_len; i++) {
      +		reader_close(new_tables[i]);
     @@ reftable/stack.c (new)
      +	time_t diff = a->tv_sec - b->tv_sec;
      +	int udiff = a->tv_usec - b->tv_usec;
      +
     -+	if (diff != 0) {
     ++	if (diff != 0)
      +		return diff;
     -+	}
      +
      +	return udiff;
      +}
     @@ reftable/stack.c (new)
      +	int err = gettimeofday(&deadline, NULL);
      +	int64_t delay = 0;
      +	int tries = 0;
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return err;
     -+	}
      +
      +	deadline.tv_sec += 3;
      +	while (true) {
     @@ reftable/stack.c (new)
      +	char **names = NULL;
      +	int err = read_lines(st->list_file, &names);
      +	int i = 0;
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return err;
     -+	}
      +
      +	for (i = 0; i < st->merged->stack_len; i++) {
      +		if (names[i] == NULL) {
      +			err = 1;
     -+			goto exit;
     ++			goto done;
      +		}
      +
      +		if (strcmp(st->merged->stack[i]->name, names[i])) {
      +			err = 1;
     -+			goto exit;
     ++			goto done;
      +		}
      +	}
      +
      +	if (names[st->merged->stack_len] != NULL) {
      +		err = 1;
     -+		goto exit;
     ++		goto done;
      +	}
      +
     -+exit:
     ++done:
      +	free_names(names);
      +	return err;
      +}
     @@ reftable/stack.c (new)
      +int reftable_stack_reload(struct reftable_stack *st)
      +{
      +	int err = stack_uptodate(st);
     -+	if (err > 0) {
     ++	if (err > 0)
      +		return reftable_stack_reload_maybe_reuse(st, true);
     -+	}
      +	return err;
      +}
      +
     @@ reftable/stack.c (new)
      +		return err;
      +	}
      +
     -+	if (!st->disable_auto_compact) {
     ++	if (!st->disable_auto_compact)
      +		return reftable_stack_auto_compact(st);
     -+	}
      +
      +	return 0;
      +}
     @@ reftable/stack.c (new)
      +	uint64_t next_update_index;
      +};
      +
     ++#define REFTABLE_ADDITION_INIT               \
     ++	{                                    \
     ++		.lock_file_name = SLICE_INIT \
     ++	}
     ++
      +static int reftable_stack_init_addition(struct reftable_addition *add,
      +					struct reftable_stack *st)
      +{
     @@ reftable/stack.c (new)
      +		} else {
      +			err = REFTABLE_IO_ERROR;
      +		}
     -+		goto exit;
     ++		goto done;
      +	}
      +	err = stack_uptodate(st);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	if (err > 1) {
      +		err = REFTABLE_LOCK_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	add->next_update_index = reftable_stack_next_update_index(st);
     -+exit:
     ++done:
      +	if (err) {
      +		reftable_addition_close(add);
      +	}
     @@ reftable/stack.c (new)
      +void reftable_addition_close(struct reftable_addition *add)
      +{
      +	int i = 0;
     -+	struct slice nm = { 0 };
     ++	struct slice nm = SLICE_INIT;
      +	for (i = 0; i < add->new_tables_len; i++) {
      +		slice_set_string(&nm, add->stack->list_file);
      +		slice_addstr(&nm, "/");
     @@ reftable/stack.c (new)
      +
      +int reftable_addition_commit(struct reftable_addition *add)
      +{
     -+	struct slice table_list = { 0 };
     ++	struct slice table_list = SLICE_INIT;
      +	int i = 0;
      +	int err = 0;
     -+	if (add->new_tables_len == 0) {
     -+		goto exit;
     -+	}
     ++	if (add->new_tables_len == 0)
     ++		goto done;
      +
      +	for (i = 0; i < add->stack->merged->stack_len; i++) {
      +		slice_addstr(&table_list, add->stack->merged->stack[i]->name);
     @@ reftable/stack.c (new)
      +	slice_release(&table_list);
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	err = close(add->lock_file_fd);
      +	add->lock_file_fd = 0;
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	err = rename(slice_as_string(&add->lock_file_name),
      +		     add->stack->list_file);
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	err = reftable_stack_reload(add->stack);
      +
     -+exit:
     ++done:
      +	reftable_addition_close(add);
      +	return err;
      +}
     @@ reftable/stack.c (new)
      +				struct reftable_stack *st)
      +{
      +	int err = 0;
     ++	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
      +	*dest = reftable_calloc(sizeof(**dest));
     ++	**dest = empty;
      +	err = reftable_stack_init_addition(*dest, st);
      +	if (err) {
      +		reftable_free(*dest);
     @@ reftable/stack.c (new)
      +		  int (*write_table)(struct reftable_writer *wr, void *arg),
      +		  void *arg)
      +{
     -+	struct reftable_addition add = { 0 };
     ++	struct reftable_addition add = REFTABLE_ADDITION_INIT;
      +	int err = reftable_stack_init_addition(&add, st);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +	if (err > 0) {
      +		err = REFTABLE_LOCK_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	err = reftable_addition_add(&add, write_table, arg);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	err = reftable_addition_commit(&add);
     -+exit:
     ++done:
      +	reftable_addition_close(&add);
      +	return err;
      +}
     @@ reftable/stack.c (new)
      +					     void *arg),
      +			  void *arg)
      +{
     -+	struct slice temp_tab_file_name = { 0 };
     -+	struct slice tab_file_name = { 0 };
     -+	struct slice next_name = { 0 };
     ++	struct slice temp_tab_file_name = SLICE_INIT;
     ++	struct slice tab_file_name = SLICE_INIT;
     ++	struct slice next_name = SLICE_INIT;
      +	struct reftable_writer *wr = NULL;
      +	int err = 0;
      +	int tab_fd = 0;
     @@ reftable/stack.c (new)
      +	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
      +	if (tab_fd < 0) {
      +		err = REFTABLE_IO_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
      +				 &add->stack->config);
      +	err = write_table(wr, arg);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	err = reftable_writer_close(wr);
      +	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
      +		err = 0;
     -+		goto exit;
     -+	}
     -+	if (err < 0) {
     -+		goto exit;
     ++		goto done;
      +	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	err = close(tab_fd);
      +	tab_fd = 0;
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	err = stack_check_addition(add->stack,
      +				   slice_as_string(&temp_tab_file_name));
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	if (wr->min_update_index < add->next_update_index) {
      +		err = REFTABLE_API_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	format_name(&next_name, wr->min_update_index, wr->max_update_index);
     @@ reftable/stack.c (new)
      +		     slice_as_string(&tab_file_name));
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	add->new_tables = reftable_realloc(add->new_tables,
     @@ reftable/stack.c (new)
      +						   (add->new_tables_len + 1));
      +	add->new_tables[add->new_tables_len] = slice_to_string(next_name);
      +	add->new_tables_len++;
     -+exit:
     ++done:
      +	if (tab_fd > 0) {
      +		close(tab_fd);
      +		tab_fd = 0;
     @@ reftable/stack.c (new)
      +uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
      +{
      +	int sz = st->merged->stack_len;
     -+	if (sz > 0) {
     ++	if (sz > 0)
      +		return reftable_reader_max_update_index(
      +			       st->merged->stack[sz - 1]) +
      +		       1;
     -+	}
      +	return 1;
      +}
      +
     @@ reftable/stack.c (new)
      +				struct slice *temp_tab,
      +				struct reftable_log_expiry_config *config)
      +{
     -+	struct slice next_name = { 0 };
     ++	struct slice next_name = SLICE_INIT;
      +	int tab_fd = -1;
      +	struct reftable_writer *wr = NULL;
      +	int err = 0;
     @@ reftable/stack.c (new)
      +	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
      +
      +	err = stack_write_compact(st, wr, first, last, config);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +	err = reftable_writer_close(wr);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	err = close(tab_fd);
      +	tab_fd = 0;
      +
     -+exit:
     ++done:
      +	reftable_writer_free(wr);
      +	if (tab_fd > 0) {
      +		close(tab_fd);
     @@ reftable/stack.c (new)
      +					st->config.hash_id);
      +	if (err < 0) {
      +		reftable_free(subtabs);
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	err = reftable_merged_table_seek_ref(mt, &it, "");
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	while (true) {
      +		err = reftable_iterator_next_ref(&it, &ref);
     @@ reftable/stack.c (new)
      +	reftable_iterator_destroy(&it);
      +
      +	err = reftable_merged_table_seek_log(mt, &it, "");
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	while (true) {
      +		err = reftable_iterator_next_log(&it, &log);
     @@ reftable/stack.c (new)
      +		entries++;
      +	}
      +
     -+exit:
     ++done:
      +	reftable_iterator_destroy(&it);
      +	if (mt != NULL) {
      +		merged_table_clear(mt);
     @@ reftable/stack.c (new)
      +static int stack_compact_range(struct reftable_stack *st, int first, int last,
      +			       struct reftable_log_expiry_config *expiry)
      +{
     -+	struct slice temp_tab_file_name = { 0 };
     -+	struct slice new_table_name = { 0 };
     -+	struct slice lock_file_name = { 0 };
     -+	struct slice ref_list_contents = { 0 };
     -+	struct slice new_table_path = { 0 };
     ++	struct slice temp_tab_file_name = SLICE_INIT;
     ++	struct slice new_table_name = SLICE_INIT;
     ++	struct slice lock_file_name = SLICE_INIT;
     ++	struct slice ref_list_contents = SLICE_INIT;
     ++	struct slice new_table_path = SLICE_INIT;
      +	int err = 0;
      +	bool have_lock = false;
      +	int lock_file_fd = 0;
     @@ reftable/stack.c (new)
      +
      +	if (first > last || (expiry == NULL && first == last)) {
      +		err = 0;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	st->stats.attempts++;
     @@ reftable/stack.c (new)
      +		} else {
      +			err = REFTABLE_IO_ERROR;
      +		}
     -+		goto exit;
     ++		goto done;
      +	}
      +	/* Don't want to write to the lock for now.  */
      +	close(lock_file_fd);
     @@ reftable/stack.c (new)
      +
      +	have_lock = true;
      +	err = stack_uptodate(st);
     -+	if (err != 0) {
     -+		goto exit;
     -+	}
     ++	if (err != 0)
     ++		goto done;
      +
      +	for (i = first, j = 0; i <= last; i++) {
     -+		struct slice subtab_file_name = { 0 };
     -+		struct slice subtab_lock = { 0 };
     ++		struct slice subtab_file_name = SLICE_INIT;
     ++		struct slice subtab_lock = SLICE_INIT;
     ++		int sublock_file_fd = -1;
     ++
      +		slice_set_string(&subtab_file_name, st->reftable_dir);
      +		slice_addstr(&subtab_file_name, "/");
      +		slice_addstr(&subtab_file_name,
     @@ reftable/stack.c (new)
      +		slice_copy(&subtab_lock, subtab_file_name);
      +		slice_addstr(&subtab_lock, ".lock");
      +
     -+		{
     -+			int sublock_file_fd =
     -+				open(slice_as_string(&subtab_lock),
     -+				     O_EXCL | O_CREAT | O_WRONLY, 0644);
     -+			if (sublock_file_fd > 0) {
     -+				close(sublock_file_fd);
     -+			} else if (sublock_file_fd < 0) {
     -+				if (errno == EEXIST) {
     -+					err = 1;
     -+				} else {
     -+					err = REFTABLE_IO_ERROR;
     -+				}
     ++		sublock_file_fd = open(slice_as_string(&subtab_lock),
     ++				       O_EXCL | O_CREAT | O_WRONLY, 0644);
     ++		if (sublock_file_fd > 0) {
     ++			close(sublock_file_fd);
     ++		} else if (sublock_file_fd < 0) {
     ++			if (errno == EEXIST) {
     ++				err = 1;
     ++			} else {
     ++				err = REFTABLE_IO_ERROR;
      +			}
      +		}
      +
     @@ reftable/stack.c (new)
      +			(char *)slice_as_string(&subtab_file_name);
      +		j++;
      +
     -+		if (err != 0) {
     -+			goto exit;
     -+		}
     ++		if (err != 0)
     ++			goto done;
      +	}
      +
      +	err = unlink(slice_as_string(&lock_file_name));
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +	have_lock = false;
      +
      +	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
     @@ reftable/stack.c (new)
      +	if (is_empty_table) {
      +		err = 0;
      +	}
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	lock_file_fd = open(slice_as_string(&lock_file_name),
      +			    O_EXCL | O_CREAT | O_WRONLY, 0644);
     @@ reftable/stack.c (new)
      +		} else {
      +			err = REFTABLE_IO_ERROR;
      +		}
     -+		goto exit;
     ++		goto done;
      +	}
      +	have_lock = true;
      +
     @@ reftable/stack.c (new)
      +			     slice_as_string(&new_table_path));
      +		if (err < 0) {
      +			err = REFTABLE_IO_ERROR;
     -+			goto exit;
     ++			goto done;
      +		}
      +	}
      +
     @@ reftable/stack.c (new)
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
      +		unlink(slice_as_string(&new_table_path));
     -+		goto exit;
     ++		goto done;
      +	}
      +	err = close(lock_file_fd);
      +	lock_file_fd = 0;
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
      +		unlink(slice_as_string(&new_table_path));
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	err = rename(slice_as_string(&lock_file_name), st->list_file);
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
      +		unlink(slice_as_string(&new_table_path));
     -+		goto exit;
     ++		goto done;
      +	}
      +	have_lock = false;
      +
     @@ reftable/stack.c (new)
      +		listp++;
      +	}
      +
     -+exit:
     ++done:
      +	free_names(delete_on_success);
      +
      +	listp = subtable_locks;
     @@ reftable/stack.c (new)
      +int fastlog2(uint64_t sz)
      +{
      +	int l = 0;
     -+	if (sz == 0) {
     ++	if (sz == 0)
      +		return 0;
     -+	}
      +	for (; sz; sz /= 2) {
      +		l++;
      +	}
     @@ reftable/stack.c (new)
      +	struct segment seg =
      +		suggest_compaction_segment(sizes, st->merged->stack_len);
      +	reftable_free(sizes);
     -+	if (segment_size(&seg) > 0) {
     ++	if (segment_size(&seg) > 0)
      +		return stack_compact_range_stats(st, seg.start, seg.end - 1,
      +						 NULL);
     -+	}
      +
      +	return 0;
      +}
     @@ reftable/stack.c (new)
      +	struct reftable_iterator it = { 0 };
      +	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
      +	int err = reftable_merged_table_seek_log(mt, &it, refname);
     -+	if (err) {
     -+		goto exit;
     -+	}
     ++	if (err)
     ++		goto done;
      +
      +	err = reftable_iterator_next_log(&it, log);
     -+	if (err) {
     -+		goto exit;
     -+	}
     ++	if (err)
     ++		goto done;
      +
      +	if (strcmp(log->ref_name, refname) ||
      +	    reftable_log_record_is_deletion(log)) {
      +		err = 1;
     -+		goto exit;
     ++		goto done;
      +	}
      +
     -+exit:
     ++done:
      +	if (err) {
      +		reftable_log_record_clear(log);
      +	}
     @@ reftable/stack.c (new)
      +	int len = 0;
      +	int i = 0;
      +
     -+	if (st->config.skip_name_check) {
     ++	if (st->config.skip_name_check)
      +		return 0;
     -+	}
      +
      +	err = reftable_block_source_from_file(&src, new_tab_name);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	err = reftable_new_reader(&rd, &src, new_tab_name);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	err = reftable_reader_seek_ref(rd, &it, "");
      +	if (err > 0) {
      +		err = 0;
     -+		goto exit;
     -+	}
     -+	if (err < 0) {
     -+		goto exit;
     ++		goto done;
      +	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	while (true) {
      +		struct reftable_ref_record ref = { 0 };
     @@ reftable/stack.c (new)
      +		if (err > 0) {
      +			break;
      +		}
     -+		if (err < 0) {
     -+			goto exit;
     -+		}
     ++		if (err < 0)
     ++			goto done;
      +
      +		if (len >= cap) {
      +			cap = 2 * cap + 1;
     @@ reftable/stack.c (new)
      +
      +	err = validate_ref_record_addition(tab, refs, len);
      +
     -+exit:
     ++done:
      +	for (i = 0; i < len; i++) {
      +		reftable_ref_record_clear(&refs[i]);
      +	}
     @@ reftable/tree.c (new)
      +	}
      +
      +	res = compare(key, (*rootp)->key);
     -+	if (res < 0) {
     ++	if (res < 0)
      +		return tree_search(key, &(*rootp)->left, compare, insert);
     -+	} else if (res > 0) {
     ++	else if (res > 0)
      +		return tree_search(key, &(*rootp)->right, compare, insert);
     -+	}
      +	return *rootp;
      +}
      +
     @@ reftable/writer.c (new)
      +	if (w->pending_padding > 0) {
      +		byte *zeroed = reftable_calloc(w->pending_padding);
      +		int n = w->write(w->write_arg, zeroed, w->pending_padding);
     -+		if (n < 0) {
     ++		if (n < 0)
      +			return n;
     -+		}
      +
      +		w->pending_padding = 0;
      +		reftable_free(zeroed);
     @@ reftable/writer.c (new)
      +
      +	w->pending_padding = padding;
      +	n = w->write(w->write_arg, data, len);
     -+	if (n < 0) {
     ++	if (n < 0)
      +		return n;
     -+	}
      +	n += padding;
      +	return 0;
      +}
     @@ reftable/writer.c (new)
      +		block_start = header_size(writer_version(w));
      +	}
      +
     ++	slice_release(&w->last_key);
      +	block_writer_init(&w->block_writer_data, typ, w->block,
      +			  w->opts.block_size, block_start,
      +			  hash_size(w->opts.hash_id));
     @@ reftable/writer.c (new)
      +{
      +	struct reftable_writer *wp =
      +		reftable_calloc(sizeof(struct reftable_writer));
     ++	slice_init(&wp->block_writer_data.last_key);
      +	options_set_defaults(opts);
      +	if (opts->block_size >= (1 << 24)) {
      +		/* TODO - error return? */
      +		abort();
      +	}
     ++	wp->last_key = reftable_empty_slice;
      +	wp->block = reftable_calloc(opts->block_size);
      +	wp->write = writer_func;
      +	wp->write_arg = writer_arg;
     @@ reftable/writer.c (new)
      +	int offset_len;
      +	int offset_cap;
      +};
     ++#define OBJ_INDEX_TREE_NODE_INIT   \
     ++	{                          \
     ++		.hash = SLICE_INIT \
     ++	}
      +
      +static int obj_index_tree_node_compare(const void *a, const void *b)
      +{
     @@ reftable/writer.c (new)
      +					     &obj_index_tree_node_compare, 0);
      +	struct obj_index_tree_node *key = NULL;
      +	if (node == NULL) {
     -+		key = reftable_calloc(sizeof(struct obj_index_tree_node));
     ++		struct obj_index_tree_node empty = OBJ_INDEX_TREE_NODE_INIT;
     ++		key = reftable_malloc(sizeof(struct obj_index_tree_node));
     ++		*key = empty;
     ++
      +		slice_copy(&key->hash, hash);
      +		tree_search((void *)key, &w->obj_index_tree,
      +			    &obj_index_tree_node_compare, 1);
     @@ reftable/writer.c (new)
      +			     struct reftable_record *rec)
      +{
      +	int result = -1;
     -+	struct slice key = { 0 };
     ++	struct slice key = SLICE_INIT;
      +	int err = 0;
      +	reftable_record_key(rec, &key);
     -+	if (slice_cmp(w->last_key, key) >= 0) {
     -+		goto exit;
     -+	}
     ++	if (slice_cmp(w->last_key, key) >= 0)
     ++		goto done;
      +
      +	slice_copy(&w->last_key, key);
      +	if (w->block_writer == NULL) {
     @@ reftable/writer.c (new)
      +
      +	if (block_writer_add(w->block_writer, rec) == 0) {
      +		result = 0;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	err = writer_flush_block(w);
      +	if (err < 0) {
      +		result = err;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	writer_reinit_block_writer(w, reftable_record_type(rec));
      +	err = block_writer_add(w->block_writer, rec);
      +	if (err < 0) {
      +		result = err;
     -+		goto exit;
     ++		goto done;
      +	}
      +
      +	result = 0;
     -+exit:
     ++done:
      +	slice_release(&key);
      +	return result;
      +}
     @@ reftable/writer.c (new)
      +	struct reftable_ref_record copy = *ref;
      +	int err = 0;
      +
     -+	if (ref->ref_name == NULL) {
     ++	if (ref->ref_name == NULL)
      +		return REFTABLE_API_ERROR;
     -+	}
      +	if (ref->update_index < w->min_update_index ||
     -+	    ref->update_index > w->max_update_index) {
     ++	    ref->update_index > w->max_update_index)
      +		return REFTABLE_API_ERROR;
     -+	}
      +
      +	reftable_record_from_ref(&rec, &copy);
      +	copy.update_index -= w->min_update_index;
      +	err = writer_add_record(w, &rec);
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return err;
     -+	}
      +
      +	if (!w->opts.skip_index_objects && ref->value != NULL) {
      +		struct slice h = {
      +			.buf = ref->value,
      +			.len = hash_size(w->opts.hash_id),
     ++			.canary = SLICE_CANARY,
      +		};
      +
      +		writer_index_hash(w, h);
      +	}
     ++
      +	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
      +		struct slice h = {
      +			.buf = ref->target_value,
      +			.len = hash_size(w->opts.hash_id),
     ++			.canary = SLICE_CANARY,
      +		};
      +		writer_index_hash(w, h);
      +	}
     @@ reftable/writer.c (new)
      +{
      +	struct reftable_record rec = { 0 };
      +	int err;
     -+	if (log->ref_name == NULL) {
     ++	if (log->ref_name == NULL)
      +		return REFTABLE_API_ERROR;
     -+	}
      +
      +	if (w->block_writer != NULL &&
      +	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
      +		int err = writer_finish_public_section(w);
     -+		if (err < 0) {
     ++		if (err < 0)
      +			return err;
     -+		}
      +	}
      +
      +	w->next -= w->pending_padding;
     @@ reftable/writer.c (new)
      +	int err = writer_flush_block(w);
      +	int i = 0;
      +	struct reftable_block_stats *bstats = NULL;
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return err;
     -+	}
      +
      +	while (w->index_len > threshold) {
      +		struct reftable_index_record *idx = NULL;
     @@ reftable/writer.c (new)
      +				continue;
      +			}
      +
     -+			{
     -+				int err = writer_flush_block(w);
     -+				if (err < 0) {
     -+					return err;
     -+				}
     -+			}
     ++			err = writer_flush_block(w);
     ++			if (err < 0)
     ++				return err;
      +
      +			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
      +
     @@ reftable/writer.c (new)
      +	writer_clear_index(w);
      +
      +	err = writer_flush_block(w);
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return err;
     -+	}
      +
      +	bstats = writer_reftable_block_stats(w, typ);
      +	bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
     @@ reftable/writer.c (new)
      +		.offset_len = entry->offset_len,
      +	};
      +	struct reftable_record rec = { 0 };
     -+	if (arg->err < 0) {
     -+		goto exit;
     -+	}
     ++	if (arg->err < 0)
     ++		goto done;
      +
      +	reftable_record_from_obj(&rec, &obj_rec);
      +	arg->err = block_writer_add(arg->w->block_writer, &rec);
     -+	if (arg->err == 0) {
     -+		goto exit;
     -+	}
     ++	if (arg->err == 0)
     ++		goto done;
      +
      +	arg->err = writer_flush_block(arg->w);
     -+	if (arg->err < 0) {
     -+		goto exit;
     -+	}
     ++	if (arg->err < 0)
     ++		goto done;
      +
      +	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
      +	arg->err = block_writer_add(arg->w->block_writer, &rec);
     -+	if (arg->err == 0) {
     -+		goto exit;
     -+	}
     ++	if (arg->err == 0)
     ++		goto done;
      +	obj_rec.offset_len = 0;
      +	arg->err = block_writer_add(arg->w->block_writer, &rec);
      +
      +	/* Should be able to write into a fresh block. */
      +	assert(arg->err == 0);
      +
     -+exit:;
     ++done:;
      +}
      +
      +static void object_record_free(void *void_arg, void *key)
     @@ reftable/writer.c (new)
      +		infix_walk(w->obj_index_tree, &write_object_record, &closure);
      +	}
      +
     -+	if (closure.err < 0) {
     ++	if (closure.err < 0)
      +		return closure.err;
     -+	}
      +	return writer_finish_section(w);
      +}
      +
     @@ reftable/writer.c (new)
      +	byte typ = 0;
      +	int err = 0;
      +
     -+	if (w->block_writer == NULL) {
     ++	if (w->block_writer == NULL)
      +		return 0;
     -+	}
      +
      +	typ = block_writer_type(w->block_writer);
      +	err = writer_finish_section(w);
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return err;
     -+	}
      +	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
      +	    w->stats.ref_stats.index_blocks > 0) {
      +		err = writer_dump_object_index(w);
     -+		if (err < 0) {
     ++		if (err < 0)
      +			return err;
     -+		}
      +	}
      +
      +	if (w->obj_index_tree != NULL) {
     @@ reftable/writer.c (new)
      +	byte *p = footer;
      +	int err = writer_finish_public_section(w);
      +	int empty_table = w->next == 0;
     -+	if (err != 0) {
     -+		goto exit;
     -+	}
     ++	if (err != 0)
     ++		goto done;
      +	w->pending_padding = 0;
      +	if (empty_table) {
      +		/* Empty tables need a header anyway. */
      +		byte header[28];
      +		int n = writer_write_header(w, header);
      +		err = padded_write(w, header, n, 0);
     -+		if (err < 0) {
     -+			goto exit;
     -+		}
     ++		if (err < 0)
     ++			goto done;
      +	}
      +
      +	p += writer_write_header(w, footer);
     @@ reftable/writer.c (new)
      +	p += 4;
      +
      +	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
     -+	if (err < 0) {
     -+		goto exit;
     -+	}
     ++	if (err < 0)
     ++		goto done;
      +
      +	if (empty_table) {
      +		err = REFTABLE_EMPTY_TABLE_ERROR;
     -+		goto exit;
     ++		goto done;
      +	}
      +
     -+exit:
     ++done:
      +	/* free up memory. */
      +	block_writer_clear(&w->block_writer_data);
      +	writer_clear_index(w);
     @@ reftable/writer.c (new)
      +	int raw_bytes = block_writer_finish(w->block_writer);
      +	int padding = 0;
      +	int err = 0;
     -+	struct reftable_index_record ir = { 0 };
     -+	if (raw_bytes < 0) {
     ++	struct reftable_index_record ir = { .last_key = SLICE_INIT };
     ++	if (raw_bytes < 0)
      +		return raw_bytes;
     -+	}
      +
      +	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
      +		padding = w->opts.block_size - raw_bytes;
     @@ reftable/writer.c (new)
      +	}
      +
      +	err = padded_write(w, w->block, raw_bytes, padding);
     -+	if (err < 0) {
     ++	if (err < 0)
      +		return err;
     -+	}
      +
      +	if (w->index_cap == w->index_len) {
      +		w->index_cap = 2 * w->index_cap + 1;
     @@ reftable/writer.c (new)
      +
      +int writer_flush_block(struct reftable_writer *w)
      +{
     -+	if (w->block_writer == NULL) {
     ++	if (w->block_writer == NULL)
      +		return 0;
     -+	}
     -+	if (w->block_writer->entries == 0) {
     ++	if (w->block_writer->entries == 0)
      +		return 0;
     -+	}
      +	return writer_flush_nonempty_block(w);
      +}
      +
  6:  865c2c4567a ! 10:  a86c3753717 Reftable support for git-core
     @@ Commit message
      
          TODO:
      
     -    * Resolve spots marked with XXX
     +    * Fix worktree commands
     +
     +    * Spots marked XXX
      
          Example use: see t/t0031-reftable.sh
      
     @@ refs/reftable-backend.c (new)
      +	strbuf_reset(&sb);
      +
      +	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
     ++	assert(refs->err != REFTABLE_API_ERROR);
      +	strbuf_release(&sb);
      +	return ref_store;
      +}
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +done:
     ++	assert(err != REFTABLE_API_ERROR);
      +	strbuf_release(&referent);
      +	return err;
      +}
     @@ refs/reftable-backend.c (new)
      +	transaction->state = REF_TRANSACTION_PREPARED;
      +
      +done:
     ++	assert(err != REFTABLE_API_ERROR);
      +	if (err < 0) {
      +		transaction->state = REF_TRANSACTION_CLOSED;
      +		strbuf_addf(errbuf, "reftable: transaction prepare: %s",
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +done:
     ++	assert(err != REFTABLE_API_ERROR);
      +	free(logs);
      +	free(sorted);
      +	return err;
     @@ refs/reftable-backend.c (new)
      +	err = reftable_addition_commit(add);
      +
      +done:
     ++	assert(err != REFTABLE_API_ERROR);
      +	reftable_addition_destroy(add);
      +	transaction->state = REF_TRANSACTION_CLOSED;
      +	transaction->backend_data = NULL;
     @@ refs/reftable-backend.c (new)
      +				    struct ref_transaction *transaction,
      +				    struct strbuf *errmsg)
      +{
     ++	int err = reftable_transaction_prepare(ref_store, transaction, errmsg);
     ++	if (err)
     ++		return err;
     ++
      +	return reftable_transaction_finish(ref_store, transaction, errmsg);
      +}
      +
     @@ refs/reftable-backend.c (new)
      +
      +	err = reftable_writer_add_ref(writer, &write_ref);
      +done:
     ++	assert(err != REFTABLE_API_ERROR);
      +	reftable_ref_record_clear(&read_ref);
      +	return err;
      +}
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +done:
     ++	assert(err != REFTABLE_API_ERROR);
      +	reftable_addition_destroy(add);
      +	return err;
      +}
     @@ refs/reftable-backend.c (new)
      +	if (err < 0) {
      +		goto done;
      +	}
     ++
     ++	string_list_sort(refnames);
      +	err = reftable_stack_reload(refs->stack);
      +	if (err) {
      +		goto done;
      +	}
      +	err = reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
      +done:
     ++	assert(err != REFTABLE_API_ERROR);
      +	return err;
      +}
      +
     @@ refs/reftable-backend.c (new)
      +	}
      +	err = reftable_stack_add(refs->stack, &write_create_symref_table, &arg);
      +done:
     ++	assert(err != REFTABLE_API_ERROR);
      +	return err;
      +}
      +
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +done:
     ++	assert(err != REFTABLE_API_ERROR);
      +	reftable_ref_record_clear(&ref);
      +	return err;
      +}
     @@ refs/reftable-backend.c (new)
      +
      +	err = reftable_stack_add(refs->stack, &write_rename_table, &arg);
      +done:
     ++	assert(err != REFTABLE_API_ERROR);
      +	return err;
      +}
      +
     @@ refs/reftable-backend.c (new)
      +	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
      +
      +done:
     ++	assert(err != REFTABLE_API_ERROR);
      +	reftable_log_record_clear(&log);
      +	reftable_iterator_destroy(&it);
      +	clear_log_tombstones(&arg);
     @@ refs/reftable-backend.c (new)
      +		err = -1;
      +	}
      +done:
     ++	assert(err != REFTABLE_API_ERROR);
      +	reftable_ref_record_clear(&ref);
      +	return err;
      +}
     @@ t/t0031-reftable.sh (new)
      +	test_cmp expect actual
      +'
      +
     ++test_expect_success 'clone calls transaction_initial_commit' '
     ++	test_commit message1 file1 &&
     ++	git clone . cloned &&
     ++	(test  -f cloned/file1 || echo "Fixme.")
     ++'
     ++
      +test_expect_success 'basic operation of reftable storage: commit, show-ref' '
      +	initialize &&
      +	test_commit file &&
  7:  6b248d5fdb4 = 11:  d3613c2ff53 Add GIT_DEBUG_REFS debugging mechanism
  8:  54102355ce7 = 12:  9b98ed614ec vcxproj: adjust for the reftable changes
  9:  7764ebf0956 ! 13:  5e401e4f1ac Add reftable testing infrastructure
     @@ Commit message
           * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
           * t1405 - inspecs .git/ directly.
      
     -    t6030-bisect-porcelain.sh                - 62 of 72
     +    Worst offenders:
     +
     +    t1400-update-ref.sh                      - 82 of 185
          t2400-worktree-add.sh                    - 58 of 69
     -    t3200-branch.sh                          - 58 of 145
     -    t7406-submodule-update.sh                - 54 of 54
     -    t5601-clone.sh                           - 51 of 105
     -    t9903-bash-prompt.sh                     - 50 of 66
          t1404-update-ref-errors.sh               - 44 of 53
     -    t5510-fetch.sh                           - 40 of 171
     -    t7400-submodule-basic.sh                 - 38 of 111
          t3514-cherry-pick-revert-gpg.sh          - 36 of 36
     +    t5541-http-push-smart.sh                 - 29 of 38
     +    t6003-rev-list-topo-order.sh             - 29 of 35
     +    t3420-rebase-autostash.sh                - 28 of 42
     +    t6120-describe.sh                        - 21 of 82
     +    t3430-rebase-merges.sh                   - 18 of 24
     +    t2018-checkout-branch.sh                 - 15 of 22
          ..
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
     @@ t/t9903-bash-prompt.sh: test_description='test git-specific bash prompt function
       actual="$TRASH_DIRECTORY/actual"
      
       ## t/test-lib.sh ##
     -@@ t/test-lib.sh: FreeBSD)
     +@@ t/test-lib.sh: parisc* | hppa*)
       	;;
       esac
       

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v15 01/13] Write pseudorefs through ref backends.
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                             ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 02/13] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
                                               ` (14 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Pseudorefs store transient data in in the repository. Examples are HEAD,
CHERRY_PICK_HEAD, etc.

These refs have always been read through the ref backends, but they were written
in a one-off routine that wrote an object ID or symref directly into
.git/<pseudo_ref_name>.

This causes problems when introducing a new ref storage backend. To remedy this,
extend the ref backend implementation with a write_pseudoref_fn and
update_pseudoref_fn.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c                | 119 ++++++++----------------------------------
 refs.h                |  11 ++++
 refs/files-backend.c  | 114 +++++++++++++++++++++++++++++++++++++++-
 refs/packed-backend.c |  21 +++++++-
 refs/refs-internal.h  |  12 +++++
 5 files changed, 178 insertions(+), 99 deletions(-)

diff --git a/refs.c b/refs.c
index 224ff66c7bb..12908066b13 100644
--- a/refs.c
+++ b/refs.c
@@ -321,6 +321,12 @@ int ref_exists(const char *refname)
 	return refs_ref_exists(get_main_ref_store(the_repository), refname);
 }
 
+int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
+{
+	return refs_delete_pseudoref(get_main_ref_store(the_repository),
+				     pseudoref, old_oid);
+}
+
 static int filter_refs(const char *refname, const struct object_id *oid,
 			   int flags, void *data)
 {
@@ -739,101 +745,6 @@ long get_files_ref_lock_timeout_ms(void)
 	return timeout_ms;
 }
 
-static int write_pseudoref(const char *pseudoref, const struct object_id *oid,
-			   const struct object_id *old_oid, struct strbuf *err)
-{
-	const char *filename;
-	int fd;
-	struct lock_file lock = LOCK_INIT;
-	struct strbuf buf = STRBUF_INIT;
-	int ret = -1;
-
-	if (!oid)
-		return 0;
-
-	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
-
-	filename = git_path("%s", pseudoref);
-	fd = hold_lock_file_for_update_timeout(&lock, filename, 0,
-					       get_files_ref_lock_timeout_ms());
-	if (fd < 0) {
-		strbuf_addf(err, _("could not open '%s' for writing: %s"),
-			    filename, strerror(errno));
-		goto done;
-	}
-
-	if (old_oid) {
-		struct object_id actual_old_oid;
-
-		if (read_ref(pseudoref, &actual_old_oid)) {
-			if (!is_null_oid(old_oid)) {
-				strbuf_addf(err, _("could not read ref '%s'"),
-					    pseudoref);
-				rollback_lock_file(&lock);
-				goto done;
-			}
-		} else if (is_null_oid(old_oid)) {
-			strbuf_addf(err, _("ref '%s' already exists"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		} else if (!oideq(&actual_old_oid, old_oid)) {
-			strbuf_addf(err, _("unexpected object ID when writing '%s'"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		}
-	}
-
-	if (write_in_full(fd, buf.buf, buf.len) < 0) {
-		strbuf_addf(err, _("could not write to '%s'"), filename);
-		rollback_lock_file(&lock);
-		goto done;
-	}
-
-	commit_lock_file(&lock);
-	ret = 0;
-done:
-	strbuf_release(&buf);
-	return ret;
-}
-
-static int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
-{
-	const char *filename;
-
-	filename = git_path("%s", pseudoref);
-
-	if (old_oid && !is_null_oid(old_oid)) {
-		struct lock_file lock = LOCK_INIT;
-		int fd;
-		struct object_id actual_old_oid;
-
-		fd = hold_lock_file_for_update_timeout(
-				&lock, filename, 0,
-				get_files_ref_lock_timeout_ms());
-		if (fd < 0) {
-			error_errno(_("could not open '%s' for writing"),
-				    filename);
-			return -1;
-		}
-		if (read_ref(pseudoref, &actual_old_oid))
-			die(_("could not read ref '%s'"), pseudoref);
-		if (!oideq(&actual_old_oid, old_oid)) {
-			error(_("unexpected object ID when deleting '%s'"),
-			      pseudoref);
-			rollback_lock_file(&lock);
-			return -1;
-		}
-
-		unlink(filename);
-		rollback_lock_file(&lock);
-	} else {
-		unlink(filename);
-	}
-
-	return 0;
-}
 
 int refs_delete_ref(struct ref_store *refs, const char *msg,
 		    const char *refname,
@@ -845,7 +756,7 @@ int refs_delete_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		return delete_pseudoref(refname, old_oid);
+		return refs_delete_pseudoref(refs, refname, old_oid);
 	}
 
 	transaction = ref_store_transaction_begin(refs, &err);
@@ -1172,7 +1083,8 @@ int refs_update_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		ret = write_pseudoref(refname, new_oid, old_oid, &err);
+		ret = refs_write_pseudoref(refs, refname, new_oid, old_oid,
+					   &err);
 	} else {
 		t = ref_store_transaction_begin(refs, &err);
 		if (!t ||
@@ -1441,6 +1353,19 @@ int head_ref(each_ref_fn fn, void *cb_data)
 	return refs_head_ref(get_main_ref_store(the_repository), fn, cb_data);
 }
 
+int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			 const struct object_id *oid,
+			 const struct object_id *old_oid, struct strbuf *err)
+{
+	return refs->be->write_pseudoref(refs, pseudoref, oid, old_oid, err);
+}
+
+int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			  const struct object_id *old_oid)
+{
+	return refs->be->delete_pseudoref(refs, pseudoref, old_oid);
+}
+
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
 		const char *prefix, int trim, int flags)
diff --git a/refs.h b/refs.h
index e010f8aec28..4dad8f24914 100644
--- a/refs.h
+++ b/refs.h
@@ -732,6 +732,17 @@ int update_ref(const char *msg, const char *refname,
 	       const struct object_id *new_oid, const struct object_id *old_oid,
 	       unsigned int flags, enum action_on_err onerr);
 
+/* Pseudorefs (eg. HEAD, CHERRY_PICK_HEAD) have a separate routines for updating
+   and deletion as they cannot take part in normal transactional updates.
+   Pseudorefs should only be written for the main repository.
+*/
+int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			 const struct object_id *oid,
+			 const struct object_id *old_oid, struct strbuf *err);
+int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			  const struct object_id *old_oid);
+int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid);
+
 int parse_hide_refs_config(const char *var, const char *value, const char *);
 
 /*
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6516c7bc8c8..df7553f4cc3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -731,6 +731,115 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	return ret;
 }
 
+static int files_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "write_pseudoref");
+	int fd;
+	struct lock_file lock = LOCK_INIT;
+	struct strbuf filename = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = -1;
+
+	if (!oid)
+		return 0;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+	fd = hold_lock_file_for_update_timeout(&lock, filename.buf, 0,
+					       get_files_ref_lock_timeout_ms());
+	if (fd < 0) {
+		strbuf_addf(err, _("could not open '%s' for writing: %s"),
+			    buf.buf, strerror(errno));
+		goto done;
+	}
+
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(err, _("could not read ref '%s'"),
+					    pseudoref);
+				rollback_lock_file(&lock);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(err, _("ref '%s' already exists"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(err,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+	}
+
+	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
+	if (write_in_full(fd, buf.buf, buf.len) < 0) {
+		strbuf_addf(err, _("could not write to '%s'"), filename.buf);
+		rollback_lock_file(&lock);
+		goto done;
+	}
+
+	commit_lock_file(&lock);
+	ret = 0;
+done:
+	strbuf_release(&buf);
+	strbuf_release(&filename);
+	return ret;
+}
+
+static int files_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "delete_pseudoref");
+	struct strbuf filename = STRBUF_INIT;
+	int ret = -1;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+
+	if (old_oid && !is_null_oid(old_oid)) {
+		struct lock_file lock = LOCK_INIT;
+		int fd;
+		struct object_id actual_old_oid;
+
+		fd = hold_lock_file_for_update_timeout(
+			&lock, filename.buf, 0,
+			get_files_ref_lock_timeout_ms());
+		if (fd < 0) {
+			error_errno(_("could not open '%s' for writing"),
+				    filename.buf);
+			goto done;
+		}
+		if (read_ref(pseudoref, &actual_old_oid))
+			die(_("could not read ref '%s'"), pseudoref);
+		if (!oideq(&actual_old_oid, old_oid)) {
+			error(_("unexpected object ID when deleting '%s'"),
+			      pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+
+		unlink(filename.buf);
+		rollback_lock_file(&lock);
+	} else {
+		unlink(filename.buf);
+	}
+	ret = 0;
+done:
+	strbuf_release(&filename);
+	return ret;
+}
+
 struct files_ref_iterator {
 	struct ref_iterator base;
 
@@ -3189,6 +3298,9 @@ struct ref_storage_be refs_be_files = {
 	files_rename_ref,
 	files_copy_ref,
 
+	files_write_pseudoref,
+	files_delete_pseudoref,
+
 	files_ref_iterator_begin,
 	files_read_raw_ref,
 
@@ -3198,5 +3310,5 @@ struct ref_storage_be refs_be_files = {
 	files_reflog_exists,
 	files_create_reflog,
 	files_delete_reflog,
-	files_reflog_expire
+	files_reflog_expire,
 };
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 4458a0f69cc..08e8253a893 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1590,6 +1590,22 @@ static int packed_copy_ref(struct ref_store *ref_store,
 	BUG("packed reference store does not support copying references");
 }
 
+static int packed_write_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *oid,
+				  const struct object_id *old_oid,
+				  struct strbuf *err)
+{
+	BUG("packed reference store does not support writing pseudo-references");
+}
+
+static int packed_delete_pseudoref(struct ref_store *ref_store,
+				   const char *pseudoref,
+				   const struct object_id *old_oid)
+{
+	BUG("packed reference store does not support deleting pseudo-references");
+}
+
 static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_store)
 {
 	return empty_ref_iterator_begin();
@@ -1656,6 +1672,9 @@ struct ref_storage_be refs_be_packed = {
 	packed_rename_ref,
 	packed_copy_ref,
 
+	packed_write_pseudoref,
+	packed_delete_pseudoref,
+
 	packed_ref_iterator_begin,
 	packed_read_raw_ref,
 
@@ -1665,5 +1684,5 @@ struct ref_storage_be refs_be_packed = {
 	packed_reflog_exists,
 	packed_create_reflog,
 	packed_delete_reflog,
-	packed_reflog_expire
+	packed_reflog_expire,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 4271362d264..4c6d77898aa 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -556,6 +556,15 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
 			  const char *oldref, const char *newref,
 			  const char *logmsg);
 
+typedef int write_pseudoref_fn(struct ref_store *ref_store,
+			       const char *pseudoref,
+			       const struct object_id *oid,
+			       const struct object_id *old_oid,
+			       struct strbuf *err);
+typedef int delete_pseudoref_fn(struct ref_store *ref_store,
+				const char *pseudoref,
+				const struct object_id *old_oid);
+
 /*
  * Iterate over the references in `ref_store` whose names start with
  * `prefix`. `prefix` is matched as a literal string, without regard
@@ -655,6 +664,9 @@ struct ref_storage_be {
 	rename_ref_fn *rename_ref;
 	copy_ref_fn *copy_ref;
 
+	write_pseudoref_fn *write_pseudoref;
+	delete_pseudoref_fn *delete_pseudoref;
+
 	ref_iterator_begin_fn *iterator_begin;
 	read_raw_ref_fn *read_raw_ref;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v15 02/13] Make refs_ref_exists public
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 01/13] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                             ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 03/13] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
                                               ` (13 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 2 +-
 refs.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 12908066b13..812fee47108 100644
--- a/refs.c
+++ b/refs.c
@@ -311,7 +311,7 @@ int read_ref(const char *refname, struct object_id *oid)
 	return read_ref_full(refname, RESOLVE_REF_READING, oid, NULL);
 }
 
-static int refs_ref_exists(struct ref_store *refs, const char *refname)
+int refs_ref_exists(struct ref_store *refs, const char *refname)
 {
 	return !!refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING, NULL, NULL);
 }
diff --git a/refs.h b/refs.h
index 4dad8f24914..7aaa1226551 100644
--- a/refs.h
+++ b/refs.h
@@ -105,6 +105,8 @@ int refs_verify_refname_available(struct ref_store *refs,
 				  const struct string_list *skip,
 				  struct strbuf *err);
 
+int refs_ref_exists(struct ref_store *refs, const char *refname);
+
 int ref_exists(const char *refname);
 
 int should_autocreate_reflog(const char *refname);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v15 03/13] Treat BISECT_HEAD as a pseudo ref
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 01/13] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 02/13] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                             ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 20:52                               ` Junio C Hamano
  2020-05-28 19:46                             ` [PATCH v15 04/13] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
                                               ` (12 subsequent siblings)
  15 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Both the git-bisect.sh as bisect--helper inspected the file system directly.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/bisect--helper.c | 3 +--
 git-bisect.sh            | 4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index ec4996282e3..73f9324ad7d 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -13,7 +13,6 @@ static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
 static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
 static GIT_PATH_FUNC(git_path_bisect_ancestors_ok, "BISECT_ANCESTORS_OK")
 static GIT_PATH_FUNC(git_path_bisect_start, "BISECT_START")
-static GIT_PATH_FUNC(git_path_bisect_head, "BISECT_HEAD")
 static GIT_PATH_FUNC(git_path_bisect_log, "BISECT_LOG")
 static GIT_PATH_FUNC(git_path_head_name, "head-name")
 static GIT_PATH_FUNC(git_path_bisect_names, "BISECT_NAMES")
@@ -164,7 +163,7 @@ static int bisect_reset(const char *commit)
 		strbuf_addstr(&branch, commit);
 	}
 
-	if (!file_exists(git_path_bisect_head())) {
+	if (!ref_exists("BISECT_HEAD")) {
 		struct argv_array argv = ARGV_ARRAY_INIT;
 
 		argv_array_pushl(&argv, "checkout", branch.buf, "--", NULL);
diff --git a/git-bisect.sh b/git-bisect.sh
index 08a6ed57ddb..f03fbb18f00 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -41,7 +41,7 @@ TERM_GOOD=good
 
 bisect_head()
 {
-	if test -f "$GIT_DIR/BISECT_HEAD"
+	if git rev-parse --verify -q BISECT_HEAD > /dev/null
 	then
 		echo BISECT_HEAD
 	else
@@ -153,7 +153,7 @@ bisect_next() {
 	git bisect--helper --bisect-next-check $TERM_GOOD $TERM_BAD $TERM_GOOD|| exit
 
 	# Perform all bisection computation, display and checkout
-	git bisect--helper --next-all $(test -f "$GIT_DIR/BISECT_HEAD" && echo --no-checkout)
+	git bisect--helper --next-all $(git rev-parse --verify -q BISECT_HEAD > /dev/null && echo --no-checkout)
 	res=$?
 
 	# Check if we should exit because bisection is finished
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v15 04/13] Treat CHERRY_PICK_HEAD as a pseudo ref
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (2 preceding siblings ...)
  2020-05-28 19:46                             ` [PATCH v15 03/13] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                             ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 05/13] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
                                               ` (11 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Check for existence and delete CHERRY_PICK_HEAD through pseudo ref functions.
This will help cherry-pick work with alternate ref storage backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/commit.c | 34 +++++++++++++++++++---------------
 builtin/merge.c  |  2 +-
 path.c           |  1 -
 path.h           |  7 ++++---
 sequencer.c      | 42 ++++++++++++++++++++++++++----------------
 wt-status.c      |  4 ++--
 6 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index d1b7396052a..e27120b982b 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -847,21 +847,25 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 			if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
 				!merge_contains_scissors)
 				wt_status_add_cut_line(s->fp);
-			status_printf_ln(s, GIT_COLOR_NORMAL,
-			    whence == FROM_MERGE
-				? _("\n"
-					"It looks like you may be committing a merge.\n"
-					"If this is not correct, please remove the file\n"
-					"	%s\n"
-					"and try again.\n")
-				: _("\n"
-					"It looks like you may be committing a cherry-pick.\n"
-					"If this is not correct, please remove the file\n"
-					"	%s\n"
-					"and try again.\n"),
-				whence == FROM_MERGE ?
-					git_path_merge_head(the_repository) :
-					git_path_cherry_pick_head(the_repository));
+			if (whence == FROM_MERGE)
+				status_printf_ln(
+					s, GIT_COLOR_NORMAL,
+
+					_("\n"
+					  "It looks like you may be committing a merge.\n"
+					  "If this is not correct, please remove the file\n"
+					  "	%s\n"
+					  "and try again.\n"),
+					git_path_merge_head(the_repository));
+			else
+				status_printf_ln(
+					s, GIT_COLOR_NORMAL,
+
+					_("\n"
+					  "It looks like you may be committing a cherry-pick.\n"
+					  "If this is not correct, please run\n"
+					  "	git cherry-pick --abort\n"
+					  "and try again.\n"));
 		}
 
 		fprintf(s->fp, "\n");
diff --git a/builtin/merge.c b/builtin/merge.c
index 7da707bf55d..93b0a7b6eda 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1352,7 +1352,7 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
 		else
 			die(_("You have not concluded your merge (MERGE_HEAD exists)."));
 	}
-	if (file_exists(git_path_cherry_pick_head(the_repository))) {
+	if (ref_exists("CHERRY_PICK_HEAD")) {
 		if (advice_resolve_conflict)
 			die(_("You have not concluded your cherry-pick (CHERRY_PICK_HEAD exists).\n"
 			    "Please, commit your changes before you merge."));
diff --git a/path.c b/path.c
index 8b2c7531919..783cc2ae819 100644
--- a/path.c
+++ b/path.c
@@ -1528,7 +1528,6 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
-REPO_GIT_PATH_FUNC(cherry_pick_head, "CHERRY_PICK_HEAD")
 REPO_GIT_PATH_FUNC(revert_head, "REVERT_HEAD")
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
diff --git a/path.h b/path.h
index 1f1bf8f87a8..8941c018a99 100644
--- a/path.h
+++ b/path.h
@@ -170,7 +170,6 @@ void report_linked_checkout_garbage(void);
 	}
 
 struct path_cache {
-	const char *cherry_pick_head;
 	const char *revert_head;
 	const char *squash_msg;
 	const char *merge_msg;
@@ -182,9 +181,11 @@ struct path_cache {
 	const char *shallow;
 };
 
-#define PATH_CACHE_INIT { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL }
+#define PATH_CACHE_INIT                                              \
+	{                                                            \
+		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
+	}
 
-const char *git_path_cherry_pick_head(struct repository *r);
 const char *git_path_revert_head(struct repository *r);
 const char *git_path_squash_msg(struct repository *r);
 const char *git_path_merge_msg(struct repository *r);
diff --git a/sequencer.c b/sequencer.c
index fd7701c88a8..188488cb40b 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -381,7 +381,8 @@ static void print_advice(struct repository *r, int show_hint,
 		 * (typically rebase --interactive) wants to take care
 		 * of the commit itself so remove CHERRY_PICK_HEAD
 		 */
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		return;
 	}
 
@@ -1455,7 +1456,8 @@ static int do_commit(struct repository *r,
 				    author, opts, flags, &oid);
 		strbuf_release(&sb);
 		if (!res) {
-			unlink(git_path_cherry_pick_head(r));
+			refs_delete_pseudoref(get_main_ref_store(r),
+					      "CHERRY_PICK_HEAD", NULL);
 			unlink(git_path_merge_msg(r));
 			if (!is_rebase_i(opts))
 				print_commit_summary(r, NULL, &oid,
@@ -1966,7 +1968,8 @@ static int do_pick_commit(struct repository *r,
 		flags |= ALLOW_EMPTY;
 	} else if (allow == 2) {
 		drop_commit = 1;
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		unlink(git_path_merge_msg(r));
 		fprintf(stderr,
 			_("dropping %s %s -- patch contents already upstream\n"),
@@ -2305,8 +2308,10 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int need_cleanup = 0;
 
-	if (file_exists(git_path_cherry_pick_head(r))) {
-		if (!unlink(git_path_cherry_pick_head(r)) && verbose)
+	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
+		if (!refs_delete_pseudoref(get_main_ref_store(r),
+					   "CHERRY_PICK_HEAD", NULL) &&
+		    verbose)
 			warning(_("cancelling a cherry picking in progress"));
 		opts.action = REPLAY_PICK;
 		need_cleanup = 1;
@@ -2671,8 +2676,9 @@ static int create_seq_dir(struct repository *r)
 	enum replay_action action;
 	const char *in_progress_error = NULL;
 	const char *in_progress_advice = NULL;
-	unsigned int advise_skip = file_exists(git_path_revert_head(r)) ||
-				file_exists(git_path_cherry_pick_head(r));
+	unsigned int advise_skip =
+		file_exists(git_path_revert_head(r)) ||
+		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
 
 	if (!sequencer_get_last_command(r, &action)) {
 		switch (action) {
@@ -2771,7 +2777,7 @@ static int rollback_single_pick(struct repository *r)
 {
 	struct object_id head_oid;
 
-	if (!file_exists(git_path_cherry_pick_head(r)) &&
+	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !file_exists(git_path_revert_head(r)))
 		return error(_("no cherry-pick or revert in progress"));
 	if (read_ref_full("HEAD", 0, &head_oid, NULL))
@@ -2874,7 +2880,8 @@ int sequencer_skip(struct repository *r, struct replay_opts *opts)
 		}
 		break;
 	case REPLAY_PICK:
-		if (!file_exists(git_path_cherry_pick_head(r))) {
+		if (!refs_ref_exists(get_main_ref_store(r),
+				     "CHERRY_PICK_HEAD")) {
 			if (action != REPLAY_PICK)
 				return error(_("no cherry-pick in progress"));
 			if (!rollback_is_safe())
@@ -3569,7 +3576,8 @@ static int do_merge(struct repository *r,
 					oid_to_hex(&j->item->object.oid));
 
 		strbuf_release(&ref_name);
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		rollback_lock_file(&lock);
 
 		rollback_lock_file(&lock);
@@ -4201,7 +4209,7 @@ static int continue_single_pick(struct repository *r)
 {
 	const char *argv[] = { "commit", NULL };
 
-	if (!file_exists(git_path_cherry_pick_head(r)) &&
+	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !file_exists(git_path_revert_head(r)))
 		return error(_("no cherry-pick or revert in progress"));
 	return run_command_v_opt(argv, RUN_GIT_CMD);
@@ -4318,9 +4326,10 @@ static int commit_staged_changes(struct repository *r,
 	}
 
 	if (is_clean) {
-		const char *cherry_pick_head = git_path_cherry_pick_head(r);
-
-		if (file_exists(cherry_pick_head) && unlink(cherry_pick_head))
+		if (refs_ref_exists(get_main_ref_store(r),
+				    "CHERRY_PICK_HEAD") &&
+		    !refs_delete_pseudoref(get_main_ref_store(r),
+					   "CHERRY_PICK_HEAD", NULL))
 			return error(_("could not remove CHERRY_PICK_HEAD"));
 		if (!final_fixup)
 			return 0;
@@ -4379,7 +4388,8 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 
 	if (!is_rebase_i(opts)) {
 		/* Verify that the conflict has been resolved */
-		if (file_exists(git_path_cherry_pick_head(r)) ||
+		if (refs_ref_exists(get_main_ref_store(r),
+				    "CHERRY_PICK_HEAD") ||
 		    file_exists(git_path_revert_head(r))) {
 			res = continue_single_pick(r);
 			if (res)
@@ -5442,7 +5452,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
 
 int sequencer_determine_whence(struct repository *r, enum commit_whence *whence)
 {
-	if (file_exists(git_path_cherry_pick_head(r))) {
+	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
 		struct object_id cherry_pick_head, rebase_head;
 
 		if (file_exists(git_path_seq_dir()))
diff --git a/wt-status.c b/wt-status.c
index 98dfa6f73f9..96302be030b 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1636,8 +1636,8 @@ void wt_status_get_state(struct repository *r,
 		state->merge_in_progress = 1;
 	} else if (wt_status_check_rebase(NULL, state)) {
 		;		/* all set */
-	} else if (!stat(git_path_cherry_pick_head(r), &st) &&
-			!get_oid("CHERRY_PICK_HEAD", &oid)) {
+	} else if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
+		   !get_oid("CHERRY_PICK_HEAD", &oid)) {
 		state->cherry_pick_in_progress = 1;
 		oidcpy(&state->cherry_pick_head_oid, &oid);
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v15 05/13] Treat REVERT_HEAD as a pseudo ref
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (3 preceding siblings ...)
  2020-05-28 19:46                             ` [PATCH v15 04/13] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                             ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 06/13] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
                                               ` (10 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 path.c      |  1 -
 path.h      |  8 +++-----
 sequencer.c | 18 ++++++++++--------
 wt-status.c |  2 +-
 4 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/path.c b/path.c
index 783cc2ae819..7b385e5eb28 100644
--- a/path.c
+++ b/path.c
@@ -1528,7 +1528,6 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
-REPO_GIT_PATH_FUNC(revert_head, "REVERT_HEAD")
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
 REPO_GIT_PATH_FUNC(merge_rr, "MERGE_RR")
diff --git a/path.h b/path.h
index 8941c018a99..e7e77da6aaa 100644
--- a/path.h
+++ b/path.h
@@ -170,7 +170,6 @@ void report_linked_checkout_garbage(void);
 	}
 
 struct path_cache {
-	const char *revert_head;
 	const char *squash_msg;
 	const char *merge_msg;
 	const char *merge_rr;
@@ -181,12 +180,11 @@ struct path_cache {
 	const char *shallow;
 };
 
-#define PATH_CACHE_INIT                                              \
-	{                                                            \
-		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
+#define PATH_CACHE_INIT                                        \
+	{                                                      \
+		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
 	}
 
-const char *git_path_revert_head(struct repository *r);
 const char *git_path_squash_msg(struct repository *r);
 const char *git_path_merge_msg(struct repository *r);
 const char *git_path_merge_rr(struct repository *r);
diff --git a/sequencer.c b/sequencer.c
index 188488cb40b..a41913455ec 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2317,8 +2317,10 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
 		need_cleanup = 1;
 	}
 
-	if (file_exists(git_path_revert_head(r))) {
-		if (!unlink(git_path_revert_head(r)) && verbose)
+	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
+		if (!refs_delete_pseudoref(get_main_ref_store(r), "REVERT_HEAD",
+					   NULL) &&
+		    verbose)
 			warning(_("cancelling a revert in progress"));
 		opts.action = REPLAY_REVERT;
 		need_cleanup = 1;
@@ -2677,8 +2679,8 @@ static int create_seq_dir(struct repository *r)
 	const char *in_progress_error = NULL;
 	const char *in_progress_advice = NULL;
 	unsigned int advise_skip =
-		file_exists(git_path_revert_head(r)) ||
-		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
+		refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD");
+	refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
 
 	if (!sequencer_get_last_command(r, &action)) {
 		switch (action) {
@@ -2778,7 +2780,7 @@ static int rollback_single_pick(struct repository *r)
 	struct object_id head_oid;
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
-	    !file_exists(git_path_revert_head(r)))
+	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
 	if (read_ref_full("HEAD", 0, &head_oid, NULL))
 		return error(_("cannot resolve HEAD"));
@@ -2872,7 +2874,7 @@ int sequencer_skip(struct repository *r, struct replay_opts *opts)
 	 */
 	switch (opts->action) {
 	case REPLAY_REVERT:
-		if (!file_exists(git_path_revert_head(r))) {
+		if (!refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
 			if (action != REPLAY_REVERT)
 				return error(_("no revert in progress"));
 			if (!rollback_is_safe())
@@ -4210,7 +4212,7 @@ static int continue_single_pick(struct repository *r)
 	const char *argv[] = { "commit", NULL };
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
-	    !file_exists(git_path_revert_head(r)))
+	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
 	return run_command_v_opt(argv, RUN_GIT_CMD);
 }
@@ -4390,7 +4392,7 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 		/* Verify that the conflict has been resolved */
 		if (refs_ref_exists(get_main_ref_store(r),
 				    "CHERRY_PICK_HEAD") ||
-		    file_exists(git_path_revert_head(r))) {
+		    refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
 			res = continue_single_pick(r);
 			if (res)
 				goto release_todo_list;
diff --git a/wt-status.c b/wt-status.c
index 96302be030b..81b768504a1 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1642,7 +1642,7 @@ void wt_status_get_state(struct repository *r,
 		oidcpy(&state->cherry_pick_head_oid, &oid);
 	}
 	wt_status_check_bisect(NULL, state);
-	if (!stat(git_path_revert_head(r), &st) &&
+	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD") &&
 	    !get_oid("REVERT_HEAD", &oid)) {
 		state->revert_in_progress = 1;
 		oidcpy(&state->revert_head_oid, &oid);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v15 06/13] Move REF_LOG_ONLY to refs-internal.h
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (4 preceding siblings ...)
  2020-05-28 19:46                             ` [PATCH v15 05/13] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                             ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 07/13] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
                                               ` (9 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

REF_LOG_ONLY is used in the transaction preparation: if a symref is involved in
a transaction, the referent of the symref should be updated, and the symref
itself should only be updated in the reflog. Other ref backends will need to
duplicate this logic.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/files-backend.c | 7 -------
 refs/refs-internal.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index df7553f4cc3..141b6b08816 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -38,13 +38,6 @@
  */
 #define REF_NEEDS_COMMIT (1 << 6)
 
-/*
- * Used as a flag in ref_update::flags when we want to log a ref
- * update but not actually perform it.  This is used when a symbolic
- * ref update is split up.
- */
-#define REF_LOG_ONLY (1 << 7)
-
 /*
  * Used as a flag in ref_update::flags when the ref_update was via an
  * update to HEAD.
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 4c6d77898aa..d1d00947990 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -31,6 +31,13 @@ struct ref_transaction;
  */
 #define REF_HAVE_OLD (1 << 3)
 
+/*
+ * Used as a flag in ref_update::flags when we want to log a ref
+ * update but not actually perform it.  This is used when a symbolic
+ * ref update is split up.
+ */
+#define REF_LOG_ONLY (1 << 7)
+
 /*
  * Return the length of time to retry acquiring a loose reference lock
  * before giving up, in milliseconds:
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v15 07/13] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (5 preceding siblings ...)
  2020-05-28 19:46                             ` [PATCH v15 06/13] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                             ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 08/13] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                               ` (8 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 812fee47108..29e710a29e9 100644
--- a/refs.c
+++ b/refs.c
@@ -1450,7 +1450,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1510,8 +1510,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v15 08/13] Add .gitattributes for the reftable/ directory
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (6 preceding siblings ...)
  2020-05-28 19:46                             ` [PATCH v15 07/13] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                             ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 09/13] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                               ` (7 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 00000000000..f44451a3795
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v15 09/13] Add reftable library
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (7 preceding siblings ...)
  2020-05-28 19:46                             ` [PATCH v15 08/13] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                             ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 10/13] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (6 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. Its rationale and
specification is in the preceding commit.

This imports the upstream library as one big commit. For understanding
the code, it is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* record.{c,h} - reading and writing records

* block.{c,h} - reading and writing blocks.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE       |   31 ++
 reftable/README.md     |   11 +
 reftable/VERSION       |    8 +
 reftable/basics.c      |  215 +++++++
 reftable/basics.h      |   53 ++
 reftable/block.c       |  433 ++++++++++++++
 reftable/block.h       |  129 +++++
 reftable/constants.h   |   21 +
 reftable/file.c        |   95 ++++
 reftable/iter.c        |  243 ++++++++
 reftable/iter.h        |   72 +++
 reftable/merged.c      |  320 +++++++++++
 reftable/merged.h      |   39 ++
 reftable/pq.c          |  113 ++++
 reftable/pq.h          |   34 ++
 reftable/reader.c      |  742 ++++++++++++++++++++++++
 reftable/reader.h      |   65 +++
 reftable/record.c      | 1113 ++++++++++++++++++++++++++++++++++++
 reftable/record.h      |  128 +++++
 reftable/refname.c     |  209 +++++++
 reftable/refname.h     |   38 ++
 reftable/reftable.c    |   90 +++
 reftable/reftable.h    |  564 +++++++++++++++++++
 reftable/slice.c       |  247 ++++++++
 reftable/slice.h       |   87 +++
 reftable/stack.c       | 1207 ++++++++++++++++++++++++++++++++++++++++
 reftable/stack.h       |   48 ++
 reftable/system.h      |   54 ++
 reftable/tree.c        |   63 +++
 reftable/tree.h        |   34 ++
 reftable/update.sh     |   24 +
 reftable/writer.c      |  644 +++++++++++++++++++++
 reftable/writer.h      |   60 ++
 reftable/zlib-compat.c |   92 +++
 34 files changed, 7326 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/system.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..21500563fd4
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,11 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   sh reftable/update.sh
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..eb557542cbe
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1,8 @@
+commit fe15c65dd95b6a13de92b4baf711e50338ef20ae
+Author: Han-Wen Nienhuys <hanwen@google.com>
+Date:   Thu May 28 19:28:24 2020 +0200
+
+    C: add 'canary' to struct slice
+    
+    This enforces that all slices are initialized explicitly with
+    SLICE_INIT, for future compatibility with git's strbuf.
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..14c4dfaf5a6
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,215 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)(i & 0xff);
+}
+
+uint32_t get_be24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)(i & 0xff);
+}
+
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
+{
+	size_t lo = 0;
+	size_t hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		size_t mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	static char buf[250];
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return "I/O error";
+	case REFTABLE_FORMAT_ERROR:
+		return "corrupt reftable file";
+	case REFTABLE_NOT_EXIST_ERROR:
+		return "file does not exist";
+	case REFTABLE_LOCK_ERROR:
+		return "data is outdated";
+	case REFTABLE_API_ERROR:
+		return "misuse of the reftable API";
+	case REFTABLE_ZLIB_ERROR:
+		return "zlib failure";
+	case REFTABLE_NAME_CONFLICT:
+		return "file/directory conflict";
+	case REFTABLE_REFNAME_ERROR:
+		return "invalid refname";
+	case -1:
+		return "general error";
+	default:
+		snprintf(buf, sizeof(buf), "unknown error code %d", err);
+		return buf;
+	}
+}
+
+int reftable_error_to_errno(int err)
+{
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return EIO;
+	case REFTABLE_FORMAT_ERROR:
+		return EFAULT;
+	case REFTABLE_NOT_EXIST_ERROR:
+		return ENOENT;
+	case REFTABLE_LOCK_ERROR:
+		return EBUSY;
+	case REFTABLE_API_ERROR:
+		return EINVAL;
+	case REFTABLE_ZLIB_ERROR:
+		return EDOM;
+	default:
+		return ERANGE;
+	}
+}
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..1b003485c9c
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+/* Bigendian en/decoding of integers */
+
+void put_be24(byte *out, uint32_t i);
+uint32_t get_be24(byte *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);
+
+/*
+  Frees a NULL terminated array of malloced strings. The array itself is also
+  freed.
+ */
+void free_names(char **a);
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp);
+
+/* compares two NULL-terminated arrays of strings. */
+int names_equal(char **a, char **b);
+
+/* returns the array size of a NULL-terminated array of strings. */
+int names_length(char **names);
+
+/* Allocation routines; they invoke the functions set through
+ * reftable_set_alloc() */
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..543523fb407
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,433 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+	bw->last_key.canary = SLICE_CANARY;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the reftable_record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec)
+{
+	struct slice empty = SLICE_INIT;
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+		.canary = SLICE_CANARY,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = SLICE_INIT;
+	int n = 0;
+
+	reftable_record_key(rec, &key);
+	n = reftable_encode_key(&restart, out, last, key,
+				reftable_record_val_type(rec));
+	if (n < 0)
+		goto done;
+	slice_consume(&out, n);
+
+	n = reftable_record_encode(rec, out, w->hash_size);
+	if (n < 0)
+		goto done;
+	slice_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  key) < 0)
+		goto done;
+
+	slice_release(&key);
+	return 0;
+
+done:
+	slice_release(&key);
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next)
+		return -1;
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = SLICE_INIT;
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		slice_resize(&compressed, src_len);
+
+		while (1) {
+			uLongf dest_len = compressed.len;
+
+			zresult = compress2(compressed.buf, &dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				slice_resize(&compressed, 2 * compressed.len);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				slice_release(&compressed);
+				return REFTABLE_ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed.buf,
+			       dest_len);
+			w->next = dest_len + block_header_skip;
+			slice_release(&compressed);
+			break;
+		}
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct reftable_block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	uint16_t restart_count = 0;
+	uint32_t restart_start = 0;
+	byte *restart_bytes = NULL;
+
+	if (!reftable_is_block_type(typ))
+		return REFTABLE_FORMAT_ERROR;
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = SLICE_INIT;
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip; /* total size of dest
+							    buffer. */
+		uLongf src_len = block->len - block_header_skip;
+
+		/* Log blocks specify the *uncompressed* size in their header.
+		 */
+		slice_resize(&uncompressed, sz);
+
+		/* Copy over the block header verbatim. It's not compressed. */
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		/* Uncompress */
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			slice_release(&uncompressed);
+			return REFTABLE_ZLIB_ERROR;
+		}
+
+		if (dst_len + block_header_skip != sz)
+			return REFTABLE_FORMAT_ERROR;
+
+		/* We're done with the input data. */
+		reftable_block_done(block);
+		block->data = uncompressed.buf;
+		block->len = sz;
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	restart_count = get_be16(block->data + sz - 2);
+	restart_start = sz - 2 - 3 * restart_count;
+	restart_bytes = block->data + restart_start;
+
+	/* transfer ownership. */
+	br->block = *block;
+	block->data = NULL;
+	block->len = 0;
+
+	br->hash_size = hash_size;
+	br->block_len = restart_start;
+	br->full_block_size = full_block_size;
+	br->header_off = header_off;
+	br->restart_count = restart_count;
+	br->restart_bytes = restart_bytes;
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct slice key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(size_t idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+		.canary = SLICE_CANARY,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = SLICE_INIT;
+	struct slice last_key = SLICE_INIT;
+	byte unused_extra;
+	int n = reftable_decode_key(&rkey, &unused_extra, last_key, in);
+	int result;
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	result = slice_cmp(a->key, rkey);
+	slice_release(&rkey);
+	return result;
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, src->last_key);
+}
+
+int block_iter_next(struct block_iter *it, struct reftable_record *rec)
+{
+	struct slice in = {
+		.buf = it->br->block.data + it->next_off,
+		.len = it->br->block_len - it->next_off,
+		.canary = SLICE_CANARY,
+	};
+	struct slice start = in;
+	struct slice key = SLICE_INIT;
+	byte extra = 0;
+	int n = 0;
+
+	if (it->next_off >= it->br->block_len)
+		return 1;
+
+	n = reftable_decode_key(&key, &extra, it->last_key, in);
+	if (n < 0)
+		return -1;
+
+	slice_consume(&in, n);
+	n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
+	if (n < 0)
+		return -1;
+	slice_consume(&in, n);
+
+	slice_copy(&it->last_key, key);
+	it->next_off += start.len - in.len;
+	slice_release(&key);
+	return 0;
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = SLICE_INIT;
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+		.canary = SLICE_CANARY,
+	};
+
+	byte extra = 0;
+	int n = reftable_decode_key(key, &extra, empty, in);
+	if (n < 0)
+		return n;
+
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	slice_release(&it->last_key);
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want)
+{
+	struct restart_find_args args = {
+		.key = want,
+		.r = br,
+	};
+	struct reftable_record rec = reftable_new_record(block_reader_type(br));
+	struct slice key = SLICE_INIT;
+	int err = 0;
+	struct block_iter next = {
+		.last_key = SLICE_INIT,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	/* We're looking for the last entry less/equal than the wanted key, so
+	   we have to go one entry too far and then back up.
+	*/
+	while (true) {
+		block_iter_copy_from(&next, it);
+		err = block_iter_next(&next, &rec);
+		if (err < 0)
+			goto done;
+
+		reftable_record_key(&rec, &key);
+		if (err > 0 || slice_cmp(key, want) >= 0) {
+			err = 0;
+			goto done;
+		}
+
+		block_iter_copy_from(it, &next);
+	}
+
+done:
+	slice_release(&key);
+	slice_release(&next.last_key);
+	reftable_record_destroy(&rec);
+
+	return err;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	slice_release(&bw->last_key);
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..fe7a2a479c1
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,129 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+/*
+  Writes reftable blocks. The block_writer is reused across blocks to minimize
+  allocation overhead.
+*/
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+
+	/* Offset ofof the global header. Nonzero in the first block only. */
+	uint32_t header_off;
+
+	/* How often to restart keys. */
+	int restart_interval;
+	int hash_size;
+
+	/* Offset of next byte to write. */
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+
+	struct slice last_key;
+	int entries;
+};
+
+/*
+  initializes the blockwriter to write `typ` entries, using `buf` as temporary
+  storage. `buf` is not owned by the block_writer. */
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+
+/*
+  returns the block type (eg. 'r' for ref records.
+*/
+byte block_writer_type(struct block_writer *bw);
+
+/* appends the record, or -1 if it doesn't fit. */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec);
+
+/* appends the key restarts, and compress the block if necessary. */
+int block_writer_finish(struct block_writer *w);
+
+/* clears out internally allocated block_writer members. */
+void block_writer_clear(struct block_writer *bw);
+
+/* Read a block. */
+struct block_reader {
+	/* offset of the block header; nonzero for the first block in a
+	 * reftable. */
+	uint32_t header_off;
+
+	/* the memory block */
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint16_t restart_count;
+
+	/* size of the data in the file. For log blocks, this is the compressed
+	 * size. */
+	uint32_t full_block_size;
+};
+
+/* Iterate over entries in a block */
+struct block_iter {
+	/* offset within the block of the next entry to read. */
+	uint32_t next_off;
+	struct block_reader *br;
+
+	/* key for last entry we read. */
+	struct slice last_key;
+};
+
+/* initializes a block reader. */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Position `it` at start of the block */
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+
+/* Position `it` to the `want` key in the block */
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice want);
+
+/* Returns the block type (eg. 'r' for refs) */
+byte block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct reftable_record *rec);
+
+/* Seek to `want` with in the block pointed to by `it` */
+int block_iter_seek(struct block_iter *it, struct slice want);
+
+/* deallocate memory for `it`. The block reader and its block is left intact. */
+void block_iter_close(struct block_iter *it);
+
+/* size of file header, depending on format version */
+int header_size(int version);
+
+/* size of file footer, depending on format version */
+int footer_size(int version);
+
+/* returns a block to its source. */
+void reftable_block_done(struct reftable_block *ret);
+
+#endif
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..5eee72c4c11
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..63843a640b0
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,95 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size)
+		return -1;
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	struct file_block_source *p = NULL;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return REFTABLE_NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0)
+		return -1;
+
+	p = reftable_calloc(sizeof(struct file_block_source));
+	p->size = st.st_size;
+	p->fd = fd;
+
+	assert(bs->ops == NULL);
+	bs->ops = &file_vtable;
+	bs->arg = p;
+	return 0;
+}
+
+int reftable_fd_write(void *arg, byte *data, size_t sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..df7bcdc2160
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,243 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator *it)
+{
+	return it->ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct reftable_record *rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec)
+{
+	return it->ops->next(it->iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, ref);
+	return iterator_next(it, &rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, log);
+	return iterator_next(it, &rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	slice_release(&fri->oid);
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg,
+				       struct reftable_record *rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+	int err = 0;
+	while (true) {
+		err = reftable_iterator_next_ref(&fri->it, ref);
+		if (err != 0) {
+			break;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			err = reftable_table_seek_ref(&fri->tab, &it,
+						      ref->ref_name);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(&it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				break;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+
+	reftable_ref_record_clear(ref);
+	return err;
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reftable_block_done(&it->block_reader.block);
+	slice_release(&it->oid);
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	uint64_t off;
+	int err = 0;
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reftable_block_done(&it->block_reader.block);
+
+	off = it->offsets[it->offset_idx++];
+	err = reader_init_block_reader(it->r, &it->block_reader, off,
+				       BLOCK_TYPE_REF);
+	if (err < 0) {
+		return err;
+	}
+	if (err > 0) {
+		/* indexed block does not exist. */
+		return REFTABLE_FORMAT_ERROR;
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter empty = INDEXED_TABLE_REF_ITER_INIT;
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	*itr = empty;
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..b6294cf3206
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,72 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct reftable_record *rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec);
+
+/* Returns true for a zeroed out iterator, such as the one returned from
+   iterator_destroy. */
+bool iterator_is_null(struct reftable_iterator *it);
+
+/* iterator that produces only ref records that point to `oid` */
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_table tab;
+	struct slice oid;
+	struct reftable_iterator it;
+};
+#define FILTERING_REF_ITERATOR_INIT \
+	{                           \
+		.oid = SLICE_INIT   \
+	}
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+/* iterator that produces only ref records that point to `oid`,
+   but using the object index.
+ */
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+#define INDEXED_TABLE_REF_ITER_INIT                                   \
+	{                                                             \
+		.cur = { .last_key = SLICE_INIT }, .oid = SLICE_INIT, \
+	}
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..874d460ded1
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,320 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct reftable_record rec = reftable_new_record(mi->typ);
+		int err = iterator_next(&mi->stack[i], &rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			reftable_record_destroy(&rec);
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
+					       size_t idx)
+{
+	struct reftable_record rec = reftable_new_record(mi->typ);
+	struct pq_entry e = {
+		.rec = rec,
+		.index = idx,
+	};
+	int err = iterator_next(&mi->stack[idx], &rec);
+	if (err < 0)
+		return err;
+
+	if (err > 0) {
+		reftable_iterator_destroy(&mi->stack[idx]);
+		reftable_record_destroy(&rec);
+		return 0;
+	}
+
+	merged_iter_pqueue_add(&mi->pq, e);
+	return 0;
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
+{
+	if (iterator_is_null(&mi->stack[idx]))
+		return 0;
+	return merged_iter_advance_nonnull_subiter(mi, idx);
+}
+
+static int merged_iter_next_entry(struct merged_iter *mi,
+				  struct reftable_record *rec)
+{
+	struct slice entry_key = SLICE_INIT;
+	struct pq_entry entry = { 0 };
+	int err = 0;
+
+	if (merged_iter_pqueue_is_empty(mi->pq))
+		return 1;
+
+	entry = merged_iter_pqueue_remove(&mi->pq);
+	err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0)
+		return err;
+
+	/*
+	  One can also use reftable as datacenter-local storage, where the ref
+	  database is maintained in globally consistent database (eg.
+	  CockroachDB or Spanner). In this scenario, replication delays together
+	  with compaction may cause newer tables to contain older entries. In
+	  such a deployment, the loop below must be changed to collect all
+	  entries for the same key, and return new the newest one.
+	*/
+	reftable_record_key(&entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = SLICE_INIT;
+		int err = 0, cmp = 0;
+
+		reftable_record_key(&top.rec, &k);
+
+		cmp = slice_cmp(k, entry_key);
+		slice_release(&k);
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		reftable_record_destroy(&top.rec);
+	}
+
+	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+	reftable_record_destroy(&entry.rec);
+	slice_release(&entry_key);
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
+{
+	while (true) {
+		int err = merged_iter_next_entry(mi, rec);
+		if (err == 0 && mi->suppress_deletions &&
+		    reftable_record_is_deletion(rec)) {
+			continue;
+		}
+
+		return err;
+	}
+}
+
+static int merged_iter_next_void(void *p, struct reftable_record *rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq))
+		return 1;
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id)
+{
+	struct reftable_merged_table *m = NULL;
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reftable_reader *r = stack[i];
+		if (r->hash_id != hash_id) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reftable_reader_min_update_index(r);
+		}
+
+		last_max = reftable_reader_max_update_index(r);
+	}
+
+	m = (struct reftable_merged_table *)reftable_calloc(
+		sizeof(struct reftable_merged_table));
+	m->stack = stack;
+	m->stack_len = n;
+	m->min = first_min;
+	m->max = last_max;
+	m->hash_id = hash_id;
+	*dest = m;
+	return 0;
+}
+
+void reftable_merged_table_close(struct reftable_merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reftable_reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = reftable_record_type(rec),
+		.hash_id = mt->hash_id,
+		.suppress_deletions = mt->suppress_deletions,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n;
+	err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	} else {
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..4e03642bdbe
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_reader **stack;
+	int stack_len;
+	uint32_t hash_id;
+	bool suppress_deletions;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	int stack_len;
+	byte typ;
+	bool suppress_deletions;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec);
+
+#endif
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..236e36ae99d
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,113 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = SLICE_INIT;
+	struct slice bk = SLICE_INIT;
+	int cmp = 0;
+	reftable_record_key(&a.rec, &ak);
+	reftable_record_key(&b.rec, &bk);
+
+	cmp = slice_cmp(ak, bk);
+
+	slice_release(&ak);
+	slice_release(&bk);
+
+	if (cmp == 0)
+		return a.index > b.index;
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		reftable_record_destroy(&pq->heap[i].rec);
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..43c0974f162
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct reftable_record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..3c98cafe384
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,742 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source *source)
+{
+	return source->ops->size(source->arg);
+}
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source->ops->read_block(source->arg, dest, off, size);
+	dest->source = *source;
+	return result;
+}
+
+void reftable_block_done(struct reftable_block *blockp)
+{
+	struct reftable_block_source source = blockp->source;
+	if (blockp != NULL && source.ops != NULL)
+		source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size)
+		return 0;
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(&r->source, dest, off, sz);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	byte first_block_typ;
+	int err = 0;
+	uint32_t computed_crc;
+	uint32_t file_crc;
+
+	if (memcmp(f, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = REFTABLE_FORMAT_ERROR;
+			goto done;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	computed_crc = crc32(0, footer, f - footer);
+	file_crc = get_be32(f);
+	f += 4;
+	if (computed_crc != file_crc) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	first_block_typ = header[header_size(r->version)];
+	r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+	r->ref_offsets.offset = 0;
+	r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+				  r->log_offsets.offset > 0);
+	r->obj_offsets.present = r->obj_offsets.offset > 0;
+	err = 0;
+done:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = *source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+done:
+	reftable_block_done(&footer);
+	reftable_block_done(&header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+#define TABLE_ITER_INIT                         \
+	{                                       \
+		.bi = {.last_key = SLICE_INIT } \
+	}
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti,
+				    struct reftable_record *rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && reftable_record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec->data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reftable_block_done(&ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (reftable_is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size)
+		return 1;
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0)
+		return err;
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0)
+		return block_size;
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reftable_block_done(&block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reftable_block_done(&block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0)
+		return err;
+	else {
+		struct block_reader *brp =
+			reftable_malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
+{
+	if (reftable_record_type(rec) != ti->typ)
+		return REFTABLE_API_ERROR;
+
+	while (true) {
+		struct table_iter next = TABLE_ITER_INIT;
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct reftable_record *rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off, byte typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0)
+		return err;
+
+	brp = reftable_malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			byte typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct reftable_record *want)
+{
+	struct reftable_record rec =
+		reftable_new_record(reftable_record_type(want));
+	struct slice want_key = SLICE_INIT;
+	struct slice got_key = SLICE_INIT;
+	struct table_iter next = TABLE_ITER_INIT;
+	int err = -1;
+
+	reftable_record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0)
+			goto done;
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0)
+			goto done;
+
+		if (slice_cmp(got_key, want_key) > 0) {
+			table_iter_block_done(&next);
+			break;
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, want_key);
+	if (err < 0)
+		goto done;
+	err = 0;
+
+done:
+	block_iter_close(&next.bi);
+	reftable_record_destroy(&rec);
+	slice_release(&want_key);
+	slice_release(&got_key);
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it,
+			       struct reftable_record *rec)
+{
+	struct reftable_index_record want_index = { .last_key = SLICE_INIT };
+	struct reftable_record want_index_rec = { 0 };
+	struct reftable_index_record index_result = { .last_key = SLICE_INIT };
+	struct reftable_record index_result_rec = { 0 };
+	struct table_iter index_iter = TABLE_ITER_INIT;
+	struct table_iter next = TABLE_ITER_INIT;
+	int err = 0;
+
+	reftable_record_key(rec, &want_index.last_key);
+	reftable_record_from_index(&want_index_rec, &want_index);
+	reftable_record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, reftable_record_type(rec), true);
+	if (err < 0)
+		goto done;
+
+	err = reader_seek_linear(r, &index_iter, &want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, &index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0)
+			goto done;
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0)
+			goto done;
+
+		err = block_iter_seek(&next.bi, want_index.last_key);
+		if (err < 0)
+			goto done;
+
+		if (next.typ == reftable_record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = REFTABLE_FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter empty = TABLE_ITER_INIT;
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		*malloced = empty;
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+done:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	reftable_record_clear(&want_index_rec);
+	reftable_record_clear(&index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it,
+				struct reftable_record *rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, reftable_record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = TABLE_ITER_INIT;
+	int err = 0;
+	if (idx > 0)
+		return reader_seek_indexed(r, it, rec);
+
+	err = reader_start(r, &ti, reftable_record_type(rec), false);
+	if (err < 0)
+		return err;
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0)
+		return err;
+	else {
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec)
+{
+	byte typ = reftable_record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source *src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		block_source_close(src);
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    byte *oid)
+{
+	struct reftable_obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct reftable_record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct reftable_obj_record got = { 0 };
+	struct reftable_record got_rec = { 0 };
+	int err = 0;
+	struct indexed_table_ref_iter *itr = NULL;
+
+	/* Look through the reverse index. */
+	reftable_record_from_obj(&want_rec, &want);
+	err = reader_seek(r, &oit, &want_rec);
+	if (err != 0)
+		goto done;
+
+	/* read out the reftable_obj_record */
+	reftable_record_from_obj(&got_rec, &got);
+	err = iterator_next(&oit, &got_rec);
+	if (err < 0)
+		goto done;
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		/* didn't find it; return empty iterator */
+		iterator_set_empty(it);
+		err = 0;
+		goto done;
+	}
+
+	err = new_indexed_table_ref_iter(&itr, r, oid, hash_size(r->hash_id),
+					 got.offsets, got.offset_len);
+	if (err < 0)
+		goto done;
+	got.offsets = NULL;
+	iterator_from_indexed_table_ref_iter(it, itr);
+
+done:
+	reftable_iterator_destroy(&oit);
+	reftable_record_clear(&got_rec);
+	return err;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      byte *oid)
+{
+	struct table_iter ti_empty = TABLE_ITER_INIT;
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	struct filtering_ref_iterator empty = FILTERING_REF_ITERATOR_INIT;
+	int oid_len = hash_size(r->hash_id);
+	int err;
+
+	*ti = ti_empty;
+	err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_malloc(sizeof(struct filtering_ref_iterator));
+	*filter = empty;
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	reftable_table_from_reader(&filter->tab, r);
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, byte *oid)
+{
+	if (r->obj_offsets.present)
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	return reftable_reader_refs_for_unindexed(r, it, oid);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..cc87aa49ea5
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,65 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source *source);
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_close(struct reftable_block_source *source);
+
+/* metadata for a block type */
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+/* The state for reading a reftable file. */
+struct reftable_reader {
+	/* for convience, associate a name with the instance. */
+	char *name;
+	struct reftable_block_source source;
+
+	/* Size of the file, excluding the footer. */
+	uint64_t size;
+
+	/* 'sha1' for SHA1, 's256' for SHA-256 */
+	uint32_t hash_id;
+
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	/* Length of the OID keys in the 'o' section */
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+
+/* initialize a block reader to read from `r` */
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..586210a6a3d
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1113 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int get_var_int(uint64_t *dest, struct slice in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in.len == 0)
+		return -1;
+	val = in.buf[ptr] & 0x7f;
+
+	while (in.buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in.len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice dest, uint64_t val)
+{
+	byte buf[10] = { 0 };
+	int i = 9;
+	int n = 0;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	n = sizeof(buf) - i - 1;
+	if (dest.len < n)
+		return -1;
+	memcpy(dest.buf, &buf[i + 1], n);
+	return n;
+}
+
+int reftable_is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, in);
+	if (n <= 0)
+		return -1;
+	slice_consume(&in, n);
+	if (in.len < tsize)
+		return -1;
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	slice_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(s, l);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+	if (s.len < l)
+		return -1;
+	memcpy(s.buf, str, l);
+	slice_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int reftable_encode_key(bool *restart, struct slice dest, struct slice prev_key,
+			struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(prev_key, key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(dest, (uint64_t)prefix_len);
+	if (n < 0)
+		return -1;
+	slice_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0)
+		return -1;
+	slice_consume(&dest, n);
+
+	if (dest.len < suffix_len)
+		return -1;
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	slice_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int reftable_decode_key(struct slice *key, byte *extra, struct slice last_key,
+			struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, in);
+	if (n < 0)
+		return -1;
+	slice_consume(&in, n);
+
+	if (prefix_len > last_key.len)
+		return -1;
+
+	n = get_var_int(&suffix_len, in);
+	if (n <= 0)
+		return -1;
+	slice_consume(&in, n);
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len)
+		return -1;
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	slice_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static void reftable_ref_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = xstrdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9)
+		return '0' + c;
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size(hash_id));
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size(hash_id));
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->ref_name);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static byte reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL)
+		return 3;
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, in);
+	if (n < 0)
+		return n;
+	assert(hash_size > 0);
+
+	slice_consume(&in, n);
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct slice dest = SLICE_INIT;
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+		seen_target = true;
+		if (r->target != NULL) {
+			reftable_free(r->target);
+		}
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+static bool reftable_ref_record_is_deletion_void(const void *p)
+{
+	return reftable_ref_record_is_deletion(
+		(const struct reftable_ref_record *)p);
+}
+
+struct reftable_record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = BLOCK_TYPE_REF,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+	.is_deletion = &reftable_ref_record_is_deletion_void,
+};
+
+static void reftable_obj_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_obj_record *rec =
+		(const struct reftable_obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void reftable_obj_record_clear(void *rec)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	FREE_AND_NULL(obj->hash_prefix);
+	FREE_AND_NULL(obj->offsets);
+	memset(obj, 0, sizeof(struct reftable_obj_record));
+}
+
+static void reftable_obj_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	const struct reftable_obj_record *src =
+		(const struct reftable_obj_record *)src_rec;
+	int olen;
+
+	reftable_obj_record_clear(obj);
+	*obj = *src;
+	obj->hash_prefix = reftable_malloc(obj->hash_prefix_len);
+	memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len);
+
+	olen = obj->offset_len * sizeof(uint64_t);
+	obj->offsets = reftable_malloc(olen);
+	memcpy(obj->offsets, src->offsets, olen);
+}
+
+static byte reftable_obj_record_val_type(const void *rec)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8)
+		return r->offset_len;
+	return 0;
+}
+
+static int reftable_obj_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	struct slice start = s;
+	int i = 0;
+	int n = 0;
+	uint64_t last = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+	if (r->offset_len == 0)
+		return start.len - s.len;
+	n = put_var_int(s, r->offsets[0]);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	last = r->offsets[0];
+	for (i = 1; i < r->offset_len; i++) {
+		int n = put_var_int(s, r->offsets[i] - last);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+		last = r->offsets[i];
+	}
+	return start.len - s.len;
+}
+
+static int reftable_obj_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	uint64_t last;
+	int j;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, in);
+		if (n < 0) {
+			return n;
+		}
+
+		slice_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0)
+		return start.len - in.len;
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], in);
+	if (n < 0)
+		return n;
+	slice_consume(&in, n);
+
+	last = r->offsets[0];
+	j = 1;
+	while (j < count) {
+		uint64_t delta = 0;
+		int n = get_var_int(&delta, in);
+		if (n < 0) {
+			return n;
+		}
+		slice_consume(&in, n);
+
+		last = r->offsets[j] = (delta + last);
+		j++;
+	}
+	return start.len - in.len;
+}
+
+static bool not_a_deletion(const void *p)
+{
+	return false;
+}
+
+struct reftable_record_vtable reftable_obj_record_vtable = {
+	.key = &reftable_obj_record_key,
+	.type = BLOCK_TYPE_OBJ,
+	.copy_from = &reftable_obj_record_copy_from,
+	.val_type = &reftable_obj_record_val_type,
+	.encode = &reftable_obj_record_encode,
+	.decode = &reftable_obj_record_decode,
+	.clear = &reftable_obj_record_clear,
+	.is_deletion = not_a_deletion,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size(hash_id));
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size(hash_id));
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static void reftable_log_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_be64(dest->buf + 1 + len, ts);
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	reftable_log_record_clear(dst);
+	*dst = *src;
+	if (dst->ref_name != NULL) {
+		dst->ref_name = xstrdup(dst->ref_name);
+	}
+	if (dst->email != NULL) {
+		dst->email = xstrdup(dst->email);
+	}
+	if (dst->name != NULL) {
+		dst->name = xstrdup(dst->name);
+	}
+	if (dst->message != NULL) {
+		dst->message = xstrdup(dst->message);
+	}
+
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->ref_name);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static byte reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static byte zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r))
+		return 0;
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size)
+		return -1;
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	slice_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	n = put_var_int(s, r->time);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	if (s.len < 2)
+		return -1;
+
+	put_be16(s.buf, r->tz_offset);
+	slice_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = SLICE_INIT;
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0)
+		return REFTABLE_FORMAT_ERROR;
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		FREE_AND_NULL(r->old_hash);
+		FREE_AND_NULL(r->new_hash);
+		FREE_AND_NULL(r->message);
+		FREE_AND_NULL(r->email);
+		FREE_AND_NULL(r->name);
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size)
+		return REFTABLE_FORMAT_ERROR;
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	slice_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	slice_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	slice_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, in);
+	if (n < 0)
+		goto done;
+	slice_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2)
+		goto done;
+
+	r->tz_offset = get_be16(in.buf);
+	slice_consume(&in, 2);
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	slice_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	slice_release(&dest);
+	return start.len - in.len;
+
+done:
+	slice_release(&dest);
+	return REFTABLE_FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL)
+		a = empty;
+
+	if (b == NULL)
+		b = empty;
+
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL)
+		a = zero;
+
+	if (b == NULL)
+		b = zero;
+
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+static bool reftable_log_record_is_deletion_void(const void *p)
+{
+	return reftable_log_record_is_deletion(
+		(const struct reftable_log_record *)p);
+}
+
+struct reftable_record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = BLOCK_TYPE_LOG,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+	.is_deletion = &reftable_log_record_is_deletion_void,
+};
+
+struct reftable_record reftable_new_record(byte typ)
+{
+	struct reftable_record rec = { NULL };
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		reftable_record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct reftable_obj_record *r =
+			reftable_calloc(sizeof(struct reftable_obj_record));
+		reftable_record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		reftable_record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct reftable_index_record empty = { .last_key = SLICE_INIT };
+		struct reftable_index_record *r =
+			reftable_calloc(sizeof(struct reftable_index_record));
+		*r = empty;
+		reftable_record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+void *reftable_record_yield(struct reftable_record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+void reftable_record_destroy(struct reftable_record *rec)
+{
+	reftable_record_clear(rec);
+	reftable_free(reftable_record_yield(rec));
+}
+
+static void reftable_index_record_key(const void *r, struct slice *dest)
+{
+	struct reftable_index_record *rec = (struct reftable_index_record *)r;
+	slice_copy(dest, rec->last_key);
+}
+
+static void reftable_index_record_copy_from(void *rec, const void *src_rec,
+					    int hash_size)
+{
+	struct reftable_index_record *dst = (struct reftable_index_record *)rec;
+	struct reftable_index_record *src =
+		(struct reftable_index_record *)src_rec;
+
+	slice_copy(&dst->last_key, src->last_key);
+	dst->offset = src->offset;
+}
+
+static void reftable_index_record_clear(void *rec)
+{
+	struct reftable_index_record *idx = (struct reftable_index_record *)rec;
+	slice_release(&idx->last_key);
+}
+
+static byte reftable_index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int reftable_index_record_encode(const void *rec, struct slice out,
+					int hash_size)
+{
+	const struct reftable_index_record *r =
+		(const struct reftable_index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(out, r->offset);
+	if (n < 0)
+		return n;
+
+	slice_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int reftable_index_record_decode(void *rec, struct slice key,
+					byte val_type, struct slice in,
+					int hash_size)
+{
+	struct slice start = in;
+	struct reftable_index_record *r = (struct reftable_index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, key);
+
+	n = get_var_int(&r->offset, in);
+	if (n < 0)
+		return n;
+
+	slice_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct reftable_record_vtable reftable_index_record_vtable = {
+	.key = &reftable_index_record_key,
+	.type = BLOCK_TYPE_INDEX,
+	.copy_from = &reftable_index_record_copy_from,
+	.val_type = &reftable_index_record_val_type,
+	.encode = &reftable_index_record_encode,
+	.decode = &reftable_index_record_decode,
+	.clear = &reftable_index_record_clear,
+	.is_deletion = &not_a_deletion,
+};
+
+void reftable_record_key(struct reftable_record *rec, struct slice *dest)
+{
+	rec->ops->key(rec->data, dest);
+}
+
+byte reftable_record_type(struct reftable_record *rec)
+{
+	return rec->ops->type;
+}
+
+int reftable_record_encode(struct reftable_record *rec, struct slice dest,
+			   int hash_size)
+{
+	return rec->ops->encode(rec->data, dest, hash_size);
+}
+
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size)
+{
+	assert(src->ops->type == rec->ops->type);
+
+	rec->ops->copy_from(rec->data, src->data, hash_size);
+}
+
+byte reftable_record_val_type(struct reftable_record *rec)
+{
+	return rec->ops->val_type(rec->data);
+}
+
+int reftable_record_decode(struct reftable_record *rec, struct slice key,
+			   byte extra, struct slice src, int hash_size)
+{
+	return rec->ops->decode(rec->data, key, extra, src, hash_size);
+}
+
+void reftable_record_clear(struct reftable_record *rec)
+{
+	rec->ops->clear(rec->data);
+}
+
+bool reftable_record_is_deletion(struct reftable_record *rec)
+{
+	return rec->ops->is_deletion(rec->data);
+}
+
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *ref_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *obj_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = obj_rec;
+	rec->ops = &reftable_obj_record_vtable;
+}
+
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *index_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = index_rec;
+	rec->ops = &reftable_index_record_vtable;
+}
+
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *log_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec->data;
+}
+
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_LOG);
+	return (struct reftable_log_record *)rec->data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL)
+		return !memcmp(a, b, hash_size);
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL)
+		return 0 == strcmp(a, b);
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->ref_name,
+		      ((struct reftable_ref_record *)b)->ref_name);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp)
+		return cmp;
+	if (la->update_index > lb->update_index)
+		return -1;
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..dca26eda2dc
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,128 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct slice in);
+int put_var_int(struct slice dest, uint64_t val);
+
+/* Methods for records. */
+struct reftable_record_vtable {
+	/* encode the key of to a byte slice. */
+	void (*key)(const void *rec, struct slice *dest);
+
+	/* The record type of ('r' for ref). */
+	byte type;
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	byte (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+
+	/* is this a tombstone? */
+	bool (*is_deletion)(const void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct reftable_record {
+	void *data;
+	struct reftable_record_vtable *ops;
+};
+
+/* returns true for recognized block types. Block start with the block type. */
+int reftable_is_block_type(byte typ);
+
+/* creates a malloced record of the given type. Dispose with record_destroy */
+struct reftable_record reftable_new_record(byte typ);
+
+extern struct reftable_record_vtable reftable_ref_record_vtable;
+
+/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
+   number of bytes written. */
+int reftable_encode_key(bool *restart, struct slice dest, struct slice prev_key,
+			struct slice key, byte extra);
+
+/* Decode into `key` and `extra` from `in` */
+int reftable_decode_key(struct slice *key, byte *extra, struct slice last_key,
+			struct slice in);
+
+/* reftable_index_record are used internally to speed up lookups. */
+struct reftable_index_record {
+	uint64_t offset; /* Offset of block */
+	struct slice last_key; /* Last key of the block. */
+};
+
+/* reftable_obj_record stores an object ID => ref mapping. */
+struct reftable_obj_record {
+	byte *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void reftable_record_key(struct reftable_record *rec, struct slice *dest);
+byte reftable_record_type(struct reftable_record *rec);
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size);
+byte reftable_record_val_type(struct reftable_record *rec);
+int reftable_record_encode(struct reftable_record *rec, struct slice dest,
+			   int hash_size);
+int reftable_record_decode(struct reftable_record *rec, struct slice key,
+			   byte extra, struct slice src, int hash_size);
+bool reftable_record_is_deletion(struct reftable_record *rec);
+
+/* zeroes out the embedded record */
+void reftable_record_clear(struct reftable_record *rec);
+
+/* clear out the record, yielding the reftable_record data that was
+ * encapsulated. */
+void *reftable_record_yield(struct reftable_record *rec);
+
+/* clear and deallocate embedded record, and zero `rec`. */
+void reftable_record_destroy(struct reftable_record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *objrec);
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *idxrec);
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *refrec);
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *logrec);
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *ref);
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/refname.c b/reftable/refname.c
new file mode 100644
index 00000000000..7226329e20b
--- /dev/null
+++ b/reftable/refname.c
@@ -0,0 +1,209 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+#include "refname.h"
+#include "slice.h"
+
+struct find_arg {
+	char **names;
+	const char *want;
+};
+
+static int find_name(size_t k, void *arg)
+{
+	struct find_arg *f_arg = (struct find_arg *)arg;
+	return strcmp(f_arg->names[k], f_arg->want) >= 0;
+}
+
+int modification_has_ref(struct modification *mod, const char *name)
+{
+	struct reftable_ref_record ref = { 0 };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = name,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
+			return 0;
+		}
+	}
+
+	if (mod->del_len > 0) {
+		struct find_arg arg = {
+			.names = mod->del,
+			.want = name,
+		};
+		int idx = binsearch(mod->del_len, find_name, &arg);
+		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
+			return 1;
+		}
+	}
+
+	err = reftable_table_read_ref(&mod->tab, name, &ref);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static void modification_clear(struct modification *mod)
+{
+	/* don't delete the strings themselves; they're owned by ref records.
+	 */
+	FREE_AND_NULL(mod->add);
+	FREE_AND_NULL(mod->del);
+	mod->add_len = 0;
+	mod->del_len = 0;
+}
+
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix)
+{
+	struct reftable_iterator it = { NULL };
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = prefix,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len &&
+		    !strncmp(prefix, mod->add[idx], strlen(prefix)))
+			goto done;
+	}
+	err = reftable_table_seek_ref(&mod->tab, &it, prefix);
+	if (err)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err)
+			goto done;
+
+		if (mod->del_len > 0) {
+			struct find_arg arg = {
+				.names = mod->del,
+				.want = ref.ref_name,
+			};
+			int idx = binsearch(mod->del_len, find_name, &arg);
+			if (idx < mod->del_len &&
+			    !strcmp(ref.ref_name, mod->del[idx])) {
+				continue;
+			}
+		}
+
+		if (strncmp(ref.ref_name, prefix, strlen(prefix))) {
+			err = 1;
+			goto done;
+		}
+		err = 0;
+		goto done;
+	}
+
+done:
+	reftable_ref_record_clear(&ref);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int validate_ref_name(const char *name)
+{
+	while (true) {
+		char *next = strchr(name, '/');
+		if (!*name) {
+			return REFTABLE_REFNAME_ERROR;
+		}
+		if (!next) {
+			return 0;
+		}
+		if (next - name == 0 || (next - name == 1 && *name == '.') ||
+		    (next - name == 2 && name[0] == '.' && name[1] == '.'))
+			return REFTABLE_REFNAME_ERROR;
+		name = next + 1;
+	}
+	return 0;
+}
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz)
+{
+	struct modification mod = {
+		.tab = tab,
+		.add = reftable_calloc(sizeof(char *) * sz),
+		.del = reftable_calloc(sizeof(char *) * sz),
+	};
+	int i = 0;
+	int err = 0;
+	for (; i < sz; i++) {
+		if (reftable_ref_record_is_deletion(&recs[i])) {
+			mod.del[mod.del_len++] = recs[i].ref_name;
+		} else {
+			mod.add[mod.add_len++] = recs[i].ref_name;
+		}
+	}
+
+	err = modification_validate(&mod);
+	modification_clear(&mod);
+	return err;
+}
+
+static void slice_trim_component(struct slice *sl)
+{
+	while (sl->len > 0) {
+		bool is_slash = (sl->buf[sl->len - 1] == '/');
+		sl->len--;
+		if (is_slash)
+			break;
+	}
+}
+
+int modification_validate(struct modification *mod)
+{
+	struct slice slashed = SLICE_INIT;
+	int err = 0;
+	int i = 0;
+	for (; i < mod->add_len; i++) {
+		err = validate_ref_name(mod->add[i]);
+		if (err)
+			goto done;
+		slice_set_string(&slashed, mod->add[i]);
+		slice_addstr(&slashed, "/");
+
+		err = modification_has_ref_with_prefix(
+			mod, slice_as_string(&slashed));
+		if (err == 0) {
+			err = REFTABLE_NAME_CONFLICT;
+			goto done;
+		}
+		if (err < 0)
+			goto done;
+
+		slice_set_string(&slashed, mod->add[i]);
+		while (slashed.len) {
+			slice_trim_component(&slashed);
+			err = modification_has_ref(mod,
+						   slice_as_string(&slashed));
+			if (err == 0) {
+				err = REFTABLE_NAME_CONFLICT;
+				goto done;
+			}
+			if (err < 0)
+				goto done;
+		}
+	}
+	err = 0;
+done:
+	slice_release(&slashed);
+	return err;
+}
diff --git a/reftable/refname.h b/reftable/refname.h
new file mode 100644
index 00000000000..f335287fcdb
--- /dev/null
+++ b/reftable/refname.h
@@ -0,0 +1,38 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+#ifndef REFNAME_H
+#define REFNAME_H
+
+#include "reftable.h"
+
+struct modification {
+	struct reftable_table tab;
+
+	char **add;
+	size_t add_len;
+
+	char **del;
+	size_t del_len;
+};
+
+// -1 = error, 0 = found, 1 = not found
+int modification_has_ref(struct modification *mod, const char *name);
+
+// -1 = error, 0 = found, 1 = not found.
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix);
+
+// 0 = OK.
+int validate_ref_name(const char *name);
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz);
+
+int modification_validate(struct modification *mod);
+
+#endif
diff --git a/reftable/reftable.c b/reftable/reftable.c
new file mode 100644
index 00000000000..6e7d1da10e7
--- /dev/null
+++ b/reftable/reftable.c
@@ -0,0 +1,90 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+#include "record.h"
+#include "reader.h"
+#include "merged.h"
+
+struct reftable_table_vtable {
+	int (*seek)(void *tab, struct reftable_iterator *it,
+		    struct reftable_record *);
+};
+
+static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
+				     struct reftable_record *rec)
+{
+	return reader_seek((struct reftable_reader *)tab, it, rec);
+}
+
+static struct reftable_table_vtable reader_vtable = {
+	.seek = reftable_reader_seek_void,
+};
+
+static int reftable_merged_table_seek_void(void *tab,
+					   struct reftable_iterator *it,
+					   struct reftable_record *rec)
+{
+	return merged_table_seek_record((struct reftable_merged_table *)tab, it,
+					rec);
+}
+
+static struct reftable_table_vtable merged_table_vtable = {
+	.seek = reftable_merged_table_seek_void,
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return tab->ops->seek(tab->table_arg, it, &rec);
+}
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &reader_vtable;
+	tab->table_arg = reader;
+}
+
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *merged)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &merged_table_vtable;
+	tab->table_arg = merged;
+}
+
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	int err = reftable_table_seek_ref(tab, &it, name);
+	if (err)
+		goto done;
+
+	err = reftable_iterator_next_ref(&it, ref);
+	if (err)
+		goto done;
+
+	if (strcmp(ref->ref_name, name) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		reftable_ref_record_clear(ref);
+		err = 1;
+		goto done;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..86f2dcf33d4
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,564 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *ref_name;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	REFTABLE_IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	REFTABLE_FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	REFTABLE_NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	REFTABLE_LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL ref_name.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	REFTABLE_API_ERROR = -6,
+
+	/* Decompression error */
+	REFTABLE_ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	REFTABLE_EMPTY_TABLE_ERROR = -8,
+
+	/* Dir/file conflict. */
+	REFTABLE_NAME_CONFLICT = -9,
+
+	/* Illegal ref name. */
+	REFTABLE_REFNAME_ERROR = -10,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/*
+ * Convert the numeric error code to an equivalent errno code.
+ */
+int reftable_error_to_errno(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+
+	/* boolean: do not check ref names for validity or dir/file conflicts.
+	 */
+	int skip_name_check;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, uint8_t *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, uint8_t *data, size_t size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp. The name is used for creating a stack. Typically, it is the
+ * basename of the file. The block source `src` is owned by the reader, and is
+ * closed on calling reftable_reader_destroy().
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source *src, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, &src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(&it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to `oid`. */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* closes readers for the merged tables */
+void reftable_merged_table_close(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/****************************************************************
+ Generic tables
+
+ A unified API for reading tables, either merged tables, or single readers.
+ ****************************************************************/
+
+struct reftable_table {
+	struct reftable_table_vtable *ops;
+	void *table_arg;
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name);
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader);
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *table);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction, and deallocate the
+   transaction. Releases the lock if held. */
+void reftable_addition_destroy(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* Reloads the stack if necessary. This is very cheap to run if the stack was up
+ * to date */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	uint64_t entries_written; /* total number of entries written, including
+				     failures. */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 00000000000..160db02991e
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,247 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+struct slice reftable_empty_slice = SLICE_INIT;
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	int l;
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+	assert(s->canary == SLICE_CANARY);
+
+	l = strlen(str);
+	l++; /* \0 */
+	slice_resize(s, l);
+	memcpy(s->buf, str, l);
+	s->len = l - 1;
+}
+
+void slice_init(struct slice *s)
+{
+	struct slice empty = SLICE_INIT;
+	*s = empty;
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	assert(s->canary == SLICE_CANARY);
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_addstr(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+	assert(d->canary == SLICE_CANARY);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_addbuf(struct slice *s, struct slice a)
+{
+	int end = s->len;
+	assert(s->canary == SLICE_CANARY);
+	slice_resize(s, s->len + a.len);
+	memcpy(s->buf + end, a.buf, a.len);
+}
+
+void slice_consume(struct slice *s, int n)
+{
+	assert(s->canary == SLICE_CANARY);
+	s->buf += n;
+	s->len -= n;
+}
+
+byte *slice_detach(struct slice *s)
+{
+	byte *p = s->buf;
+	assert(s->canary == SLICE_CANARY);
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_release(struct slice *s)
+{
+	assert(s->canary == SLICE_CANARY);
+	reftable_free(slice_detach(s));
+}
+
+void slice_copy(struct slice *dest, struct slice src)
+{
+	assert(dest->canary == SLICE_CANARY);
+	assert(src.canary == SLICE_CANARY);
+	slice_resize(dest, src.len);
+	memcpy(dest->buf, src.buf, src.len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	assert(s->canary == SLICE_CANARY);
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice in)
+{
+	struct slice s = SLICE_INIT;
+	assert(in.canary == SLICE_CANARY);
+	slice_resize(&s, in.len + 1);
+	s.buf[in.len] = 0;
+	memcpy(s.buf, in.buf, in.len);
+	return (char *)slice_detach(&s);
+}
+
+bool slice_equal(struct slice a, struct slice b)
+{
+	assert(a.canary == SLICE_CANARY);
+	assert(b.canary == SLICE_CANARY);
+	if (a.len != b.len)
+		return 0;
+	return memcmp(a.buf, b.buf, a.len) == 0;
+}
+
+int slice_cmp(struct slice a, struct slice b)
+{
+	int min = a.len < b.len ? a.len : b.len;
+	int res = memcmp(a.buf, b.buf, min);
+	assert(a.canary == SLICE_CANARY);
+	assert(b.canary == SLICE_CANARY);
+	if (res != 0)
+		return res;
+	if (a.len < b.len)
+		return -1;
+	else if (a.len > b.len)
+		return 1;
+	else
+		return 0;
+}
+
+int slice_add(struct slice *b, byte *data, size_t sz)
+{
+	assert(b->canary == SLICE_CANARY);
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = reftable_realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_add_void(void *b, byte *data, size_t sz)
+{
+	return slice_add((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf)
+{
+	assert(bs->ops == NULL);
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct slice a, struct slice b)
+{
+	int p = 0;
+	assert(a.canary == SLICE_CANARY);
+	assert(b.canary == SLICE_CANARY);
+	while (p < a.len && p < b.len) {
+		if (a.buf[p] != b.buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 00000000000..e085d9361ea
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,87 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+/*
+  provides bounds-checked byte ranges.
+  To use, initialize as "slice x = {0};"
+ */
+struct slice {
+	int len;
+	int cap;
+	byte *buf;
+	byte canary;
+};
+#define SLICE_CANARY 0x42
+#define SLICE_INIT                       \
+	{                                \
+		0, 0, NULL, SLICE_CANARY \
+	}
+extern struct slice reftable_empty_slice;
+
+void slice_set_string(struct slice *dest, const char *src);
+void slice_addstr(struct slice *dest, const char *src);
+
+/* Deallocate and clear slice */
+void slice_release(struct slice *slice);
+
+/* Return a malloced string for `src` */
+char *slice_to_string(struct slice src);
+
+/* Initializes a slice. Accepts a slice with random garbage. */
+void slice_init(struct slice *slice);
+
+/* Ensure that `buf` is \0 terminated. */
+const char *slice_as_string(struct slice *src);
+
+/* Compare slices */
+bool slice_equal(struct slice a, struct slice b);
+
+/* Return `buf`, clearing out `s` */
+byte *slice_detach(struct slice *s);
+
+/* Copy bytes */
+void slice_copy(struct slice *dest, struct slice src);
+
+/* Advance `buf` by `n`, and decrease length. A copy of the slice
+   should be kept for deallocating the slice. */
+void slice_consume(struct slice *s, int n);
+
+/* Set length of the slice to `l` */
+void slice_resize(struct slice *s, int l);
+
+/* Signed comparison */
+int slice_cmp(struct slice a, struct slice b);
+
+/* Append `data` to the `dest` slice.  */
+int slice_add(struct slice *dest, byte *data, size_t sz);
+
+/* Append `add` to `dest. */
+void slice_addbuf(struct slice *dest, struct slice add);
+
+/* Like slice_add, but suitable for passing to reftable_new_writer
+ */
+int slice_add_void(void *b, byte *data, size_t sz);
+
+/* Find the longest shared prefix size of `a` and `b` */
+int common_prefix_size(struct slice a, struct slice b);
+
+struct reftable_block_source;
+
+/* Create an in-memory block source for reading reftables */
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..97790bda92e
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1207 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "refname.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct slice list_file_name = SLICE_INIT;
+	int err = 0;
+
+	if (config.hash_id == 0) {
+		config.hash_id = SHA1_ID;
+	}
+
+	*dest = NULL;
+
+	slice_set_string(&list_file_name, dir);
+	slice_addstr(&list_file_name, "/tables.list");
+
+	p->list_file = slice_to_string(list_file_name);
+	slice_release(&list_file_name);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload_maybe_reuse(p, true);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fd_read_lines(int fd, char ***namesp)
+{
+	off_t size = lseek(fd, 0, SEEK_END);
+	char *buf = NULL;
+	int err = 0;
+	if (size < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+	err = lseek(fd, 0, SEEK_SET);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (read(fd, buf, size) != size) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+
+done:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	int fd = open(filename, O_RDONLY, 0644);
+	int err = 0;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return REFTABLE_IO_ERROR;
+	}
+	err = fd_read_lines(fd, namesp);
+	close(fd);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged != NULL) {
+		reftable_merged_table_close(st->merged);
+		reftable_merged_table_free(st->merged);
+		st->merged = NULL;
+	}
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_tables =
+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
+	int new_tables_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+	int i;
+	struct slice table_path = SLICE_INIT;
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_addstr(&table_path, "/");
+			slice_addstr(&table_path, name);
+
+			err = reftable_block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0)
+				goto done;
+
+			err = reftable_new_reader(&rd, &src, name);
+			if (err < 0)
+				goto done;
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
+					st->config.hash_id);
+	if (err < 0)
+		goto done;
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	new_merged->suppress_deletions = true;
+	st->merged = new_merged;
+
+	for (i = 0; i < cur_len; i++) {
+		if (cur[i] != NULL) {
+			reader_close(cur[i]);
+			reftable_reader_free(cur[i]);
+		}
+	}
+
+done:
+	slice_release(&table_path);
+	for (i = 0; i < new_tables_len; i++) {
+		reader_close(new_tables[i]);
+		reftable_reader_free(new_tables[i]);
+	}
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0)
+		return diff;
+
+	return udiff;
+}
+
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0)
+		return err;
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		int err2 = 0;
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != REFTABLE_NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
+		   writer. Check if there was one by checking if the name list
+		   changed.
+		*/
+		err2 = read_lines(st->list_file, &names_after);
+		if (err2 < 0) {
+			free_names(names);
+			return err2;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return err;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 1;
+		sleep_millisec(delay);
+	}
+
+	return 0;
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0)
+		return err;
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto done;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto done;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto done;
+	}
+
+done:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	int err = stack_uptodate(st);
+	if (err > 0)
+		return reftable_stack_reload_maybe_reuse(st, true);
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == REFTABLE_LOCK_ERROR) {
+			/* Ignore error return, we want to propagate
+			   REFTABLE_LOCK_ERROR.
+			*/
+			reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	if (!st->disable_auto_compact)
+		return reftable_stack_auto_compact(st);
+
+	return 0;
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct slice lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+#define REFTABLE_ADDITION_INIT               \
+	{                                    \
+		.lock_file_name = SLICE_INIT \
+	}
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	slice_set_string(&add->lock_file_name, st->list_file);
+	slice_addstr(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = REFTABLE_LOCK_ERROR;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	err = stack_uptodate(st);
+	if (err < 0)
+		goto done;
+
+	if (err > 1) {
+		err = REFTABLE_LOCK_ERROR;
+		goto done;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+done:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct slice nm = SLICE_INIT;
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_set_string(&nm, add->stack->list_file);
+		slice_addstr(&nm, "/");
+		slice_addstr(&nm, add->new_tables[i]);
+		unlink(slice_as_string(&nm));
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(slice_as_string(&add->lock_file_name));
+		slice_release(&add->lock_file_name);
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+	slice_release(&nm);
+}
+
+void reftable_addition_destroy(struct reftable_addition *add)
+{
+	if (add == NULL) {
+		return;
+	}
+	reftable_addition_close(add);
+	reftable_free(add);
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct slice table_list = SLICE_INIT;
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0)
+		goto done;
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		slice_addstr(&table_list, add->stack->merged->stack[i]->name);
+		slice_addstr(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_addstr(&table_list, add->new_tables[i]);
+		slice_addstr(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	slice_release(&table_list);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = rename(slice_as_string(&add->lock_file_name),
+		     add->stack->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+done:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
+	*dest = reftable_calloc(sizeof(**dest));
+	**dest = empty;
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = REFTABLE_ADDITION_INIT;
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0)
+		goto done;
+	if (err > 0) {
+		err = REFTABLE_LOCK_ERROR;
+		goto done;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_addition_commit(&add);
+done:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct slice temp_tab_file_name = SLICE_INIT;
+	struct slice tab_file_name = SLICE_INIT;
+	struct slice next_name = SLICE_INIT;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, add->next_update_index, add->next_update_index);
+
+	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
+	slice_addstr(&temp_tab_file_name, "/");
+	slice_addbuf(&temp_tab_file_name, next_name);
+	slice_addstr(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
+	if (tab_fd < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_writer_close(wr);
+	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto done;
+	}
+	if (err < 0)
+		goto done;
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = stack_check_addition(add->stack,
+				   slice_as_string(&temp_tab_file_name));
+	if (err < 0)
+		goto done;
+
+	if (wr->min_update_index < add->next_update_index) {
+		err = REFTABLE_API_ERROR;
+		goto done;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_addstr(&next_name, ".ref");
+
+	slice_set_string(&tab_file_name, add->stack->reftable_dir);
+	slice_addstr(&tab_file_name, "/");
+	slice_addbuf(&tab_file_name, next_name);
+
+	/* TODO: should check destination out of paranoia */
+	err = rename(slice_as_string(&temp_tab_file_name),
+		     slice_as_string(&tab_file_name));
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = slice_to_string(next_name);
+	add->new_tables_len++;
+done:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_file_name));
+	}
+
+	slice_release(&temp_tab_file_name);
+	slice_release(&tab_file_name);
+	slice_release(&next_name);
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0)
+		return reftable_reader_max_update_index(
+			       st->merged->stack[sz - 1]) +
+		       1;
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct slice next_name = SLICE_INIT;
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->merged->stack[first]),
+		    reftable_reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_addstr(temp_tab, "/");
+	slice_addbuf(temp_tab, next_name);
+	slice_addstr(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0)
+		goto done;
+	err = reftable_writer_close(wr);
+	if (err < 0)
+		goto done;
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+done:
+	reftable_writer_free(wr);
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		slice_release(temp_tab);
+	}
+	slice_release(&next_name);
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_reader **subtabs = reftable_calloc(
+		sizeof(struct reftable_reader *) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	uint64_t entries = 0;
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr,
+				   st->merged->stack[first]->min_update_index,
+				   st->merged->stack[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto done;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+	reftable_iterator_destroy(&it);
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+	reftable_log_record_clear(&log);
+	st->stats.entries_written += entries;
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct slice temp_tab_file_name = SLICE_INIT;
+	struct slice new_table_name = SLICE_INIT;
+	struct slice lock_file_name = SLICE_INIT;
+	struct slice ref_list_contents = SLICE_INIT;
+	struct slice new_table_path = SLICE_INIT;
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **listp = NULL;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto done;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_addstr(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	/* Don't want to write to the lock for now.  */
+	close(lock_file_fd);
+	lock_file_fd = 0;
+
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0)
+		goto done;
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_file_name = SLICE_INIT;
+		struct slice subtab_lock = SLICE_INIT;
+		int sublock_file_fd = -1;
+
+		slice_set_string(&subtab_file_name, st->reftable_dir);
+		slice_addstr(&subtab_file_name, "/");
+		slice_addstr(&subtab_file_name,
+			     reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, subtab_file_name);
+		slice_addstr(&subtab_lock, ".lock");
+
+		sublock_file_fd = open(slice_as_string(&subtab_lock),
+				       O_EXCL | O_CREAT | O_WRONLY, 0644);
+		if (sublock_file_fd > 0) {
+			close(sublock_file_fd);
+		} else if (sublock_file_fd < 0) {
+			if (errno == EEXIST) {
+				err = 1;
+			} else {
+				err = REFTABLE_IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] =
+			(char *)slice_as_string(&subtab_file_name);
+		j++;
+
+		if (err != 0)
+			goto done;
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0)
+		goto done;
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0)
+		goto done;
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_addstr(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_addstr(&new_table_path, "/");
+
+	slice_addbuf(&new_table_path, new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(slice_as_string(&temp_tab_file_name),
+			     slice_as_string(&new_table_path));
+		if (err < 0) {
+			err = REFTABLE_IO_ERROR;
+			goto done;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
+		slice_addstr(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		slice_addbuf(&ref_list_contents, new_table_name);
+		slice_addstr(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
+		slice_addstr(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto done;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto done;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto done;
+	}
+	have_lock = false;
+
+	/* Reload the stack before deleting. On windows, we can only delete the
+	   files after we closed them.
+	*/
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+
+	listp = delete_on_success;
+	while (*listp) {
+		if (strcmp(*listp, slice_as_string(&new_table_path))) {
+			unlink(*listp);
+		}
+		listp++;
+	}
+
+done:
+	free_names(delete_on_success);
+
+	listp = subtable_locks;
+	while (*listp) {
+		unlink(*listp);
+		listp++;
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	slice_release(&new_table_name);
+	slice_release(&new_table_path);
+	slice_release(&ref_list_contents);
+	slice_release(&temp_tab_file_name);
+	slice_release(&lock_file_name);
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0)
+		return 0;
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+
+	if (n == 0) {
+		*seglen = 0;
+		return segs;
+	}
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
+	int overhead = header_size(version) - 1;
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		sizes[i] = st->merged->stack[i]->size - overhead;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0)
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_table tab = { NULL };
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+	return reftable_table_read_ref(&tab, refname, ref);
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err)
+		goto done;
+
+	err = reftable_iterator_next_log(&it, log);
+	if (err)
+		goto done;
+
+	if (strcmp(log->ref_name, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto done;
+	}
+
+done:
+	if (err) {
+		reftable_log_record_clear(log);
+	}
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
+{
+	int err = 0;
+	struct reftable_block_source src = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct reftable_ref_record *refs = NULL;
+	struct reftable_iterator it = { NULL };
+	int cap = 0;
+	int len = 0;
+	int i = 0;
+
+	if (st->config.skip_name_check)
+		return 0;
+
+	err = reftable_block_source_from_file(&src, new_tab_name);
+	if (err < 0)
+		goto done;
+
+	err = reftable_new_reader(&rd, &src, new_tab_name);
+	if (err < 0)
+		goto done;
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	if (err > 0) {
+		err = 0;
+		goto done;
+	}
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0)
+			goto done;
+
+		if (len >= cap) {
+			cap = 2 * cap + 1;
+			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
+		}
+
+		refs[len++] = ref;
+	}
+
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+
+	err = validate_ref_record_addition(tab, refs, len);
+
+done:
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&refs[i]);
+	}
+
+	free(refs);
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..60bca3b3e58
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,48 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+#include "system.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+	bool disable_auto_compact;
+
+	struct reftable_write_options config;
+
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
+void reftable_addition_close(struct reftable_addition *add);
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..9e2c4b8d9ac
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,54 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#if 1 /* REFTABLE_IN_GITCORE */
+
+#include "git-compat-util.h"
+#include "cache.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_IN_GITCORE */
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef uint8_t byte;
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ *
+ * TODO: in git-core, this should fallback to uncompress2 if it is available.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..0061d14e306
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,63 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	int res;
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	res = compare(key, (*rootp)->key);
+	if (res < 0)
+		return tree_search(key, &(*rootp)->left, compare, insert);
+	else if (res > 0)
+		return tree_search(key, &(*rootp)->right, compare, insert);
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..954512e9a3a
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+/* tree_node is a generic binary search tree. */
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+/* looks for `key` in `rootp` using `compare` as comparison function. If insert
+   is set, insert the key if it's not found. Else, return NULL.
+*/
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+
+/* performs an infix walk of the tree. */
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+
+/*
+  deallocates the tree nodes recursively. Keys should be deallocated separately
+  by walking over the tree. */
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100755
index 00000000000..ef32aeda515
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,24 @@
+#!/bin/sh
+
+set -eu
+
+# Override this to import from somewhere else, say "../reftable".
+SRC=${SRC:-origin}
+BRANCH=${BRANCH:-master}
+
+((git --git-dir reftable-repo/.git fetch ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
+   git clone https://github.com/google/reftable reftable-repo)
+
+cp reftable-repo/c/*.[ch] reftable/
+cp reftable-repo/c/include/*.[ch] reftable/
+cp reftable-repo/LICENSE reftable/
+git --git-dir reftable-repo/.git show --no-patch HEAD \
+  > reftable/VERSION
+
+mv reftable/system.h reftable/system.h~
+sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
+
+# Remove unittests and compatibility hacks we don't need here.  
+rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
+
+git add reftable/*.[ch] reftable/LICENSE reftable/VERSION 
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..331edb4992e
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,644 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, byte *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0)
+			return n;
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0)
+		return n;
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	slice_release(&w->last_key);
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, byte *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	slice_init(&wp->block_writer_data.last_key);
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->last_key = reftable_empty_slice;
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+#define OBJ_INDEX_TREE_NODE_INIT   \
+	{                          \
+		.hash = SLICE_INIT \
+	}
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_cmp(((const struct obj_index_tree_node *)a)->hash,
+			 ((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct slice hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		struct obj_index_tree_node empty = OBJ_INDEX_TREE_NODE_INIT;
+		key = reftable_malloc(sizeof(struct obj_index_tree_node));
+		*key = empty;
+
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w,
+			     struct reftable_record *rec)
+{
+	int result = -1;
+	struct slice key = SLICE_INIT;
+	int err = 0;
+	reftable_record_key(rec, &key);
+	if (slice_cmp(w->last_key, key) >= 0)
+		goto done;
+
+	slice_copy(&w->last_key, key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, reftable_record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == reftable_record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto done;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto done;
+	}
+
+	writer_reinit_block_writer(w, reftable_record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto done;
+	}
+
+	result = 0;
+done:
+	slice_release(&key);
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL)
+		return REFTABLE_API_ERROR;
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index)
+		return REFTABLE_API_ERROR;
+
+	reftable_record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, &rec);
+	if (err < 0)
+		return err;
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = hash_size(w->opts.hash_id),
+			.canary = SLICE_CANARY,
+		};
+
+		writer_index_hash(w, h);
+	}
+
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = hash_size(w->opts.hash_id),
+			.canary = SLICE_CANARY,
+		};
+		writer_index_hash(w, h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	int err;
+	if (log->ref_name == NULL)
+		return REFTABLE_API_ERROR;
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0)
+			return err;
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	reftable_record_from_log(&rec, log);
+	err = writer_add_record(w, &rec);
+	return err;
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	struct reftable_block_stats *bstats = NULL;
+	if (err < 0)
+		return err;
+
+	while (w->index_len > threshold) {
+		struct reftable_index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct reftable_record rec = { 0 };
+			reftable_record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, &rec) == 0) {
+				continue;
+			}
+
+			err = writer_flush_block(w);
+			if (err < 0)
+				return err;
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, &rec);
+			if (err != 0) {
+				/* write into fresh block should always succeed
+				 */
+				abort();
+			}
+		}
+		for (i = 0; i < idx_len; i++) {
+			slice_release(&idx[i].last_key);
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0)
+		return err;
+
+	bstats = writer_reftable_block_stats(w, typ);
+	bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
+	bstats->index_offset = index_start;
+	bstats->max_index_level = max_level;
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(entry->hash, *arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct reftable_obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct reftable_record rec = { 0 };
+	if (arg->err < 0)
+		goto done;
+
+	reftable_record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0)
+		goto done;
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0)
+		goto done;
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0)
+		goto done;
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+done:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	slice_release(&entry->hash);
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0)
+		return closure.err;
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL)
+		return 0;
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0)
+		return err;
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0)
+			return err;
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	byte footer[72];
+	byte *p = footer;
+	int err = writer_finish_public_section(w);
+	int empty_table = w->next == 0;
+	if (err != 0)
+		goto done;
+	w->pending_padding = 0;
+	if (empty_table) {
+		/* Empty tables need a header anyway. */
+		byte header[28];
+		int n = writer_write_header(w, header);
+		err = padded_write(w, header, n, 0);
+		if (err < 0)
+			goto done;
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0)
+		goto done;
+
+	if (empty_table) {
+		err = REFTABLE_EMPTY_TABLE_ERROR;
+		goto done;
+	}
+
+done:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	slice_release(&w->last_key);
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		slice_release(&w->index[i].last_key);
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	struct reftable_index_record ir = { .last_key = SLICE_INIT };
+	if (raw_bytes < 0)
+		return raw_bytes;
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0)
+		return err;
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index,
+			sizeof(struct reftable_index_record) * w->index_cap);
+	}
+
+	ir.offset = w->next;
+	slice_copy(&ir.last_key, w->block_writer->last_key);
+	w->index[w->index_len] = ir;
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL)
+		return 0;
+	if (w->block_writer->entries == 0)
+		return 0;
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..c56d67f7116
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, byte *, size_t);
+	void *write_arg;
+	int pending_padding;
+	struct slice last_key;
+
+	/* offset of next block to write. */
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	/* memory buffer for writing */
+	byte *block;
+
+	/* writer for the current section. NULL or points to
+	 * block_writer_data */
+	struct block_writer *block_writer;
+
+	struct block_writer block_writer_data;
+
+	/* pending index records for the current section */
+	struct reftable_index_record *index;
+	int index_len;
+	int index_cap;
+
+	/*
+	  tree for use with tsearch; used to populate the 'o' inverse OID
+	  map */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+/* finishes a block, and writes it to storage */
+int writer_flush_block(struct reftable_writer *w);
+
+/* deallocates memory related to the index */
+void writer_clear_index(struct reftable_writer *w);
+
+/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v15 10/13] Reftable support for git-core
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (8 preceding siblings ...)
  2020-05-28 19:46                             ` [PATCH v15 09/13] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                             ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 11/13] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
                                               ` (5 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

This introduces the refs/reftable-backend.c containing reftable powered ref
storage backend.

It can be activated by passing --ref-storage=reftable to "git init".

TODO:

* Fix worktree commands

* Spots marked XXX

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   27 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   56 +-
 cache.h                                       |    6 +-
 refs.c                                        |   27 +-
 refs.h                                        |    3 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1332 +++++++++++++++++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |  142 ++
 13 files changed, 1591 insertions(+), 30 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 372139f1f24..6b21c247b12 100644
--- a/Makefile
+++ b/Makefile
@@ -807,6 +807,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -959,6 +960,7 @@ LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1162,7 +1164,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2352,11 +2354,29 @@ VCSSVN_OBJS += vcs-svn/sliding_window.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/refname.o
+REFTABLE_OBJS += reftable/reftable.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2497,6 +2517,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -3112,7 +3135,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index cb48a291caf..4d0cf065e4a 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1108,7 +1108,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
+		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0b7222e7188..b7053b9e370 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +189,8 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 ||
+	    !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -238,6 +240,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -247,6 +250,24 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
+	/*
+	 * refs/heads is a file when using reftable. We can't reinitialize with
+	 * a reftable because it will overwrite HEAD
+	 */
+	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
+			      is_directory(git_path_buf(&buf, "refs/heads"))) {
+		die("cannot switch ref storage format.");
+	}
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -261,15 +282,12 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
@@ -383,7 +401,8 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash, unsigned int flags)
+	    const char *template_dir, int hash, const char *ref_storage_format,
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -422,6 +441,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+	repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -450,6 +470,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -523,6 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = DEFAULT_REF_STORAGE;
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -530,15 +553,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *object_format = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
@@ -648,9 +674,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
+		       ref_storage_format, flags);
 }
diff --git a/cache.h b/cache.h
index 0f0485ecfe2..8cb884773c3 100644
--- a/cache.h
+++ b/cache.h
@@ -628,8 +628,9 @@ int path_inside_repo(const char *prefix, const char *path);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
-	    unsigned int flags);
-void initialize_repository_version(int hash_algo);
+	    const char *ref_storage_format, unsigned int flags);
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1043,6 +1044,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 29e710a29e9..4409080dfd8 100644
--- a/refs.c
+++ b/refs.c
@@ -17,10 +17,16 @@
 #include "argv-array.h"
 #include "repository.h"
 
+const char *default_ref_storage(void)
+{
+	const char *test = getenv("GIT_TEST_REFTABLE");
+	return test ? "reftable" : "files";
+}
+
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1717,13 +1723,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1739,7 +1745,11 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(r->gitdir,
+					 r->ref_storage_format ?
+						 r->ref_storage_format :
+						 DEFAULT_REF_STORAGE,
+					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
 
@@ -1794,7 +1804,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1808,6 +1818,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1821,9 +1832,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index 7aaa1226551..ddcc940d5c7 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* Returns the ref storage backend to use by default. */
+const char *default_ref_storage(void);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index d1d00947990..1d58fe17033 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -687,6 +687,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..5f927d08e1d
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1332 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *repo_dir;
+	char *reftable_dir;
+	struct reftable_stack *stack;
+};
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type);
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = { NULL };
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						       unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+	refs->repo_dir = xstrdup(path);
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
+	assert(refs->err != REFTABLE_API_ERROR);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct strbuf sb = STRBUF_INIT;
+
+	safe_create_dir(refs->reftable_dir, 1);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	const char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(&ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		/*
+		   We could filter pseudo refs here explicitly, but HEAD is not
+		   a PSEUDOREF, but a PER_WORKTREE, b/c each worktree can have
+		   its own HEAD.
+		 */
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct reftable_merged_table *mt = NULL;
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else {
+		mt = reftable_stack_merged_table(refs->stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->prefix = prefix;
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int fixup_symrefs(struct ref_store *ref_store,
+			 struct ref_transaction *transaction)
+{
+	struct strbuf referent = STRBUF_INIT;
+	int i = 0;
+	int err = 0;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+		struct object_id old_oid;
+
+		err = reftable_read_raw_ref(ref_store, update->refname,
+					    &old_oid, &referent,
+					    /* mutate input, like
+					       files-backend.c */
+					    &update->type);
+		if (err < 0 && errno == ENOENT &&
+		    is_null_oid(&update->old_oid)) {
+			err = 0;
+		}
+		if (err < 0)
+			goto done;
+
+		if (!(update->type & REF_ISSYMREF))
+			continue;
+
+		if (update->flags & REF_NO_DEREF) {
+			/* what should happen here? See files-backend.c
+			 * lock_ref_for_update. */
+		} else {
+			/*
+			  If we are updating a symref (eg. HEAD), we should also
+			  update the branch that the symref points to.
+
+			  This is generic functionality, and would be better
+			  done in refs.c, but the current implementation is
+			  intertwined with the locking in files-backend.c.
+			*/
+			int new_flags = update->flags;
+			struct ref_update *new_update = NULL;
+
+			/* if this is an update for HEAD, should also record a
+			   log entry for HEAD? See files-backend.c,
+			   split_head_update()
+			*/
+			new_update = ref_transaction_add_update(
+				transaction, referent.buf, new_flags,
+				&update->new_oid, &update->old_oid,
+				update->msg);
+			new_update->parent_update = update;
+
+			/* files-backend sets REF_LOG_ONLY here. */
+			update->flags |= REF_NO_DEREF | REF_LOG_ONLY;
+			update->flags &= ~REF_HAVE_OLD;
+		}
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	strbuf_release(&referent);
+	return err;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_addition *add = NULL;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_new_addition(&add, refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = fixup_symrefs(ref_store, transaction);
+	if (err) {
+		goto done;
+	}
+
+	transaction->backend_data = add;
+	transaction->state = REF_TRANSACTION_PREPARED;
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	if (err < 0) {
+		transaction->state = REF_TRANSACTION_CLOSED;
+		strbuf_addf(errbuf, "reftable: transaction prepare: %s",
+			    reftable_error_str(err));
+	}
+
+	return err;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	reftable_addition_destroy(add);
+	transaction->backend_data = NULL;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid;
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp((*(struct ref_update **)a)->refname,
+		      (*(struct ref_update **)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = reftable_stack_next_update_index(refs->stack);
+	int err = 0;
+	int i = 0;
+	struct reftable_log_record *logs =
+		calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_LOG_ONLY) {
+			continue;
+		}
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct reftable_ref_record ref = { NULL };
+			struct object_id peeled;
+
+			int peel_error = peel_object(&u->new_oid, &peeled);
+			ref.ref_name = (char *)u->refname;
+
+			if (!is_null_oid(&u->new_oid)) {
+				ref.value = u->new_oid.hash;
+			}
+			ref.update_index = ts;
+			if (!peel_error) {
+				ref.target_value = peeled.hash;
+			}
+
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	int err = 0;
+	int i;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+
+	err = reftable_addition_add(add, &write_transaction_table, transaction);
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_addition_commit(add);
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_addition_destroy(add);
+	transaction->state = REF_TRANSACTION_CLOSED;
+	transaction->backend_data = NULL;
+	if (err) {
+		strbuf_addf(errmsg, "reftable: transaction failure: %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+	return err;
+}
+
+static int
+reftable_transaction_initial_commit(struct ref_store *ref_store,
+				    struct ref_transaction *transaction,
+				    struct strbuf *errmsg)
+{
+	int err = reftable_transaction_prepare(ref_store, transaction, errmsg);
+	if (err)
+		return err;
+
+	return reftable_transaction_finish(ref_store, transaction, errmsg);
+}
+
+struct write_pseudoref_arg {
+	struct reftable_stack *stack;
+	const char *pseudoref;
+	const struct object_id *new_oid;
+	const struct object_id *old_oid;
+};
+
+static int write_pseudoref_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_pseudoref_arg *arg = (struct write_pseudoref_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	struct reftable_ref_record read_ref = { NULL };
+	struct reftable_ref_record write_ref = { NULL };
+
+	reftable_writer_set_limits(writer, ts, ts);
+	if (arg->old_oid) {
+		struct object_id read_oid;
+		err = reftable_stack_read_ref(arg->stack, arg->pseudoref,
+					      &read_ref);
+		if (err < 0)
+			goto done;
+
+		if ((err > 0) != is_null_oid(arg->old_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		/* XXX If old_oid is set, and we have a symref? */
+
+		if (err == 0 && read_ref.value == NULL) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		hashcpy(read_oid.hash, read_ref.value);
+		if (!oideq(arg->old_oid, &read_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+	}
+
+	write_ref.ref_name = (char *)arg->pseudoref;
+	write_ref.update_index = ts;
+	if (!is_null_oid(arg->new_oid))
+		write_ref.value = (uint8_t *)arg->new_oid->hash;
+
+	err = reftable_writer_add_ref(writer, &write_ref);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&read_ref);
+	return err;
+}
+
+static int reftable_write_pseudoref(struct ref_store *ref_store,
+				    const char *pseudoref,
+				    const struct object_id *oid,
+				    const struct object_id *old_oid,
+				    struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_pseudoref_arg arg = {
+		.stack = refs->stack,
+		.pseudoref = pseudoref,
+		.new_oid = oid,
+	};
+	struct reftable_addition *add = NULL;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_new_addition(&add, refs->stack);
+	if (err) {
+		goto done;
+	}
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		/* XXX this is cut & paste from files-backend - should factor
+		 * out? */
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(errbuf,
+					    _("could not read ref '%s'"),
+					    pseudoref);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(errbuf, _("ref '%s' already exists"),
+				    pseudoref);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(errbuf,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			goto done;
+		}
+	}
+
+	err = reftable_addition_add(add, &write_pseudoref_table, &arg);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref update failure: %s",
+			    reftable_error_str(err));
+	}
+
+	err = reftable_addition_commit(add);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref commit failure: %s",
+			    reftable_error_str(err));
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_addition_destroy(add);
+	return err;
+}
+
+static int reftable_delete_pseudoref(struct ref_store *ref_store,
+				     const char *pseudoref,
+				     const struct object_id *old_oid)
+{
+	struct strbuf errbuf = STRBUF_INIT;
+	int ret = reftable_write_pseudoref(ref_store, pseudoref, &null_oid,
+					   old_oid, &errbuf);
+	/* XXX what to do with the error message? */
+	strbuf_release(&errbuf);
+	return ret;
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	int i = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = { NULL };
+		struct reftable_ref_record current = { NULL };
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.ref_name,
+					    &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	string_list_sort(refnames);
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_log_record log = { NULL };
+		struct object_id new_oid;
+		struct object_id old_oid;
+		struct reftable_ref_record current = { NULL };
+		reftable_stack_read_ref(create->refs->stack, create->refname,
+					&current);
+
+		fill_reftable_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			reftable_writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(refs->stack, &write_create_symref_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = { NULL };
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto done;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto done;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = { { NULL } };
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = { { NULL } };
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto done;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_add(refs->stack, &write_rename_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(&ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			/* we want the refnames that we have reflogs for, so we
+			 * skip if we've already produced this name. This could
+			 * be faster by seeking directly to
+			 * reflog@update_index==0.
+			 */
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(refs->stack);
+	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = { NULL };
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid;
+			struct object_id new_oid;
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+	int i = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = { NULL };
+		err = reftable_iterator_next_log(&it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct reftable_log_record log = { NULL };
+	struct reftable_iterator it = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		goto done;
+	}
+
+	while (1) {
+		struct object_id ooid;
+		struct object_id noid;
+
+		int err = reftable_iterator_next_log(&it, &log);
+		if (err < 0) {
+			goto done;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_read_ref(refs->stack, refname, &ref);
+	if (err > 0) {
+		errno = ENOENT;
+		err = -1;
+		goto done;
+	}
+	if (err < 0) {
+		errno = reftable_error_to_errno(err);
+		err = -1;
+		goto done;
+	}
+	if (ref.target != NULL) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else if (ref.value != NULL) {
+		hashcpy(oid->hash, ref.value);
+	} else {
+		*type |= REF_ISBROKEN;
+		errno = EINVAL;
+		err = -1;
+	}
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_initial_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_write_pseudoref,
+	reftable_delete_pseudoref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire,
+};
diff --git a/repository.c b/repository.c
index 6f7f6f002b1..087760bc184 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 6534fbb7b31..f57b73f4a27 100644
--- a/repository.h
+++ b/repository.h
@@ -74,6 +74,9 @@ struct repository {
 	 */
 	struct ref_store *refs_private;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 65fe5ecefbe..2ef970f9f88 100644
--- a/setup.c
+++ b/setup.c
@@ -468,9 +468,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -559,6 +561,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1204,8 +1207,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 00000000000..52deba272f2
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,142 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+INVALID_SHA1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+initialize ()  {
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	mv .git/hooks .git/hooks-disabled
+}
+
+test_expect_success 'delete ref' '
+	initialize &&
+	test_commit file &&
+	SHA=$(git show-ref -s --verify HEAD) &&
+	test_write_lines "$SHA refs/heads/master" "$SHA refs/tags/file" >expect &&
+	git show-ref > actual &&
+	! git update-ref -d refs/tags/file $INVALID_SHA1 &&
+	test_cmp expect actual &&
+	git update-ref -d refs/tags/file $SHA  &&
+	test_write_lines "$SHA refs/heads/master" >expect &&
+	git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone calls transaction_initial_commit' '
+	test_commit message1 file1 &&
+	git clone . cloned &&
+	(test  -f cloned/file1 || echo "Fixme.")
+'
+
+test_expect_success 'basic operation of reftable storage: commit, show-ref' '
+	initialize &&
+	test_commit file &&
+	test_write_lines refs/heads/master refs/tags/file >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect
+'
+
+test_expect_success 'reflog, repack' '
+	initialize &&
+	for count in $(test_seq 1 10)
+	do
+		test_commit "number $count" file.t $count number-$count ||
+	        return 1
+	done &&
+	git pack-refs &&
+	ls -1 .git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 10 output &&
+	grep "commit (initial): number 1" output &&
+	grep "commit: number 10" output &&
+	git gc &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 0 output 
+'
+
+# This matches show-ref's output
+print_ref() {
+	echo "$(git rev-parse "$1") $1"
+}
+
+test_expect_success 'peeled tags are stored' '
+	initialize &&
+	test_commit file &&
+	git tag -m "annotated tag" test_tag HEAD &&
+	{
+		print_ref "refs/heads/master" &&
+		print_ref "refs/tags/file" &&
+		print_ref "refs/tags/test_tag" &&
+		print_ref "refs/tags/test_tag^{}" 
+	} >expect &&
+	git show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'show-ref works on fresh repo' '
+	initialize &&
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	>expect &&
+	! git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'checkout unborn branch' '
+	initialize &&
+	git checkout -b master
+'
+
+
+test_expect_success 'dir/file conflict' '
+	initialize &&
+	test_commit file &&
+	! git branch master/forbidden
+'
+
+
+test_expect_success 'do not clobber existing repo' '
+	rm -rf .git &&
+	git init --ref-storage=files &&
+	cat .git/HEAD > expect &&
+	test_commit file &&
+	(git init --ref-storage=reftable || true) &&
+	cat .git/HEAD > actual &&
+	test_cmp expect actual
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'pseudo refs' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+	git cherry-pick source &&
+	test -f file2
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'rebase' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+ 	git rebase source &&
+	test -f file2
+'
+
+test_done
+
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v15 11/13] Add GIT_DEBUG_REFS debugging mechanism
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (9 preceding siblings ...)
  2020-05-28 19:46                             ` [PATCH v15 10/13] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                             ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 12/13] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
                                               ` (4 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

When set in the environment, GIT_DEBUG_REFS makes git print operations and
results as they flow through the ref storage backend. This helps debug
discrepancies between different ref backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile              |   1 +
 refs.c                |   3 +
 refs/debug.c          | 309 ++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h  |   6 +
 t/t0033-debug-refs.sh |  18 +++
 5 files changed, 337 insertions(+)
 create mode 100644 refs/debug.c
 create mode 100755 t/t0033-debug-refs.sh

diff --git a/Makefile b/Makefile
index 6b21c247b12..9bab1acd6a4 100644
--- a/Makefile
+++ b/Makefile
@@ -959,6 +959,7 @@ LIB_OBJS += rebase.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
+LIB_OBJS += refs/debug.o
 LIB_OBJS += refs/files-backend.o
 LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
diff --git a/refs.c b/refs.c
index 4409080dfd8..bd1f3cc0e45 100644
--- a/refs.c
+++ b/refs.c
@@ -1750,6 +1750,9 @@ struct ref_store *get_main_ref_store(struct repository *r)
 						 r->ref_storage_format :
 						 DEFAULT_REF_STORAGE,
 					 REF_STORE_ALL_CAPS);
+	if (getenv("GIT_DEBUG_REFS")) {
+		r->refs_private = debug_wrap(r->refs_private);
+	}
 	return r->refs_private;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
new file mode 100644
index 00000000000..7c07ec92137
--- /dev/null
+++ b/refs/debug.c
@@ -0,0 +1,309 @@
+
+#include "refs-internal.h"
+
+struct debug_ref_store {
+	struct ref_store base;
+	struct ref_store *refs;
+};
+
+extern struct ref_storage_be refs_be_debug;
+struct ref_store *debug_wrap(struct ref_store *store);
+
+struct ref_store *debug_wrap(struct ref_store *store)
+{
+	struct debug_ref_store *res = malloc(sizeof(struct debug_ref_store));
+	res->refs = store;
+	base_ref_store_init((struct ref_store *)res, &refs_be_debug);
+	return (struct ref_store *)res;
+}
+
+static int debug_init_db(struct ref_store *refs, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res = drefs->refs->be->init_db(drefs->refs, err);
+	return res;
+}
+
+static int debug_transaction_prepare(struct ref_store *refs,
+				     struct ref_transaction *transaction,
+				     struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_prepare(drefs->refs, transaction,
+						   err);
+	return res;
+}
+
+static void print_update(int i, const char *refname,
+			 const struct object_id *old_oid,
+			 const struct object_id *new_oid, unsigned int flags,
+			 unsigned int type)
+{
+	char o[200] = "null";
+	char n[200] = "null";
+	if (old_oid)
+		oid_to_hex_r(o, old_oid);
+	if (new_oid)
+		oid_to_hex_r(n, new_oid);
+
+	type &= 0xf; /* see refs.h REF_* */
+	flags &= REF_HAVE_NEW | REF_HAVE_OLD | REF_NO_DEREF |
+		 REF_FORCE_CREATE_REFLOG | REF_LOG_ONLY;
+	printf("%d: %s %s -> %s (F=0x%x, T=0x%x)\n", i, refname, o, n, flags,
+	       type);
+}
+
+static void print_transaction(struct ref_transaction *transaction)
+{
+	printf("transaction {\n");
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		print_update(i, u->refname, &u->old_oid, &u->new_oid, u->flags,
+			     u->type);
+	}
+	printf("}\n");
+}
+
+static int debug_transaction_finish(struct ref_store *refs,
+				    struct ref_transaction *transaction,
+				    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_finish(drefs->refs, transaction,
+						  err);
+	print_transaction(transaction);
+	printf("finish: %d\n", res);
+	return res;
+}
+
+static int debug_transaction_abort(struct ref_store *refs,
+				   struct ref_transaction *transaction,
+				   struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_abort(drefs->refs, transaction, err);
+	return res;
+}
+
+static int debug_initial_transaction_commit(struct ref_store *refs,
+					    struct ref_transaction *transaction,
+					    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->initial_transaction_commit(drefs->refs,
+							  transaction, err);
+	return res;
+}
+
+static int debug_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->pack_refs(drefs->refs, flags);
+	return res;
+}
+
+static int debug_create_symref(struct ref_store *ref_store,
+			       const char *ref_target,
+			       const char *refs_heads_master,
+			       const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_symref(drefs->refs, ref_target,
+						 refs_heads_master, logmsg);
+	return res;
+}
+
+static int debug_delete_refs(struct ref_store *ref_store, const char *msg,
+			     struct string_list *refnames, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->delete_refs(drefs->refs, msg, refnames, flags);
+	return res;
+}
+static int debug_rename_ref(struct ref_store *ref_store, const char *oldref,
+			    const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->rename_ref(drefs->refs, oldref, newref,
+					      logmsg);
+	return res;
+}
+static int debug_copy_ref(struct ref_store *ref_store, const char *oldref,
+			  const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->copy_ref(drefs->refs, oldref, newref, logmsg);
+	return res;
+}
+
+static int debug_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->write_pseudoref(drefs->refs, pseudoref, oid,
+						   old_oid, err);
+	char o[100] = "null";
+	char n[100] = "null";
+	if (oid)
+		oid_to_hex_r(o, oid);
+	if (old_oid)
+		oid_to_hex_r(n, old_oid);
+	printf("write_pseudoref: %s, %s => %s, err %s: %d\n", pseudoref, o, n,
+	       err->buf, res);
+	return res;
+}
+
+static int debug_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_pseudoref(drefs->refs, pseudoref,
+						    old_oid);
+	char hex[100] = "null";
+	if (old_oid)
+		oid_to_hex_r(hex, old_oid);
+	printf("delete_pseudoref: %s (%s): %d\n", pseudoref, hex, res);
+	return res;
+}
+
+static struct ref_iterator *
+debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			 unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
+	return res;
+}
+
+static int debug_read_raw_ref(struct ref_store *ref_store, const char *refname,
+			      struct object_id *oid, struct strbuf *referent,
+			      unsigned int *type)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = 0;
+
+	oidcpy(oid, &null_oid);
+	res = drefs->refs->be->read_raw_ref(drefs->refs, refname, oid, referent,
+					    type);
+
+	if (res == 0) {
+		printf("read_raw_ref: %s: %s (=> %s) type %x: %d\n", refname,
+		       oid_to_hex(oid), referent->buf, *type, res);
+	} else {
+		printf("read_raw_ref: %s err %d\n", refname, res);
+	}
+	return res;
+}
+
+static struct ref_iterator *
+debug_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->reflog_iterator_begin(drefs->refs);
+	return res;
+}
+
+static int debug_for_each_reflog_ent(struct ref_store *ref_store,
+				     const char *refname, each_reflog_ent_fn fn,
+				     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->for_each_reflog_ent(drefs->refs, refname, fn,
+						       cb_data);
+	return res;
+}
+static int debug_for_each_reflog_ent_reverse(struct ref_store *ref_store,
+					     const char *refname,
+					     each_reflog_ent_fn fn,
+					     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->for_each_reflog_ent_reverse(
+		drefs->refs, refname, fn, cb_data);
+	return res;
+}
+
+static int debug_reflog_exists(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_exists(drefs->refs, refname);
+	return res;
+}
+
+static int debug_create_reflog(struct ref_store *ref_store, const char *refname,
+			       int force_create, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_reflog(drefs->refs, refname,
+						 force_create, err);
+	return res;
+}
+
+static int debug_delete_reflog(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_reflog(drefs->refs, refname);
+	return res;
+}
+
+static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
+			       const struct object_id *oid, unsigned int flags,
+			       reflog_expiry_prepare_fn prepare_fn,
+			       reflog_expiry_should_prune_fn should_prune_fn,
+			       reflog_expiry_cleanup_fn cleanup_fn,
+			       void *policy_cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_expire(drefs->refs, refname, oid,
+						 flags, prepare_fn,
+						 should_prune_fn, cleanup_fn,
+						 policy_cb_data);
+	return res;
+}
+
+struct ref_storage_be refs_be_debug = {
+	NULL,
+	"debug",
+	NULL,
+	debug_init_db,
+	debug_transaction_prepare,
+	debug_transaction_finish,
+	debug_transaction_abort,
+	debug_initial_transaction_commit,
+
+	debug_pack_refs,
+	debug_create_symref,
+	debug_delete_refs,
+	debug_rename_ref,
+	debug_copy_ref,
+
+	debug_write_pseudoref,
+	debug_delete_pseudoref,
+
+	debug_ref_iterator_begin,
+	debug_read_raw_ref,
+
+	debug_reflog_iterator_begin,
+	debug_for_each_reflog_ent,
+	debug_for_each_reflog_ent_reverse,
+	debug_reflog_exists,
+	debug_create_reflog,
+	debug_delete_reflog,
+	debug_reflog_expire,
+};
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 1d58fe17033..4dfa0c9b156 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -707,4 +707,10 @@ struct ref_store {
 void base_ref_store_init(struct ref_store *refs,
 			 const struct ref_storage_be *be);
 
+/*
+ * Print out ref operations as they occur. Useful for debugging alternate ref
+ * backends.
+ */
+struct ref_store *debug_wrap(struct ref_store *store);
+
 #endif /* REFS_REFS_INTERNAL_H */
diff --git a/t/t0033-debug-refs.sh b/t/t0033-debug-refs.sh
new file mode 100755
index 00000000000..96266a3ee3a
--- /dev/null
+++ b/t/t0033-debug-refs.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='cross-check reftable with files, using GIT_DEBUG_REFS output'
+
+. ./test-lib.sh
+
+test_expect_success 'GIT_DEBUG_REFS' '
+	git init --ref-storage=files files &&
+	git init --ref-storage=reftable reftable &&
+	(cd files && GIT_DEBUG_REFS=1 test_commit message file) > files.txt && 
+	(cd reftable && GIT_DEBUG_REFS=1 test_commit message file) > reftable.txt &&
+	test_cmp files.txt reftable.txt
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v15 12/13] vcxproj: adjust for the reftable changes
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (10 preceding siblings ...)
  2020-05-28 19:46                             ` [PATCH v15 11/13] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 19:46                             ` Johannes Schindelin via GitGitGadget
  2020-05-28 19:46                             ` [PATCH v15 13/13] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
                                               ` (3 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This allows Git to be compiled via Visual Studio again after integrating
the `hn/reftable` branch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.mak.uname                           |  2 +-
 contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index c7eba69e54e..ae4e25a1a42 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -709,7 +709,7 @@ vcxproj:
 	# Make .vcxproj files and add them
 	unset QUIET_GEN QUIET_BUILT_IN; \
 	perl contrib/buildsystems/generate -g Vcxproj
-	git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
+	git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
 
 	# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
 	(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
index 5c666f9ac03..33a08d31652 100644
--- a/contrib/buildsystems/Generators/Vcxproj.pm
+++ b/contrib/buildsystems/Generators/Vcxproj.pm
@@ -77,7 +77,7 @@ sub createProject {
     my $libs_release = "\n    ";
     my $libs_debug = "\n    ";
     if (!$static_library) {
-      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
+      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
       $libs_debug = $libs_release;
       $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
       $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
@@ -231,6 +231,7 @@ sub createProject {
 EOM
     if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
       my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
+      my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
       my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
 
       print F << "EOM";
@@ -240,6 +241,14 @@ sub createProject {
       <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
     </ProjectReference>
 EOM
+      if (!($name =~ /xdiff|libreftable/)) {
+        print F << "EOM";
+    <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
+      <Project>$uuid_libreftable</Project>
+      <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
+    </ProjectReference>
+EOM
+      }
       if (!($name =~ 'xdiff')) {
         print F << "EOM";
     <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v15 13/13] Add reftable testing infrastructure
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (11 preceding siblings ...)
  2020-05-28 19:46                             ` [PATCH v15 12/13] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
@ 2020-05-28 19:46                             ` Han-Wen Nienhuys via GitGitGadget
  2020-05-28 20:15                             ` [PATCH v15 00/13] Reftable support git-core Junio C Hamano
                                               ` (2 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-05-28 19:46 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* Add GIT_TEST_REFTABLE environment var to control default ref storage

* Add test_prerequisite REFTABLE.

* Skip some tests that are incompatible:

  * t3210-pack-refs.sh - does not apply
  * t9903-bash-prompt - The bash mode reads .git/HEAD directly
  * t1450-fsck.sh - manipulates .git/ directly to create invalid state

Major test failures:

 * t1400-update-ref.sh - Reads from .git/{refs,logs} directly
 * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
 * t1405 - inspecs .git/ directly.

Worst offenders:

t1400-update-ref.sh                      - 82 of 185
t2400-worktree-add.sh                    - 58 of 69
t1404-update-ref-errors.sh               - 44 of 53
t3514-cherry-pick-revert-gpg.sh          - 36 of 36
t5541-http-push-smart.sh                 - 29 of 38
t6003-rev-list-topo-order.sh             - 29 of 35
t3420-rebase-autostash.sh                - 28 of 42
t6120-describe.sh                        - 21 of 82
t3430-rebase-merges.sh                   - 18 of 24
t2018-checkout-branch.sh                 - 15 of 22
..

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/clone.c               | 2 +-
 builtin/init-db.c             | 2 +-
 refs.c                        | 6 +++---
 t/t1409-avoid-packing-refs.sh | 6 ++++++
 t/t1450-fsck.sh               | 6 ++++++
 t/t3210-pack-refs.sh          | 6 ++++++
 t/t9903-bash-prompt.sh        | 6 ++++++
 t/test-lib.sh                 | 5 +++++
 8 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4d0cf065e4a..780c5807415 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1109,7 +1109,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
-		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
+		default_ref_storage(), INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index b7053b9e370..da5b4670c84 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -545,7 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
-	const char *ref_storage_format = DEFAULT_REF_STORAGE;
+	const char *ref_storage_format = default_ref_storage();
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
diff --git a/refs.c b/refs.c
index bd1f3cc0e45..ba5239b8421 100644
--- a/refs.c
+++ b/refs.c
@@ -1748,7 +1748,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	r->refs_private = ref_store_init(r->gitdir,
 					 r->ref_storage_format ?
 						 r->ref_storage_format :
-						 DEFAULT_REF_STORAGE,
+						 default_ref_storage(),
 					 REF_STORE_ALL_CAPS);
 	if (getenv("GIT_DEBUG_REFS")) {
 		r->refs_private = debug_wrap(r->refs_private);
@@ -1807,7 +1807,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
+	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1821,7 +1821,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
-	const char *format = DEFAULT_REF_STORAGE; /* XXX */
+	const char *format = default_ref_storage();
 	struct ref_store *refs;
 	const char *id;
 
diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
index be12fb63506..c6f78325563 100755
--- a/t/t1409-avoid-packing-refs.sh
+++ b/t/t1409-avoid-packing-refs.sh
@@ -4,6 +4,12 @@ test_description='avoid rewriting packed-refs unnecessarily'
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 # Add an identifying mark to the packed-refs file header line. This
 # shouldn't upset readers, and it should be omitted if the file is
 # ever rewritten.
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 344a2aad82f..09669203249 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -8,6 +8,12 @@ test_description='git fsck random collection of tests
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success setup '
 	test_oid_init &&
 	git config gc.auto 0 &&
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index f41b2afb996..edaef2c175a 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -11,6 +11,12 @@ semantic is still the same.
 '
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success 'enable reflogs' '
 	git config core.logallrefupdates true
 '
diff --git a/t/t9903-bash-prompt.sh b/t/t9903-bash-prompt.sh
index ab5da2cabc4..5deaaf968b0 100755
--- a/t/t9903-bash-prompt.sh
+++ b/t/t9903-bash-prompt.sh
@@ -7,6 +7,12 @@ test_description='test git-specific bash prompt functions'
 
 . ./lib-bash.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping tests; incompatible with reftable'
+  test_done
+fi
+
 . "$GIT_BUILD_DIR/contrib/completion/git-prompt.sh"
 
 actual="$TRASH_DIRECTORY/actual"
diff --git a/t/test-lib.sh b/t/test-lib.sh
index dbc027ff267..3ce9b957b1b 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1504,6 +1504,11 @@ parisc* | hppa*)
 	;;
 esac
 
+if test -n "$GIT_TEST_REFTABLE"
+then
+  test_set_prereq REFTABLE
+fi
+
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v15 00/13] Reftable support git-core
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (12 preceding siblings ...)
  2020-05-28 19:46                             ` [PATCH v15 13/13] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 20:15                             ` Junio C Hamano
  2020-05-28 21:21                             ` Junio C Hamano
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
  15 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-28 20:15 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys

I refrained from saying this but perhaps I should have in the early
round.  Next time, can you make sure the patches you are sending out
apply cleanly without need for fixing whitespace errors?

This is what I got.  Thanks.

$ git am -sc ./+hn13-v15-reftable
Applying: Write pseudorefs through ref backends.
Applying: Make refs_ref_exists public
Applying: Treat BISECT_HEAD as a pseudo ref
Applying: Treat CHERRY_PICK_HEAD as a pseudo ref
Applying: Treat REVERT_HEAD as a pseudo ref
Applying: Move REF_LOG_ONLY to refs-internal.h
Applying: Iterate over the "refs/" namespace in for_each_[raw]ref
Applying: Add .gitattributes for the reftable/ directory
.git/rebase-apply/patch:137: trailing whitespace.
    
.git/rebase-apply/patch:6784: trailing whitespace.
# Remove unittests and compatibility hacks we don't need here.  
.git/rebase-apply/patch:6787: trailing whitespace.
git add reftable/*.[ch] reftable/LICENSE reftable/VERSION 
.git/rebase-apply/patch:7558: indent with spaces.
        left = *destLen;
.git/rebase-apply/patch:7559: indent with spaces.
        *destLen = 0;
warning: squelched 16 whitespace errors
warning: 21 lines applied after fixing whitespace errors.
Applying: Add reftable library
.git/rebase-apply/patch:1865: indent with spaces.
	        return 1
.git/rebase-apply/patch:1876: trailing whitespace.
	test_line_count = 0 output 
.git/rebase-apply/patch:1892: trailing whitespace.
		print_ref "refs/tags/test_tag^{}" 
.git/rebase-apply/patch:1950: space before tab in indent.
 	git rebase source &&
.git/rebase-apply/patch:1955: new blank line at EOF.
+
warning: 4 lines applied after fixing whitespace errors.
Applying: Reftable support for git-core
Test number t0031 already taken
.git/rebase-apply/patch:385: trailing whitespace.
	(cd files && GIT_DEBUG_REFS=1 test_commit message file) > files.txt && 
warning: 1 line applied after fixing whitespace errors.
Applying: Add GIT_DEBUG_REFS debugging mechanism
Test number t0033 already taken
Applying: vcxproj: adjust for the reftable changes
Applying: Add reftable testing infrastructure

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v15 03/13] Treat BISECT_HEAD as a pseudo ref
  2020-05-28 19:46                             ` [PATCH v15 03/13] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
@ 2020-05-28 20:52                               ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-28 20:52 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget, Miriam Rubio
  Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> Both the git-bisect.sh as bisect--helper inspected the file system directly.
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  builtin/bisect--helper.c | 3 +--
>  git-bisect.sh            | 4 ++--
>  2 files changed, 3 insertions(+), 4 deletions(-)

Makes sense.  

A topic that adds more callers to the git_path_bisect_head()
function has already been in flight for quite a while; I wonder if
we can meet the topic in the middle.  For example, would it have
helped if we had a helper like this one:

        static inline int bisect_head_exists(void) {
               return ref_exists("BISECT_HEAD");
        }

on this side, and have the other side have something like:

        static inline int bisect_head_exists(void) {
               return file_exists(git_path_bisect_head());
        }

Then the caller(s) of bisect_head_exists() don't have to be changed
at all.

Anyway, it's just a lesson that communication and collaboration
between developers may help coming up with correct integration
results.

Thanks.  Queued.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v15 00/13] Reftable support git-core
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (13 preceding siblings ...)
  2020-05-28 20:15                             ` [PATCH v15 00/13] Reftable support git-core Junio C Hamano
@ 2020-05-28 21:21                             ` Junio C Hamano
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
  15 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-05-28 21:21 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This adds the reftable library, and hooks it up as a ref backend. Based on
> hn/refs-cleanup.
>
> Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1
>
> Summary 20479 tests pass 1299 tests fail

Testing without GIT_TEST_REFTABLE=0 gives a few failures, too.  I
haven't taken a look into them, though.

Thanks.


*** log for hn/reftable ***
HEAD is now at 341a26d318 Add reftable testing infrastructure
GIT_VERSION = unknown.g00000000
    * new build flags
    CC fuzz-commit-graph.o
    CC fuzz-pack-headers.o
    CC fuzz-pack-idx.o
    CC bugreport.o
...
[14:11:46] t7112-reset-submodule.sh ........................... ok   141577 ms ( 2.43 usr  0.95 sys + 661.06 cusr 464.38 csys = 1128.82 CPU)
[14:11:50] t9402-git-cvsserver-refs.sh ........................ ok    41912 ms ( 0.61 usr  0.17 sys + 260.43 cusr 177.60 csys = 438.81 CPU)
[14:12:03] t9001-send-email.sh ................................ ok   129424 ms ( 1.47 usr  0.57 sys + 576.18 cusr 387.33 csys = 965.55 CPU)
[14:12:03]

Test Summary Report
-------------------
t3510-cherry-pick-sequence.sh                    (Wstat: 256 Tests: 50 Failed: 0)
  Non-zero exit status: 1
  Parse errors: Tests out of sequence.  Found (14) but expected (13)
                Tests out of sequence.  Found (15) but expected (14)
                Tests out of sequence.  Found (16) but expected (15)
                Tests out of sequence.  Found (17) but expected (16)
                Tests out of sequence.  Found (18) but expected (17)
Displayed the first 5 of 39 TAP syntax errors.
Re-run prove with the -p option to see them all.
t3404-rebase-interactive.sh                      (Wstat: 256 Tests: 118 Failed: 0)
  Non-zero exit status: 1
  Parse errors: Tests out of sequence.  Found (100) but expected (98)
                Tests out of sequence.  Found (101) but expected (99)
                Tests out of sequence.  Found (102) but expected (100)
                Tests out of sequence.  Found (103) but expected (101)
                Tests out of sequence.  Found (104) but expected (102)
Displayed the first 5 of 22 TAP syntax errors.
Re-run prove with the -p option to see them all.
Files=906, Tests=22755, 360 wallclock secs (10.16 usr  3.35 sys + 1318.97 cusr 996.62 csys = 2329.10 CPU)
Result: FAIL
make[1]: *** [Makefile:52: prove] Error 1
make[1]: Leaving directory '/usr/local/google/home/jch/w/git.cycle/t'
make: *** [Makefile:2799: test] Error 2

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v16 00/14] Reftable support git-core
  2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                               ` (14 preceding siblings ...)
  2020-05-28 21:21                             ` Junio C Hamano
@ 2020-06-05 18:03                             ` Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 01/14] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
                                                 ` (15 more replies)
  15 siblings, 16 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend. Based on
hn/refs-cleanup.

Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1

Summary 20479 tests pass 1299 tests fail

Some issues:

 * many tests inspect .git/{logs,heads}/ directly.
 * worktrees broken. 

v18

 * fix pseudo ref usage
 * import and execute unittests too.

Han-Wen Nienhuys (13):
  Write pseudorefs through ref backends.
  Make refs_ref_exists public
  Treat BISECT_HEAD as a pseudo ref
  Treat CHERRY_PICK_HEAD as a pseudo ref
  Treat REVERT_HEAD as a pseudo ref
  Move REF_LOG_ONLY to refs-internal.h
  Iterate over the "refs/" namespace in for_each_[raw]ref
  Add .gitattributes for the reftable/ directory
  Add reftable library
  Reftable support for git-core
  Hookup unittests for the reftable library.
  Add GIT_DEBUG_REFS debugging mechanism
  Add reftable testing infrastructure

Johannes Schindelin (1):
  vcxproj: adjust for the reftable changes

 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   45 +-
 builtin/bisect--helper.c                      |    3 +-
 builtin/clone.c                               |    3 +-
 builtin/commit.c                              |   34 +-
 builtin/init-db.c                             |   56 +-
 builtin/merge.c                               |    2 +-
 cache.h                                       |    6 +-
 config.mak.uname                              |    2 +-
 contrib/buildsystems/Generators/Vcxproj.pm    |   11 +-
 git-bisect.sh                                 |    4 +-
 path.c                                        |    2 -
 path.h                                        |    9 +-
 refs.c                                        |  157 +-
 refs.h                                        |   16 +
 refs/debug.c                                  |  309 ++++
 refs/files-backend.c                          |  121 +-
 refs/packed-backend.c                         |   21 +-
 refs/refs-internal.h                          |   32 +
 refs/reftable-backend.c                       | 1331 +++++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   11 +
 reftable/VERSION                              |    1 +
 reftable/basics.c                             |  215 +++
 reftable/basics.h                             |   53 +
 reftable/block.c                              |  433 ++++++
 reftable/block.h                              |  129 ++
 reftable/block_test.c                         |  156 ++
 reftable/constants.h                          |   21 +
 reftable/file.c                               |   95 ++
 reftable/iter.c                               |  243 +++
 reftable/iter.h                               |   72 +
 reftable/merged.c                             |  320 ++++
 reftable/merged.h                             |   39 +
 reftable/merged_test.c                        |  273 ++++
 reftable/pq.c                                 |  113 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  742 +++++++++
 reftable/reader.h                             |   65 +
 reftable/record.c                             | 1113 ++++++++++++++
 reftable/record.h                             |  128 ++
 reftable/record_test.c                        |  400 +++++
 reftable/refname.c                            |  209 +++
 reftable/refname.h                            |   38 +
 reftable/refname_test.c                       |   99 ++
 reftable/reftable-tests.h                     |   13 +
 reftable/reftable.c                           |   90 ++
 reftable/reftable.h                           |  564 +++++++
 reftable/reftable_test.c                      |  631 ++++++++
 reftable/slice.c                              |  243 +++
 reftable/slice.h                              |   87 ++
 reftable/slice_test.c                         |   40 +
 reftable/stack.c                              | 1207 +++++++++++++++
 reftable/stack.h                              |   48 +
 reftable/stack_test.c                         |  743 +++++++++
 reftable/system.h                             |   54 +
 reftable/test_framework.c                     |   69 +
 reftable/test_framework.h                     |   64 +
 reftable/tree.c                               |   63 +
 reftable/tree.h                               |   34 +
 reftable/tree_test.c                          |   62 +
 reftable/update.sh                            |   25 +
 reftable/writer.c                             |  644 ++++++++
 reftable/writer.h                             |   60 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 sequencer.c                                   |   56 +-
 setup.c                                       |   12 +-
 t/helper/test-tool.c                          |    1 +
 t/helper/test-tool.h                          |    1 +
 t/t0031-reftable.sh                           |  147 ++
 t/t0033-debug-refs.sh                         |   18 +
 t/t1409-avoid-packing-refs.sh                 |    6 +
 t/t1450-fsck.sh                               |    6 +
 t/t3210-pack-refs.sh                          |    6 +
 t/t9903-bash-prompt.sh                        |    6 +
 t/test-lib.sh                                 |    5 +
 wt-status.c                                   |    6 +-
 80 files changed, 12118 insertions(+), 195 deletions(-)
 create mode 100644 refs/debug.c
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/refname_test.c
 create mode 100644 reftable/reftable-tests.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/slice_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100755 t/t0031-reftable.sh
 create mode 100755 t/t0033-debug-refs.sh


base-commit: aada2199e11bff14ee91d5d590c2e3d3eba5e148
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v16
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v16
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v15:

  1:  9bb64f748dc !  1:  4a1f600fb85 Write pseudorefs through ref backends.
     @@ refs/refs-internal.h: typedef int copy_ref_fn(struct ref_store *ref_store,
      +			       const struct object_id *oid,
      +			       const struct object_id *old_oid,
      +			       struct strbuf *err);
     ++
     ++/*
     ++ * Deletes a pseudoref. Deletion always succeeds (even if the pseudoref doesn't
     ++ * exist.), except if old_oid is specified. If it is, it can fail due to lock
     ++ * failure, failure reading the old OID, or an OID mismatch
     ++ */
      +typedef int delete_pseudoref_fn(struct ref_store *ref_store,
      +				const char *pseudoref,
      +				const struct object_id *old_oid);
  2:  d489806907b =  2:  a4a67ce9635 Make refs_ref_exists public
  3:  60a82678a1b =  3:  ecd6591d86f Treat BISECT_HEAD as a pseudo ref
  4:  1a925f5671a !  4:  7c31727de69 Treat CHERRY_PICK_HEAD as a pseudo ref
     @@ sequencer.c: static int commit_staged_changes(struct repository *r,
      -		if (file_exists(cherry_pick_head) && unlink(cherry_pick_head))
      +		if (refs_ref_exists(get_main_ref_store(r),
      +				    "CHERRY_PICK_HEAD") &&
     -+		    !refs_delete_pseudoref(get_main_ref_store(r),
     -+					   "CHERRY_PICK_HEAD", NULL))
     ++		    refs_delete_pseudoref(get_main_ref_store(r),
     ++					  "CHERRY_PICK_HEAD", NULL))
       			return error(_("could not remove CHERRY_PICK_HEAD"));
       		if (!final_fixup)
       			return 0;
  5:  8f2ebf65b90 !  5:  fbdcee5208b Treat REVERT_HEAD as a pseudo ref
     @@ sequencer.c: static int create_seq_dir(struct repository *r)
       	const char *in_progress_advice = NULL;
       	unsigned int advise_skip =
      -		file_exists(git_path_revert_head(r)) ||
     --		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
     -+		refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD");
     -+	refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
     ++		refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD") ||
     + 		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
       
       	if (!sequencer_get_last_command(r, &action)) {
     - 		switch (action) {
      @@ sequencer.c: static int rollback_single_pick(struct repository *r)
       	struct object_id head_oid;
       
  6:  e0e1e3a558d =  6:  d8801367f7d Move REF_LOG_ONLY to refs-internal.h
  7:  27a980d1d9d =  7:  556d57610cc Iterate over the "refs/" namespace in for_each_[raw]ref
  8:  dcbd000e7f7 =  8:  78cb1930172 Add .gitattributes for the reftable/ directory
  9:  718b646a54e !  9:  0bc28ac610f Add reftable library
     @@ reftable/README.md (new)
      
       ## reftable/VERSION (new) ##
      @@
     -+commit fe15c65dd95b6a13de92b4baf711e50338ef20ae
     -+Author: Han-Wen Nienhuys <hanwen@google.com>
     -+Date:   Thu May 28 19:28:24 2020 +0200
     -+
     -+    C: add 'canary' to struct slice
     -+    
     -+    This enforces that all slices are initialized explicitly with
     -+    SLICE_INIT, for future compatibility with git's strbuf.
     ++0c164275ea8f8aa7d099f7425ddcef1affe137e9 C: compile framework with Git options
      
       ## reftable/basics.c (new) ##
      @@
     @@ reftable/block.c (new)
      +}
      +
      +int block_writer_register_restart(struct block_writer *w, int n, bool restart,
     -+				  struct slice key);
     ++				  struct slice *key);
      +
      +void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
      +		       uint32_t block_size, uint32_t header_off, int hash_size)
     @@ reftable/block.c (new)
      +	slice_consume(&out, n);
      +
      +	if (block_writer_register_restart(w, start.len - out.len, restart,
     -+					  key) < 0)
     ++					  &key) < 0)
      +		goto done;
      +
      +	slice_release(&key);
     @@ reftable/block.c (new)
      +}
      +
      +int block_writer_register_restart(struct block_writer *w, int n, bool restart,
     -+				  struct slice key)
     ++				  struct slice *key)
      +{
      +	int rlen = w->restart_len;
      +	if (rlen >= MAX_RESTARTS) {
     @@ reftable/block.c (new)
      +		return -1;
      +	}
      +
     -+	result = slice_cmp(a->key, rkey);
     ++	result = slice_cmp(&a->key, &rkey);
      +	slice_release(&rkey);
      +	return result;
      +}
     @@ reftable/block.c (new)
      +{
      +	dest->br = src->br;
      +	dest->next_off = src->next_off;
     -+	slice_copy(&dest->last_key, src->last_key);
     ++	slice_copy(&dest->last_key, &src->last_key);
      +}
      +
      +int block_iter_next(struct block_iter *it, struct reftable_record *rec)
     @@ reftable/block.c (new)
      +		return -1;
      +	slice_consume(&in, n);
      +
     -+	slice_copy(&it->last_key, key);
     ++	slice_copy(&it->last_key, &key);
      +	it->next_off += start.len - in.len;
      +	slice_release(&key);
      +	return 0;
     @@ reftable/block.c (new)
      +	return 0;
      +}
      +
     -+int block_iter_seek(struct block_iter *it, struct slice want)
     ++int block_iter_seek(struct block_iter *it, struct slice *want)
      +{
      +	return block_reader_seek(it->br, it, want);
      +}
     @@ reftable/block.c (new)
      +}
      +
      +int block_reader_seek(struct block_reader *br, struct block_iter *it,
     -+		      struct slice want)
     ++		      struct slice *want)
      +{
      +	struct restart_find_args args = {
     -+		.key = want,
     ++		.key = *want,
      +		.r = br,
      +	};
      +	struct reftable_record rec = reftable_new_record(block_reader_type(br));
     @@ reftable/block.c (new)
      +			goto done;
      +
      +		reftable_record_key(&rec, &key);
     -+		if (err > 0 || slice_cmp(key, want) >= 0) {
     ++		if (err > 0 || slice_cmp(&key, want) >= 0) {
      +			err = 0;
      +			goto done;
      +		}
     @@ reftable/block.h (new)
      +
      +/* Position `it` to the `want` key in the block */
      +int block_reader_seek(struct block_reader *br, struct block_iter *it,
     -+		      struct slice want);
     ++		      struct slice *want);
      +
      +/* Returns the block type (eg. 'r' for refs) */
      +byte block_reader_type(struct block_reader *r);
     @@ reftable/block.h (new)
      +int block_iter_next(struct block_iter *it, struct reftable_record *rec);
      +
      +/* Seek to `want` with in the block pointed to by `it` */
     -+int block_iter_seek(struct block_iter *it, struct slice want);
     ++int block_iter_seek(struct block_iter *it, struct slice *want);
      +
      +/* deallocate memory for `it`. The block reader and its block is left intact. */
      +void block_iter_close(struct block_iter *it);
     @@ reftable/block.h (new)
      +
      +#endif
      
     + ## reftable/block_test.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "block.h"
     ++
     ++#include "system.h"
     ++
     ++#include "basics.h"
     ++#include "constants.h"
     ++#include "record.h"
     ++#include "reftable.h"
     ++#include "test_framework.h"
     ++#include "reftable-tests.h"
     ++
     ++struct binsearch_args {
     ++	int key;
     ++	int *arr;
     ++};
     ++
     ++static int binsearch_func(size_t i, void *void_args)
     ++{
     ++	struct binsearch_args *args = (struct binsearch_args *)void_args;
     ++
     ++	return args->key < args->arr[i];
     ++}
     ++
     ++static void test_binsearch(void)
     ++{
     ++	int arr[] = { 2, 4, 6, 8, 10 };
     ++	size_t sz = ARRAY_SIZE(arr);
     ++	struct binsearch_args args = {
     ++		.arr = arr,
     ++	};
     ++
     ++	int i = 0;
     ++	for (i = 1; i < 11; i++) {
     ++		int res;
     ++		args.key = i;
     ++		res = binsearch(sz, &binsearch_func, &args);
     ++
     ++		if (res < sz) {
     ++			assert(args.key < arr[res]);
     ++			if (res > 0) {
     ++				assert(args.key >= arr[res - 1]);
     ++			}
     ++		} else {
     ++			assert(args.key == 10 || args.key == 11);
     ++		}
     ++	}
     ++}
     ++
     ++static void test_block_read_write(void)
     ++{
     ++	const int header_off = 21; /* random */
     ++	char *names[30];
     ++	const int N = ARRAY_SIZE(names);
     ++	const int block_size = 1024;
     ++	struct reftable_block block = { 0 };
     ++	struct block_writer bw = {
     ++		.last_key = SLICE_INIT,
     ++	};
     ++	struct reftable_ref_record ref = { 0 };
     ++	struct reftable_record rec = { 0 };
     ++	int i = 0;
     ++	int n;
     ++	struct block_reader br = { 0 };
     ++	struct block_iter it = { .last_key = SLICE_INIT };
     ++	int j = 0;
     ++	struct slice want = SLICE_INIT;
     ++
     ++	block.data = reftable_calloc(block_size);
     ++	block.len = block_size;
     ++	block.source = malloc_block_source();
     ++	block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size,
     ++			  header_off, hash_size(SHA1_ID));
     ++	reftable_record_from_ref(&rec, &ref);
     ++
     ++	for (i = 0; i < N; i++) {
     ++		char name[100];
     ++		byte hash[SHA1_SIZE];
     ++		snprintf(name, sizeof(name), "branch%02d", i);
     ++		memset(hash, i, sizeof(hash));
     ++
     ++		ref.ref_name = name;
     ++		ref.value = hash;
     ++		names[i] = xstrdup(name);
     ++		n = block_writer_add(&bw, &rec);
     ++		ref.ref_name = NULL;
     ++		ref.value = NULL;
     ++		assert(n == 0);
     ++	}
     ++
     ++	n = block_writer_finish(&bw);
     ++	assert(n > 0);
     ++
     ++	block_writer_clear(&bw);
     ++
     ++	block_reader_init(&br, &block, header_off, block_size, SHA1_SIZE);
     ++
     ++	block_reader_start(&br, &it);
     ++
     ++	while (true) {
     ++		int r = block_iter_next(&it, &rec);
     ++		assert(r >= 0);
     ++		if (r > 0) {
     ++			break;
     ++		}
     ++		assert_streq(names[j], ref.ref_name);
     ++		j++;
     ++	}
     ++
     ++	reftable_record_clear(&rec);
     ++	block_iter_close(&it);
     ++
     ++	for (i = 0; i < N; i++) {
     ++		struct block_iter it = { .last_key = SLICE_INIT };
     ++		slice_set_string(&want, names[i]);
     ++
     ++		n = block_reader_seek(&br, &it, &want);
     ++		assert(n == 0);
     ++
     ++		n = block_iter_next(&it, &rec);
     ++		assert(n == 0);
     ++
     ++		assert_streq(names[i], ref.ref_name);
     ++
     ++		want.len--;
     ++		n = block_reader_seek(&br, &it, &want);
     ++		assert(n == 0);
     ++
     ++		n = block_iter_next(&it, &rec);
     ++		assert(n == 0);
     ++		assert_streq(names[10 * (i / 10)], ref.ref_name);
     ++
     ++		block_iter_close(&it);
     ++	}
     ++
     ++	reftable_record_clear(&rec);
     ++	reftable_block_done(&br.block);
     ++	slice_release(&want);
     ++	for (i = 0; i < N; i++) {
     ++		reftable_free(names[i]);
     ++	}
     ++}
     ++
     ++int block_test_main(int argc, const char *argv[])
     ++{
     ++	add_test_case("binsearch", &test_binsearch);
     ++	add_test_case("block_read_write", &test_block_read_write);
     ++	return test_main(argc, argv);
     ++}
     +
       ## reftable/constants.h (new) ##
      @@
      +/*
     @@ reftable/merged.c (new)
      +
      +		reftable_record_key(&top.rec, &k);
      +
     -+		cmp = slice_cmp(k, entry_key);
     ++		cmp = slice_cmp(&k, &entry_key);
      +		slice_release(&k);
      +
      +		if (cmp > 0) {
     @@ reftable/merged.h (new)
      +
      +#endif
      
     + ## reftable/merged_test.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "merged.h"
     ++
     ++#include "system.h"
     ++
     ++#include "basics.h"
     ++#include "block.h"
     ++#include "constants.h"
     ++#include "pq.h"
     ++#include "reader.h"
     ++#include "record.h"
     ++#include "reftable.h"
     ++#include "test_framework.h"
     ++#include "reftable-tests.h"
     ++
     ++static void test_pq(void)
     ++{
     ++	char *names[54] = { 0 };
     ++	int N = ARRAY_SIZE(names) - 1;
     ++
     ++	struct merged_iter_pqueue pq = { 0 };
     ++	const char *last = NULL;
     ++
     ++	int i = 0;
     ++	for (i = 0; i < N; i++) {
     ++		char name[100];
     ++		snprintf(name, sizeof(name), "%02d", i);
     ++		names[i] = xstrdup(name);
     ++	}
     ++
     ++	i = 1;
     ++	do {
     ++		struct reftable_record rec =
     ++			reftable_new_record(BLOCK_TYPE_REF);
     ++		struct pq_entry e = { 0 };
     ++
     ++		reftable_record_as_ref(&rec)->ref_name = names[i];
     ++		e.rec = rec;
     ++		merged_iter_pqueue_add(&pq, e);
     ++		merged_iter_pqueue_check(pq);
     ++		i = (i * 7) % N;
     ++	} while (i != 1);
     ++
     ++	while (!merged_iter_pqueue_is_empty(pq)) {
     ++		struct pq_entry e = merged_iter_pqueue_remove(&pq);
     ++		struct reftable_ref_record *ref =
     ++			reftable_record_as_ref(&e.rec);
     ++
     ++		merged_iter_pqueue_check(pq);
     ++
     ++		if (last != NULL) {
     ++			assert(strcmp(last, ref->ref_name) < 0);
     ++		}
     ++		last = ref->ref_name;
     ++		ref->ref_name = NULL;
     ++		reftable_free(ref);
     ++	}
     ++
     ++	for (i = 0; i < N; i++) {
     ++		reftable_free(names[i]);
     ++	}
     ++
     ++	merged_iter_pqueue_clear(&pq);
     ++}
     ++
     ++static void write_test_table(struct slice *buf,
     ++			     struct reftable_ref_record refs[], int n)
     ++{
     ++	int min = 0xffffffff;
     ++	int max = 0;
     ++	int i = 0;
     ++	int err;
     ++
     ++	struct reftable_write_options opts = {
     ++		.block_size = 256,
     ++	};
     ++	struct reftable_writer *w = NULL;
     ++	for (i = 0; i < n; i++) {
     ++		uint64_t ui = refs[i].update_index;
     ++		if (ui > max) {
     ++			max = ui;
     ++		}
     ++		if (ui < min) {
     ++			min = ui;
     ++		}
     ++	}
     ++
     ++	w = reftable_new_writer(&slice_add_void, buf, &opts);
     ++	reftable_writer_set_limits(w, min, max);
     ++
     ++	for (i = 0; i < n; i++) {
     ++		uint64_t before = refs[i].update_index;
     ++		int n = reftable_writer_add_ref(w, &refs[i]);
     ++		assert(n == 0);
     ++		assert(before == refs[i].update_index);
     ++	}
     ++
     ++	err = reftable_writer_close(w);
     ++	assert_err(err);
     ++
     ++	reftable_writer_free(w);
     ++}
     ++
     ++static struct reftable_merged_table *
     ++merged_table_from_records(struct reftable_ref_record **refs,
     ++			  struct reftable_block_source **source, int *sizes,
     ++			  struct slice *buf, int n)
     ++{
     ++	struct reftable_reader **rd = reftable_calloc(n * sizeof(*rd));
     ++	int i = 0;
     ++	struct reftable_merged_table *mt = NULL;
     ++	int err;
     ++	*source = reftable_calloc(n * sizeof(**source));
     ++	for (i = 0; i < n; i++) {
     ++		write_test_table(&buf[i], refs[i], sizes[i]);
     ++		block_source_from_slice(&(*source)[i], &buf[i]);
     ++
     ++		err = reftable_new_reader(&rd[i], &(*source)[i], "name");
     ++		assert_err(err);
     ++	}
     ++
     ++	err = reftable_new_merged_table(&mt, rd, n, SHA1_ID);
     ++	assert_err(err);
     ++	return mt;
     ++}
     ++
     ++static void test_merged_between(void)
     ++{
     ++	byte hash1[SHA1_SIZE] = { 1, 2, 3, 0 };
     ++
     ++	struct reftable_ref_record r1[] = { {
     ++		.ref_name = "b",
     ++		.update_index = 1,
     ++		.value = hash1,
     ++	} };
     ++	struct reftable_ref_record r2[] = { {
     ++		.ref_name = "a",
     ++		.update_index = 2,
     ++	} };
     ++
     ++	struct reftable_ref_record *refs[] = { r1, r2 };
     ++	int sizes[] = { 1, 1 };
     ++	struct slice bufs[2] = { SLICE_INIT, SLICE_INIT };
     ++	struct reftable_block_source *bs = NULL;
     ++	struct reftable_merged_table *mt =
     ++		merged_table_from_records(refs, &bs, sizes, bufs, 2);
     ++	int i;
     ++	struct reftable_ref_record ref = { 0 };
     ++	struct reftable_iterator it = { 0 };
     ++	int err = reftable_merged_table_seek_ref(mt, &it, "a");
     ++	assert_err(err);
     ++
     ++	err = reftable_iterator_next_ref(&it, &ref);
     ++	assert_err(err);
     ++	assert(ref.update_index == 2);
     ++	reftable_ref_record_clear(&ref);
     ++
     ++	reftable_iterator_destroy(&it);
     ++	reftable_merged_table_close(mt);
     ++	reftable_merged_table_free(mt);
     ++	for (i = 0; i < ARRAY_SIZE(bufs); i++) {
     ++		slice_release(&bufs[i]);
     ++	}
     ++	reftable_free(bs);
     ++}
     ++
     ++static void test_merged(void)
     ++{
     ++	byte hash1[SHA1_SIZE] = { 1 };
     ++	byte hash2[SHA1_SIZE] = { 2 };
     ++	struct reftable_ref_record r1[] = { {
     ++						    .ref_name = "a",
     ++						    .update_index = 1,
     ++						    .value = hash1,
     ++					    },
     ++					    {
     ++						    .ref_name = "b",
     ++						    .update_index = 1,
     ++						    .value = hash1,
     ++					    },
     ++					    {
     ++						    .ref_name = "c",
     ++						    .update_index = 1,
     ++						    .value = hash1,
     ++					    } };
     ++	struct reftable_ref_record r2[] = { {
     ++		.ref_name = "a",
     ++		.update_index = 2,
     ++	} };
     ++	struct reftable_ref_record r3[] = {
     ++		{
     ++			.ref_name = "c",
     ++			.update_index = 3,
     ++			.value = hash2,
     ++		},
     ++		{
     ++			.ref_name = "d",
     ++			.update_index = 3,
     ++			.value = hash1,
     ++		},
     ++	};
     ++
     ++	struct reftable_ref_record want[] = {
     ++		r2[0],
     ++		r1[1],
     ++		r3[0],
     ++		r3[1],
     ++	};
     ++
     ++	struct reftable_ref_record *refs[] = { r1, r2, r3 };
     ++	int sizes[3] = { 3, 1, 2 };
     ++	struct slice bufs[3] = { SLICE_INIT, SLICE_INIT, SLICE_INIT };
     ++	struct reftable_block_source *bs = NULL;
     ++
     ++	struct reftable_merged_table *mt =
     ++		merged_table_from_records(refs, &bs, sizes, bufs, 3);
     ++
     ++	struct reftable_iterator it = { 0 };
     ++	int err = reftable_merged_table_seek_ref(mt, &it, "a");
     ++	struct reftable_ref_record *out = NULL;
     ++	int len = 0;
     ++	int cap = 0;
     ++	int i = 0;
     ++
     ++	assert_err(err);
     ++	while (len < 100) { /* cap loops/recursion. */
     ++		struct reftable_ref_record ref = { 0 };
     ++		int err = reftable_iterator_next_ref(&it, &ref);
     ++		if (err > 0) {
     ++			break;
     ++		}
     ++		if (len == cap) {
     ++			cap = 2 * cap + 1;
     ++			out = reftable_realloc(
     ++				out, sizeof(struct reftable_ref_record) * cap);
     ++		}
     ++		out[len++] = ref;
     ++	}
     ++	reftable_iterator_destroy(&it);
     ++
     ++	assert(ARRAY_SIZE(want) == len);
     ++	for (i = 0; i < len; i++) {
     ++		assert(reftable_ref_record_equal(&want[i], &out[i], SHA1_SIZE));
     ++	}
     ++	for (i = 0; i < len; i++) {
     ++		reftable_ref_record_clear(&out[i]);
     ++	}
     ++	reftable_free(out);
     ++
     ++	for (i = 0; i < 3; i++) {
     ++		slice_release(&bufs[i]);
     ++	}
     ++	reftable_merged_table_close(mt);
     ++	reftable_merged_table_free(mt);
     ++	reftable_free(bs);
     ++}
     ++
     ++/* XXX test refs_for(oid) */
     ++
     ++int merged_test_main(int argc, const char *argv[])
     ++{
     ++	add_test_case("test_merged_between", &test_merged_between);
     ++	add_test_case("test_pq", &test_pq);
     ++	add_test_case("test_merged", &test_merged);
     ++	return test_main(argc, argv);
     ++}
     +
       ## reftable/pq.c (new) ##
      @@
      +/*
     @@ reftable/pq.c (new)
      +	reftable_record_key(&a.rec, &ak);
      +	reftable_record_key(&b.rec, &bk);
      +
     -+	cmp = slice_cmp(ak, bk);
     ++	cmp = slice_cmp(&ak, &bk);
      +
      +	slice_release(&ak);
      +	slice_release(&bk);
     @@ reftable/reader.c (new)
      +		if (err < 0)
      +			goto done;
      +
     -+		if (slice_cmp(got_key, want_key) > 0) {
     ++		if (slice_cmp(&got_key, &want_key) > 0) {
      +			table_iter_block_done(&next);
      +			break;
      +		}
     @@ reftable/reader.c (new)
      +		table_iter_copy_from(ti, &next);
      +	}
      +
     -+	err = block_iter_seek(&ti->bi, want_key);
     ++	err = block_iter_seek(&ti->bi, &want_key);
      +	if (err < 0)
      +		goto done;
      +	err = 0;
     @@ reftable/reader.c (new)
      +		if (err != 0)
      +			goto done;
      +
     -+		err = block_iter_seek(&next.bi, want_index.last_key);
     ++		err = block_iter_seek(&next.bi, &want_index.last_key);
      +		if (err < 0)
      +			goto done;
      +
     @@ reftable/record.c (new)
      +#include "constants.h"
      +#include "reftable.h"
      +
     -+int get_var_int(uint64_t *dest, struct slice in)
     ++int get_var_int(uint64_t *dest, struct slice *in)
      +{
      +	int ptr = 0;
      +	uint64_t val;
      +
     -+	if (in.len == 0)
     ++	if (in->len == 0)
      +		return -1;
     -+	val = in.buf[ptr] & 0x7f;
     ++	val = in->buf[ptr] & 0x7f;
      +
     -+	while (in.buf[ptr] & 0x80) {
     ++	while (in->buf[ptr] & 0x80) {
      +		ptr++;
     -+		if (ptr > in.len) {
     ++		if (ptr > in->len) {
      +			return -1;
      +		}
     -+		val = (val + 1) << 7 | (uint64_t)(in.buf[ptr] & 0x7f);
     ++		val = (val + 1) << 7 | (uint64_t)(in->buf[ptr] & 0x7f);
      +	}
      +
      +	*dest = val;
      +	return ptr + 1;
      +}
      +
     -+int put_var_int(struct slice dest, uint64_t val)
     ++int put_var_int(struct slice *dest, uint64_t val)
      +{
      +	byte buf[10] = { 0 };
      +	int i = 9;
     @@ reftable/record.c (new)
      +	}
      +
      +	n = sizeof(buf) - i - 1;
     -+	if (dest.len < n)
     ++	if (dest->len < n)
      +		return -1;
     -+	memcpy(dest.buf, &buf[i + 1], n);
     ++	memcpy(dest->buf, &buf[i + 1], n);
      +	return n;
      +}
      +
     @@ reftable/record.c (new)
      +{
      +	int start_len = in.len;
      +	uint64_t tsize = 0;
     -+	int n = get_var_int(&tsize, in);
     ++	int n = get_var_int(&tsize, &in);
      +	if (n <= 0)
      +		return -1;
      +	slice_consume(&in, n);
     @@ reftable/record.c (new)
      +{
      +	struct slice start = s;
      +	int l = strlen(str);
     -+	int n = put_var_int(s, l);
     ++	int n = put_var_int(&s, l);
      +	if (n < 0)
      +		return -1;
      +	slice_consume(&s, n);
     @@ reftable/record.c (new)
      +			struct slice key, byte extra)
      +{
      +	struct slice start = dest;
     -+	int prefix_len = common_prefix_size(prev_key, key);
     ++	int prefix_len = common_prefix_size(&prev_key, &key);
      +	uint64_t suffix_len = key.len - prefix_len;
     -+	int n = put_var_int(dest, (uint64_t)prefix_len);
     ++	int n = put_var_int(&dest, (uint64_t)prefix_len);
      +	if (n < 0)
      +		return -1;
      +	slice_consume(&dest, n);
      +
      +	*restart = (prefix_len == 0);
      +
     -+	n = put_var_int(dest, suffix_len << 3 | (uint64_t)extra);
     ++	n = put_var_int(&dest, suffix_len << 3 | (uint64_t)extra);
      +	if (n < 0)
      +		return -1;
      +	slice_consume(&dest, n);
     @@ reftable/record.c (new)
      +	int start_len = in.len;
      +	uint64_t prefix_len = 0;
      +	uint64_t suffix_len = 0;
     -+	int n = get_var_int(&prefix_len, in);
     ++	int n = get_var_int(&prefix_len, &in);
      +	if (n < 0)
      +		return -1;
      +	slice_consume(&in, n);
     @@ reftable/record.c (new)
      +	if (prefix_len > last_key.len)
      +		return -1;
      +
     -+	n = get_var_int(&suffix_len, in);
     ++	n = get_var_int(&suffix_len, &in);
      +	if (n <= 0)
      +		return -1;
      +	slice_consume(&in, n);
     @@ reftable/record.c (new)
      +	const struct reftable_ref_record *r =
      +		(const struct reftable_ref_record *)rec;
      +	struct slice start = s;
     -+	int n = put_var_int(s, r->update_index);
     ++	int n = put_var_int(&s, r->update_index);
      +	assert(hash_size > 0);
      +	if (n < 0)
      +		return -1;
     @@ reftable/record.c (new)
      +	bool seen_target_value = false;
      +	bool seen_target = false;
      +
     -+	int n = get_var_int(&r->update_index, in);
     ++	int n = get_var_int(&r->update_index, &in);
      +	if (n < 0)
      +		return n;
      +	assert(hash_size > 0);
     @@ reftable/record.c (new)
      +	int n = 0;
      +	uint64_t last = 0;
      +	if (r->offset_len == 0 || r->offset_len >= 8) {
     -+		n = put_var_int(s, r->offset_len);
     ++		n = put_var_int(&s, r->offset_len);
      +		if (n < 0) {
      +			return -1;
      +		}
     @@ reftable/record.c (new)
      +	}
      +	if (r->offset_len == 0)
      +		return start.len - s.len;
     -+	n = put_var_int(s, r->offsets[0]);
     ++	n = put_var_int(&s, r->offsets[0]);
      +	if (n < 0)
      +		return -1;
      +	slice_consume(&s, n);
      +
      +	last = r->offsets[0];
      +	for (i = 1; i < r->offset_len; i++) {
     -+		int n = put_var_int(s, r->offsets[i] - last);
     ++		int n = put_var_int(&s, r->offsets[i] - last);
      +		if (n < 0) {
      +			return -1;
      +		}
     @@ reftable/record.c (new)
      +	r->hash_prefix_len = key.len;
      +
      +	if (val_type == 0) {
     -+		n = get_var_int(&count, in);
     ++		n = get_var_int(&count, &in);
      +		if (n < 0) {
      +			return n;
      +		}
     @@ reftable/record.c (new)
      +	r->offsets = reftable_malloc(count * sizeof(uint64_t));
      +	r->offset_len = count;
      +
     -+	n = get_var_int(&r->offsets[0], in);
     ++	n = get_var_int(&r->offsets[0], &in);
      +	if (n < 0)
      +		return n;
      +	slice_consume(&in, n);
     @@ reftable/record.c (new)
      +	j = 1;
      +	while (j < count) {
      +		uint64_t delta = 0;
     -+		int n = get_var_int(&delta, in);
     ++		int n = get_var_int(&delta, &in);
      +		if (n < 0) {
      +			return n;
      +		}
     @@ reftable/record.c (new)
      +		return -1;
      +	slice_consume(&s, n);
      +
     -+	n = put_var_int(s, r->time);
     ++	n = put_var_int(&s, r->time);
      +	if (n < 0)
      +		return -1;
      +	slice_consume(&s, n);
     @@ reftable/record.c (new)
      +	r->email[dest.len] = 0;
      +
      +	ts = 0;
     -+	n = get_var_int(&ts, in);
     ++	n = get_var_int(&ts, &in);
      +	if (n < 0)
      +		goto done;
      +	slice_consume(&in, n);
     @@ reftable/record.c (new)
      +static void reftable_index_record_key(const void *r, struct slice *dest)
      +{
      +	struct reftable_index_record *rec = (struct reftable_index_record *)r;
     -+	slice_copy(dest, rec->last_key);
     ++	slice_copy(dest, &rec->last_key);
      +}
      +
      +static void reftable_index_record_copy_from(void *rec, const void *src_rec,
     @@ reftable/record.c (new)
      +	struct reftable_index_record *src =
      +		(struct reftable_index_record *)src_rec;
      +
     -+	slice_copy(&dst->last_key, src->last_key);
     ++	slice_copy(&dst->last_key, &src->last_key);
      +	dst->offset = src->offset;
      +}
      +
     @@ reftable/record.c (new)
      +		(const struct reftable_index_record *)rec;
      +	struct slice start = out;
      +
     -+	int n = put_var_int(out, r->offset);
     ++	int n = put_var_int(&out, r->offset);
      +	if (n < 0)
      +		return n;
      +
     @@ reftable/record.c (new)
      +	struct reftable_index_record *r = (struct reftable_index_record *)rec;
      +	int n = 0;
      +
     -+	slice_copy(&r->last_key, key);
     ++	slice_copy(&r->last_key, &key);
      +
     -+	n = get_var_int(&r->offset, in);
     ++	n = get_var_int(&r->offset, &in);
      +	if (n < 0)
      +		return n;
      +
     @@ reftable/record.h (new)
      +
      +/* utilities for de/encoding varints */
      +
     -+int get_var_int(uint64_t *dest, struct slice in);
     -+int put_var_int(struct slice dest, uint64_t val);
     ++int get_var_int(uint64_t *dest, struct slice *in);
     ++int put_var_int(struct slice *dest, uint64_t val);
      +
      +/* Methods for records. */
      +struct reftable_record_vtable {
     @@ reftable/record.h (new)
      +
      +#endif
      
     - ## reftable/refname.c (new) ##
     + ## reftable/record_test.c (new) ##
      @@
      +/*
     -+  Copyright 2020 Google LLC
     ++Copyright 2020 Google LLC
      +
     -+  Use of this source code is governed by a BSD-style
     -+  license that can be found in the LICENSE file or at
     -+  https://developers.google.com/open-source/licenses/bsd
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
      +*/
      +
     ++#include "record.h"
     ++
      +#include "system.h"
     -+#include "reftable.h"
      +#include "basics.h"
     -+#include "refname.h"
     -+#include "slice.h"
     -+
     -+struct find_arg {
     -+	char **names;
     -+	const char *want;
     -+};
     ++#include "constants.h"
     ++#include "reftable.h"
     ++#include "test_framework.h"
     ++#include "reftable-tests.h"
      +
     -+static int find_name(size_t k, void *arg)
     ++static void test_copy(struct reftable_record *rec)
      +{
     -+	struct find_arg *f_arg = (struct find_arg *)arg;
     -+	return strcmp(f_arg->names[k], f_arg->want) >= 0;
     ++	struct reftable_record copy =
     ++		reftable_new_record(reftable_record_type(rec));
     ++	reftable_record_copy_from(&copy, rec, SHA1_SIZE);
     ++	/* do it twice to catch memory leaks */
     ++	reftable_record_copy_from(&copy, rec, SHA1_SIZE);
     ++	switch (reftable_record_type(&copy)) {
     ++	case BLOCK_TYPE_REF:
     ++		assert(reftable_ref_record_equal(reftable_record_as_ref(&copy),
     ++						 reftable_record_as_ref(rec),
     ++						 SHA1_SIZE));
     ++		break;
     ++	case BLOCK_TYPE_LOG:
     ++		assert(reftable_log_record_equal(reftable_record_as_log(&copy),
     ++						 reftable_record_as_log(rec),
     ++						 SHA1_SIZE));
     ++		break;
     ++	}
     ++	reftable_record_destroy(&copy);
      +}
      +
     -+int modification_has_ref(struct modification *mod, const char *name)
     ++static void test_varint_roundtrip(void)
      +{
     -+	struct reftable_ref_record ref = { 0 };
     -+	int err = 0;
     ++	uint64_t inputs[] = { 0,
     ++			      1,
     ++			      27,
     ++			      127,
     ++			      128,
     ++			      257,
     ++			      4096,
     ++			      ((uint64_t)1 << 63),
     ++			      ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1 };
     ++	int i = 0;
     ++	for (i = 0; i < ARRAY_SIZE(inputs); i++) {
     ++		byte dest[10];
      +
     -+	if (mod->add_len > 0) {
     -+		struct find_arg arg = {
     -+			.names = mod->add,
     -+			.want = name,
     -+		};
     -+		int idx = binsearch(mod->add_len, find_name, &arg);
     -+		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
     -+			return 0;
     -+		}
     -+	}
     ++		struct slice out = { .buf = dest, .len = 10, .cap = 10 };
      +
     -+	if (mod->del_len > 0) {
     -+		struct find_arg arg = {
     -+			.names = mod->del,
     -+			.want = name,
     -+		};
     -+		int idx = binsearch(mod->del_len, find_name, &arg);
     -+		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
     -+			return 1;
     -+		}
     ++		uint64_t in = inputs[i];
     ++		int n = put_var_int(&out, in);
     ++		uint64_t got = 0;
     ++
     ++		assert(n > 0);
     ++		out.len = n;
     ++		n = get_var_int(&got, &out);
     ++		assert(n > 0);
     ++
     ++		assert(got == in);
      +	}
     ++}
      +
     -+	err = reftable_table_read_ref(&mod->tab, name, &ref);
     -+	reftable_ref_record_clear(&ref);
     -+	return err;
     ++static void test_common_prefix(void)
     ++{
     ++	struct {
     ++		const char *a, *b;
     ++		int want;
     ++	} cases[] = {
     ++		{ "abc", "ab", 2 },
     ++		{ "", "abc", 0 },
     ++		{ "abc", "abd", 2 },
     ++		{ "abc", "pqr", 0 },
     ++	};
     ++
     ++	int i = 0;
     ++	for (i = 0; i < ARRAY_SIZE(cases); i++) {
     ++		struct slice a = SLICE_INIT;
     ++		struct slice b = SLICE_INIT;
     ++		slice_set_string(&a, cases[i].a);
     ++		slice_set_string(&b, cases[i].b);
     ++
     ++		assert(common_prefix_size(&a, &b) == cases[i].want);
     ++
     ++		slice_release(&a);
     ++		slice_release(&b);
     ++	}
      +}
      +
     -+static void modification_clear(struct modification *mod)
     ++static void set_hash(byte *h, int j)
      +{
     -+	/* don't delete the strings themselves; they're owned by ref records.
     -+	 */
     -+	FREE_AND_NULL(mod->add);
     -+	FREE_AND_NULL(mod->del);
     -+	mod->add_len = 0;
     -+	mod->del_len = 0;
     ++	int i = 0;
     ++	for (i = 0; i < hash_size(SHA1_ID); i++) {
     ++		h[i] = (j >> i) & 0xff;
     ++	}
      +}
      +
     -+int modification_has_ref_with_prefix(struct modification *mod,
     -+				     const char *prefix)
     ++static void test_reftable_ref_record_roundtrip(void)
      +{
     -+	struct reftable_iterator it = { NULL };
     -+	struct reftable_ref_record ref = { NULL };
     -+	int err = 0;
     ++	int i = 0;
     ++
     ++	for (i = 0; i <= 3; i++) {
     ++		struct reftable_ref_record in = { 0 };
     ++		struct reftable_ref_record out = {
     ++			.ref_name = xstrdup("old name"),
     ++			.value = reftable_calloc(SHA1_SIZE),
     ++			.target_value = reftable_calloc(SHA1_SIZE),
     ++			.target = xstrdup("old value"),
     ++		};
     ++		struct reftable_record rec_out = { 0 };
     ++		struct slice key = SLICE_INIT;
     ++		struct reftable_record rec = { 0 };
     ++		struct slice dest = SLICE_INIT;
     ++		int n, m;
     ++
     ++		switch (i) {
     ++		case 0:
     ++			break;
     ++		case 1:
     ++			in.value = reftable_malloc(SHA1_SIZE);
     ++			set_hash(in.value, 1);
     ++			break;
     ++		case 2:
     ++			in.value = reftable_malloc(SHA1_SIZE);
     ++			set_hash(in.value, 1);
     ++			in.target_value = reftable_malloc(SHA1_SIZE);
     ++			set_hash(in.target_value, 2);
     ++			break;
     ++		case 3:
     ++			in.target = xstrdup("target");
     ++			break;
     ++		}
     ++		in.ref_name = xstrdup("refs/heads/master");
     ++
     ++		reftable_record_from_ref(&rec, &in);
     ++		test_copy(&rec);
     ++
     ++		assert(reftable_record_val_type(&rec) == i);
     ++
     ++		reftable_record_key(&rec, &key);
     ++		slice_resize(&dest, 1024);
     ++		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
     ++		assert(n > 0);
     ++
     ++		/* decode into a non-zero reftable_record to test for leaks. */
     ++
     ++		reftable_record_from_ref(&rec_out, &out);
     ++		m = reftable_record_decode(&rec_out, key, i, dest, SHA1_SIZE);
     ++		assert(n == m);
     ++
     ++		assert((out.value != NULL) == (in.value != NULL));
     ++		assert((out.target_value != NULL) == (in.target_value != NULL));
     ++		assert((out.target != NULL) == (in.target != NULL));
     ++		reftable_record_clear(&rec_out);
     ++
     ++		slice_release(&key);
     ++		slice_release(&dest);
     ++		reftable_ref_record_clear(&in);
     ++	}
     ++}
     ++
     ++static void test_reftable_log_record_equal(void)
     ++{
     ++	struct reftable_log_record in[2] = {
     ++		{
     ++			.ref_name = xstrdup("refs/heads/master"),
     ++			.update_index = 42,
     ++		},
     ++		{
     ++			.ref_name = xstrdup("refs/heads/master"),
     ++			.update_index = 22,
     ++		}
     ++	};
     ++
     ++	assert(!reftable_log_record_equal(&in[0], &in[1], SHA1_SIZE));
     ++	in[1].update_index = in[0].update_index;
     ++	assert(reftable_log_record_equal(&in[0], &in[1], SHA1_SIZE));
     ++	reftable_log_record_clear(&in[0]);
     ++	reftable_log_record_clear(&in[1]);
     ++}
     ++
     ++static void test_reftable_log_record_roundtrip(void)
     ++{
     ++	struct reftable_log_record in[2] = {
     ++		{
     ++			.ref_name = xstrdup("refs/heads/master"),
     ++			.old_hash = reftable_malloc(SHA1_SIZE),
     ++			.new_hash = reftable_malloc(SHA1_SIZE),
     ++			.name = xstrdup("han-wen"),
     ++			.email = xstrdup("hanwen@google.com"),
     ++			.message = xstrdup("test"),
     ++			.update_index = 42,
     ++			.time = 1577123507,
     ++			.tz_offset = 100,
     ++		},
     ++		{
     ++			.ref_name = xstrdup("refs/heads/master"),
     ++			.update_index = 22,
     ++		}
     ++	};
     ++	set_test_hash(in[0].new_hash, 1);
     ++	set_test_hash(in[0].old_hash, 2);
     ++	for (int i = 0; i < ARRAY_SIZE(in); i++) {
     ++		struct reftable_record rec = { 0 };
     ++		struct slice key = SLICE_INIT;
     ++		struct slice dest = SLICE_INIT;
     ++		/* populate out, to check for leaks. */
     ++		struct reftable_log_record out = {
     ++			.ref_name = xstrdup("old name"),
     ++			.new_hash = reftable_calloc(SHA1_SIZE),
     ++			.old_hash = reftable_calloc(SHA1_SIZE),
     ++			.name = xstrdup("old name"),
     ++			.email = xstrdup("old@email"),
     ++			.message = xstrdup("old message"),
     ++		};
     ++		struct reftable_record rec_out = { 0 };
     ++		int n, m, valtype;
     ++
     ++		reftable_record_from_log(&rec, &in[i]);
     ++
     ++		test_copy(&rec);
     ++
     ++		reftable_record_key(&rec, &key);
     ++
     ++		slice_resize(&dest, 1024);
     ++
     ++		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
     ++		assert(n >= 0);
     ++		reftable_record_from_log(&rec_out, &out);
     ++		valtype = reftable_record_val_type(&rec);
     ++		m = reftable_record_decode(&rec_out, key, valtype, dest,
     ++					   SHA1_SIZE);
     ++		assert(n == m);
     ++
     ++		assert(reftable_log_record_equal(&in[i], &out, SHA1_SIZE));
     ++		reftable_log_record_clear(&in[i]);
     ++		slice_release(&key);
     ++		slice_release(&dest);
     ++		reftable_record_clear(&rec_out);
     ++	}
     ++}
     ++
     ++static void test_u24_roundtrip(void)
     ++{
     ++	uint32_t in = 0x112233;
     ++	byte dest[3];
     ++	uint32_t out;
     ++	put_be24(dest, in);
     ++	out = get_be24(dest);
     ++	assert(in == out);
     ++}
     ++
     ++static void test_key_roundtrip(void)
     ++{
     ++	struct slice dest = SLICE_INIT;
     ++	struct slice last_key = SLICE_INIT;
     ++	struct slice key = SLICE_INIT;
     ++	struct slice roundtrip = SLICE_INIT;
     ++	bool restart;
     ++	byte extra;
     ++	int n, m;
     ++	byte rt_extra;
     ++
     ++	slice_resize(&dest, 1024);
     ++	slice_set_string(&last_key, "refs/heads/master");
     ++	slice_set_string(&key, "refs/tags/bla");
     ++
     ++	extra = 6;
     ++	n = reftable_encode_key(&restart, dest, last_key, key, extra);
     ++	assert(!restart);
     ++	assert(n > 0);
     ++
     ++	m = reftable_decode_key(&roundtrip, &rt_extra, last_key, dest);
     ++	assert(n == m);
     ++	assert(slice_equal(&key, &roundtrip));
     ++	assert(rt_extra == extra);
     ++
     ++	slice_release(&last_key);
     ++	slice_release(&key);
     ++	slice_release(&dest);
     ++	slice_release(&roundtrip);
     ++}
     ++
     ++static void test_reftable_obj_record_roundtrip(void)
     ++{
     ++	byte testHash1[SHA1_SIZE] = { 1, 2, 3, 4, 0 };
     ++	uint64_t till9[] = { 1, 2, 3, 4, 500, 600, 700, 800, 9000 };
     ++	struct reftable_obj_record recs[3] = { {
     ++						       .hash_prefix = testHash1,
     ++						       .hash_prefix_len = 5,
     ++						       .offsets = till9,
     ++						       .offset_len = 3,
     ++					       },
     ++					       {
     ++						       .hash_prefix = testHash1,
     ++						       .hash_prefix_len = 5,
     ++						       .offsets = till9,
     ++						       .offset_len = 9,
     ++					       },
     ++					       {
     ++						       .hash_prefix = testHash1,
     ++						       .hash_prefix_len = 5,
     ++					       } };
     ++	int i = 0;
     ++	for (i = 0; i < ARRAY_SIZE(recs); i++) {
     ++		struct reftable_obj_record in = recs[i];
     ++		struct slice dest = SLICE_INIT;
     ++		struct reftable_record rec = { 0 };
     ++		struct slice key = SLICE_INIT;
     ++		struct reftable_obj_record out = { 0 };
     ++		struct reftable_record rec_out = { 0 };
     ++		int n, m;
     ++		byte extra;
     ++
     ++		reftable_record_from_obj(&rec, &in);
     ++		test_copy(&rec);
     ++		reftable_record_key(&rec, &key);
     ++		slice_resize(&dest, 1024);
     ++		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
     ++		assert(n > 0);
     ++		extra = reftable_record_val_type(&rec);
     ++		reftable_record_from_obj(&rec_out, &out);
     ++		m = reftable_record_decode(&rec_out, key, extra, dest,
     ++					   SHA1_SIZE);
     ++		assert(n == m);
     ++
     ++		assert(in.hash_prefix_len == out.hash_prefix_len);
     ++		assert(in.offset_len == out.offset_len);
     ++
     ++		assert(!memcmp(in.hash_prefix, out.hash_prefix,
     ++			       in.hash_prefix_len));
     ++		assert(0 == memcmp(in.offsets, out.offsets,
     ++				   sizeof(uint64_t) * in.offset_len));
     ++		slice_release(&key);
     ++		slice_release(&dest);
     ++		reftable_record_clear(&rec_out);
     ++	}
     ++}
     ++
     ++static void test_reftable_index_record_roundtrip(void)
     ++{
     ++	struct reftable_index_record in = {
     ++		.offset = 42,
     ++		.last_key = SLICE_INIT,
     ++	};
     ++	struct slice dest = SLICE_INIT;
     ++	struct slice key = SLICE_INIT;
     ++	struct reftable_record rec = { 0 };
     ++	struct reftable_index_record out = { .last_key = SLICE_INIT };
     ++	struct reftable_record out_rec = { NULL };
     ++	int n, m;
     ++	byte extra;
     ++
     ++	slice_set_string(&in.last_key, "refs/heads/master");
     ++	reftable_record_from_index(&rec, &in);
     ++	reftable_record_key(&rec, &key);
     ++	test_copy(&rec);
     ++
     ++	assert(0 == slice_cmp(&key, &in.last_key));
     ++	slice_resize(&dest, 1024);
     ++	n = reftable_record_encode(&rec, dest, SHA1_SIZE);
     ++	assert(n > 0);
     ++
     ++	extra = reftable_record_val_type(&rec);
     ++	reftable_record_from_index(&out_rec, &out);
     ++	m = reftable_record_decode(&out_rec, key, extra, dest, SHA1_SIZE);
     ++	assert(m == n);
     ++
     ++	assert(in.offset == out.offset);
     ++
     ++	reftable_record_clear(&out_rec);
     ++	slice_release(&key);
     ++	slice_release(&in.last_key);
     ++	slice_release(&dest);
     ++}
     ++
     ++int record_test_main(int argc, const char *argv[])
     ++{
     ++	add_test_case("test_reftable_log_record_equal",
     ++		      &test_reftable_log_record_equal);
     ++	add_test_case("test_reftable_log_record_roundtrip",
     ++		      &test_reftable_log_record_roundtrip);
     ++	add_test_case("test_reftable_ref_record_roundtrip",
     ++		      &test_reftable_ref_record_roundtrip);
     ++	add_test_case("test_varint_roundtrip", &test_varint_roundtrip);
     ++	add_test_case("test_key_roundtrip", &test_key_roundtrip);
     ++	add_test_case("test_common_prefix", &test_common_prefix);
     ++	add_test_case("test_reftable_obj_record_roundtrip",
     ++		      &test_reftable_obj_record_roundtrip);
     ++	add_test_case("test_reftable_index_record_roundtrip",
     ++		      &test_reftable_index_record_roundtrip);
     ++	add_test_case("test_u24_roundtrip", &test_u24_roundtrip);
     ++	return test_main(argc, argv);
     ++}
     +
     + ## reftable/refname.c (new) ##
     +@@
     ++/*
     ++  Copyright 2020 Google LLC
     ++
     ++  Use of this source code is governed by a BSD-style
     ++  license that can be found in the LICENSE file or at
     ++  https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "system.h"
     ++#include "reftable.h"
     ++#include "basics.h"
     ++#include "refname.h"
     ++#include "slice.h"
     ++
     ++struct find_arg {
     ++	char **names;
     ++	const char *want;
     ++};
     ++
     ++static int find_name(size_t k, void *arg)
     ++{
     ++	struct find_arg *f_arg = (struct find_arg *)arg;
     ++	return strcmp(f_arg->names[k], f_arg->want) >= 0;
     ++}
     ++
     ++int modification_has_ref(struct modification *mod, const char *name)
     ++{
     ++	struct reftable_ref_record ref = { 0 };
     ++	int err = 0;
     ++
     ++	if (mod->add_len > 0) {
     ++		struct find_arg arg = {
     ++			.names = mod->add,
     ++			.want = name,
     ++		};
     ++		int idx = binsearch(mod->add_len, find_name, &arg);
     ++		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
     ++			return 0;
     ++		}
     ++	}
     ++
     ++	if (mod->del_len > 0) {
     ++		struct find_arg arg = {
     ++			.names = mod->del,
     ++			.want = name,
     ++		};
     ++		int idx = binsearch(mod->del_len, find_name, &arg);
     ++		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
     ++			return 1;
     ++		}
     ++	}
     ++
     ++	err = reftable_table_read_ref(&mod->tab, name, &ref);
     ++	reftable_ref_record_clear(&ref);
     ++	return err;
     ++}
     ++
     ++static void modification_clear(struct modification *mod)
     ++{
     ++	/* don't delete the strings themselves; they're owned by ref records.
     ++	 */
     ++	FREE_AND_NULL(mod->add);
     ++	FREE_AND_NULL(mod->del);
     ++	mod->add_len = 0;
     ++	mod->del_len = 0;
     ++}
     ++
     ++int modification_has_ref_with_prefix(struct modification *mod,
     ++				     const char *prefix)
     ++{
     ++	struct reftable_iterator it = { NULL };
     ++	struct reftable_ref_record ref = { NULL };
     ++	int err = 0;
      +
      +	if (mod->add_len > 0) {
      +		struct find_arg arg = {
     @@ reftable/refname.h (new)
      +
      +int modification_validate(struct modification *mod);
      +
     ++#endif
     +
     + ## reftable/refname_test.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "reftable.h"
     ++
     ++#include "basics.h"
     ++#include "block.h"
     ++#include "constants.h"
     ++#include "reader.h"
     ++#include "record.h"
     ++#include "refname.h"
     ++#include "system.h"
     ++
     ++#include "test_framework.h"
     ++#include "reftable-tests.h"
     ++
     ++struct testcase {
     ++	char *add;
     ++	char *del;
     ++	int error_code;
     ++};
     ++
     ++static void test_conflict(void)
     ++{
     ++	struct reftable_write_options opts = { 0 };
     ++	struct slice buf = SLICE_INIT;
     ++	struct reftable_writer *w =
     ++		reftable_new_writer(&slice_add_void, &buf, &opts);
     ++	struct reftable_ref_record rec = {
     ++		.ref_name = "a/b",
     ++		.target = "destination", /* make sure it's not a symref. */
     ++		.update_index = 1,
     ++	};
     ++	int err;
     ++	int i;
     ++	struct reftable_block_source source = { 0 };
     ++	struct reftable_reader *rd = NULL;
     ++	struct reftable_table tab = { NULL };
     ++	struct testcase cases[] = {
     ++		{ "a/b/c", NULL, REFTABLE_NAME_CONFLICT },
     ++		{ "b", NULL, 0 },
     ++		{ "a", NULL, REFTABLE_NAME_CONFLICT },
     ++		{ "a", "a/b", 0 },
     ++
     ++		{ "p/", NULL, REFTABLE_REFNAME_ERROR },
     ++		{ "p//q", NULL, REFTABLE_REFNAME_ERROR },
     ++		{ "p/./q", NULL, REFTABLE_REFNAME_ERROR },
     ++		{ "p/../q", NULL, REFTABLE_REFNAME_ERROR },
     ++
     ++		{ "a/b/c", "a/b", 0 },
     ++		{ NULL, "a//b", 0 },
     ++	};
     ++	reftable_writer_set_limits(w, 1, 1);
     ++
     ++	err = reftable_writer_add_ref(w, &rec);
     ++	assert_err(err);
     ++
     ++	err = reftable_writer_close(w);
     ++	assert_err(err);
     ++	reftable_writer_free(w);
     ++
     ++	block_source_from_slice(&source, &buf);
     ++	err = reftable_new_reader(&rd, &source, "filename");
     ++	assert_err(err);
     ++
     ++	reftable_table_from_reader(&tab, rd);
     ++
     ++	for (i = 0; i < ARRAY_SIZE(cases); i++) {
     ++		struct modification mod = {
     ++			.tab = tab,
     ++		};
     ++
     ++		if (cases[i].add != NULL) {
     ++			mod.add = &cases[i].add;
     ++			mod.add_len = 1;
     ++		}
     ++		if (cases[i].del != NULL) {
     ++			mod.del = &cases[i].del;
     ++			mod.del_len = 1;
     ++		}
     ++
     ++		err = modification_validate(&mod);
     ++		assert(err == cases[i].error_code);
     ++	}
     ++
     ++	reftable_reader_free(rd);
     ++	slice_release(&buf);
     ++}
     ++
     ++int refname_test_main(int argc, const char *argv[])
     ++{
     ++	add_test_case("test_conflict", &test_conflict);
     ++	return test_main(argc, argv);
     ++}
     +
     + ## reftable/reftable-tests.h (new) ##
     +@@
     ++#ifndef REFTABLE_TESTS_H
     ++#define REFTABLE_TESTS_H
     ++
     ++int block_test_main(int argc, const char **argv);
     ++int merged_test_main(int argc, const char **argv);
     ++int record_test_main(int argc, const char **argv);
     ++int refname_test_main(int argc, const char **argv);
     ++int reftable_test_main(int argc, const char **argv);
     ++int slice_test_main(int argc, const char **argv);
     ++int stack_test_main(int argc, const char **argv);
     ++int tree_test_main(int argc, const char **argv);
     ++
      +#endif
      
       ## reftable/reftable.c (new) ##
     @@ reftable/reftable.h (new)
      +
      +#endif
      
     - ## reftable/slice.c (new) ##
     + ## reftable/reftable_test.c (new) ##
      @@
      +/*
      +Copyright 2020 Google LLC
     @@ reftable/slice.c (new)
      +https://developers.google.com/open-source/licenses/bsd
      +*/
      +
     -+#include "slice.h"
     ++#include "reftable.h"
      +
      +#include "system.h"
      +
     -+#include "reftable.h"
     ++#include "basics.h"
     ++#include "block.h"
     ++#include "constants.h"
     ++#include "reader.h"
     ++#include "record.h"
     ++#include "test_framework.h"
     ++#include "reftable-tests.h"
      +
     -+struct slice reftable_empty_slice = SLICE_INIT;
     ++static const int update_index = 5;
      +
     -+void slice_set_string(struct slice *s, const char *str)
     ++static void test_buffer(void)
      +{
     -+	int l;
     -+	if (str == NULL) {
     -+		s->len = 0;
     -+		return;
     -+	}
     -+	assert(s->canary == SLICE_CANARY);
     ++	struct slice buf = SLICE_INIT;
     ++	struct reftable_block_source source = { NULL };
     ++	struct reftable_block out = { 0 };
     ++	int n;
     ++	byte in[] = "hello";
     ++	slice_add(&buf, in, sizeof(in));
     ++	block_source_from_slice(&source, &buf);
     ++	assert(block_source_size(&source) == 6);
     ++	n = block_source_read_block(&source, &out, 0, sizeof(in));
     ++	assert(n == sizeof(in));
     ++	assert(!memcmp(in, out.data, n));
     ++	reftable_block_done(&out);
     ++
     ++	n = block_source_read_block(&source, &out, 1, 2);
     ++	assert(n == 2);
     ++	assert(!memcmp(out.data, "el", 2));
     ++
     ++	reftable_block_done(&out);
     ++	block_source_close(&source);
     ++	slice_release(&buf);
     ++}
     ++
     ++static void test_default_write_opts(void)
     ++{
     ++	struct reftable_write_options opts = { 0 };
     ++	struct slice buf = SLICE_INIT;
     ++	struct reftable_writer *w =
     ++		reftable_new_writer(&slice_add_void, &buf, &opts);
     ++
     ++	struct reftable_ref_record rec = {
     ++		.ref_name = "master",
     ++		.update_index = 1,
     ++	};
     ++	int err;
     ++	struct reftable_block_source source = { 0 };
     ++	struct reftable_reader **readers = malloc(sizeof(*readers) * 1);
     ++	uint32_t hash_id;
     ++	struct reftable_reader *rd = NULL;
     ++	struct reftable_merged_table *merged = NULL;
      +
     -+	l = strlen(str);
     -+	l++; /* \0 */
     -+	slice_resize(s, l);
     -+	memcpy(s->buf, str, l);
     -+	s->len = l - 1;
     -+}
     ++	reftable_writer_set_limits(w, 1, 1);
      +
     -+void slice_init(struct slice *s)
     -+{
     -+	struct slice empty = SLICE_INIT;
     -+	*s = empty;
     -+}
     ++	err = reftable_writer_add_ref(w, &rec);
     ++	assert_err(err);
      +
     -+void slice_resize(struct slice *s, int l)
     -+{
     -+	assert(s->canary == SLICE_CANARY);
     -+	if (s->cap < l) {
     -+		int c = s->cap * 2;
     -+		if (c < l) {
     -+			c = l;
     -+		}
     -+		s->cap = c;
     -+		s->buf = reftable_realloc(s->buf, s->cap);
     -+	}
     -+	s->len = l;
     -+}
     ++	err = reftable_writer_close(w);
     ++	assert_err(err);
     ++	reftable_writer_free(w);
      +
     -+void slice_addstr(struct slice *d, const char *s)
     -+{
     -+	int l1 = d->len;
     -+	int l2 = strlen(s);
     -+	assert(d->canary == SLICE_CANARY);
     ++	block_source_from_slice(&source, &buf);
      +
     -+	slice_resize(d, l2 + l1);
     -+	memcpy(d->buf + l1, s, l2);
     ++	err = reftable_new_reader(&rd, &source, "filename");
     ++	assert_err(err);
     ++
     ++	hash_id = reftable_reader_hash_id(rd);
     ++	assert(hash_id == SHA1_ID);
     ++
     ++	readers[0] = rd;
     ++
     ++	err = reftable_new_merged_table(&merged, readers, 1, SHA1_ID);
     ++	assert_err(err);
     ++
     ++	reftable_merged_table_close(merged);
     ++	reftable_merged_table_free(merged);
     ++	slice_release(&buf);
      +}
      +
     -+void slice_addbuf(struct slice *s, struct slice a)
     ++static void write_table(char ***names, struct slice *buf, int N, int block_size,
     ++			uint32_t hash_id)
      +{
     -+	int end = s->len;
     -+	assert(s->canary == SLICE_CANARY);
     -+	slice_resize(s, s->len + a.len);
     -+	memcpy(s->buf + end, a.buf, a.len);
     -+}
     -+
     -+void slice_consume(struct slice *s, int n)
     -+{
     -+	assert(s->canary == SLICE_CANARY);
     -+	s->buf += n;
     -+	s->len -= n;
     -+}
     ++	struct reftable_write_options opts = {
     ++		.block_size = block_size,
     ++		.hash_id = hash_id,
     ++	};
     ++	struct reftable_writer *w =
     ++		reftable_new_writer(&slice_add_void, buf, &opts);
     ++	struct reftable_ref_record ref = { 0 };
     ++	int i = 0, n;
     ++	struct reftable_log_record log = { 0 };
     ++	const struct reftable_stats *stats = NULL;
     ++	*names = reftable_calloc(sizeof(char *) * (N + 1));
     ++	reftable_writer_set_limits(w, update_index, update_index);
     ++	for (i = 0; i < N; i++) {
     ++		byte hash[SHA256_SIZE] = { 0 };
     ++		char name[100];
     ++		int n;
      +
     -+byte *slice_detach(struct slice *s)
     -+{
     -+	byte *p = s->buf;
     -+	assert(s->canary == SLICE_CANARY);
     -+	s->buf = NULL;
     -+	s->cap = 0;
     -+	s->len = 0;
     -+	return p;
     -+}
     ++		set_test_hash(hash, i);
      +
     -+void slice_release(struct slice *s)
     -+{
     -+	assert(s->canary == SLICE_CANARY);
     -+	reftable_free(slice_detach(s));
     -+}
     ++		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
      +
     -+void slice_copy(struct slice *dest, struct slice src)
     -+{
     -+	assert(dest->canary == SLICE_CANARY);
     -+	assert(src.canary == SLICE_CANARY);
     -+	slice_resize(dest, src.len);
     -+	memcpy(dest->buf, src.buf, src.len);
     -+}
     ++		ref.ref_name = name;
     ++		ref.value = hash;
     ++		ref.update_index = update_index;
     ++		(*names)[i] = xstrdup(name);
      +
     -+/* return the underlying data as char*. len is left unchanged, but
     -+   a \0 is added at the end. */
     -+const char *slice_as_string(struct slice *s)
     -+{
     -+	assert(s->canary == SLICE_CANARY);
     -+	if (s->cap == s->len) {
     -+		int l = s->len;
     -+		slice_resize(s, l + 1);
     -+		s->len = l;
     ++		n = reftable_writer_add_ref(w, &ref);
     ++		assert(n == 0);
      +	}
     -+	s->buf[s->len] = 0;
     -+	return (const char *)s->buf;
     -+}
      +
     -+/* return a newly malloced string for this slice */
     -+char *slice_to_string(struct slice in)
     -+{
     -+	struct slice s = SLICE_INIT;
     -+	assert(in.canary == SLICE_CANARY);
     -+	slice_resize(&s, in.len + 1);
     -+	s.buf[in.len] = 0;
     -+	memcpy(s.buf, in.buf, in.len);
     -+	return (char *)slice_detach(&s);
     -+}
     ++	for (i = 0; i < N; i++) {
     ++		byte hash[SHA256_SIZE] = { 0 };
     ++		char name[100];
     ++		int n;
      +
     -+bool slice_equal(struct slice a, struct slice b)
     -+{
     -+	assert(a.canary == SLICE_CANARY);
     -+	assert(b.canary == SLICE_CANARY);
     -+	if (a.len != b.len)
     -+		return 0;
     -+	return memcmp(a.buf, b.buf, a.len) == 0;
     -+}
     ++		set_test_hash(hash, i);
      +
     -+int slice_cmp(struct slice a, struct slice b)
     -+{
     -+	int min = a.len < b.len ? a.len : b.len;
     -+	int res = memcmp(a.buf, b.buf, min);
     -+	assert(a.canary == SLICE_CANARY);
     -+	assert(b.canary == SLICE_CANARY);
     -+	if (res != 0)
     -+		return res;
     -+	if (a.len < b.len)
     -+		return -1;
     -+	else if (a.len > b.len)
     -+		return 1;
     -+	else
     -+		return 0;
     -+}
     ++		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
      +
     -+int slice_add(struct slice *b, byte *data, size_t sz)
     -+{
     -+	assert(b->canary == SLICE_CANARY);
     -+	if (b->len + sz > b->cap) {
     -+		int newcap = 2 * b->cap + 1;
     -+		if (newcap < b->len + sz) {
     -+			newcap = (b->len + sz);
     ++		log.ref_name = name;
     ++		log.new_hash = hash;
     ++		log.update_index = update_index;
     ++		log.message = "message";
     ++
     ++		n = reftable_writer_add_log(w, &log);
     ++		assert(n == 0);
     ++	}
     ++
     ++	n = reftable_writer_close(w);
     ++	assert(n == 0);
     ++
     ++	stats = writer_stats(w);
     ++	for (i = 0; i < stats->ref_stats.blocks; i++) {
     ++		int off = i * opts.block_size;
     ++		if (off == 0) {
     ++			off = header_size((hash_id == SHA256_ID) ? 2 : 1);
      +		}
     -+		b->buf = reftable_realloc(b->buf, newcap);
     -+		b->cap = newcap;
     ++		assert(buf->buf[off] == 'r');
      +	}
      +
     -+	memcpy(b->buf + b->len, data, sz);
     -+	b->len += sz;
     -+	return sz;
     ++	assert(stats->log_stats.blocks > 0);
     ++	reftable_writer_free(w);
      +}
      +
     -+int slice_add_void(void *b, byte *data, size_t sz)
     ++static void test_log_buffer_size(void)
      +{
     -+	return slice_add((struct slice *)b, data, sz);
     -+}
     ++	struct slice buf = SLICE_INIT;
     ++	struct reftable_write_options opts = {
     ++		.block_size = 4096,
     ++	};
     ++	int err;
     ++	struct reftable_log_record log = {
     ++		.ref_name = "refs/heads/master",
     ++		.name = "Han-Wen Nienhuys",
     ++		.email = "hanwen@google.com",
     ++		.tz_offset = 100,
     ++		.time = 0x5e430672,
     ++		.update_index = 0xa,
     ++		.message = "commit: 9\n",
     ++	};
     ++	struct reftable_writer *w =
     ++		reftable_new_writer(&slice_add_void, &buf, &opts);
      +
     -+static uint64_t slice_size(void *b)
     -+{
     -+	return ((struct slice *)b)->len;
     -+}
     ++	/* This tests buffer extension for log compression. Must use a random
     ++	   hash, to ensure that the compressed part is larger than the original.
     ++	*/
     ++	byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
     ++	for (int i = 0; i < SHA1_SIZE; i++) {
     ++		hash1[i] = (byte)(rand() % 256);
     ++		hash2[i] = (byte)(rand() % 256);
     ++	}
     ++	log.old_hash = hash1;
     ++	log.new_hash = hash2;
     ++	reftable_writer_set_limits(w, update_index, update_index);
     ++	err = reftable_writer_add_log(w, &log);
     ++	assert_err(err);
     ++	err = reftable_writer_close(w);
     ++	assert_err(err);
     ++	reftable_writer_free(w);
     ++	slice_release(&buf);
     ++}
     ++
     ++static void test_log_write_read(void)
     ++{
     ++	int N = 2;
     ++	char **names = reftable_calloc(sizeof(char *) * (N + 1));
     ++	int err;
     ++	struct reftable_write_options opts = {
     ++		.block_size = 256,
     ++	};
     ++	struct reftable_ref_record ref = { 0 };
     ++	int i = 0;
     ++	struct reftable_log_record log = { 0 };
     ++	int n;
     ++	struct reftable_iterator it = { 0 };
     ++	struct reftable_reader rd = { 0 };
     ++	struct reftable_block_source source = { 0 };
     ++	struct slice buf = SLICE_INIT;
     ++	struct reftable_writer *w =
     ++		reftable_new_writer(&slice_add_void, &buf, &opts);
     ++	const struct reftable_stats *stats = NULL;
     ++	reftable_writer_set_limits(w, 0, N);
     ++	for (i = 0; i < N; i++) {
     ++		char name[256];
     ++		struct reftable_ref_record ref = { 0 };
     ++		snprintf(name, sizeof(name), "b%02d%0*d", i, 130, 7);
     ++		names[i] = xstrdup(name);
     ++		puts(name);
     ++		ref.ref_name = name;
     ++		ref.update_index = i;
      +
     -+static void slice_return_block(void *b, struct reftable_block *dest)
     -+{
     -+	memset(dest->data, 0xff, dest->len);
     -+	reftable_free(dest->data);
     -+}
     ++		err = reftable_writer_add_ref(w, &ref);
     ++		assert_err(err);
     ++	}
     ++	for (i = 0; i < N; i++) {
     ++		byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
     ++		struct reftable_log_record log = { 0 };
     ++		set_test_hash(hash1, i);
     ++		set_test_hash(hash2, i + 1);
      +
     -+static void slice_close(void *b)
     -+{
     -+}
     ++		log.ref_name = names[i];
     ++		log.update_index = i;
     ++		log.old_hash = hash1;
     ++		log.new_hash = hash2;
      +
     -+static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
     -+			    uint32_t size)
     -+{
     -+	struct slice *b = (struct slice *)v;
     -+	assert(off + size <= b->len);
     -+	dest->data = reftable_calloc(size);
     -+	memcpy(dest->data, b->buf + off, size);
     -+	dest->len = size;
     -+	return size;
     -+}
     ++		err = reftable_writer_add_log(w, &log);
     ++		assert_err(err);
     ++	}
      +
     -+struct reftable_block_source_vtable slice_vtable = {
     -+	.size = &slice_size,
     -+	.read_block = &slice_read_block,
     -+	.return_block = &slice_return_block,
     -+	.close = &slice_close,
     -+};
     ++	n = reftable_writer_close(w);
     ++	assert(n == 0);
      +
     -+void block_source_from_slice(struct reftable_block_source *bs,
     -+			     struct slice *buf)
     -+{
     -+	assert(bs->ops == NULL);
     -+	bs->ops = &slice_vtable;
     -+	bs->arg = buf;
     -+}
     ++	stats = writer_stats(w);
     ++	assert(stats->log_stats.blocks > 0);
     ++	reftable_writer_free(w);
     ++	w = NULL;
      +
     -+static void malloc_return_block(void *b, struct reftable_block *dest)
     -+{
     -+	memset(dest->data, 0xff, dest->len);
     -+	reftable_free(dest->data);
     -+}
     ++	block_source_from_slice(&source, &buf);
      +
     -+struct reftable_block_source_vtable malloc_vtable = {
     -+	.return_block = &malloc_return_block,
     -+};
     ++	err = init_reader(&rd, &source, "file.log");
     ++	assert_err(err);
      +
     -+struct reftable_block_source malloc_block_source_instance = {
     -+	.ops = &malloc_vtable,
     -+};
     ++	err = reftable_reader_seek_ref(&rd, &it, names[N - 1]);
     ++	assert_err(err);
      +
     -+struct reftable_block_source malloc_block_source(void)
     -+{
     -+	return malloc_block_source_instance;
     -+}
     ++	err = reftable_iterator_next_ref(&it, &ref);
     ++	assert_err(err);
      +
     -+int common_prefix_size(struct slice a, struct slice b)
     -+{
     -+	int p = 0;
     -+	assert(a.canary == SLICE_CANARY);
     -+	assert(b.canary == SLICE_CANARY);
     -+	while (p < a.len && p < b.len) {
     -+		if (a.buf[p] != b.buf[p]) {
     ++	/* end of iteration. */
     ++	err = reftable_iterator_next_ref(&it, &ref);
     ++	assert(0 < err);
     ++
     ++	reftable_iterator_destroy(&it);
     ++	reftable_ref_record_clear(&ref);
     ++
     ++	err = reftable_reader_seek_log(&rd, &it, "");
     ++	assert_err(err);
     ++
     ++	i = 0;
     ++	while (true) {
     ++		int err = reftable_iterator_next_log(&it, &log);
     ++		if (err > 0) {
      +			break;
      +		}
     -+		p++;
     ++
     ++		assert_err(err);
     ++		assert_streq(names[i], log.ref_name);
     ++		assert(i == log.update_index);
     ++		i++;
     ++		reftable_log_record_clear(&log);
      +	}
      +
     -+	return p;
     ++	assert(i == N);
     ++	reftable_iterator_destroy(&it);
     ++
     ++	/* cleanup. */
     ++	slice_release(&buf);
     ++	free_names(names);
     ++	reader_close(&rd);
      +}
     -
     - ## reftable/slice.h (new) ##
     -@@
     -+/*
     -+Copyright 2020 Google LLC
      +
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     ++static void test_table_read_write_sequential(void)
     ++{
     ++	char **names;
     ++	struct slice buf = SLICE_INIT;
     ++	int N = 50;
     ++	struct reftable_iterator it = { 0 };
     ++	struct reftable_block_source source = { 0 };
     ++	struct reftable_reader rd = { 0 };
     ++	int err = 0;
     ++	int j = 0;
      +
     -+#ifndef SLICE_H
     -+#define SLICE_H
     ++	write_table(&names, &buf, N, 256, SHA1_ID);
      +
     -+#include "basics.h"
     -+#include "reftable.h"
     ++	block_source_from_slice(&source, &buf);
      +
     -+/*
     -+  provides bounds-checked byte ranges.
     -+  To use, initialize as "slice x = {0};"
     -+ */
     -+struct slice {
     -+	int len;
     -+	int cap;
     -+	byte *buf;
     -+	byte canary;
     -+};
     -+#define SLICE_CANARY 0x42
     -+#define SLICE_INIT                       \
     -+	{                                \
     -+		0, 0, NULL, SLICE_CANARY \
     ++	err = init_reader(&rd, &source, "file.ref");
     ++	assert_err(err);
     ++
     ++	err = reftable_reader_seek_ref(&rd, &it, "");
     ++	assert_err(err);
     ++
     ++	while (true) {
     ++		struct reftable_ref_record ref = { 0 };
     ++		int r = reftable_iterator_next_ref(&it, &ref);
     ++		assert(r >= 0);
     ++		if (r > 0) {
     ++			break;
     ++		}
     ++		assert(0 == strcmp(names[j], ref.ref_name));
     ++		assert(update_index == ref.update_index);
     ++
     ++		j++;
     ++		reftable_ref_record_clear(&ref);
      +	}
     -+extern struct slice reftable_empty_slice;
     ++	assert(j == N);
     ++	reftable_iterator_destroy(&it);
     ++	slice_release(&buf);
     ++	free_names(names);
      +
     -+void slice_set_string(struct slice *dest, const char *src);
     -+void slice_addstr(struct slice *dest, const char *src);
     ++	reader_close(&rd);
     ++}
      +
     -+/* Deallocate and clear slice */
     -+void slice_release(struct slice *slice);
     ++static void test_table_write_small_table(void)
     ++{
     ++	char **names;
     ++	struct slice buf = SLICE_INIT;
     ++	int N = 1;
     ++	write_table(&names, &buf, N, 4096, SHA1_ID);
     ++	assert(buf.len < 200);
     ++	slice_release(&buf);
     ++	free_names(names);
     ++}
      +
     -+/* Return a malloced string for `src` */
     -+char *slice_to_string(struct slice src);
     ++static void test_table_read_api(void)
     ++{
     ++	char **names;
     ++	struct slice buf = SLICE_INIT;
     ++	int N = 50;
     ++	struct reftable_reader rd = { 0 };
     ++	struct reftable_block_source source = { 0 };
     ++	int err;
     ++	int i;
     ++	struct reftable_log_record log = { 0 };
     ++	struct reftable_iterator it = { 0 };
      +
     -+/* Initializes a slice. Accepts a slice with random garbage. */
     -+void slice_init(struct slice *slice);
     ++	write_table(&names, &buf, N, 256, SHA1_ID);
      +
     -+/* Ensure that `buf` is \0 terminated. */
     -+const char *slice_as_string(struct slice *src);
     ++	block_source_from_slice(&source, &buf);
      +
     -+/* Compare slices */
     -+bool slice_equal(struct slice a, struct slice b);
     ++	err = init_reader(&rd, &source, "file.ref");
     ++	assert_err(err);
      +
     -+/* Return `buf`, clearing out `s` */
     -+byte *slice_detach(struct slice *s);
     ++	err = reftable_reader_seek_ref(&rd, &it, names[0]);
     ++	assert_err(err);
      +
     -+/* Copy bytes */
     -+void slice_copy(struct slice *dest, struct slice src);
     ++	err = reftable_iterator_next_log(&it, &log);
     ++	assert(err == REFTABLE_API_ERROR);
      +
     -+/* Advance `buf` by `n`, and decrease length. A copy of the slice
     -+   should be kept for deallocating the slice. */
     -+void slice_consume(struct slice *s, int n);
     ++	slice_release(&buf);
     ++	for (i = 0; i < N; i++) {
     ++		reftable_free(names[i]);
     ++	}
     ++	reftable_iterator_destroy(&it);
     ++	reftable_free(names);
     ++	reader_close(&rd);
     ++	slice_release(&buf);
     ++}
      +
     -+/* Set length of the slice to `l` */
     -+void slice_resize(struct slice *s, int l);
     ++static void test_table_read_write_seek(bool index, int hash_id)
     ++{
     ++	char **names;
     ++	struct slice buf = SLICE_INIT;
     ++	int N = 50;
     ++	struct reftable_reader rd = { 0 };
     ++	struct reftable_block_source source = { 0 };
     ++	int err;
     ++	int i = 0;
      +
     -+/* Signed comparison */
     -+int slice_cmp(struct slice a, struct slice b);
     ++	struct reftable_iterator it = { 0 };
     ++	struct slice pastLast = SLICE_INIT;
     ++	struct reftable_ref_record ref = { 0 };
      +
     -+/* Append `data` to the `dest` slice.  */
     -+int slice_add(struct slice *dest, byte *data, size_t sz);
     ++	write_table(&names, &buf, N, 256, hash_id);
      +
     -+/* Append `add` to `dest. */
     -+void slice_addbuf(struct slice *dest, struct slice add);
     ++	block_source_from_slice(&source, &buf);
      +
     -+/* Like slice_add, but suitable for passing to reftable_new_writer
     -+ */
     -+int slice_add_void(void *b, byte *data, size_t sz);
     ++	err = init_reader(&rd, &source, "file.ref");
     ++	assert_err(err);
     ++	assert(hash_id == reftable_reader_hash_id(&rd));
      +
     -+/* Find the longest shared prefix size of `a` and `b` */
     -+int common_prefix_size(struct slice a, struct slice b);
     ++	if (!index) {
     ++		rd.ref_offsets.index_offset = 0;
     ++	} else {
     ++		assert(rd.ref_offsets.index_offset > 0);
     ++	}
      +
     -+struct reftable_block_source;
     ++	for (i = 1; i < N; i++) {
     ++		int err = reftable_reader_seek_ref(&rd, &it, names[i]);
     ++		assert_err(err);
     ++		err = reftable_iterator_next_ref(&it, &ref);
     ++		assert_err(err);
     ++		assert(0 == strcmp(names[i], ref.ref_name));
     ++		assert(i == ref.value[0]);
      +
     -+/* Create an in-memory block source for reading reftables */
     -+void block_source_from_slice(struct reftable_block_source *bs,
     -+			     struct slice *buf);
     ++		reftable_ref_record_clear(&ref);
     ++		reftable_iterator_destroy(&it);
     ++	}
      +
     -+struct reftable_block_source malloc_block_source(void);
     ++	slice_set_string(&pastLast, names[N - 1]);
     ++	slice_addstr(&pastLast, "/");
      +
     -+#endif
     -
     - ## reftable/stack.c (new) ##
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     ++	err = reftable_reader_seek_ref(&rd, &it, slice_as_string(&pastLast));
     ++	if (err == 0) {
     ++		struct reftable_ref_record ref = { 0 };
     ++		int err = reftable_iterator_next_ref(&it, &ref);
     ++		assert(err > 0);
     ++	} else {
     ++		assert(err > 0);
     ++	}
      +
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     ++	slice_release(&pastLast);
     ++	reftable_iterator_destroy(&it);
      +
     -+#include "stack.h"
     ++	slice_release(&buf);
     ++	for (i = 0; i < N; i++) {
     ++		reftable_free(names[i]);
     ++	}
     ++	reftable_free(names);
     ++	reader_close(&rd);
     ++}
      +
     -+#include "system.h"
     -+#include "merged.h"
     -+#include "reader.h"
     -+#include "refname.h"
     -+#include "reftable.h"
     -+#include "writer.h"
     ++static void test_table_read_write_seek_linear(void)
     ++{
     ++	test_table_read_write_seek(false, SHA1_ID);
     ++}
      +
     -+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
     -+		       struct reftable_write_options config)
     ++static void test_table_read_write_seek_linear_sha256(void)
      +{
     -+	struct reftable_stack *p =
     -+		reftable_calloc(sizeof(struct reftable_stack));
     -+	struct slice list_file_name = SLICE_INIT;
     -+	int err = 0;
     ++	test_table_read_write_seek(false, SHA256_ID);
     ++}
      +
     -+	if (config.hash_id == 0) {
     -+		config.hash_id = SHA1_ID;
     -+	}
     ++static void test_table_read_write_seek_index(void)
     ++{
     ++	test_table_read_write_seek(true, SHA1_ID);
     ++}
      +
     -+	*dest = NULL;
     ++static void test_table_refs_for(bool indexed)
     ++{
     ++	int N = 50;
     ++	char **want_names = reftable_calloc(sizeof(char *) * (N + 1));
     ++	int want_names_len = 0;
     ++	byte want_hash[SHA1_SIZE];
      +
     -+	slice_set_string(&list_file_name, dir);
     -+	slice_addstr(&list_file_name, "/tables.list");
     ++	struct reftable_write_options opts = {
     ++		.block_size = 256,
     ++	};
     ++	struct reftable_ref_record ref = { 0 };
     ++	int i = 0;
     ++	int n;
     ++	int err;
     ++	struct reftable_reader rd;
     ++	struct reftable_block_source source = { 0 };
      +
     -+	p->list_file = slice_to_string(list_file_name);
     -+	slice_release(&list_file_name);
     -+	p->reftable_dir = xstrdup(dir);
     -+	p->config = config;
     ++	struct slice buf = SLICE_INIT;
     ++	struct reftable_writer *w =
     ++		reftable_new_writer(&slice_add_void, &buf, &opts);
      +
     -+	err = reftable_stack_reload_maybe_reuse(p, true);
     -+	if (err < 0) {
     -+		reftable_stack_destroy(p);
     -+	} else {
     -+		*dest = p;
     -+	}
     -+	return err;
     -+}
     ++	struct reftable_iterator it = { 0 };
     ++	int j;
      +
     -+static int fd_read_lines(int fd, char ***namesp)
     -+{
     -+	off_t size = lseek(fd, 0, SEEK_END);
     -+	char *buf = NULL;
     -+	int err = 0;
     -+	if (size < 0) {
     -+		err = REFTABLE_IO_ERROR;
     -+		goto done;
     -+	}
     -+	err = lseek(fd, 0, SEEK_SET);
     -+	if (err < 0) {
     -+		err = REFTABLE_IO_ERROR;
     -+		goto done;
     ++	set_test_hash(want_hash, 4);
     ++
     ++	for (i = 0; i < N; i++) {
     ++		byte hash[SHA1_SIZE];
     ++		char fill[51] = { 0 };
     ++		char name[100];
     ++		byte hash1[SHA1_SIZE];
     ++		byte hash2[SHA1_SIZE];
     ++		struct reftable_ref_record ref = { 0 };
     ++
     ++		memset(hash, i, sizeof(hash));
     ++		memset(fill, 'x', 50);
     ++		/* Put the variable part in the start */
     ++		snprintf(name, sizeof(name), "br%02d%s", i, fill);
     ++		name[40] = 0;
     ++		ref.ref_name = name;
     ++
     ++		set_test_hash(hash1, i / 4);
     ++		set_test_hash(hash2, 3 + i / 4);
     ++		ref.value = hash1;
     ++		ref.target_value = hash2;
     ++
     ++		/* 80 bytes / entry, so 3 entries per block. Yields 17
     ++		 */
     ++		/* blocks. */
     ++		n = reftable_writer_add_ref(w, &ref);
     ++		assert(n == 0);
     ++
     ++		if (!memcmp(hash1, want_hash, SHA1_SIZE) ||
     ++		    !memcmp(hash2, want_hash, SHA1_SIZE)) {
     ++			want_names[want_names_len++] = xstrdup(name);
     ++		}
      +	}
      +
     -+	buf = reftable_malloc(size + 1);
     -+	if (read(fd, buf, size) != size) {
     -+		err = REFTABLE_IO_ERROR;
     -+		goto done;
     ++	n = reftable_writer_close(w);
     ++	assert(n == 0);
     ++
     ++	reftable_writer_free(w);
     ++	w = NULL;
     ++
     ++	block_source_from_slice(&source, &buf);
     ++
     ++	err = init_reader(&rd, &source, "file.ref");
     ++	assert_err(err);
     ++	if (!indexed) {
     ++		rd.obj_offsets.present = 0;
      +	}
     -+	buf[size] = 0;
      +
     -+	parse_names(buf, size, namesp);
     ++	err = reftable_reader_seek_ref(&rd, &it, "");
     ++	assert_err(err);
     ++	reftable_iterator_destroy(&it);
      +
     -+done:
     -+	reftable_free(buf);
     -+	return err;
     -+}
     ++	err = reftable_reader_refs_for(&rd, &it, want_hash);
     ++	assert_err(err);
      +
     -+int read_lines(const char *filename, char ***namesp)
     -+{
     -+	int fd = open(filename, O_RDONLY, 0644);
     -+	int err = 0;
     -+	if (fd < 0) {
     -+		if (errno == ENOENT) {
     -+			*namesp = reftable_calloc(sizeof(char *));
     -+			return 0;
     ++	j = 0;
     ++	while (true) {
     ++		int err = reftable_iterator_next_ref(&it, &ref);
     ++		assert(err >= 0);
     ++		if (err > 0) {
     ++			break;
      +		}
      +
     -+		return REFTABLE_IO_ERROR;
     ++		assert(j < want_names_len);
     ++		assert(0 == strcmp(ref.ref_name, want_names[j]));
     ++		j++;
     ++		reftable_ref_record_clear(&ref);
      +	}
     -+	err = fd_read_lines(fd, namesp);
     -+	close(fd);
     -+	return err;
     -+}
     ++	assert(j == want_names_len);
      +
     -+struct reftable_merged_table *
     -+reftable_stack_merged_table(struct reftable_stack *st)
     -+{
     -+	return st->merged;
     ++	slice_release(&buf);
     ++	free_names(want_names);
     ++	reftable_iterator_destroy(&it);
     ++	reader_close(&rd);
      +}
      +
     -+/* Close and free the stack */
     -+void reftable_stack_destroy(struct reftable_stack *st)
     ++static void test_table_refs_for_no_index(void)
      +{
     -+	if (st->merged != NULL) {
     -+		reftable_merged_table_close(st->merged);
     -+		reftable_merged_table_free(st->merged);
     -+		st->merged = NULL;
     -+	}
     -+	FREE_AND_NULL(st->list_file);
     -+	FREE_AND_NULL(st->reftable_dir);
     -+	reftable_free(st);
     ++	test_table_refs_for(false);
      +}
      +
     -+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
     -+						   int cur_len)
     ++static void test_table_refs_for_obj_index(void)
      +{
     -+	struct reftable_reader **cur =
     -+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
     -+	int i = 0;
     -+	for (i = 0; i < cur_len; i++) {
     -+		cur[i] = st->merged->stack[i];
     -+	}
     -+	return cur;
     ++	test_table_refs_for(true);
      +}
      +
     -+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
     -+				      bool reuse_open)
     ++static void test_table_empty(void)
      +{
     -+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
     -+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
     -+	int err = 0;
     -+	int names_len = names_length(names);
     -+	struct reftable_reader **new_tables =
     -+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
     -+	int new_tables_len = 0;
     -+	struct reftable_merged_table *new_merged = NULL;
     -+	int i;
     -+	struct slice table_path = SLICE_INIT;
     ++	struct reftable_write_options opts = { 0 };
     ++	struct slice buf = SLICE_INIT;
     ++	struct reftable_writer *w =
     ++		reftable_new_writer(&slice_add_void, &buf, &opts);
     ++	struct reftable_block_source source = { 0 };
     ++	struct reftable_reader *rd = NULL;
     ++	struct reftable_ref_record rec = { 0 };
     ++	struct reftable_iterator it = { 0 };
     ++	int err;
      +
     -+	while (*names) {
     -+		struct reftable_reader *rd = NULL;
     -+		char *name = *names++;
     ++	reftable_writer_set_limits(w, 1, 1);
      +
     -+		/* this is linear; we assume compaction keeps the number of
     -+		   tables under control so this is not quadratic. */
     -+		int j = 0;
     -+		for (j = 0; reuse_open && j < cur_len; j++) {
     -+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
     -+				rd = cur[j];
     -+				cur[j] = NULL;
     -+				break;
     -+			}
     -+		}
     ++	err = reftable_writer_close(w);
     ++	assert(err == REFTABLE_EMPTY_TABLE_ERROR);
     ++	reftable_writer_free(w);
      +
     -+		if (rd == NULL) {
     -+			struct reftable_block_source src = { 0 };
     -+			slice_set_string(&table_path, st->reftable_dir);
     -+			slice_addstr(&table_path, "/");
     -+			slice_addstr(&table_path, name);
     ++	assert(buf.len == header_size(1) + footer_size(1));
      +
     -+			err = reftable_block_source_from_file(
     -+				&src, slice_as_string(&table_path));
     -+			if (err < 0)
     -+				goto done;
     ++	block_source_from_slice(&source, &buf);
      +
     -+			err = reftable_new_reader(&rd, &src, name);
     -+			if (err < 0)
     -+				goto done;
     -+		}
     ++	err = reftable_new_reader(&rd, &source, "filename");
     ++	assert_err(err);
      +
     -+		new_tables[new_tables_len++] = rd;
     -+	}
     ++	err = reftable_reader_seek_ref(rd, &it, "");
     ++	assert_err(err);
      +
     -+	/* success! */
     -+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
     -+					st->config.hash_id);
     -+	if (err < 0)
     -+		goto done;
     ++	err = reftable_iterator_next_ref(&it, &rec);
     ++	assert(err > 0);
      +
     -+	new_tables = NULL;
     -+	new_tables_len = 0;
     -+	if (st->merged != NULL) {
     -+		merged_table_clear(st->merged);
     -+		reftable_merged_table_free(st->merged);
     -+	}
     -+	new_merged->suppress_deletions = true;
     -+	st->merged = new_merged;
     ++	reftable_iterator_destroy(&it);
     ++	reftable_reader_free(rd);
     ++	slice_release(&buf);
     ++}
     ++
     ++int reftable_test_main(int argc, const char *argv[])
     ++{
     ++	add_test_case("test_default_write_opts", test_default_write_opts);
     ++	add_test_case("test_log_write_read", test_log_write_read);
     ++	add_test_case("test_table_read_write_seek_linear_sha256",
     ++		      &test_table_read_write_seek_linear_sha256);
     ++	add_test_case("test_log_buffer_size", test_log_buffer_size);
     ++	add_test_case("test_table_write_small_table",
     ++		      &test_table_write_small_table);
     ++	add_test_case("test_buffer", &test_buffer);
     ++	add_test_case("test_table_read_api", &test_table_read_api);
     ++	add_test_case("test_table_read_write_sequential",
     ++		      &test_table_read_write_sequential);
     ++	add_test_case("test_table_read_write_seek_linear",
     ++		      &test_table_read_write_seek_linear);
     ++	add_test_case("test_table_read_write_seek_index",
     ++		      &test_table_read_write_seek_index);
     ++	add_test_case("test_table_read_write_refs_for_no_index",
     ++		      &test_table_refs_for_no_index);
     ++	add_test_case("test_table_read_write_refs_for_obj_index",
     ++		      &test_table_refs_for_obj_index);
     ++	add_test_case("test_table_empty", &test_table_empty);
     ++	return test_main(argc, argv);
     ++}
     +
     + ## reftable/slice.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
      +
     -+	for (i = 0; i < cur_len; i++) {
     -+		if (cur[i] != NULL) {
     -+			reader_close(cur[i]);
     -+			reftable_reader_free(cur[i]);
     -+		}
     -+	}
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
      +
     -+done:
     -+	slice_release(&table_path);
     -+	for (i = 0; i < new_tables_len; i++) {
     -+		reader_close(new_tables[i]);
     -+		reftable_reader_free(new_tables[i]);
     -+	}
     -+	reftable_free(new_tables);
     -+	reftable_free(cur);
     -+	return err;
     -+}
     ++#include "slice.h"
      +
     -+/* return negative if a before b. */
     -+static int tv_cmp(struct timeval *a, struct timeval *b)
     ++#include "system.h"
     ++
     ++#include "reftable.h"
     ++
     ++struct slice reftable_empty_slice = SLICE_INIT;
     ++
     ++void slice_set_string(struct slice *s, const char *str)
      +{
     -+	time_t diff = a->tv_sec - b->tv_sec;
     -+	int udiff = a->tv_usec - b->tv_usec;
     ++	int l;
     ++	if (str == NULL) {
     ++		s->len = 0;
     ++		return;
     ++	}
     ++	assert(s->canary == SLICE_CANARY);
      +
     -+	if (diff != 0)
     -+		return diff;
     ++	l = strlen(str);
     ++	l++; /* \0 */
     ++	slice_resize(s, l);
     ++	memcpy(s->buf, str, l);
     ++	s->len = l - 1;
     ++}
      +
     -+	return udiff;
     ++void slice_init(struct slice *s)
     ++{
     ++	struct slice empty = SLICE_INIT;
     ++	*s = empty;
      +}
      +
     -+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
     -+				      bool reuse_open)
     ++void slice_resize(struct slice *s, int l)
      +{
     -+	struct timeval deadline = { 0 };
     ++	assert(s->canary == SLICE_CANARY);
     ++	if (s->cap < l) {
     ++		int c = s->cap * 2;
     ++		if (c < l) {
     ++			c = l;
     ++		}
     ++		s->cap = c;
     ++		s->buf = reftable_realloc(s->buf, s->cap);
     ++	}
     ++	s->len = l;
     ++}
     ++
     ++void slice_addstr(struct slice *d, const char *s)
     ++{
     ++	int l1 = d->len;
     ++	int l2 = strlen(s);
     ++	assert(d->canary == SLICE_CANARY);
     ++
     ++	slice_resize(d, l2 + l1);
     ++	memcpy(d->buf + l1, s, l2);
     ++}
     ++
     ++void slice_addbuf(struct slice *s, struct slice *a)
     ++{
     ++	int end = s->len;
     ++	assert(s->canary == SLICE_CANARY);
     ++	slice_resize(s, s->len + a->len);
     ++	memcpy(s->buf + end, a->buf, a->len);
     ++}
     ++
     ++void slice_consume(struct slice *s, int n)
     ++{
     ++	assert(s->canary == SLICE_CANARY);
     ++	s->buf += n;
     ++	s->len -= n;
     ++}
     ++
     ++byte *slice_detach(struct slice *s)
     ++{
     ++	byte *p = s->buf;
     ++	assert(s->canary == SLICE_CANARY);
     ++	s->buf = NULL;
     ++	s->cap = 0;
     ++	s->len = 0;
     ++	return p;
     ++}
     ++
     ++void slice_release(struct slice *s)
     ++{
     ++	assert(s->canary == SLICE_CANARY);
     ++	reftable_free(slice_detach(s));
     ++}
     ++
     ++void slice_copy(struct slice *dest, struct slice *src)
     ++{
     ++	assert(dest->canary == SLICE_CANARY);
     ++	assert(src->canary == SLICE_CANARY);
     ++	slice_resize(dest, src->len);
     ++	memcpy(dest->buf, src->buf, src->len);
     ++}
     ++
     ++/* return the underlying data as char*. len is left unchanged, but
     ++   a \0 is added at the end. */
     ++const char *slice_as_string(struct slice *s)
     ++{
     ++	assert(s->canary == SLICE_CANARY);
     ++	if (s->cap == s->len) {
     ++		int l = s->len;
     ++		slice_resize(s, l + 1);
     ++		s->len = l;
     ++	}
     ++	s->buf[s->len] = 0;
     ++	return (const char *)s->buf;
     ++}
     ++
     ++/* return a newly malloced string for this slice */
     ++char *slice_to_string(struct slice *in)
     ++{
     ++	struct slice s = SLICE_INIT;
     ++	assert(in->canary == SLICE_CANARY);
     ++	slice_resize(&s, in->len + 1);
     ++	s.buf[in->len] = 0;
     ++	memcpy(s.buf, in->buf, in->len);
     ++	return (char *)slice_detach(&s);
     ++}
     ++
     ++bool slice_equal(struct slice *a, struct slice *b)
     ++{
     ++	return slice_cmp(a, b) == 0;
     ++}
     ++
     ++int slice_cmp(const struct slice *a, const struct slice *b)
     ++{
     ++	int min = a->len < b->len ? a->len : b->len;
     ++	int res = memcmp(a->buf, b->buf, min);
     ++	assert(a->canary == SLICE_CANARY);
     ++	assert(b->canary == SLICE_CANARY);
     ++	if (res != 0)
     ++		return res;
     ++	if (a->len < b->len)
     ++		return -1;
     ++	else if (a->len > b->len)
     ++		return 1;
     ++	else
     ++		return 0;
     ++}
     ++
     ++int slice_add(struct slice *b, byte *data, size_t sz)
     ++{
     ++	assert(b->canary == SLICE_CANARY);
     ++	if (b->len + sz > b->cap) {
     ++		int newcap = 2 * b->cap + 1;
     ++		if (newcap < b->len + sz) {
     ++			newcap = (b->len + sz);
     ++		}
     ++		b->buf = reftable_realloc(b->buf, newcap);
     ++		b->cap = newcap;
     ++	}
     ++
     ++	memcpy(b->buf + b->len, data, sz);
     ++	b->len += sz;
     ++	return sz;
     ++}
     ++
     ++int slice_add_void(void *b, byte *data, size_t sz)
     ++{
     ++	return slice_add((struct slice *)b, data, sz);
     ++}
     ++
     ++static uint64_t slice_size(void *b)
     ++{
     ++	return ((struct slice *)b)->len;
     ++}
     ++
     ++static void slice_return_block(void *b, struct reftable_block *dest)
     ++{
     ++	memset(dest->data, 0xff, dest->len);
     ++	reftable_free(dest->data);
     ++}
     ++
     ++static void slice_close(void *b)
     ++{
     ++}
     ++
     ++static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
     ++			    uint32_t size)
     ++{
     ++	struct slice *b = (struct slice *)v;
     ++	assert(off + size <= b->len);
     ++	dest->data = reftable_calloc(size);
     ++	memcpy(dest->data, b->buf + off, size);
     ++	dest->len = size;
     ++	return size;
     ++}
     ++
     ++struct reftable_block_source_vtable slice_vtable = {
     ++	.size = &slice_size,
     ++	.read_block = &slice_read_block,
     ++	.return_block = &slice_return_block,
     ++	.close = &slice_close,
     ++};
     ++
     ++void block_source_from_slice(struct reftable_block_source *bs,
     ++			     struct slice *buf)
     ++{
     ++	assert(bs->ops == NULL);
     ++	bs->ops = &slice_vtable;
     ++	bs->arg = buf;
     ++}
     ++
     ++static void malloc_return_block(void *b, struct reftable_block *dest)
     ++{
     ++	memset(dest->data, 0xff, dest->len);
     ++	reftable_free(dest->data);
     ++}
     ++
     ++struct reftable_block_source_vtable malloc_vtable = {
     ++	.return_block = &malloc_return_block,
     ++};
     ++
     ++struct reftable_block_source malloc_block_source_instance = {
     ++	.ops = &malloc_vtable,
     ++};
     ++
     ++struct reftable_block_source malloc_block_source(void)
     ++{
     ++	return malloc_block_source_instance;
     ++}
     ++
     ++int common_prefix_size(struct slice *a, struct slice *b)
     ++{
     ++	int p = 0;
     ++	assert(a->canary == SLICE_CANARY);
     ++	assert(b->canary == SLICE_CANARY);
     ++	while (p < a->len && p < b->len) {
     ++		if (a->buf[p] != b->buf[p]) {
     ++			break;
     ++		}
     ++		p++;
     ++	}
     ++
     ++	return p;
     ++}
     +
     + ## reftable/slice.h (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#ifndef SLICE_H
     ++#define SLICE_H
     ++
     ++#include "basics.h"
     ++#include "reftable.h"
     ++
     ++/*
     ++  provides bounds-checked byte ranges.
     ++  To use, initialize as "slice x = {0};"
     ++ */
     ++struct slice {
     ++	int len;
     ++	int cap;
     ++	byte *buf;
     ++	byte canary;
     ++};
     ++#define SLICE_CANARY 0x42
     ++#define SLICE_INIT                       \
     ++	{                                \
     ++		0, 0, NULL, SLICE_CANARY \
     ++	}
     ++extern struct slice reftable_empty_slice;
     ++
     ++void slice_set_string(struct slice *dest, const char *src);
     ++void slice_addstr(struct slice *dest, const char *src);
     ++
     ++/* Deallocate and clear slice */
     ++void slice_release(struct slice *slice);
     ++
     ++/* Return a malloced string for `src` */
     ++char *slice_to_string(struct slice *src);
     ++
     ++/* Initializes a slice. Accepts a slice with random garbage. */
     ++void slice_init(struct slice *slice);
     ++
     ++/* Ensure that `buf` is \0 terminated. */
     ++const char *slice_as_string(struct slice *src);
     ++
     ++/* Compare slices */
     ++bool slice_equal(struct slice *a, struct slice *b);
     ++
     ++/* Return `buf`, clearing out `s` */
     ++byte *slice_detach(struct slice *s);
     ++
     ++/* Copy bytes */
     ++void slice_copy(struct slice *dest, struct slice *src);
     ++
     ++/* Advance `buf` by `n`, and decrease length. A copy of the slice
     ++   should be kept for deallocating the slice. */
     ++void slice_consume(struct slice *s, int n);
     ++
     ++/* Set length of the slice to `l` */
     ++void slice_resize(struct slice *s, int l);
     ++
     ++/* Signed comparison */
     ++int slice_cmp(const struct slice *a, const struct slice *b);
     ++
     ++/* Append `data` to the `dest` slice.  */
     ++int slice_add(struct slice *dest, byte *data, size_t sz);
     ++
     ++/* Append `add` to `dest. */
     ++void slice_addbuf(struct slice *dest, struct slice *add);
     ++
     ++/* Like slice_add, but suitable for passing to reftable_new_writer
     ++ */
     ++int slice_add_void(void *b, byte *data, size_t sz);
     ++
     ++/* Find the longest shared prefix size of `a` and `b` */
     ++int common_prefix_size(struct slice *a, struct slice *b);
     ++
     ++struct reftable_block_source;
     ++
     ++/* Create an in-memory block source for reading reftables */
     ++void block_source_from_slice(struct reftable_block_source *bs,
     ++			     struct slice *buf);
     ++
     ++struct reftable_block_source malloc_block_source(void);
     ++
     ++#endif
     +
     + ## reftable/slice_test.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "slice.h"
     ++
     ++#include "system.h"
     ++
     ++#include "basics.h"
     ++#include "record.h"
     ++#include "reftable.h"
     ++#include "test_framework.h"
     ++#include "reftable-tests.h"
     ++
     ++static void test_slice(void)
     ++{
     ++	struct slice s = SLICE_INIT;
     ++	struct slice t = SLICE_INIT;
     ++
     ++	slice_set_string(&s, "abc");
     ++	assert(0 == strcmp("abc", slice_as_string(&s)));
     ++
     ++	slice_set_string(&t, "pqr");
     ++
     ++	slice_addbuf(&s, &t);
     ++	assert(0 == strcmp("abcpqr", slice_as_string(&s)));
     ++
     ++	slice_release(&s);
     ++	slice_release(&t);
     ++}
     ++
     ++int slice_test_main(int argc, const char *argv[])
     ++{
     ++	add_test_case("test_slice", &test_slice);
     ++	return test_main(argc, argv);
     ++}
     +
     + ## reftable/stack.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "stack.h"
     ++
     ++#include "system.h"
     ++#include "merged.h"
     ++#include "reader.h"
     ++#include "refname.h"
     ++#include "reftable.h"
     ++#include "writer.h"
     ++
     ++int reftable_new_stack(struct reftable_stack **dest, const char *dir,
     ++		       struct reftable_write_options config)
     ++{
     ++	struct reftable_stack *p =
     ++		reftable_calloc(sizeof(struct reftable_stack));
     ++	struct slice list_file_name = SLICE_INIT;
     ++	int err = 0;
     ++
     ++	if (config.hash_id == 0) {
     ++		config.hash_id = SHA1_ID;
     ++	}
     ++
     ++	*dest = NULL;
     ++
     ++	slice_set_string(&list_file_name, dir);
     ++	slice_addstr(&list_file_name, "/tables.list");
     ++
     ++	p->list_file = slice_to_string(&list_file_name);
     ++	slice_release(&list_file_name);
     ++	p->reftable_dir = xstrdup(dir);
     ++	p->config = config;
     ++
     ++	err = reftable_stack_reload_maybe_reuse(p, true);
     ++	if (err < 0) {
     ++		reftable_stack_destroy(p);
     ++	} else {
     ++		*dest = p;
     ++	}
     ++	return err;
     ++}
     ++
     ++static int fd_read_lines(int fd, char ***namesp)
     ++{
     ++	off_t size = lseek(fd, 0, SEEK_END);
     ++	char *buf = NULL;
     ++	int err = 0;
     ++	if (size < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		goto done;
     ++	}
     ++	err = lseek(fd, 0, SEEK_SET);
     ++	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		goto done;
     ++	}
     ++
     ++	buf = reftable_malloc(size + 1);
     ++	if (read(fd, buf, size) != size) {
     ++		err = REFTABLE_IO_ERROR;
     ++		goto done;
     ++	}
     ++	buf[size] = 0;
     ++
     ++	parse_names(buf, size, namesp);
     ++
     ++done:
     ++	reftable_free(buf);
     ++	return err;
     ++}
     ++
     ++int read_lines(const char *filename, char ***namesp)
     ++{
     ++	int fd = open(filename, O_RDONLY, 0644);
     ++	int err = 0;
     ++	if (fd < 0) {
     ++		if (errno == ENOENT) {
     ++			*namesp = reftable_calloc(sizeof(char *));
     ++			return 0;
     ++		}
     ++
     ++		return REFTABLE_IO_ERROR;
     ++	}
     ++	err = fd_read_lines(fd, namesp);
     ++	close(fd);
     ++	return err;
     ++}
     ++
     ++struct reftable_merged_table *
     ++reftable_stack_merged_table(struct reftable_stack *st)
     ++{
     ++	return st->merged;
     ++}
     ++
     ++/* Close and free the stack */
     ++void reftable_stack_destroy(struct reftable_stack *st)
     ++{
     ++	if (st->merged != NULL) {
     ++		reftable_merged_table_close(st->merged);
     ++		reftable_merged_table_free(st->merged);
     ++		st->merged = NULL;
     ++	}
     ++	FREE_AND_NULL(st->list_file);
     ++	FREE_AND_NULL(st->reftable_dir);
     ++	reftable_free(st);
     ++}
     ++
     ++static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
     ++						   int cur_len)
     ++{
     ++	struct reftable_reader **cur =
     ++		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
     ++	int i = 0;
     ++	for (i = 0; i < cur_len; i++) {
     ++		cur[i] = st->merged->stack[i];
     ++	}
     ++	return cur;
     ++}
     ++
     ++static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
     ++				      bool reuse_open)
     ++{
     ++	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
     ++	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
     ++	int err = 0;
     ++	int names_len = names_length(names);
     ++	struct reftable_reader **new_tables =
     ++		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
     ++	int new_tables_len = 0;
     ++	struct reftable_merged_table *new_merged = NULL;
     ++	int i;
     ++	struct slice table_path = SLICE_INIT;
     ++
     ++	while (*names) {
     ++		struct reftable_reader *rd = NULL;
     ++		char *name = *names++;
     ++
     ++		/* this is linear; we assume compaction keeps the number of
     ++		   tables under control so this is not quadratic. */
     ++		int j = 0;
     ++		for (j = 0; reuse_open && j < cur_len; j++) {
     ++			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
     ++				rd = cur[j];
     ++				cur[j] = NULL;
     ++				break;
     ++			}
     ++		}
     ++
     ++		if (rd == NULL) {
     ++			struct reftable_block_source src = { 0 };
     ++			slice_set_string(&table_path, st->reftable_dir);
     ++			slice_addstr(&table_path, "/");
     ++			slice_addstr(&table_path, name);
     ++
     ++			err = reftable_block_source_from_file(
     ++				&src, slice_as_string(&table_path));
     ++			if (err < 0)
     ++				goto done;
     ++
     ++			err = reftable_new_reader(&rd, &src, name);
     ++			if (err < 0)
     ++				goto done;
     ++		}
     ++
     ++		new_tables[new_tables_len++] = rd;
     ++	}
     ++
     ++	/* success! */
     ++	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
     ++					st->config.hash_id);
     ++	if (err < 0)
     ++		goto done;
     ++
     ++	new_tables = NULL;
     ++	new_tables_len = 0;
     ++	if (st->merged != NULL) {
     ++		merged_table_clear(st->merged);
     ++		reftable_merged_table_free(st->merged);
     ++	}
     ++	new_merged->suppress_deletions = true;
     ++	st->merged = new_merged;
     ++
     ++	for (i = 0; i < cur_len; i++) {
     ++		if (cur[i] != NULL) {
     ++			reader_close(cur[i]);
     ++			reftable_reader_free(cur[i]);
     ++		}
     ++	}
     ++
     ++done:
     ++	slice_release(&table_path);
     ++	for (i = 0; i < new_tables_len; i++) {
     ++		reader_close(new_tables[i]);
     ++		reftable_reader_free(new_tables[i]);
     ++	}
     ++	reftable_free(new_tables);
     ++	reftable_free(cur);
     ++	return err;
     ++}
     ++
     ++/* return negative if a before b. */
     ++static int tv_cmp(struct timeval *a, struct timeval *b)
     ++{
     ++	time_t diff = a->tv_sec - b->tv_sec;
     ++	int udiff = a->tv_usec - b->tv_usec;
     ++
     ++	if (diff != 0)
     ++		return diff;
     ++
     ++	return udiff;
     ++}
     ++
     ++int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
     ++				      bool reuse_open)
     ++{
     ++	struct timeval deadline = { 0 };
      +	int err = gettimeofday(&deadline, NULL);
      +	int64_t delay = 0;
      +	int tries = 0;
      +	if (err < 0)
     -+		return err;
     ++		return err;
     ++
     ++	deadline.tv_sec += 3;
     ++	while (true) {
     ++		char **names = NULL;
     ++		char **names_after = NULL;
     ++		struct timeval now = { 0 };
     ++		int err = gettimeofday(&now, NULL);
     ++		int err2 = 0;
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++
     ++		/* Only look at deadlines after the first few times. This
     ++		   simplifies debugging in GDB */
     ++		tries++;
     ++		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
     ++			break;
     ++		}
     ++
     ++		err = read_lines(st->list_file, &names);
     ++		if (err < 0) {
     ++			free_names(names);
     ++			return err;
     ++		}
     ++		err = reftable_stack_reload_once(st, names, reuse_open);
     ++		if (err == 0) {
     ++			free_names(names);
     ++			break;
     ++		}
     ++		if (err != REFTABLE_NOT_EXIST_ERROR) {
     ++			free_names(names);
     ++			return err;
     ++		}
     ++
     ++		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
     ++		   writer. Check if there was one by checking if the name list
     ++		   changed.
     ++		*/
     ++		err2 = read_lines(st->list_file, &names_after);
     ++		if (err2 < 0) {
     ++			free_names(names);
     ++			return err2;
     ++		}
     ++
     ++		if (names_equal(names_after, names)) {
     ++			free_names(names);
     ++			free_names(names_after);
     ++			return err;
     ++		}
     ++		free_names(names);
     ++		free_names(names_after);
     ++
     ++		delay = delay + (delay * rand()) / RAND_MAX + 1;
     ++		sleep_millisec(delay);
     ++	}
     ++
     ++	return 0;
     ++}
     ++
     ++/* -1 = error
     ++ 0 = up to date
     ++ 1 = changed. */
     ++static int stack_uptodate(struct reftable_stack *st)
     ++{
     ++	char **names = NULL;
     ++	int err = read_lines(st->list_file, &names);
     ++	int i = 0;
     ++	if (err < 0)
     ++		return err;
     ++
     ++	for (i = 0; i < st->merged->stack_len; i++) {
     ++		if (names[i] == NULL) {
     ++			err = 1;
     ++			goto done;
     ++		}
     ++
     ++		if (strcmp(st->merged->stack[i]->name, names[i])) {
     ++			err = 1;
     ++			goto done;
     ++		}
     ++	}
     ++
     ++	if (names[st->merged->stack_len] != NULL) {
     ++		err = 1;
     ++		goto done;
     ++	}
     ++
     ++done:
     ++	free_names(names);
     ++	return err;
     ++}
     ++
     ++int reftable_stack_reload(struct reftable_stack *st)
     ++{
     ++	int err = stack_uptodate(st);
     ++	if (err > 0)
     ++		return reftable_stack_reload_maybe_reuse(st, true);
     ++	return err;
     ++}
     ++
     ++int reftable_stack_add(struct reftable_stack *st,
     ++		       int (*write)(struct reftable_writer *wr, void *arg),
     ++		       void *arg)
     ++{
     ++	int err = stack_try_add(st, write, arg);
     ++	if (err < 0) {
     ++		if (err == REFTABLE_LOCK_ERROR) {
     ++			/* Ignore error return, we want to propagate
     ++			   REFTABLE_LOCK_ERROR.
     ++			*/
     ++			reftable_stack_reload(st);
     ++		}
     ++		return err;
     ++	}
     ++
     ++	if (!st->disable_auto_compact)
     ++		return reftable_stack_auto_compact(st);
     ++
     ++	return 0;
     ++}
     ++
     ++static void format_name(struct slice *dest, uint64_t min, uint64_t max)
     ++{
     ++	char buf[100];
     ++	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
     ++	slice_set_string(dest, buf);
     ++}
     ++
     ++struct reftable_addition {
     ++	int lock_file_fd;
     ++	struct slice lock_file_name;
     ++	struct reftable_stack *stack;
     ++	char **names;
     ++	char **new_tables;
     ++	int new_tables_len;
     ++	uint64_t next_update_index;
     ++};
     ++
     ++#define REFTABLE_ADDITION_INIT               \
     ++	{                                    \
     ++		.lock_file_name = SLICE_INIT \
     ++	}
     ++
     ++static int reftable_stack_init_addition(struct reftable_addition *add,
     ++					struct reftable_stack *st)
     ++{
     ++	int err = 0;
     ++	add->stack = st;
     ++
     ++	slice_set_string(&add->lock_file_name, st->list_file);
     ++	slice_addstr(&add->lock_file_name, ".lock");
     ++
     ++	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
     ++				 O_EXCL | O_CREAT | O_WRONLY, 0644);
     ++	if (add->lock_file_fd < 0) {
     ++		if (errno == EEXIST) {
     ++			err = REFTABLE_LOCK_ERROR;
     ++		} else {
     ++			err = REFTABLE_IO_ERROR;
     ++		}
     ++		goto done;
     ++	}
     ++	err = stack_uptodate(st);
     ++	if (err < 0)
     ++		goto done;
     ++
     ++	if (err > 1) {
     ++		err = REFTABLE_LOCK_ERROR;
     ++		goto done;
     ++	}
     ++
     ++	add->next_update_index = reftable_stack_next_update_index(st);
     ++done:
     ++	if (err) {
     ++		reftable_addition_close(add);
     ++	}
     ++	return err;
     ++}
     ++
     ++void reftable_addition_close(struct reftable_addition *add)
     ++{
     ++	int i = 0;
     ++	struct slice nm = SLICE_INIT;
     ++	for (i = 0; i < add->new_tables_len; i++) {
     ++		slice_set_string(&nm, add->stack->list_file);
     ++		slice_addstr(&nm, "/");
     ++		slice_addstr(&nm, add->new_tables[i]);
     ++		unlink(slice_as_string(&nm));
     ++		reftable_free(add->new_tables[i]);
     ++		add->new_tables[i] = NULL;
     ++	}
     ++	reftable_free(add->new_tables);
     ++	add->new_tables = NULL;
     ++	add->new_tables_len = 0;
     ++
     ++	if (add->lock_file_fd > 0) {
     ++		close(add->lock_file_fd);
     ++		add->lock_file_fd = 0;
     ++	}
     ++	if (add->lock_file_name.len > 0) {
     ++		unlink(slice_as_string(&add->lock_file_name));
     ++		slice_release(&add->lock_file_name);
     ++	}
     ++
     ++	free_names(add->names);
     ++	add->names = NULL;
     ++	slice_release(&nm);
     ++}
     ++
     ++void reftable_addition_destroy(struct reftable_addition *add)
     ++{
     ++	if (add == NULL) {
     ++		return;
     ++	}
     ++	reftable_addition_close(add);
     ++	reftable_free(add);
     ++}
     ++
     ++int reftable_addition_commit(struct reftable_addition *add)
     ++{
     ++	struct slice table_list = SLICE_INIT;
     ++	int i = 0;
     ++	int err = 0;
     ++	if (add->new_tables_len == 0)
     ++		goto done;
     ++
     ++	for (i = 0; i < add->stack->merged->stack_len; i++) {
     ++		slice_addstr(&table_list, add->stack->merged->stack[i]->name);
     ++		slice_addstr(&table_list, "\n");
     ++	}
     ++	for (i = 0; i < add->new_tables_len; i++) {
     ++		slice_addstr(&table_list, add->new_tables[i]);
     ++		slice_addstr(&table_list, "\n");
     ++	}
     ++
     ++	err = write(add->lock_file_fd, table_list.buf, table_list.len);
     ++	slice_release(&table_list);
     ++	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		goto done;
     ++	}
     ++
     ++	err = close(add->lock_file_fd);
     ++	add->lock_file_fd = 0;
     ++	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		goto done;
     ++	}
     ++
     ++	err = rename(slice_as_string(&add->lock_file_name),
     ++		     add->stack->list_file);
     ++	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		goto done;
     ++	}
     ++
     ++	err = reftable_stack_reload(add->stack);
     ++
     ++done:
     ++	reftable_addition_close(add);
     ++	return err;
     ++}
     ++
     ++int reftable_stack_new_addition(struct reftable_addition **dest,
     ++				struct reftable_stack *st)
     ++{
     ++	int err = 0;
     ++	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
     ++	*dest = reftable_calloc(sizeof(**dest));
     ++	**dest = empty;
     ++	err = reftable_stack_init_addition(*dest, st);
     ++	if (err) {
     ++		reftable_free(*dest);
     ++		*dest = NULL;
     ++	}
     ++	return err;
     ++}
     ++
     ++int stack_try_add(struct reftable_stack *st,
     ++		  int (*write_table)(struct reftable_writer *wr, void *arg),
     ++		  void *arg)
     ++{
     ++	struct reftable_addition add = REFTABLE_ADDITION_INIT;
     ++	int err = reftable_stack_init_addition(&add, st);
     ++	if (err < 0)
     ++		goto done;
     ++	if (err > 0) {
     ++		err = REFTABLE_LOCK_ERROR;
     ++		goto done;
     ++	}
     ++
     ++	err = reftable_addition_add(&add, write_table, arg);
     ++	if (err < 0)
     ++		goto done;
     ++
     ++	err = reftable_addition_commit(&add);
     ++done:
     ++	reftable_addition_close(&add);
     ++	return err;
     ++}
     ++
     ++int reftable_addition_add(struct reftable_addition *add,
     ++			  int (*write_table)(struct reftable_writer *wr,
     ++					     void *arg),
     ++			  void *arg)
     ++{
     ++	struct slice temp_tab_file_name = SLICE_INIT;
     ++	struct slice tab_file_name = SLICE_INIT;
     ++	struct slice next_name = SLICE_INIT;
     ++	struct reftable_writer *wr = NULL;
     ++	int err = 0;
     ++	int tab_fd = 0;
     ++
     ++	slice_resize(&next_name, 0);
     ++	format_name(&next_name, add->next_update_index, add->next_update_index);
     ++
     ++	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
     ++	slice_addstr(&temp_tab_file_name, "/");
     ++	slice_addbuf(&temp_tab_file_name, &next_name);
     ++	slice_addstr(&temp_tab_file_name, ".temp.XXXXXX");
     ++
     ++	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
     ++	if (tab_fd < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		goto done;
     ++	}
     ++
     ++	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
     ++				 &add->stack->config);
     ++	err = write_table(wr, arg);
     ++	if (err < 0)
     ++		goto done;
     ++
     ++	err = reftable_writer_close(wr);
     ++	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
     ++		err = 0;
     ++		goto done;
     ++	}
     ++	if (err < 0)
     ++		goto done;
     ++
     ++	err = close(tab_fd);
     ++	tab_fd = 0;
     ++	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		goto done;
     ++	}
     ++
     ++	err = stack_check_addition(add->stack,
     ++				   slice_as_string(&temp_tab_file_name));
     ++	if (err < 0)
     ++		goto done;
     ++
     ++	if (wr->min_update_index < add->next_update_index) {
     ++		err = REFTABLE_API_ERROR;
     ++		goto done;
     ++	}
     ++
     ++	format_name(&next_name, wr->min_update_index, wr->max_update_index);
     ++	slice_addstr(&next_name, ".ref");
     ++
     ++	slice_set_string(&tab_file_name, add->stack->reftable_dir);
     ++	slice_addstr(&tab_file_name, "/");
     ++	slice_addbuf(&tab_file_name, &next_name);
     ++
     ++	/* TODO: should check destination out of paranoia */
     ++	err = rename(slice_as_string(&temp_tab_file_name),
     ++		     slice_as_string(&tab_file_name));
     ++	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		goto done;
     ++	}
     ++
     ++	add->new_tables = reftable_realloc(add->new_tables,
     ++					   sizeof(*add->new_tables) *
     ++						   (add->new_tables_len + 1));
     ++	add->new_tables[add->new_tables_len] = slice_to_string(&next_name);
     ++	add->new_tables_len++;
     ++done:
     ++	if (tab_fd > 0) {
     ++		close(tab_fd);
     ++		tab_fd = 0;
     ++	}
     ++	if (temp_tab_file_name.len > 0) {
     ++		unlink(slice_as_string(&temp_tab_file_name));
     ++	}
     ++
     ++	slice_release(&temp_tab_file_name);
     ++	slice_release(&tab_file_name);
     ++	slice_release(&next_name);
     ++	reftable_writer_free(wr);
     ++	return err;
     ++}
     ++
     ++uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
     ++{
     ++	int sz = st->merged->stack_len;
     ++	if (sz > 0)
     ++		return reftable_reader_max_update_index(
     ++			       st->merged->stack[sz - 1]) +
     ++		       1;
     ++	return 1;
     ++}
     ++
     ++static int stack_compact_locked(struct reftable_stack *st, int first, int last,
     ++				struct slice *temp_tab,
     ++				struct reftable_log_expiry_config *config)
     ++{
     ++	struct slice next_name = SLICE_INIT;
     ++	int tab_fd = -1;
     ++	struct reftable_writer *wr = NULL;
     ++	int err = 0;
     ++
     ++	format_name(&next_name,
     ++		    reftable_reader_min_update_index(st->merged->stack[first]),
     ++		    reftable_reader_max_update_index(st->merged->stack[first]));
     ++
     ++	slice_set_string(temp_tab, st->reftable_dir);
     ++	slice_addstr(temp_tab, "/");
     ++	slice_addbuf(temp_tab, &next_name);
     ++	slice_addstr(temp_tab, ".temp.XXXXXX");
     ++
     ++	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
     ++	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
     ++
     ++	err = stack_write_compact(st, wr, first, last, config);
     ++	if (err < 0)
     ++		goto done;
     ++	err = reftable_writer_close(wr);
     ++	if (err < 0)
     ++		goto done;
     ++
     ++	err = close(tab_fd);
     ++	tab_fd = 0;
     ++
     ++done:
     ++	reftable_writer_free(wr);
     ++	if (tab_fd > 0) {
     ++		close(tab_fd);
     ++		tab_fd = 0;
     ++	}
     ++	if (err != 0 && temp_tab->len > 0) {
     ++		unlink(slice_as_string(temp_tab));
     ++		slice_release(temp_tab);
     ++	}
     ++	slice_release(&next_name);
     ++	return err;
     ++}
     ++
     ++int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
     ++			int first, int last,
     ++			struct reftable_log_expiry_config *config)
     ++{
     ++	int subtabs_len = last - first + 1;
     ++	struct reftable_reader **subtabs = reftable_calloc(
     ++		sizeof(struct reftable_reader *) * (last - first + 1));
     ++	struct reftable_merged_table *mt = NULL;
     ++	int err = 0;
     ++	struct reftable_iterator it = { 0 };
     ++	struct reftable_ref_record ref = { 0 };
     ++	struct reftable_log_record log = { 0 };
     ++
     ++	uint64_t entries = 0;
     ++
     ++	int i = 0, j = 0;
     ++	for (i = first, j = 0; i <= last; i++) {
     ++		struct reftable_reader *t = st->merged->stack[i];
     ++		subtabs[j++] = t;
     ++		st->stats.bytes += t->size;
     ++	}
     ++	reftable_writer_set_limits(wr,
     ++				   st->merged->stack[first]->min_update_index,
     ++				   st->merged->stack[last]->max_update_index);
     ++
     ++	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
     ++					st->config.hash_id);
     ++	if (err < 0) {
     ++		reftable_free(subtabs);
     ++		goto done;
     ++	}
     ++
     ++	err = reftable_merged_table_seek_ref(mt, &it, "");
     ++	if (err < 0)
     ++		goto done;
      +
     -+	deadline.tv_sec += 3;
      +	while (true) {
     -+		char **names = NULL;
     -+		char **names_after = NULL;
     -+		struct timeval now = { 0 };
     -+		int err = gettimeofday(&now, NULL);
     -+		int err2 = 0;
     ++		err = reftable_iterator_next_ref(&it, &ref);
     ++		if (err > 0) {
     ++			err = 0;
     ++			break;
     ++		}
      +		if (err < 0) {
     -+			return err;
     ++			break;
     ++		}
     ++		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
     ++			continue;
      +		}
      +
     -+		/* Only look at deadlines after the first few times. This
     -+		   simplifies debugging in GDB */
     -+		tries++;
     -+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
     ++		err = reftable_writer_add_ref(wr, &ref);
     ++		if (err < 0) {
      +			break;
      +		}
     ++		entries++;
     ++	}
     ++	reftable_iterator_destroy(&it);
      +
     -+		err = read_lines(st->list_file, &names);
     -+		if (err < 0) {
     -+			free_names(names);
     -+			return err;
     ++	err = reftable_merged_table_seek_log(mt, &it, "");
     ++	if (err < 0)
     ++		goto done;
     ++
     ++	while (true) {
     ++		err = reftable_iterator_next_log(&it, &log);
     ++		if (err > 0) {
     ++			err = 0;
     ++			break;
      +		}
     -+		err = reftable_stack_reload_once(st, names, reuse_open);
     -+		if (err == 0) {
     -+			free_names(names);
     ++		if (err < 0) {
      +			break;
      +		}
     -+		if (err != REFTABLE_NOT_EXIST_ERROR) {
     -+			free_names(names);
     -+			return err;
     ++		if (first == 0 && reftable_log_record_is_deletion(&log)) {
     ++			continue;
      +		}
      +
     -+		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
     -+		   writer. Check if there was one by checking if the name list
     -+		   changed.
     -+		*/
     -+		err2 = read_lines(st->list_file, &names_after);
     -+		if (err2 < 0) {
     -+			free_names(names);
     -+			return err2;
     ++		if (config != NULL && config->time > 0 &&
     ++		    log.time < config->time) {
     ++			continue;
      +		}
      +
     -+		if (names_equal(names_after, names)) {
     -+			free_names(names);
     -+			free_names(names_after);
     -+			return err;
     ++		if (config != NULL && config->min_update_index > 0 &&
     ++		    log.update_index < config->min_update_index) {
     ++			continue;
      +		}
     -+		free_names(names);
     -+		free_names(names_after);
      +
     -+		delay = delay + (delay * rand()) / RAND_MAX + 1;
     -+		sleep_millisec(delay);
     ++		err = reftable_writer_add_log(wr, &log);
     ++		if (err < 0) {
     ++			break;
     ++		}
     ++		entries++;
     ++	}
     ++
     ++done:
     ++	reftable_iterator_destroy(&it);
     ++	if (mt != NULL) {
     ++		merged_table_clear(mt);
     ++		reftable_merged_table_free(mt);
     ++	}
     ++	reftable_ref_record_clear(&ref);
     ++	reftable_log_record_clear(&log);
     ++	st->stats.entries_written += entries;
     ++	return err;
     ++}
     ++
     ++/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
     ++static int stack_compact_range(struct reftable_stack *st, int first, int last,
     ++			       struct reftable_log_expiry_config *expiry)
     ++{
     ++	struct slice temp_tab_file_name = SLICE_INIT;
     ++	struct slice new_table_name = SLICE_INIT;
     ++	struct slice lock_file_name = SLICE_INIT;
     ++	struct slice ref_list_contents = SLICE_INIT;
     ++	struct slice new_table_path = SLICE_INIT;
     ++	int err = 0;
     ++	bool have_lock = false;
     ++	int lock_file_fd = 0;
     ++	int compact_count = last - first + 1;
     ++	char **listp = NULL;
     ++	char **delete_on_success =
     ++		reftable_calloc(sizeof(char *) * (compact_count + 1));
     ++	char **subtable_locks =
     ++		reftable_calloc(sizeof(char *) * (compact_count + 1));
     ++	int i = 0;
     ++	int j = 0;
     ++	bool is_empty_table = false;
     ++
     ++	if (first > last || (expiry == NULL && first == last)) {
     ++		err = 0;
     ++		goto done;
     ++	}
     ++
     ++	st->stats.attempts++;
     ++
     ++	slice_set_string(&lock_file_name, st->list_file);
     ++	slice_addstr(&lock_file_name, ".lock");
     ++
     ++	lock_file_fd = open(slice_as_string(&lock_file_name),
     ++			    O_EXCL | O_CREAT | O_WRONLY, 0644);
     ++	if (lock_file_fd < 0) {
     ++		if (errno == EEXIST) {
     ++			err = 1;
     ++		} else {
     ++			err = REFTABLE_IO_ERROR;
     ++		}
     ++		goto done;
     ++	}
     ++	/* Don't want to write to the lock for now.  */
     ++	close(lock_file_fd);
     ++	lock_file_fd = 0;
     ++
     ++	have_lock = true;
     ++	err = stack_uptodate(st);
     ++	if (err != 0)
     ++		goto done;
     ++
     ++	for (i = first, j = 0; i <= last; i++) {
     ++		struct slice subtab_file_name = SLICE_INIT;
     ++		struct slice subtab_lock = SLICE_INIT;
     ++		int sublock_file_fd = -1;
     ++
     ++		slice_set_string(&subtab_file_name, st->reftable_dir);
     ++		slice_addstr(&subtab_file_name, "/");
     ++		slice_addstr(&subtab_file_name,
     ++			     reader_name(st->merged->stack[i]));
     ++
     ++		slice_copy(&subtab_lock, &subtab_file_name);
     ++		slice_addstr(&subtab_lock, ".lock");
     ++
     ++		sublock_file_fd = open(slice_as_string(&subtab_lock),
     ++				       O_EXCL | O_CREAT | O_WRONLY, 0644);
     ++		if (sublock_file_fd > 0) {
     ++			close(sublock_file_fd);
     ++		} else if (sublock_file_fd < 0) {
     ++			if (errno == EEXIST) {
     ++				err = 1;
     ++			} else {
     ++				err = REFTABLE_IO_ERROR;
     ++			}
     ++		}
     ++
     ++		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
     ++		delete_on_success[j] =
     ++			(char *)slice_as_string(&subtab_file_name);
     ++		j++;
     ++
     ++		if (err != 0)
     ++			goto done;
      +	}
      +
     -+	return 0;
     -+}
     ++	err = unlink(slice_as_string(&lock_file_name));
     ++	if (err < 0)
     ++		goto done;
     ++	have_lock = false;
      +
     -+/* -1 = error
     -+ 0 = up to date
     -+ 1 = changed. */
     -+static int stack_uptodate(struct reftable_stack *st)
     -+{
     -+	char **names = NULL;
     -+	int err = read_lines(st->list_file, &names);
     -+	int i = 0;
     ++	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
     ++				   expiry);
     ++	/* Compaction + tombstones can create an empty table out of non-empty
     ++	 * tables. */
     ++	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
     ++	if (is_empty_table) {
     ++		err = 0;
     ++	}
      +	if (err < 0)
     -+		return err;
     ++		goto done;
      +
     -+	for (i = 0; i < st->merged->stack_len; i++) {
     -+		if (names[i] == NULL) {
     ++	lock_file_fd = open(slice_as_string(&lock_file_name),
     ++			    O_EXCL | O_CREAT | O_WRONLY, 0644);
     ++	if (lock_file_fd < 0) {
     ++		if (errno == EEXIST) {
      +			err = 1;
     -+			goto done;
     ++		} else {
     ++			err = REFTABLE_IO_ERROR;
      +		}
     ++		goto done;
     ++	}
     ++	have_lock = true;
      +
     -+		if (strcmp(st->merged->stack[i]->name, names[i])) {
     -+			err = 1;
     ++	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
     ++		    st->merged->stack[last]->max_update_index);
     ++	slice_addstr(&new_table_name, ".ref");
     ++
     ++	slice_set_string(&new_table_path, st->reftable_dir);
     ++	slice_addstr(&new_table_path, "/");
     ++
     ++	slice_addbuf(&new_table_path, &new_table_name);
     ++
     ++	if (!is_empty_table) {
     ++		err = rename(slice_as_string(&temp_tab_file_name),
     ++			     slice_as_string(&new_table_path));
     ++		if (err < 0) {
     ++			err = REFTABLE_IO_ERROR;
      +			goto done;
      +		}
      +	}
      +
     -+	if (names[st->merged->stack_len] != NULL) {
     -+		err = 1;
     ++	for (i = 0; i < first; i++) {
     ++		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
     ++		slice_addstr(&ref_list_contents, "\n");
     ++	}
     ++	if (!is_empty_table) {
     ++		slice_addbuf(&ref_list_contents, &new_table_name);
     ++		slice_addstr(&ref_list_contents, "\n");
     ++	}
     ++	for (i = last + 1; i < st->merged->stack_len; i++) {
     ++		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
     ++		slice_addstr(&ref_list_contents, "\n");
     ++	}
     ++
     ++	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
     ++	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		unlink(slice_as_string(&new_table_path));
     ++		goto done;
     ++	}
     ++	err = close(lock_file_fd);
     ++	lock_file_fd = 0;
     ++	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		unlink(slice_as_string(&new_table_path));
     ++		goto done;
     ++	}
     ++
     ++	err = rename(slice_as_string(&lock_file_name), st->list_file);
     ++	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		unlink(slice_as_string(&new_table_path));
      +		goto done;
      +	}
     ++	have_lock = false;
     ++
     ++	/* Reload the stack before deleting. On windows, we can only delete the
     ++	   files after we closed them.
     ++	*/
     ++	err = reftable_stack_reload_maybe_reuse(st, first < last);
     ++
     ++	listp = delete_on_success;
     ++	while (*listp) {
     ++		if (strcmp(*listp, slice_as_string(&new_table_path))) {
     ++			unlink(*listp);
     ++		}
     ++		listp++;
     ++	}
      +
      +done:
     -+	free_names(names);
     ++	free_names(delete_on_success);
     ++
     ++	listp = subtable_locks;
     ++	while (*listp) {
     ++		unlink(*listp);
     ++		listp++;
     ++	}
     ++	free_names(subtable_locks);
     ++	if (lock_file_fd > 0) {
     ++		close(lock_file_fd);
     ++		lock_file_fd = 0;
     ++	}
     ++	if (have_lock) {
     ++		unlink(slice_as_string(&lock_file_name));
     ++	}
     ++	slice_release(&new_table_name);
     ++	slice_release(&new_table_path);
     ++	slice_release(&ref_list_contents);
     ++	slice_release(&temp_tab_file_name);
     ++	slice_release(&lock_file_name);
      +	return err;
      +}
      +
     -+int reftable_stack_reload(struct reftable_stack *st)
     ++int reftable_stack_compact_all(struct reftable_stack *st,
     ++			       struct reftable_log_expiry_config *config)
      +{
     -+	int err = stack_uptodate(st);
     -+	if (err > 0)
     -+		return reftable_stack_reload_maybe_reuse(st, true);
     -+	return err;
     ++	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
      +}
      +
     -+int reftable_stack_add(struct reftable_stack *st,
     -+		       int (*write)(struct reftable_writer *wr, void *arg),
     -+		       void *arg)
     ++static int stack_compact_range_stats(struct reftable_stack *st, int first,
     ++				     int last,
     ++				     struct reftable_log_expiry_config *config)
      +{
     -+	int err = stack_try_add(st, write, arg);
     -+	if (err < 0) {
     -+		if (err == REFTABLE_LOCK_ERROR) {
     -+			/* Ignore error return, we want to propagate
     -+			   REFTABLE_LOCK_ERROR.
     -+			*/
     -+			reftable_stack_reload(st);
     -+		}
     -+		return err;
     ++	int err = stack_compact_range(st, first, last, config);
     ++	if (err > 0) {
     ++		st->stats.failures++;
      +	}
     -+
     -+	if (!st->disable_auto_compact)
     -+		return reftable_stack_auto_compact(st);
     -+
     -+	return 0;
     ++	return err;
      +}
      +
     -+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
     ++static int segment_size(struct segment *s)
      +{
     -+	char buf[100];
     -+	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
     -+	slice_set_string(dest, buf);
     ++	return s->end - s->start;
      +}
      +
     -+struct reftable_addition {
     -+	int lock_file_fd;
     -+	struct slice lock_file_name;
     -+	struct reftable_stack *stack;
     -+	char **names;
     -+	char **new_tables;
     -+	int new_tables_len;
     -+	uint64_t next_update_index;
     -+};
     -+
     -+#define REFTABLE_ADDITION_INIT               \
     -+	{                                    \
     -+		.lock_file_name = SLICE_INIT \
     ++int fastlog2(uint64_t sz)
     ++{
     ++	int l = 0;
     ++	if (sz == 0)
     ++		return 0;
     ++	for (; sz; sz /= 2) {
     ++		l++;
      +	}
     ++	return l - 1;
     ++}
      +
     -+static int reftable_stack_init_addition(struct reftable_addition *add,
     -+					struct reftable_stack *st)
     ++struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
      +{
     -+	int err = 0;
     -+	add->stack = st;
     ++	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
     ++	int next = 0;
     ++	struct segment cur = { 0 };
     ++	int i = 0;
      +
     -+	slice_set_string(&add->lock_file_name, st->list_file);
     -+	slice_addstr(&add->lock_file_name, ".lock");
     ++	if (n == 0) {
     ++		*seglen = 0;
     ++		return segs;
     ++	}
     ++	for (i = 0; i < n; i++) {
     ++		int log = fastlog2(sizes[i]);
     ++		if (cur.log != log && cur.bytes > 0) {
     ++			struct segment fresh = {
     ++				.start = i,
     ++			};
      +
     -+	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
     -+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
     -+	if (add->lock_file_fd < 0) {
     -+		if (errno == EEXIST) {
     -+			err = REFTABLE_LOCK_ERROR;
     -+		} else {
     -+			err = REFTABLE_IO_ERROR;
     ++			segs[next++] = cur;
     ++			cur = fresh;
      +		}
     -+		goto done;
     ++
     ++		cur.log = log;
     ++		cur.end = i + 1;
     ++		cur.bytes += sizes[i];
      +	}
     -+	err = stack_uptodate(st);
     -+	if (err < 0)
     -+		goto done;
     ++	segs[next++] = cur;
     ++	*seglen = next;
     ++	return segs;
     ++}
      +
     -+	if (err > 1) {
     -+		err = REFTABLE_LOCK_ERROR;
     -+		goto done;
     ++struct segment suggest_compaction_segment(uint64_t *sizes, int n)
     ++{
     ++	int seglen = 0;
     ++	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
     ++	struct segment min_seg = {
     ++		.log = 64,
     ++	};
     ++	int i = 0;
     ++	for (i = 0; i < seglen; i++) {
     ++		if (segment_size(&segs[i]) == 1) {
     ++			continue;
     ++		}
     ++
     ++		if (segs[i].log < min_seg.log) {
     ++			min_seg = segs[i];
     ++		}
      +	}
      +
     -+	add->next_update_index = reftable_stack_next_update_index(st);
     -+done:
     -+	if (err) {
     -+		reftable_addition_close(add);
     ++	while (min_seg.start > 0) {
     ++		int prev = min_seg.start - 1;
     ++		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
     ++			break;
     ++		}
     ++
     ++		min_seg.start = prev;
     ++		min_seg.bytes += sizes[prev];
      +	}
     -+	return err;
     ++
     ++	reftable_free(segs);
     ++	return min_seg;
      +}
      +
     -+void reftable_addition_close(struct reftable_addition *add)
     ++static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
      +{
     ++	uint64_t *sizes =
     ++		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
     ++	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
     ++	int overhead = header_size(version) - 1;
      +	int i = 0;
     -+	struct slice nm = SLICE_INIT;
     -+	for (i = 0; i < add->new_tables_len; i++) {
     -+		slice_set_string(&nm, add->stack->list_file);
     -+		slice_addstr(&nm, "/");
     -+		slice_addstr(&nm, add->new_tables[i]);
     -+		unlink(slice_as_string(&nm));
     -+		reftable_free(add->new_tables[i]);
     -+		add->new_tables[i] = NULL;
     ++	for (i = 0; i < st->merged->stack_len; i++) {
     ++		sizes[i] = st->merged->stack[i]->size - overhead;
      +	}
     -+	reftable_free(add->new_tables);
     -+	add->new_tables = NULL;
     -+	add->new_tables_len = 0;
     ++	return sizes;
     ++}
      +
     -+	if (add->lock_file_fd > 0) {
     -+		close(add->lock_file_fd);
     -+		add->lock_file_fd = 0;
     -+	}
     -+	if (add->lock_file_name.len > 0) {
     -+		unlink(slice_as_string(&add->lock_file_name));
     -+		slice_release(&add->lock_file_name);
     -+	}
     ++int reftable_stack_auto_compact(struct reftable_stack *st)
     ++{
     ++	uint64_t *sizes = stack_table_sizes_for_compaction(st);
     ++	struct segment seg =
     ++		suggest_compaction_segment(sizes, st->merged->stack_len);
     ++	reftable_free(sizes);
     ++	if (segment_size(&seg) > 0)
     ++		return stack_compact_range_stats(st, seg.start, seg.end - 1,
     ++						 NULL);
      +
     -+	free_names(add->names);
     -+	add->names = NULL;
     -+	slice_release(&nm);
     ++	return 0;
      +}
      +
     -+void reftable_addition_destroy(struct reftable_addition *add)
     ++struct reftable_compaction_stats *
     ++reftable_stack_compaction_stats(struct reftable_stack *st)
      +{
     -+	if (add == NULL) {
     -+		return;
     -+	}
     -+	reftable_addition_close(add);
     -+	reftable_free(add);
     ++	return &st->stats;
      +}
      +
     -+int reftable_addition_commit(struct reftable_addition *add)
     ++int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
     ++			    struct reftable_ref_record *ref)
      +{
     -+	struct slice table_list = SLICE_INIT;
     -+	int i = 0;
     -+	int err = 0;
     -+	if (add->new_tables_len == 0)
     -+		goto done;
     -+
     -+	for (i = 0; i < add->stack->merged->stack_len; i++) {
     -+		slice_addstr(&table_list, add->stack->merged->stack[i]->name);
     -+		slice_addstr(&table_list, "\n");
     -+	}
     -+	for (i = 0; i < add->new_tables_len; i++) {
     -+		slice_addstr(&table_list, add->new_tables[i]);
     -+		slice_addstr(&table_list, "\n");
     -+	}
     ++	struct reftable_table tab = { NULL };
     ++	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
     ++	return reftable_table_read_ref(&tab, refname, ref);
     ++}
      +
     -+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
     -+	slice_release(&table_list);
     -+	if (err < 0) {
     -+		err = REFTABLE_IO_ERROR;
     ++int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
     ++			    struct reftable_log_record *log)
     ++{
     ++	struct reftable_iterator it = { 0 };
     ++	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
     ++	int err = reftable_merged_table_seek_log(mt, &it, refname);
     ++	if (err)
      +		goto done;
     -+	}
      +
     -+	err = close(add->lock_file_fd);
     -+	add->lock_file_fd = 0;
     -+	if (err < 0) {
     -+		err = REFTABLE_IO_ERROR;
     ++	err = reftable_iterator_next_log(&it, log);
     ++	if (err)
      +		goto done;
     -+	}
      +
     -+	err = rename(slice_as_string(&add->lock_file_name),
     -+		     add->stack->list_file);
     -+	if (err < 0) {
     -+		err = REFTABLE_IO_ERROR;
     ++	if (strcmp(log->ref_name, refname) ||
     ++	    reftable_log_record_is_deletion(log)) {
     ++		err = 1;
      +		goto done;
      +	}
      +
     -+	err = reftable_stack_reload(add->stack);
     -+
      +done:
     -+	reftable_addition_close(add);
     -+	return err;
     -+}
     -+
     -+int reftable_stack_new_addition(struct reftable_addition **dest,
     -+				struct reftable_stack *st)
     -+{
     -+	int err = 0;
     -+	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
     -+	*dest = reftable_calloc(sizeof(**dest));
     -+	**dest = empty;
     -+	err = reftable_stack_init_addition(*dest, st);
      +	if (err) {
     -+		reftable_free(*dest);
     -+		*dest = NULL;
     ++		reftable_log_record_clear(log);
      +	}
     ++	reftable_iterator_destroy(&it);
      +	return err;
      +}
      +
     -+int stack_try_add(struct reftable_stack *st,
     -+		  int (*write_table)(struct reftable_writer *wr, void *arg),
     -+		  void *arg)
     ++int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
      +{
     -+	struct reftable_addition add = REFTABLE_ADDITION_INIT;
     -+	int err = reftable_stack_init_addition(&add, st);
     ++	int err = 0;
     ++	struct reftable_block_source src = { 0 };
     ++	struct reftable_reader *rd = NULL;
     ++	struct reftable_table tab = { NULL };
     ++	struct reftable_ref_record *refs = NULL;
     ++	struct reftable_iterator it = { NULL };
     ++	int cap = 0;
     ++	int len = 0;
     ++	int i = 0;
     ++
     ++	if (st->config.skip_name_check)
     ++		return 0;
     ++
     ++	err = reftable_block_source_from_file(&src, new_tab_name);
     ++	if (err < 0)
     ++		goto done;
     ++
     ++	err = reftable_new_reader(&rd, &src, new_tab_name);
      +	if (err < 0)
      +		goto done;
     ++
     ++	err = reftable_reader_seek_ref(rd, &it, "");
      +	if (err > 0) {
     -+		err = REFTABLE_LOCK_ERROR;
     ++		err = 0;
      +		goto done;
      +	}
     -+
     -+	err = reftable_addition_add(&add, write_table, arg);
      +	if (err < 0)
      +		goto done;
      +
     -+	err = reftable_addition_commit(&add);
     ++	while (true) {
     ++		struct reftable_ref_record ref = { 0 };
     ++		err = reftable_iterator_next_ref(&it, &ref);
     ++		if (err > 0) {
     ++			break;
     ++		}
     ++		if (err < 0)
     ++			goto done;
     ++
     ++		if (len >= cap) {
     ++			cap = 2 * cap + 1;
     ++			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
     ++		}
     ++
     ++		refs[len++] = ref;
     ++	}
     ++
     ++	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
     ++
     ++	err = validate_ref_record_addition(tab, refs, len);
     ++
      +done:
     -+	reftable_addition_close(&add);
     ++	for (i = 0; i < len; i++) {
     ++		reftable_ref_record_clear(&refs[i]);
     ++	}
     ++
     ++	free(refs);
     ++	reftable_iterator_destroy(&it);
     ++	reftable_reader_free(rd);
      +	return err;
      +}
     +
     + ## reftable/stack.h (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
      +
     -+int reftable_addition_add(struct reftable_addition *add,
     -+			  int (*write_table)(struct reftable_writer *wr,
     -+					     void *arg),
     -+			  void *arg)
     -+{
     -+	struct slice temp_tab_file_name = SLICE_INIT;
     -+	struct slice tab_file_name = SLICE_INIT;
     -+	struct slice next_name = SLICE_INIT;
     -+	struct reftable_writer *wr = NULL;
     -+	int err = 0;
     -+	int tab_fd = 0;
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
      +
     -+	slice_resize(&next_name, 0);
     -+	format_name(&next_name, add->next_update_index, add->next_update_index);
     ++#ifndef STACK_H
     ++#define STACK_H
      +
     -+	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
     -+	slice_addstr(&temp_tab_file_name, "/");
     -+	slice_addbuf(&temp_tab_file_name, next_name);
     -+	slice_addstr(&temp_tab_file_name, ".temp.XXXXXX");
     ++#include "reftable.h"
     ++#include "system.h"
      +
     -+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
     -+	if (tab_fd < 0) {
     -+		err = REFTABLE_IO_ERROR;
     -+		goto done;
     -+	}
     ++struct reftable_stack {
     ++	char *list_file;
     ++	char *reftable_dir;
     ++	bool disable_auto_compact;
      +
     -+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
     -+				 &add->stack->config);
     -+	err = write_table(wr, arg);
     -+	if (err < 0)
     -+		goto done;
     ++	struct reftable_write_options config;
      +
     -+	err = reftable_writer_close(wr);
     -+	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
     -+		err = 0;
     -+		goto done;
     -+	}
     -+	if (err < 0)
     -+		goto done;
     ++	struct reftable_merged_table *merged;
     ++	struct reftable_compaction_stats stats;
     ++};
      +
     -+	err = close(tab_fd);
     -+	tab_fd = 0;
     -+	if (err < 0) {
     -+		err = REFTABLE_IO_ERROR;
     -+		goto done;
     -+	}
     ++int read_lines(const char *filename, char ***lines);
     ++int stack_try_add(struct reftable_stack *st,
     ++		  int (*write_table)(struct reftable_writer *wr, void *arg),
     ++		  void *arg);
     ++int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
     ++			int first, int last,
     ++			struct reftable_log_expiry_config *config);
     ++int fastlog2(uint64_t sz);
     ++int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
     ++void reftable_addition_close(struct reftable_addition *add);
     ++int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
     ++				      bool reuse_open);
      +
     -+	err = stack_check_addition(add->stack,
     -+				   slice_as_string(&temp_tab_file_name));
     -+	if (err < 0)
     -+		goto done;
     ++struct segment {
     ++	int start, end;
     ++	int log;
     ++	uint64_t bytes;
     ++};
      +
     -+	if (wr->min_update_index < add->next_update_index) {
     -+		err = REFTABLE_API_ERROR;
     -+		goto done;
     -+	}
     ++struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
     ++struct segment suggest_compaction_segment(uint64_t *sizes, int n);
      +
     -+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
     -+	slice_addstr(&next_name, ".ref");
     ++#endif
     +
     + ## reftable/stack_test.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
      +
     -+	slice_set_string(&tab_file_name, add->stack->reftable_dir);
     -+	slice_addstr(&tab_file_name, "/");
     -+	slice_addbuf(&tab_file_name, next_name);
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
      +
     -+	/* TODO: should check destination out of paranoia */
     -+	err = rename(slice_as_string(&temp_tab_file_name),
     -+		     slice_as_string(&tab_file_name));
     -+	if (err < 0) {
     -+		err = REFTABLE_IO_ERROR;
     -+		goto done;
     -+	}
     ++#include "stack.h"
      +
     -+	add->new_tables = reftable_realloc(add->new_tables,
     -+					   sizeof(*add->new_tables) *
     -+						   (add->new_tables_len + 1));
     -+	add->new_tables[add->new_tables_len] = slice_to_string(next_name);
     -+	add->new_tables_len++;
     -+done:
     -+	if (tab_fd > 0) {
     -+		close(tab_fd);
     -+		tab_fd = 0;
     -+	}
     -+	if (temp_tab_file_name.len > 0) {
     -+		unlink(slice_as_string(&temp_tab_file_name));
     -+	}
     ++#include "system.h"
      +
     -+	slice_release(&temp_tab_file_name);
     -+	slice_release(&tab_file_name);
     -+	slice_release(&next_name);
     -+	reftable_writer_free(wr);
     -+	return err;
     -+}
     ++#include "merged.h"
     ++#include "basics.h"
     ++#include "constants.h"
     ++#include "record.h"
     ++#include "reftable.h"
     ++#include "test_framework.h"
     ++#include "reftable-tests.h"
      +
     -+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
     -+{
     -+	int sz = st->merged->stack_len;
     -+	if (sz > 0)
     -+		return reftable_reader_max_update_index(
     -+			       st->merged->stack[sz - 1]) +
     -+		       1;
     -+	return 1;
     -+}
     ++#include <sys/types.h>
     ++#include <dirent.h>
      +
     -+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
     -+				struct slice *temp_tab,
     -+				struct reftable_log_expiry_config *config)
     ++static void clear_dir(const char *dirname)
      +{
     -+	struct slice next_name = SLICE_INIT;
     -+	int tab_fd = -1;
     -+	struct reftable_writer *wr = NULL;
     -+	int err = 0;
     ++	int fd = open(dirname, O_DIRECTORY, 0);
     ++	DIR *dir = fdopendir(fd);
     ++	struct dirent *ent = NULL;
      +
     -+	format_name(&next_name,
     -+		    reftable_reader_min_update_index(st->merged->stack[first]),
     -+		    reftable_reader_max_update_index(st->merged->stack[first]));
     ++	assert(fd >= 0);
      +
     -+	slice_set_string(temp_tab, st->reftable_dir);
     -+	slice_addstr(temp_tab, "/");
     -+	slice_addbuf(temp_tab, next_name);
     -+	slice_addstr(temp_tab, ".temp.XXXXXX");
     ++	while ((ent = readdir(dir)) != NULL) {
     ++		unlinkat(fd, ent->d_name, 0);
     ++	}
     ++	closedir(dir);
     ++	rmdir(dirname);
     ++}
      +
     -+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
     -+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
     ++static void test_read_file(void)
     ++{
     ++	char fn[256] = "/tmp/stack.test_read_file.XXXXXX";
     ++	int fd = mkstemp(fn);
     ++	char out[1024] = "line1\n\nline2\nline3";
     ++	int n, err;
     ++	char **names = NULL;
     ++	char *want[] = { "line1", "line2", "line3" };
     ++	int i = 0;
      +
     -+	err = stack_write_compact(st, wr, first, last, config);
     -+	if (err < 0)
     -+		goto done;
     -+	err = reftable_writer_close(wr);
     -+	if (err < 0)
     -+		goto done;
     ++	assert(fd > 0);
     ++	n = write(fd, out, strlen(out));
     ++	assert(n == strlen(out));
     ++	err = close(fd);
     ++	assert(err >= 0);
      +
     -+	err = close(tab_fd);
     -+	tab_fd = 0;
     ++	err = read_lines(fn, &names);
     ++	assert_err(err);
      +
     -+done:
     -+	reftable_writer_free(wr);
     -+	if (tab_fd > 0) {
     -+		close(tab_fd);
     -+		tab_fd = 0;
     ++	for (i = 0; names[i] != NULL; i++) {
     ++		assert(0 == strcmp(want[i], names[i]));
      +	}
     -+	if (err != 0 && temp_tab->len > 0) {
     -+		unlink(slice_as_string(temp_tab));
     -+		slice_release(temp_tab);
     -+	}
     -+	slice_release(&next_name);
     -+	return err;
     ++	free_names(names);
     ++	remove(fn);
      +}
      +
     -+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
     -+			int first, int last,
     -+			struct reftable_log_expiry_config *config)
     ++static void test_parse_names(void)
      +{
     -+	int subtabs_len = last - first + 1;
     -+	struct reftable_reader **subtabs = reftable_calloc(
     -+		sizeof(struct reftable_reader *) * (last - first + 1));
     -+	struct reftable_merged_table *mt = NULL;
     -+	int err = 0;
     -+	struct reftable_iterator it = { 0 };
     -+	struct reftable_ref_record ref = { 0 };
     -+	struct reftable_log_record log = { 0 };
     ++	char buf[] = "line\n";
     ++	char **names = NULL;
     ++	parse_names(buf, strlen(buf), &names);
      +
     -+	uint64_t entries = 0;
     ++	assert(NULL != names[0]);
     ++	assert(0 == strcmp(names[0], "line"));
     ++	assert(NULL == names[1]);
     ++	free_names(names);
     ++}
      +
     -+	int i = 0, j = 0;
     -+	for (i = first, j = 0; i <= last; i++) {
     -+		struct reftable_reader *t = st->merged->stack[i];
     -+		subtabs[j++] = t;
     -+		st->stats.bytes += t->size;
     -+	}
     -+	reftable_writer_set_limits(wr,
     -+				   st->merged->stack[first]->min_update_index,
     -+				   st->merged->stack[last]->max_update_index);
     ++static void test_names_equal(void)
     ++{
     ++	char *a[] = { "a", "b", "c", NULL };
     ++	char *b[] = { "a", "b", "d", NULL };
     ++	char *c[] = { "a", "b", NULL };
      +
     -+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
     -+					st->config.hash_id);
     -+	if (err < 0) {
     -+		reftable_free(subtabs);
     -+		goto done;
     -+	}
     ++	assert(names_equal(a, a));
     ++	assert(!names_equal(a, b));
     ++	assert(!names_equal(a, c));
     ++}
      +
     -+	err = reftable_merged_table_seek_ref(mt, &it, "");
     -+	if (err < 0)
     -+		goto done;
     ++static int write_test_ref(struct reftable_writer *wr, void *arg)
     ++{
     ++	struct reftable_ref_record *ref = arg;
     ++	reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
     ++	return reftable_writer_add_ref(wr, ref);
     ++}
      +
     -+	while (true) {
     -+		err = reftable_iterator_next_ref(&it, &ref);
     -+		if (err > 0) {
     -+			err = 0;
     -+			break;
     -+		}
     -+		if (err < 0) {
     -+			break;
     -+		}
     -+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
     -+			continue;
     -+		}
     ++struct write_log_arg {
     ++	struct reftable_log_record *log;
     ++	uint64_t update_index;
     ++};
      +
     -+		err = reftable_writer_add_ref(wr, &ref);
     -+		if (err < 0) {
     -+			break;
     -+		}
     -+		entries++;
     -+	}
     -+	reftable_iterator_destroy(&it);
     ++static int write_test_log(struct reftable_writer *wr, void *arg)
     ++{
     ++	struct write_log_arg *wla = arg;
      +
     -+	err = reftable_merged_table_seek_log(mt, &it, "");
     -+	if (err < 0)
     -+		goto done;
     ++	reftable_writer_set_limits(wr, wla->update_index, wla->update_index);
     ++	return reftable_writer_add_log(wr, wla->log);
     ++}
      +
     -+	while (true) {
     -+		err = reftable_iterator_next_log(&it, &log);
     -+		if (err > 0) {
     -+			err = 0;
     -+			break;
     -+		}
     -+		if (err < 0) {
     -+			break;
     -+		}
     -+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
     -+			continue;
     -+		}
     ++static void test_reftable_stack_add_one(void)
     ++{
     ++	char dir[256] = "/tmp/stack_test.XXXXXX";
     ++	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_stack *st = NULL;
     ++	int err;
     ++	struct reftable_ref_record ref = {
     ++		.ref_name = "HEAD",
     ++		.update_index = 1,
     ++		.target = "master",
     ++	};
     ++	struct reftable_ref_record dest = { 0 };
      +
     -+		if (config != NULL && config->time > 0 &&
     -+		    log.time < config->time) {
     -+			continue;
     -+		}
     ++	assert(mkdtemp(dir));
      +
     -+		if (config != NULL && config->min_update_index > 0 &&
     -+		    log.update_index < config->min_update_index) {
     -+			continue;
     -+		}
     ++	err = reftable_new_stack(&st, dir, cfg);
     ++	assert_err(err);
      +
     -+		err = reftable_writer_add_log(wr, &log);
     -+		if (err < 0) {
     -+			break;
     -+		}
     -+		entries++;
     -+	}
     ++	err = reftable_stack_add(st, &write_test_ref, &ref);
     ++	assert_err(err);
      +
     -+done:
     -+	reftable_iterator_destroy(&it);
     -+	if (mt != NULL) {
     -+		merged_table_clear(mt);
     -+		reftable_merged_table_free(mt);
     -+	}
     -+	reftable_ref_record_clear(&ref);
     -+	reftable_log_record_clear(&log);
     -+	st->stats.entries_written += entries;
     -+	return err;
     ++	err = reftable_stack_read_ref(st, ref.ref_name, &dest);
     ++	assert_err(err);
     ++	assert(0 == strcmp("master", dest.target));
     ++
     ++	reftable_ref_record_clear(&dest);
     ++	reftable_stack_destroy(st);
     ++	clear_dir(dir);
      +}
      +
     -+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
     -+static int stack_compact_range(struct reftable_stack *st, int first, int last,
     -+			       struct reftable_log_expiry_config *expiry)
     ++static void test_reftable_stack_uptodate(void)
      +{
     -+	struct slice temp_tab_file_name = SLICE_INIT;
     -+	struct slice new_table_name = SLICE_INIT;
     -+	struct slice lock_file_name = SLICE_INIT;
     -+	struct slice ref_list_contents = SLICE_INIT;
     -+	struct slice new_table_path = SLICE_INIT;
     -+	int err = 0;
     -+	bool have_lock = false;
     -+	int lock_file_fd = 0;
     -+	int compact_count = last - first + 1;
     -+	char **listp = NULL;
     -+	char **delete_on_success =
     -+		reftable_calloc(sizeof(char *) * (compact_count + 1));
     -+	char **subtable_locks =
     -+		reftable_calloc(sizeof(char *) * (compact_count + 1));
     -+	int i = 0;
     -+	int j = 0;
     -+	bool is_empty_table = false;
     -+
     -+	if (first > last || (expiry == NULL && first == last)) {
     -+		err = 0;
     -+		goto done;
     -+	}
     -+
     -+	st->stats.attempts++;
     -+
     -+	slice_set_string(&lock_file_name, st->list_file);
     -+	slice_addstr(&lock_file_name, ".lock");
     ++	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_stack *st1 = NULL;
     ++	struct reftable_stack *st2 = NULL;
     ++	char dir[256] = "/tmp/stack_test.XXXXXX";
     ++	int err;
     ++	struct reftable_ref_record ref1 = {
     ++		.ref_name = "HEAD",
     ++		.update_index = 1,
     ++		.target = "master",
     ++	};
     ++	struct reftable_ref_record ref2 = {
     ++		.ref_name = "branch2",
     ++		.update_index = 2,
     ++		.target = "master",
     ++	};
      +
     -+	lock_file_fd = open(slice_as_string(&lock_file_name),
     -+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
     -+	if (lock_file_fd < 0) {
     -+		if (errno == EEXIST) {
     -+			err = 1;
     -+		} else {
     -+			err = REFTABLE_IO_ERROR;
     -+		}
     -+		goto done;
     -+	}
     -+	/* Don't want to write to the lock for now.  */
     -+	close(lock_file_fd);
     -+	lock_file_fd = 0;
     ++	assert(mkdtemp(dir));
      +
     -+	have_lock = true;
     -+	err = stack_uptodate(st);
     -+	if (err != 0)
     -+		goto done;
     ++	err = reftable_new_stack(&st1, dir, cfg);
     ++	assert_err(err);
      +
     -+	for (i = first, j = 0; i <= last; i++) {
     -+		struct slice subtab_file_name = SLICE_INIT;
     -+		struct slice subtab_lock = SLICE_INIT;
     -+		int sublock_file_fd = -1;
     ++	err = reftable_new_stack(&st2, dir, cfg);
     ++	assert_err(err);
      +
     -+		slice_set_string(&subtab_file_name, st->reftable_dir);
     -+		slice_addstr(&subtab_file_name, "/");
     -+		slice_addstr(&subtab_file_name,
     -+			     reader_name(st->merged->stack[i]));
     ++	err = reftable_stack_add(st1, &write_test_ref, &ref1);
     ++	assert_err(err);
      +
     -+		slice_copy(&subtab_lock, subtab_file_name);
     -+		slice_addstr(&subtab_lock, ".lock");
     ++	err = reftable_stack_add(st2, &write_test_ref, &ref2);
     ++	assert(err == REFTABLE_LOCK_ERROR);
      +
     -+		sublock_file_fd = open(slice_as_string(&subtab_lock),
     -+				       O_EXCL | O_CREAT | O_WRONLY, 0644);
     -+		if (sublock_file_fd > 0) {
     -+			close(sublock_file_fd);
     -+		} else if (sublock_file_fd < 0) {
     -+			if (errno == EEXIST) {
     -+				err = 1;
     -+			} else {
     -+				err = REFTABLE_IO_ERROR;
     -+			}
     -+		}
     ++	err = reftable_stack_reload(st2);
     ++	assert_err(err);
      +
     -+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
     -+		delete_on_success[j] =
     -+			(char *)slice_as_string(&subtab_file_name);
     -+		j++;
     ++	err = reftable_stack_add(st2, &write_test_ref, &ref2);
     ++	assert_err(err);
     ++	reftable_stack_destroy(st1);
     ++	reftable_stack_destroy(st2);
     ++	clear_dir(dir);
     ++}
      +
     -+		if (err != 0)
     -+			goto done;
     -+	}
     ++static void test_reftable_stack_transaction_api(void)
     ++{
     ++	char dir[256] = "/tmp/stack_test.XXXXXX";
     ++	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_stack *st = NULL;
     ++	int err;
     ++	struct reftable_addition *add = NULL;
      +
     -+	err = unlink(slice_as_string(&lock_file_name));
     -+	if (err < 0)
     -+		goto done;
     -+	have_lock = false;
     ++	struct reftable_ref_record ref = {
     ++		.ref_name = "HEAD",
     ++		.update_index = 1,
     ++		.target = "master",
     ++	};
     ++	struct reftable_ref_record dest = { 0 };
      +
     -+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
     -+				   expiry);
     -+	/* Compaction + tombstones can create an empty table out of non-empty
     -+	 * tables. */
     -+	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
     -+	if (is_empty_table) {
     -+		err = 0;
     -+	}
     -+	if (err < 0)
     -+		goto done;
     ++	assert(mkdtemp(dir));
      +
     -+	lock_file_fd = open(slice_as_string(&lock_file_name),
     -+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
     -+	if (lock_file_fd < 0) {
     -+		if (errno == EEXIST) {
     -+			err = 1;
     -+		} else {
     -+			err = REFTABLE_IO_ERROR;
     -+		}
     -+		goto done;
     -+	}
     -+	have_lock = true;
     ++	err = reftable_new_stack(&st, dir, cfg);
     ++	assert_err(err);
      +
     -+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
     -+		    st->merged->stack[last]->max_update_index);
     -+	slice_addstr(&new_table_name, ".ref");
     ++	reftable_addition_destroy(add);
      +
     -+	slice_set_string(&new_table_path, st->reftable_dir);
     -+	slice_addstr(&new_table_path, "/");
     ++	err = reftable_stack_new_addition(&add, st);
     ++	assert_err(err);
      +
     -+	slice_addbuf(&new_table_path, new_table_name);
     ++	err = reftable_addition_add(add, &write_test_ref, &ref);
     ++	assert_err(err);
      +
     -+	if (!is_empty_table) {
     -+		err = rename(slice_as_string(&temp_tab_file_name),
     -+			     slice_as_string(&new_table_path));
     -+		if (err < 0) {
     -+			err = REFTABLE_IO_ERROR;
     -+			goto done;
     -+		}
     -+	}
     ++	err = reftable_addition_commit(add);
     ++	assert_err(err);
      +
     -+	for (i = 0; i < first; i++) {
     -+		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
     -+		slice_addstr(&ref_list_contents, "\n");
     -+	}
     -+	if (!is_empty_table) {
     -+		slice_addbuf(&ref_list_contents, new_table_name);
     -+		slice_addstr(&ref_list_contents, "\n");
     -+	}
     -+	for (i = last + 1; i < st->merged->stack_len; i++) {
     -+		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
     -+		slice_addstr(&ref_list_contents, "\n");
     -+	}
     ++	reftable_addition_destroy(add);
      +
     -+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
     -+	if (err < 0) {
     -+		err = REFTABLE_IO_ERROR;
     -+		unlink(slice_as_string(&new_table_path));
     -+		goto done;
     -+	}
     -+	err = close(lock_file_fd);
     -+	lock_file_fd = 0;
     -+	if (err < 0) {
     -+		err = REFTABLE_IO_ERROR;
     -+		unlink(slice_as_string(&new_table_path));
     -+		goto done;
     -+	}
     ++	err = reftable_stack_read_ref(st, ref.ref_name, &dest);
     ++	assert_err(err);
     ++	assert(0 == strcmp("master", dest.target));
      +
     -+	err = rename(slice_as_string(&lock_file_name), st->list_file);
     -+	if (err < 0) {
     -+		err = REFTABLE_IO_ERROR;
     -+		unlink(slice_as_string(&new_table_path));
     -+		goto done;
     -+	}
     -+	have_lock = false;
     ++	reftable_ref_record_clear(&dest);
     ++	reftable_stack_destroy(st);
     ++	clear_dir(dir);
     ++}
      +
     -+	/* Reload the stack before deleting. On windows, we can only delete the
     -+	   files after we closed them.
     -+	*/
     -+	err = reftable_stack_reload_maybe_reuse(st, first < last);
     ++static void test_reftable_stack_validate_refname(void)
     ++{
     ++	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_stack *st = NULL;
     ++	int err;
     ++	char dir[256] = "/tmp/stack_test.XXXXXX";
     ++	int i;
     ++	struct reftable_ref_record ref = {
     ++		.ref_name = "a/b",
     ++		.update_index = 1,
     ++		.target = "master",
     ++	};
     ++	char *additions[] = { "a", "a/b/c" };
      +
     -+	listp = delete_on_success;
     -+	while (*listp) {
     -+		if (strcmp(*listp, slice_as_string(&new_table_path))) {
     -+			unlink(*listp);
     -+		}
     -+		listp++;
     -+	}
     ++	assert(mkdtemp(dir));
     ++	err = reftable_new_stack(&st, dir, cfg);
     ++	assert_err(err);
      +
     -+done:
     -+	free_names(delete_on_success);
     ++	err = reftable_stack_add(st, &write_test_ref, &ref);
     ++	assert_err(err);
      +
     -+	listp = subtable_locks;
     -+	while (*listp) {
     -+		unlink(*listp);
     -+		listp++;
     -+	}
     -+	free_names(subtable_locks);
     -+	if (lock_file_fd > 0) {
     -+		close(lock_file_fd);
     -+		lock_file_fd = 0;
     -+	}
     -+	if (have_lock) {
     -+		unlink(slice_as_string(&lock_file_name));
     ++	for (i = 0; i < ARRAY_SIZE(additions); i++) {
     ++		struct reftable_ref_record ref = {
     ++			.ref_name = additions[i],
     ++			.update_index = 1,
     ++			.target = "master",
     ++		};
     ++
     ++		err = reftable_stack_add(st, &write_test_ref, &ref);
     ++		assert(err == REFTABLE_NAME_CONFLICT);
      +	}
     -+	slice_release(&new_table_name);
     -+	slice_release(&new_table_path);
     -+	slice_release(&ref_list_contents);
     -+	slice_release(&temp_tab_file_name);
     -+	slice_release(&lock_file_name);
     -+	return err;
     -+}
      +
     -+int reftable_stack_compact_all(struct reftable_stack *st,
     -+			       struct reftable_log_expiry_config *config)
     -+{
     -+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
     ++	reftable_stack_destroy(st);
     ++	clear_dir(dir);
      +}
      +
     -+static int stack_compact_range_stats(struct reftable_stack *st, int first,
     -+				     int last,
     -+				     struct reftable_log_expiry_config *config)
     ++static int write_error(struct reftable_writer *wr, void *arg)
      +{
     -+	int err = stack_compact_range(st, first, last, config);
     -+	if (err > 0) {
     -+		st->stats.failures++;
     -+	}
     -+	return err;
     ++	return *((int *)arg);
      +}
      +
     -+static int segment_size(struct segment *s)
     ++static void test_reftable_stack_update_index_check(void)
      +{
     -+	return s->end - s->start;
     ++	char dir[256] = "/tmp/stack_test.XXXXXX";
     ++	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_stack *st = NULL;
     ++	int err;
     ++	struct reftable_ref_record ref1 = {
     ++		.ref_name = "name1",
     ++		.update_index = 1,
     ++		.target = "master",
     ++	};
     ++	struct reftable_ref_record ref2 = {
     ++		.ref_name = "name2",
     ++		.update_index = 1,
     ++		.target = "master",
     ++	};
     ++	assert(mkdtemp(dir));
     ++
     ++	err = reftable_new_stack(&st, dir, cfg);
     ++	assert_err(err);
     ++
     ++	err = reftable_stack_add(st, &write_test_ref, &ref1);
     ++	assert_err(err);
     ++
     ++	err = reftable_stack_add(st, &write_test_ref, &ref2);
     ++	assert(err == REFTABLE_API_ERROR);
     ++	reftable_stack_destroy(st);
     ++	clear_dir(dir);
      +}
      +
     -+int fastlog2(uint64_t sz)
     ++static void test_reftable_stack_lock_failure(void)
      +{
     -+	int l = 0;
     -+	if (sz == 0)
     -+		return 0;
     -+	for (; sz; sz /= 2) {
     -+		l++;
     ++	char dir[256] = "/tmp/stack_test.XXXXXX";
     ++	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_stack *st = NULL;
     ++	int err, i;
     ++	assert(mkdtemp(dir));
     ++
     ++	err = reftable_new_stack(&st, dir, cfg);
     ++	assert_err(err);
     ++	for (i = -1; i != REFTABLE_EMPTY_TABLE_ERROR; i--) {
     ++		err = reftable_stack_add(st, &write_error, &i);
     ++		assert(err == i);
      +	}
     -+	return l - 1;
     ++
     ++	reftable_stack_destroy(st);
     ++	clear_dir(dir);
      +}
      +
     -+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
     ++static void test_reftable_stack_add(void)
      +{
     -+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
     -+	int next = 0;
     -+	struct segment cur = { 0 };
      +	int i = 0;
     ++	int err = 0;
     ++	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_stack *st = NULL;
     ++	char dir[256] = "/tmp/stack_test.XXXXXX";
     ++	struct reftable_ref_record refs[2] = { 0 };
     ++	struct reftable_log_record logs[2] = { 0 };
     ++	int N = ARRAY_SIZE(refs);
     ++
     ++	assert(mkdtemp(dir));
     ++
     ++	err = reftable_new_stack(&st, dir, cfg);
     ++	assert_err(err);
     ++	st->disable_auto_compact = true;
     ++
     ++	for (i = 0; i < N; i++) {
     ++		char buf[256];
     ++		snprintf(buf, sizeof(buf), "branch%02d", i);
     ++		refs[i].ref_name = xstrdup(buf);
     ++		refs[i].value = reftable_malloc(SHA1_SIZE);
     ++		refs[i].update_index = i + 1;
     ++		set_test_hash(refs[i].value, i);
     ++
     ++		logs[i].ref_name = xstrdup(buf);
     ++		logs[i].update_index = N + i + 1;
     ++		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
     ++		logs[i].email = xstrdup("identity@invalid");
     ++		set_test_hash(logs[i].new_hash, i);
     ++	}
     ++
     ++	for (i = 0; i < N; i++) {
     ++		int err = reftable_stack_add(st, &write_test_ref, &refs[i]);
     ++		assert_err(err);
     ++	}
     ++
     ++	for (i = 0; i < N; i++) {
     ++		struct write_log_arg arg = {
     ++			.log = &logs[i],
     ++			.update_index = reftable_stack_next_update_index(st),
     ++		};
     ++		int err = reftable_stack_add(st, &write_test_log, &arg);
     ++		assert_err(err);
     ++	}
      +
     -+	if (n == 0) {
     -+		*seglen = 0;
     -+		return segs;
     ++	err = reftable_stack_compact_all(st, NULL);
     ++	assert_err(err);
     ++
     ++	for (i = 0; i < N; i++) {
     ++		struct reftable_ref_record dest = { 0 };
     ++		int err = reftable_stack_read_ref(st, refs[i].ref_name, &dest);
     ++		assert_err(err);
     ++		assert(reftable_ref_record_equal(&dest, refs + i, SHA1_SIZE));
     ++		reftable_ref_record_clear(&dest);
      +	}
     -+	for (i = 0; i < n; i++) {
     -+		int log = fastlog2(sizes[i]);
     -+		if (cur.log != log && cur.bytes > 0) {
     -+			struct segment fresh = {
     -+				.start = i,
     -+			};
      +
     -+			segs[next++] = cur;
     -+			cur = fresh;
     -+		}
     ++	for (i = 0; i < N; i++) {
     ++		struct reftable_log_record dest = { 0 };
     ++		int err = reftable_stack_read_log(st, refs[i].ref_name, &dest);
     ++		assert_err(err);
     ++		assert(reftable_log_record_equal(&dest, logs + i, SHA1_SIZE));
     ++		reftable_log_record_clear(&dest);
     ++	}
      +
     -+		cur.log = log;
     -+		cur.end = i + 1;
     -+		cur.bytes += sizes[i];
     ++	/* cleanup */
     ++	reftable_stack_destroy(st);
     ++	for (i = 0; i < N; i++) {
     ++		reftable_ref_record_clear(&refs[i]);
     ++		reftable_log_record_clear(&logs[i]);
      +	}
     -+	segs[next++] = cur;
     -+	*seglen = next;
     -+	return segs;
     ++	clear_dir(dir);
      +}
      +
     -+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
     ++static void test_reftable_stack_tombstone(void)
      +{
     -+	int seglen = 0;
     -+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
     -+	struct segment min_seg = {
     -+		.log = 64,
     -+	};
      +	int i = 0;
     -+	for (i = 0; i < seglen; i++) {
     -+		if (segment_size(&segs[i]) == 1) {
     -+			continue;
     ++	char dir[256] = "/tmp/stack_test.XXXXXX";
     ++	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_stack *st = NULL;
     ++	int err;
     ++	struct reftable_ref_record refs[2] = { 0 };
     ++	struct reftable_log_record logs[2] = { 0 };
     ++	int N = ARRAY_SIZE(refs);
     ++	struct reftable_ref_record dest = { 0 };
     ++	struct reftable_log_record log_dest = { 0 };
     ++
     ++	assert(mkdtemp(dir));
     ++
     ++	err = reftable_new_stack(&st, dir, cfg);
     ++	assert_err(err);
     ++
     ++	for (i = 0; i < N; i++) {
     ++		const char *buf = "branch";
     ++		refs[i].ref_name = xstrdup(buf);
     ++		refs[i].update_index = i + 1;
     ++		if (i % 2 == 0) {
     ++			refs[i].value = reftable_malloc(SHA1_SIZE);
     ++			set_test_hash(refs[i].value, i);
      +		}
     -+
     -+		if (segs[i].log < min_seg.log) {
     -+			min_seg = segs[i];
     ++		logs[i].ref_name = xstrdup(buf);
     ++		/* update_index is part of the key. */
     ++		logs[i].update_index = 42;
     ++		if (i % 2 == 0) {
     ++			logs[i].new_hash = reftable_malloc(SHA1_SIZE);
     ++			set_test_hash(logs[i].new_hash, i);
     ++			logs[i].email = xstrdup("identity@invalid");
      +		}
      +	}
     ++	for (i = 0; i < N; i++) {
     ++		int err = reftable_stack_add(st, &write_test_ref, &refs[i]);
     ++		assert_err(err);
     ++	}
     ++	for (i = 0; i < N; i++) {
     ++		struct write_log_arg arg = {
     ++			.log = &logs[i],
     ++			.update_index = reftable_stack_next_update_index(st),
     ++		};
     ++		int err = reftable_stack_add(st, &write_test_log, &arg);
     ++		assert_err(err);
     ++	}
      +
     -+	while (min_seg.start > 0) {
     -+		int prev = min_seg.start - 1;
     -+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
     -+			break;
     -+		}
     ++	err = reftable_stack_read_ref(st, "branch", &dest);
     ++	assert(err == 1);
     ++	reftable_ref_record_clear(&dest);
      +
     -+		min_seg.start = prev;
     -+		min_seg.bytes += sizes[prev];
     -+	}
     ++	err = reftable_stack_read_log(st, "branch", &log_dest);
     ++	assert(err == 1);
     ++	reftable_log_record_clear(&log_dest);
      +
     -+	reftable_free(segs);
     -+	return min_seg;
     -+}
     ++	err = reftable_stack_compact_all(st, NULL);
     ++	assert_err(err);
      +
     -+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
     -+{
     -+	uint64_t *sizes =
     -+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
     -+	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
     -+	int overhead = header_size(version) - 1;
     -+	int i = 0;
     -+	for (i = 0; i < st->merged->stack_len; i++) {
     -+		sizes[i] = st->merged->stack[i]->size - overhead;
     ++	err = reftable_stack_read_ref(st, "branch", &dest);
     ++	assert(err == 1);
     ++
     ++	err = reftable_stack_read_log(st, "branch", &log_dest);
     ++	assert(err == 1);
     ++	reftable_ref_record_clear(&dest);
     ++	reftable_log_record_clear(&log_dest);
     ++
     ++	/* cleanup */
     ++	reftable_stack_destroy(st);
     ++	for (i = 0; i < N; i++) {
     ++		reftable_ref_record_clear(&refs[i]);
     ++		reftable_log_record_clear(&logs[i]);
      +	}
     -+	return sizes;
     ++	clear_dir(dir);
      +}
      +
     -+int reftable_stack_auto_compact(struct reftable_stack *st)
     ++static void test_reftable_stack_hash_id(void)
      +{
     -+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
     -+	struct segment seg =
     -+		suggest_compaction_segment(sizes, st->merged->stack_len);
     -+	reftable_free(sizes);
     -+	if (segment_size(&seg) > 0)
     -+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
     -+						 NULL);
     ++	char dir[256] = "/tmp/stack_test.XXXXXX";
     ++	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_stack *st = NULL;
     ++	int err;
     ++
     ++	struct reftable_ref_record ref = {
     ++		.ref_name = "master",
     ++		.target = "target",
     ++		.update_index = 1,
     ++	};
     ++	struct reftable_write_options cfg32 = { .hash_id = SHA256_ID };
     ++	struct reftable_stack *st32 = NULL;
     ++	struct reftable_write_options cfg_default = { 0 };
     ++	struct reftable_stack *st_default = NULL;
     ++	struct reftable_ref_record dest = { 0 };
     ++
     ++	assert(mkdtemp(dir));
     ++	err = reftable_new_stack(&st, dir, cfg);
     ++	assert_err(err);
     ++
     ++	err = reftable_stack_add(st, &write_test_ref, &ref);
     ++	assert_err(err);
     ++
     ++	/* can't read it with the wrong hash ID. */
     ++	err = reftable_new_stack(&st32, dir, cfg32);
     ++	assert(err == REFTABLE_FORMAT_ERROR);
     ++
     ++	/* check that we can read it back with default config too. */
     ++	err = reftable_new_stack(&st_default, dir, cfg_default);
     ++	assert_err(err);
     ++
     ++	err = reftable_stack_read_ref(st_default, "master", &dest);
     ++	assert_err(err);
     ++
     ++	assert(!strcmp(dest.target, ref.target));
     ++	reftable_ref_record_clear(&dest);
     ++	reftable_stack_destroy(st);
     ++	reftable_stack_destroy(st_default);
     ++	clear_dir(dir);
     ++}
      +
     -+	return 0;
     ++static void test_log2(void)
     ++{
     ++	assert(1 == fastlog2(3));
     ++	assert(2 == fastlog2(4));
     ++	assert(2 == fastlog2(5));
      +}
      +
     -+struct reftable_compaction_stats *
     -+reftable_stack_compaction_stats(struct reftable_stack *st)
     ++static void test_sizes_to_segments(void)
      +{
     -+	return &st->stats;
     ++	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
     ++	/* .................0  1  2  3  4  5 */
     ++
     ++	int seglen = 0;
     ++	struct segment *segs =
     ++		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
     ++	assert(segs[2].log == 3);
     ++	assert(segs[2].start == 5);
     ++	assert(segs[2].end == 6);
     ++
     ++	assert(segs[1].log == 2);
     ++	assert(segs[1].start == 2);
     ++	assert(segs[1].end == 5);
     ++	reftable_free(segs);
      +}
      +
     -+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
     -+			    struct reftable_ref_record *ref)
     ++static void test_sizes_to_segments_empty(void)
      +{
     -+	struct reftable_table tab = { NULL };
     -+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
     -+	return reftable_table_read_ref(&tab, refname, ref);
     ++	uint64_t sizes[0];
     ++
     ++	int seglen = 0;
     ++	struct segment *segs =
     ++		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
     ++	assert(seglen == 0);
     ++	reftable_free(segs);
      +}
      +
     -+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
     -+			    struct reftable_log_record *log)
     ++static void test_sizes_to_segments_all_equal(void)
      +{
     -+	struct reftable_iterator it = { 0 };
     -+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
     -+	int err = reftable_merged_table_seek_log(mt, &it, refname);
     -+	if (err)
     -+		goto done;
     ++	uint64_t sizes[] = { 5, 5 };
      +
     -+	err = reftable_iterator_next_log(&it, log);
     -+	if (err)
     -+		goto done;
     ++	int seglen = 0;
     ++	struct segment *segs =
     ++		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
     ++	assert(seglen == 1);
     ++	assert(segs[0].start == 0);
     ++	assert(segs[0].end == 2);
     ++	reftable_free(segs);
     ++}
      +
     -+	if (strcmp(log->ref_name, refname) ||
     -+	    reftable_log_record_is_deletion(log)) {
     -+		err = 1;
     -+		goto done;
     -+	}
     ++static void test_suggest_compaction_segment(void)
     ++{
     ++	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
     ++	/* .................0    1    2  3   4  5  6 */
     ++	struct segment min =
     ++		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
     ++	assert(min.start == 2);
     ++	assert(min.end == 7);
     ++}
      +
     -+done:
     -+	if (err) {
     -+		reftable_log_record_clear(log);
     -+	}
     -+	reftable_iterator_destroy(&it);
     -+	return err;
     ++static void test_suggest_compaction_segment_nothing(void)
     ++{
     ++	uint64_t sizes[] = { 64, 32, 16, 8, 4, 2 };
     ++	struct segment result =
     ++		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
     ++	assert(result.start == result.end);
      +}
      +
     -+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
     ++static void test_reflog_expire(void)
      +{
     -+	int err = 0;
     -+	struct reftable_block_source src = { 0 };
     -+	struct reftable_reader *rd = NULL;
     -+	struct reftable_table tab = { NULL };
     -+	struct reftable_ref_record *refs = NULL;
     -+	struct reftable_iterator it = { NULL };
     -+	int cap = 0;
     -+	int len = 0;
     ++	char dir[256] = "/tmp/stack.test_reflog_expire.XXXXXX";
     ++	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_stack *st = NULL;
     ++	struct reftable_log_record logs[20] = { 0 };
     ++	int N = ARRAY_SIZE(logs) - 1;
      +	int i = 0;
     ++	int err;
     ++	struct reftable_log_expiry_config expiry = {
     ++		.time = 10,
     ++	};
     ++	struct reftable_log_record log = { 0 };
      +
     -+	if (st->config.skip_name_check)
     -+		return 0;
     ++	assert(mkdtemp(dir));
      +
     -+	err = reftable_block_source_from_file(&src, new_tab_name);
     -+	if (err < 0)
     -+		goto done;
     ++	err = reftable_new_stack(&st, dir, cfg);
     ++	assert_err(err);
      +
     -+	err = reftable_new_reader(&rd, &src, new_tab_name);
     -+	if (err < 0)
     -+		goto done;
     ++	for (i = 1; i <= N; i++) {
     ++		char buf[256];
     ++		snprintf(buf, sizeof(buf), "branch%02d", i);
      +
     -+	err = reftable_reader_seek_ref(rd, &it, "");
     -+	if (err > 0) {
     -+		err = 0;
     -+		goto done;
     ++		logs[i].ref_name = xstrdup(buf);
     ++		logs[i].update_index = i;
     ++		logs[i].time = i;
     ++		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
     ++		logs[i].email = xstrdup("identity@invalid");
     ++		set_test_hash(logs[i].new_hash, i);
      +	}
     -+	if (err < 0)
     -+		goto done;
      +
     -+	while (true) {
     -+		struct reftable_ref_record ref = { 0 };
     -+		err = reftable_iterator_next_ref(&it, &ref);
     -+		if (err > 0) {
     -+			break;
     -+		}
     -+		if (err < 0)
     -+			goto done;
     ++	for (i = 1; i <= N; i++) {
     ++		struct write_log_arg arg = {
     ++			.log = &logs[i],
     ++			.update_index = reftable_stack_next_update_index(st),
     ++		};
     ++		int err = reftable_stack_add(st, &write_test_log, &arg);
     ++		assert_err(err);
     ++	}
      +
     -+		if (len >= cap) {
     -+			cap = 2 * cap + 1;
     -+			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
     -+		}
     ++	err = reftable_stack_compact_all(st, NULL);
     ++	assert_err(err);
      +
     -+		refs[len++] = ref;
     -+	}
     ++	err = reftable_stack_compact_all(st, &expiry);
     ++	assert_err(err);
      +
     -+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
     ++	err = reftable_stack_read_log(st, logs[9].ref_name, &log);
     ++	assert(err == 1);
      +
     -+	err = validate_ref_record_addition(tab, refs, len);
     ++	err = reftable_stack_read_log(st, logs[11].ref_name, &log);
     ++	assert_err(err);
      +
     -+done:
     -+	for (i = 0; i < len; i++) {
     -+		reftable_ref_record_clear(&refs[i]);
     -+	}
     ++	expiry.min_update_index = 15;
     ++	err = reftable_stack_compact_all(st, &expiry);
     ++	assert_err(err);
      +
     -+	free(refs);
     -+	reftable_iterator_destroy(&it);
     -+	reftable_reader_free(rd);
     -+	return err;
     -+}
     -
     - ## reftable/stack.h (new) ##
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     ++	err = reftable_stack_read_log(st, logs[14].ref_name, &log);
     ++	assert(err == 1);
      +
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     ++	err = reftable_stack_read_log(st, logs[16].ref_name, &log);
     ++	assert_err(err);
      +
     -+#ifndef STACK_H
     -+#define STACK_H
     ++	/* cleanup */
     ++	reftable_stack_destroy(st);
     ++	for (i = 0; i <= N; i++) {
     ++		reftable_log_record_clear(&logs[i]);
     ++	}
     ++	clear_dir(dir);
     ++	reftable_log_record_clear(&log);
     ++}
      +
     -+#include "reftable.h"
     -+#include "system.h"
     ++static int write_nothing(struct reftable_writer *wr, void *arg)
     ++{
     ++	reftable_writer_set_limits(wr, 1, 1);
     ++	return 0;
     ++}
      +
     -+struct reftable_stack {
     -+	char *list_file;
     -+	char *reftable_dir;
     -+	bool disable_auto_compact;
     ++static void test_empty_add(void)
     ++{
     ++	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_stack *st = NULL;
     ++	int err;
     ++	char dir[256] = "/tmp/stack_test.XXXXXX";
     ++	struct reftable_stack *st2 = NULL;
      +
     -+	struct reftable_write_options config;
     ++	assert(mkdtemp(dir));
      +
     -+	struct reftable_merged_table *merged;
     -+	struct reftable_compaction_stats stats;
     -+};
     ++	err = reftable_new_stack(&st, dir, cfg);
     ++	assert_err(err);
      +
     -+int read_lines(const char *filename, char ***lines);
     -+int stack_try_add(struct reftable_stack *st,
     -+		  int (*write_table)(struct reftable_writer *wr, void *arg),
     -+		  void *arg);
     -+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
     -+			int first, int last,
     -+			struct reftable_log_expiry_config *config);
     -+int fastlog2(uint64_t sz);
     -+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
     -+void reftable_addition_close(struct reftable_addition *add);
     -+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
     -+				      bool reuse_open);
     ++	err = reftable_stack_add(st, &write_nothing, NULL);
     ++	assert_err(err);
      +
     -+struct segment {
     -+	int start, end;
     -+	int log;
     -+	uint64_t bytes;
     -+};
     ++	err = reftable_new_stack(&st2, dir, cfg);
     ++	assert_err(err);
     ++	clear_dir(dir);
     ++	reftable_stack_destroy(st);
     ++	reftable_stack_destroy(st2);
     ++}
      +
     -+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
     -+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
     ++static void test_reftable_stack_auto_compaction(void)
     ++{
     ++	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_stack *st = NULL;
     ++	char dir[256] = "/tmp/stack_test.XXXXXX";
     ++	int err, i;
     ++	int N = 100;
     ++	assert(mkdtemp(dir));
      +
     -+#endif
     ++	err = reftable_new_stack(&st, dir, cfg);
     ++	assert_err(err);
     ++
     ++	for (i = 0; i < N; i++) {
     ++		char name[100];
     ++		struct reftable_ref_record ref = {
     ++			.ref_name = name,
     ++			.update_index = reftable_stack_next_update_index(st),
     ++			.target = "master",
     ++		};
     ++		snprintf(name, sizeof(name), "branch%04d", i);
     ++
     ++		err = reftable_stack_add(st, &write_test_ref, &ref);
     ++		assert_err(err);
     ++
     ++		assert(i < 3 || st->merged->stack_len < 2 * fastlog2(i));
     ++	}
     ++
     ++	assert(reftable_stack_compaction_stats(st)->entries_written <
     ++	       (uint64_t)(N * fastlog2(N)));
     ++
     ++	reftable_stack_destroy(st);
     ++	clear_dir(dir);
     ++}
     ++
     ++int stack_test_main(int argc, const char *argv[])
     ++{
     ++	add_test_case("test_reftable_stack_uptodate",
     ++		      &test_reftable_stack_uptodate);
     ++	add_test_case("test_reftable_stack_transaction_api",
     ++		      &test_reftable_stack_transaction_api);
     ++	add_test_case("test_reftable_stack_hash_id",
     ++		      &test_reftable_stack_hash_id);
     ++	add_test_case("test_sizes_to_segments_all_equal",
     ++		      &test_sizes_to_segments_all_equal);
     ++	add_test_case("test_reftable_stack_auto_compaction",
     ++		      &test_reftable_stack_auto_compaction);
     ++	add_test_case("test_reftable_stack_validate_refname",
     ++		      &test_reftable_stack_validate_refname);
     ++	add_test_case("test_reftable_stack_update_index_check",
     ++		      &test_reftable_stack_update_index_check);
     ++	add_test_case("test_reftable_stack_lock_failure",
     ++		      &test_reftable_stack_lock_failure);
     ++	add_test_case("test_reftable_stack_tombstone",
     ++		      &test_reftable_stack_tombstone);
     ++	add_test_case("test_reftable_stack_add_one",
     ++		      &test_reftable_stack_add_one);
     ++	add_test_case("test_empty_add", test_empty_add);
     ++	add_test_case("test_reflog_expire", test_reflog_expire);
     ++	add_test_case("test_suggest_compaction_segment",
     ++		      &test_suggest_compaction_segment);
     ++	add_test_case("test_suggest_compaction_segment_nothing",
     ++		      &test_suggest_compaction_segment_nothing);
     ++	add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
     ++	add_test_case("test_sizes_to_segments_empty",
     ++		      &test_sizes_to_segments_empty);
     ++	add_test_case("test_log2", &test_log2);
     ++	add_test_case("test_parse_names", &test_parse_names);
     ++	add_test_case("test_read_file", &test_read_file);
     ++	add_test_case("test_names_equal", &test_names_equal);
     ++	add_test_case("test_reftable_stack_add", &test_reftable_stack_add);
     ++	return test_main(argc, argv);
     ++}
      
       ## reftable/system.h (new) ##
      @@
     @@ reftable/system.h (new)
      +			       const Bytef *source, uLong *sourceLen);
      +int hash_size(uint32_t id);
      +
     ++#endif
     +
     + ## reftable/test_framework.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "test_framework.h"
     ++
     ++#include "system.h"
     ++#include "basics.h"
     ++#include "constants.h"
     ++
     ++struct test_case **test_cases;
     ++int test_case_len;
     ++int test_case_cap;
     ++
     ++struct test_case *new_test_case(const char *name, void (*testfunc)(void))
     ++{
     ++	struct test_case *tc = reftable_malloc(sizeof(struct test_case));
     ++	tc->name = name;
     ++	tc->testfunc = testfunc;
     ++	return tc;
     ++}
     ++
     ++struct test_case *add_test_case(const char *name, void (*testfunc)(void))
     ++{
     ++	struct test_case *tc = new_test_case(name, testfunc);
     ++	if (test_case_len == test_case_cap) {
     ++		test_case_cap = 2 * test_case_cap + 1;
     ++		test_cases = reftable_realloc(
     ++			test_cases, sizeof(struct test_case) * test_case_cap);
     ++	}
     ++
     ++	test_cases[test_case_len++] = tc;
     ++	return tc;
     ++}
     ++
     ++int test_main(int argc, const char *argv[])
     ++{
     ++	const char *filter = NULL;
     ++	int i = 0;
     ++	if (argc > 1) {
     ++		filter = argv[1];
     ++	}
     ++
     ++	for (i = 0; i < test_case_len; i++) {
     ++		const char *name = test_cases[i]->name;
     ++		if (filter == NULL || strstr(name, filter) != NULL) {
     ++			printf("case %s\n", name);
     ++			test_cases[i]->testfunc();
     ++		} else {
     ++			printf("skip %s\n", name);
     ++		}
     ++
     ++		reftable_free(test_cases[i]);
     ++	}
     ++	reftable_free(test_cases);
     ++	test_cases = 0;
     ++	test_case_len = 0;
     ++	test_case_cap = 0;
     ++	return 0;
     ++}
     ++
     ++void set_test_hash(byte *p, int i)
     ++{
     ++	memset(p, (byte)i, hash_size(SHA1_ID));
     ++}
     +
     + ## reftable/test_framework.h (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#ifndef TEST_FRAMEWORK_H
     ++#define TEST_FRAMEWORK_H
     ++
     ++#include "system.h"
     ++
     ++#include "reftable.h"
     ++
     ++#ifdef NDEBUG
     ++#undef NDEBUG
     ++#endif
     ++
     ++#include "system.h"
     ++
     ++#ifdef assert
     ++#undef assert
     ++#endif
     ++
     ++#define assert_err(c)                                                  \
     ++	if (c != 0) {                                                  \
     ++		fflush(stderr);                                        \
     ++		fflush(stdout);                                        \
     ++		fprintf(stderr, "%s: %d: error == %d (%s), want 0\n",  \
     ++			__FILE__, __LINE__, c, reftable_error_str(c)); \
     ++		abort();                                               \
     ++	}
     ++
     ++#define assert_streq(a, b)                                               \
     ++	if (strcmp(a, b)) {                                              \
     ++		fflush(stderr);                                          \
     ++		fflush(stdout);                                          \
     ++		fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, \
     ++			__LINE__, #a, a, #b, b);                         \
     ++		abort();                                                 \
     ++	}
     ++
     ++#define assert(c)                                                          \
     ++	if (!(c)) {                                                        \
     ++		fflush(stderr);                                            \
     ++		fflush(stdout);                                            \
     ++		fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, \
     ++			__LINE__, #c);                                     \
     ++		abort();                                                   \
     ++	}
     ++
     ++struct test_case {
     ++	const char *name;
     ++	void (*testfunc)(void);
     ++};
     ++
     ++struct test_case *new_test_case(const char *name, void (*testfunc)(void));
     ++struct test_case *add_test_case(const char *name, void (*testfunc)(void));
     ++int test_main(int argc, const char *argv[]);
     ++
     ++void set_test_hash(byte *p, int i);
     ++
      +#endif
      
       ## reftable/tree.c (new) ##
     @@ reftable/tree.h (new)
      +
      +#endif
      
     + ## reftable/tree_test.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "tree.h"
     ++
     ++#include "basics.h"
     ++#include "record.h"
     ++#include "reftable.h"
     ++#include "test_framework.h"
     ++#include "reftable-tests.h"
     ++
     ++static int test_compare(const void *a, const void *b)
     ++{
     ++	return (char *)a - (char *)b;
     ++}
     ++
     ++struct curry {
     ++	void *last;
     ++};
     ++
     ++static void check_increasing(void *arg, void *key)
     ++{
     ++	struct curry *c = (struct curry *)arg;
     ++	if (c->last != NULL) {
     ++		assert(test_compare(c->last, key) < 0);
     ++	}
     ++	c->last = key;
     ++}
     ++
     ++static void test_tree(void)
     ++{
     ++	struct tree_node *root = NULL;
     ++
     ++	void *values[11] = { 0 };
     ++	struct tree_node *nodes[11] = { 0 };
     ++	int i = 1;
     ++	struct curry c = { 0 };
     ++	do {
     ++		nodes[i] = tree_search(values + i, &root, &test_compare, 1);
     ++		i = (i * 7) % 11;
     ++	} while (i != 1);
     ++
     ++	for (i = 1; i < ARRAY_SIZE(nodes); i++) {
     ++		assert(values + i == nodes[i]->key);
     ++		assert(nodes[i] ==
     ++		       tree_search(values + i, &root, &test_compare, 0));
     ++	}
     ++
     ++	infix_walk(root, check_increasing, &c);
     ++	tree_free(root);
     ++}
     ++
     ++int tree_test_main(int argc, const char *argv[])
     ++{
     ++	add_test_case("test_tree", &test_tree);
     ++	return test_main(argc, argv);
     ++}
     +
       ## reftable/update.sh (new) ##
      @@
      +#!/bin/sh
     @@ reftable/update.sh (new)
      +SRC=${SRC:-origin}
      +BRANCH=${BRANCH:-master}
      +
     -+((git --git-dir reftable-repo/.git fetch ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
     ++((git --git-dir reftable-repo/.git fetch -f ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
      +   git clone https://github.com/google/reftable reftable-repo)
      +
      +cp reftable-repo/c/*.[ch] reftable/
      +cp reftable-repo/c/include/*.[ch] reftable/
      +cp reftable-repo/LICENSE reftable/
     -+git --git-dir reftable-repo/.git show --no-patch HEAD \
     ++
     ++git --git-dir reftable-repo/.git show --no-patch --format=oneline HEAD \
      +  > reftable/VERSION
      +
      +mv reftable/system.h reftable/system.h~
      +sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
      +
     -+# Remove unittests and compatibility hacks we don't need here.  
     -+rm reftable/*_test.c reftable/test_framework.* reftable/compat.*
     ++# Remove compatibility hacks we don't need here.
     ++rm reftable/compat.*
      +
     -+git add reftable/*.[ch] reftable/LICENSE reftable/VERSION 
     ++git add reftable/*.[ch] reftable/LICENSE reftable/VERSION
      
       ## reftable/writer.c (new) ##
      @@
     @@ reftable/writer.c (new)
      +
      +static int obj_index_tree_node_compare(const void *a, const void *b)
      +{
     -+	return slice_cmp(((const struct obj_index_tree_node *)a)->hash,
     -+			 ((const struct obj_index_tree_node *)b)->hash);
     ++	return slice_cmp(&((const struct obj_index_tree_node *)a)->hash,
     ++			 &((const struct obj_index_tree_node *)b)->hash);
      +}
      +
     -+static void writer_index_hash(struct reftable_writer *w, struct slice hash)
     ++static void writer_index_hash(struct reftable_writer *w, struct slice *hash)
      +{
      +	uint64_t off = w->next;
      +
     -+	struct obj_index_tree_node want = { .hash = hash };
     ++	struct obj_index_tree_node want = { .hash = *hash };
      +
      +	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
      +					     &obj_index_tree_node_compare, 0);
     @@ reftable/writer.c (new)
      +	struct slice key = SLICE_INIT;
      +	int err = 0;
      +	reftable_record_key(rec, &key);
     -+	if (slice_cmp(w->last_key, key) >= 0)
     ++	if (slice_cmp(&w->last_key, &key) >= 0)
      +		goto done;
      +
     -+	slice_copy(&w->last_key, key);
     ++	slice_copy(&w->last_key, &key);
      +	if (w->block_writer == NULL) {
      +		writer_reinit_block_writer(w, reftable_record_type(rec));
      +	}
     @@ reftable/writer.c (new)
      +			.canary = SLICE_CANARY,
      +		};
      +
     -+		writer_index_hash(w, h);
     ++		writer_index_hash(w, &h);
      +	}
      +
      +	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
     @@ reftable/writer.c (new)
      +			.len = hash_size(w->opts.hash_id),
      +			.canary = SLICE_CANARY,
      +		};
     -+		writer_index_hash(w, h);
     ++		writer_index_hash(w, &h);
      +	}
      +	return 0;
      +}
     @@ reftable/writer.c (new)
      +	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
      +	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
      +	if (arg->last != NULL) {
     -+		int n = common_prefix_size(entry->hash, *arg->last);
     ++		int n = common_prefix_size(&entry->hash, arg->last);
      +		if (n > arg->max) {
      +			arg->max = n;
      +		}
     @@ reftable/writer.c (new)
      +	}
      +
      +	ir.offset = w->next;
     -+	slice_copy(&ir.last_key, w->block_writer->last_key);
     ++	slice_copy(&ir.last_key, &w->block_writer->last_key);
      +	w->index[w->index_len] = ir;
      +
      +	w->index_len++;
 10:  a86c3753717 ! 10:  f3d74b78135 Reftable support for git-core
     @@ builtin/init-db.c: static const char *const init_db_usage[] = {
       int cmd_init_db(int argc, const char **argv, const char *prefix)
       {
       	const char *git_dir;
     -+	const char *ref_storage_format = DEFAULT_REF_STORAGE;
     ++	const char *ref_storage_format = default_ref_storage();
       	const char *real_git_dir = NULL;
       	const char *work_tree;
       	const char *template_dir = NULL;
     @@ refs/reftable-backend.c (new)
      +	QSORT(sorted, transaction->nr, ref_update_cmp);
      +	reftable_writer_set_limits(writer, ts, ts);
      +
     -+
      +	for (i = 0; i < transaction->nr; i++) {
      +		struct ref_update *u = sorted[i];
      +		struct reftable_log_record *log = &logs[i];
     @@ t/t0031-reftable.sh (new)
      +	grep "commit: number 10" output &&
      +	git gc &&
      +	git reflog refs/heads/master >output &&
     -+	test_line_count = 0 output 
     ++	test_line_count = 0 output
      +'
      +
      +# This matches show-ref's output
     @@ t/t0031-reftable.sh (new)
      +		print_ref "refs/heads/master" &&
      +		print_ref "refs/tags/file" &&
      +		print_ref "refs/tags/test_tag" &&
     -+		print_ref "refs/tags/test_tag^{}" 
     ++		print_ref "refs/tags/test_tag^{}"
      +	} >expect &&
      +	git show-ref -d >actual &&
      +	test_cmp expect actual
  -:  ----------- > 11:  3e7868ee409 Hookup unittests for the reftable library.
 11:  d3613c2ff53 = 12:  1f193b6da23 Add GIT_DEBUG_REFS debugging mechanism
 12:  9b98ed614ec = 13:  0471f7e3570 vcxproj: adjust for the reftable changes
 13:  5e401e4f1ac ! 14:  a03c882b075 Add reftable testing infrastructure
     @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
       	if (real_git_dir)
       		git_dir = real_git_dir;
      
     - ## builtin/init-db.c ##
     -@@ builtin/init-db.c: static const char *const init_db_usage[] = {
     - int cmd_init_db(int argc, const char **argv, const char *prefix)
     - {
     - 	const char *git_dir;
     --	const char *ref_storage_format = DEFAULT_REF_STORAGE;
     -+	const char *ref_storage_format = default_ref_storage();
     - 	const char *real_git_dir = NULL;
     - 	const char *work_tree;
     - 	const char *template_dir = NULL;
     -
       ## refs.c ##
      @@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
       	r->refs_private = ref_store_init(r->gitdir,

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v16 01/14] Write pseudorefs through ref backends.
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-09 10:16                                 ` Phillip Wood
  2020-06-05 18:03                               ` [PATCH v16 02/14] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
                                                 ` (14 subsequent siblings)
  15 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Pseudorefs store transient data in in the repository. Examples are HEAD,
CHERRY_PICK_HEAD, etc.

These refs have always been read through the ref backends, but they were written
in a one-off routine that wrote an object ID or symref directly into
.git/<pseudo_ref_name>.

This causes problems when introducing a new ref storage backend. To remedy this,
extend the ref backend implementation with a write_pseudoref_fn and
update_pseudoref_fn.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c                | 119 ++++++++----------------------------------
 refs.h                |  11 ++++
 refs/files-backend.c  | 114 +++++++++++++++++++++++++++++++++++++++-
 refs/packed-backend.c |  21 +++++++-
 refs/refs-internal.h  |  18 +++++++
 5 files changed, 184 insertions(+), 99 deletions(-)

diff --git a/refs.c b/refs.c
index 224ff66c7bb..12908066b13 100644
--- a/refs.c
+++ b/refs.c
@@ -321,6 +321,12 @@ int ref_exists(const char *refname)
 	return refs_ref_exists(get_main_ref_store(the_repository), refname);
 }
 
+int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
+{
+	return refs_delete_pseudoref(get_main_ref_store(the_repository),
+				     pseudoref, old_oid);
+}
+
 static int filter_refs(const char *refname, const struct object_id *oid,
 			   int flags, void *data)
 {
@@ -739,101 +745,6 @@ long get_files_ref_lock_timeout_ms(void)
 	return timeout_ms;
 }
 
-static int write_pseudoref(const char *pseudoref, const struct object_id *oid,
-			   const struct object_id *old_oid, struct strbuf *err)
-{
-	const char *filename;
-	int fd;
-	struct lock_file lock = LOCK_INIT;
-	struct strbuf buf = STRBUF_INIT;
-	int ret = -1;
-
-	if (!oid)
-		return 0;
-
-	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
-
-	filename = git_path("%s", pseudoref);
-	fd = hold_lock_file_for_update_timeout(&lock, filename, 0,
-					       get_files_ref_lock_timeout_ms());
-	if (fd < 0) {
-		strbuf_addf(err, _("could not open '%s' for writing: %s"),
-			    filename, strerror(errno));
-		goto done;
-	}
-
-	if (old_oid) {
-		struct object_id actual_old_oid;
-
-		if (read_ref(pseudoref, &actual_old_oid)) {
-			if (!is_null_oid(old_oid)) {
-				strbuf_addf(err, _("could not read ref '%s'"),
-					    pseudoref);
-				rollback_lock_file(&lock);
-				goto done;
-			}
-		} else if (is_null_oid(old_oid)) {
-			strbuf_addf(err, _("ref '%s' already exists"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		} else if (!oideq(&actual_old_oid, old_oid)) {
-			strbuf_addf(err, _("unexpected object ID when writing '%s'"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		}
-	}
-
-	if (write_in_full(fd, buf.buf, buf.len) < 0) {
-		strbuf_addf(err, _("could not write to '%s'"), filename);
-		rollback_lock_file(&lock);
-		goto done;
-	}
-
-	commit_lock_file(&lock);
-	ret = 0;
-done:
-	strbuf_release(&buf);
-	return ret;
-}
-
-static int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
-{
-	const char *filename;
-
-	filename = git_path("%s", pseudoref);
-
-	if (old_oid && !is_null_oid(old_oid)) {
-		struct lock_file lock = LOCK_INIT;
-		int fd;
-		struct object_id actual_old_oid;
-
-		fd = hold_lock_file_for_update_timeout(
-				&lock, filename, 0,
-				get_files_ref_lock_timeout_ms());
-		if (fd < 0) {
-			error_errno(_("could not open '%s' for writing"),
-				    filename);
-			return -1;
-		}
-		if (read_ref(pseudoref, &actual_old_oid))
-			die(_("could not read ref '%s'"), pseudoref);
-		if (!oideq(&actual_old_oid, old_oid)) {
-			error(_("unexpected object ID when deleting '%s'"),
-			      pseudoref);
-			rollback_lock_file(&lock);
-			return -1;
-		}
-
-		unlink(filename);
-		rollback_lock_file(&lock);
-	} else {
-		unlink(filename);
-	}
-
-	return 0;
-}
 
 int refs_delete_ref(struct ref_store *refs, const char *msg,
 		    const char *refname,
@@ -845,7 +756,7 @@ int refs_delete_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		return delete_pseudoref(refname, old_oid);
+		return refs_delete_pseudoref(refs, refname, old_oid);
 	}
 
 	transaction = ref_store_transaction_begin(refs, &err);
@@ -1172,7 +1083,8 @@ int refs_update_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		ret = write_pseudoref(refname, new_oid, old_oid, &err);
+		ret = refs_write_pseudoref(refs, refname, new_oid, old_oid,
+					   &err);
 	} else {
 		t = ref_store_transaction_begin(refs, &err);
 		if (!t ||
@@ -1441,6 +1353,19 @@ int head_ref(each_ref_fn fn, void *cb_data)
 	return refs_head_ref(get_main_ref_store(the_repository), fn, cb_data);
 }
 
+int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			 const struct object_id *oid,
+			 const struct object_id *old_oid, struct strbuf *err)
+{
+	return refs->be->write_pseudoref(refs, pseudoref, oid, old_oid, err);
+}
+
+int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			  const struct object_id *old_oid)
+{
+	return refs->be->delete_pseudoref(refs, pseudoref, old_oid);
+}
+
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
 		const char *prefix, int trim, int flags)
diff --git a/refs.h b/refs.h
index e010f8aec28..4dad8f24914 100644
--- a/refs.h
+++ b/refs.h
@@ -732,6 +732,17 @@ int update_ref(const char *msg, const char *refname,
 	       const struct object_id *new_oid, const struct object_id *old_oid,
 	       unsigned int flags, enum action_on_err onerr);
 
+/* Pseudorefs (eg. HEAD, CHERRY_PICK_HEAD) have a separate routines for updating
+   and deletion as they cannot take part in normal transactional updates.
+   Pseudorefs should only be written for the main repository.
+*/
+int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			 const struct object_id *oid,
+			 const struct object_id *old_oid, struct strbuf *err);
+int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			  const struct object_id *old_oid);
+int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid);
+
 int parse_hide_refs_config(const char *var, const char *value, const char *);
 
 /*
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6516c7bc8c8..df7553f4cc3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -731,6 +731,115 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	return ret;
 }
 
+static int files_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "write_pseudoref");
+	int fd;
+	struct lock_file lock = LOCK_INIT;
+	struct strbuf filename = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = -1;
+
+	if (!oid)
+		return 0;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+	fd = hold_lock_file_for_update_timeout(&lock, filename.buf, 0,
+					       get_files_ref_lock_timeout_ms());
+	if (fd < 0) {
+		strbuf_addf(err, _("could not open '%s' for writing: %s"),
+			    buf.buf, strerror(errno));
+		goto done;
+	}
+
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(err, _("could not read ref '%s'"),
+					    pseudoref);
+				rollback_lock_file(&lock);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(err, _("ref '%s' already exists"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(err,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+	}
+
+	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
+	if (write_in_full(fd, buf.buf, buf.len) < 0) {
+		strbuf_addf(err, _("could not write to '%s'"), filename.buf);
+		rollback_lock_file(&lock);
+		goto done;
+	}
+
+	commit_lock_file(&lock);
+	ret = 0;
+done:
+	strbuf_release(&buf);
+	strbuf_release(&filename);
+	return ret;
+}
+
+static int files_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "delete_pseudoref");
+	struct strbuf filename = STRBUF_INIT;
+	int ret = -1;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+
+	if (old_oid && !is_null_oid(old_oid)) {
+		struct lock_file lock = LOCK_INIT;
+		int fd;
+		struct object_id actual_old_oid;
+
+		fd = hold_lock_file_for_update_timeout(
+			&lock, filename.buf, 0,
+			get_files_ref_lock_timeout_ms());
+		if (fd < 0) {
+			error_errno(_("could not open '%s' for writing"),
+				    filename.buf);
+			goto done;
+		}
+		if (read_ref(pseudoref, &actual_old_oid))
+			die(_("could not read ref '%s'"), pseudoref);
+		if (!oideq(&actual_old_oid, old_oid)) {
+			error(_("unexpected object ID when deleting '%s'"),
+			      pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+
+		unlink(filename.buf);
+		rollback_lock_file(&lock);
+	} else {
+		unlink(filename.buf);
+	}
+	ret = 0;
+done:
+	strbuf_release(&filename);
+	return ret;
+}
+
 struct files_ref_iterator {
 	struct ref_iterator base;
 
@@ -3189,6 +3298,9 @@ struct ref_storage_be refs_be_files = {
 	files_rename_ref,
 	files_copy_ref,
 
+	files_write_pseudoref,
+	files_delete_pseudoref,
+
 	files_ref_iterator_begin,
 	files_read_raw_ref,
 
@@ -3198,5 +3310,5 @@ struct ref_storage_be refs_be_files = {
 	files_reflog_exists,
 	files_create_reflog,
 	files_delete_reflog,
-	files_reflog_expire
+	files_reflog_expire,
 };
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 4458a0f69cc..08e8253a893 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1590,6 +1590,22 @@ static int packed_copy_ref(struct ref_store *ref_store,
 	BUG("packed reference store does not support copying references");
 }
 
+static int packed_write_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *oid,
+				  const struct object_id *old_oid,
+				  struct strbuf *err)
+{
+	BUG("packed reference store does not support writing pseudo-references");
+}
+
+static int packed_delete_pseudoref(struct ref_store *ref_store,
+				   const char *pseudoref,
+				   const struct object_id *old_oid)
+{
+	BUG("packed reference store does not support deleting pseudo-references");
+}
+
 static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_store)
 {
 	return empty_ref_iterator_begin();
@@ -1656,6 +1672,9 @@ struct ref_storage_be refs_be_packed = {
 	packed_rename_ref,
 	packed_copy_ref,
 
+	packed_write_pseudoref,
+	packed_delete_pseudoref,
+
 	packed_ref_iterator_begin,
 	packed_read_raw_ref,
 
@@ -1665,5 +1684,5 @@ struct ref_storage_be refs_be_packed = {
 	packed_reflog_exists,
 	packed_create_reflog,
 	packed_delete_reflog,
-	packed_reflog_expire
+	packed_reflog_expire,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 4271362d264..59b053d53a2 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -556,6 +556,21 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
 			  const char *oldref, const char *newref,
 			  const char *logmsg);
 
+typedef int write_pseudoref_fn(struct ref_store *ref_store,
+			       const char *pseudoref,
+			       const struct object_id *oid,
+			       const struct object_id *old_oid,
+			       struct strbuf *err);
+
+/*
+ * Deletes a pseudoref. Deletion always succeeds (even if the pseudoref doesn't
+ * exist.), except if old_oid is specified. If it is, it can fail due to lock
+ * failure, failure reading the old OID, or an OID mismatch
+ */
+typedef int delete_pseudoref_fn(struct ref_store *ref_store,
+				const char *pseudoref,
+				const struct object_id *old_oid);
+
 /*
  * Iterate over the references in `ref_store` whose names start with
  * `prefix`. `prefix` is matched as a literal string, without regard
@@ -655,6 +670,9 @@ struct ref_storage_be {
 	rename_ref_fn *rename_ref;
 	copy_ref_fn *copy_ref;
 
+	write_pseudoref_fn *write_pseudoref;
+	delete_pseudoref_fn *delete_pseudoref;
+
 	ref_iterator_begin_fn *iterator_begin;
 	read_raw_ref_fn *read_raw_ref;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 02/14] Make refs_ref_exists public
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 01/14] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-09 10:36                                 ` Phillip Wood
  2020-06-05 18:03                               ` [PATCH v16 03/14] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
                                                 ` (13 subsequent siblings)
  15 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 2 +-
 refs.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 12908066b13..812fee47108 100644
--- a/refs.c
+++ b/refs.c
@@ -311,7 +311,7 @@ int read_ref(const char *refname, struct object_id *oid)
 	return read_ref_full(refname, RESOLVE_REF_READING, oid, NULL);
 }
 
-static int refs_ref_exists(struct ref_store *refs, const char *refname)
+int refs_ref_exists(struct ref_store *refs, const char *refname)
 {
 	return !!refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING, NULL, NULL);
 }
diff --git a/refs.h b/refs.h
index 4dad8f24914..7aaa1226551 100644
--- a/refs.h
+++ b/refs.h
@@ -105,6 +105,8 @@ int refs_verify_refname_available(struct ref_store *refs,
 				  const struct string_list *skip,
 				  struct strbuf *err);
 
+int refs_ref_exists(struct ref_store *refs, const char *refname);
+
 int ref_exists(const char *refname);
 
 int should_autocreate_reflog(const char *refname);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 03/14] Treat BISECT_HEAD as a pseudo ref
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 01/14] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 02/14] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 04/14] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
                                                 ` (12 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Both the git-bisect.sh as bisect--helper inspected the file system directly.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/bisect--helper.c | 3 +--
 git-bisect.sh            | 4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index ec4996282e3..73f9324ad7d 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -13,7 +13,6 @@ static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
 static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
 static GIT_PATH_FUNC(git_path_bisect_ancestors_ok, "BISECT_ANCESTORS_OK")
 static GIT_PATH_FUNC(git_path_bisect_start, "BISECT_START")
-static GIT_PATH_FUNC(git_path_bisect_head, "BISECT_HEAD")
 static GIT_PATH_FUNC(git_path_bisect_log, "BISECT_LOG")
 static GIT_PATH_FUNC(git_path_head_name, "head-name")
 static GIT_PATH_FUNC(git_path_bisect_names, "BISECT_NAMES")
@@ -164,7 +163,7 @@ static int bisect_reset(const char *commit)
 		strbuf_addstr(&branch, commit);
 	}
 
-	if (!file_exists(git_path_bisect_head())) {
+	if (!ref_exists("BISECT_HEAD")) {
 		struct argv_array argv = ARGV_ARRAY_INIT;
 
 		argv_array_pushl(&argv, "checkout", branch.buf, "--", NULL);
diff --git a/git-bisect.sh b/git-bisect.sh
index 08a6ed57ddb..f03fbb18f00 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -41,7 +41,7 @@ TERM_GOOD=good
 
 bisect_head()
 {
-	if test -f "$GIT_DIR/BISECT_HEAD"
+	if git rev-parse --verify -q BISECT_HEAD > /dev/null
 	then
 		echo BISECT_HEAD
 	else
@@ -153,7 +153,7 @@ bisect_next() {
 	git bisect--helper --bisect-next-check $TERM_GOOD $TERM_BAD $TERM_GOOD|| exit
 
 	# Perform all bisection computation, display and checkout
-	git bisect--helper --next-all $(test -f "$GIT_DIR/BISECT_HEAD" && echo --no-checkout)
+	git bisect--helper --next-all $(git rev-parse --verify -q BISECT_HEAD > /dev/null && echo --no-checkout)
 	res=$?
 
 	# Check if we should exit because bisection is finished
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 04/14] Treat CHERRY_PICK_HEAD as a pseudo ref
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (2 preceding siblings ...)
  2020-06-05 18:03                               ` [PATCH v16 03/14] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 05/14] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
                                                 ` (11 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Check for existence and delete CHERRY_PICK_HEAD through pseudo ref functions.
This will help cherry-pick work with alternate ref storage backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/commit.c | 34 +++++++++++++++++++---------------
 builtin/merge.c  |  2 +-
 path.c           |  1 -
 path.h           |  7 ++++---
 sequencer.c      | 42 ++++++++++++++++++++++++++----------------
 wt-status.c      |  4 ++--
 6 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index d1b7396052a..e27120b982b 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -847,21 +847,25 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 			if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
 				!merge_contains_scissors)
 				wt_status_add_cut_line(s->fp);
-			status_printf_ln(s, GIT_COLOR_NORMAL,
-			    whence == FROM_MERGE
-				? _("\n"
-					"It looks like you may be committing a merge.\n"
-					"If this is not correct, please remove the file\n"
-					"	%s\n"
-					"and try again.\n")
-				: _("\n"
-					"It looks like you may be committing a cherry-pick.\n"
-					"If this is not correct, please remove the file\n"
-					"	%s\n"
-					"and try again.\n"),
-				whence == FROM_MERGE ?
-					git_path_merge_head(the_repository) :
-					git_path_cherry_pick_head(the_repository));
+			if (whence == FROM_MERGE)
+				status_printf_ln(
+					s, GIT_COLOR_NORMAL,
+
+					_("\n"
+					  "It looks like you may be committing a merge.\n"
+					  "If this is not correct, please remove the file\n"
+					  "	%s\n"
+					  "and try again.\n"),
+					git_path_merge_head(the_repository));
+			else
+				status_printf_ln(
+					s, GIT_COLOR_NORMAL,
+
+					_("\n"
+					  "It looks like you may be committing a cherry-pick.\n"
+					  "If this is not correct, please run\n"
+					  "	git cherry-pick --abort\n"
+					  "and try again.\n"));
 		}
 
 		fprintf(s->fp, "\n");
diff --git a/builtin/merge.c b/builtin/merge.c
index 7da707bf55d..93b0a7b6eda 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1352,7 +1352,7 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
 		else
 			die(_("You have not concluded your merge (MERGE_HEAD exists)."));
 	}
-	if (file_exists(git_path_cherry_pick_head(the_repository))) {
+	if (ref_exists("CHERRY_PICK_HEAD")) {
 		if (advice_resolve_conflict)
 			die(_("You have not concluded your cherry-pick (CHERRY_PICK_HEAD exists).\n"
 			    "Please, commit your changes before you merge."));
diff --git a/path.c b/path.c
index 8b2c7531919..783cc2ae819 100644
--- a/path.c
+++ b/path.c
@@ -1528,7 +1528,6 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
-REPO_GIT_PATH_FUNC(cherry_pick_head, "CHERRY_PICK_HEAD")
 REPO_GIT_PATH_FUNC(revert_head, "REVERT_HEAD")
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
diff --git a/path.h b/path.h
index 1f1bf8f87a8..8941c018a99 100644
--- a/path.h
+++ b/path.h
@@ -170,7 +170,6 @@ void report_linked_checkout_garbage(void);
 	}
 
 struct path_cache {
-	const char *cherry_pick_head;
 	const char *revert_head;
 	const char *squash_msg;
 	const char *merge_msg;
@@ -182,9 +181,11 @@ struct path_cache {
 	const char *shallow;
 };
 
-#define PATH_CACHE_INIT { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL }
+#define PATH_CACHE_INIT                                              \
+	{                                                            \
+		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
+	}
 
-const char *git_path_cherry_pick_head(struct repository *r);
 const char *git_path_revert_head(struct repository *r);
 const char *git_path_squash_msg(struct repository *r);
 const char *git_path_merge_msg(struct repository *r);
diff --git a/sequencer.c b/sequencer.c
index fd7701c88a8..26286ec8d08 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -381,7 +381,8 @@ static void print_advice(struct repository *r, int show_hint,
 		 * (typically rebase --interactive) wants to take care
 		 * of the commit itself so remove CHERRY_PICK_HEAD
 		 */
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		return;
 	}
 
@@ -1455,7 +1456,8 @@ static int do_commit(struct repository *r,
 				    author, opts, flags, &oid);
 		strbuf_release(&sb);
 		if (!res) {
-			unlink(git_path_cherry_pick_head(r));
+			refs_delete_pseudoref(get_main_ref_store(r),
+					      "CHERRY_PICK_HEAD", NULL);
 			unlink(git_path_merge_msg(r));
 			if (!is_rebase_i(opts))
 				print_commit_summary(r, NULL, &oid,
@@ -1966,7 +1968,8 @@ static int do_pick_commit(struct repository *r,
 		flags |= ALLOW_EMPTY;
 	} else if (allow == 2) {
 		drop_commit = 1;
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		unlink(git_path_merge_msg(r));
 		fprintf(stderr,
 			_("dropping %s %s -- patch contents already upstream\n"),
@@ -2305,8 +2308,10 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int need_cleanup = 0;
 
-	if (file_exists(git_path_cherry_pick_head(r))) {
-		if (!unlink(git_path_cherry_pick_head(r)) && verbose)
+	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
+		if (!refs_delete_pseudoref(get_main_ref_store(r),
+					   "CHERRY_PICK_HEAD", NULL) &&
+		    verbose)
 			warning(_("cancelling a cherry picking in progress"));
 		opts.action = REPLAY_PICK;
 		need_cleanup = 1;
@@ -2671,8 +2676,9 @@ static int create_seq_dir(struct repository *r)
 	enum replay_action action;
 	const char *in_progress_error = NULL;
 	const char *in_progress_advice = NULL;
-	unsigned int advise_skip = file_exists(git_path_revert_head(r)) ||
-				file_exists(git_path_cherry_pick_head(r));
+	unsigned int advise_skip =
+		file_exists(git_path_revert_head(r)) ||
+		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
 
 	if (!sequencer_get_last_command(r, &action)) {
 		switch (action) {
@@ -2771,7 +2777,7 @@ static int rollback_single_pick(struct repository *r)
 {
 	struct object_id head_oid;
 
-	if (!file_exists(git_path_cherry_pick_head(r)) &&
+	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !file_exists(git_path_revert_head(r)))
 		return error(_("no cherry-pick or revert in progress"));
 	if (read_ref_full("HEAD", 0, &head_oid, NULL))
@@ -2874,7 +2880,8 @@ int sequencer_skip(struct repository *r, struct replay_opts *opts)
 		}
 		break;
 	case REPLAY_PICK:
-		if (!file_exists(git_path_cherry_pick_head(r))) {
+		if (!refs_ref_exists(get_main_ref_store(r),
+				     "CHERRY_PICK_HEAD")) {
 			if (action != REPLAY_PICK)
 				return error(_("no cherry-pick in progress"));
 			if (!rollback_is_safe())
@@ -3569,7 +3576,8 @@ static int do_merge(struct repository *r,
 					oid_to_hex(&j->item->object.oid));
 
 		strbuf_release(&ref_name);
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		rollback_lock_file(&lock);
 
 		rollback_lock_file(&lock);
@@ -4201,7 +4209,7 @@ static int continue_single_pick(struct repository *r)
 {
 	const char *argv[] = { "commit", NULL };
 
-	if (!file_exists(git_path_cherry_pick_head(r)) &&
+	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !file_exists(git_path_revert_head(r)))
 		return error(_("no cherry-pick or revert in progress"));
 	return run_command_v_opt(argv, RUN_GIT_CMD);
@@ -4318,9 +4326,10 @@ static int commit_staged_changes(struct repository *r,
 	}
 
 	if (is_clean) {
-		const char *cherry_pick_head = git_path_cherry_pick_head(r);
-
-		if (file_exists(cherry_pick_head) && unlink(cherry_pick_head))
+		if (refs_ref_exists(get_main_ref_store(r),
+				    "CHERRY_PICK_HEAD") &&
+		    refs_delete_pseudoref(get_main_ref_store(r),
+					  "CHERRY_PICK_HEAD", NULL))
 			return error(_("could not remove CHERRY_PICK_HEAD"));
 		if (!final_fixup)
 			return 0;
@@ -4379,7 +4388,8 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 
 	if (!is_rebase_i(opts)) {
 		/* Verify that the conflict has been resolved */
-		if (file_exists(git_path_cherry_pick_head(r)) ||
+		if (refs_ref_exists(get_main_ref_store(r),
+				    "CHERRY_PICK_HEAD") ||
 		    file_exists(git_path_revert_head(r))) {
 			res = continue_single_pick(r);
 			if (res)
@@ -5442,7 +5452,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
 
 int sequencer_determine_whence(struct repository *r, enum commit_whence *whence)
 {
-	if (file_exists(git_path_cherry_pick_head(r))) {
+	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
 		struct object_id cherry_pick_head, rebase_head;
 
 		if (file_exists(git_path_seq_dir()))
diff --git a/wt-status.c b/wt-status.c
index 98dfa6f73f9..96302be030b 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1636,8 +1636,8 @@ void wt_status_get_state(struct repository *r,
 		state->merge_in_progress = 1;
 	} else if (wt_status_check_rebase(NULL, state)) {
 		;		/* all set */
-	} else if (!stat(git_path_cherry_pick_head(r), &st) &&
-			!get_oid("CHERRY_PICK_HEAD", &oid)) {
+	} else if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
+		   !get_oid("CHERRY_PICK_HEAD", &oid)) {
 		state->cherry_pick_in_progress = 1;
 		oidcpy(&state->cherry_pick_head_oid, &oid);
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 05/14] Treat REVERT_HEAD as a pseudo ref
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (3 preceding siblings ...)
  2020-06-05 18:03                               ` [PATCH v16 04/14] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 06/14] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
                                                 ` (10 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 path.c      |  1 -
 path.h      |  8 +++-----
 sequencer.c | 16 +++++++++-------
 wt-status.c |  2 +-
 4 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/path.c b/path.c
index 783cc2ae819..7b385e5eb28 100644
--- a/path.c
+++ b/path.c
@@ -1528,7 +1528,6 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
-REPO_GIT_PATH_FUNC(revert_head, "REVERT_HEAD")
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
 REPO_GIT_PATH_FUNC(merge_rr, "MERGE_RR")
diff --git a/path.h b/path.h
index 8941c018a99..e7e77da6aaa 100644
--- a/path.h
+++ b/path.h
@@ -170,7 +170,6 @@ void report_linked_checkout_garbage(void);
 	}
 
 struct path_cache {
-	const char *revert_head;
 	const char *squash_msg;
 	const char *merge_msg;
 	const char *merge_rr;
@@ -181,12 +180,11 @@ struct path_cache {
 	const char *shallow;
 };
 
-#define PATH_CACHE_INIT                                              \
-	{                                                            \
-		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
+#define PATH_CACHE_INIT                                        \
+	{                                                      \
+		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
 	}
 
-const char *git_path_revert_head(struct repository *r);
 const char *git_path_squash_msg(struct repository *r);
 const char *git_path_merge_msg(struct repository *r);
 const char *git_path_merge_rr(struct repository *r);
diff --git a/sequencer.c b/sequencer.c
index 26286ec8d08..25ef9fc773d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2317,8 +2317,10 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
 		need_cleanup = 1;
 	}
 
-	if (file_exists(git_path_revert_head(r))) {
-		if (!unlink(git_path_revert_head(r)) && verbose)
+	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
+		if (!refs_delete_pseudoref(get_main_ref_store(r), "REVERT_HEAD",
+					   NULL) &&
+		    verbose)
 			warning(_("cancelling a revert in progress"));
 		opts.action = REPLAY_REVERT;
 		need_cleanup = 1;
@@ -2677,7 +2679,7 @@ static int create_seq_dir(struct repository *r)
 	const char *in_progress_error = NULL;
 	const char *in_progress_advice = NULL;
 	unsigned int advise_skip =
-		file_exists(git_path_revert_head(r)) ||
+		refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD") ||
 		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
 
 	if (!sequencer_get_last_command(r, &action)) {
@@ -2778,7 +2780,7 @@ static int rollback_single_pick(struct repository *r)
 	struct object_id head_oid;
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
-	    !file_exists(git_path_revert_head(r)))
+	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
 	if (read_ref_full("HEAD", 0, &head_oid, NULL))
 		return error(_("cannot resolve HEAD"));
@@ -2872,7 +2874,7 @@ int sequencer_skip(struct repository *r, struct replay_opts *opts)
 	 */
 	switch (opts->action) {
 	case REPLAY_REVERT:
-		if (!file_exists(git_path_revert_head(r))) {
+		if (!refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
 			if (action != REPLAY_REVERT)
 				return error(_("no revert in progress"));
 			if (!rollback_is_safe())
@@ -4210,7 +4212,7 @@ static int continue_single_pick(struct repository *r)
 	const char *argv[] = { "commit", NULL };
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
-	    !file_exists(git_path_revert_head(r)))
+	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
 	return run_command_v_opt(argv, RUN_GIT_CMD);
 }
@@ -4390,7 +4392,7 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 		/* Verify that the conflict has been resolved */
 		if (refs_ref_exists(get_main_ref_store(r),
 				    "CHERRY_PICK_HEAD") ||
-		    file_exists(git_path_revert_head(r))) {
+		    refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
 			res = continue_single_pick(r);
 			if (res)
 				goto release_todo_list;
diff --git a/wt-status.c b/wt-status.c
index 96302be030b..81b768504a1 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1642,7 +1642,7 @@ void wt_status_get_state(struct repository *r,
 		oidcpy(&state->cherry_pick_head_oid, &oid);
 	}
 	wt_status_check_bisect(NULL, state);
-	if (!stat(git_path_revert_head(r), &st) &&
+	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD") &&
 	    !get_oid("REVERT_HEAD", &oid)) {
 		state->revert_in_progress = 1;
 		oidcpy(&state->revert_head_oid, &oid);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 06/14] Move REF_LOG_ONLY to refs-internal.h
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (4 preceding siblings ...)
  2020-06-05 18:03                               ` [PATCH v16 05/14] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 07/14] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
                                                 ` (9 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

REF_LOG_ONLY is used in the transaction preparation: if a symref is involved in
a transaction, the referent of the symref should be updated, and the symref
itself should only be updated in the reflog. Other ref backends will need to
duplicate this logic.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/files-backend.c | 7 -------
 refs/refs-internal.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index df7553f4cc3..141b6b08816 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -38,13 +38,6 @@
  */
 #define REF_NEEDS_COMMIT (1 << 6)
 
-/*
- * Used as a flag in ref_update::flags when we want to log a ref
- * update but not actually perform it.  This is used when a symbolic
- * ref update is split up.
- */
-#define REF_LOG_ONLY (1 << 7)
-
 /*
  * Used as a flag in ref_update::flags when the ref_update was via an
  * update to HEAD.
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 59b053d53a2..dc9e8d3a92b 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -31,6 +31,13 @@ struct ref_transaction;
  */
 #define REF_HAVE_OLD (1 << 3)
 
+/*
+ * Used as a flag in ref_update::flags when we want to log a ref
+ * update but not actually perform it.  This is used when a symbolic
+ * ref update is split up.
+ */
+#define REF_LOG_ONLY (1 << 7)
+
 /*
  * Return the length of time to retry acquiring a loose reference lock
  * before giving up, in milliseconds:
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 07/14] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (5 preceding siblings ...)
  2020-06-05 18:03                               ` [PATCH v16 06/14] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 08/14] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                                 ` (8 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 812fee47108..29e710a29e9 100644
--- a/refs.c
+++ b/refs.c
@@ -1450,7 +1450,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1510,8 +1510,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 08/14] Add .gitattributes for the reftable/ directory
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (6 preceding siblings ...)
  2020-06-05 18:03                               ` [PATCH v16 07/14] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 09/14] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                                 ` (7 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 00000000000..f44451a3795
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 09/14] Add reftable library
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (7 preceding siblings ...)
  2020-06-05 18:03                               ` [PATCH v16 08/14] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 10/14] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                                 ` (6 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. Its rationale and
specification is in the preceding commit.

This imports the upstream library as one big commit. For understanding
the code, it is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* record.{c,h} - reading and writing records

* block.{c,h} - reading and writing blocks.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE          |   31 +
 reftable/README.md        |   11 +
 reftable/VERSION          |    1 +
 reftable/basics.c         |  215 +++++++
 reftable/basics.h         |   53 ++
 reftable/block.c          |  433 +++++++++++++
 reftable/block.h          |  129 ++++
 reftable/block_test.c     |  156 +++++
 reftable/constants.h      |   21 +
 reftable/file.c           |   95 +++
 reftable/iter.c           |  243 ++++++++
 reftable/iter.h           |   72 +++
 reftable/merged.c         |  320 ++++++++++
 reftable/merged.h         |   39 ++
 reftable/merged_test.c    |  273 +++++++++
 reftable/pq.c             |  113 ++++
 reftable/pq.h             |   34 ++
 reftable/reader.c         |  742 +++++++++++++++++++++++
 reftable/reader.h         |   65 ++
 reftable/record.c         | 1113 ++++++++++++++++++++++++++++++++++
 reftable/record.h         |  128 ++++
 reftable/record_test.c    |  400 ++++++++++++
 reftable/refname.c        |  209 +++++++
 reftable/refname.h        |   38 ++
 reftable/refname_test.c   |   99 +++
 reftable/reftable-tests.h |   13 +
 reftable/reftable.c       |   90 +++
 reftable/reftable.h       |  564 +++++++++++++++++
 reftable/reftable_test.c  |  631 +++++++++++++++++++
 reftable/slice.c          |  243 ++++++++
 reftable/slice.h          |   87 +++
 reftable/slice_test.c     |   40 ++
 reftable/stack.c          | 1207 +++++++++++++++++++++++++++++++++++++
 reftable/stack.h          |   48 ++
 reftable/stack_test.c     |  743 +++++++++++++++++++++++
 reftable/system.h         |   54 ++
 reftable/test_framework.c |   69 +++
 reftable/test_framework.h |   64 ++
 reftable/tree.c           |   63 ++
 reftable/tree.h           |   34 ++
 reftable/tree_test.c      |   62 ++
 reftable/update.sh        |   25 +
 reftable/writer.c         |  644 ++++++++++++++++++++
 reftable/writer.h         |   60 ++
 reftable/zlib-compat.c    |   92 +++
 45 files changed, 9866 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/refname_test.c
 create mode 100644 reftable/reftable-tests.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/slice_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..21500563fd4
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,11 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   sh reftable/update.sh
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..c8bb0952506
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1 @@
+0c164275ea8f8aa7d099f7425ddcef1affe137e9 C: compile framework with Git options
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..14c4dfaf5a6
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,215 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)(i & 0xff);
+}
+
+uint32_t get_be24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)(i & 0xff);
+}
+
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
+{
+	size_t lo = 0;
+	size_t hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		size_t mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	static char buf[250];
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return "I/O error";
+	case REFTABLE_FORMAT_ERROR:
+		return "corrupt reftable file";
+	case REFTABLE_NOT_EXIST_ERROR:
+		return "file does not exist";
+	case REFTABLE_LOCK_ERROR:
+		return "data is outdated";
+	case REFTABLE_API_ERROR:
+		return "misuse of the reftable API";
+	case REFTABLE_ZLIB_ERROR:
+		return "zlib failure";
+	case REFTABLE_NAME_CONFLICT:
+		return "file/directory conflict";
+	case REFTABLE_REFNAME_ERROR:
+		return "invalid refname";
+	case -1:
+		return "general error";
+	default:
+		snprintf(buf, sizeof(buf), "unknown error code %d", err);
+		return buf;
+	}
+}
+
+int reftable_error_to_errno(int err)
+{
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return EIO;
+	case REFTABLE_FORMAT_ERROR:
+		return EFAULT;
+	case REFTABLE_NOT_EXIST_ERROR:
+		return ENOENT;
+	case REFTABLE_LOCK_ERROR:
+		return EBUSY;
+	case REFTABLE_API_ERROR:
+		return EINVAL;
+	case REFTABLE_ZLIB_ERROR:
+		return EDOM;
+	default:
+		return ERANGE;
+	}
+}
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..1b003485c9c
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+/* Bigendian en/decoding of integers */
+
+void put_be24(byte *out, uint32_t i);
+uint32_t get_be24(byte *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);
+
+/*
+  Frees a NULL terminated array of malloced strings. The array itself is also
+  freed.
+ */
+void free_names(char **a);
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp);
+
+/* compares two NULL-terminated arrays of strings. */
+int names_equal(char **a, char **b);
+
+/* returns the array size of a NULL-terminated array of strings. */
+int names_length(char **names);
+
+/* Allocation routines; they invoke the functions set through
+ * reftable_set_alloc() */
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..e3bf00d64c4
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,433 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice *key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+	bw->last_key.canary = SLICE_CANARY;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the reftable_record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec)
+{
+	struct slice empty = SLICE_INIT;
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+		.canary = SLICE_CANARY,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = SLICE_INIT;
+	int n = 0;
+
+	reftable_record_key(rec, &key);
+	n = reftable_encode_key(&restart, out, last, key,
+				reftable_record_val_type(rec));
+	if (n < 0)
+		goto done;
+	slice_consume(&out, n);
+
+	n = reftable_record_encode(rec, out, w->hash_size);
+	if (n < 0)
+		goto done;
+	slice_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  &key) < 0)
+		goto done;
+
+	slice_release(&key);
+	return 0;
+
+done:
+	slice_release(&key);
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice *key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next)
+		return -1;
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+
+	slice_copy(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		struct slice compressed = SLICE_INIT;
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		slice_resize(&compressed, src_len);
+
+		while (1) {
+			uLongf dest_len = compressed.len;
+
+			zresult = compress2(compressed.buf, &dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				slice_resize(&compressed, 2 * compressed.len);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				slice_release(&compressed);
+				return REFTABLE_ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed.buf,
+			       dest_len);
+			w->next = dest_len + block_header_skip;
+			slice_release(&compressed);
+			break;
+		}
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct reftable_block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	uint16_t restart_count = 0;
+	uint32_t restart_start = 0;
+	byte *restart_bytes = NULL;
+
+	if (!reftable_is_block_type(typ))
+		return REFTABLE_FORMAT_ERROR;
+
+	if (typ == BLOCK_TYPE_LOG) {
+		struct slice uncompressed = SLICE_INIT;
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip; /* total size of dest
+							    buffer. */
+		uLongf src_len = block->len - block_header_skip;
+
+		/* Log blocks specify the *uncompressed* size in their header.
+		 */
+		slice_resize(&uncompressed, sz);
+
+		/* Copy over the block header verbatim. It's not compressed. */
+		memcpy(uncompressed.buf, block->data, block_header_skip);
+
+		/* Uncompress */
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed.buf + block_header_skip,
+				    &dst_len, block->data + block_header_skip,
+				    &src_len)) {
+			slice_release(&uncompressed);
+			return REFTABLE_ZLIB_ERROR;
+		}
+
+		if (dst_len + block_header_skip != sz)
+			return REFTABLE_FORMAT_ERROR;
+
+		/* We're done with the input data. */
+		reftable_block_done(block);
+		block->data = uncompressed.buf;
+		block->len = sz;
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	restart_count = get_be16(block->data + sz - 2);
+	restart_start = sz - 2 - 3 * restart_count;
+	restart_bytes = block->data + restart_start;
+
+	/* transfer ownership. */
+	br->block = *block;
+	block->data = NULL;
+	block->len = 0;
+
+	br->hash_size = hash_size;
+	br->block_len = restart_start;
+	br->full_block_size = full_block_size;
+	br->header_off = header_off;
+	br->restart_count = restart_count;
+	br->restart_bytes = restart_bytes;
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_resize(&it->last_key, 0);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct slice key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(size_t idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+		.canary = SLICE_CANARY,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = SLICE_INIT;
+	struct slice last_key = SLICE_INIT;
+	byte unused_extra;
+	int n = reftable_decode_key(&rkey, &unused_extra, last_key, in);
+	int result;
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	result = slice_cmp(&a->key, &rkey);
+	slice_release(&rkey);
+	return result;
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_copy(&dest->last_key, &src->last_key);
+}
+
+int block_iter_next(struct block_iter *it, struct reftable_record *rec)
+{
+	struct slice in = {
+		.buf = it->br->block.data + it->next_off,
+		.len = it->br->block_len - it->next_off,
+		.canary = SLICE_CANARY,
+	};
+	struct slice start = in;
+	struct slice key = SLICE_INIT;
+	byte extra = 0;
+	int n = 0;
+
+	if (it->next_off >= it->br->block_len)
+		return 1;
+
+	n = reftable_decode_key(&key, &extra, it->last_key, in);
+	if (n < 0)
+		return -1;
+
+	slice_consume(&in, n);
+	n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
+	if (n < 0)
+		return -1;
+	slice_consume(&in, n);
+
+	slice_copy(&it->last_key, &key);
+	it->next_off += start.len - in.len;
+	slice_release(&key);
+	return 0;
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = SLICE_INIT;
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+		.canary = SLICE_CANARY,
+	};
+
+	byte extra = 0;
+	int n = reftable_decode_key(key, &extra, empty, in);
+	if (n < 0)
+		return n;
+
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice *want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	slice_release(&it->last_key);
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice *want)
+{
+	struct restart_find_args args = {
+		.key = *want,
+		.r = br,
+	};
+	struct reftable_record rec = reftable_new_record(block_reader_type(br));
+	struct slice key = SLICE_INIT;
+	int err = 0;
+	struct block_iter next = {
+		.last_key = SLICE_INIT,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	/* We're looking for the last entry less/equal than the wanted key, so
+	   we have to go one entry too far and then back up.
+	*/
+	while (true) {
+		block_iter_copy_from(&next, it);
+		err = block_iter_next(&next, &rec);
+		if (err < 0)
+			goto done;
+
+		reftable_record_key(&rec, &key);
+		if (err > 0 || slice_cmp(&key, want) >= 0) {
+			err = 0;
+			goto done;
+		}
+
+		block_iter_copy_from(it, &next);
+	}
+
+done:
+	slice_release(&key);
+	slice_release(&next.last_key);
+	reftable_record_destroy(&rec);
+
+	return err;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	slice_release(&bw->last_key);
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..c55f746c5c1
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,129 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+/*
+  Writes reftable blocks. The block_writer is reused across blocks to minimize
+  allocation overhead.
+*/
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+
+	/* Offset ofof the global header. Nonzero in the first block only. */
+	uint32_t header_off;
+
+	/* How often to restart keys. */
+	int restart_interval;
+	int hash_size;
+
+	/* Offset of next byte to write. */
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+
+	struct slice last_key;
+	int entries;
+};
+
+/*
+  initializes the blockwriter to write `typ` entries, using `buf` as temporary
+  storage. `buf` is not owned by the block_writer. */
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+
+/*
+  returns the block type (eg. 'r' for ref records.
+*/
+byte block_writer_type(struct block_writer *bw);
+
+/* appends the record, or -1 if it doesn't fit. */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec);
+
+/* appends the key restarts, and compress the block if necessary. */
+int block_writer_finish(struct block_writer *w);
+
+/* clears out internally allocated block_writer members. */
+void block_writer_clear(struct block_writer *bw);
+
+/* Read a block. */
+struct block_reader {
+	/* offset of the block header; nonzero for the first block in a
+	 * reftable. */
+	uint32_t header_off;
+
+	/* the memory block */
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint16_t restart_count;
+
+	/* size of the data in the file. For log blocks, this is the compressed
+	 * size. */
+	uint32_t full_block_size;
+};
+
+/* Iterate over entries in a block */
+struct block_iter {
+	/* offset within the block of the next entry to read. */
+	uint32_t next_off;
+	struct block_reader *br;
+
+	/* key for last entry we read. */
+	struct slice last_key;
+};
+
+/* initializes a block reader. */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Position `it` at start of the block */
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+
+/* Position `it` to the `want` key in the block */
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice *want);
+
+/* Returns the block type (eg. 'r' for refs) */
+byte block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct reftable_record *rec);
+
+/* Seek to `want` with in the block pointed to by `it` */
+int block_iter_seek(struct block_iter *it, struct slice *want);
+
+/* deallocate memory for `it`. The block reader and its block is left intact. */
+void block_iter_close(struct block_iter *it);
+
+/* size of file header, depending on format version */
+int header_size(int version);
+
+/* size of file footer, depending on format version */
+int footer_size(int version);
+
+/* returns a block to its source. */
+void reftable_block_done(struct reftable_block *ret);
+
+#endif
diff --git a/reftable/block_test.c b/reftable/block_test.c
new file mode 100644
index 00000000000..c1147caa521
--- /dev/null
+++ b/reftable/block_test.c
@@ -0,0 +1,156 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+struct binsearch_args {
+	int key;
+	int *arr;
+};
+
+static int binsearch_func(size_t i, void *void_args)
+{
+	struct binsearch_args *args = (struct binsearch_args *)void_args;
+
+	return args->key < args->arr[i];
+}
+
+static void test_binsearch(void)
+{
+	int arr[] = { 2, 4, 6, 8, 10 };
+	size_t sz = ARRAY_SIZE(arr);
+	struct binsearch_args args = {
+		.arr = arr,
+	};
+
+	int i = 0;
+	for (i = 1; i < 11; i++) {
+		int res;
+		args.key = i;
+		res = binsearch(sz, &binsearch_func, &args);
+
+		if (res < sz) {
+			assert(args.key < arr[res]);
+			if (res > 0) {
+				assert(args.key >= arr[res - 1]);
+			}
+		} else {
+			assert(args.key == 10 || args.key == 11);
+		}
+	}
+}
+
+static void test_block_read_write(void)
+{
+	const int header_off = 21; /* random */
+	char *names[30];
+	const int N = ARRAY_SIZE(names);
+	const int block_size = 1024;
+	struct reftable_block block = { 0 };
+	struct block_writer bw = {
+		.last_key = SLICE_INIT,
+	};
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_record rec = { 0 };
+	int i = 0;
+	int n;
+	struct block_reader br = { 0 };
+	struct block_iter it = { .last_key = SLICE_INIT };
+	int j = 0;
+	struct slice want = SLICE_INIT;
+
+	block.data = reftable_calloc(block_size);
+	block.len = block_size;
+	block.source = malloc_block_source();
+	block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size,
+			  header_off, hash_size(SHA1_ID));
+	reftable_record_from_ref(&rec, &ref);
+
+	for (i = 0; i < N; i++) {
+		char name[100];
+		byte hash[SHA1_SIZE];
+		snprintf(name, sizeof(name), "branch%02d", i);
+		memset(hash, i, sizeof(hash));
+
+		ref.ref_name = name;
+		ref.value = hash;
+		names[i] = xstrdup(name);
+		n = block_writer_add(&bw, &rec);
+		ref.ref_name = NULL;
+		ref.value = NULL;
+		assert(n == 0);
+	}
+
+	n = block_writer_finish(&bw);
+	assert(n > 0);
+
+	block_writer_clear(&bw);
+
+	block_reader_init(&br, &block, header_off, block_size, SHA1_SIZE);
+
+	block_reader_start(&br, &it);
+
+	while (true) {
+		int r = block_iter_next(&it, &rec);
+		assert(r >= 0);
+		if (r > 0) {
+			break;
+		}
+		assert_streq(names[j], ref.ref_name);
+		j++;
+	}
+
+	reftable_record_clear(&rec);
+	block_iter_close(&it);
+
+	for (i = 0; i < N; i++) {
+		struct block_iter it = { .last_key = SLICE_INIT };
+		slice_set_string(&want, names[i]);
+
+		n = block_reader_seek(&br, &it, &want);
+		assert(n == 0);
+
+		n = block_iter_next(&it, &rec);
+		assert(n == 0);
+
+		assert_streq(names[i], ref.ref_name);
+
+		want.len--;
+		n = block_reader_seek(&br, &it, &want);
+		assert(n == 0);
+
+		n = block_iter_next(&it, &rec);
+		assert(n == 0);
+		assert_streq(names[10 * (i / 10)], ref.ref_name);
+
+		block_iter_close(&it);
+	}
+
+	reftable_record_clear(&rec);
+	reftable_block_done(&br.block);
+	slice_release(&want);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+}
+
+int block_test_main(int argc, const char *argv[])
+{
+	add_test_case("binsearch", &test_binsearch);
+	add_test_case("block_read_write", &test_block_read_write);
+	return test_main(argc, argv);
+}
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..5eee72c4c11
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..63843a640b0
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,95 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size)
+		return -1;
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	struct file_block_source *p = NULL;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return REFTABLE_NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0)
+		return -1;
+
+	p = reftable_calloc(sizeof(struct file_block_source));
+	p->size = st.st_size;
+	p->fd = fd;
+
+	assert(bs->ops == NULL);
+	bs->ops = &file_vtable;
+	bs->arg = p;
+	return 0;
+}
+
+int reftable_fd_write(void *arg, byte *data, size_t sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..df7bcdc2160
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,243 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator *it)
+{
+	return it->ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct reftable_record *rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec)
+{
+	return it->ops->next(it->iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, ref);
+	return iterator_next(it, &rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, log);
+	return iterator_next(it, &rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	slice_release(&fri->oid);
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg,
+				       struct reftable_record *rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+	int err = 0;
+	while (true) {
+		err = reftable_iterator_next_ref(&fri->it, ref);
+		if (err != 0) {
+			break;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			err = reftable_table_seek_ref(&fri->tab, &it,
+						      ref->ref_name);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(&it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				break;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+
+	reftable_ref_record_clear(ref);
+	return err;
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reftable_block_done(&it->block_reader.block);
+	slice_release(&it->oid);
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	uint64_t off;
+	int err = 0;
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reftable_block_done(&it->block_reader.block);
+
+	off = it->offsets[it->offset_idx++];
+	err = reader_init_block_reader(it->r, &it->block_reader, off,
+				       BLOCK_TYPE_REF);
+	if (err < 0) {
+		return err;
+	}
+	if (err > 0) {
+		/* indexed block does not exist. */
+		return REFTABLE_FORMAT_ERROR;
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter empty = INDEXED_TABLE_REF_ITER_INIT;
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	*itr = empty;
+	itr->r = r;
+	slice_resize(&itr->oid, oid_len);
+	memcpy(itr->oid.buf, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..b6294cf3206
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,72 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct reftable_record *rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec);
+
+/* Returns true for a zeroed out iterator, such as the one returned from
+   iterator_destroy. */
+bool iterator_is_null(struct reftable_iterator *it);
+
+/* iterator that produces only ref records that point to `oid` */
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_table tab;
+	struct slice oid;
+	struct reftable_iterator it;
+};
+#define FILTERING_REF_ITERATOR_INIT \
+	{                           \
+		.oid = SLICE_INIT   \
+	}
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+/* iterator that produces only ref records that point to `oid`,
+   but using the object index.
+ */
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+#define INDEXED_TABLE_REF_ITER_INIT                                   \
+	{                                                             \
+		.cur = { .last_key = SLICE_INIT }, .oid = SLICE_INIT, \
+	}
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..5670a52a80d
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,320 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct reftable_record rec = reftable_new_record(mi->typ);
+		int err = iterator_next(&mi->stack[i], &rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			reftable_record_destroy(&rec);
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
+					       size_t idx)
+{
+	struct reftable_record rec = reftable_new_record(mi->typ);
+	struct pq_entry e = {
+		.rec = rec,
+		.index = idx,
+	};
+	int err = iterator_next(&mi->stack[idx], &rec);
+	if (err < 0)
+		return err;
+
+	if (err > 0) {
+		reftable_iterator_destroy(&mi->stack[idx]);
+		reftable_record_destroy(&rec);
+		return 0;
+	}
+
+	merged_iter_pqueue_add(&mi->pq, e);
+	return 0;
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
+{
+	if (iterator_is_null(&mi->stack[idx]))
+		return 0;
+	return merged_iter_advance_nonnull_subiter(mi, idx);
+}
+
+static int merged_iter_next_entry(struct merged_iter *mi,
+				  struct reftable_record *rec)
+{
+	struct slice entry_key = SLICE_INIT;
+	struct pq_entry entry = { 0 };
+	int err = 0;
+
+	if (merged_iter_pqueue_is_empty(mi->pq))
+		return 1;
+
+	entry = merged_iter_pqueue_remove(&mi->pq);
+	err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0)
+		return err;
+
+	/*
+	  One can also use reftable as datacenter-local storage, where the ref
+	  database is maintained in globally consistent database (eg.
+	  CockroachDB or Spanner). In this scenario, replication delays together
+	  with compaction may cause newer tables to contain older entries. In
+	  such a deployment, the loop below must be changed to collect all
+	  entries for the same key, and return new the newest one.
+	*/
+	reftable_record_key(&entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = SLICE_INIT;
+		int err = 0, cmp = 0;
+
+		reftable_record_key(&top.rec, &k);
+
+		cmp = slice_cmp(&k, &entry_key);
+		slice_release(&k);
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		reftable_record_destroy(&top.rec);
+	}
+
+	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+	reftable_record_destroy(&entry.rec);
+	slice_release(&entry_key);
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
+{
+	while (true) {
+		int err = merged_iter_next_entry(mi, rec);
+		if (err == 0 && mi->suppress_deletions &&
+		    reftable_record_is_deletion(rec)) {
+			continue;
+		}
+
+		return err;
+	}
+}
+
+static int merged_iter_next_void(void *p, struct reftable_record *rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq))
+		return 1;
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id)
+{
+	struct reftable_merged_table *m = NULL;
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reftable_reader *r = stack[i];
+		if (r->hash_id != hash_id) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reftable_reader_min_update_index(r);
+		}
+
+		last_max = reftable_reader_max_update_index(r);
+	}
+
+	m = (struct reftable_merged_table *)reftable_calloc(
+		sizeof(struct reftable_merged_table));
+	m->stack = stack;
+	m->stack_len = n;
+	m->min = first_min;
+	m->max = last_max;
+	m->hash_id = hash_id;
+	*dest = m;
+	return 0;
+}
+
+void reftable_merged_table_close(struct reftable_merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reftable_reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = reftable_record_type(rec),
+		.hash_id = mt->hash_id,
+		.suppress_deletions = mt->suppress_deletions,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n;
+	err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	} else {
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..4e03642bdbe
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_reader **stack;
+	int stack_len;
+	uint32_t hash_id;
+	bool suppress_deletions;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	int stack_len;
+	byte typ;
+	bool suppress_deletions;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec);
+
+#endif
diff --git a/reftable/merged_test.c b/reftable/merged_test.c
new file mode 100644
index 00000000000..180ebcb1158
--- /dev/null
+++ b/reftable/merged_test.c
@@ -0,0 +1,273 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "pq.h"
+#include "reader.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_pq(void)
+{
+	char *names[54] = { 0 };
+	int N = ARRAY_SIZE(names) - 1;
+
+	struct merged_iter_pqueue pq = { 0 };
+	const char *last = NULL;
+
+	int i = 0;
+	for (i = 0; i < N; i++) {
+		char name[100];
+		snprintf(name, sizeof(name), "%02d", i);
+		names[i] = xstrdup(name);
+	}
+
+	i = 1;
+	do {
+		struct reftable_record rec =
+			reftable_new_record(BLOCK_TYPE_REF);
+		struct pq_entry e = { 0 };
+
+		reftable_record_as_ref(&rec)->ref_name = names[i];
+		e.rec = rec;
+		merged_iter_pqueue_add(&pq, e);
+		merged_iter_pqueue_check(pq);
+		i = (i * 7) % N;
+	} while (i != 1);
+
+	while (!merged_iter_pqueue_is_empty(pq)) {
+		struct pq_entry e = merged_iter_pqueue_remove(&pq);
+		struct reftable_ref_record *ref =
+			reftable_record_as_ref(&e.rec);
+
+		merged_iter_pqueue_check(pq);
+
+		if (last != NULL) {
+			assert(strcmp(last, ref->ref_name) < 0);
+		}
+		last = ref->ref_name;
+		ref->ref_name = NULL;
+		reftable_free(ref);
+	}
+
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+
+	merged_iter_pqueue_clear(&pq);
+}
+
+static void write_test_table(struct slice *buf,
+			     struct reftable_ref_record refs[], int n)
+{
+	int min = 0xffffffff;
+	int max = 0;
+	int i = 0;
+	int err;
+
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_writer *w = NULL;
+	for (i = 0; i < n; i++) {
+		uint64_t ui = refs[i].update_index;
+		if (ui > max) {
+			max = ui;
+		}
+		if (ui < min) {
+			min = ui;
+		}
+	}
+
+	w = reftable_new_writer(&slice_add_void, buf, &opts);
+	reftable_writer_set_limits(w, min, max);
+
+	for (i = 0; i < n; i++) {
+		uint64_t before = refs[i].update_index;
+		int n = reftable_writer_add_ref(w, &refs[i]);
+		assert(n == 0);
+		assert(before == refs[i].update_index);
+	}
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+
+	reftable_writer_free(w);
+}
+
+static struct reftable_merged_table *
+merged_table_from_records(struct reftable_ref_record **refs,
+			  struct reftable_block_source **source, int *sizes,
+			  struct slice *buf, int n)
+{
+	struct reftable_reader **rd = reftable_calloc(n * sizeof(*rd));
+	int i = 0;
+	struct reftable_merged_table *mt = NULL;
+	int err;
+	*source = reftable_calloc(n * sizeof(**source));
+	for (i = 0; i < n; i++) {
+		write_test_table(&buf[i], refs[i], sizes[i]);
+		block_source_from_slice(&(*source)[i], &buf[i]);
+
+		err = reftable_new_reader(&rd[i], &(*source)[i], "name");
+		assert_err(err);
+	}
+
+	err = reftable_new_merged_table(&mt, rd, n, SHA1_ID);
+	assert_err(err);
+	return mt;
+}
+
+static void test_merged_between(void)
+{
+	byte hash1[SHA1_SIZE] = { 1, 2, 3, 0 };
+
+	struct reftable_ref_record r1[] = { {
+		.ref_name = "b",
+		.update_index = 1,
+		.value = hash1,
+	} };
+	struct reftable_ref_record r2[] = { {
+		.ref_name = "a",
+		.update_index = 2,
+	} };
+
+	struct reftable_ref_record *refs[] = { r1, r2 };
+	int sizes[] = { 1, 1 };
+	struct slice bufs[2] = { SLICE_INIT, SLICE_INIT };
+	struct reftable_block_source *bs = NULL;
+	struct reftable_merged_table *mt =
+		merged_table_from_records(refs, &bs, sizes, bufs, 2);
+	int i;
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_iterator it = { 0 };
+	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert_err(err);
+	assert(ref.update_index == 2);
+	reftable_ref_record_clear(&ref);
+
+	reftable_iterator_destroy(&it);
+	reftable_merged_table_close(mt);
+	reftable_merged_table_free(mt);
+	for (i = 0; i < ARRAY_SIZE(bufs); i++) {
+		slice_release(&bufs[i]);
+	}
+	reftable_free(bs);
+}
+
+static void test_merged(void)
+{
+	byte hash1[SHA1_SIZE] = { 1 };
+	byte hash2[SHA1_SIZE] = { 2 };
+	struct reftable_ref_record r1[] = { {
+						    .ref_name = "a",
+						    .update_index = 1,
+						    .value = hash1,
+					    },
+					    {
+						    .ref_name = "b",
+						    .update_index = 1,
+						    .value = hash1,
+					    },
+					    {
+						    .ref_name = "c",
+						    .update_index = 1,
+						    .value = hash1,
+					    } };
+	struct reftable_ref_record r2[] = { {
+		.ref_name = "a",
+		.update_index = 2,
+	} };
+	struct reftable_ref_record r3[] = {
+		{
+			.ref_name = "c",
+			.update_index = 3,
+			.value = hash2,
+		},
+		{
+			.ref_name = "d",
+			.update_index = 3,
+			.value = hash1,
+		},
+	};
+
+	struct reftable_ref_record want[] = {
+		r2[0],
+		r1[1],
+		r3[0],
+		r3[1],
+	};
+
+	struct reftable_ref_record *refs[] = { r1, r2, r3 };
+	int sizes[3] = { 3, 1, 2 };
+	struct slice bufs[3] = { SLICE_INIT, SLICE_INIT, SLICE_INIT };
+	struct reftable_block_source *bs = NULL;
+
+	struct reftable_merged_table *mt =
+		merged_table_from_records(refs, &bs, sizes, bufs, 3);
+
+	struct reftable_iterator it = { 0 };
+	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	struct reftable_ref_record *out = NULL;
+	int len = 0;
+	int cap = 0;
+	int i = 0;
+
+	assert_err(err);
+	while (len < 100) { /* cap loops/recursion. */
+		struct reftable_ref_record ref = { 0 };
+		int err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			out = reftable_realloc(
+				out, sizeof(struct reftable_ref_record) * cap);
+		}
+		out[len++] = ref;
+	}
+	reftable_iterator_destroy(&it);
+
+	assert(ARRAY_SIZE(want) == len);
+	for (i = 0; i < len; i++) {
+		assert(reftable_ref_record_equal(&want[i], &out[i], SHA1_SIZE));
+	}
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&out[i]);
+	}
+	reftable_free(out);
+
+	for (i = 0; i < 3; i++) {
+		slice_release(&bufs[i]);
+	}
+	reftable_merged_table_close(mt);
+	reftable_merged_table_free(mt);
+	reftable_free(bs);
+}
+
+/* XXX test refs_for(oid) */
+
+int merged_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_merged_between", &test_merged_between);
+	add_test_case("test_pq", &test_pq);
+	add_test_case("test_merged", &test_merged);
+	return test_main(argc, argv);
+}
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..3910d497113
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,113 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = SLICE_INIT;
+	struct slice bk = SLICE_INIT;
+	int cmp = 0;
+	reftable_record_key(&a.rec, &ak);
+	reftable_record_key(&b.rec, &bk);
+
+	cmp = slice_cmp(&ak, &bk);
+
+	slice_release(&ak);
+	slice_release(&bk);
+
+	if (cmp == 0)
+		return a.index > b.index;
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		reftable_record_destroy(&pq->heap[i].rec);
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..43c0974f162
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct reftable_record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..4e3c27cbc2e
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,742 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source *source)
+{
+	return source->ops->size(source->arg);
+}
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source->ops->read_block(source->arg, dest, off, size);
+	dest->source = *source;
+	return result;
+}
+
+void reftable_block_done(struct reftable_block *blockp)
+{
+	struct reftable_block_source source = blockp->source;
+	if (blockp != NULL && source.ops != NULL)
+		source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size)
+		return 0;
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(&r->source, dest, off, sz);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	byte first_block_typ;
+	int err = 0;
+	uint32_t computed_crc;
+	uint32_t file_crc;
+
+	if (memcmp(f, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = REFTABLE_FORMAT_ERROR;
+			goto done;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	computed_crc = crc32(0, footer, f - footer);
+	file_crc = get_be32(f);
+	f += 4;
+	if (computed_crc != file_crc) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	first_block_typ = header[header_size(r->version)];
+	r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+	r->ref_offsets.offset = 0;
+	r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+				  r->log_offsets.offset > 0);
+	r->obj_offsets.present = r->obj_offsets.offset > 0;
+	err = 0;
+done:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = *source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+done:
+	reftable_block_done(&footer);
+	reftable_block_done(&header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+#define TABLE_ITER_INIT                         \
+	{                                       \
+		.bi = {.last_key = SLICE_INIT } \
+	}
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti,
+				    struct reftable_record *rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && reftable_record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec->data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reftable_block_done(&ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (reftable_is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size)
+		return 1;
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0)
+		return err;
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0)
+		return block_size;
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reftable_block_done(&block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reftable_block_done(&block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0)
+		return err;
+	else {
+		struct block_reader *brp =
+			reftable_malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
+{
+	if (reftable_record_type(rec) != ti->typ)
+		return REFTABLE_API_ERROR;
+
+	while (true) {
+		struct table_iter next = TABLE_ITER_INIT;
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct reftable_record *rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off, byte typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0)
+		return err;
+
+	brp = reftable_malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			byte typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct reftable_record *want)
+{
+	struct reftable_record rec =
+		reftable_new_record(reftable_record_type(want));
+	struct slice want_key = SLICE_INIT;
+	struct slice got_key = SLICE_INIT;
+	struct table_iter next = TABLE_ITER_INIT;
+	int err = -1;
+
+	reftable_record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0)
+			goto done;
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0)
+			goto done;
+
+		if (slice_cmp(&got_key, &want_key) > 0) {
+			table_iter_block_done(&next);
+			break;
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, &want_key);
+	if (err < 0)
+		goto done;
+	err = 0;
+
+done:
+	block_iter_close(&next.bi);
+	reftable_record_destroy(&rec);
+	slice_release(&want_key);
+	slice_release(&got_key);
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it,
+			       struct reftable_record *rec)
+{
+	struct reftable_index_record want_index = { .last_key = SLICE_INIT };
+	struct reftable_record want_index_rec = { 0 };
+	struct reftable_index_record index_result = { .last_key = SLICE_INIT };
+	struct reftable_record index_result_rec = { 0 };
+	struct table_iter index_iter = TABLE_ITER_INIT;
+	struct table_iter next = TABLE_ITER_INIT;
+	int err = 0;
+
+	reftable_record_key(rec, &want_index.last_key);
+	reftable_record_from_index(&want_index_rec, &want_index);
+	reftable_record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, reftable_record_type(rec), true);
+	if (err < 0)
+		goto done;
+
+	err = reader_seek_linear(r, &index_iter, &want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, &index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0)
+			goto done;
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0)
+			goto done;
+
+		err = block_iter_seek(&next.bi, &want_index.last_key);
+		if (err < 0)
+			goto done;
+
+		if (next.typ == reftable_record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = REFTABLE_FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter empty = TABLE_ITER_INIT;
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		*malloced = empty;
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+done:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	reftable_record_clear(&want_index_rec);
+	reftable_record_clear(&index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it,
+				struct reftable_record *rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, reftable_record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = TABLE_ITER_INIT;
+	int err = 0;
+	if (idx > 0)
+		return reader_seek_indexed(r, it, rec);
+
+	err = reader_start(r, &ti, reftable_record_type(rec), false);
+	if (err < 0)
+		return err;
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0)
+		return err;
+	else {
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec)
+{
+	byte typ = reftable_record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source *src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		block_source_close(src);
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    byte *oid)
+{
+	struct reftable_obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct reftable_record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct reftable_obj_record got = { 0 };
+	struct reftable_record got_rec = { 0 };
+	int err = 0;
+	struct indexed_table_ref_iter *itr = NULL;
+
+	/* Look through the reverse index. */
+	reftable_record_from_obj(&want_rec, &want);
+	err = reader_seek(r, &oit, &want_rec);
+	if (err != 0)
+		goto done;
+
+	/* read out the reftable_obj_record */
+	reftable_record_from_obj(&got_rec, &got);
+	err = iterator_next(&oit, &got_rec);
+	if (err < 0)
+		goto done;
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		/* didn't find it; return empty iterator */
+		iterator_set_empty(it);
+		err = 0;
+		goto done;
+	}
+
+	err = new_indexed_table_ref_iter(&itr, r, oid, hash_size(r->hash_id),
+					 got.offsets, got.offset_len);
+	if (err < 0)
+		goto done;
+	got.offsets = NULL;
+	iterator_from_indexed_table_ref_iter(it, itr);
+
+done:
+	reftable_iterator_destroy(&oit);
+	reftable_record_clear(&got_rec);
+	return err;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      byte *oid)
+{
+	struct table_iter ti_empty = TABLE_ITER_INIT;
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	struct filtering_ref_iterator empty = FILTERING_REF_ITERATOR_INIT;
+	int oid_len = hash_size(r->hash_id);
+	int err;
+
+	*ti = ti_empty;
+	err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_malloc(sizeof(struct filtering_ref_iterator));
+	*filter = empty;
+	slice_resize(&filter->oid, oid_len);
+	memcpy(filter->oid.buf, oid, oid_len);
+	reftable_table_from_reader(&filter->tab, r);
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, byte *oid)
+{
+	if (r->obj_offsets.present)
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	return reftable_reader_refs_for_unindexed(r, it, oid);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..cc87aa49ea5
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,65 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source *source);
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_close(struct reftable_block_source *source);
+
+/* metadata for a block type */
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+/* The state for reading a reftable file. */
+struct reftable_reader {
+	/* for convience, associate a name with the instance. */
+	char *name;
+	struct reftable_block_source source;
+
+	/* Size of the file, excluding the footer. */
+	uint64_t size;
+
+	/* 'sha1' for SHA1, 's256' for SHA-256 */
+	uint32_t hash_id;
+
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	/* Length of the OID keys in the 'o' section */
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+
+/* initialize a block reader to read from `r` */
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..38b690b6772
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1113 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int get_var_int(uint64_t *dest, struct slice *in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in->len == 0)
+		return -1;
+	val = in->buf[ptr] & 0x7f;
+
+	while (in->buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in->len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in->buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice *dest, uint64_t val)
+{
+	byte buf[10] = { 0 };
+	int i = 9;
+	int n = 0;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	n = sizeof(buf) - i - 1;
+	if (dest->len < n)
+		return -1;
+	memcpy(dest->buf, &buf[i + 1], n);
+	return n;
+}
+
+int reftable_is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, &in);
+	if (n <= 0)
+		return -1;
+	slice_consume(&in, n);
+	if (in.len < tsize)
+		return -1;
+
+	slice_resize(dest, tsize + 1);
+	dest->buf[tsize] = 0;
+	memcpy(dest->buf, in.buf, tsize);
+	slice_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(&s, l);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+	if (s.len < l)
+		return -1;
+	memcpy(s.buf, str, l);
+	slice_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int reftable_encode_key(bool *restart, struct slice dest, struct slice prev_key,
+			struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(&prev_key, &key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(&dest, (uint64_t)prefix_len);
+	if (n < 0)
+		return -1;
+	slice_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(&dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0)
+		return -1;
+	slice_consume(&dest, n);
+
+	if (dest.len < suffix_len)
+		return -1;
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	slice_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int reftable_decode_key(struct slice *key, byte *extra, struct slice last_key,
+			struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, &in);
+	if (n < 0)
+		return -1;
+	slice_consume(&in, n);
+
+	if (prefix_len > last_key.len)
+		return -1;
+
+	n = get_var_int(&suffix_len, &in);
+	if (n <= 0)
+		return -1;
+	slice_consume(&in, n);
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len)
+		return -1;
+
+	slice_resize(key, suffix_len + prefix_len);
+	memcpy(key->buf, last_key.buf, prefix_len);
+
+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
+	slice_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static void reftable_ref_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	slice_set_string(dest, rec->ref_name);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = xstrdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9)
+		return '0' + c;
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size(hash_id));
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size(hash_id));
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->ref_name);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static byte reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL)
+		return 3;
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(&s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, &in);
+	if (n < 0)
+		return n;
+	assert(hash_size > 0);
+
+	slice_consume(&in, n);
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct slice dest = SLICE_INIT;
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+		seen_target = true;
+		if (r->target != NULL) {
+			reftable_free(r->target);
+		}
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+static bool reftable_ref_record_is_deletion_void(const void *p)
+{
+	return reftable_ref_record_is_deletion(
+		(const struct reftable_ref_record *)p);
+}
+
+struct reftable_record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = BLOCK_TYPE_REF,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+	.is_deletion = &reftable_ref_record_is_deletion_void,
+};
+
+static void reftable_obj_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_obj_record *rec =
+		(const struct reftable_obj_record *)r;
+	slice_resize(dest, rec->hash_prefix_len);
+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void reftable_obj_record_clear(void *rec)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	FREE_AND_NULL(obj->hash_prefix);
+	FREE_AND_NULL(obj->offsets);
+	memset(obj, 0, sizeof(struct reftable_obj_record));
+}
+
+static void reftable_obj_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	const struct reftable_obj_record *src =
+		(const struct reftable_obj_record *)src_rec;
+	int olen;
+
+	reftable_obj_record_clear(obj);
+	*obj = *src;
+	obj->hash_prefix = reftable_malloc(obj->hash_prefix_len);
+	memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len);
+
+	olen = obj->offset_len * sizeof(uint64_t);
+	obj->offsets = reftable_malloc(olen);
+	memcpy(obj->offsets, src->offsets, olen);
+}
+
+static byte reftable_obj_record_val_type(const void *rec)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8)
+		return r->offset_len;
+	return 0;
+}
+
+static int reftable_obj_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	struct slice start = s;
+	int i = 0;
+	int n = 0;
+	uint64_t last = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(&s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+	if (r->offset_len == 0)
+		return start.len - s.len;
+	n = put_var_int(&s, r->offsets[0]);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	last = r->offsets[0];
+	for (i = 1; i < r->offset_len; i++) {
+		int n = put_var_int(&s, r->offsets[i] - last);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+		last = r->offsets[i];
+	}
+	return start.len - s.len;
+}
+
+static int reftable_obj_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	uint64_t last;
+	int j;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, &in);
+		if (n < 0) {
+			return n;
+		}
+
+		slice_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0)
+		return start.len - in.len;
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], &in);
+	if (n < 0)
+		return n;
+	slice_consume(&in, n);
+
+	last = r->offsets[0];
+	j = 1;
+	while (j < count) {
+		uint64_t delta = 0;
+		int n = get_var_int(&delta, &in);
+		if (n < 0) {
+			return n;
+		}
+		slice_consume(&in, n);
+
+		last = r->offsets[j] = (delta + last);
+		j++;
+	}
+	return start.len - in.len;
+}
+
+static bool not_a_deletion(const void *p)
+{
+	return false;
+}
+
+struct reftable_record_vtable reftable_obj_record_vtable = {
+	.key = &reftable_obj_record_key,
+	.type = BLOCK_TYPE_OBJ,
+	.copy_from = &reftable_obj_record_copy_from,
+	.val_type = &reftable_obj_record_val_type,
+	.encode = &reftable_obj_record_encode,
+	.decode = &reftable_obj_record_decode,
+	.clear = &reftable_obj_record_clear,
+	.is_deletion = not_a_deletion,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size(hash_id));
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size(hash_id));
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static void reftable_log_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint64_t ts = 0;
+	slice_resize(dest, len + 9);
+	memcpy(dest->buf, rec->ref_name, len + 1);
+	ts = (~ts) - rec->update_index;
+	put_be64(dest->buf + 1 + len, ts);
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	reftable_log_record_clear(dst);
+	*dst = *src;
+	if (dst->ref_name != NULL) {
+		dst->ref_name = xstrdup(dst->ref_name);
+	}
+	if (dst->email != NULL) {
+		dst->email = xstrdup(dst->email);
+	}
+	if (dst->name != NULL) {
+		dst->name = xstrdup(dst->name);
+	}
+	if (dst->message != NULL) {
+		dst->message = xstrdup(dst->message);
+	}
+
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->ref_name);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static byte reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static byte zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r))
+		return 0;
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size)
+		return -1;
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	slice_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	n = put_var_int(&s, r->time);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	if (s.len < 2)
+		return -1;
+
+	put_be16(s.buf, r->tz_offset);
+	slice_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = SLICE_INIT;
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0)
+		return REFTABLE_FORMAT_ERROR;
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		FREE_AND_NULL(r->old_hash);
+		FREE_AND_NULL(r->new_hash);
+		FREE_AND_NULL(r->message);
+		FREE_AND_NULL(r->email);
+		FREE_AND_NULL(r->name);
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size)
+		return REFTABLE_FORMAT_ERROR;
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	slice_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	slice_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	slice_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, &in);
+	if (n < 0)
+		goto done;
+	slice_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2)
+		goto done;
+
+	r->tz_offset = get_be16(in.buf);
+	slice_consume(&in, 2);
+
+	slice_resize(&dest, 0);
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	slice_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	slice_release(&dest);
+	return start.len - in.len;
+
+done:
+	slice_release(&dest);
+	return REFTABLE_FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL)
+		a = empty;
+
+	if (b == NULL)
+		b = empty;
+
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL)
+		a = zero;
+
+	if (b == NULL)
+		b = zero;
+
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+static bool reftable_log_record_is_deletion_void(const void *p)
+{
+	return reftable_log_record_is_deletion(
+		(const struct reftable_log_record *)p);
+}
+
+struct reftable_record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = BLOCK_TYPE_LOG,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+	.is_deletion = &reftable_log_record_is_deletion_void,
+};
+
+struct reftable_record reftable_new_record(byte typ)
+{
+	struct reftable_record rec = { NULL };
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		reftable_record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct reftable_obj_record *r =
+			reftable_calloc(sizeof(struct reftable_obj_record));
+		reftable_record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		reftable_record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct reftable_index_record empty = { .last_key = SLICE_INIT };
+		struct reftable_index_record *r =
+			reftable_calloc(sizeof(struct reftable_index_record));
+		*r = empty;
+		reftable_record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+void *reftable_record_yield(struct reftable_record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+void reftable_record_destroy(struct reftable_record *rec)
+{
+	reftable_record_clear(rec);
+	reftable_free(reftable_record_yield(rec));
+}
+
+static void reftable_index_record_key(const void *r, struct slice *dest)
+{
+	struct reftable_index_record *rec = (struct reftable_index_record *)r;
+	slice_copy(dest, &rec->last_key);
+}
+
+static void reftable_index_record_copy_from(void *rec, const void *src_rec,
+					    int hash_size)
+{
+	struct reftable_index_record *dst = (struct reftable_index_record *)rec;
+	struct reftable_index_record *src =
+		(struct reftable_index_record *)src_rec;
+
+	slice_copy(&dst->last_key, &src->last_key);
+	dst->offset = src->offset;
+}
+
+static void reftable_index_record_clear(void *rec)
+{
+	struct reftable_index_record *idx = (struct reftable_index_record *)rec;
+	slice_release(&idx->last_key);
+}
+
+static byte reftable_index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int reftable_index_record_encode(const void *rec, struct slice out,
+					int hash_size)
+{
+	const struct reftable_index_record *r =
+		(const struct reftable_index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(&out, r->offset);
+	if (n < 0)
+		return n;
+
+	slice_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int reftable_index_record_decode(void *rec, struct slice key,
+					byte val_type, struct slice in,
+					int hash_size)
+{
+	struct slice start = in;
+	struct reftable_index_record *r = (struct reftable_index_record *)rec;
+	int n = 0;
+
+	slice_copy(&r->last_key, &key);
+
+	n = get_var_int(&r->offset, &in);
+	if (n < 0)
+		return n;
+
+	slice_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct reftable_record_vtable reftable_index_record_vtable = {
+	.key = &reftable_index_record_key,
+	.type = BLOCK_TYPE_INDEX,
+	.copy_from = &reftable_index_record_copy_from,
+	.val_type = &reftable_index_record_val_type,
+	.encode = &reftable_index_record_encode,
+	.decode = &reftable_index_record_decode,
+	.clear = &reftable_index_record_clear,
+	.is_deletion = &not_a_deletion,
+};
+
+void reftable_record_key(struct reftable_record *rec, struct slice *dest)
+{
+	rec->ops->key(rec->data, dest);
+}
+
+byte reftable_record_type(struct reftable_record *rec)
+{
+	return rec->ops->type;
+}
+
+int reftable_record_encode(struct reftable_record *rec, struct slice dest,
+			   int hash_size)
+{
+	return rec->ops->encode(rec->data, dest, hash_size);
+}
+
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size)
+{
+	assert(src->ops->type == rec->ops->type);
+
+	rec->ops->copy_from(rec->data, src->data, hash_size);
+}
+
+byte reftable_record_val_type(struct reftable_record *rec)
+{
+	return rec->ops->val_type(rec->data);
+}
+
+int reftable_record_decode(struct reftable_record *rec, struct slice key,
+			   byte extra, struct slice src, int hash_size)
+{
+	return rec->ops->decode(rec->data, key, extra, src, hash_size);
+}
+
+void reftable_record_clear(struct reftable_record *rec)
+{
+	rec->ops->clear(rec->data);
+}
+
+bool reftable_record_is_deletion(struct reftable_record *rec)
+{
+	return rec->ops->is_deletion(rec->data);
+}
+
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *ref_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *obj_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = obj_rec;
+	rec->ops = &reftable_obj_record_vtable;
+}
+
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *index_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = index_rec;
+	rec->ops = &reftable_index_record_vtable;
+}
+
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *log_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec->data;
+}
+
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_LOG);
+	return (struct reftable_log_record *)rec->data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL)
+		return !memcmp(a, b, hash_size);
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL)
+		return 0 == strcmp(a, b);
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->ref_name,
+		      ((struct reftable_ref_record *)b)->ref_name);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp)
+		return cmp;
+	if (la->update_index > lb->update_index)
+		return -1;
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..f9cd6cc3ce2
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,128 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct slice *in);
+int put_var_int(struct slice *dest, uint64_t val);
+
+/* Methods for records. */
+struct reftable_record_vtable {
+	/* encode the key of to a byte slice. */
+	void (*key)(const void *rec, struct slice *dest);
+
+	/* The record type of ('r' for ref). */
+	byte type;
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	byte (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+
+	/* is this a tombstone? */
+	bool (*is_deletion)(const void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct reftable_record {
+	void *data;
+	struct reftable_record_vtable *ops;
+};
+
+/* returns true for recognized block types. Block start with the block type. */
+int reftable_is_block_type(byte typ);
+
+/* creates a malloced record of the given type. Dispose with record_destroy */
+struct reftable_record reftable_new_record(byte typ);
+
+extern struct reftable_record_vtable reftable_ref_record_vtable;
+
+/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
+   number of bytes written. */
+int reftable_encode_key(bool *restart, struct slice dest, struct slice prev_key,
+			struct slice key, byte extra);
+
+/* Decode into `key` and `extra` from `in` */
+int reftable_decode_key(struct slice *key, byte *extra, struct slice last_key,
+			struct slice in);
+
+/* reftable_index_record are used internally to speed up lookups. */
+struct reftable_index_record {
+	uint64_t offset; /* Offset of block */
+	struct slice last_key; /* Last key of the block. */
+};
+
+/* reftable_obj_record stores an object ID => ref mapping. */
+struct reftable_obj_record {
+	byte *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void reftable_record_key(struct reftable_record *rec, struct slice *dest);
+byte reftable_record_type(struct reftable_record *rec);
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size);
+byte reftable_record_val_type(struct reftable_record *rec);
+int reftable_record_encode(struct reftable_record *rec, struct slice dest,
+			   int hash_size);
+int reftable_record_decode(struct reftable_record *rec, struct slice key,
+			   byte extra, struct slice src, int hash_size);
+bool reftable_record_is_deletion(struct reftable_record *rec);
+
+/* zeroes out the embedded record */
+void reftable_record_clear(struct reftable_record *rec);
+
+/* clear out the record, yielding the reftable_record data that was
+ * encapsulated. */
+void *reftable_record_yield(struct reftable_record *rec);
+
+/* clear and deallocate embedded record, and zero `rec`. */
+void reftable_record_destroy(struct reftable_record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *objrec);
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *idxrec);
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *refrec);
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *logrec);
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *ref);
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/record_test.c b/reftable/record_test.c
new file mode 100644
index 00000000000..acdc07688b2
--- /dev/null
+++ b/reftable/record_test.c
@@ -0,0 +1,400 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include "system.h"
+#include "basics.h"
+#include "constants.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_copy(struct reftable_record *rec)
+{
+	struct reftable_record copy =
+		reftable_new_record(reftable_record_type(rec));
+	reftable_record_copy_from(&copy, rec, SHA1_SIZE);
+	/* do it twice to catch memory leaks */
+	reftable_record_copy_from(&copy, rec, SHA1_SIZE);
+	switch (reftable_record_type(&copy)) {
+	case BLOCK_TYPE_REF:
+		assert(reftable_ref_record_equal(reftable_record_as_ref(&copy),
+						 reftable_record_as_ref(rec),
+						 SHA1_SIZE));
+		break;
+	case BLOCK_TYPE_LOG:
+		assert(reftable_log_record_equal(reftable_record_as_log(&copy),
+						 reftable_record_as_log(rec),
+						 SHA1_SIZE));
+		break;
+	}
+	reftable_record_destroy(&copy);
+}
+
+static void test_varint_roundtrip(void)
+{
+	uint64_t inputs[] = { 0,
+			      1,
+			      27,
+			      127,
+			      128,
+			      257,
+			      4096,
+			      ((uint64_t)1 << 63),
+			      ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1 };
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(inputs); i++) {
+		byte dest[10];
+
+		struct slice out = { .buf = dest, .len = 10, .cap = 10 };
+
+		uint64_t in = inputs[i];
+		int n = put_var_int(&out, in);
+		uint64_t got = 0;
+
+		assert(n > 0);
+		out.len = n;
+		n = get_var_int(&got, &out);
+		assert(n > 0);
+
+		assert(got == in);
+	}
+}
+
+static void test_common_prefix(void)
+{
+	struct {
+		const char *a, *b;
+		int want;
+	} cases[] = {
+		{ "abc", "ab", 2 },
+		{ "", "abc", 0 },
+		{ "abc", "abd", 2 },
+		{ "abc", "pqr", 0 },
+	};
+
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		struct slice a = SLICE_INIT;
+		struct slice b = SLICE_INIT;
+		slice_set_string(&a, cases[i].a);
+		slice_set_string(&b, cases[i].b);
+
+		assert(common_prefix_size(&a, &b) == cases[i].want);
+
+		slice_release(&a);
+		slice_release(&b);
+	}
+}
+
+static void set_hash(byte *h, int j)
+{
+	int i = 0;
+	for (i = 0; i < hash_size(SHA1_ID); i++) {
+		h[i] = (j >> i) & 0xff;
+	}
+}
+
+static void test_reftable_ref_record_roundtrip(void)
+{
+	int i = 0;
+
+	for (i = 0; i <= 3; i++) {
+		struct reftable_ref_record in = { 0 };
+		struct reftable_ref_record out = {
+			.ref_name = xstrdup("old name"),
+			.value = reftable_calloc(SHA1_SIZE),
+			.target_value = reftable_calloc(SHA1_SIZE),
+			.target = xstrdup("old value"),
+		};
+		struct reftable_record rec_out = { 0 };
+		struct slice key = SLICE_INIT;
+		struct reftable_record rec = { 0 };
+		struct slice dest = SLICE_INIT;
+		int n, m;
+
+		switch (i) {
+		case 0:
+			break;
+		case 1:
+			in.value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.value, 1);
+			break;
+		case 2:
+			in.value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.value, 1);
+			in.target_value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.target_value, 2);
+			break;
+		case 3:
+			in.target = xstrdup("target");
+			break;
+		}
+		in.ref_name = xstrdup("refs/heads/master");
+
+		reftable_record_from_ref(&rec, &in);
+		test_copy(&rec);
+
+		assert(reftable_record_val_type(&rec) == i);
+
+		reftable_record_key(&rec, &key);
+		slice_resize(&dest, 1024);
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n > 0);
+
+		/* decode into a non-zero reftable_record to test for leaks. */
+
+		reftable_record_from_ref(&rec_out, &out);
+		m = reftable_record_decode(&rec_out, key, i, dest, SHA1_SIZE);
+		assert(n == m);
+
+		assert((out.value != NULL) == (in.value != NULL));
+		assert((out.target_value != NULL) == (in.target_value != NULL));
+		assert((out.target != NULL) == (in.target != NULL));
+		reftable_record_clear(&rec_out);
+
+		slice_release(&key);
+		slice_release(&dest);
+		reftable_ref_record_clear(&in);
+	}
+}
+
+static void test_reftable_log_record_equal(void)
+{
+	struct reftable_log_record in[2] = {
+		{
+			.ref_name = xstrdup("refs/heads/master"),
+			.update_index = 42,
+		},
+		{
+			.ref_name = xstrdup("refs/heads/master"),
+			.update_index = 22,
+		}
+	};
+
+	assert(!reftable_log_record_equal(&in[0], &in[1], SHA1_SIZE));
+	in[1].update_index = in[0].update_index;
+	assert(reftable_log_record_equal(&in[0], &in[1], SHA1_SIZE));
+	reftable_log_record_clear(&in[0]);
+	reftable_log_record_clear(&in[1]);
+}
+
+static void test_reftable_log_record_roundtrip(void)
+{
+	struct reftable_log_record in[2] = {
+		{
+			.ref_name = xstrdup("refs/heads/master"),
+			.old_hash = reftable_malloc(SHA1_SIZE),
+			.new_hash = reftable_malloc(SHA1_SIZE),
+			.name = xstrdup("han-wen"),
+			.email = xstrdup("hanwen@google.com"),
+			.message = xstrdup("test"),
+			.update_index = 42,
+			.time = 1577123507,
+			.tz_offset = 100,
+		},
+		{
+			.ref_name = xstrdup("refs/heads/master"),
+			.update_index = 22,
+		}
+	};
+	set_test_hash(in[0].new_hash, 1);
+	set_test_hash(in[0].old_hash, 2);
+	for (int i = 0; i < ARRAY_SIZE(in); i++) {
+		struct reftable_record rec = { 0 };
+		struct slice key = SLICE_INIT;
+		struct slice dest = SLICE_INIT;
+		/* populate out, to check for leaks. */
+		struct reftable_log_record out = {
+			.ref_name = xstrdup("old name"),
+			.new_hash = reftable_calloc(SHA1_SIZE),
+			.old_hash = reftable_calloc(SHA1_SIZE),
+			.name = xstrdup("old name"),
+			.email = xstrdup("old@email"),
+			.message = xstrdup("old message"),
+		};
+		struct reftable_record rec_out = { 0 };
+		int n, m, valtype;
+
+		reftable_record_from_log(&rec, &in[i]);
+
+		test_copy(&rec);
+
+		reftable_record_key(&rec, &key);
+
+		slice_resize(&dest, 1024);
+
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n >= 0);
+		reftable_record_from_log(&rec_out, &out);
+		valtype = reftable_record_val_type(&rec);
+		m = reftable_record_decode(&rec_out, key, valtype, dest,
+					   SHA1_SIZE);
+		assert(n == m);
+
+		assert(reftable_log_record_equal(&in[i], &out, SHA1_SIZE));
+		reftable_log_record_clear(&in[i]);
+		slice_release(&key);
+		slice_release(&dest);
+		reftable_record_clear(&rec_out);
+	}
+}
+
+static void test_u24_roundtrip(void)
+{
+	uint32_t in = 0x112233;
+	byte dest[3];
+	uint32_t out;
+	put_be24(dest, in);
+	out = get_be24(dest);
+	assert(in == out);
+}
+
+static void test_key_roundtrip(void)
+{
+	struct slice dest = SLICE_INIT;
+	struct slice last_key = SLICE_INIT;
+	struct slice key = SLICE_INIT;
+	struct slice roundtrip = SLICE_INIT;
+	bool restart;
+	byte extra;
+	int n, m;
+	byte rt_extra;
+
+	slice_resize(&dest, 1024);
+	slice_set_string(&last_key, "refs/heads/master");
+	slice_set_string(&key, "refs/tags/bla");
+
+	extra = 6;
+	n = reftable_encode_key(&restart, dest, last_key, key, extra);
+	assert(!restart);
+	assert(n > 0);
+
+	m = reftable_decode_key(&roundtrip, &rt_extra, last_key, dest);
+	assert(n == m);
+	assert(slice_equal(&key, &roundtrip));
+	assert(rt_extra == extra);
+
+	slice_release(&last_key);
+	slice_release(&key);
+	slice_release(&dest);
+	slice_release(&roundtrip);
+}
+
+static void test_reftable_obj_record_roundtrip(void)
+{
+	byte testHash1[SHA1_SIZE] = { 1, 2, 3, 4, 0 };
+	uint64_t till9[] = { 1, 2, 3, 4, 500, 600, 700, 800, 9000 };
+	struct reftable_obj_record recs[3] = { {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+						       .offsets = till9,
+						       .offset_len = 3,
+					       },
+					       {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+						       .offsets = till9,
+						       .offset_len = 9,
+					       },
+					       {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+					       } };
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(recs); i++) {
+		struct reftable_obj_record in = recs[i];
+		struct slice dest = SLICE_INIT;
+		struct reftable_record rec = { 0 };
+		struct slice key = SLICE_INIT;
+		struct reftable_obj_record out = { 0 };
+		struct reftable_record rec_out = { 0 };
+		int n, m;
+		byte extra;
+
+		reftable_record_from_obj(&rec, &in);
+		test_copy(&rec);
+		reftable_record_key(&rec, &key);
+		slice_resize(&dest, 1024);
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n > 0);
+		extra = reftable_record_val_type(&rec);
+		reftable_record_from_obj(&rec_out, &out);
+		m = reftable_record_decode(&rec_out, key, extra, dest,
+					   SHA1_SIZE);
+		assert(n == m);
+
+		assert(in.hash_prefix_len == out.hash_prefix_len);
+		assert(in.offset_len == out.offset_len);
+
+		assert(!memcmp(in.hash_prefix, out.hash_prefix,
+			       in.hash_prefix_len));
+		assert(0 == memcmp(in.offsets, out.offsets,
+				   sizeof(uint64_t) * in.offset_len));
+		slice_release(&key);
+		slice_release(&dest);
+		reftable_record_clear(&rec_out);
+	}
+}
+
+static void test_reftable_index_record_roundtrip(void)
+{
+	struct reftable_index_record in = {
+		.offset = 42,
+		.last_key = SLICE_INIT,
+	};
+	struct slice dest = SLICE_INIT;
+	struct slice key = SLICE_INIT;
+	struct reftable_record rec = { 0 };
+	struct reftable_index_record out = { .last_key = SLICE_INIT };
+	struct reftable_record out_rec = { NULL };
+	int n, m;
+	byte extra;
+
+	slice_set_string(&in.last_key, "refs/heads/master");
+	reftable_record_from_index(&rec, &in);
+	reftable_record_key(&rec, &key);
+	test_copy(&rec);
+
+	assert(0 == slice_cmp(&key, &in.last_key));
+	slice_resize(&dest, 1024);
+	n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+	assert(n > 0);
+
+	extra = reftable_record_val_type(&rec);
+	reftable_record_from_index(&out_rec, &out);
+	m = reftable_record_decode(&out_rec, key, extra, dest, SHA1_SIZE);
+	assert(m == n);
+
+	assert(in.offset == out.offset);
+
+	reftable_record_clear(&out_rec);
+	slice_release(&key);
+	slice_release(&in.last_key);
+	slice_release(&dest);
+}
+
+int record_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_reftable_log_record_equal",
+		      &test_reftable_log_record_equal);
+	add_test_case("test_reftable_log_record_roundtrip",
+		      &test_reftable_log_record_roundtrip);
+	add_test_case("test_reftable_ref_record_roundtrip",
+		      &test_reftable_ref_record_roundtrip);
+	add_test_case("test_varint_roundtrip", &test_varint_roundtrip);
+	add_test_case("test_key_roundtrip", &test_key_roundtrip);
+	add_test_case("test_common_prefix", &test_common_prefix);
+	add_test_case("test_reftable_obj_record_roundtrip",
+		      &test_reftable_obj_record_roundtrip);
+	add_test_case("test_reftable_index_record_roundtrip",
+		      &test_reftable_index_record_roundtrip);
+	add_test_case("test_u24_roundtrip", &test_u24_roundtrip);
+	return test_main(argc, argv);
+}
diff --git a/reftable/refname.c b/reftable/refname.c
new file mode 100644
index 00000000000..7226329e20b
--- /dev/null
+++ b/reftable/refname.c
@@ -0,0 +1,209 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+#include "refname.h"
+#include "slice.h"
+
+struct find_arg {
+	char **names;
+	const char *want;
+};
+
+static int find_name(size_t k, void *arg)
+{
+	struct find_arg *f_arg = (struct find_arg *)arg;
+	return strcmp(f_arg->names[k], f_arg->want) >= 0;
+}
+
+int modification_has_ref(struct modification *mod, const char *name)
+{
+	struct reftable_ref_record ref = { 0 };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = name,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
+			return 0;
+		}
+	}
+
+	if (mod->del_len > 0) {
+		struct find_arg arg = {
+			.names = mod->del,
+			.want = name,
+		};
+		int idx = binsearch(mod->del_len, find_name, &arg);
+		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
+			return 1;
+		}
+	}
+
+	err = reftable_table_read_ref(&mod->tab, name, &ref);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static void modification_clear(struct modification *mod)
+{
+	/* don't delete the strings themselves; they're owned by ref records.
+	 */
+	FREE_AND_NULL(mod->add);
+	FREE_AND_NULL(mod->del);
+	mod->add_len = 0;
+	mod->del_len = 0;
+}
+
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix)
+{
+	struct reftable_iterator it = { NULL };
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = prefix,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len &&
+		    !strncmp(prefix, mod->add[idx], strlen(prefix)))
+			goto done;
+	}
+	err = reftable_table_seek_ref(&mod->tab, &it, prefix);
+	if (err)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err)
+			goto done;
+
+		if (mod->del_len > 0) {
+			struct find_arg arg = {
+				.names = mod->del,
+				.want = ref.ref_name,
+			};
+			int idx = binsearch(mod->del_len, find_name, &arg);
+			if (idx < mod->del_len &&
+			    !strcmp(ref.ref_name, mod->del[idx])) {
+				continue;
+			}
+		}
+
+		if (strncmp(ref.ref_name, prefix, strlen(prefix))) {
+			err = 1;
+			goto done;
+		}
+		err = 0;
+		goto done;
+	}
+
+done:
+	reftable_ref_record_clear(&ref);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int validate_ref_name(const char *name)
+{
+	while (true) {
+		char *next = strchr(name, '/');
+		if (!*name) {
+			return REFTABLE_REFNAME_ERROR;
+		}
+		if (!next) {
+			return 0;
+		}
+		if (next - name == 0 || (next - name == 1 && *name == '.') ||
+		    (next - name == 2 && name[0] == '.' && name[1] == '.'))
+			return REFTABLE_REFNAME_ERROR;
+		name = next + 1;
+	}
+	return 0;
+}
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz)
+{
+	struct modification mod = {
+		.tab = tab,
+		.add = reftable_calloc(sizeof(char *) * sz),
+		.del = reftable_calloc(sizeof(char *) * sz),
+	};
+	int i = 0;
+	int err = 0;
+	for (; i < sz; i++) {
+		if (reftable_ref_record_is_deletion(&recs[i])) {
+			mod.del[mod.del_len++] = recs[i].ref_name;
+		} else {
+			mod.add[mod.add_len++] = recs[i].ref_name;
+		}
+	}
+
+	err = modification_validate(&mod);
+	modification_clear(&mod);
+	return err;
+}
+
+static void slice_trim_component(struct slice *sl)
+{
+	while (sl->len > 0) {
+		bool is_slash = (sl->buf[sl->len - 1] == '/');
+		sl->len--;
+		if (is_slash)
+			break;
+	}
+}
+
+int modification_validate(struct modification *mod)
+{
+	struct slice slashed = SLICE_INIT;
+	int err = 0;
+	int i = 0;
+	for (; i < mod->add_len; i++) {
+		err = validate_ref_name(mod->add[i]);
+		if (err)
+			goto done;
+		slice_set_string(&slashed, mod->add[i]);
+		slice_addstr(&slashed, "/");
+
+		err = modification_has_ref_with_prefix(
+			mod, slice_as_string(&slashed));
+		if (err == 0) {
+			err = REFTABLE_NAME_CONFLICT;
+			goto done;
+		}
+		if (err < 0)
+			goto done;
+
+		slice_set_string(&slashed, mod->add[i]);
+		while (slashed.len) {
+			slice_trim_component(&slashed);
+			err = modification_has_ref(mod,
+						   slice_as_string(&slashed));
+			if (err == 0) {
+				err = REFTABLE_NAME_CONFLICT;
+				goto done;
+			}
+			if (err < 0)
+				goto done;
+		}
+	}
+	err = 0;
+done:
+	slice_release(&slashed);
+	return err;
+}
diff --git a/reftable/refname.h b/reftable/refname.h
new file mode 100644
index 00000000000..f335287fcdb
--- /dev/null
+++ b/reftable/refname.h
@@ -0,0 +1,38 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+#ifndef REFNAME_H
+#define REFNAME_H
+
+#include "reftable.h"
+
+struct modification {
+	struct reftable_table tab;
+
+	char **add;
+	size_t add_len;
+
+	char **del;
+	size_t del_len;
+};
+
+// -1 = error, 0 = found, 1 = not found
+int modification_has_ref(struct modification *mod, const char *name);
+
+// -1 = error, 0 = found, 1 = not found.
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix);
+
+// 0 = OK.
+int validate_ref_name(const char *name);
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz);
+
+int modification_validate(struct modification *mod);
+
+#endif
diff --git a/reftable/refname_test.c b/reftable/refname_test.c
new file mode 100644
index 00000000000..c8a91ccaa18
--- /dev/null
+++ b/reftable/refname_test.c
@@ -0,0 +1,99 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "record.h"
+#include "refname.h"
+#include "system.h"
+
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+struct testcase {
+	char *add;
+	char *del;
+	int error_code;
+};
+
+static void test_conflict(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct slice buf = SLICE_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, &buf, &opts);
+	struct reftable_ref_record rec = {
+		.ref_name = "a/b",
+		.target = "destination", /* make sure it's not a symref. */
+		.update_index = 1,
+	};
+	int err;
+	int i;
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct testcase cases[] = {
+		{ "a/b/c", NULL, REFTABLE_NAME_CONFLICT },
+		{ "b", NULL, 0 },
+		{ "a", NULL, REFTABLE_NAME_CONFLICT },
+		{ "a", "a/b", 0 },
+
+		{ "p/", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p//q", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p/./q", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p/../q", NULL, REFTABLE_REFNAME_ERROR },
+
+		{ "a/b/c", "a/b", 0 },
+		{ NULL, "a//b", 0 },
+	};
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_add_ref(w, &rec);
+	assert_err(err);
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+
+	block_source_from_slice(&source, &buf);
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	reftable_table_from_reader(&tab, rd);
+
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		struct modification mod = {
+			.tab = tab,
+		};
+
+		if (cases[i].add != NULL) {
+			mod.add = &cases[i].add;
+			mod.add_len = 1;
+		}
+		if (cases[i].del != NULL) {
+			mod.del = &cases[i].del;
+			mod.del_len = 1;
+		}
+
+		err = modification_validate(&mod);
+		assert(err == cases[i].error_code);
+	}
+
+	reftable_reader_free(rd);
+	slice_release(&buf);
+}
+
+int refname_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_conflict", &test_conflict);
+	return test_main(argc, argv);
+}
diff --git a/reftable/reftable-tests.h b/reftable/reftable-tests.h
new file mode 100644
index 00000000000..031e1ef2c5b
--- /dev/null
+++ b/reftable/reftable-tests.h
@@ -0,0 +1,13 @@
+#ifndef REFTABLE_TESTS_H
+#define REFTABLE_TESTS_H
+
+int block_test_main(int argc, const char **argv);
+int merged_test_main(int argc, const char **argv);
+int record_test_main(int argc, const char **argv);
+int refname_test_main(int argc, const char **argv);
+int reftable_test_main(int argc, const char **argv);
+int slice_test_main(int argc, const char **argv);
+int stack_test_main(int argc, const char **argv);
+int tree_test_main(int argc, const char **argv);
+
+#endif
diff --git a/reftable/reftable.c b/reftable/reftable.c
new file mode 100644
index 00000000000..6e7d1da10e7
--- /dev/null
+++ b/reftable/reftable.c
@@ -0,0 +1,90 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+#include "record.h"
+#include "reader.h"
+#include "merged.h"
+
+struct reftable_table_vtable {
+	int (*seek)(void *tab, struct reftable_iterator *it,
+		    struct reftable_record *);
+};
+
+static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
+				     struct reftable_record *rec)
+{
+	return reader_seek((struct reftable_reader *)tab, it, rec);
+}
+
+static struct reftable_table_vtable reader_vtable = {
+	.seek = reftable_reader_seek_void,
+};
+
+static int reftable_merged_table_seek_void(void *tab,
+					   struct reftable_iterator *it,
+					   struct reftable_record *rec)
+{
+	return merged_table_seek_record((struct reftable_merged_table *)tab, it,
+					rec);
+}
+
+static struct reftable_table_vtable merged_table_vtable = {
+	.seek = reftable_merged_table_seek_void,
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return tab->ops->seek(tab->table_arg, it, &rec);
+}
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &reader_vtable;
+	tab->table_arg = reader;
+}
+
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *merged)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &merged_table_vtable;
+	tab->table_arg = merged;
+}
+
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	int err = reftable_table_seek_ref(tab, &it, name);
+	if (err)
+		goto done;
+
+	err = reftable_iterator_next_ref(&it, ref);
+	if (err)
+		goto done;
+
+	if (strcmp(ref->ref_name, name) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		reftable_ref_record_clear(ref);
+		err = 1;
+		goto done;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..86f2dcf33d4
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,564 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *ref_name;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	REFTABLE_IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	REFTABLE_FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	REFTABLE_NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	REFTABLE_LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL ref_name.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	REFTABLE_API_ERROR = -6,
+
+	/* Decompression error */
+	REFTABLE_ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	REFTABLE_EMPTY_TABLE_ERROR = -8,
+
+	/* Dir/file conflict. */
+	REFTABLE_NAME_CONFLICT = -9,
+
+	/* Illegal ref name. */
+	REFTABLE_REFNAME_ERROR = -10,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/*
+ * Convert the numeric error code to an equivalent errno code.
+ */
+int reftable_error_to_errno(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+
+	/* boolean: do not check ref names for validity or dir/file conflicts.
+	 */
+	int skip_name_check;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, uint8_t *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, uint8_t *data, size_t size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp. The name is used for creating a stack. Typically, it is the
+ * basename of the file. The block source `src` is owned by the reader, and is
+ * closed on calling reftable_reader_destroy().
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source *src, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, &src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(&it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to `oid`. */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* closes readers for the merged tables */
+void reftable_merged_table_close(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/****************************************************************
+ Generic tables
+
+ A unified API for reading tables, either merged tables, or single readers.
+ ****************************************************************/
+
+struct reftable_table {
+	struct reftable_table_vtable *ops;
+	void *table_arg;
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name);
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader);
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *table);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction, and deallocate the
+   transaction. Releases the lock if held. */
+void reftable_addition_destroy(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* Reloads the stack if necessary. This is very cheap to run if the stack was up
+ * to date */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	uint64_t entries_written; /* total number of entries written, including
+				     failures. */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/reftable_test.c b/reftable/reftable_test.c
new file mode 100644
index 00000000000..da0a7460a46
--- /dev/null
+++ b/reftable/reftable_test.c
@@ -0,0 +1,631 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "record.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static const int update_index = 5;
+
+static void test_buffer(void)
+{
+	struct slice buf = SLICE_INIT;
+	struct reftable_block_source source = { NULL };
+	struct reftable_block out = { 0 };
+	int n;
+	byte in[] = "hello";
+	slice_add(&buf, in, sizeof(in));
+	block_source_from_slice(&source, &buf);
+	assert(block_source_size(&source) == 6);
+	n = block_source_read_block(&source, &out, 0, sizeof(in));
+	assert(n == sizeof(in));
+	assert(!memcmp(in, out.data, n));
+	reftable_block_done(&out);
+
+	n = block_source_read_block(&source, &out, 1, 2);
+	assert(n == 2);
+	assert(!memcmp(out.data, "el", 2));
+
+	reftable_block_done(&out);
+	block_source_close(&source);
+	slice_release(&buf);
+}
+
+static void test_default_write_opts(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct slice buf = SLICE_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, &buf, &opts);
+
+	struct reftable_ref_record rec = {
+		.ref_name = "master",
+		.update_index = 1,
+	};
+	int err;
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader **readers = malloc(sizeof(*readers) * 1);
+	uint32_t hash_id;
+	struct reftable_reader *rd = NULL;
+	struct reftable_merged_table *merged = NULL;
+
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_add_ref(w, &rec);
+	assert_err(err);
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+
+	block_source_from_slice(&source, &buf);
+
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	hash_id = reftable_reader_hash_id(rd);
+	assert(hash_id == SHA1_ID);
+
+	readers[0] = rd;
+
+	err = reftable_new_merged_table(&merged, readers, 1, SHA1_ID);
+	assert_err(err);
+
+	reftable_merged_table_close(merged);
+	reftable_merged_table_free(merged);
+	slice_release(&buf);
+}
+
+static void write_table(char ***names, struct slice *buf, int N, int block_size,
+			uint32_t hash_id)
+{
+	struct reftable_write_options opts = {
+		.block_size = block_size,
+		.hash_id = hash_id,
+	};
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, buf, &opts);
+	struct reftable_ref_record ref = { 0 };
+	int i = 0, n;
+	struct reftable_log_record log = { 0 };
+	const struct reftable_stats *stats = NULL;
+	*names = reftable_calloc(sizeof(char *) * (N + 1));
+	reftable_writer_set_limits(w, update_index, update_index);
+	for (i = 0; i < N; i++) {
+		byte hash[SHA256_SIZE] = { 0 };
+		char name[100];
+		int n;
+
+		set_test_hash(hash, i);
+
+		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
+
+		ref.ref_name = name;
+		ref.value = hash;
+		ref.update_index = update_index;
+		(*names)[i] = xstrdup(name);
+
+		n = reftable_writer_add_ref(w, &ref);
+		assert(n == 0);
+	}
+
+	for (i = 0; i < N; i++) {
+		byte hash[SHA256_SIZE] = { 0 };
+		char name[100];
+		int n;
+
+		set_test_hash(hash, i);
+
+		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
+
+		log.ref_name = name;
+		log.new_hash = hash;
+		log.update_index = update_index;
+		log.message = "message";
+
+		n = reftable_writer_add_log(w, &log);
+		assert(n == 0);
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	stats = writer_stats(w);
+	for (i = 0; i < stats->ref_stats.blocks; i++) {
+		int off = i * opts.block_size;
+		if (off == 0) {
+			off = header_size((hash_id == SHA256_ID) ? 2 : 1);
+		}
+		assert(buf->buf[off] == 'r');
+	}
+
+	assert(stats->log_stats.blocks > 0);
+	reftable_writer_free(w);
+}
+
+static void test_log_buffer_size(void)
+{
+	struct slice buf = SLICE_INIT;
+	struct reftable_write_options opts = {
+		.block_size = 4096,
+	};
+	int err;
+	struct reftable_log_record log = {
+		.ref_name = "refs/heads/master",
+		.name = "Han-Wen Nienhuys",
+		.email = "hanwen@google.com",
+		.tz_offset = 100,
+		.time = 0x5e430672,
+		.update_index = 0xa,
+		.message = "commit: 9\n",
+	};
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, &buf, &opts);
+
+	/* This tests buffer extension for log compression. Must use a random
+	   hash, to ensure that the compressed part is larger than the original.
+	*/
+	byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
+	for (int i = 0; i < SHA1_SIZE; i++) {
+		hash1[i] = (byte)(rand() % 256);
+		hash2[i] = (byte)(rand() % 256);
+	}
+	log.old_hash = hash1;
+	log.new_hash = hash2;
+	reftable_writer_set_limits(w, update_index, update_index);
+	err = reftable_writer_add_log(w, &log);
+	assert_err(err);
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+	slice_release(&buf);
+}
+
+static void test_log_write_read(void)
+{
+	int N = 2;
+	char **names = reftable_calloc(sizeof(char *) * (N + 1));
+	int err;
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_ref_record ref = { 0 };
+	int i = 0;
+	struct reftable_log_record log = { 0 };
+	int n;
+	struct reftable_iterator it = { 0 };
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	struct slice buf = SLICE_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, &buf, &opts);
+	const struct reftable_stats *stats = NULL;
+	reftable_writer_set_limits(w, 0, N);
+	for (i = 0; i < N; i++) {
+		char name[256];
+		struct reftable_ref_record ref = { 0 };
+		snprintf(name, sizeof(name), "b%02d%0*d", i, 130, 7);
+		names[i] = xstrdup(name);
+		puts(name);
+		ref.ref_name = name;
+		ref.update_index = i;
+
+		err = reftable_writer_add_ref(w, &ref);
+		assert_err(err);
+	}
+	for (i = 0; i < N; i++) {
+		byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
+		struct reftable_log_record log = { 0 };
+		set_test_hash(hash1, i);
+		set_test_hash(hash2, i + 1);
+
+		log.ref_name = names[i];
+		log.update_index = i;
+		log.old_hash = hash1;
+		log.new_hash = hash2;
+
+		err = reftable_writer_add_log(w, &log);
+		assert_err(err);
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	stats = writer_stats(w);
+	assert(stats->log_stats.blocks > 0);
+	reftable_writer_free(w);
+	w = NULL;
+
+	block_source_from_slice(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.log");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, names[N - 1]);
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert_err(err);
+
+	/* end of iteration. */
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert(0 < err);
+
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_reader_seek_log(&rd, &it, "");
+	assert_err(err);
+
+	i = 0;
+	while (true) {
+		int err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+
+		assert_err(err);
+		assert_streq(names[i], log.ref_name);
+		assert(i == log.update_index);
+		i++;
+		reftable_log_record_clear(&log);
+	}
+
+	assert(i == N);
+	reftable_iterator_destroy(&it);
+
+	/* cleanup. */
+	slice_release(&buf);
+	free_names(names);
+	reader_close(&rd);
+}
+
+static void test_table_read_write_sequential(void)
+{
+	char **names;
+	struct slice buf = SLICE_INIT;
+	int N = 50;
+	struct reftable_iterator it = { 0 };
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader rd = { 0 };
+	int err = 0;
+	int j = 0;
+
+	write_table(&names, &buf, N, 256, SHA1_ID);
+
+	block_source_from_slice(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, "");
+	assert_err(err);
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		int r = reftable_iterator_next_ref(&it, &ref);
+		assert(r >= 0);
+		if (r > 0) {
+			break;
+		}
+		assert(0 == strcmp(names[j], ref.ref_name));
+		assert(update_index == ref.update_index);
+
+		j++;
+		reftable_ref_record_clear(&ref);
+	}
+	assert(j == N);
+	reftable_iterator_destroy(&it);
+	slice_release(&buf);
+	free_names(names);
+
+	reader_close(&rd);
+}
+
+static void test_table_write_small_table(void)
+{
+	char **names;
+	struct slice buf = SLICE_INIT;
+	int N = 1;
+	write_table(&names, &buf, N, 4096, SHA1_ID);
+	assert(buf.len < 200);
+	slice_release(&buf);
+	free_names(names);
+}
+
+static void test_table_read_api(void)
+{
+	char **names;
+	struct slice buf = SLICE_INIT;
+	int N = 50;
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	int err;
+	int i;
+	struct reftable_log_record log = { 0 };
+	struct reftable_iterator it = { 0 };
+
+	write_table(&names, &buf, N, 256, SHA1_ID);
+
+	block_source_from_slice(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, names[0]);
+	assert_err(err);
+
+	err = reftable_iterator_next_log(&it, &log);
+	assert(err == REFTABLE_API_ERROR);
+
+	slice_release(&buf);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_free(names);
+	reader_close(&rd);
+	slice_release(&buf);
+}
+
+static void test_table_read_write_seek(bool index, int hash_id)
+{
+	char **names;
+	struct slice buf = SLICE_INIT;
+	int N = 50;
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	int err;
+	int i = 0;
+
+	struct reftable_iterator it = { 0 };
+	struct slice pastLast = SLICE_INIT;
+	struct reftable_ref_record ref = { 0 };
+
+	write_table(&names, &buf, N, 256, hash_id);
+
+	block_source_from_slice(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+	assert(hash_id == reftable_reader_hash_id(&rd));
+
+	if (!index) {
+		rd.ref_offsets.index_offset = 0;
+	} else {
+		assert(rd.ref_offsets.index_offset > 0);
+	}
+
+	for (i = 1; i < N; i++) {
+		int err = reftable_reader_seek_ref(&rd, &it, names[i]);
+		assert_err(err);
+		err = reftable_iterator_next_ref(&it, &ref);
+		assert_err(err);
+		assert(0 == strcmp(names[i], ref.ref_name));
+		assert(i == ref.value[0]);
+
+		reftable_ref_record_clear(&ref);
+		reftable_iterator_destroy(&it);
+	}
+
+	slice_set_string(&pastLast, names[N - 1]);
+	slice_addstr(&pastLast, "/");
+
+	err = reftable_reader_seek_ref(&rd, &it, slice_as_string(&pastLast));
+	if (err == 0) {
+		struct reftable_ref_record ref = { 0 };
+		int err = reftable_iterator_next_ref(&it, &ref);
+		assert(err > 0);
+	} else {
+		assert(err > 0);
+	}
+
+	slice_release(&pastLast);
+	reftable_iterator_destroy(&it);
+
+	slice_release(&buf);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+	reftable_free(names);
+	reader_close(&rd);
+}
+
+static void test_table_read_write_seek_linear(void)
+{
+	test_table_read_write_seek(false, SHA1_ID);
+}
+
+static void test_table_read_write_seek_linear_sha256(void)
+{
+	test_table_read_write_seek(false, SHA256_ID);
+}
+
+static void test_table_read_write_seek_index(void)
+{
+	test_table_read_write_seek(true, SHA1_ID);
+}
+
+static void test_table_refs_for(bool indexed)
+{
+	int N = 50;
+	char **want_names = reftable_calloc(sizeof(char *) * (N + 1));
+	int want_names_len = 0;
+	byte want_hash[SHA1_SIZE];
+
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_ref_record ref = { 0 };
+	int i = 0;
+	int n;
+	int err;
+	struct reftable_reader rd;
+	struct reftable_block_source source = { 0 };
+
+	struct slice buf = SLICE_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, &buf, &opts);
+
+	struct reftable_iterator it = { 0 };
+	int j;
+
+	set_test_hash(want_hash, 4);
+
+	for (i = 0; i < N; i++) {
+		byte hash[SHA1_SIZE];
+		char fill[51] = { 0 };
+		char name[100];
+		byte hash1[SHA1_SIZE];
+		byte hash2[SHA1_SIZE];
+		struct reftable_ref_record ref = { 0 };
+
+		memset(hash, i, sizeof(hash));
+		memset(fill, 'x', 50);
+		/* Put the variable part in the start */
+		snprintf(name, sizeof(name), "br%02d%s", i, fill);
+		name[40] = 0;
+		ref.ref_name = name;
+
+		set_test_hash(hash1, i / 4);
+		set_test_hash(hash2, 3 + i / 4);
+		ref.value = hash1;
+		ref.target_value = hash2;
+
+		/* 80 bytes / entry, so 3 entries per block. Yields 17
+		 */
+		/* blocks. */
+		n = reftable_writer_add_ref(w, &ref);
+		assert(n == 0);
+
+		if (!memcmp(hash1, want_hash, SHA1_SIZE) ||
+		    !memcmp(hash2, want_hash, SHA1_SIZE)) {
+			want_names[want_names_len++] = xstrdup(name);
+		}
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	reftable_writer_free(w);
+	w = NULL;
+
+	block_source_from_slice(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+	if (!indexed) {
+		rd.obj_offsets.present = 0;
+	}
+
+	err = reftable_reader_seek_ref(&rd, &it, "");
+	assert_err(err);
+	reftable_iterator_destroy(&it);
+
+	err = reftable_reader_refs_for(&rd, &it, want_hash);
+	assert_err(err);
+
+	j = 0;
+	while (true) {
+		int err = reftable_iterator_next_ref(&it, &ref);
+		assert(err >= 0);
+		if (err > 0) {
+			break;
+		}
+
+		assert(j < want_names_len);
+		assert(0 == strcmp(ref.ref_name, want_names[j]));
+		j++;
+		reftable_ref_record_clear(&ref);
+	}
+	assert(j == want_names_len);
+
+	slice_release(&buf);
+	free_names(want_names);
+	reftable_iterator_destroy(&it);
+	reader_close(&rd);
+}
+
+static void test_table_refs_for_no_index(void)
+{
+	test_table_refs_for(false);
+}
+
+static void test_table_refs_for_obj_index(void)
+{
+	test_table_refs_for(true);
+}
+
+static void test_table_empty(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct slice buf = SLICE_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, &buf, &opts);
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_ref_record rec = { 0 };
+	struct reftable_iterator it = { 0 };
+	int err;
+
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_close(w);
+	assert(err == REFTABLE_EMPTY_TABLE_ERROR);
+	reftable_writer_free(w);
+
+	assert(buf.len == header_size(1) + footer_size(1));
+
+	block_source_from_slice(&source, &buf);
+
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &rec);
+	assert(err > 0);
+
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	slice_release(&buf);
+}
+
+int reftable_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_default_write_opts", test_default_write_opts);
+	add_test_case("test_log_write_read", test_log_write_read);
+	add_test_case("test_table_read_write_seek_linear_sha256",
+		      &test_table_read_write_seek_linear_sha256);
+	add_test_case("test_log_buffer_size", test_log_buffer_size);
+	add_test_case("test_table_write_small_table",
+		      &test_table_write_small_table);
+	add_test_case("test_buffer", &test_buffer);
+	add_test_case("test_table_read_api", &test_table_read_api);
+	add_test_case("test_table_read_write_sequential",
+		      &test_table_read_write_sequential);
+	add_test_case("test_table_read_write_seek_linear",
+		      &test_table_read_write_seek_linear);
+	add_test_case("test_table_read_write_seek_index",
+		      &test_table_read_write_seek_index);
+	add_test_case("test_table_read_write_refs_for_no_index",
+		      &test_table_refs_for_no_index);
+	add_test_case("test_table_read_write_refs_for_obj_index",
+		      &test_table_refs_for_obj_index);
+	add_test_case("test_table_empty", &test_table_empty);
+	return test_main(argc, argv);
+}
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 00000000000..c491a8575e5
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,243 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+struct slice reftable_empty_slice = SLICE_INIT;
+
+void slice_set_string(struct slice *s, const char *str)
+{
+	int l;
+	if (str == NULL) {
+		s->len = 0;
+		return;
+	}
+	assert(s->canary == SLICE_CANARY);
+
+	l = strlen(str);
+	l++; /* \0 */
+	slice_resize(s, l);
+	memcpy(s->buf, str, l);
+	s->len = l - 1;
+}
+
+void slice_init(struct slice *s)
+{
+	struct slice empty = SLICE_INIT;
+	*s = empty;
+}
+
+void slice_resize(struct slice *s, int l)
+{
+	assert(s->canary == SLICE_CANARY);
+	if (s->cap < l) {
+		int c = s->cap * 2;
+		if (c < l) {
+			c = l;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+}
+
+void slice_addstr(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+	assert(d->canary == SLICE_CANARY);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_addbuf(struct slice *s, struct slice *a)
+{
+	int end = s->len;
+	assert(s->canary == SLICE_CANARY);
+	slice_resize(s, s->len + a->len);
+	memcpy(s->buf + end, a->buf, a->len);
+}
+
+void slice_consume(struct slice *s, int n)
+{
+	assert(s->canary == SLICE_CANARY);
+	s->buf += n;
+	s->len -= n;
+}
+
+byte *slice_detach(struct slice *s)
+{
+	byte *p = s->buf;
+	assert(s->canary == SLICE_CANARY);
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_release(struct slice *s)
+{
+	assert(s->canary == SLICE_CANARY);
+	reftable_free(slice_detach(s));
+}
+
+void slice_copy(struct slice *dest, struct slice *src)
+{
+	assert(dest->canary == SLICE_CANARY);
+	assert(src->canary == SLICE_CANARY);
+	slice_resize(dest, src->len);
+	memcpy(dest->buf, src->buf, src->len);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	assert(s->canary == SLICE_CANARY);
+	if (s->cap == s->len) {
+		int l = s->len;
+		slice_resize(s, l + 1);
+		s->len = l;
+	}
+	s->buf[s->len] = 0;
+	return (const char *)s->buf;
+}
+
+/* return a newly malloced string for this slice */
+char *slice_to_string(struct slice *in)
+{
+	struct slice s = SLICE_INIT;
+	assert(in->canary == SLICE_CANARY);
+	slice_resize(&s, in->len + 1);
+	s.buf[in->len] = 0;
+	memcpy(s.buf, in->buf, in->len);
+	return (char *)slice_detach(&s);
+}
+
+bool slice_equal(struct slice *a, struct slice *b)
+{
+	return slice_cmp(a, b) == 0;
+}
+
+int slice_cmp(const struct slice *a, const struct slice *b)
+{
+	int min = a->len < b->len ? a->len : b->len;
+	int res = memcmp(a->buf, b->buf, min);
+	assert(a->canary == SLICE_CANARY);
+	assert(b->canary == SLICE_CANARY);
+	if (res != 0)
+		return res;
+	if (a->len < b->len)
+		return -1;
+	else if (a->len > b->len)
+		return 1;
+	else
+		return 0;
+}
+
+int slice_add(struct slice *b, byte *data, size_t sz)
+{
+	assert(b->canary == SLICE_CANARY);
+	if (b->len + sz > b->cap) {
+		int newcap = 2 * b->cap + 1;
+		if (newcap < b->len + sz) {
+			newcap = (b->len + sz);
+		}
+		b->buf = reftable_realloc(b->buf, newcap);
+		b->cap = newcap;
+	}
+
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	return sz;
+}
+
+int slice_add_void(void *b, byte *data, size_t sz)
+{
+	return slice_add((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf)
+{
+	assert(bs->ops == NULL);
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct slice *a, struct slice *b)
+{
+	int p = 0;
+	assert(a->canary == SLICE_CANARY);
+	assert(b->canary == SLICE_CANARY);
+	while (p < a->len && p < b->len) {
+		if (a->buf[p] != b->buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 00000000000..1a2bde3fc69
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,87 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+/*
+  provides bounds-checked byte ranges.
+  To use, initialize as "slice x = {0};"
+ */
+struct slice {
+	int len;
+	int cap;
+	byte *buf;
+	byte canary;
+};
+#define SLICE_CANARY 0x42
+#define SLICE_INIT                       \
+	{                                \
+		0, 0, NULL, SLICE_CANARY \
+	}
+extern struct slice reftable_empty_slice;
+
+void slice_set_string(struct slice *dest, const char *src);
+void slice_addstr(struct slice *dest, const char *src);
+
+/* Deallocate and clear slice */
+void slice_release(struct slice *slice);
+
+/* Return a malloced string for `src` */
+char *slice_to_string(struct slice *src);
+
+/* Initializes a slice. Accepts a slice with random garbage. */
+void slice_init(struct slice *slice);
+
+/* Ensure that `buf` is \0 terminated. */
+const char *slice_as_string(struct slice *src);
+
+/* Compare slices */
+bool slice_equal(struct slice *a, struct slice *b);
+
+/* Return `buf`, clearing out `s` */
+byte *slice_detach(struct slice *s);
+
+/* Copy bytes */
+void slice_copy(struct slice *dest, struct slice *src);
+
+/* Advance `buf` by `n`, and decrease length. A copy of the slice
+   should be kept for deallocating the slice. */
+void slice_consume(struct slice *s, int n);
+
+/* Set length of the slice to `l` */
+void slice_resize(struct slice *s, int l);
+
+/* Signed comparison */
+int slice_cmp(const struct slice *a, const struct slice *b);
+
+/* Append `data` to the `dest` slice.  */
+int slice_add(struct slice *dest, byte *data, size_t sz);
+
+/* Append `add` to `dest. */
+void slice_addbuf(struct slice *dest, struct slice *add);
+
+/* Like slice_add, but suitable for passing to reftable_new_writer
+ */
+int slice_add_void(void *b, byte *data, size_t sz);
+
+/* Find the longest shared prefix size of `a` and `b` */
+int common_prefix_size(struct slice *a, struct slice *b);
+
+struct reftable_block_source;
+
+/* Create an in-memory block source for reading reftables */
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/slice_test.c b/reftable/slice_test.c
new file mode 100644
index 00000000000..f95790c7643
--- /dev/null
+++ b/reftable/slice_test.c
@@ -0,0 +1,40 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_slice(void)
+{
+	struct slice s = SLICE_INIT;
+	struct slice t = SLICE_INIT;
+
+	slice_set_string(&s, "abc");
+	assert(0 == strcmp("abc", slice_as_string(&s)));
+
+	slice_set_string(&t, "pqr");
+
+	slice_addbuf(&s, &t);
+	assert(0 == strcmp("abcpqr", slice_as_string(&s)));
+
+	slice_release(&s);
+	slice_release(&t);
+}
+
+int slice_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_slice", &test_slice);
+	return test_main(argc, argv);
+}
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..4c0ea0556ab
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1207 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "refname.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct slice list_file_name = SLICE_INIT;
+	int err = 0;
+
+	if (config.hash_id == 0) {
+		config.hash_id = SHA1_ID;
+	}
+
+	*dest = NULL;
+
+	slice_set_string(&list_file_name, dir);
+	slice_addstr(&list_file_name, "/tables.list");
+
+	p->list_file = slice_to_string(&list_file_name);
+	slice_release(&list_file_name);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload_maybe_reuse(p, true);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fd_read_lines(int fd, char ***namesp)
+{
+	off_t size = lseek(fd, 0, SEEK_END);
+	char *buf = NULL;
+	int err = 0;
+	if (size < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+	err = lseek(fd, 0, SEEK_SET);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (read(fd, buf, size) != size) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+
+done:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	int fd = open(filename, O_RDONLY, 0644);
+	int err = 0;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return REFTABLE_IO_ERROR;
+	}
+	err = fd_read_lines(fd, namesp);
+	close(fd);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged != NULL) {
+		reftable_merged_table_close(st->merged);
+		reftable_merged_table_free(st->merged);
+		st->merged = NULL;
+	}
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_tables =
+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
+	int new_tables_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+	int i;
+	struct slice table_path = SLICE_INIT;
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			slice_set_string(&table_path, st->reftable_dir);
+			slice_addstr(&table_path, "/");
+			slice_addstr(&table_path, name);
+
+			err = reftable_block_source_from_file(
+				&src, slice_as_string(&table_path));
+			if (err < 0)
+				goto done;
+
+			err = reftable_new_reader(&rd, &src, name);
+			if (err < 0)
+				goto done;
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
+					st->config.hash_id);
+	if (err < 0)
+		goto done;
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	new_merged->suppress_deletions = true;
+	st->merged = new_merged;
+
+	for (i = 0; i < cur_len; i++) {
+		if (cur[i] != NULL) {
+			reader_close(cur[i]);
+			reftable_reader_free(cur[i]);
+		}
+	}
+
+done:
+	slice_release(&table_path);
+	for (i = 0; i < new_tables_len; i++) {
+		reader_close(new_tables[i]);
+		reftable_reader_free(new_tables[i]);
+	}
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0)
+		return diff;
+
+	return udiff;
+}
+
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0)
+		return err;
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		int err2 = 0;
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != REFTABLE_NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
+		   writer. Check if there was one by checking if the name list
+		   changed.
+		*/
+		err2 = read_lines(st->list_file, &names_after);
+		if (err2 < 0) {
+			free_names(names);
+			return err2;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return err;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 1;
+		sleep_millisec(delay);
+	}
+
+	return 0;
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0)
+		return err;
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto done;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto done;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto done;
+	}
+
+done:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	int err = stack_uptodate(st);
+	if (err > 0)
+		return reftable_stack_reload_maybe_reuse(st, true);
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == REFTABLE_LOCK_ERROR) {
+			/* Ignore error return, we want to propagate
+			   REFTABLE_LOCK_ERROR.
+			*/
+			reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	if (!st->disable_auto_compact)
+		return reftable_stack_auto_compact(st);
+
+	return 0;
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
+	slice_set_string(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct slice lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+#define REFTABLE_ADDITION_INIT               \
+	{                                    \
+		.lock_file_name = SLICE_INIT \
+	}
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	slice_set_string(&add->lock_file_name, st->list_file);
+	slice_addstr(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = REFTABLE_LOCK_ERROR;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	err = stack_uptodate(st);
+	if (err < 0)
+		goto done;
+
+	if (err > 1) {
+		err = REFTABLE_LOCK_ERROR;
+		goto done;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+done:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct slice nm = SLICE_INIT;
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_set_string(&nm, add->stack->list_file);
+		slice_addstr(&nm, "/");
+		slice_addstr(&nm, add->new_tables[i]);
+		unlink(slice_as_string(&nm));
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(slice_as_string(&add->lock_file_name));
+		slice_release(&add->lock_file_name);
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+	slice_release(&nm);
+}
+
+void reftable_addition_destroy(struct reftable_addition *add)
+{
+	if (add == NULL) {
+		return;
+	}
+	reftable_addition_close(add);
+	reftable_free(add);
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct slice table_list = SLICE_INIT;
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0)
+		goto done;
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		slice_addstr(&table_list, add->stack->merged->stack[i]->name);
+		slice_addstr(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_addstr(&table_list, add->new_tables[i]);
+		slice_addstr(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	slice_release(&table_list);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = rename(slice_as_string(&add->lock_file_name),
+		     add->stack->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+done:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
+	*dest = reftable_calloc(sizeof(**dest));
+	**dest = empty;
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = REFTABLE_ADDITION_INIT;
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0)
+		goto done;
+	if (err > 0) {
+		err = REFTABLE_LOCK_ERROR;
+		goto done;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_addition_commit(&add);
+done:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct slice temp_tab_file_name = SLICE_INIT;
+	struct slice tab_file_name = SLICE_INIT;
+	struct slice next_name = SLICE_INIT;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+
+	slice_resize(&next_name, 0);
+	format_name(&next_name, add->next_update_index, add->next_update_index);
+
+	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
+	slice_addstr(&temp_tab_file_name, "/");
+	slice_addbuf(&temp_tab_file_name, &next_name);
+	slice_addstr(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
+	if (tab_fd < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_writer_close(wr);
+	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto done;
+	}
+	if (err < 0)
+		goto done;
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = stack_check_addition(add->stack,
+				   slice_as_string(&temp_tab_file_name));
+	if (err < 0)
+		goto done;
+
+	if (wr->min_update_index < add->next_update_index) {
+		err = REFTABLE_API_ERROR;
+		goto done;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_addstr(&next_name, ".ref");
+
+	slice_set_string(&tab_file_name, add->stack->reftable_dir);
+	slice_addstr(&tab_file_name, "/");
+	slice_addbuf(&tab_file_name, &next_name);
+
+	/* TODO: should check destination out of paranoia */
+	err = rename(slice_as_string(&temp_tab_file_name),
+		     slice_as_string(&tab_file_name));
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = slice_to_string(&next_name);
+	add->new_tables_len++;
+done:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_file_name));
+	}
+
+	slice_release(&temp_tab_file_name);
+	slice_release(&tab_file_name);
+	slice_release(&next_name);
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0)
+		return reftable_reader_max_update_index(
+			       st->merged->stack[sz - 1]) +
+		       1;
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct slice next_name = SLICE_INIT;
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->merged->stack[first]),
+		    reftable_reader_max_update_index(st->merged->stack[first]));
+
+	slice_set_string(temp_tab, st->reftable_dir);
+	slice_addstr(temp_tab, "/");
+	slice_addbuf(temp_tab, &next_name);
+	slice_addstr(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0)
+		goto done;
+	err = reftable_writer_close(wr);
+	if (err < 0)
+		goto done;
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+done:
+	reftable_writer_free(wr);
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		slice_release(temp_tab);
+	}
+	slice_release(&next_name);
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_reader **subtabs = reftable_calloc(
+		sizeof(struct reftable_reader *) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	uint64_t entries = 0;
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr,
+				   st->merged->stack[first]->min_update_index,
+				   st->merged->stack[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto done;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+	reftable_iterator_destroy(&it);
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+	reftable_log_record_clear(&log);
+	st->stats.entries_written += entries;
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct slice temp_tab_file_name = SLICE_INIT;
+	struct slice new_table_name = SLICE_INIT;
+	struct slice lock_file_name = SLICE_INIT;
+	struct slice ref_list_contents = SLICE_INIT;
+	struct slice new_table_path = SLICE_INIT;
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **listp = NULL;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto done;
+	}
+
+	st->stats.attempts++;
+
+	slice_set_string(&lock_file_name, st->list_file);
+	slice_addstr(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	/* Don't want to write to the lock for now.  */
+	close(lock_file_fd);
+	lock_file_fd = 0;
+
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0)
+		goto done;
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_file_name = SLICE_INIT;
+		struct slice subtab_lock = SLICE_INIT;
+		int sublock_file_fd = -1;
+
+		slice_set_string(&subtab_file_name, st->reftable_dir);
+		slice_addstr(&subtab_file_name, "/");
+		slice_addstr(&subtab_file_name,
+			     reader_name(st->merged->stack[i]));
+
+		slice_copy(&subtab_lock, &subtab_file_name);
+		slice_addstr(&subtab_lock, ".lock");
+
+		sublock_file_fd = open(slice_as_string(&subtab_lock),
+				       O_EXCL | O_CREAT | O_WRONLY, 0644);
+		if (sublock_file_fd > 0) {
+			close(sublock_file_fd);
+		} else if (sublock_file_fd < 0) {
+			if (errno == EEXIST) {
+				err = 1;
+			} else {
+				err = REFTABLE_IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] =
+			(char *)slice_as_string(&subtab_file_name);
+		j++;
+
+		if (err != 0)
+			goto done;
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0)
+		goto done;
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0)
+		goto done;
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_addstr(&new_table_name, ".ref");
+
+	slice_set_string(&new_table_path, st->reftable_dir);
+	slice_addstr(&new_table_path, "/");
+
+	slice_addbuf(&new_table_path, &new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(slice_as_string(&temp_tab_file_name),
+			     slice_as_string(&new_table_path));
+		if (err < 0) {
+			err = REFTABLE_IO_ERROR;
+			goto done;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
+		slice_addstr(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		slice_addbuf(&ref_list_contents, &new_table_name);
+		slice_addstr(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
+		slice_addstr(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto done;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto done;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto done;
+	}
+	have_lock = false;
+
+	/* Reload the stack before deleting. On windows, we can only delete the
+	   files after we closed them.
+	*/
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+
+	listp = delete_on_success;
+	while (*listp) {
+		if (strcmp(*listp, slice_as_string(&new_table_path))) {
+			unlink(*listp);
+		}
+		listp++;
+	}
+
+done:
+	free_names(delete_on_success);
+
+	listp = subtable_locks;
+	while (*listp) {
+		unlink(*listp);
+		listp++;
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	slice_release(&new_table_name);
+	slice_release(&new_table_path);
+	slice_release(&ref_list_contents);
+	slice_release(&temp_tab_file_name);
+	slice_release(&lock_file_name);
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0)
+		return 0;
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+
+	if (n == 0) {
+		*seglen = 0;
+		return segs;
+	}
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
+	int overhead = header_size(version) - 1;
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		sizes[i] = st->merged->stack[i]->size - overhead;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0)
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_table tab = { NULL };
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+	return reftable_table_read_ref(&tab, refname, ref);
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err)
+		goto done;
+
+	err = reftable_iterator_next_log(&it, log);
+	if (err)
+		goto done;
+
+	if (strcmp(log->ref_name, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto done;
+	}
+
+done:
+	if (err) {
+		reftable_log_record_clear(log);
+	}
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
+{
+	int err = 0;
+	struct reftable_block_source src = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct reftable_ref_record *refs = NULL;
+	struct reftable_iterator it = { NULL };
+	int cap = 0;
+	int len = 0;
+	int i = 0;
+
+	if (st->config.skip_name_check)
+		return 0;
+
+	err = reftable_block_source_from_file(&src, new_tab_name);
+	if (err < 0)
+		goto done;
+
+	err = reftable_new_reader(&rd, &src, new_tab_name);
+	if (err < 0)
+		goto done;
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	if (err > 0) {
+		err = 0;
+		goto done;
+	}
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0)
+			goto done;
+
+		if (len >= cap) {
+			cap = 2 * cap + 1;
+			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
+		}
+
+		refs[len++] = ref;
+	}
+
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+
+	err = validate_ref_record_addition(tab, refs, len);
+
+done:
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&refs[i]);
+	}
+
+	free(refs);
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..60bca3b3e58
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,48 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+#include "system.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+	bool disable_auto_compact;
+
+	struct reftable_write_options config;
+
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
+void reftable_addition_close(struct reftable_addition *add);
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
new file mode 100644
index 00000000000..ffcc1e42819
--- /dev/null
+++ b/reftable/stack_test.c
@@ -0,0 +1,743 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+
+#include "merged.h"
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+#include <sys/types.h>
+#include <dirent.h>
+
+static void clear_dir(const char *dirname)
+{
+	int fd = open(dirname, O_DIRECTORY, 0);
+	DIR *dir = fdopendir(fd);
+	struct dirent *ent = NULL;
+
+	assert(fd >= 0);
+
+	while ((ent = readdir(dir)) != NULL) {
+		unlinkat(fd, ent->d_name, 0);
+	}
+	closedir(dir);
+	rmdir(dirname);
+}
+
+static void test_read_file(void)
+{
+	char fn[256] = "/tmp/stack.test_read_file.XXXXXX";
+	int fd = mkstemp(fn);
+	char out[1024] = "line1\n\nline2\nline3";
+	int n, err;
+	char **names = NULL;
+	char *want[] = { "line1", "line2", "line3" };
+	int i = 0;
+
+	assert(fd > 0);
+	n = write(fd, out, strlen(out));
+	assert(n == strlen(out));
+	err = close(fd);
+	assert(err >= 0);
+
+	err = read_lines(fn, &names);
+	assert_err(err);
+
+	for (i = 0; names[i] != NULL; i++) {
+		assert(0 == strcmp(want[i], names[i]));
+	}
+	free_names(names);
+	remove(fn);
+}
+
+static void test_parse_names(void)
+{
+	char buf[] = "line\n";
+	char **names = NULL;
+	parse_names(buf, strlen(buf), &names);
+
+	assert(NULL != names[0]);
+	assert(0 == strcmp(names[0], "line"));
+	assert(NULL == names[1]);
+	free_names(names);
+}
+
+static void test_names_equal(void)
+{
+	char *a[] = { "a", "b", "c", NULL };
+	char *b[] = { "a", "b", "d", NULL };
+	char *c[] = { "a", "b", NULL };
+
+	assert(names_equal(a, a));
+	assert(!names_equal(a, b));
+	assert(!names_equal(a, c));
+}
+
+static int write_test_ref(struct reftable_writer *wr, void *arg)
+{
+	struct reftable_ref_record *ref = arg;
+	reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
+	return reftable_writer_add_ref(wr, ref);
+}
+
+struct write_log_arg {
+	struct reftable_log_record *log;
+	uint64_t update_index;
+};
+
+static int write_test_log(struct reftable_writer *wr, void *arg)
+{
+	struct write_log_arg *wla = arg;
+
+	reftable_writer_set_limits(wr, wla->update_index, wla->update_index);
+	return reftable_writer_add_log(wr, wla->log);
+}
+
+static void test_reftable_stack_add_one(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record ref = {
+		.ref_name = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st, ref.ref_name, &dest);
+	assert_err(err);
+	assert(0 == strcmp("master", dest.target));
+
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	clear_dir(dir);
+}
+
+static void test_reftable_stack_uptodate(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st1 = NULL;
+	struct reftable_stack *st2 = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int err;
+	struct reftable_ref_record ref1 = {
+		.ref_name = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record ref2 = {
+		.ref_name = "branch2",
+		.update_index = 2,
+		.target = "master",
+	};
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st1, dir, cfg);
+	assert_err(err);
+
+	err = reftable_new_stack(&st2, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st1, &write_test_ref, &ref1);
+	assert_err(err);
+
+	err = reftable_stack_add(st2, &write_test_ref, &ref2);
+	assert(err == REFTABLE_LOCK_ERROR);
+
+	err = reftable_stack_reload(st2);
+	assert_err(err);
+
+	err = reftable_stack_add(st2, &write_test_ref, &ref2);
+	assert_err(err);
+	reftable_stack_destroy(st1);
+	reftable_stack_destroy(st2);
+	clear_dir(dir);
+}
+
+static void test_reftable_stack_transaction_api(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_addition *add = NULL;
+
+	struct reftable_ref_record ref = {
+		.ref_name = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	reftable_addition_destroy(add);
+
+	err = reftable_stack_new_addition(&add, st);
+	assert_err(err);
+
+	err = reftable_addition_add(add, &write_test_ref, &ref);
+	assert_err(err);
+
+	err = reftable_addition_commit(add);
+	assert_err(err);
+
+	reftable_addition_destroy(add);
+
+	err = reftable_stack_read_ref(st, ref.ref_name, &dest);
+	assert_err(err);
+	assert(0 == strcmp("master", dest.target));
+
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	clear_dir(dir);
+}
+
+static void test_reftable_stack_validate_refname(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int i;
+	struct reftable_ref_record ref = {
+		.ref_name = "a/b",
+		.update_index = 1,
+		.target = "master",
+	};
+	char *additions[] = { "a", "a/b/c" };
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	for (i = 0; i < ARRAY_SIZE(additions); i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = additions[i],
+			.update_index = 1,
+			.target = "master",
+		};
+
+		err = reftable_stack_add(st, &write_test_ref, &ref);
+		assert(err == REFTABLE_NAME_CONFLICT);
+	}
+
+	reftable_stack_destroy(st);
+	clear_dir(dir);
+}
+
+static int write_error(struct reftable_writer *wr, void *arg)
+{
+	return *((int *)arg);
+}
+
+static void test_reftable_stack_update_index_check(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record ref1 = {
+		.ref_name = "name1",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record ref2 = {
+		.ref_name = "name2",
+		.update_index = 1,
+		.target = "master",
+	};
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref1);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref2);
+	assert(err == REFTABLE_API_ERROR);
+	reftable_stack_destroy(st);
+	clear_dir(dir);
+}
+
+static void test_reftable_stack_lock_failure(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err, i;
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+	for (i = -1; i != REFTABLE_EMPTY_TABLE_ERROR; i--) {
+		err = reftable_stack_add(st, &write_error, &i);
+		assert(err == i);
+	}
+
+	reftable_stack_destroy(st);
+	clear_dir(dir);
+}
+
+static void test_reftable_stack_add(void)
+{
+	int i = 0;
+	int err = 0;
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_ref_record refs[2] = { 0 };
+	struct reftable_log_record logs[2] = { 0 };
+	int N = ARRAY_SIZE(refs);
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+	st->disable_auto_compact = true;
+
+	for (i = 0; i < N; i++) {
+		char buf[256];
+		snprintf(buf, sizeof(buf), "branch%02d", i);
+		refs[i].ref_name = xstrdup(buf);
+		refs[i].value = reftable_malloc(SHA1_SIZE);
+		refs[i].update_index = i + 1;
+		set_test_hash(refs[i].value, i);
+
+		logs[i].ref_name = xstrdup(buf);
+		logs[i].update_index = N + i + 1;
+		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+		logs[i].email = xstrdup("identity@invalid");
+		set_test_hash(logs[i].new_hash, i);
+	}
+
+	for (i = 0; i < N; i++) {
+		int err = reftable_stack_add(st, &write_test_ref, &refs[i]);
+		assert_err(err);
+	}
+
+	for (i = 0; i < N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		struct reftable_ref_record dest = { 0 };
+		int err = reftable_stack_read_ref(st, refs[i].ref_name, &dest);
+		assert_err(err);
+		assert(reftable_ref_record_equal(&dest, refs + i, SHA1_SIZE));
+		reftable_ref_record_clear(&dest);
+	}
+
+	for (i = 0; i < N; i++) {
+		struct reftable_log_record dest = { 0 };
+		int err = reftable_stack_read_log(st, refs[i].ref_name, &dest);
+		assert_err(err);
+		assert(reftable_log_record_equal(&dest, logs + i, SHA1_SIZE));
+		reftable_log_record_clear(&dest);
+	}
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i < N; i++) {
+		reftable_ref_record_clear(&refs[i]);
+		reftable_log_record_clear(&logs[i]);
+	}
+	clear_dir(dir);
+}
+
+static void test_reftable_stack_tombstone(void)
+{
+	int i = 0;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record refs[2] = { 0 };
+	struct reftable_log_record logs[2] = { 0 };
+	int N = ARRAY_SIZE(refs);
+	struct reftable_ref_record dest = { 0 };
+	struct reftable_log_record log_dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		const char *buf = "branch";
+		refs[i].ref_name = xstrdup(buf);
+		refs[i].update_index = i + 1;
+		if (i % 2 == 0) {
+			refs[i].value = reftable_malloc(SHA1_SIZE);
+			set_test_hash(refs[i].value, i);
+		}
+		logs[i].ref_name = xstrdup(buf);
+		/* update_index is part of the key. */
+		logs[i].update_index = 42;
+		if (i % 2 == 0) {
+			logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+			set_test_hash(logs[i].new_hash, i);
+			logs[i].email = xstrdup("identity@invalid");
+		}
+	}
+	for (i = 0; i < N; i++) {
+		int err = reftable_stack_add(st, &write_test_ref, &refs[i]);
+		assert_err(err);
+	}
+	for (i = 0; i < N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_read_ref(st, "branch", &dest);
+	assert(err == 1);
+	reftable_ref_record_clear(&dest);
+
+	err = reftable_stack_read_log(st, "branch", &log_dest);
+	assert(err == 1);
+	reftable_log_record_clear(&log_dest);
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st, "branch", &dest);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, "branch", &log_dest);
+	assert(err == 1);
+	reftable_ref_record_clear(&dest);
+	reftable_log_record_clear(&log_dest);
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i < N; i++) {
+		reftable_ref_record_clear(&refs[i]);
+		reftable_log_record_clear(&logs[i]);
+	}
+	clear_dir(dir);
+}
+
+static void test_reftable_stack_hash_id(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+
+	struct reftable_ref_record ref = {
+		.ref_name = "master",
+		.target = "target",
+		.update_index = 1,
+	};
+	struct reftable_write_options cfg32 = { .hash_id = SHA256_ID };
+	struct reftable_stack *st32 = NULL;
+	struct reftable_write_options cfg_default = { 0 };
+	struct reftable_stack *st_default = NULL;
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	/* can't read it with the wrong hash ID. */
+	err = reftable_new_stack(&st32, dir, cfg32);
+	assert(err == REFTABLE_FORMAT_ERROR);
+
+	/* check that we can read it back with default config too. */
+	err = reftable_new_stack(&st_default, dir, cfg_default);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st_default, "master", &dest);
+	assert_err(err);
+
+	assert(!strcmp(dest.target, ref.target));
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_stack_destroy(st_default);
+	clear_dir(dir);
+}
+
+static void test_log2(void)
+{
+	assert(1 == fastlog2(3));
+	assert(2 == fastlog2(4));
+	assert(2 == fastlog2(5));
+}
+
+static void test_sizes_to_segments(void)
+{
+	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
+	/* .................0  1  2  3  4  5 */
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(segs[2].log == 3);
+	assert(segs[2].start == 5);
+	assert(segs[2].end == 6);
+
+	assert(segs[1].log == 2);
+	assert(segs[1].start == 2);
+	assert(segs[1].end == 5);
+	reftable_free(segs);
+}
+
+static void test_sizes_to_segments_empty(void)
+{
+	uint64_t sizes[0];
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(seglen == 0);
+	reftable_free(segs);
+}
+
+static void test_sizes_to_segments_all_equal(void)
+{
+	uint64_t sizes[] = { 5, 5 };
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(seglen == 1);
+	assert(segs[0].start == 0);
+	assert(segs[0].end == 2);
+	reftable_free(segs);
+}
+
+static void test_suggest_compaction_segment(void)
+{
+	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
+	/* .................0    1    2  3   4  5  6 */
+	struct segment min =
+		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
+	assert(min.start == 2);
+	assert(min.end == 7);
+}
+
+static void test_suggest_compaction_segment_nothing(void)
+{
+	uint64_t sizes[] = { 64, 32, 16, 8, 4, 2 };
+	struct segment result =
+		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
+	assert(result.start == result.end);
+}
+
+static void test_reflog_expire(void)
+{
+	char dir[256] = "/tmp/stack.test_reflog_expire.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	struct reftable_log_record logs[20] = { 0 };
+	int N = ARRAY_SIZE(logs) - 1;
+	int i = 0;
+	int err;
+	struct reftable_log_expiry_config expiry = {
+		.time = 10,
+	};
+	struct reftable_log_record log = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 1; i <= N; i++) {
+		char buf[256];
+		snprintf(buf, sizeof(buf), "branch%02d", i);
+
+		logs[i].ref_name = xstrdup(buf);
+		logs[i].update_index = i;
+		logs[i].time = i;
+		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+		logs[i].email = xstrdup("identity@invalid");
+		set_test_hash(logs[i].new_hash, i);
+	}
+
+	for (i = 1; i <= N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	err = reftable_stack_compact_all(st, &expiry);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, logs[9].ref_name, &log);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, logs[11].ref_name, &log);
+	assert_err(err);
+
+	expiry.min_update_index = 15;
+	err = reftable_stack_compact_all(st, &expiry);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, logs[14].ref_name, &log);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, logs[16].ref_name, &log);
+	assert_err(err);
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i <= N; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	clear_dir(dir);
+	reftable_log_record_clear(&log);
+}
+
+static int write_nothing(struct reftable_writer *wr, void *arg)
+{
+	reftable_writer_set_limits(wr, 1, 1);
+	return 0;
+}
+
+static void test_empty_add(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_stack *st2 = NULL;
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_nothing, NULL);
+	assert_err(err);
+
+	err = reftable_new_stack(&st2, dir, cfg);
+	assert_err(err);
+	clear_dir(dir);
+	reftable_stack_destroy(st);
+	reftable_stack_destroy(st2);
+}
+
+static void test_reftable_stack_auto_compaction(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int err, i;
+	int N = 100;
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		char name[100];
+		struct reftable_ref_record ref = {
+			.ref_name = name,
+			.update_index = reftable_stack_next_update_index(st),
+			.target = "master",
+		};
+		snprintf(name, sizeof(name), "branch%04d", i);
+
+		err = reftable_stack_add(st, &write_test_ref, &ref);
+		assert_err(err);
+
+		assert(i < 3 || st->merged->stack_len < 2 * fastlog2(i));
+	}
+
+	assert(reftable_stack_compaction_stats(st)->entries_written <
+	       (uint64_t)(N * fastlog2(N)));
+
+	reftable_stack_destroy(st);
+	clear_dir(dir);
+}
+
+int stack_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_reftable_stack_uptodate",
+		      &test_reftable_stack_uptodate);
+	add_test_case("test_reftable_stack_transaction_api",
+		      &test_reftable_stack_transaction_api);
+	add_test_case("test_reftable_stack_hash_id",
+		      &test_reftable_stack_hash_id);
+	add_test_case("test_sizes_to_segments_all_equal",
+		      &test_sizes_to_segments_all_equal);
+	add_test_case("test_reftable_stack_auto_compaction",
+		      &test_reftable_stack_auto_compaction);
+	add_test_case("test_reftable_stack_validate_refname",
+		      &test_reftable_stack_validate_refname);
+	add_test_case("test_reftable_stack_update_index_check",
+		      &test_reftable_stack_update_index_check);
+	add_test_case("test_reftable_stack_lock_failure",
+		      &test_reftable_stack_lock_failure);
+	add_test_case("test_reftable_stack_tombstone",
+		      &test_reftable_stack_tombstone);
+	add_test_case("test_reftable_stack_add_one",
+		      &test_reftable_stack_add_one);
+	add_test_case("test_empty_add", test_empty_add);
+	add_test_case("test_reflog_expire", test_reflog_expire);
+	add_test_case("test_suggest_compaction_segment",
+		      &test_suggest_compaction_segment);
+	add_test_case("test_suggest_compaction_segment_nothing",
+		      &test_suggest_compaction_segment_nothing);
+	add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
+	add_test_case("test_sizes_to_segments_empty",
+		      &test_sizes_to_segments_empty);
+	add_test_case("test_log2", &test_log2);
+	add_test_case("test_parse_names", &test_parse_names);
+	add_test_case("test_read_file", &test_read_file);
+	add_test_case("test_names_equal", &test_names_equal);
+	add_test_case("test_reftable_stack_add", &test_reftable_stack_add);
+	return test_main(argc, argv);
+}
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..9e2c4b8d9ac
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,54 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#if 1 /* REFTABLE_IN_GITCORE */
+
+#include "git-compat-util.h"
+#include "cache.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_IN_GITCORE */
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef uint8_t byte;
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ *
+ * TODO: in git-core, this should fallback to uncompress2 if it is available.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/test_framework.c b/reftable/test_framework.c
new file mode 100644
index 00000000000..44de31b7338
--- /dev/null
+++ b/reftable/test_framework.c
@@ -0,0 +1,69 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "test_framework.h"
+
+#include "system.h"
+#include "basics.h"
+#include "constants.h"
+
+struct test_case **test_cases;
+int test_case_len;
+int test_case_cap;
+
+struct test_case *new_test_case(const char *name, void (*testfunc)(void))
+{
+	struct test_case *tc = reftable_malloc(sizeof(struct test_case));
+	tc->name = name;
+	tc->testfunc = testfunc;
+	return tc;
+}
+
+struct test_case *add_test_case(const char *name, void (*testfunc)(void))
+{
+	struct test_case *tc = new_test_case(name, testfunc);
+	if (test_case_len == test_case_cap) {
+		test_case_cap = 2 * test_case_cap + 1;
+		test_cases = reftable_realloc(
+			test_cases, sizeof(struct test_case) * test_case_cap);
+	}
+
+	test_cases[test_case_len++] = tc;
+	return tc;
+}
+
+int test_main(int argc, const char *argv[])
+{
+	const char *filter = NULL;
+	int i = 0;
+	if (argc > 1) {
+		filter = argv[1];
+	}
+
+	for (i = 0; i < test_case_len; i++) {
+		const char *name = test_cases[i]->name;
+		if (filter == NULL || strstr(name, filter) != NULL) {
+			printf("case %s\n", name);
+			test_cases[i]->testfunc();
+		} else {
+			printf("skip %s\n", name);
+		}
+
+		reftable_free(test_cases[i]);
+	}
+	reftable_free(test_cases);
+	test_cases = 0;
+	test_case_len = 0;
+	test_case_cap = 0;
+	return 0;
+}
+
+void set_test_hash(byte *p, int i)
+{
+	memset(p, (byte)i, hash_size(SHA1_ID));
+}
diff --git a/reftable/test_framework.h b/reftable/test_framework.h
new file mode 100644
index 00000000000..a3ba7d9a6e6
--- /dev/null
+++ b/reftable/test_framework.h
@@ -0,0 +1,64 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TEST_FRAMEWORK_H
+#define TEST_FRAMEWORK_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#ifdef NDEBUG
+#undef NDEBUG
+#endif
+
+#include "system.h"
+
+#ifdef assert
+#undef assert
+#endif
+
+#define assert_err(c)                                                  \
+	if (c != 0) {                                                  \
+		fflush(stderr);                                        \
+		fflush(stdout);                                        \
+		fprintf(stderr, "%s: %d: error == %d (%s), want 0\n",  \
+			__FILE__, __LINE__, c, reftable_error_str(c)); \
+		abort();                                               \
+	}
+
+#define assert_streq(a, b)                                               \
+	if (strcmp(a, b)) {                                              \
+		fflush(stderr);                                          \
+		fflush(stdout);                                          \
+		fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, \
+			__LINE__, #a, a, #b, b);                         \
+		abort();                                                 \
+	}
+
+#define assert(c)                                                          \
+	if (!(c)) {                                                        \
+		fflush(stderr);                                            \
+		fflush(stdout);                                            \
+		fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, \
+			__LINE__, #c);                                     \
+		abort();                                                   \
+	}
+
+struct test_case {
+	const char *name;
+	void (*testfunc)(void);
+};
+
+struct test_case *new_test_case(const char *name, void (*testfunc)(void));
+struct test_case *add_test_case(const char *name, void (*testfunc)(void));
+int test_main(int argc, const char *argv[]);
+
+void set_test_hash(byte *p, int i);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..0061d14e306
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,63 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	int res;
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	res = compare(key, (*rootp)->key);
+	if (res < 0)
+		return tree_search(key, &(*rootp)->left, compare, insert);
+	else if (res > 0)
+		return tree_search(key, &(*rootp)->right, compare, insert);
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..954512e9a3a
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+/* tree_node is a generic binary search tree. */
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+/* looks for `key` in `rootp` using `compare` as comparison function. If insert
+   is set, insert the key if it's not found. Else, return NULL.
+*/
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+
+/* performs an infix walk of the tree. */
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+
+/*
+  deallocates the tree nodes recursively. Keys should be deallocated separately
+  by walking over the tree. */
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/tree_test.c b/reftable/tree_test.c
new file mode 100644
index 00000000000..c6d448cbe8a
--- /dev/null
+++ b/reftable/tree_test.c
@@ -0,0 +1,62 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static int test_compare(const void *a, const void *b)
+{
+	return (char *)a - (char *)b;
+}
+
+struct curry {
+	void *last;
+};
+
+static void check_increasing(void *arg, void *key)
+{
+	struct curry *c = (struct curry *)arg;
+	if (c->last != NULL) {
+		assert(test_compare(c->last, key) < 0);
+	}
+	c->last = key;
+}
+
+static void test_tree(void)
+{
+	struct tree_node *root = NULL;
+
+	void *values[11] = { 0 };
+	struct tree_node *nodes[11] = { 0 };
+	int i = 1;
+	struct curry c = { 0 };
+	do {
+		nodes[i] = tree_search(values + i, &root, &test_compare, 1);
+		i = (i * 7) % 11;
+	} while (i != 1);
+
+	for (i = 1; i < ARRAY_SIZE(nodes); i++) {
+		assert(values + i == nodes[i]->key);
+		assert(nodes[i] ==
+		       tree_search(values + i, &root, &test_compare, 0));
+	}
+
+	infix_walk(root, check_increasing, &c);
+	tree_free(root);
+}
+
+int tree_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_tree", &test_tree);
+	return test_main(argc, argv);
+}
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100755
index 00000000000..b1d0c649f1f
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,25 @@
+#!/bin/sh
+
+set -eu
+
+# Override this to import from somewhere else, say "../reftable".
+SRC=${SRC:-origin}
+BRANCH=${BRANCH:-master}
+
+((git --git-dir reftable-repo/.git fetch -f ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
+   git clone https://github.com/google/reftable reftable-repo)
+
+cp reftable-repo/c/*.[ch] reftable/
+cp reftable-repo/c/include/*.[ch] reftable/
+cp reftable-repo/LICENSE reftable/
+
+git --git-dir reftable-repo/.git show --no-patch --format=oneline HEAD \
+  > reftable/VERSION
+
+mv reftable/system.h reftable/system.h~
+sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
+
+# Remove compatibility hacks we don't need here.
+rm reftable/compat.*
+
+git add reftable/*.[ch] reftable/LICENSE reftable/VERSION
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..c55ff238d96
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,644 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, byte *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0)
+			return n;
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0)
+		return n;
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	slice_release(&w->last_key);
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, byte *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	slice_init(&wp->block_writer_data.last_key);
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->last_key = reftable_empty_slice;
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+#define OBJ_INDEX_TREE_NODE_INIT   \
+	{                          \
+		.hash = SLICE_INIT \
+	}
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_cmp(&((const struct obj_index_tree_node *)a)->hash,
+			 &((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct slice *hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = *hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		struct obj_index_tree_node empty = OBJ_INDEX_TREE_NODE_INIT;
+		key = reftable_malloc(sizeof(struct obj_index_tree_node));
+		*key = empty;
+
+		slice_copy(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w,
+			     struct reftable_record *rec)
+{
+	int result = -1;
+	struct slice key = SLICE_INIT;
+	int err = 0;
+	reftable_record_key(rec, &key);
+	if (slice_cmp(&w->last_key, &key) >= 0)
+		goto done;
+
+	slice_copy(&w->last_key, &key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, reftable_record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == reftable_record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto done;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto done;
+	}
+
+	writer_reinit_block_writer(w, reftable_record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto done;
+	}
+
+	result = 0;
+done:
+	slice_release(&key);
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL)
+		return REFTABLE_API_ERROR;
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index)
+		return REFTABLE_API_ERROR;
+
+	reftable_record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, &rec);
+	if (err < 0)
+		return err;
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = {
+			.buf = ref->value,
+			.len = hash_size(w->opts.hash_id),
+			.canary = SLICE_CANARY,
+		};
+
+		writer_index_hash(w, &h);
+	}
+
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = {
+			.buf = ref->target_value,
+			.len = hash_size(w->opts.hash_id),
+			.canary = SLICE_CANARY,
+		};
+		writer_index_hash(w, &h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	int err;
+	if (log->ref_name == NULL)
+		return REFTABLE_API_ERROR;
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0)
+			return err;
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	reftable_record_from_log(&rec, log);
+	err = writer_add_record(w, &rec);
+	return err;
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	struct reftable_block_stats *bstats = NULL;
+	if (err < 0)
+		return err;
+
+	while (w->index_len > threshold) {
+		struct reftable_index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct reftable_record rec = { 0 };
+			reftable_record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, &rec) == 0) {
+				continue;
+			}
+
+			err = writer_flush_block(w);
+			if (err < 0)
+				return err;
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, &rec);
+			if (err != 0) {
+				/* write into fresh block should always succeed
+				 */
+				abort();
+			}
+		}
+		for (i = 0; i < idx_len; i++) {
+			slice_release(&idx[i].last_key);
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0)
+		return err;
+
+	bstats = writer_reftable_block_stats(w, typ);
+	bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
+	bstats->index_offset = index_start;
+	bstats->max_index_level = max_level;
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(&entry->hash, arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct reftable_obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct reftable_record rec = { 0 };
+	if (arg->err < 0)
+		goto done;
+
+	reftable_record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0)
+		goto done;
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0)
+		goto done;
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0)
+		goto done;
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+done:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	slice_release(&entry->hash);
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0)
+		return closure.err;
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL)
+		return 0;
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0)
+		return err;
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0)
+			return err;
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	byte footer[72];
+	byte *p = footer;
+	int err = writer_finish_public_section(w);
+	int empty_table = w->next == 0;
+	if (err != 0)
+		goto done;
+	w->pending_padding = 0;
+	if (empty_table) {
+		/* Empty tables need a header anyway. */
+		byte header[28];
+		int n = writer_write_header(w, header);
+		err = padded_write(w, header, n, 0);
+		if (err < 0)
+			goto done;
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0)
+		goto done;
+
+	if (empty_table) {
+		err = REFTABLE_EMPTY_TABLE_ERROR;
+		goto done;
+	}
+
+done:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	slice_release(&w->last_key);
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		slice_release(&w->index[i].last_key);
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	struct reftable_index_record ir = { .last_key = SLICE_INIT };
+	if (raw_bytes < 0)
+		return raw_bytes;
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0)
+		return err;
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index,
+			sizeof(struct reftable_index_record) * w->index_cap);
+	}
+
+	ir.offset = w->next;
+	slice_copy(&ir.last_key, &w->block_writer->last_key);
+	w->index[w->index_len] = ir;
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL)
+		return 0;
+	if (w->block_writer->entries == 0)
+		return 0;
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..c56d67f7116
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, byte *, size_t);
+	void *write_arg;
+	int pending_padding;
+	struct slice last_key;
+
+	/* offset of next block to write. */
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	/* memory buffer for writing */
+	byte *block;
+
+	/* writer for the current section. NULL or points to
+	 * block_writer_data */
+	struct block_writer *block_writer;
+
+	struct block_writer block_writer_data;
+
+	/* pending index records for the current section */
+	struct reftable_index_record *index;
+	int index_len;
+	int index_cap;
+
+	/*
+	  tree for use with tsearch; used to populate the 'o' inverse OID
+	  map */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+/* finishes a block, and writes it to storage */
+int writer_flush_block(struct reftable_writer *w);
+
+/* deallocates memory related to the index */
+void writer_clear_index(struct reftable_writer *w);
+
+/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 10/14] Reftable support for git-core
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (8 preceding siblings ...)
  2020-06-05 18:03                               ` [PATCH v16 09/14] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 11/14] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
                                                 ` (5 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

This introduces the refs/reftable-backend.c containing reftable powered ref
storage backend.

It can be activated by passing --ref-storage=reftable to "git init".

TODO:

* Fix worktree commands

* Spots marked XXX

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   27 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   56 +-
 cache.h                                       |    6 +-
 refs.c                                        |   27 +-
 refs.h                                        |    3 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1331 +++++++++++++++++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |  142 ++
 13 files changed, 1590 insertions(+), 30 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 372139f1f24..6b21c247b12 100644
--- a/Makefile
+++ b/Makefile
@@ -807,6 +807,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -959,6 +960,7 @@ LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1162,7 +1164,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2352,11 +2354,29 @@ VCSSVN_OBJS += vcs-svn/sliding_window.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/refname.o
+REFTABLE_OBJS += reftable/reftable.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2497,6 +2517,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -3112,7 +3135,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index cb48a291caf..4d0cf065e4a 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1108,7 +1108,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
+		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0b7222e7188..da5b4670c84 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +189,8 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 ||
+	    !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -238,6 +240,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -247,6 +250,24 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
+	/*
+	 * refs/heads is a file when using reftable. We can't reinitialize with
+	 * a reftable because it will overwrite HEAD
+	 */
+	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
+			      is_directory(git_path_buf(&buf, "refs/heads"))) {
+		die("cannot switch ref storage format.");
+	}
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -261,15 +282,12 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
@@ -383,7 +401,8 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash, unsigned int flags)
+	    const char *template_dir, int hash, const char *ref_storage_format,
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -422,6 +441,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+	repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -450,6 +470,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -523,6 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = default_ref_storage();
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -530,15 +553,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *object_format = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
@@ -648,9 +674,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
+		       ref_storage_format, flags);
 }
diff --git a/cache.h b/cache.h
index 0f0485ecfe2..8cb884773c3 100644
--- a/cache.h
+++ b/cache.h
@@ -628,8 +628,9 @@ int path_inside_repo(const char *prefix, const char *path);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
-	    unsigned int flags);
-void initialize_repository_version(int hash_algo);
+	    const char *ref_storage_format, unsigned int flags);
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1043,6 +1044,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 29e710a29e9..4409080dfd8 100644
--- a/refs.c
+++ b/refs.c
@@ -17,10 +17,16 @@
 #include "argv-array.h"
 #include "repository.h"
 
+const char *default_ref_storage(void)
+{
+	const char *test = getenv("GIT_TEST_REFTABLE");
+	return test ? "reftable" : "files";
+}
+
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1717,13 +1723,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1739,7 +1745,11 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(r->gitdir,
+					 r->ref_storage_format ?
+						 r->ref_storage_format :
+						 DEFAULT_REF_STORAGE,
+					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
 
@@ -1794,7 +1804,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1808,6 +1818,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1821,9 +1832,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index 7aaa1226551..ddcc940d5c7 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* Returns the ref storage backend to use by default. */
+const char *default_ref_storage(void);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index dc9e8d3a92b..7afe4c28310 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -693,6 +693,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..68b1679ae48
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1331 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *repo_dir;
+	char *reftable_dir;
+	struct reftable_stack *stack;
+};
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type);
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = { NULL };
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						       unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+	refs->repo_dir = xstrdup(path);
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
+	assert(refs->err != REFTABLE_API_ERROR);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct strbuf sb = STRBUF_INIT;
+
+	safe_create_dir(refs->reftable_dir, 1);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	const char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(&ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		/*
+		   We could filter pseudo refs here explicitly, but HEAD is not
+		   a PSEUDOREF, but a PER_WORKTREE, b/c each worktree can have
+		   its own HEAD.
+		 */
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct reftable_merged_table *mt = NULL;
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else {
+		mt = reftable_stack_merged_table(refs->stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->prefix = prefix;
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int fixup_symrefs(struct ref_store *ref_store,
+			 struct ref_transaction *transaction)
+{
+	struct strbuf referent = STRBUF_INIT;
+	int i = 0;
+	int err = 0;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+		struct object_id old_oid;
+
+		err = reftable_read_raw_ref(ref_store, update->refname,
+					    &old_oid, &referent,
+					    /* mutate input, like
+					       files-backend.c */
+					    &update->type);
+		if (err < 0 && errno == ENOENT &&
+		    is_null_oid(&update->old_oid)) {
+			err = 0;
+		}
+		if (err < 0)
+			goto done;
+
+		if (!(update->type & REF_ISSYMREF))
+			continue;
+
+		if (update->flags & REF_NO_DEREF) {
+			/* what should happen here? See files-backend.c
+			 * lock_ref_for_update. */
+		} else {
+			/*
+			  If we are updating a symref (eg. HEAD), we should also
+			  update the branch that the symref points to.
+
+			  This is generic functionality, and would be better
+			  done in refs.c, but the current implementation is
+			  intertwined with the locking in files-backend.c.
+			*/
+			int new_flags = update->flags;
+			struct ref_update *new_update = NULL;
+
+			/* if this is an update for HEAD, should also record a
+			   log entry for HEAD? See files-backend.c,
+			   split_head_update()
+			*/
+			new_update = ref_transaction_add_update(
+				transaction, referent.buf, new_flags,
+				&update->new_oid, &update->old_oid,
+				update->msg);
+			new_update->parent_update = update;
+
+			/* files-backend sets REF_LOG_ONLY here. */
+			update->flags |= REF_NO_DEREF | REF_LOG_ONLY;
+			update->flags &= ~REF_HAVE_OLD;
+		}
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	strbuf_release(&referent);
+	return err;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_addition *add = NULL;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_new_addition(&add, refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = fixup_symrefs(ref_store, transaction);
+	if (err) {
+		goto done;
+	}
+
+	transaction->backend_data = add;
+	transaction->state = REF_TRANSACTION_PREPARED;
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	if (err < 0) {
+		transaction->state = REF_TRANSACTION_CLOSED;
+		strbuf_addf(errbuf, "reftable: transaction prepare: %s",
+			    reftable_error_str(err));
+	}
+
+	return err;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	reftable_addition_destroy(add);
+	transaction->backend_data = NULL;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid;
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp((*(struct ref_update **)a)->refname,
+		      (*(struct ref_update **)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = reftable_stack_next_update_index(refs->stack);
+	int err = 0;
+	int i = 0;
+	struct reftable_log_record *logs =
+		calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_LOG_ONLY) {
+			continue;
+		}
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct reftable_ref_record ref = { NULL };
+			struct object_id peeled;
+
+			int peel_error = peel_object(&u->new_oid, &peeled);
+			ref.ref_name = (char *)u->refname;
+
+			if (!is_null_oid(&u->new_oid)) {
+				ref.value = u->new_oid.hash;
+			}
+			ref.update_index = ts;
+			if (!peel_error) {
+				ref.target_value = peeled.hash;
+			}
+
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	int err = 0;
+	int i;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+
+	err = reftable_addition_add(add, &write_transaction_table, transaction);
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_addition_commit(add);
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_addition_destroy(add);
+	transaction->state = REF_TRANSACTION_CLOSED;
+	transaction->backend_data = NULL;
+	if (err) {
+		strbuf_addf(errmsg, "reftable: transaction failure: %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+	return err;
+}
+
+static int
+reftable_transaction_initial_commit(struct ref_store *ref_store,
+				    struct ref_transaction *transaction,
+				    struct strbuf *errmsg)
+{
+	int err = reftable_transaction_prepare(ref_store, transaction, errmsg);
+	if (err)
+		return err;
+
+	return reftable_transaction_finish(ref_store, transaction, errmsg);
+}
+
+struct write_pseudoref_arg {
+	struct reftable_stack *stack;
+	const char *pseudoref;
+	const struct object_id *new_oid;
+	const struct object_id *old_oid;
+};
+
+static int write_pseudoref_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_pseudoref_arg *arg = (struct write_pseudoref_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	struct reftable_ref_record read_ref = { NULL };
+	struct reftable_ref_record write_ref = { NULL };
+
+	reftable_writer_set_limits(writer, ts, ts);
+	if (arg->old_oid) {
+		struct object_id read_oid;
+		err = reftable_stack_read_ref(arg->stack, arg->pseudoref,
+					      &read_ref);
+		if (err < 0)
+			goto done;
+
+		if ((err > 0) != is_null_oid(arg->old_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		/* XXX If old_oid is set, and we have a symref? */
+
+		if (err == 0 && read_ref.value == NULL) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		hashcpy(read_oid.hash, read_ref.value);
+		if (!oideq(arg->old_oid, &read_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+	}
+
+	write_ref.ref_name = (char *)arg->pseudoref;
+	write_ref.update_index = ts;
+	if (!is_null_oid(arg->new_oid))
+		write_ref.value = (uint8_t *)arg->new_oid->hash;
+
+	err = reftable_writer_add_ref(writer, &write_ref);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&read_ref);
+	return err;
+}
+
+static int reftable_write_pseudoref(struct ref_store *ref_store,
+				    const char *pseudoref,
+				    const struct object_id *oid,
+				    const struct object_id *old_oid,
+				    struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_pseudoref_arg arg = {
+		.stack = refs->stack,
+		.pseudoref = pseudoref,
+		.new_oid = oid,
+	};
+	struct reftable_addition *add = NULL;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_new_addition(&add, refs->stack);
+	if (err) {
+		goto done;
+	}
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		/* XXX this is cut & paste from files-backend - should factor
+		 * out? */
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(errbuf,
+					    _("could not read ref '%s'"),
+					    pseudoref);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(errbuf, _("ref '%s' already exists"),
+				    pseudoref);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(errbuf,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			goto done;
+		}
+	}
+
+	err = reftable_addition_add(add, &write_pseudoref_table, &arg);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref update failure: %s",
+			    reftable_error_str(err));
+	}
+
+	err = reftable_addition_commit(add);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref commit failure: %s",
+			    reftable_error_str(err));
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_addition_destroy(add);
+	return err;
+}
+
+static int reftable_delete_pseudoref(struct ref_store *ref_store,
+				     const char *pseudoref,
+				     const struct object_id *old_oid)
+{
+	struct strbuf errbuf = STRBUF_INIT;
+	int ret = reftable_write_pseudoref(ref_store, pseudoref, &null_oid,
+					   old_oid, &errbuf);
+	/* XXX what to do with the error message? */
+	strbuf_release(&errbuf);
+	return ret;
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	int i = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = { NULL };
+		struct reftable_ref_record current = { NULL };
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.ref_name,
+					    &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	string_list_sort(refnames);
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err < 0) {
+		return err;
+	}
+
+	{
+		struct reftable_log_record log = { NULL };
+		struct object_id new_oid;
+		struct object_id old_oid;
+		struct reftable_ref_record current = { NULL };
+		reftable_stack_read_ref(create->refs->stack, create->refname,
+					&current);
+
+		fill_reftable_log_record(&log);
+		log.ref_name = current.ref_name;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			reftable_writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return 0;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(refs->stack, &write_create_symref_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = { NULL };
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto done;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto done;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = { { NULL } };
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = { { NULL } };
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto done;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_add(refs->stack, &write_rename_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(&ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			/* we want the refnames that we have reflogs for, so we
+			 * skip if we've already produced this name. This could
+			 * be faster by seeking directly to
+			 * reflog@update_index==0.
+			 */
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(refs->stack);
+	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = { NULL };
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		{
+			struct object_id old_oid;
+			struct object_id new_oid;
+			const char *full_committer = "";
+
+			hashcpy(old_oid.hash, log.old_hash);
+			hashcpy(new_oid.hash, log.new_hash);
+
+			full_committer = fmt_ident(log.name, log.email,
+						   WANT_COMMITTER_IDENT,
+						   /*date*/ NULL,
+						   IDENT_NO_DATE);
+			if (fn(&old_oid, &new_oid, full_committer, log.time,
+			       log.tz_offset, log.message, cb_data)) {
+				err = -1;
+				break;
+			}
+		}
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+	int i = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = { NULL };
+		err = reftable_iterator_next_log(&it, &log);
+		if (err != 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
+			log->tz_offset, log->message, cb_data)) {
+			err = -1;
+			break;
+		}
+	}
+
+	for (i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	if (err > 0) {
+		err = 0;
+	}
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct reftable_log_record log = { NULL };
+	struct reftable_iterator it = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		goto done;
+	}
+
+	while (1) {
+		struct object_id ooid;
+		struct object_id noid;
+
+		int err = reftable_iterator_next_log(&it, &log);
+		if (err < 0) {
+			goto done;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_read_ref(refs->stack, refname, &ref);
+	if (err > 0) {
+		errno = ENOENT;
+		err = -1;
+		goto done;
+	}
+	if (err < 0) {
+		errno = reftable_error_to_errno(err);
+		err = -1;
+		goto done;
+	}
+	if (ref.target != NULL) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else if (ref.value != NULL) {
+		hashcpy(oid->hash, ref.value);
+	} else {
+		*type |= REF_ISBROKEN;
+		errno = EINVAL;
+		err = -1;
+	}
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_initial_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_write_pseudoref,
+	reftable_delete_pseudoref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire,
+};
diff --git a/repository.c b/repository.c
index 6f7f6f002b1..087760bc184 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 6534fbb7b31..f57b73f4a27 100644
--- a/repository.h
+++ b/repository.h
@@ -74,6 +74,9 @@ struct repository {
 	 */
 	struct ref_store *refs_private;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 65fe5ecefbe..2ef970f9f88 100644
--- a/setup.c
+++ b/setup.c
@@ -468,9 +468,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -559,6 +561,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1204,8 +1207,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 00000000000..221ef51447c
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,142 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+INVALID_SHA1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+initialize ()  {
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	mv .git/hooks .git/hooks-disabled
+}
+
+test_expect_success 'delete ref' '
+	initialize &&
+	test_commit file &&
+	SHA=$(git show-ref -s --verify HEAD) &&
+	test_write_lines "$SHA refs/heads/master" "$SHA refs/tags/file" >expect &&
+	git show-ref > actual &&
+	! git update-ref -d refs/tags/file $INVALID_SHA1 &&
+	test_cmp expect actual &&
+	git update-ref -d refs/tags/file $SHA  &&
+	test_write_lines "$SHA refs/heads/master" >expect &&
+	git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone calls transaction_initial_commit' '
+	test_commit message1 file1 &&
+	git clone . cloned &&
+	(test  -f cloned/file1 || echo "Fixme.")
+'
+
+test_expect_success 'basic operation of reftable storage: commit, show-ref' '
+	initialize &&
+	test_commit file &&
+	test_write_lines refs/heads/master refs/tags/file >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect
+'
+
+test_expect_success 'reflog, repack' '
+	initialize &&
+	for count in $(test_seq 1 10)
+	do
+		test_commit "number $count" file.t $count number-$count ||
+	        return 1
+	done &&
+	git pack-refs &&
+	ls -1 .git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 10 output &&
+	grep "commit (initial): number 1" output &&
+	grep "commit: number 10" output &&
+	git gc &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 0 output
+'
+
+# This matches show-ref's output
+print_ref() {
+	echo "$(git rev-parse "$1") $1"
+}
+
+test_expect_success 'peeled tags are stored' '
+	initialize &&
+	test_commit file &&
+	git tag -m "annotated tag" test_tag HEAD &&
+	{
+		print_ref "refs/heads/master" &&
+		print_ref "refs/tags/file" &&
+		print_ref "refs/tags/test_tag" &&
+		print_ref "refs/tags/test_tag^{}"
+	} >expect &&
+	git show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'show-ref works on fresh repo' '
+	initialize &&
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	>expect &&
+	! git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'checkout unborn branch' '
+	initialize &&
+	git checkout -b master
+'
+
+
+test_expect_success 'dir/file conflict' '
+	initialize &&
+	test_commit file &&
+	! git branch master/forbidden
+'
+
+
+test_expect_success 'do not clobber existing repo' '
+	rm -rf .git &&
+	git init --ref-storage=files &&
+	cat .git/HEAD > expect &&
+	test_commit file &&
+	(git init --ref-storage=reftable || true) &&
+	cat .git/HEAD > actual &&
+	test_cmp expect actual
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'pseudo refs' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+	git cherry-pick source &&
+	test -f file2
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'rebase' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+ 	git rebase source &&
+	test -f file2
+'
+
+test_done
+
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 11/14] Hookup unittests for the reftable library.
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (9 preceding siblings ...)
  2020-06-05 18:03                               ` [PATCH v16 10/14] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-08 19:39                                 ` Junio C Hamano
  2020-06-05 18:03                               ` [PATCH v16 12/14] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
                                                 ` (4 subsequent siblings)
  15 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The unittests are under reftable/*_test.c, so all of the reftable code stays in
one directory. They are called from t/helpers/test-reftable.c in t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile             | 17 ++++++++++++++++-
 t/helper/test-tool.c |  1 +
 t/helper/test-tool.h |  1 +
 t/t0031-reftable.sh  |  5 +++++
 4 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 6b21c247b12..a39896191c7 100644
--- a/Makefile
+++ b/Makefile
@@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-read-cache.o
 TEST_BUILTINS_OBJS += test-read-graph.o
 TEST_BUILTINS_OBJS += test-read-midx.o
 TEST_BUILTINS_OBJS += test-ref-store.o
+TEST_BUILTINS_OBJS += test-reftable.o
 TEST_BUILTINS_OBJS += test-regex.o
 TEST_BUILTINS_OBJS += test-repository.o
 TEST_BUILTINS_OBJS += test-revision-walking.o
@@ -808,6 +809,7 @@ LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
 REFTABLE_LIB = reftable/libreftable.a
+REFTABLE_TEST_LIB = reftable/libreftable_test.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -2370,6 +2372,15 @@ REFTABLE_OBJS += reftable/tree.o
 REFTABLE_OBJS += reftable/writer.o
 REFTABLE_OBJS += reftable/zlib-compat.o
 
+REFTABLE_TEST_OBJS += reftable/block_test.o
+REFTABLE_TEST_OBJS += reftable/merged_test.o
+REFTABLE_TEST_OBJS += reftable/record_test.o
+REFTABLE_TEST_OBJS += reftable/refname_test.o
+REFTABLE_TEST_OBJS += reftable/reftable_test.o
+REFTABLE_TEST_OBJS += reftable/slice_test.o
+REFTABLE_TEST_OBJS += reftable/stack_test.o
+REFTABLE_TEST_OBJS += reftable/tree_test.o
+REFTABLE_TEST_OBJS += reftable/test_framework.o
 
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
@@ -2377,6 +2388,7 @@ OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
 	$(REFTABLE_OBJS) \
+	$(REFTABLE_TEST_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2520,6 +2532,9 @@ $(VCSSVN_LIB): $(VCSSVN_OBJS)
 $(REFTABLE_LIB): $(REFTABLE_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_TEST_LIB): $(REFTABLE_TEST_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -2802,7 +2817,7 @@ t/helper/test-svn-fe$X: $(VCSSVN_LIB)
 
 t/helper/test-tool$X: $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 
-t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS)
+t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS) $(REFTABLE_TEST_LIB)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS)
 
 check-sha1:: t/helper/test-tool$X
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 590b2efca70..10366b7b762 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -52,6 +52,7 @@ static struct test_cmd cmds[] = {
 	{ "read-graph", cmd__read_graph },
 	{ "read-midx", cmd__read_midx },
 	{ "ref-store", cmd__ref_store },
+	{ "reftable", cmd__reftable },
 	{ "regex", cmd__regex },
 	{ "repository", cmd__repository },
 	{ "revision-walking", cmd__revision_walking },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ddc8e990e91..d52ba2f5e57 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -41,6 +41,7 @@ int cmd__read_cache(int argc, const char **argv);
 int cmd__read_graph(int argc, const char **argv);
 int cmd__read_midx(int argc, const char **argv);
 int cmd__ref_store(int argc, const char **argv);
+int cmd__reftable(int argc, const char **argv);
 int cmd__regex(int argc, const char **argv);
 int cmd__repository(int argc, const char **argv);
 int cmd__revision_walking(int argc, const char **argv);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
index 221ef51447c..c1f3abfda9f 100755
--- a/t/t0031-reftable.sh
+++ b/t/t0031-reftable.sh
@@ -15,6 +15,11 @@ initialize ()  {
 	mv .git/hooks .git/hooks-disabled
 }
 
+test_expect_success 'unittests' '
+	test-tool reftable
+'
+
+
 test_expect_success 'delete ref' '
 	initialize &&
 	test_commit file &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 12/14] Add GIT_DEBUG_REFS debugging mechanism
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (10 preceding siblings ...)
  2020-06-05 18:03                               ` [PATCH v16 11/14] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 13/14] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
                                                 ` (3 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

When set in the environment, GIT_DEBUG_REFS makes git print operations and
results as they flow through the ref storage backend. This helps debug
discrepancies between different ref backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile              |   1 +
 refs.c                |   3 +
 refs/debug.c          | 309 ++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h  |   6 +
 t/t0033-debug-refs.sh |  18 +++
 5 files changed, 337 insertions(+)
 create mode 100644 refs/debug.c
 create mode 100755 t/t0033-debug-refs.sh

diff --git a/Makefile b/Makefile
index a39896191c7..1d09af7fd8d 100644
--- a/Makefile
+++ b/Makefile
@@ -961,6 +961,7 @@ LIB_OBJS += rebase.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
+LIB_OBJS += refs/debug.o
 LIB_OBJS += refs/files-backend.o
 LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
diff --git a/refs.c b/refs.c
index 4409080dfd8..bd1f3cc0e45 100644
--- a/refs.c
+++ b/refs.c
@@ -1750,6 +1750,9 @@ struct ref_store *get_main_ref_store(struct repository *r)
 						 r->ref_storage_format :
 						 DEFAULT_REF_STORAGE,
 					 REF_STORE_ALL_CAPS);
+	if (getenv("GIT_DEBUG_REFS")) {
+		r->refs_private = debug_wrap(r->refs_private);
+	}
 	return r->refs_private;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
new file mode 100644
index 00000000000..7c07ec92137
--- /dev/null
+++ b/refs/debug.c
@@ -0,0 +1,309 @@
+
+#include "refs-internal.h"
+
+struct debug_ref_store {
+	struct ref_store base;
+	struct ref_store *refs;
+};
+
+extern struct ref_storage_be refs_be_debug;
+struct ref_store *debug_wrap(struct ref_store *store);
+
+struct ref_store *debug_wrap(struct ref_store *store)
+{
+	struct debug_ref_store *res = malloc(sizeof(struct debug_ref_store));
+	res->refs = store;
+	base_ref_store_init((struct ref_store *)res, &refs_be_debug);
+	return (struct ref_store *)res;
+}
+
+static int debug_init_db(struct ref_store *refs, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res = drefs->refs->be->init_db(drefs->refs, err);
+	return res;
+}
+
+static int debug_transaction_prepare(struct ref_store *refs,
+				     struct ref_transaction *transaction,
+				     struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_prepare(drefs->refs, transaction,
+						   err);
+	return res;
+}
+
+static void print_update(int i, const char *refname,
+			 const struct object_id *old_oid,
+			 const struct object_id *new_oid, unsigned int flags,
+			 unsigned int type)
+{
+	char o[200] = "null";
+	char n[200] = "null";
+	if (old_oid)
+		oid_to_hex_r(o, old_oid);
+	if (new_oid)
+		oid_to_hex_r(n, new_oid);
+
+	type &= 0xf; /* see refs.h REF_* */
+	flags &= REF_HAVE_NEW | REF_HAVE_OLD | REF_NO_DEREF |
+		 REF_FORCE_CREATE_REFLOG | REF_LOG_ONLY;
+	printf("%d: %s %s -> %s (F=0x%x, T=0x%x)\n", i, refname, o, n, flags,
+	       type);
+}
+
+static void print_transaction(struct ref_transaction *transaction)
+{
+	printf("transaction {\n");
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		print_update(i, u->refname, &u->old_oid, &u->new_oid, u->flags,
+			     u->type);
+	}
+	printf("}\n");
+}
+
+static int debug_transaction_finish(struct ref_store *refs,
+				    struct ref_transaction *transaction,
+				    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_finish(drefs->refs, transaction,
+						  err);
+	print_transaction(transaction);
+	printf("finish: %d\n", res);
+	return res;
+}
+
+static int debug_transaction_abort(struct ref_store *refs,
+				   struct ref_transaction *transaction,
+				   struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_abort(drefs->refs, transaction, err);
+	return res;
+}
+
+static int debug_initial_transaction_commit(struct ref_store *refs,
+					    struct ref_transaction *transaction,
+					    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->initial_transaction_commit(drefs->refs,
+							  transaction, err);
+	return res;
+}
+
+static int debug_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->pack_refs(drefs->refs, flags);
+	return res;
+}
+
+static int debug_create_symref(struct ref_store *ref_store,
+			       const char *ref_target,
+			       const char *refs_heads_master,
+			       const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_symref(drefs->refs, ref_target,
+						 refs_heads_master, logmsg);
+	return res;
+}
+
+static int debug_delete_refs(struct ref_store *ref_store, const char *msg,
+			     struct string_list *refnames, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->delete_refs(drefs->refs, msg, refnames, flags);
+	return res;
+}
+static int debug_rename_ref(struct ref_store *ref_store, const char *oldref,
+			    const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->rename_ref(drefs->refs, oldref, newref,
+					      logmsg);
+	return res;
+}
+static int debug_copy_ref(struct ref_store *ref_store, const char *oldref,
+			  const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->copy_ref(drefs->refs, oldref, newref, logmsg);
+	return res;
+}
+
+static int debug_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->write_pseudoref(drefs->refs, pseudoref, oid,
+						   old_oid, err);
+	char o[100] = "null";
+	char n[100] = "null";
+	if (oid)
+		oid_to_hex_r(o, oid);
+	if (old_oid)
+		oid_to_hex_r(n, old_oid);
+	printf("write_pseudoref: %s, %s => %s, err %s: %d\n", pseudoref, o, n,
+	       err->buf, res);
+	return res;
+}
+
+static int debug_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_pseudoref(drefs->refs, pseudoref,
+						    old_oid);
+	char hex[100] = "null";
+	if (old_oid)
+		oid_to_hex_r(hex, old_oid);
+	printf("delete_pseudoref: %s (%s): %d\n", pseudoref, hex, res);
+	return res;
+}
+
+static struct ref_iterator *
+debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			 unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
+	return res;
+}
+
+static int debug_read_raw_ref(struct ref_store *ref_store, const char *refname,
+			      struct object_id *oid, struct strbuf *referent,
+			      unsigned int *type)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = 0;
+
+	oidcpy(oid, &null_oid);
+	res = drefs->refs->be->read_raw_ref(drefs->refs, refname, oid, referent,
+					    type);
+
+	if (res == 0) {
+		printf("read_raw_ref: %s: %s (=> %s) type %x: %d\n", refname,
+		       oid_to_hex(oid), referent->buf, *type, res);
+	} else {
+		printf("read_raw_ref: %s err %d\n", refname, res);
+	}
+	return res;
+}
+
+static struct ref_iterator *
+debug_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->reflog_iterator_begin(drefs->refs);
+	return res;
+}
+
+static int debug_for_each_reflog_ent(struct ref_store *ref_store,
+				     const char *refname, each_reflog_ent_fn fn,
+				     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->for_each_reflog_ent(drefs->refs, refname, fn,
+						       cb_data);
+	return res;
+}
+static int debug_for_each_reflog_ent_reverse(struct ref_store *ref_store,
+					     const char *refname,
+					     each_reflog_ent_fn fn,
+					     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->for_each_reflog_ent_reverse(
+		drefs->refs, refname, fn, cb_data);
+	return res;
+}
+
+static int debug_reflog_exists(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_exists(drefs->refs, refname);
+	return res;
+}
+
+static int debug_create_reflog(struct ref_store *ref_store, const char *refname,
+			       int force_create, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_reflog(drefs->refs, refname,
+						 force_create, err);
+	return res;
+}
+
+static int debug_delete_reflog(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_reflog(drefs->refs, refname);
+	return res;
+}
+
+static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
+			       const struct object_id *oid, unsigned int flags,
+			       reflog_expiry_prepare_fn prepare_fn,
+			       reflog_expiry_should_prune_fn should_prune_fn,
+			       reflog_expiry_cleanup_fn cleanup_fn,
+			       void *policy_cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_expire(drefs->refs, refname, oid,
+						 flags, prepare_fn,
+						 should_prune_fn, cleanup_fn,
+						 policy_cb_data);
+	return res;
+}
+
+struct ref_storage_be refs_be_debug = {
+	NULL,
+	"debug",
+	NULL,
+	debug_init_db,
+	debug_transaction_prepare,
+	debug_transaction_finish,
+	debug_transaction_abort,
+	debug_initial_transaction_commit,
+
+	debug_pack_refs,
+	debug_create_symref,
+	debug_delete_refs,
+	debug_rename_ref,
+	debug_copy_ref,
+
+	debug_write_pseudoref,
+	debug_delete_pseudoref,
+
+	debug_ref_iterator_begin,
+	debug_read_raw_ref,
+
+	debug_reflog_iterator_begin,
+	debug_for_each_reflog_ent,
+	debug_for_each_reflog_ent_reverse,
+	debug_reflog_exists,
+	debug_create_reflog,
+	debug_delete_reflog,
+	debug_reflog_expire,
+};
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 7afe4c28310..5417623c86e 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -713,4 +713,10 @@ struct ref_store {
 void base_ref_store_init(struct ref_store *refs,
 			 const struct ref_storage_be *be);
 
+/*
+ * Print out ref operations as they occur. Useful for debugging alternate ref
+ * backends.
+ */
+struct ref_store *debug_wrap(struct ref_store *store);
+
 #endif /* REFS_REFS_INTERNAL_H */
diff --git a/t/t0033-debug-refs.sh b/t/t0033-debug-refs.sh
new file mode 100755
index 00000000000..96266a3ee3a
--- /dev/null
+++ b/t/t0033-debug-refs.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='cross-check reftable with files, using GIT_DEBUG_REFS output'
+
+. ./test-lib.sh
+
+test_expect_success 'GIT_DEBUG_REFS' '
+	git init --ref-storage=files files &&
+	git init --ref-storage=reftable reftable &&
+	(cd files && GIT_DEBUG_REFS=1 test_commit message file) > files.txt && 
+	(cd reftable && GIT_DEBUG_REFS=1 test_commit message file) > reftable.txt &&
+	test_cmp files.txt reftable.txt
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 13/14] vcxproj: adjust for the reftable changes
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (11 preceding siblings ...)
  2020-06-05 18:03                               ` [PATCH v16 12/14] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
@ 2020-06-05 18:03                               ` Johannes Schindelin via GitGitGadget
  2020-06-05 18:03                               ` [PATCH v16 14/14] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
                                                 ` (2 subsequent siblings)
  15 siblings, 0 replies; 409+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This allows Git to be compiled via Visual Studio again after integrating
the `hn/reftable` branch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.mak.uname                           |  2 +-
 contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index c7eba69e54e..ae4e25a1a42 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -709,7 +709,7 @@ vcxproj:
 	# Make .vcxproj files and add them
 	unset QUIET_GEN QUIET_BUILT_IN; \
 	perl contrib/buildsystems/generate -g Vcxproj
-	git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
+	git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
 
 	# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
 	(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
index 5c666f9ac03..33a08d31652 100644
--- a/contrib/buildsystems/Generators/Vcxproj.pm
+++ b/contrib/buildsystems/Generators/Vcxproj.pm
@@ -77,7 +77,7 @@ sub createProject {
     my $libs_release = "\n    ";
     my $libs_debug = "\n    ";
     if (!$static_library) {
-      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
+      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
       $libs_debug = $libs_release;
       $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
       $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
@@ -231,6 +231,7 @@ sub createProject {
 EOM
     if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
       my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
+      my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
       my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
 
       print F << "EOM";
@@ -240,6 +241,14 @@ sub createProject {
       <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
     </ProjectReference>
 EOM
+      if (!($name =~ /xdiff|libreftable/)) {
+        print F << "EOM";
+    <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
+      <Project>$uuid_libreftable</Project>
+      <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
+    </ProjectReference>
+EOM
+      }
       if (!($name =~ 'xdiff')) {
         print F << "EOM";
     <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v16 14/14] Add reftable testing infrastructure
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (12 preceding siblings ...)
  2020-06-05 18:03                               ` [PATCH v16 13/14] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
@ 2020-06-05 18:03                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-09 23:14                               ` [PATCH v16 00/14] Reftable support git-core Junio C Hamano
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
  15 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-05 18:03 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* Add GIT_TEST_REFTABLE environment var to control default ref storage

* Add test_prerequisite REFTABLE.

* Skip some tests that are incompatible:

  * t3210-pack-refs.sh - does not apply
  * t9903-bash-prompt - The bash mode reads .git/HEAD directly
  * t1450-fsck.sh - manipulates .git/ directly to create invalid state

Major test failures:

 * t1400-update-ref.sh - Reads from .git/{refs,logs} directly
 * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
 * t1405 - inspecs .git/ directly.

Worst offenders:

t1400-update-ref.sh                      - 82 of 185
t2400-worktree-add.sh                    - 58 of 69
t1404-update-ref-errors.sh               - 44 of 53
t3514-cherry-pick-revert-gpg.sh          - 36 of 36
t5541-http-push-smart.sh                 - 29 of 38
t6003-rev-list-topo-order.sh             - 29 of 35
t3420-rebase-autostash.sh                - 28 of 42
t6120-describe.sh                        - 21 of 82
t3430-rebase-merges.sh                   - 18 of 24
t2018-checkout-branch.sh                 - 15 of 22
..

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/clone.c               | 2 +-
 refs.c                        | 6 +++---
 t/t1409-avoid-packing-refs.sh | 6 ++++++
 t/t1450-fsck.sh               | 6 ++++++
 t/t3210-pack-refs.sh          | 6 ++++++
 t/t9903-bash-prompt.sh        | 6 ++++++
 t/test-lib.sh                 | 5 +++++
 7 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 4d0cf065e4a..780c5807415 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1109,7 +1109,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
-		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
+		default_ref_storage(), INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/refs.c b/refs.c
index bd1f3cc0e45..ba5239b8421 100644
--- a/refs.c
+++ b/refs.c
@@ -1748,7 +1748,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	r->refs_private = ref_store_init(r->gitdir,
 					 r->ref_storage_format ?
 						 r->ref_storage_format :
-						 DEFAULT_REF_STORAGE,
+						 default_ref_storage(),
 					 REF_STORE_ALL_CAPS);
 	if (getenv("GIT_DEBUG_REFS")) {
 		r->refs_private = debug_wrap(r->refs_private);
@@ -1807,7 +1807,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
+	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1821,7 +1821,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
-	const char *format = DEFAULT_REF_STORAGE; /* XXX */
+	const char *format = default_ref_storage();
 	struct ref_store *refs;
 	const char *id;
 
diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
index be12fb63506..c6f78325563 100755
--- a/t/t1409-avoid-packing-refs.sh
+++ b/t/t1409-avoid-packing-refs.sh
@@ -4,6 +4,12 @@ test_description='avoid rewriting packed-refs unnecessarily'
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 # Add an identifying mark to the packed-refs file header line. This
 # shouldn't upset readers, and it should be omitted if the file is
 # ever rewritten.
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 344a2aad82f..09669203249 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -8,6 +8,12 @@ test_description='git fsck random collection of tests
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success setup '
 	test_oid_init &&
 	git config gc.auto 0 &&
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index f41b2afb996..edaef2c175a 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -11,6 +11,12 @@ semantic is still the same.
 '
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success 'enable reflogs' '
 	git config core.logallrefupdates true
 '
diff --git a/t/t9903-bash-prompt.sh b/t/t9903-bash-prompt.sh
index ab5da2cabc4..5deaaf968b0 100755
--- a/t/t9903-bash-prompt.sh
+++ b/t/t9903-bash-prompt.sh
@@ -7,6 +7,12 @@ test_description='test git-specific bash prompt functions'
 
 . ./lib-bash.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping tests; incompatible with reftable'
+  test_done
+fi
+
 . "$GIT_BUILD_DIR/contrib/completion/git-prompt.sh"
 
 actual="$TRASH_DIRECTORY/actual"
diff --git a/t/test-lib.sh b/t/test-lib.sh
index dbc027ff267..3ce9b957b1b 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1504,6 +1504,11 @@ parisc* | hppa*)
 	;;
 esac
 
+if test -n "$GIT_TEST_REFTABLE"
+then
+  test_set_prereq REFTABLE
+fi
+
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 11/14] Hookup unittests for the reftable library.
  2020-06-05 18:03                               ` [PATCH v16 11/14] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-06-08 19:39                                 ` Junio C Hamano
  2020-06-09 17:22                                   ` [PATCH] Fixup! Add t/helper/test-reftable.c hanwen
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-06-08 19:39 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> The unittests are under reftable/*_test.c, so all of the reftable code stays in
> one directory. They are called from t/helpers/test-reftable.c in t0031-reftable.sh
>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---

Hmph,

make: *** No rule to make target 't/helper/test-reftable.c', needed by 't/helper/test-reftable.o'.  Stop.
make: *** Waiting for unfinished jobs....

>  Makefile             | 17 ++++++++++++++++-
>  t/helper/test-tool.c |  1 +
>  t/helper/test-tool.h |  1 +
>  t/t0031-reftable.sh  |  5 +++++
>  4 files changed, 23 insertions(+), 1 deletion(-)

Forgot to add a source file?

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 01/14] Write pseudorefs through ref backends.
  2020-06-05 18:03                               ` [PATCH v16 01/14] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
@ 2020-06-09 10:16                                 ` Phillip Wood
  0 siblings, 0 replies; 409+ messages in thread
From: Phillip Wood @ 2020-06-09 10:16 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget, git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

On 05/06/2020 19:03, Han-Wen Nienhuys via GitGitGadget wrote:
> From: Han-Wen Nienhuys <hanwen@google.com>
> 
> Pseudorefs store transient data in in the repository. Examples are HEAD,
> CHERRY_PICK_HEAD, etc.
> 
> These refs have always been read through the ref backends, but they were written
> in a one-off routine that wrote an object ID or symref directly into
> .git/<pseudo_ref_name>.
> 
> This causes problems when introducing a new ref storage backend. To remedy this,
> extend the ref backend implementation with a write_pseudoref_fn and
> update_pseudoref_fn.

I think this patch is looking better now, I've had a couple more
thoughts though

> [...]
> diff --git a/refs.h b/refs.h
> index e010f8aec28..4dad8f24914 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -732,6 +732,17 @@ int update_ref(const char *msg, const char *refname,
>  	       const struct object_id *new_oid, const struct object_id *old_oid,
>  	       unsigned int flags, enum action_on_err onerr);
>  
> +/* Pseudorefs (eg. HEAD, CHERRY_PICK_HEAD) have a separate routines for updating
> +   and deletion as they cannot take part in normal transactional updates.
> +   Pseudorefs should only be written for the main repository.
> +*/
> +int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
> +			 const struct object_id *oid,
> +			 const struct object_id *old_oid, struct strbuf *err);
> +int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
> +			  const struct object_id *old_oid);
> +int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid);
> +

Do we need to expose these functions to the rest of git? In sequencer.c
we write the pseudo ref REBASE_HEAD when a rebase stops due to
conflicts. It is created with update_ref() and deleted with
delete_ref(), is there a reason can't we just use update_ref()/delete
ref() for the other pseudo refs?

>  int parse_hide_refs_config(const char *var, const char *value, const char *);
>  
>  /*
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 6516c7bc8c8..df7553f4cc3 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -731,6 +731,115 @@ static int lock_raw_ref(struct files_ref_store *refs,
>  	return ret;
>  }
>  
> +static int files_write_pseudoref(struct ref_store *ref_store,
> +				 const char *pseudoref,
> +				 const struct object_id *oid,
> +				 const struct object_id *old_oid,
> +				 struct strbuf *err)
> +{
> +	struct files_ref_store *refs =
> +		files_downcast(ref_store, REF_STORE_READ, "write_pseudoref");
> +	int fd;
> +	struct lock_file lock = LOCK_INIT;
> +	struct strbuf filename = STRBUF_INIT;
> +	struct strbuf buf = STRBUF_INIT;
> +	int ret = -1;
> +
> +	if (!oid)
> +		return 0;
> +
> +	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
> +	fd = hold_lock_file_for_update_timeout(&lock, filename.buf, 0,
> +					       get_files_ref_lock_timeout_ms());
> +	if (fd < 0) {
> +		strbuf_addf(err, _("could not open '%s' for writing: %s"),
> +			    buf.buf, strerror(errno));
> +		goto done;
> +	}
> +
> +	if (old_oid) {
> +		struct object_id actual_old_oid;
> +
> +		if (read_ref(pseudoref, &actual_old_oid)) {
> +			if (!is_null_oid(old_oid)) {
> +				strbuf_addf(err, _("could not read ref '%s'"),
> +					    pseudoref);
> +				rollback_lock_file(&lock);
> +				goto done;
> +			}
> +		} else if (is_null_oid(old_oid)) {
> +			strbuf_addf(err, _("ref '%s' already exists"),
> +				    pseudoref);
> +			rollback_lock_file(&lock);
> +			goto done;
> +		} else if (!oideq(&actual_old_oid, old_oid)) {
> +			strbuf_addf(err,
> +				    _("unexpected object ID when writing '%s'"),
> +				    pseudoref);
> +			rollback_lock_file(&lock);
> +			goto done;
> +		}
> +	}
> +
> +	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
> +	if (write_in_full(fd, buf.buf, buf.len) < 0) {
> +		strbuf_addf(err, _("could not write to '%s'"), filename.buf);
> +		rollback_lock_file(&lock);
> +		goto done;
> +	}
> +
> +	commit_lock_file(&lock);
> +	ret = 0;
> +done:
> +	strbuf_release(&buf);
> +	strbuf_release(&filename);
> +	return ret;
> +}
> +
> +static int files_delete_pseudoref(struct ref_store *ref_store,
> +				  const char *pseudoref,
> +				  const struct object_id *old_oid)
> +{
> +	struct files_ref_store *refs =
> +		files_downcast(ref_store, REF_STORE_READ, "delete_pseudoref");
> +	struct strbuf filename = STRBUF_INIT;
> +	int ret = -1;
> +
> +	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
> +
> +	if (old_oid && !is_null_oid(old_oid)) {
> +		struct lock_file lock = LOCK_INIT;
> +		int fd;
> +		struct object_id actual_old_oid;
> +
> +		fd = hold_lock_file_for_update_timeout(
> +			&lock, filename.buf, 0,
> +			get_files_ref_lock_timeout_ms());
> +		if (fd < 0) {
> +			error_errno(_("could not open '%s' for writing"),
> +				    filename.buf);
> +			goto done;
> +		}
> +		if (read_ref(pseudoref, &actual_old_oid))
> +			die(_("could not read ref '%s'"), pseudoref);
> +		if (!oideq(&actual_old_oid, old_oid)) {
> +			error(_("unexpected object ID when deleting '%s'"),
> +			      pseudoref);
> +			rollback_lock_file(&lock);
> +			goto done;
> +		}
> +
> +		unlink(filename.buf);

This could fail (especially on windows I think) and there are one or two
places in sequencer.c where we do actually check that the ref has
actually been deleted. Switching to calling delete_pseudoref() as it
stands will break those checks. What are the error semantics of normal
ref deletion where the file cannot be unlinked?

Best Wishes

Phillip

> +		rollback_lock_file(&lock);
> +	} else {
> +		unlink(filename.buf);
> +	}
> +	ret = 0;
> +done:
> +	strbuf_release(&filename);
> +	return ret;
> +}
> +
>  struct files_ref_iterator {
>  	struct ref_iterator base;
>  
> @@ -3189,6 +3298,9 @@ struct ref_storage_be refs_be_files = {
>  	files_rename_ref,
>  	files_copy_ref,
>  
> +	files_write_pseudoref,
> +	files_delete_pseudoref,
> +
>  	files_ref_iterator_begin,
>  	files_read_raw_ref,
>  
> @@ -3198,5 +3310,5 @@ struct ref_storage_be refs_be_files = {
>  	files_reflog_exists,
>  	files_create_reflog,
>  	files_delete_reflog,
> -	files_reflog_expire
> +	files_reflog_expire,
>  };
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 4458a0f69cc..08e8253a893 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1590,6 +1590,22 @@ static int packed_copy_ref(struct ref_store *ref_store,
>  	BUG("packed reference store does not support copying references");
>  }
>  
> +static int packed_write_pseudoref(struct ref_store *ref_store,
> +				  const char *pseudoref,
> +				  const struct object_id *oid,
> +				  const struct object_id *old_oid,
> +				  struct strbuf *err)
> +{
> +	BUG("packed reference store does not support writing pseudo-references");
> +}
> +
> +static int packed_delete_pseudoref(struct ref_store *ref_store,
> +				   const char *pseudoref,
> +				   const struct object_id *old_oid)
> +{
> +	BUG("packed reference store does not support deleting pseudo-references");
> +}
> +
>  static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_store)
>  {
>  	return empty_ref_iterator_begin();
> @@ -1656,6 +1672,9 @@ struct ref_storage_be refs_be_packed = {
>  	packed_rename_ref,
>  	packed_copy_ref,
>  
> +	packed_write_pseudoref,
> +	packed_delete_pseudoref,
> +
>  	packed_ref_iterator_begin,
>  	packed_read_raw_ref,
>  
> @@ -1665,5 +1684,5 @@ struct ref_storage_be refs_be_packed = {
>  	packed_reflog_exists,
>  	packed_create_reflog,
>  	packed_delete_reflog,
> -	packed_reflog_expire
> +	packed_reflog_expire,
>  };
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index 4271362d264..59b053d53a2 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -556,6 +556,21 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
>  			  const char *oldref, const char *newref,
>  			  const char *logmsg);
>  
> +typedef int write_pseudoref_fn(struct ref_store *ref_store,
> +			       const char *pseudoref,
> +			       const struct object_id *oid,
> +			       const struct object_id *old_oid,
> +			       struct strbuf *err);
> +
> +/*
> + * Deletes a pseudoref. Deletion always succeeds (even if the pseudoref doesn't
> + * exist.), except if old_oid is specified. If it is, it can fail due to lock
> + * failure, failure reading the old OID, or an OID mismatch
> + */
> +typedef int delete_pseudoref_fn(struct ref_store *ref_store,
> +				const char *pseudoref,
> +				const struct object_id *old_oid);
> +
>  /*
>   * Iterate over the references in `ref_store` whose names start with
>   * `prefix`. `prefix` is matched as a literal string, without regard
> @@ -655,6 +670,9 @@ struct ref_storage_be {
>  	rename_ref_fn *rename_ref;
>  	copy_ref_fn *copy_ref;
>  
> +	write_pseudoref_fn *write_pseudoref;
> +	delete_pseudoref_fn *delete_pseudoref;
> +
>  	ref_iterator_begin_fn *iterator_begin;
>  	read_raw_ref_fn *read_raw_ref;
>  
> 


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 02/14] Make refs_ref_exists public
  2020-06-05 18:03                               ` [PATCH v16 02/14] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
@ 2020-06-09 10:36                                 ` Phillip Wood
  2020-06-10 18:05                                   ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Phillip Wood @ 2020-06-09 10:36 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget, git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

On 05/06/2020 19:03, Han-Wen Nienhuys via GitGitGadget wrote:
> From: Han-Wen Nienhuys <hanwen@google.com>
> 
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  refs.c | 2 +-
>  refs.h | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/refs.c b/refs.c
> index 12908066b13..812fee47108 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -311,7 +311,7 @@ int read_ref(const char *refname, struct object_id *oid)
>  	return read_ref_full(refname, RESOLVE_REF_READING, oid, NULL);
>  }
>  
> -static int refs_ref_exists(struct ref_store *refs, const char *refname)
> +int refs_ref_exists(struct ref_store *refs, const char *refname)
>  {
>  	return !!refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING, NULL, NULL);
>  }

It is a shame that ref_exists() does not take a struct repository. The
code in the following patches would be pleasanter if we could write
    ref_exists(r, ref)
rather than
    refs_ref_exists(get_main_ref_store(r), ref)
all the time

Maybe we should think about updating the existing callers to
ref_exists() to pass a struct repository.

The existing code is inconsistent about its repository handling - we
create the refs with update_ref() which operates on the main repository
but when checking their existence and deleting them we use a path which
depends on the repository. I've realized now the answer to my question
about using delete_ref() in my reply to the previous patch - it does not
take a repository - maybe it should along with update_ref() but that
might be more work than you want to do though.

Best Wishes

Phillip

> diff --git a/refs.h b/refs.h
> index 4dad8f24914..7aaa1226551 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -105,6 +105,8 @@ int refs_verify_refname_available(struct ref_store *refs,
>  				  const struct string_list *skip,
>  				  struct strbuf *err);
>  
> +int refs_ref_exists(struct ref_store *refs, const char *refname);
> +
>  int ref_exists(const char *refname);
>  
>  int should_autocreate_reflog(const char *refname);
> 


^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH] Fixup! Add t/helper/test-reftable.c
  2020-06-08 19:39                                 ` Junio C Hamano
@ 2020-06-09 17:22                                   ` hanwen
  2020-06-09 20:45                                     ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: hanwen @ 2020-06-09 17:22 UTC (permalink / raw)
  To: gitster; +Cc: git, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/helper/test-reftable.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
 create mode 100644 t/helper/test-reftable.c

diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c
new file mode 100644
index 0000000000..95d18ba1fa
--- /dev/null
+++ b/t/helper/test-reftable.c
@@ -0,0 +1,15 @@
+#include "reftable/reftable-tests.h"
+#include "test-tool.h"
+
+int cmd__reftable(int argc, const char **argv)
+{
+	block_test_main(argc, argv);
+	merged_test_main(argc, argv);
+	record_test_main(argc, argv);
+	refname_test_main(argc, argv);
+	reftable_test_main(argc, argv);
+	slice_test_main(argc, argv);
+	stack_test_main(argc, argv);
+	tree_test_main(argc, argv);
+	return 0;
+}
-- 
2.27.0.278.ge193c7cf3a9-goog


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH] Fixup! Add t/helper/test-reftable.c
  2020-06-09 17:22                                   ` [PATCH] Fixup! Add t/helper/test-reftable.c hanwen
@ 2020-06-09 20:45                                     ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-06-09 20:45 UTC (permalink / raw)
  To: hanwen; +Cc: git

hanwen@google.com writes:

> Subject: Re: [PATCH] Fixup! Add t/helper/test-reftable.c

I do not see a step with "Add t/helper/test-reftable.c" as its
subject; is this supposed to be squashed into "Hookup unittests for
the reftable library." or somewhere else?

I also notice, with or without this squashed in, 775c540f (Hookup
unittests for the reftable library., 2020-06-05) does not compile
cleanly.  Since you are relatively new to this project (even though
you are very experienced and capable), I probably should let you
know that in this project each patch in a series is expected to
cleanly compile and pass the testsuite, not just the endstate.

Thanks.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 00/14] Reftable support git-core
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (13 preceding siblings ...)
  2020-06-05 18:03                               ` [PATCH v16 14/14] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-06-09 23:14                               ` Junio C Hamano
  2020-06-10  6:56                                 ` Han-Wen Nienhuys
  2020-06-10 16:57                                 ` Han-Wen Nienhuys
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
  15 siblings, 2 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-06-09 23:14 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This adds the reftable library, and hooks it up as a ref backend. Based on
> hn/refs-cleanup.
>
> Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1
>
> Summary 20479 tests pass 1299 tests fail

Thanks.

> Some issues:
>
>  * many tests inspect .git/{logs,heads}/ directly.
>  * worktrees broken. 

Are these with or without GIT_TEST_REFTABLE=1?

The patches (with the fix-up I saw today) applied on top of your
refs clean-up topic seems to pass the tests (without setting
GIT_TEST_REFTABLE) as the whole, but when merged at the tip of 'pu',
many tests fail (please see [*1*], for examples).  I suspect that
there may be some topics in flight that add more direct access to
the .git/refs/ hierarchy, and/or there may be stupid mismerges (I'd
especially appreciate it if you can spot any).



[Reference]

*1* https://travis-ci.org/github/git/git/builds/696638569

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 00/14] Reftable support git-core
  2020-06-09 23:14                               ` [PATCH v16 00/14] Reftable support git-core Junio C Hamano
@ 2020-06-10  6:56                                 ` Han-Wen Nienhuys
  2020-06-10 17:09                                   ` Junio C Hamano
  2020-06-10 16:57                                 ` Han-Wen Nienhuys
  1 sibling, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-06-10  6:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, hanwen

On Wed, Jun 10, 2020 at 1:14 AM Junio C Hamano <gitster@pobox.com> wrote:

> > Some issues:
> >
> >  * many tests inspect .git/{logs,heads}/ directly.
> >  * worktrees broken.
>
> Are these with or without GIT_TEST_REFTABLE=1?

worktrees are broken with reftable, because HEAD is per-worktree ref,
and I haven't decided how to handle those for reftable.

> The patches (with the fix-up I saw today) applied on top of your
> refs clean-up topic seems to pass the tests (without setting
> GIT_TEST_REFTABLE) as the whole, but when merged at the tip of 'pu',
> many tests fail (please see [*1*], for examples).  I suspect that
> there may be some topics in flight that add more direct access to
> the .git/refs/ hierarchy, and/or there may be stupid mismerges (I'd
> especially appreciate it if you can spot any).

Curious; I'll have a look.

There is also:

reftable/stack_test.c:27:7: error: incompatible pointer types
initializing 'PREC_DIR *' with an expression of type 'DIR *'
[-Werror,-Wincompatible-pointer-types]
        DIR *dir = fdopendir(fd);

on OSX. What is the proper dialect for reading out a directory within
the git source code? opendir and fdopendir are POSIX, so I'm surprised
this fails.

-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 00/14] Reftable support git-core
  2020-06-09 23:14                               ` [PATCH v16 00/14] Reftable support git-core Junio C Hamano
  2020-06-10  6:56                                 ` Han-Wen Nienhuys
@ 2020-06-10 16:57                                 ` Han-Wen Nienhuys
  1 sibling, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-06-10 16:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Wed, Jun 10, 2020 at 1:14 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> he patches (with the fix-up I saw today) applied on top of your
> refs clean-up topic seems to pass the tests (without setting
> GIT_TEST_REFTABLE) as the whole, but when merged at the tip of 'pu',
> many tests fail (please see [*1*], for examples).  I suspect that

they all say

  warning: unable to upgrade repository format from 0 to 1:
  fatal: unable to upgrade repository format to support XXX

maybe a mismerge in the code in init-db.c that deals with the
repository format version?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 00/14] Reftable support git-core
  2020-06-10  6:56                                 ` Han-Wen Nienhuys
@ 2020-06-10 17:09                                   ` Junio C Hamano
  2020-06-10 17:38                                     ` Junio C Hamano
  2020-06-10 18:59                                     ` Johannes Schindelin
  0 siblings, 2 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-06-10 17:09 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, hanwen

Han-Wen Nienhuys <hanwenn@gmail.com> writes:

> There is also:
>
> reftable/stack_test.c:27:7: error: incompatible pointer types
> initializing 'PREC_DIR *' with an expression of type 'DIR *'
> [-Werror,-Wincompatible-pointer-types]
>         DIR *dir = fdopendir(fd);
>
> on OSX. What is the proper dialect for reading out a directory within
> the git source code? opendir and fdopendir are POSIX, so I'm surprised
> this fails.

I am reasonably sure we use opendir() to iterate over existing files
and subdirectories in a directory (e.g. in the codepaths to
enumerate loose object files in .git/objects/??/ directories).  

I do not offhand know we also use fdopendir() elsewhere.  I strongly
suspect we do not.  Perhaps some platforms do POSIX.1-2001 but not
ready for POSIX.1-2008 or something silly like that?

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 00/14] Reftable support git-core
  2020-06-10 17:09                                   ` Junio C Hamano
@ 2020-06-10 17:38                                     ` Junio C Hamano
  2020-06-10 18:59                                     ` Johannes Schindelin
  1 sibling, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-06-10 17:38 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, hanwen

Junio C Hamano <gitster@pobox.com> writes:

> I do not offhand know we also use fdopendir() elsewhere.  I strongly
> suspect we do not.  Perhaps some platforms do POSIX.1-2001 but not
> ready for POSIX.1-2008 or something silly like that?

Searching "macos fdopendir" gives a few hits from other projects,
all saying that before macOS 10.10 they did not have fdopendir()
and their code failed to build on macs.




^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 02/14] Make refs_ref_exists public
  2020-06-09 10:36                                 ` Phillip Wood
@ 2020-06-10 18:05                                   ` Han-Wen Nienhuys
  2020-06-11 14:59                                     ` Phillip Wood
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-06-10 18:05 UTC (permalink / raw)
  To: Phillip Wood; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Jun 9, 2020 at 12:36 PM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> On 05/06/2020 19:03, Han-Wen Nienhuys via GitGitGadget wrote:
> > From: Han-Wen Nienhuys <hanwen@google.com>
> >
> > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> > ---
> >  refs.c | 2 +-
> >  refs.h | 2 ++
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/refs.c b/refs.c
> > index 12908066b13..812fee47108 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -311,7 +311,7 @@ int read_ref(const char *refname, struct object_id *oid)
> >       return read_ref_full(refname, RESOLVE_REF_READING, oid, NULL);
> >  }
> >
> > -static int refs_ref_exists(struct ref_store *refs, const char *refname)
> > +int refs_ref_exists(struct ref_store *refs, const char *refname)
> >  {
> >       return !!refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING, NULL, NULL);
> >  }
>
> It is a shame that ref_exists() does not take a struct repository. The

I'm trying to follow the pattern that the rest of the code
establishes; you're right that the code isn't very consistent (the
fact that it uses unlink() rather than go through the ref store in the
first place is an indication of that). I want to avoid starting a
general cleanup tour of the code base, the reftable patch is hard
enough to pull off without.

> The existing code is inconsistent about its repository handling - we
> create the refs with update_ref() which operates on the main repository
> but when checking their existence and deleting them we use a path which
> depends on the repository. I've realized now the answer to my question
> about using delete_ref() in my reply to the previous patch - it does not
> take a repository - maybe it should along with update_ref() but that
> might be more work than you want to do though.

Why do they take repository arguments anyway? Is it because
rebase/cherry-pick supports recursion into submodules?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 00/14] Reftable support git-core
  2020-06-10 17:09                                   ` Junio C Hamano
  2020-06-10 17:38                                     ` Junio C Hamano
@ 2020-06-10 18:59                                     ` Johannes Schindelin
  2020-06-10 19:04                                       ` Han-Wen Nienhuys
  1 sibling, 1 reply; 409+ messages in thread
From: Johannes Schindelin @ 2020-06-10 18:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Han-Wen Nienhuys, Han-Wen Nienhuys via GitGitGadget, git, hanwen

Hi Junio,

On Wed, 10 Jun 2020, Junio C Hamano wrote:

> Han-Wen Nienhuys <hanwenn@gmail.com> writes:
>
> > There is also:
> >
> > reftable/stack_test.c:27:7: error: incompatible pointer types
> > initializing 'PREC_DIR *' with an expression of type 'DIR *'
> > [-Werror,-Wincompatible-pointer-types]
> >         DIR *dir = fdopendir(fd);
> >
> > on OSX. What is the proper dialect for reading out a directory within
> > the git source code? opendir and fdopendir are POSIX, so I'm surprised
> > this fails.
>
> I am reasonably sure we use opendir() to iterate over existing files
> and subdirectories in a directory (e.g. in the codepaths to
> enumerate loose object files in .git/objects/??/ directories).
>
> I do not offhand know we also use fdopendir() elsewhere.  I strongly
> suspect we do not.  Perhaps some platforms do POSIX.1-2001 but not
> ready for POSIX.1-2008 or something silly like that?

We don't. We also do not use `unlinkat()`. And we generally use
`remove_dir_recursively()` instead of implementing a separate version of
it that only handles one directory level ;-)

This is what I needed in Git for Windows' `shears/pu` branch to make it
compile again:

-- snipsnap --
Subject: [PATCH] fixup??? Add reftable library

Rather than causing a build failure e.g. on Windows, where `fdopendir()`
and `unlinkat()` just is not a thing, and rather than re-inventing
`remove_dir_recursively()`, let's just use the perfectly fine library
function we already have for that purpose.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 reftable/stack_test.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index ffcc1e42819a..4dadc39bf45f 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -17,23 +17,20 @@ license that can be found in the LICENSE file or at
 #include "reftable.h"
 #include "test_framework.h"
 #include "reftable-tests.h"
+#include "dir.h"

 #include <sys/types.h>
 #include <dirent.h>

 static void clear_dir(const char *dirname)
 {
-	int fd = open(dirname, O_DIRECTORY, 0);
-	DIR *dir = fdopendir(fd);
-	struct dirent *ent = NULL;
+	struct strbuf buf = STRBUF_INIT;

-	assert(fd >= 0);
+	strbuf_addstr(&buf, dirname);
+	if (remove_dir_recursively(&buf, 0) < 0)
+		die("Could not remove '%s'", dirname);

-	while ((ent = readdir(dir)) != NULL) {
-		unlinkat(fd, ent->d_name, 0);
-	}
-	closedir(dir);
-	rmdir(dirname);
+	strbuf_release(&buf);
 }

 static void test_read_file(void)
--
2.27.0.windows.1


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 00/14] Reftable support git-core
  2020-06-10 18:59                                     ` Johannes Schindelin
@ 2020-06-10 19:04                                       ` Han-Wen Nienhuys
  2020-06-10 19:20                                         ` Johannes Schindelin
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-06-10 19:04 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Han-Wen Nienhuys, Han-Wen Nienhuys via GitGitGadget, git

On Wed, Jun 10, 2020 at 8:59 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:

> > I am reasonably sure we use opendir() to iterate over existing files
> > and subdirectories in a directory (e.g. in the codepaths to
> > enumerate loose object files in .git/objects/??/ directories).
> >
> > I do not offhand know we also use fdopendir() elsewhere.  I strongly
> > suspect we do not.  Perhaps some platforms do POSIX.1-2001 but not
> > ready for POSIX.1-2008 or something silly like that?
>
> We don't. We also do not use `unlinkat()`. And we generally use
> `remove_dir_recursively()` instead of implementing a separate version of
> it that only handles one directory level ;-)
>
> This is what I needed in Git for Windows' `shears/pu` branch to make it
> compile again:

Thanks Dscho. Last time I folded a fix into
github.com/google/reftable, you seemed quite upset. How should I
proceed with this fixup?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 00/14] Reftable support git-core
  2020-06-10 19:04                                       ` Han-Wen Nienhuys
@ 2020-06-10 19:20                                         ` Johannes Schindelin
  0 siblings, 0 replies; 409+ messages in thread
From: Johannes Schindelin @ 2020-06-10 19:20 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Junio C Hamano, Han-Wen Nienhuys, Han-Wen Nienhuys via GitGitGadget, git

Hi Han-Wen,

On Wed, 10 Jun 2020, Han-Wen Nienhuys wrote:

> On Wed, Jun 10, 2020 at 8:59 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > > I am reasonably sure we use opendir() to iterate over existing files
> > > and subdirectories in a directory (e.g. in the codepaths to
> > > enumerate loose object files in .git/objects/??/ directories).
> > >
> > > I do not offhand know we also use fdopendir() elsewhere.  I strongly
> > > suspect we do not.  Perhaps some platforms do POSIX.1-2001 but not
> > > ready for POSIX.1-2008 or something silly like that?
> >
> > We don't. We also do not use `unlinkat()`. And we generally use
> > `remove_dir_recursively()` instead of implementing a separate version of
> > it that only handles one directory level ;-)
> >
> > This is what I needed in Git for Windows' `shears/pu` branch to make it
> > compile again:
>
> Thanks Dscho. Last time I folded a fix into
> github.com/google/reftable, you seemed quite upset.

I was not upset. I just pointed out how unnecessarily difficult and
cumbersome it is to have this ever-growing patch, with the _actual_ commit
history completely elsewhere.

My assessment has not changed, and the situation has not changed, and does
not look like it will change, so I'll just keep quiet from now on.

:-)

> How should I proceed with this fixup?

That's for you to decide, not for me. My suggestions in the past seem not
to have pleased you, so in the interest of keeping our relationship
friendly, I will refrain from adding more suggestions.

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 02/14] Make refs_ref_exists public
  2020-06-10 18:05                                   ` Han-Wen Nienhuys
@ 2020-06-11 14:59                                     ` Phillip Wood
  2020-06-12  9:51                                       ` Phillip Wood
  0 siblings, 1 reply; 409+ messages in thread
From: Phillip Wood @ 2020-06-11 14:59 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Johannes Schindelin, Junio C Hamano

On 10/06/2020 19:05, Han-Wen Nienhuys wrote:
> On Tue, Jun 9, 2020 at 12:36 PM Phillip Wood <phillip.wood123@gmail.com> wrote:
>>
>> On 05/06/2020 19:03, Han-Wen Nienhuys via GitGitGadget wrote:
>>> From: Han-Wen Nienhuys <hanwen@google.com>
>>>
>>> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
>>> ---
>>>   refs.c | 2 +-
>>>   refs.h | 2 ++
>>>   2 files changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/refs.c b/refs.c
>>> index 12908066b13..812fee47108 100644
>>> --- a/refs.c
>>> +++ b/refs.c
>>> @@ -311,7 +311,7 @@ int read_ref(const char *refname, struct object_id *oid)
>>>        return read_ref_full(refname, RESOLVE_REF_READING, oid, NULL);
>>>   }
>>>
>>> -static int refs_ref_exists(struct ref_store *refs, const char *refname)
>>> +int refs_ref_exists(struct ref_store *refs, const char *refname)
>>>   {
>>>        return !!refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING, NULL, NULL);
>>>   }
>>
>> It is a shame that ref_exists() does not take a struct repository. The
> 
> I'm trying to follow the pattern that the rest of the code
> establishes; you're right that the code isn't very consistent (the
> fact that it uses unlink() rather than go through the ref store in the
> first place is an indication of that). I want to avoid starting a
> general cleanup tour of the code base, the reftable patch is hard
> enough to pull off without.

Sure

>> The existing code is inconsistent about its repository handling - we
>> create the refs with update_ref() which operates on the main repository
>> but when checking their existence and deleting them we use a path which
>> depends on the repository. I've realized now the answer to my question
>> about using delete_ref() in my reply to the previous patch - it does not
>> take a repository - maybe it should along with update_ref() but that
>> might be more work than you want to do though.
> 
> Why do they take repository arguments anyway? Is it because
> rebase/cherry-pick supports recursion into submodules?

It was a stepping stone towards that, the git_path mechanism that is 
used to create git_path_cherry_pick_head() etc was changed to take a 
struct repository so it could support submodules without forking a 
separate process. However are still plenty of places where the sequencer 
code assumes a single repository (it calls update_ref(), delete_ref(), 
commit_tree_extended(), ...) and the two contributors who did a lot of 
that work have moved on. With that in mind perhaps we'd be better off 
just using ref_exists() and delete_ref() in this conversion. The call 
sites will be easy enough to fixup if those functions are converted to 
take a struct repository in the future and the result of this patch 
series will be nicer. I've cc'd dscho and Junio to see what they think.

Best Wishes

Phillip


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 02/14] Make refs_ref_exists public
  2020-06-11 14:59                                     ` Phillip Wood
@ 2020-06-12  9:51                                       ` Phillip Wood
  2020-06-15 11:32                                         ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Phillip Wood @ 2020-06-12  9:51 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Johannes Schindelin, Junio C Hamano

On 11/06/2020 15:59, Phillip Wood wrote:
> On 10/06/2020 19:05, Han-Wen Nienhuys wrote:
>> On Tue, Jun 9, 2020 at 12:36 PM Phillip Wood
>> <phillip.wood123@gmail.com> wrote:
>>>
>>> On 05/06/2020 19:03, Han-Wen Nienhuys via GitGitGadget wrote:
>>>> From: Han-Wen Nienhuys <hanwen@google.com>
>>>>
>>>> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
>>>> ---
>>>>   refs.c | 2 +-
>>>>   refs.h | 2 ++
>>>>   2 files changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/refs.c b/refs.c
>>>> index 12908066b13..812fee47108 100644
>>>> --- a/refs.c
>>>> +++ b/refs.c
>>>> @@ -311,7 +311,7 @@ int read_ref(const char *refname, struct
>>>> object_id *oid)
>>>>        return read_ref_full(refname, RESOLVE_REF_READING, oid, NULL);
>>>>   }
>>>>
>>>> -static int refs_ref_exists(struct ref_store *refs, const char
>>>> *refname)
>>>> +int refs_ref_exists(struct ref_store *refs, const char *refname)
>>>>   {
>>>>        return !!refs_resolve_ref_unsafe(refs, refname,
>>>> RESOLVE_REF_READING, NULL, NULL);
>>>>   }
>>>
>>> It is a shame that ref_exists() does not take a struct repository. The
>>
>> I'm trying to follow the pattern that the rest of the code
>> establishes; you're right that the code isn't very consistent (the
>> fact that it uses unlink() rather than go through the ref store in the
>> first place is an indication of that). I want to avoid starting a
>> general cleanup tour of the code base, the reftable patch is hard
>> enough to pull off without.
> 
> Sure
> 
>>> The existing code is inconsistent about its repository handling - we
>>> create the refs with update_ref() which operates on the main repository
>>> but when checking their existence and deleting them we use a path which
>>> depends on the repository. I've realized now the answer to my question
>>> about using delete_ref() in my reply to the previous patch - it does not
>>> take a repository - maybe it should along with update_ref() but that
>>> might be more work than you want to do though.
>>
>> Why do they take repository arguments anyway? Is it because
>> rebase/cherry-pick supports recursion into submodules?
> 
> It was a stepping stone towards that, the git_path mechanism that is
> used to create git_path_cherry_pick_head() etc was changed to take a
> struct repository so it could support submodules without forking a
> separate process. However are still plenty of places where the sequencer
> code assumes a single repository (it calls update_ref(), delete_ref(),
> commit_tree_extended(), ...) and the two contributors who did a lot of
> that work have moved on. With that in mind perhaps we'd be better off
> just using ref_exists() and delete_ref() in this conversion. The call
> sites will be easy enough to fixup if those functions are converted to
> take a struct repository in the future and the result of this patch
> series will be nicer. I've cc'd dscho and Junio to see what they think.

In the end I put some patches together that change delete_ref(),
update_ref() and ref_exists() to take a struct repository*. I'll clean
them up and post them next week. Hopefully that will mean that this
series can then use those functions when converting unlink() etc which
will avoid having to expose a separate api for pseudo refs.

Best Wishes

Phillip



^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v16 02/14] Make refs_ref_exists public
  2020-06-12  9:51                                       ` Phillip Wood
@ 2020-06-15 11:32                                         ` Han-Wen Nienhuys
  0 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-06-15 11:32 UTC (permalink / raw)
  To: phillip.wood
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Johannes Schindelin, Junio C Hamano

On Fri, Jun 12, 2020 at 11:51 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
> > It was a stepping stone towards that, the git_path mechanism that is
> > used to create git_path_cherry_pick_head() etc was changed to take a
> > struct repository so it could support submodules without forking a
> > separate process. However are still plenty of places where the sequencer
> > code assumes a single repository (it calls update_ref(), delete_ref(),
> > commit_tree_extended(), ...) and the two contributors who did a lot of
> > that work have moved on. With that in mind perhaps we'd be better off
> > just using ref_exists() and delete_ref() in this conversion. The call
> > sites will be easy enough to fixup if those functions are converted to
> > take a struct repository in the future and the result of this patch
> > series will be nicer. I've cc'd dscho and Junio to see what they think.
>
> In the end I put some patches together that change delete_ref(),
> update_ref() and ref_exists() to take a struct repository*. I'll clean
> them up and post them next week. Hopefully that will mean that this
> series can then use those functions when converting unlink() etc which
> will avoid having to expose a separate api for pseudo refs.

Sounds good.  Can you CC me on the patches?

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v17 00/17] Reftable support git-core
  2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
                                                 ` (14 preceding siblings ...)
  2020-06-09 23:14                               ` [PATCH v16 00/14] Reftable support git-core Junio C Hamano
@ 2020-06-16 19:20                               ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 01/17] lib-t6000.sh: write tag using git-update-ref Han-Wen Nienhuys via GitGitGadget
                                                   ` (17 more replies)
  15 siblings, 18 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend. Based on
hn/refs-cleanup.

Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1

Summary 20816 tests pass 1122 tests fail

Some issues:

 * many tests inspect .git/{logs,heads}/ directly.
 * worktrees broken. 

v20

 * various reflog fixes; @{-1} now works
 * more debug output for GIT_DEBUG_REFS
 * reftable's slice class is now equivalent to strbuf
 * add test-tool dump-reftable.

Han-Wen Nienhuys (16):
  lib-t6000.sh: write tag using git-update-ref
  checkout: add '\n' to reflog message
  Write pseudorefs through ref backends.
  Make refs_ref_exists public
  Treat BISECT_HEAD as a pseudo ref
  Treat CHERRY_PICK_HEAD as a pseudo ref
  Treat REVERT_HEAD as a pseudo ref
  Move REF_LOG_ONLY to refs-internal.h
  Iterate over the "refs/" namespace in for_each_[raw]ref
  Add .gitattributes for the reftable/ directory
  Add reftable library
  Reftable support for git-core
  Hookup unittests for the reftable library.
  Add GIT_DEBUG_REFS debugging mechanism
  Add reftable testing infrastructure
  Add "test-tool dump-reftable" command.

Johannes Schindelin (1):
  vcxproj: adjust for the reftable changes

 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   47 +-
 builtin/bisect--helper.c                      |    3 +-
 builtin/checkout.c                            |    5 +-
 builtin/clone.c                               |    3 +-
 builtin/commit.c                              |   34 +-
 builtin/init-db.c                             |   56 +-
 builtin/merge.c                               |    2 +-
 cache.h                                       |    6 +-
 config.mak.uname                              |    2 +-
 contrib/buildsystems/Generators/Vcxproj.pm    |   11 +-
 git-bisect.sh                                 |    4 +-
 path.c                                        |    2 -
 path.h                                        |    9 +-
 refs.c                                        |  157 +-
 refs.h                                        |   16 +
 refs/debug.c                                  |  358 +++++
 refs/files-backend.c                          |  121 +-
 refs/packed-backend.c                         |   21 +-
 refs/refs-internal.h                          |   32 +
 refs/reftable-backend.c                       | 1327 +++++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   11 +
 reftable/VERSION                              |    1 +
 reftable/basics.c                             |  215 +++
 reftable/basics.h                             |   53 +
 reftable/block.c                              |  437 ++++++
 reftable/block.h                              |  129 ++
 reftable/block_test.c                         |  157 ++
 reftable/compat.c                             |   97 ++
 reftable/compat.h                             |   48 +
 reftable/constants.h                          |   21 +
 reftable/dump.c                               |  212 +++
 reftable/file.c                               |   95 ++
 reftable/iter.c                               |  242 +++
 reftable/iter.h                               |   72 +
 reftable/merged.c                             |  320 ++++
 reftable/merged.h                             |   39 +
 reftable/merged_test.c                        |  273 ++++
 reftable/pq.c                                 |  113 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  742 +++++++++
 reftable/reader.h                             |   65 +
 reftable/record.c                             | 1118 ++++++++++++++
 reftable/record.h                             |  128 ++
 reftable/record_test.c                        |  403 +++++
 reftable/refname.c                            |  211 +++
 reftable/refname.h                            |   38 +
 reftable/refname_test.c                       |   99 ++
 reftable/reftable-tests.h                     |   22 +
 reftable/reftable.c                           |   90 ++
 reftable/reftable.h                           |  571 +++++++
 reftable/reftable_test.c                      |  631 ++++++++
 reftable/slice.c                              |  217 +++
 reftable/slice.h                              |   85 ++
 reftable/slice_test.c                         |   39 +
 reftable/stack.c                              | 1214 +++++++++++++++
 reftable/stack.h                              |   48 +
 reftable/stack_test.c                         |  787 ++++++++++
 reftable/system.h                             |   55 +
 reftable/test_framework.c                     |   69 +
 reftable/test_framework.h                     |   64 +
 reftable/tree.c                               |   63 +
 reftable/tree.h                               |   34 +
 reftable/tree_test.c                          |   62 +
 reftable/update.sh                            |   22 +
 reftable/writer.c                             |  661 ++++++++
 reftable/writer.h                             |   60 +
 reftable/zlib-compat.c                        |   92 ++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 sequencer.c                                   |   56 +-
 setup.c                                       |   12 +-
 t/helper/test-reftable.c                      |   20 +
 t/helper/test-tool.c                          |    2 +
 t/helper/test-tool.h                          |    2 +
 t/lib-t6000.sh                                |    5 +-
 t/t0031-reftable.sh                           |  160 ++
 t/t0033-debug-refs.sh                         |   18 +
 t/t1409-avoid-packing-refs.sh                 |    6 +
 t/t1450-fsck.sh                               |    6 +
 t/t3210-pack-refs.sh                          |    6 +
 t/t9903-bash-prompt.sh                        |    6 +
 t/test-lib.sh                                 |    5 +
 wt-status.c                                   |    6 +-
 86 files changed, 12629 insertions(+), 200 deletions(-)
 create mode 100644 refs/debug.c
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/compat.c
 create mode 100644 reftable/compat.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/refname_test.c
 create mode 100644 reftable/reftable-tests.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/slice_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100644 t/helper/test-reftable.c
 create mode 100755 t/t0031-reftable.sh
 create mode 100755 t/t0033-debug-refs.sh


base-commit: bcb73516bf1f7fd16f882782d42250dd19f39b85
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v17
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v17
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v16:

  -:  ----------- >  1:  8304c3d6379 lib-t6000.sh: write tag using git-update-ref
  -:  ----------- >  2:  4012d801e3c checkout: add '\n' to reflog message
  1:  4a1f600fb85 =  3:  95a6a1d968e Write pseudorefs through ref backends.
  2:  a4a67ce9635 =  4:  1f8865f4b3e Make refs_ref_exists public
  3:  ecd6591d86f =  5:  7f376a76d84 Treat BISECT_HEAD as a pseudo ref
  4:  7c31727de69 =  6:  959c69b5ee4 Treat CHERRY_PICK_HEAD as a pseudo ref
  5:  fbdcee5208b =  7:  3f18475d0d3 Treat REVERT_HEAD as a pseudo ref
  6:  d8801367f7d =  8:  4981e5395c6 Move REF_LOG_ONLY to refs-internal.h
  7:  556d57610cc =  9:  f452c48ae44 Iterate over the "refs/" namespace in for_each_[raw]ref
  8:  78cb1930172 = 10:  1fa68d5d34f Add .gitattributes for the reftable/ directory
  9:  0bc28ac610f ! 11:  86646c834c2 Add reftable library
     @@ reftable/README.md (new)
      
       ## reftable/VERSION (new) ##
      @@
     -+0c164275ea8f8aa7d099f7425ddcef1affe137e9 C: compile framework with Git options
     ++2c91c4b305dcbf6500c0806bb1a7fbcfc668510c C: include system.h in compat.h
      
       ## reftable/basics.c (new) ##
      @@
     @@ reftable/block.c (new)
      +
      +	w->next += n;
      +
     -+	slice_copy(&w->last_key, key);
     ++	slice_reset(&w->last_key);
     ++	slice_addbuf(&w->last_key, key);
      +	w->entries++;
      +	return 0;
      +}
     @@ reftable/block.c (new)
      +
      +	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
      +		int block_header_skip = 4 + w->header_off;
     -+		struct slice compressed = SLICE_INIT;
     ++		byte *compressed = NULL;
      +		int zresult = 0;
      +		uLongf src_len = w->next - block_header_skip;
     -+		slice_resize(&compressed, src_len);
     ++		size_t dest_cap = src_len;
      +
     ++		compressed = reftable_malloc(dest_cap);
      +		while (1) {
     -+			uLongf dest_len = compressed.len;
     ++			uLongf out_dest_len = dest_cap;
      +
     -+			zresult = compress2(compressed.buf, &dest_len,
     ++			zresult = compress2(compressed, &out_dest_len,
      +					    w->buf + block_header_skip, src_len,
      +					    9);
      +			if (zresult == Z_BUF_ERROR) {
     -+				slice_resize(&compressed, 2 * compressed.len);
     ++				dest_cap *= 2;
     ++				compressed =
     ++					reftable_realloc(compressed, dest_cap);
      +				continue;
      +			}
      +
      +			if (Z_OK != zresult) {
     -+				slice_release(&compressed);
     ++				reftable_free(compressed);
      +				return REFTABLE_ZLIB_ERROR;
      +			}
      +
     -+			memcpy(w->buf + block_header_skip, compressed.buf,
     -+			       dest_len);
     -+			w->next = dest_len + block_header_skip;
     -+			slice_release(&compressed);
     ++			memcpy(w->buf + block_header_skip, compressed,
     ++			       out_dest_len);
     ++			w->next = out_dest_len + block_header_skip;
     ++			reftable_free(compressed);
      +			break;
      +		}
      +	}
     @@ reftable/block.c (new)
      +		return REFTABLE_FORMAT_ERROR;
      +
      +	if (typ == BLOCK_TYPE_LOG) {
     -+		struct slice uncompressed = SLICE_INIT;
      +		int block_header_skip = 4 + header_off;
      +		uLongf dst_len = sz - block_header_skip; /* total size of dest
      +							    buffer. */
      +		uLongf src_len = block->len - block_header_skip;
     -+
      +		/* Log blocks specify the *uncompressed* size in their header.
      +		 */
     -+		slice_resize(&uncompressed, sz);
     ++		byte *uncompressed = reftable_malloc(sz);
      +
      +		/* Copy over the block header verbatim. It's not compressed. */
     -+		memcpy(uncompressed.buf, block->data, block_header_skip);
     ++		memcpy(uncompressed, block->data, block_header_skip);
      +
      +		/* Uncompress */
      +		if (Z_OK != uncompress_return_consumed(
     -+				    uncompressed.buf + block_header_skip,
     -+				    &dst_len, block->data + block_header_skip,
     ++				    uncompressed + block_header_skip, &dst_len,
     ++				    block->data + block_header_skip,
      +				    &src_len)) {
     -+			slice_release(&uncompressed);
     ++			reftable_free(uncompressed);
      +			return REFTABLE_ZLIB_ERROR;
      +		}
      +
     @@ reftable/block.c (new)
      +
      +		/* We're done with the input data. */
      +		reftable_block_done(block);
     -+		block->data = uncompressed.buf;
     ++		block->data = uncompressed;
      +		block->len = sz;
      +		block->source = malloc_block_source();
      +		full_block_size = src_len + block_header_skip;
     @@ reftable/block.c (new)
      +void block_reader_start(struct block_reader *br, struct block_iter *it)
      +{
      +	it->br = br;
     -+	slice_resize(&it->last_key, 0);
     ++	slice_reset(&it->last_key);
      +	it->next_off = br->header_off + 4;
      +}
      +
     @@ reftable/block.c (new)
      +{
      +	dest->br = src->br;
      +	dest->next_off = src->next_off;
     -+	slice_copy(&dest->last_key, &src->last_key);
     ++	slice_reset(&dest->last_key);
     ++	slice_addbuf(&dest->last_key, &src->last_key);
      +}
      +
      +int block_iter_next(struct block_iter *it, struct reftable_record *rec)
     @@ reftable/block.c (new)
      +		return -1;
      +	slice_consume(&in, n);
      +
     -+	slice_copy(&it->last_key, &key);
     ++	slice_reset(&it->last_key);
     ++	slice_addbuf(&it->last_key, &key);
      +	it->next_off += start.len - in.len;
      +	slice_release(&key);
      +	return 0;
     @@ reftable/block_test.c (new)
      +
      +	for (i = 0; i < N; i++) {
      +		struct block_iter it = { .last_key = SLICE_INIT };
     -+		slice_set_string(&want, names[i]);
     ++		slice_reset(&want);
     ++		slice_addstr(&want, names[i]);
      +
      +		n = block_reader_seek(&br, &it, &want);
      +		assert(n == 0);
     @@ reftable/block_test.c (new)
      +	return test_main(argc, argv);
      +}
      
     + ## reftable/compat.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++
     ++*/
     ++
     ++/* compat.c - compatibility functions for standalone compilation */
     ++
     ++#include "system.h"
     ++#include "basics.h"
     ++
     ++#ifndef REFTABLE_IN_GITCORE
     ++
     ++#include <dirent.h>
     ++
     ++void put_be32(void *p, uint32_t i)
     ++{
     ++	byte *out = (byte *)p;
     ++
     ++	out[0] = (uint8_t)((i >> 24) & 0xff);
     ++	out[1] = (uint8_t)((i >> 16) & 0xff);
     ++	out[2] = (uint8_t)((i >> 8) & 0xff);
     ++	out[3] = (uint8_t)((i)&0xff);
     ++}
     ++
     ++uint32_t get_be32(uint8_t *in)
     ++{
     ++	return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
     ++	       (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
     ++}
     ++
     ++void put_be64(void *p, uint64_t v)
     ++{
     ++	byte *out = (byte *)p;
     ++	int i = sizeof(uint64_t);
     ++	while (i--) {
     ++		out[i] = (uint8_t)(v & 0xff);
     ++		v >>= 8;
     ++	}
     ++}
     ++
     ++uint64_t get_be64(uint8_t *out)
     ++{
     ++	uint64_t v = 0;
     ++	int i = 0;
     ++	for (i = 0; i < sizeof(uint64_t); i++) {
     ++		v = (v << 8) | (uint8_t)(out[i] & 0xff);
     ++	}
     ++	return v;
     ++}
     ++
     ++uint16_t get_be16(uint8_t *in)
     ++{
     ++	return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
     ++}
     ++
     ++char *xstrdup(const char *s)
     ++{
     ++	int l = strlen(s);
     ++	char *dest = (char *)reftable_malloc(l + 1);
     ++	strncpy(dest, s, l + 1);
     ++	return dest;
     ++}
     ++
     ++void sleep_millisec(int millisecs)
     ++{
     ++	usleep(millisecs * 1000);
     ++}
     ++
     ++void reftable_clear_dir(const char *dirname)
     ++{
     ++	DIR *dir = opendir(dirname);
     ++	struct dirent *ent = NULL;
     ++	assert(dir);
     ++	while ((ent = readdir(dir)) != NULL) {
     ++		unlinkat(dirfd(dir), ent->d_name, 0);
     ++	}
     ++	closedir(dir);
     ++	rmdir(dirname);
     ++}
     ++
     ++#else
     ++
     ++#include "../dir.h"
     ++
     ++void reftable_clear_dir(const char *dirname)
     ++{
     ++	struct strbuf path = STRBUF_INIT;
     ++	strbuf_addstr(&path, dirname);
     ++	remove_dir_recursively(&path, 0);
     ++	strbuf_release(&path);
     ++}
     ++
     ++#endif
     +
     + ## reftable/compat.h (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#ifndef COMPAT_H
     ++#define COMPAT_H
     ++
     ++#include "system.h"
     ++
     ++#ifndef REFTABLE_IN_GITCORE
     ++
     ++/* functions that git-core provides, for standalone compilation */
     ++#include <stdint.h>
     ++
     ++uint64_t get_be64(uint8_t *in);
     ++void put_be64(void *out, uint64_t i);
     ++
     ++void put_be32(void *out, uint32_t i);
     ++uint32_t get_be32(uint8_t *in);
     ++
     ++uint16_t get_be16(uint8_t *in);
     ++
     ++#define ARRAY_SIZE(a) sizeof((a)) / sizeof((a)[0])
     ++#define FREE_AND_NULL(x)          \
     ++	do {                      \
     ++		reftable_free(x); \
     ++		(x) = NULL;       \
     ++	} while (0)
     ++#define QSORT(arr, n, cmp) qsort(arr, n, sizeof(arr[0]), cmp)
     ++#define SWAP(a, b)                              \
     ++	{                                       \
     ++		char tmp[sizeof(a)];            \
     ++		assert(sizeof(a) == sizeof(b)); \
     ++		memcpy(&tmp[0], &a, sizeof(a)); \
     ++		memcpy(&a, &b, sizeof(a));      \
     ++		memcpy(&b, &tmp[0], sizeof(a)); \
     ++	}
     ++
     ++char *xstrdup(const char *s);
     ++
     ++void sleep_millisec(int millisecs);
     ++
     ++#endif
     ++#endif
     +
       ## reftable/constants.h (new) ##
      @@
      +/*
     @@ reftable/constants.h (new)
      +
      +#endif
      
     + ## reftable/dump.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include <stddef.h>
     ++#include <stdio.h>
     ++#include <stdlib.h>
     ++#include <unistd.h>
     ++#include <string.h>
     ++
     ++#include "reftable.h"
     ++#include "reftable-tests.h"
     ++
     ++static uint32_t hash_id;
     ++
     ++static int dump_table(const char *tablename)
     ++{
     ++	struct reftable_block_source src = { 0 };
     ++	int err = reftable_block_source_from_file(&src, tablename);
     ++	struct reftable_iterator it = { 0 };
     ++	struct reftable_ref_record ref = { 0 };
     ++	struct reftable_log_record log = { 0 };
     ++	struct reftable_reader *r = NULL;
     ++
     ++	if (err < 0)
     ++		return err;
     ++
     ++	err = reftable_new_reader(&r, &src, tablename);
     ++	if (err < 0)
     ++		return err;
     ++
     ++	err = reftable_reader_seek_ref(r, &it, "");
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	while (1) {
     ++		err = reftable_iterator_next_ref(&it, &ref);
     ++		if (err > 0) {
     ++			break;
     ++		}
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++		reftable_ref_record_print(&ref, hash_id);
     ++	}
     ++	reftable_iterator_destroy(&it);
     ++	reftable_ref_record_clear(&ref);
     ++
     ++	err = reftable_reader_seek_log(r, &it, "");
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++	while (1) {
     ++		err = reftable_iterator_next_log(&it, &log);
     ++		if (err > 0) {
     ++			break;
     ++		}
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++		reftable_log_record_print(&log, hash_id);
     ++	}
     ++	reftable_iterator_destroy(&it);
     ++	reftable_log_record_clear(&log);
     ++
     ++	reftable_reader_free(r);
     ++	return 0;
     ++}
     ++
     ++static int compact_stack(const char *stackdir)
     ++{
     ++	struct reftable_stack *stack = NULL;
     ++	struct reftable_write_options cfg = {};
     ++
     ++	int err = reftable_new_stack(&stack, stackdir, cfg);
     ++	if (err < 0)
     ++		goto done;
     ++
     ++	err = reftable_stack_compact_all(stack, NULL);
     ++	if (err < 0)
     ++		goto done;
     ++done:
     ++	if (stack != NULL) {
     ++		reftable_stack_destroy(stack);
     ++	}
     ++	return err;
     ++}
     ++
     ++static int dump_stack(const char *stackdir)
     ++{
     ++	struct reftable_stack *stack = NULL;
     ++	struct reftable_write_options cfg = {};
     ++	struct reftable_iterator it = { 0 };
     ++	struct reftable_ref_record ref = { 0 };
     ++	struct reftable_log_record log = { 0 };
     ++	struct reftable_merged_table *merged = NULL;
     ++
     ++	int err = reftable_new_stack(&stack, stackdir, cfg);
     ++	if (err < 0)
     ++		return err;
     ++
     ++	merged = reftable_stack_merged_table(stack);
     ++
     ++	err = reftable_merged_table_seek_ref(merged, &it, "");
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++
     ++	while (1) {
     ++		err = reftable_iterator_next_ref(&it, &ref);
     ++		if (err > 0) {
     ++			break;
     ++		}
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++		reftable_ref_record_print(&ref, hash_id);
     ++	}
     ++	reftable_iterator_destroy(&it);
     ++	reftable_ref_record_clear(&ref);
     ++
     ++	err = reftable_merged_table_seek_log(merged, &it, "");
     ++	if (err < 0) {
     ++		return err;
     ++	}
     ++	while (1) {
     ++		err = reftable_iterator_next_log(&it, &log);
     ++		if (err > 0) {
     ++			break;
     ++		}
     ++		if (err < 0) {
     ++			return err;
     ++		}
     ++		reftable_log_record_print(&log, hash_id);
     ++	}
     ++	reftable_iterator_destroy(&it);
     ++	reftable_log_record_clear(&log);
     ++
     ++	reftable_stack_destroy(stack);
     ++	return 0;
     ++}
     ++
     ++static void print_help(void)
     ++{
     ++	printf("usage: dump [-cst] arg\n\n"
     ++	       "options: \n"
     ++	       "  -c compact\n"
     ++	       "  -t dump table\n"
     ++	       "  -s dump stack\n"
     ++	       "  -h this help\n"
     ++	       "  -2 use SHA256\n"
     ++	       "\n");
     ++}
     ++
     ++int reftable_dump_main(int argc, char *const *argv)
     ++{
     ++	int err = 0;
     ++	int opt;
     ++	int opt_dump_table = 0;
     ++	int opt_dump_stack = 0;
     ++	int opt_compact = 0;
     ++	const char *arg = NULL;
     ++	while ((opt = getopt(argc, argv, "2chts")) != -1) {
     ++		switch (opt) {
     ++		case '2':
     ++			hash_id = 0x73323536;
     ++			break;
     ++		case 't':
     ++			opt_dump_table = 1;
     ++			break;
     ++		case 's':
     ++			opt_dump_stack = 1;
     ++			break;
     ++		case 'c':
     ++			opt_compact = 1;
     ++			break;
     ++		case '?':
     ++		case 'h':
     ++			print_help();
     ++			return 2;
     ++			break;
     ++		}
     ++	}
     ++
     ++	if (argv[optind] == NULL) {
     ++		fprintf(stderr, "need argument\n");
     ++		print_help();
     ++		return 2;
     ++	}
     ++
     ++	arg = argv[optind];
     ++
     ++	if (opt_dump_table) {
     ++		err = dump_table(arg);
     ++	} else if (opt_dump_stack) {
     ++		err = dump_stack(arg);
     ++	} else if (opt_compact) {
     ++		err = compact_stack(arg);
     ++	}
     ++
     ++	if (err < 0) {
     ++		fprintf(stderr, "%s: %s: %s\n", argv[0], arg,
     ++			reftable_error_str(err));
     ++		return 1;
     ++	}
     ++	return 0;
     ++}
     +
       ## reftable/file.c (new) ##
      @@
      +/*
     @@ reftable/iter.c (new)
      +
      +	*itr = empty;
      +	itr->r = r;
     -+	slice_resize(&itr->oid, oid_len);
     -+	memcpy(itr->oid.buf, oid, oid_len);
     ++	slice_add(&itr->oid, oid, oid_len);
      +
      +	itr->offsets = offsets;
      +	itr->offset_len = offset_len;
     @@ reftable/reader.c (new)
      +
      +	filter = reftable_malloc(sizeof(struct filtering_ref_iterator));
      +	*filter = empty;
     -+	slice_resize(&filter->oid, oid_len);
     -+	memcpy(filter->oid.buf, oid, oid_len);
     ++
     ++	slice_add(&filter->oid, oid, oid_len);
      +	reftable_table_from_reader(&filter->tab, r);
      +	filter->double_check = false;
      +	iterator_from_table_iter(&filter->it, ti);
     @@ reftable/record.c (new)
      +	if (in.len < tsize)
      +		return -1;
      +
     -+	slice_resize(dest, tsize + 1);
     -+	dest->buf[tsize] = 0;
     -+	memcpy(dest->buf, in.buf, tsize);
     ++	slice_reset(dest);
     ++	slice_add(dest, in.buf, tsize);
      +	slice_consume(&in, tsize);
      +
      +	return start_len - in.len;
     @@ reftable/record.c (new)
      +	if (in.len < suffix_len)
      +		return -1;
      +
     -+	slice_resize(key, suffix_len + prefix_len);
     -+	memcpy(key->buf, last_key.buf, prefix_len);
     -+
     -+	memcpy(key->buf + prefix_len, in.buf, suffix_len);
     ++	slice_reset(key);
     ++	slice_add(key, last_key.buf, prefix_len);
     ++	slice_add(key, in.buf, suffix_len);
      +	slice_consume(&in, suffix_len);
      +
      +	return start_len - in.len;
     @@ reftable/record.c (new)
      +{
      +	const struct reftable_ref_record *rec =
      +		(const struct reftable_ref_record *)r;
     -+	slice_set_string(dest, rec->ref_name);
     ++	slice_reset(dest);
     ++	slice_addstr(dest, rec->ref_name);
      +}
      +
      +static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
     @@ reftable/record.c (new)
      +{
      +	const struct reftable_obj_record *rec =
      +		(const struct reftable_obj_record *)r;
     -+	slice_resize(dest, rec->hash_prefix_len);
     -+	memcpy(dest->buf, rec->hash_prefix, rec->hash_prefix_len);
     ++	slice_reset(dest);
     ++	slice_add(dest, rec->hash_prefix, rec->hash_prefix_len);
      +}
      +
      +static void reftable_obj_record_clear(void *rec)
     @@ reftable/record.c (new)
      +	const struct reftable_log_record *rec =
      +		(const struct reftable_log_record *)r;
      +	int len = strlen(rec->ref_name);
     ++	byte i64[8];
      +	uint64_t ts = 0;
     -+	slice_resize(dest, len + 9);
     -+	memcpy(dest->buf, rec->ref_name, len + 1);
     ++	slice_reset(dest);
     ++	slice_add(dest, (byte *)rec->ref_name, len + 1);
     ++
      +	ts = (~ts) - rec->update_index;
     -+	put_be64(dest->buf + 1 + len, ts);
     ++	put_be64(&i64[0], ts);
     ++	slice_add(dest, i64, sizeof(i64));
      +}
      +
      +static void reftable_log_record_copy_from(void *rec, const void *src_rec,
     @@ reftable/record.c (new)
      +	memcpy(r->name, dest.buf, dest.len);
      +	r->name[dest.len] = 0;
      +
     -+	slice_resize(&dest, 0);
     ++	slice_reset(&dest);
      +	n = decode_string(&dest, in);
      +	if (n < 0)
      +		goto done;
     @@ reftable/record.c (new)
      +	r->tz_offset = get_be16(in.buf);
      +	slice_consume(&in, 2);
      +
     -+	slice_resize(&dest, 0);
     ++	slice_reset(&dest);
      +	n = decode_string(&dest, in);
      +	if (n < 0)
      +		goto done;
     @@ reftable/record.c (new)
      +static void reftable_index_record_key(const void *r, struct slice *dest)
      +{
      +	struct reftable_index_record *rec = (struct reftable_index_record *)r;
     -+	slice_copy(dest, &rec->last_key);
     ++	slice_reset(dest);
     ++	slice_addbuf(dest, &rec->last_key);
      +}
      +
      +static void reftable_index_record_copy_from(void *rec, const void *src_rec,
     @@ reftable/record.c (new)
      +	struct reftable_index_record *src =
      +		(struct reftable_index_record *)src_rec;
      +
     -+	slice_copy(&dst->last_key, &src->last_key);
     ++	slice_reset(&dst->last_key);
     ++	slice_addbuf(&dst->last_key, &src->last_key);
      +	dst->offset = src->offset;
      +}
      +
     @@ reftable/record.c (new)
      +	struct reftable_index_record *r = (struct reftable_index_record *)rec;
      +	int n = 0;
      +
     -+	slice_copy(&r->last_key, &key);
     ++	slice_reset(&r->last_key);
     ++	slice_addbuf(&r->last_key, &key);
      +
      +	n = get_var_int(&r->offset, &in);
      +	if (n < 0)
     @@ reftable/record_test.c (new)
      +	for (i = 0; i < ARRAY_SIZE(cases); i++) {
      +		struct slice a = SLICE_INIT;
      +		struct slice b = SLICE_INIT;
     -+		slice_set_string(&a, cases[i].a);
     -+		slice_set_string(&b, cases[i].b);
     -+
     ++		slice_addstr(&a, cases[i].a);
     ++		slice_addstr(&b, cases[i].b);
      +		assert(common_prefix_size(&a, &b) == cases[i].want);
      +
      +		slice_release(&a);
     @@ reftable/record_test.c (new)
      +		assert(reftable_record_val_type(&rec) == i);
      +
      +		reftable_record_key(&rec, &key);
     -+		slice_resize(&dest, 1024);
     ++		slice_grow(&dest, 1024);
     ++		slice_setlen(&dest, 1024);
      +		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
      +		assert(n > 0);
      +
     @@ reftable/record_test.c (new)
      +
      +		reftable_record_key(&rec, &key);
      +
     -+		slice_resize(&dest, 1024);
     ++		slice_grow(&dest, 1024);
     ++		slice_setlen(&dest, 1024);
      +
      +		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
      +		assert(n >= 0);
     @@ reftable/record_test.c (new)
      +	int n, m;
      +	byte rt_extra;
      +
     -+	slice_resize(&dest, 1024);
     -+	slice_set_string(&last_key, "refs/heads/master");
     -+	slice_set_string(&key, "refs/tags/bla");
     -+
     ++	slice_grow(&dest, 1024);
     ++	slice_setlen(&dest, 1024);
     ++	slice_addstr(&last_key, "refs/heads/master");
     ++	slice_addstr(&key, "refs/tags/bla");
      +	extra = 6;
      +	n = reftable_encode_key(&restart, dest, last_key, key, extra);
      +	assert(!restart);
     @@ reftable/record_test.c (new)
      +
      +	m = reftable_decode_key(&roundtrip, &rt_extra, last_key, dest);
      +	assert(n == m);
     -+	assert(slice_equal(&key, &roundtrip));
     ++	assert(0 == slice_cmp(&key, &roundtrip));
      +	assert(rt_extra == extra);
      +
      +	slice_release(&last_key);
     @@ reftable/record_test.c (new)
      +		reftable_record_from_obj(&rec, &in);
      +		test_copy(&rec);
      +		reftable_record_key(&rec, &key);
     -+		slice_resize(&dest, 1024);
     ++		slice_grow(&dest, 1024);
     ++		slice_setlen(&dest, 1024);
      +		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
      +		assert(n > 0);
      +		extra = reftable_record_val_type(&rec);
     @@ reftable/record_test.c (new)
      +	int n, m;
      +	byte extra;
      +
     -+	slice_set_string(&in.last_key, "refs/heads/master");
     ++	slice_addstr(&in.last_key, "refs/heads/master");
      +	reftable_record_from_index(&rec, &in);
      +	reftable_record_key(&rec, &key);
      +	test_copy(&rec);
      +
      +	assert(0 == slice_cmp(&key, &in.last_key));
     -+	slice_resize(&dest, 1024);
     ++	slice_grow(&dest, 1024);
     ++	slice_setlen(&dest, 1024);
      +	n = reftable_record_encode(&rec, dest, SHA1_SIZE);
      +	assert(n > 0);
      +
     @@ reftable/refname.c (new)
      +{
      +	while (sl->len > 0) {
      +		bool is_slash = (sl->buf[sl->len - 1] == '/');
     -+		sl->len--;
     ++		slice_setlen(sl, sl->len - 1);
      +		if (is_slash)
      +			break;
      +	}
     @@ reftable/refname.c (new)
      +		err = validate_ref_name(mod->add[i]);
      +		if (err)
      +			goto done;
     -+		slice_set_string(&slashed, mod->add[i]);
     ++		slice_reset(&slashed);
     ++		slice_addstr(&slashed, mod->add[i]);
      +		slice_addstr(&slashed, "/");
      +
      +		err = modification_has_ref_with_prefix(
     @@ reftable/refname.c (new)
      +		if (err < 0)
      +			goto done;
      +
     -+		slice_set_string(&slashed, mod->add[i]);
     ++		slice_reset(&slashed);
     ++		slice_addstr(&slashed, mod->add[i]);
      +		while (slashed.len) {
      +			slice_trim_component(&slashed);
      +			err = modification_has_ref(mod,
     @@ reftable/refname_test.c (new)
      
       ## reftable/reftable-tests.h (new) ##
      @@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
      +#ifndef REFTABLE_TESTS_H
      +#define REFTABLE_TESTS_H
      +
     @@ reftable/reftable-tests.h (new)
      +int slice_test_main(int argc, const char **argv);
      +int stack_test_main(int argc, const char **argv);
      +int tree_test_main(int argc, const char **argv);
     ++int reftable_dump_main(int argc, char *const *argv);
      +
      +#endif
      
     @@ reftable/reftable.h (new)
      +	   - on writing a record with NULL ref_name.
      +	   - on writing a reftable_ref_record outside the table limits
      +	   - on writing a ref or log record before the stack's next_update_index
     ++	   - on writing a log record with multiline message with
     ++	   exact_log_message unset
      +	   - on reading a reftable_ref_record from log iterator, or vice versa.
      +	*/
      +	REFTABLE_API_ERROR = -6,
     @@ reftable/reftable.h (new)
      +	/* boolean: do not check ref names for validity or dir/file conflicts.
      +	 */
      +	int skip_name_check;
     ++
     ++	/* boolean: copy log messages exactly. If unset, check that the message
     ++	 *   is a single line, and add '\n' if missing.
     ++	 */
     ++	int exact_log_message;
      +};
      +
      +/* reftable_block_stats holds statistics for a single block type */
     @@ reftable/reftable_test.c (new)
      +		reftable_iterator_destroy(&it);
      +	}
      +
     -+	slice_set_string(&pastLast, names[N - 1]);
     ++	slice_addstr(&pastLast, names[N - 1]);
      +	slice_addstr(&pastLast, "/");
      +
      +	err = reftable_reader_seek_ref(&rd, &it, slice_as_string(&pastLast));
     @@ reftable/slice.c (new)
      +
      +struct slice reftable_empty_slice = SLICE_INIT;
      +
     -+void slice_set_string(struct slice *s, const char *str)
     -+{
     -+	int l;
     -+	if (str == NULL) {
     -+		s->len = 0;
     -+		return;
     -+	}
     -+	assert(s->canary == SLICE_CANARY);
     -+
     -+	l = strlen(str);
     -+	l++; /* \0 */
     -+	slice_resize(s, l);
     -+	memcpy(s->buf, str, l);
     -+	s->len = l - 1;
     -+}
     -+
      +void slice_init(struct slice *s)
      +{
      +	struct slice empty = SLICE_INIT;
      +	*s = empty;
      +}
      +
     -+void slice_resize(struct slice *s, int l)
     ++void slice_grow(struct slice *s, size_t extra)
      +{
     ++	size_t newcap = s->len + extra + 1;
     ++	if (newcap > s->cap) {
     ++		s->buf = reftable_realloc(s->buf, newcap);
     ++		s->cap = newcap;
     ++	}
     ++}
     ++
     ++static void slice_resize(struct slice *s, int l)
     ++{
     ++	int zl = l + 1; /* one byte for 0 termination. */
      +	assert(s->canary == SLICE_CANARY);
     -+	if (s->cap < l) {
     ++	if (s->cap < zl) {
      +		int c = s->cap * 2;
     -+		if (c < l) {
     -+			c = l;
     ++		if (c < zl) {
     ++			c = zl;
      +		}
      +		s->cap = c;
      +		s->buf = reftable_realloc(s->buf, s->cap);
      +	}
      +	s->len = l;
     ++	s->buf[l] = 0;
     ++}
     ++
     ++void slice_setlen(struct slice *s, size_t l)
     ++{
     ++	assert(s->cap >= l + 1);
     ++	s->len = l;
     ++	s->buf[l] = 0;
     ++}
     ++
     ++void slice_reset(struct slice *s)
     ++{
     ++	slice_resize(s, 0);
      +}
      +
      +void slice_addstr(struct slice *d, const char *s)
     @@ reftable/slice.c (new)
      +	s->len -= n;
      +}
      +
     -+byte *slice_detach(struct slice *s)
     ++char *slice_detach(struct slice *s)
      +{
     -+	byte *p = s->buf;
     -+	assert(s->canary == SLICE_CANARY);
     ++	char *p = NULL;
     ++	slice_as_string(s);
     ++	p = (char *)s->buf;
      +	s->buf = NULL;
      +	s->cap = 0;
      +	s->len = 0;
     @@ reftable/slice.c (new)
      +
      +void slice_release(struct slice *s)
      +{
     ++	byte *ptr = s->buf;
      +	assert(s->canary == SLICE_CANARY);
     -+	reftable_free(slice_detach(s));
     -+}
     -+
     -+void slice_copy(struct slice *dest, struct slice *src)
     -+{
     -+	assert(dest->canary == SLICE_CANARY);
     -+	assert(src->canary == SLICE_CANARY);
     -+	slice_resize(dest, src->len);
     -+	memcpy(dest->buf, src->buf, src->len);
     ++	s->buf = NULL;
     ++	s->cap = 0;
     ++	s->len = 0;
     ++	reftable_free(ptr);
      +}
      +
      +/* return the underlying data as char*. len is left unchanged, but
      +   a \0 is added at the end. */
      +const char *slice_as_string(struct slice *s)
      +{
     -+	assert(s->canary == SLICE_CANARY);
     -+	if (s->cap == s->len) {
     -+		int l = s->len;
     -+		slice_resize(s, l + 1);
     -+		s->len = l;
     -+	}
     -+	s->buf[s->len] = 0;
      +	return (const char *)s->buf;
      +}
      +
     -+/* return a newly malloced string for this slice */
     -+char *slice_to_string(struct slice *in)
     -+{
     -+	struct slice s = SLICE_INIT;
     -+	assert(in->canary == SLICE_CANARY);
     -+	slice_resize(&s, in->len + 1);
     -+	s.buf[in->len] = 0;
     -+	memcpy(s.buf, in->buf, in->len);
     -+	return (char *)slice_detach(&s);
     -+}
     -+
     -+bool slice_equal(struct slice *a, struct slice *b)
     -+{
     -+	return slice_cmp(a, b) == 0;
     -+}
     -+
      +int slice_cmp(const struct slice *a, const struct slice *b)
      +{
      +	int min = a->len < b->len ? a->len : b->len;
     @@ reftable/slice.c (new)
      +		return 0;
      +}
      +
     -+int slice_add(struct slice *b, byte *data, size_t sz)
     ++int slice_add(struct slice *b, const byte *data, size_t sz)
      +{
      +	assert(b->canary == SLICE_CANARY);
     -+	if (b->len + sz > b->cap) {
     -+		int newcap = 2 * b->cap + 1;
     -+		if (newcap < b->len + sz) {
     -+			newcap = (b->len + sz);
     -+		}
     -+		b->buf = reftable_realloc(b->buf, newcap);
     -+		b->cap = newcap;
     -+	}
     -+
     ++	slice_grow(b, sz);
      +	memcpy(b->buf + b->len, data, sz);
      +	b->len += sz;
     ++	b->buf[b->len] = 0;
      +	return sz;
      +}
      +
     @@ reftable/slice.h (new)
      +#include "reftable.h"
      +
      +/*
     -+  provides bounds-checked byte ranges.
     -+  To use, initialize as "slice x = {0};"
     ++  Provides a bounds-checked, growable byte ranges. To use, initialize as "slice
     ++  x = SLICE_INIT;"
      + */
      +struct slice {
      +	int len;
      +	int cap;
      +	byte *buf;
     ++
     ++	/* Used to enforce initialization with SLICE_INIT */
      +	byte canary;
      +};
      +#define SLICE_CANARY 0x42
     @@ reftable/slice.h (new)
      +	}
      +extern struct slice reftable_empty_slice;
      +
     -+void slice_set_string(struct slice *dest, const char *src);
      +void slice_addstr(struct slice *dest, const char *src);
      +
      +/* Deallocate and clear slice */
      +void slice_release(struct slice *slice);
      +
     -+/* Return a malloced string for `src` */
     -+char *slice_to_string(struct slice *src);
     ++/* Set slice to 0 length, but retain buffer. */
     ++void slice_reset(struct slice *slice);
      +
      +/* Initializes a slice. Accepts a slice with random garbage. */
      +void slice_init(struct slice *slice);
     @@ reftable/slice.h (new)
      +/* Ensure that `buf` is \0 terminated. */
      +const char *slice_as_string(struct slice *src);
      +
     -+/* Compare slices */
     -+bool slice_equal(struct slice *a, struct slice *b);
     -+
      +/* Return `buf`, clearing out `s` */
     -+byte *slice_detach(struct slice *s);
     ++char *slice_detach(struct slice *s);
      +
     -+/* Copy bytes */
     -+void slice_copy(struct slice *dest, struct slice *src);
     ++/* Set length of the slace to `l`, but don't reallocated. */
     ++void slice_setlen(struct slice *s, size_t l);
      +
     -+/* Advance `buf` by `n`, and decrease length. A copy of the slice
     -+   should be kept for deallocating the slice. */
     -+void slice_consume(struct slice *s, int n);
     -+
     -+/* Set length of the slice to `l` */
     -+void slice_resize(struct slice *s, int l);
     ++/* Ensure `l` bytes beyond current length are available */
     ++void slice_grow(struct slice *s, size_t l);
      +
      +/* Signed comparison */
      +int slice_cmp(const struct slice *a, const struct slice *b);
      +
      +/* Append `data` to the `dest` slice.  */
     -+int slice_add(struct slice *dest, byte *data, size_t sz);
     ++int slice_add(struct slice *dest, const byte *data, size_t sz);
      +
      +/* Append `add` to `dest. */
      +void slice_addbuf(struct slice *dest, struct slice *add);
     @@ reftable/slice.h (new)
      +
      +struct reftable_block_source malloc_block_source(void);
      +
     ++/* Advance `buf` by `n`, and decrease length. A copy of the slice
     ++   should be kept for deallocating the slice. */
     ++void slice_consume(struct slice *s, int n);
     ++
      +#endif
      
       ## reftable/slice_test.c (new) ##
     @@ reftable/slice_test.c (new)
      +	struct slice s = SLICE_INIT;
      +	struct slice t = SLICE_INIT;
      +
     -+	slice_set_string(&s, "abc");
     ++	slice_addstr(&s, "abc");
      +	assert(0 == strcmp("abc", slice_as_string(&s)));
      +
     -+	slice_set_string(&t, "pqr");
     -+
     ++	slice_addstr(&t, "pqr");
      +	slice_addbuf(&s, &t);
      +	assert(0 == strcmp("abcpqr", slice_as_string(&s)));
      +
     @@ reftable/stack.c (new)
      +
      +	*dest = NULL;
      +
     -+	slice_set_string(&list_file_name, dir);
     ++	slice_reset(&list_file_name);
     ++	slice_addstr(&list_file_name, dir);
      +	slice_addstr(&list_file_name, "/tables.list");
      +
     -+	p->list_file = slice_to_string(&list_file_name);
     -+	slice_release(&list_file_name);
     ++	p->list_file = slice_detach(&list_file_name);
      +	p->reftable_dir = xstrdup(dir);
      +	p->config = config;
      +
     @@ reftable/stack.c (new)
      +	int new_tables_len = 0;
      +	struct reftable_merged_table *new_merged = NULL;
      +	int i;
     -+	struct slice table_path = SLICE_INIT;
      +
      +	while (*names) {
      +		struct reftable_reader *rd = NULL;
     @@ reftable/stack.c (new)
      +
      +		if (rd == NULL) {
      +			struct reftable_block_source src = { 0 };
     -+			slice_set_string(&table_path, st->reftable_dir);
     ++			struct slice table_path = SLICE_INIT;
     ++			slice_addstr(&table_path, st->reftable_dir);
      +			slice_addstr(&table_path, "/");
      +			slice_addstr(&table_path, name);
      +
      +			err = reftable_block_source_from_file(
      +				&src, slice_as_string(&table_path));
     ++			slice_release(&table_path);
     ++
      +			if (err < 0)
      +				goto done;
      +
     @@ reftable/stack.c (new)
      +	}
      +
      +done:
     -+	slice_release(&table_path);
      +	for (i = 0; i < new_tables_len; i++) {
      +		reader_close(new_tables[i]);
      +		reftable_reader_free(new_tables[i]);
     @@ reftable/stack.c (new)
      +{
      +	char buf[100];
      +	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
     -+	slice_set_string(dest, buf);
     ++	slice_reset(dest);
     ++	slice_addstr(dest, buf);
      +}
      +
      +struct reftable_addition {
     @@ reftable/stack.c (new)
      +	int err = 0;
      +	add->stack = st;
      +
     -+	slice_set_string(&add->lock_file_name, st->list_file);
     ++	slice_reset(&add->lock_file_name);
     ++	slice_addstr(&add->lock_file_name, st->list_file);
      +	slice_addstr(&add->lock_file_name, ".lock");
      +
      +	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
     @@ reftable/stack.c (new)
      +	int i = 0;
      +	struct slice nm = SLICE_INIT;
      +	for (i = 0; i < add->new_tables_len; i++) {
     -+		slice_set_string(&nm, add->stack->list_file);
     ++		slice_reset(&nm);
     ++		slice_addstr(&nm, add->stack->list_file);
      +		slice_addstr(&nm, "/");
      +		slice_addstr(&nm, add->new_tables[i]);
      +		unlink(slice_as_string(&nm));
     @@ reftable/stack.c (new)
      +	int err = 0;
      +	int tab_fd = 0;
      +
     -+	slice_resize(&next_name, 0);
     ++	slice_reset(&next_name);
      +	format_name(&next_name, add->next_update_index, add->next_update_index);
      +
     -+	slice_set_string(&temp_tab_file_name, add->stack->reftable_dir);
     ++	slice_addstr(&temp_tab_file_name, add->stack->reftable_dir);
      +	slice_addstr(&temp_tab_file_name, "/");
      +	slice_addbuf(&temp_tab_file_name, &next_name);
      +	slice_addstr(&temp_tab_file_name, ".temp.XXXXXX");
     @@ reftable/stack.c (new)
      +	format_name(&next_name, wr->min_update_index, wr->max_update_index);
      +	slice_addstr(&next_name, ".ref");
      +
     -+	slice_set_string(&tab_file_name, add->stack->reftable_dir);
     ++	slice_addstr(&tab_file_name, add->stack->reftable_dir);
      +	slice_addstr(&tab_file_name, "/");
      +	slice_addbuf(&tab_file_name, &next_name);
      +
     @@ reftable/stack.c (new)
      +	add->new_tables = reftable_realloc(add->new_tables,
      +					   sizeof(*add->new_tables) *
      +						   (add->new_tables_len + 1));
     -+	add->new_tables[add->new_tables_len] = slice_to_string(&next_name);
     ++	add->new_tables[add->new_tables_len] = slice_detach(&next_name);
      +	add->new_tables_len++;
      +done:
      +	if (tab_fd > 0) {
     @@ reftable/stack.c (new)
      +		    reftable_reader_min_update_index(st->merged->stack[first]),
      +		    reftable_reader_max_update_index(st->merged->stack[first]));
      +
     -+	slice_set_string(temp_tab, st->reftable_dir);
     ++	slice_reset(temp_tab);
     ++	slice_addstr(temp_tab, st->reftable_dir);
      +	slice_addstr(temp_tab, "/");
      +	slice_addbuf(temp_tab, &next_name);
      +	slice_addstr(temp_tab, ".temp.XXXXXX");
     @@ reftable/stack.c (new)
      +
      +	st->stats.attempts++;
      +
     -+	slice_set_string(&lock_file_name, st->list_file);
     ++	slice_reset(&lock_file_name);
     ++	slice_addstr(&lock_file_name, st->list_file);
      +	slice_addstr(&lock_file_name, ".lock");
      +
      +	lock_file_fd = open(slice_as_string(&lock_file_name),
     @@ reftable/stack.c (new)
      +		struct slice subtab_lock = SLICE_INIT;
      +		int sublock_file_fd = -1;
      +
     -+		slice_set_string(&subtab_file_name, st->reftable_dir);
     ++		slice_addstr(&subtab_file_name, st->reftable_dir);
      +		slice_addstr(&subtab_file_name, "/");
      +		slice_addstr(&subtab_file_name,
      +			     reader_name(st->merged->stack[i]));
      +
     -+		slice_copy(&subtab_lock, &subtab_file_name);
     ++		slice_reset(&subtab_lock);
     ++		slice_addbuf(&subtab_lock, &subtab_file_name);
      +		slice_addstr(&subtab_lock, ".lock");
      +
      +		sublock_file_fd = open(slice_as_string(&subtab_lock),
     @@ reftable/stack.c (new)
      +		    st->merged->stack[last]->max_update_index);
      +	slice_addstr(&new_table_name, ".ref");
      +
     -+	slice_set_string(&new_table_path, st->reftable_dir);
     ++	slice_reset(&new_table_path);
     ++	slice_addstr(&new_table_path, st->reftable_dir);
      +	slice_addstr(&new_table_path, "/");
     -+
      +	slice_addbuf(&new_table_path, &new_table_name);
      +
      +	if (!is_empty_table) {
     @@ reftable/stack_test.c (new)
      +#include <sys/types.h>
      +#include <dirent.h>
      +
     -+static void clear_dir(const char *dirname)
     -+{
     -+	int fd = open(dirname, O_DIRECTORY, 0);
     -+	DIR *dir = fdopendir(fd);
     -+	struct dirent *ent = NULL;
     -+
     -+	assert(fd >= 0);
     -+
     -+	while ((ent = readdir(dir)) != NULL) {
     -+		unlinkat(fd, ent->d_name, 0);
     -+	}
     -+	closedir(dir);
     -+	rmdir(dirname);
     -+}
     -+
      +static void test_read_file(void)
      +{
      +	char fn[256] = "/tmp/stack.test_read_file.XXXXXX";
     @@ reftable/stack_test.c (new)
      +
      +	reftable_ref_record_clear(&dest);
      +	reftable_stack_destroy(st);
     -+	clear_dir(dir);
     ++	reftable_clear_dir(dir);
      +}
      +
      +static void test_reftable_stack_uptodate(void)
     @@ reftable/stack_test.c (new)
      +	assert_err(err);
      +	reftable_stack_destroy(st1);
      +	reftable_stack_destroy(st2);
     -+	clear_dir(dir);
     ++	reftable_clear_dir(dir);
      +}
      +
      +static void test_reftable_stack_transaction_api(void)
     @@ reftable/stack_test.c (new)
      +
      +	reftable_ref_record_clear(&dest);
      +	reftable_stack_destroy(st);
     -+	clear_dir(dir);
     ++	reftable_clear_dir(dir);
      +}
      +
      +static void test_reftable_stack_validate_refname(void)
     @@ reftable/stack_test.c (new)
      +	}
      +
      +	reftable_stack_destroy(st);
     -+	clear_dir(dir);
     ++	reftable_clear_dir(dir);
      +}
      +
      +static int write_error(struct reftable_writer *wr, void *arg)
     @@ reftable/stack_test.c (new)
      +	err = reftable_stack_add(st, &write_test_ref, &ref2);
      +	assert(err == REFTABLE_API_ERROR);
      +	reftable_stack_destroy(st);
     -+	clear_dir(dir);
     ++	reftable_clear_dir(dir);
      +}
      +
      +static void test_reftable_stack_lock_failure(void)
     @@ reftable/stack_test.c (new)
      +	}
      +
      +	reftable_stack_destroy(st);
     -+	clear_dir(dir);
     ++	reftable_clear_dir(dir);
      +}
      +
      +static void test_reftable_stack_add(void)
      +{
      +	int i = 0;
      +	int err = 0;
     -+	struct reftable_write_options cfg = { 0 };
     ++	struct reftable_write_options cfg = {
     ++		.exact_log_message = true,
     ++	};
      +	struct reftable_stack *st = NULL;
      +	char dir[256] = "/tmp/stack_test.XXXXXX";
     -+	struct reftable_ref_record refs[2] = { 0 };
     -+	struct reftable_log_record logs[2] = { 0 };
     ++	struct reftable_ref_record refs[2] = { { 0 } };
     ++	struct reftable_log_record logs[2] = { { 0 } };
      +	int N = ARRAY_SIZE(refs);
      +
      +	assert(mkdtemp(dir));
     @@ reftable/stack_test.c (new)
      +		reftable_ref_record_clear(&refs[i]);
      +		reftable_log_record_clear(&logs[i]);
      +	}
     -+	clear_dir(dir);
     ++	reftable_clear_dir(dir);
     ++}
     ++
     ++static void test_reftable_stack_log_normalize(void)
     ++{
     ++	int err = 0;
     ++	struct reftable_write_options cfg = {
     ++		0,
     ++	};
     ++	struct reftable_stack *st = NULL;
     ++	char dir[256] = "/tmp/stack_test.XXXXXX";
     ++
     ++	byte h1[SHA1_SIZE] = { 0x01 }, h2[SHA1_SIZE] = { 0x02 };
     ++
     ++	struct reftable_log_record input = {
     ++		.ref_name = "branch",
     ++		.update_index = 1,
     ++		.new_hash = h1,
     ++		.old_hash = h2,
     ++	};
     ++	struct reftable_log_record dest = {
     ++		.update_index = 0,
     ++	};
     ++	struct write_log_arg arg = {
     ++		.log = &input,
     ++		.update_index = 1,
     ++	};
     ++
     ++	assert(mkdtemp(dir));
     ++	err = reftable_new_stack(&st, dir, cfg);
     ++	assert_err(err);
     ++
     ++	input.message = "one\ntwo";
     ++	err = reftable_stack_add(st, &write_test_log, &arg);
     ++	assert(err == REFTABLE_API_ERROR);
     ++
     ++	input.message = "one";
     ++	err = reftable_stack_add(st, &write_test_log, &arg);
     ++	assert_err(err);
     ++
     ++	err = reftable_stack_read_log(st, input.ref_name, &dest);
     ++	assert_err(err);
     ++	assert(0 == strcmp(dest.message, "one\n"));
     ++
     ++	input.message = "two\n";
     ++	arg.update_index = 2;
     ++	err = reftable_stack_add(st, &write_test_log, &arg);
     ++	assert_err(err);
     ++	err = reftable_stack_read_log(st, input.ref_name, &dest);
     ++	assert_err(err);
     ++	assert(0 == strcmp(dest.message, "two\n"));
     ++
     ++	/* cleanup */
     ++	reftable_stack_destroy(st);
     ++	reftable_log_record_clear(&dest);
     ++	reftable_clear_dir(dir);
      +}
      +
      +static void test_reftable_stack_tombstone(void)
     @@ reftable/stack_test.c (new)
      +	struct reftable_write_options cfg = { 0 };
      +	struct reftable_stack *st = NULL;
      +	int err;
     -+	struct reftable_ref_record refs[2] = { 0 };
     -+	struct reftable_log_record logs[2] = { 0 };
     ++	struct reftable_ref_record refs[2] = { { 0 } };
     ++	struct reftable_log_record logs[2] = { { 0 } };
      +	int N = ARRAY_SIZE(refs);
      +	struct reftable_ref_record dest = { 0 };
      +	struct reftable_log_record log_dest = { 0 };
     @@ reftable/stack_test.c (new)
      +		reftable_ref_record_clear(&refs[i]);
      +		reftable_log_record_clear(&logs[i]);
      +	}
     -+	clear_dir(dir);
     ++	reftable_clear_dir(dir);
      +}
      +
      +static void test_reftable_stack_hash_id(void)
     @@ reftable/stack_test.c (new)
      +	reftable_ref_record_clear(&dest);
      +	reftable_stack_destroy(st);
      +	reftable_stack_destroy(st_default);
     -+	clear_dir(dir);
     ++	reftable_clear_dir(dir);
      +}
      +
      +static void test_log2(void)
     @@ reftable/stack_test.c (new)
      +	char dir[256] = "/tmp/stack.test_reflog_expire.XXXXXX";
      +	struct reftable_write_options cfg = { 0 };
      +	struct reftable_stack *st = NULL;
     -+	struct reftable_log_record logs[20] = { 0 };
     ++	struct reftable_log_record logs[20] = { { 0 } };
      +	int N = ARRAY_SIZE(logs) - 1;
      +	int i = 0;
      +	int err;
     @@ reftable/stack_test.c (new)
      +	for (i = 0; i <= N; i++) {
      +		reftable_log_record_clear(&logs[i]);
      +	}
     -+	clear_dir(dir);
     ++	reftable_clear_dir(dir);
      +	reftable_log_record_clear(&log);
      +}
      +
     @@ reftable/stack_test.c (new)
      +
      +	err = reftable_new_stack(&st2, dir, cfg);
      +	assert_err(err);
     -+	clear_dir(dir);
     ++	reftable_clear_dir(dir);
      +	reftable_stack_destroy(st);
      +	reftable_stack_destroy(st2);
      +}
     @@ reftable/stack_test.c (new)
      +	       (uint64_t)(N * fastlog2(N)));
      +
      +	reftable_stack_destroy(st);
     -+	clear_dir(dir);
     ++	reftable_clear_dir(dir);
      +}
      +
      +int stack_test_main(int argc, const char *argv[])
     @@ reftable/stack_test.c (new)
      +		      &test_reftable_stack_update_index_check);
      +	add_test_case("test_reftable_stack_lock_failure",
      +		      &test_reftable_stack_lock_failure);
     ++	add_test_case("test_reftable_stack_log_normalize",
     ++		      &test_reftable_stack_log_normalize);
      +	add_test_case("test_reftable_stack_tombstone",
      +		      &test_reftable_stack_tombstone);
      +	add_test_case("test_reftable_stack_add_one",
     @@ reftable/system.h (new)
      +#define SYSTEM_H
      +
      +#if 1 /* REFTABLE_IN_GITCORE */
     ++#define REFTABLE_IN_GITCORE
      +
      +#include "git-compat-util.h"
      +#include "cache.h"
     @@ reftable/system.h (new)
      +
      +#endif /* REFTABLE_IN_GITCORE */
      +
     ++void reftable_clear_dir(const char *dirname);
     ++
      +#define SHA1_ID 0x73686131
      +#define SHA256_ID 0x73323536
      +#define SHA1_SIZE 20
     @@ reftable/system.h (new)
      +typedef int bool;
      +
      +/* This is uncompress2, which is only available in zlib as of 2017.
     -+ *
     -+ * TODO: in git-core, this should fallback to uncompress2 if it is available.
      + */
      +int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
      +			       const Bytef *source, uLong *sourceLen);
     @@ reftable/update.sh (new)
      +mv reftable/system.h reftable/system.h~
      +sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
      +
     -+# Remove compatibility hacks we don't need here.
     -+rm reftable/compat.*
     -+
      +git add reftable/*.[ch] reftable/LICENSE reftable/VERSION
      
       ## reftable/writer.c (new) ##
     @@ reftable/writer.c (new)
      +		key = reftable_malloc(sizeof(struct obj_index_tree_node));
      +		*key = empty;
      +
     -+		slice_copy(&key->hash, hash);
     ++		slice_reset(&key->hash);
     ++		slice_addbuf(&key->hash, hash);
      +		tree_search((void *)key, &w->obj_index_tree,
      +			    &obj_index_tree_node_compare, 1);
      +	} else {
     @@ reftable/writer.c (new)
      +	if (slice_cmp(&w->last_key, &key) >= 0)
      +		goto done;
      +
     -+	slice_copy(&w->last_key, &key);
     ++	slice_reset(&w->last_key);
     ++	slice_addbuf(&w->last_key, &key);
      +	if (w->block_writer == NULL) {
      +		writer_reinit_block_writer(w, reftable_record_type(rec));
      +	}
     @@ reftable/writer.c (new)
      +		return err;
      +
      +	if (!w->opts.skip_index_objects && ref->value != NULL) {
     -+		struct slice h = {
     -+			.buf = ref->value,
     -+			.len = hash_size(w->opts.hash_id),
     -+			.canary = SLICE_CANARY,
     -+		};
     -+
     ++		struct slice h = SLICE_INIT;
     ++		slice_add(&h, ref->value, hash_size(w->opts.hash_id));
      +		writer_index_hash(w, &h);
     ++		slice_release(&h);
      +	}
      +
      +	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
     -+		struct slice h = {
     -+			.buf = ref->target_value,
     -+			.len = hash_size(w->opts.hash_id),
     -+			.canary = SLICE_CANARY,
     -+		};
     ++		struct slice h = SLICE_INIT;
     ++		slice_add(&h, ref->target_value, hash_size(w->opts.hash_id));
      +		writer_index_hash(w, &h);
      +	}
      +	return 0;
     @@ reftable/writer.c (new)
      +			    struct reftable_log_record *log)
      +{
      +	struct reftable_record rec = { 0 };
     ++	char *input_log_message = log->message;
     ++	struct slice cleaned_message = SLICE_INIT;
      +	int err;
      +	if (log->ref_name == NULL)
      +		return REFTABLE_API_ERROR;
     @@ reftable/writer.c (new)
      +			return err;
      +	}
      +
     ++	if (!w->opts.exact_log_message && log->message != NULL) {
     ++		slice_addstr(&cleaned_message, log->message);
     ++		while (cleaned_message.len &&
     ++		       cleaned_message.buf[cleaned_message.len - 1] == '\n')
     ++			slice_setlen(&cleaned_message, cleaned_message.len - 1);
     ++		if (strchr(slice_as_string(&cleaned_message), '\n')) {
     ++			// multiple lines not allowed.
     ++			err = REFTABLE_API_ERROR;
     ++			goto done;
     ++		}
     ++		slice_addstr(&cleaned_message, "\n");
     ++		log->message = (char *)slice_as_string(&cleaned_message);
     ++	}
     ++
      +	w->next -= w->pending_padding;
      +	w->pending_padding = 0;
      +
      +	reftable_record_from_log(&rec, log);
      +	err = writer_add_record(w, &rec);
     ++
     ++done:
     ++	log->message = input_log_message;
     ++	slice_release(&cleaned_message);
      +	return err;
      +}
      +
     @@ reftable/writer.c (new)
      +	}
      +
      +	ir.offset = w->next;
     -+	slice_copy(&ir.last_key, &w->block_writer->last_key);
     ++	slice_reset(&ir.last_key);
     ++	slice_addbuf(&ir.last_key, &w->block_writer->last_key);
      +	w->index[w->index_len] = ir;
      +
      +	w->index_len++;
 10:  f3d74b78135 ! 12:  e1b01927454 Reftable support for git-core
     @@ Makefile: VCSSVN_OBJS += vcs-svn/sliding_window.o
       
      +REFTABLE_OBJS += reftable/basics.o
      +REFTABLE_OBJS += reftable/block.o
     ++REFTABLE_OBJS += reftable/compat.o
      +REFTABLE_OBJS += reftable/file.o
      +REFTABLE_OBJS += reftable/iter.o
      +REFTABLE_OBJS += reftable/merged.o
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +	for (i = 0; i < arg->refnames->nr; i++) {
     -+		struct reftable_log_record log = { NULL };
     ++		struct reftable_log_record log = {
     ++			.update_index = ts,
     ++		};
      +		struct reftable_ref_record current = { NULL };
      +		fill_reftable_log_record(&log);
      +		log.message = xstrdup(arg->logmsg);
     @@ refs/reftable-backend.c (new)
      +	};
      +	reftable_writer_set_limits(writer, ts, ts);
      +	err = reftable_writer_add_ref(writer, &ref);
     -+	if (err < 0) {
     -+		return err;
     -+	}
     -+
     -+	{
     ++	if (err == 0) {
      +		struct reftable_log_record log = { NULL };
      +		struct object_id new_oid;
      +		struct object_id old_oid;
     -+		struct reftable_ref_record current = { NULL };
     -+		reftable_stack_read_ref(create->refs->stack, create->refname,
     -+					&current);
      +
      +		fill_reftable_log_record(&log);
     -+		log.ref_name = current.ref_name;
     ++		log.ref_name = (char *)create->refname;
     ++		log.message = (char *)create->logmsg;
     ++		log.update_index = ts;
      +		if (refs_resolve_ref_unsafe(
      +			    (struct ref_store *)create->refs, create->refname,
      +			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
     @@ refs/reftable-backend.c (new)
      +		}
      +
      +		if (log.old_hash != NULL || log.new_hash != NULL) {
     -+			reftable_writer_add_log(writer, &log);
     ++			err = reftable_writer_add_log(writer, &log);
      +		}
      +		log.ref_name = NULL;
     ++		log.message = NULL;
      +		log.old_hash = NULL;
      +		log.new_hash = NULL;
      +		clear_reftable_log_record(&log);
      +	}
     -+	return 0;
     ++	return err;
      +}
      +
      +static int reftable_create_symref(struct ref_store *ref_store,
     @@ refs/reftable-backend.c (new)
      +	mt = reftable_stack_merged_table(refs->stack);
      +	err = reftable_merged_table_seek_log(mt, &it, refname);
      +	while (err == 0) {
     ++		struct object_id old_oid;
     ++		struct object_id new_oid;
     ++		const char *full_committer = "";
     ++
      +		err = reftable_iterator_next_log(&it, &log);
     -+		if (err != 0) {
     ++		if (err > 0) {
     ++			err = 0;
     ++			break;
     ++		}
     ++		if (err < 0) {
      +			break;
      +		}
      +
     @@ refs/reftable-backend.c (new)
      +			break;
      +		}
      +
     -+		{
     -+			struct object_id old_oid;
     -+			struct object_id new_oid;
     -+			const char *full_committer = "";
     -+
     -+			hashcpy(old_oid.hash, log.old_hash);
     -+			hashcpy(new_oid.hash, log.new_hash);
     -+
     -+			full_committer = fmt_ident(log.name, log.email,
     -+						   WANT_COMMITTER_IDENT,
     -+						   /*date*/ NULL,
     -+						   IDENT_NO_DATE);
     -+			if (fn(&old_oid, &new_oid, full_committer, log.time,
     -+			       log.tz_offset, log.message, cb_data)) {
     -+				err = -1;
     -+				break;
     -+			}
     -+		}
     ++		hashcpy(old_oid.hash, log.old_hash);
     ++		hashcpy(new_oid.hash, log.new_hash);
     ++
     ++		full_committer = fmt_ident(log.name, log.email,
     ++					   WANT_COMMITTER_IDENT,
     ++					   /*date*/ NULL, IDENT_NO_DATE);
     ++		err = fn(&old_oid, &new_oid, full_committer, log.time,
     ++			 log.tz_offset, log.message, cb_data);
     ++		if (err)
     ++			break;
      +	}
      +
      +	reftable_log_record_clear(&log);
      +	reftable_iterator_destroy(&it);
     -+	if (err > 0) {
     -+		err = 0;
     -+	}
      +	return err;
      +}
      +
     @@ refs/reftable-backend.c (new)
      +	while (err == 0) {
      +		struct reftable_log_record log = { NULL };
      +		err = reftable_iterator_next_log(&it, &log);
     -+		if (err != 0) {
     ++		if (err > 0) {
     ++			err = 0;
     ++			break;
     ++		}
     ++		if (err < 0) {
      +			break;
      +		}
      +
     @@ refs/reftable-backend.c (new)
      +		full_committer = fmt_ident(log->name, log->email,
      +					   WANT_COMMITTER_IDENT, NULL,
      +					   IDENT_NO_DATE);
     -+		if (!fn(&old_oid, &new_oid, full_committer, log->time,
     -+			log->tz_offset, log->message, cb_data)) {
     -+			err = -1;
     ++		err = fn(&old_oid, &new_oid, full_committer, log->time,
     ++			 log->tz_offset, log->message, cb_data);
     ++		if (err) {
      +			break;
      +		}
      +	}
     @@ refs/reftable-backend.c (new)
      +	free(logs);
      +
      +	reftable_iterator_destroy(&it);
     -+	if (err > 0) {
     -+		err = 0;
     -+	}
      +	return err;
      +}
      +
     @@ refs/reftable-backend.c (new)
      +	reftable_read_raw_ref,
      +
      +	reftable_reflog_iterator_begin,
     -+	reftable_for_each_reflog_ent_newest_first,
      +	reftable_for_each_reflog_ent_oldest_first,
     ++	reftable_for_each_reflog_ent_newest_first,
      +	reftable_reflog_exists,
      +	reftable_create_reflog,
      +	reftable_delete_reflog,
     @@ t/t0031-reftable.sh (new)
      +	test_line_count = 0 output
      +'
      +
     ++test_expect_success 'branch switch in reflog output' '
     ++	initialize &&
     ++	test_commit file1 &&
     ++	git checkout -b branch1 &&
     ++	test_commit file2 &&
     ++	git checkout -b branch2 &&
     ++	git switch - &&
     ++	git rev-parse --symbolic-full-name HEAD > actual &&
     ++	echo refs/heads/branch1 > expect &&
     ++	test_cmp actual expect
     ++'
     ++
     ++
      +# This matches show-ref's output
      +print_ref() {
      +	echo "$(git rev-parse "$1") $1"
 11:  3e7868ee409 ! 13:  0359fe416fa Hookup unittests for the reftable library.
     @@ Makefile: t/helper/test-svn-fe$X: $(VCSSVN_LIB)
       
       check-sha1:: t/helper/test-tool$X
      
     + ## t/helper/test-reftable.c (new) ##
     +@@
     ++#include "reftable/reftable-tests.h"
     ++#include "test-tool.h"
     ++
     ++int cmd__reftable(int argc, const char **argv)
     ++{
     ++	block_test_main(argc, argv);
     ++	merged_test_main(argc, argv);
     ++	record_test_main(argc, argv);
     ++	refname_test_main(argc, argv);
     ++	reftable_test_main(argc, argv);
     ++	slice_test_main(argc, argv);
     ++	stack_test_main(argc, argv);
     ++	tree_test_main(argc, argv);
     ++	return 0;
     ++}
     +
       ## t/helper/test-tool.c ##
      @@ t/helper/test-tool.c: static struct test_cmd cmds[] = {
       	{ "read-graph", cmd__read_graph },
 12:  1f193b6da23 ! 14:  88640ea13f9 Add GIT_DEBUG_REFS debugging mechanism
     @@ refs/debug.c (new)
      +static void print_update(int i, const char *refname,
      +			 const struct object_id *old_oid,
      +			 const struct object_id *new_oid, unsigned int flags,
     -+			 unsigned int type)
     ++			 unsigned int type, const char *msg)
      +{
      +	char o[200] = "null";
      +	char n[200] = "null";
     @@ refs/debug.c (new)
      +	type &= 0xf; /* see refs.h REF_* */
      +	flags &= REF_HAVE_NEW | REF_HAVE_OLD | REF_NO_DEREF |
      +		 REF_FORCE_CREATE_REFLOG | REF_LOG_ONLY;
     -+	printf("%d: %s %s -> %s (F=0x%x, T=0x%x)\n", i, refname, o, n, flags,
     -+	       type);
     ++	printf("%d: %s %s -> %s (F=0x%x, T=0x%x) \"%s\"\n", i, refname, o, n,
     ++	       flags, type, msg);
      +}
      +
      +static void print_transaction(struct ref_transaction *transaction)
     @@ refs/debug.c (new)
      +	for (int i = 0; i < transaction->nr; i++) {
      +		struct ref_update *u = transaction->updates[i];
      +		print_update(i, u->refname, &u->old_oid, &u->new_oid, u->flags,
     -+			     u->type);
     ++			     u->type, u->msg);
      +	}
      +	printf("}\n");
      +}
     @@ refs/debug.c (new)
      +}
      +
      +static int debug_create_symref(struct ref_store *ref_store,
     -+			       const char *ref_target,
     -+			       const char *refs_heads_master,
     ++			       const char *ref_name, const char *target,
      +			       const char *logmsg)
      +{
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
     -+	int res = drefs->refs->be->create_symref(drefs->refs, ref_target,
     -+						 refs_heads_master, logmsg);
     ++	int res = drefs->refs->be->create_symref(drefs->refs, ref_name, target,
     ++						 logmsg);
     ++	printf("create_symref: %s -> %s \"%s\": %d\n", ref_name, target, logmsg,
     ++	       res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +		drefs->refs->be->delete_refs(drefs->refs, msg, refnames, flags);
      +	return res;
      +}
     ++
      +static int debug_rename_ref(struct ref_store *ref_store, const char *oldref,
      +			    const char *newref, const char *logmsg)
      +{
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	int res = drefs->refs->be->rename_ref(drefs->refs, oldref, newref,
      +					      logmsg);
     ++	printf("rename_ref: %s -> %s \"%s\": %d\n", oldref, newref, logmsg,
     ++	       res);
      +	return res;
      +}
     ++
      +static int debug_copy_ref(struct ref_store *ref_store, const char *oldref,
      +			  const char *newref, const char *logmsg)
      +{
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	int res =
      +		drefs->refs->be->copy_ref(drefs->refs, oldref, newref, logmsg);
     ++	printf("copy_ref: %s -> %s \"%s\": %d\n", oldref, newref, logmsg, res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	struct ref_iterator *res =
      +		drefs->refs->be->reflog_iterator_begin(drefs->refs);
     ++	printf("for_each_reflog_iterator_begin\n");
      +	return res;
      +}
      +
     ++struct debug_reflog {
     ++	const char *refname;
     ++	each_reflog_ent_fn *fn;
     ++	void *cb_data;
     ++};
     ++
     ++static int debug_print_reflog_ent(struct object_id *old_oid,
     ++				  struct object_id *new_oid,
     ++				  const char *committer, timestamp_t timestamp,
     ++				  int tz, const char *msg, void *cb_data)
     ++{
     ++	struct debug_reflog *dbg = (struct debug_reflog *)cb_data;
     ++	int ret;
     ++	char o[100] = "null";
     ++	char n[100] = "null";
     ++	if (old_oid)
     ++		oid_to_hex_r(o, old_oid);
     ++	if (new_oid)
     ++		oid_to_hex_r(n, new_oid);
     ++
     ++	ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg,
     ++		      dbg->cb_data);
     ++	printf("reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
     ++	       dbg->refname, ret, o, n, committer, (long int)timestamp, msg);
     ++	return ret;
     ++}
     ++
      +static int debug_for_each_reflog_ent(struct ref_store *ref_store,
      +				     const char *refname, each_reflog_ent_fn fn,
      +				     void *cb_data)
      +{
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
     -+	int res = drefs->refs->be->for_each_reflog_ent(drefs->refs, refname, fn,
     -+						       cb_data);
     ++	struct debug_reflog dbg = {
     ++		.refname = refname,
     ++		.fn = fn,
     ++		.cb_data = cb_data,
     ++	};
     ++
     ++	int res = drefs->refs->be->for_each_reflog_ent(
     ++		drefs->refs, refname, &debug_print_reflog_ent, &dbg);
     ++	printf("for_each_reflog: %s: %d\n", refname, res);
      +	return res;
      +}
     ++
      +static int debug_for_each_reflog_ent_reverse(struct ref_store *ref_store,
      +					     const char *refname,
      +					     each_reflog_ent_fn fn,
      +					     void *cb_data)
      +{
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
     ++	struct debug_reflog dbg = {
     ++		.refname = refname,
     ++		.fn = fn,
     ++		.cb_data = cb_data,
     ++	};
      +	int res = drefs->refs->be->for_each_reflog_ent_reverse(
     -+		drefs->refs, refname, fn, cb_data);
     ++		drefs->refs, refname, &debug_print_reflog_ent, &dbg);
     ++	printf("for_each_reflog_reverse: %s: %d\n", refname, res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +{
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	int res = drefs->refs->be->reflog_exists(drefs->refs, refname);
     ++	printf("reflog_exists: %s: %d\n", refname, res);
      +	return res;
      +}
      +
     @@ t/t0033-debug-refs.sh (new)
      +test_expect_success 'GIT_DEBUG_REFS' '
      +	git init --ref-storage=files files &&
      +	git init --ref-storage=reftable reftable &&
     -+	(cd files && GIT_DEBUG_REFS=1 test_commit message file) > files.txt && 
     ++	(cd files && GIT_DEBUG_REFS=1 test_commit message file) > files.txt &&
      +	(cd reftable && GIT_DEBUG_REFS=1 test_commit message file) > reftable.txt &&
      +	test_cmp files.txt reftable.txt
      +'
 13:  0471f7e3570 = 15:  1c0cc646084 vcxproj: adjust for the reftable changes
 14:  a03c882b075 ! 16:  4f24b5f73de Add reftable testing infrastructure
     @@ Commit message
           * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
           * t1405 - inspecs .git/ directly.
      
     -    Worst offenders:
     -
     -    t1400-update-ref.sh                      - 82 of 185
     -    t2400-worktree-add.sh                    - 58 of 69
     -    t1404-update-ref-errors.sh               - 44 of 53
     -    t3514-cherry-pick-revert-gpg.sh          - 36 of 36
     -    t5541-http-push-smart.sh                 - 29 of 38
     -    t6003-rev-list-topo-order.sh             - 29 of 35
     -    t3420-rebase-autostash.sh                - 28 of 42
     -    t6120-describe.sh                        - 21 of 82
     -    t3430-rebase-merges.sh                   - 18 of 24
     -    t2018-checkout-branch.sh                 - 15 of 22
     -    ..
     -
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
       ## builtin/clone.c ##
  -:  ----------- > 17:  ad5658ffc51 Add "test-tool dump-reftable" command.

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v17 01/17] lib-t6000.sh: write tag using git-update-ref
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 02/17] checkout: add '\n' to reflog message Han-Wen Nienhuys via GitGitGadget
                                                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/lib-t6000.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/lib-t6000.sh b/t/lib-t6000.sh
index b0ed4767e32..fba6778ca35 100644
--- a/t/lib-t6000.sh
+++ b/t/lib-t6000.sh
@@ -1,7 +1,5 @@
 : included from 6002 and others
 
-mkdir -p .git/refs/tags
-
 >sed.script
 
 # Answer the sha1 has associated with the tag. The tag must exist under refs/tags
@@ -26,7 +24,8 @@ save_tag () {
 	_tag=$1
 	test -n "$_tag" || error "usage: save_tag tag commit-args ..."
 	shift 1
-	"$@" >".git/refs/tags/$_tag"
+
+	git update-ref "refs/tags/$_tag" $("$@")
 
 	echo "s/$(tag $_tag)/$_tag/g" >sed.script.tmp
 	cat sed.script >>sed.script.tmp
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 02/17] checkout: add '\n' to reflog message
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 01/17] lib-t6000.sh: write tag using git-update-ref Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 03/17] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
                                                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable precisely reproduces the given message. This leads to differences,
because the files backend implicitly adds a trailing '\n' to all messages.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/checkout.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index af849c644fe..bb11fcc4e99 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -884,8 +884,9 @@ static void update_refs_for_switch(const struct checkout_opts *opts,
 
 	reflog_msg = getenv("GIT_REFLOG_ACTION");
 	if (!reflog_msg)
-		strbuf_addf(&msg, "checkout: moving from %s to %s",
-			old_desc ? old_desc : "(invalid)", new_branch_info->name);
+		strbuf_addf(&msg, "checkout: moving from %s to %s\n",
+			    old_desc ? old_desc : "(invalid)",
+			    new_branch_info->name);
 	else
 		strbuf_insertstr(&msg, 0, reflog_msg);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 03/17] Write pseudorefs through ref backends.
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 01/17] lib-t6000.sh: write tag using git-update-ref Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 02/17] checkout: add '\n' to reflog message Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 04/17] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
                                                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Pseudorefs store transient data in in the repository. Examples are HEAD,
CHERRY_PICK_HEAD, etc.

These refs have always been read through the ref backends, but they were written
in a one-off routine that wrote an object ID or symref directly into
.git/<pseudo_ref_name>.

This causes problems when introducing a new ref storage backend. To remedy this,
extend the ref backend implementation with a write_pseudoref_fn and
update_pseudoref_fn.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c                | 119 ++++++++----------------------------------
 refs.h                |  11 ++++
 refs/files-backend.c  | 114 +++++++++++++++++++++++++++++++++++++++-
 refs/packed-backend.c |  21 +++++++-
 refs/refs-internal.h  |  18 +++++++
 5 files changed, 184 insertions(+), 99 deletions(-)

diff --git a/refs.c b/refs.c
index 224ff66c7bb..12908066b13 100644
--- a/refs.c
+++ b/refs.c
@@ -321,6 +321,12 @@ int ref_exists(const char *refname)
 	return refs_ref_exists(get_main_ref_store(the_repository), refname);
 }
 
+int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
+{
+	return refs_delete_pseudoref(get_main_ref_store(the_repository),
+				     pseudoref, old_oid);
+}
+
 static int filter_refs(const char *refname, const struct object_id *oid,
 			   int flags, void *data)
 {
@@ -739,101 +745,6 @@ long get_files_ref_lock_timeout_ms(void)
 	return timeout_ms;
 }
 
-static int write_pseudoref(const char *pseudoref, const struct object_id *oid,
-			   const struct object_id *old_oid, struct strbuf *err)
-{
-	const char *filename;
-	int fd;
-	struct lock_file lock = LOCK_INIT;
-	struct strbuf buf = STRBUF_INIT;
-	int ret = -1;
-
-	if (!oid)
-		return 0;
-
-	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
-
-	filename = git_path("%s", pseudoref);
-	fd = hold_lock_file_for_update_timeout(&lock, filename, 0,
-					       get_files_ref_lock_timeout_ms());
-	if (fd < 0) {
-		strbuf_addf(err, _("could not open '%s' for writing: %s"),
-			    filename, strerror(errno));
-		goto done;
-	}
-
-	if (old_oid) {
-		struct object_id actual_old_oid;
-
-		if (read_ref(pseudoref, &actual_old_oid)) {
-			if (!is_null_oid(old_oid)) {
-				strbuf_addf(err, _("could not read ref '%s'"),
-					    pseudoref);
-				rollback_lock_file(&lock);
-				goto done;
-			}
-		} else if (is_null_oid(old_oid)) {
-			strbuf_addf(err, _("ref '%s' already exists"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		} else if (!oideq(&actual_old_oid, old_oid)) {
-			strbuf_addf(err, _("unexpected object ID when writing '%s'"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		}
-	}
-
-	if (write_in_full(fd, buf.buf, buf.len) < 0) {
-		strbuf_addf(err, _("could not write to '%s'"), filename);
-		rollback_lock_file(&lock);
-		goto done;
-	}
-
-	commit_lock_file(&lock);
-	ret = 0;
-done:
-	strbuf_release(&buf);
-	return ret;
-}
-
-static int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
-{
-	const char *filename;
-
-	filename = git_path("%s", pseudoref);
-
-	if (old_oid && !is_null_oid(old_oid)) {
-		struct lock_file lock = LOCK_INIT;
-		int fd;
-		struct object_id actual_old_oid;
-
-		fd = hold_lock_file_for_update_timeout(
-				&lock, filename, 0,
-				get_files_ref_lock_timeout_ms());
-		if (fd < 0) {
-			error_errno(_("could not open '%s' for writing"),
-				    filename);
-			return -1;
-		}
-		if (read_ref(pseudoref, &actual_old_oid))
-			die(_("could not read ref '%s'"), pseudoref);
-		if (!oideq(&actual_old_oid, old_oid)) {
-			error(_("unexpected object ID when deleting '%s'"),
-			      pseudoref);
-			rollback_lock_file(&lock);
-			return -1;
-		}
-
-		unlink(filename);
-		rollback_lock_file(&lock);
-	} else {
-		unlink(filename);
-	}
-
-	return 0;
-}
 
 int refs_delete_ref(struct ref_store *refs, const char *msg,
 		    const char *refname,
@@ -845,7 +756,7 @@ int refs_delete_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		return delete_pseudoref(refname, old_oid);
+		return refs_delete_pseudoref(refs, refname, old_oid);
 	}
 
 	transaction = ref_store_transaction_begin(refs, &err);
@@ -1172,7 +1083,8 @@ int refs_update_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		ret = write_pseudoref(refname, new_oid, old_oid, &err);
+		ret = refs_write_pseudoref(refs, refname, new_oid, old_oid,
+					   &err);
 	} else {
 		t = ref_store_transaction_begin(refs, &err);
 		if (!t ||
@@ -1441,6 +1353,19 @@ int head_ref(each_ref_fn fn, void *cb_data)
 	return refs_head_ref(get_main_ref_store(the_repository), fn, cb_data);
 }
 
+int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			 const struct object_id *oid,
+			 const struct object_id *old_oid, struct strbuf *err)
+{
+	return refs->be->write_pseudoref(refs, pseudoref, oid, old_oid, err);
+}
+
+int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			  const struct object_id *old_oid)
+{
+	return refs->be->delete_pseudoref(refs, pseudoref, old_oid);
+}
+
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
 		const char *prefix, int trim, int flags)
diff --git a/refs.h b/refs.h
index e010f8aec28..4dad8f24914 100644
--- a/refs.h
+++ b/refs.h
@@ -732,6 +732,17 @@ int update_ref(const char *msg, const char *refname,
 	       const struct object_id *new_oid, const struct object_id *old_oid,
 	       unsigned int flags, enum action_on_err onerr);
 
+/* Pseudorefs (eg. HEAD, CHERRY_PICK_HEAD) have a separate routines for updating
+   and deletion as they cannot take part in normal transactional updates.
+   Pseudorefs should only be written for the main repository.
+*/
+int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			 const struct object_id *oid,
+			 const struct object_id *old_oid, struct strbuf *err);
+int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			  const struct object_id *old_oid);
+int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid);
+
 int parse_hide_refs_config(const char *var, const char *value, const char *);
 
 /*
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6516c7bc8c8..df7553f4cc3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -731,6 +731,115 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	return ret;
 }
 
+static int files_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "write_pseudoref");
+	int fd;
+	struct lock_file lock = LOCK_INIT;
+	struct strbuf filename = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = -1;
+
+	if (!oid)
+		return 0;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+	fd = hold_lock_file_for_update_timeout(&lock, filename.buf, 0,
+					       get_files_ref_lock_timeout_ms());
+	if (fd < 0) {
+		strbuf_addf(err, _("could not open '%s' for writing: %s"),
+			    buf.buf, strerror(errno));
+		goto done;
+	}
+
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(err, _("could not read ref '%s'"),
+					    pseudoref);
+				rollback_lock_file(&lock);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(err, _("ref '%s' already exists"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(err,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+	}
+
+	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
+	if (write_in_full(fd, buf.buf, buf.len) < 0) {
+		strbuf_addf(err, _("could not write to '%s'"), filename.buf);
+		rollback_lock_file(&lock);
+		goto done;
+	}
+
+	commit_lock_file(&lock);
+	ret = 0;
+done:
+	strbuf_release(&buf);
+	strbuf_release(&filename);
+	return ret;
+}
+
+static int files_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "delete_pseudoref");
+	struct strbuf filename = STRBUF_INIT;
+	int ret = -1;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+
+	if (old_oid && !is_null_oid(old_oid)) {
+		struct lock_file lock = LOCK_INIT;
+		int fd;
+		struct object_id actual_old_oid;
+
+		fd = hold_lock_file_for_update_timeout(
+			&lock, filename.buf, 0,
+			get_files_ref_lock_timeout_ms());
+		if (fd < 0) {
+			error_errno(_("could not open '%s' for writing"),
+				    filename.buf);
+			goto done;
+		}
+		if (read_ref(pseudoref, &actual_old_oid))
+			die(_("could not read ref '%s'"), pseudoref);
+		if (!oideq(&actual_old_oid, old_oid)) {
+			error(_("unexpected object ID when deleting '%s'"),
+			      pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+
+		unlink(filename.buf);
+		rollback_lock_file(&lock);
+	} else {
+		unlink(filename.buf);
+	}
+	ret = 0;
+done:
+	strbuf_release(&filename);
+	return ret;
+}
+
 struct files_ref_iterator {
 	struct ref_iterator base;
 
@@ -3189,6 +3298,9 @@ struct ref_storage_be refs_be_files = {
 	files_rename_ref,
 	files_copy_ref,
 
+	files_write_pseudoref,
+	files_delete_pseudoref,
+
 	files_ref_iterator_begin,
 	files_read_raw_ref,
 
@@ -3198,5 +3310,5 @@ struct ref_storage_be refs_be_files = {
 	files_reflog_exists,
 	files_create_reflog,
 	files_delete_reflog,
-	files_reflog_expire
+	files_reflog_expire,
 };
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 4458a0f69cc..08e8253a893 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1590,6 +1590,22 @@ static int packed_copy_ref(struct ref_store *ref_store,
 	BUG("packed reference store does not support copying references");
 }
 
+static int packed_write_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *oid,
+				  const struct object_id *old_oid,
+				  struct strbuf *err)
+{
+	BUG("packed reference store does not support writing pseudo-references");
+}
+
+static int packed_delete_pseudoref(struct ref_store *ref_store,
+				   const char *pseudoref,
+				   const struct object_id *old_oid)
+{
+	BUG("packed reference store does not support deleting pseudo-references");
+}
+
 static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_store)
 {
 	return empty_ref_iterator_begin();
@@ -1656,6 +1672,9 @@ struct ref_storage_be refs_be_packed = {
 	packed_rename_ref,
 	packed_copy_ref,
 
+	packed_write_pseudoref,
+	packed_delete_pseudoref,
+
 	packed_ref_iterator_begin,
 	packed_read_raw_ref,
 
@@ -1665,5 +1684,5 @@ struct ref_storage_be refs_be_packed = {
 	packed_reflog_exists,
 	packed_create_reflog,
 	packed_delete_reflog,
-	packed_reflog_expire
+	packed_reflog_expire,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 4271362d264..59b053d53a2 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -556,6 +556,21 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
 			  const char *oldref, const char *newref,
 			  const char *logmsg);
 
+typedef int write_pseudoref_fn(struct ref_store *ref_store,
+			       const char *pseudoref,
+			       const struct object_id *oid,
+			       const struct object_id *old_oid,
+			       struct strbuf *err);
+
+/*
+ * Deletes a pseudoref. Deletion always succeeds (even if the pseudoref doesn't
+ * exist.), except if old_oid is specified. If it is, it can fail due to lock
+ * failure, failure reading the old OID, or an OID mismatch
+ */
+typedef int delete_pseudoref_fn(struct ref_store *ref_store,
+				const char *pseudoref,
+				const struct object_id *old_oid);
+
 /*
  * Iterate over the references in `ref_store` whose names start with
  * `prefix`. `prefix` is matched as a literal string, without regard
@@ -655,6 +670,9 @@ struct ref_storage_be {
 	rename_ref_fn *rename_ref;
 	copy_ref_fn *copy_ref;
 
+	write_pseudoref_fn *write_pseudoref;
+	delete_pseudoref_fn *delete_pseudoref;
+
 	ref_iterator_begin_fn *iterator_begin;
 	read_raw_ref_fn *read_raw_ref;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 04/17] Make refs_ref_exists public
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (2 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 03/17] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 05/17] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
                                                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 2 +-
 refs.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 12908066b13..812fee47108 100644
--- a/refs.c
+++ b/refs.c
@@ -311,7 +311,7 @@ int read_ref(const char *refname, struct object_id *oid)
 	return read_ref_full(refname, RESOLVE_REF_READING, oid, NULL);
 }
 
-static int refs_ref_exists(struct ref_store *refs, const char *refname)
+int refs_ref_exists(struct ref_store *refs, const char *refname)
 {
 	return !!refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING, NULL, NULL);
 }
diff --git a/refs.h b/refs.h
index 4dad8f24914..7aaa1226551 100644
--- a/refs.h
+++ b/refs.h
@@ -105,6 +105,8 @@ int refs_verify_refname_available(struct ref_store *refs,
 				  const struct string_list *skip,
 				  struct strbuf *err);
 
+int refs_ref_exists(struct ref_store *refs, const char *refname);
+
 int ref_exists(const char *refname);
 
 int should_autocreate_reflog(const char *refname);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 05/17] Treat BISECT_HEAD as a pseudo ref
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (3 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 04/17] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 06/17] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
                                                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Both the git-bisect.sh as bisect--helper inspected the file system directly.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/bisect--helper.c | 3 +--
 git-bisect.sh            | 4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index ec4996282e3..73f9324ad7d 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -13,7 +13,6 @@ static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
 static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
 static GIT_PATH_FUNC(git_path_bisect_ancestors_ok, "BISECT_ANCESTORS_OK")
 static GIT_PATH_FUNC(git_path_bisect_start, "BISECT_START")
-static GIT_PATH_FUNC(git_path_bisect_head, "BISECT_HEAD")
 static GIT_PATH_FUNC(git_path_bisect_log, "BISECT_LOG")
 static GIT_PATH_FUNC(git_path_head_name, "head-name")
 static GIT_PATH_FUNC(git_path_bisect_names, "BISECT_NAMES")
@@ -164,7 +163,7 @@ static int bisect_reset(const char *commit)
 		strbuf_addstr(&branch, commit);
 	}
 
-	if (!file_exists(git_path_bisect_head())) {
+	if (!ref_exists("BISECT_HEAD")) {
 		struct argv_array argv = ARGV_ARRAY_INIT;
 
 		argv_array_pushl(&argv, "checkout", branch.buf, "--", NULL);
diff --git a/git-bisect.sh b/git-bisect.sh
index 08a6ed57ddb..f03fbb18f00 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -41,7 +41,7 @@ TERM_GOOD=good
 
 bisect_head()
 {
-	if test -f "$GIT_DIR/BISECT_HEAD"
+	if git rev-parse --verify -q BISECT_HEAD > /dev/null
 	then
 		echo BISECT_HEAD
 	else
@@ -153,7 +153,7 @@ bisect_next() {
 	git bisect--helper --bisect-next-check $TERM_GOOD $TERM_BAD $TERM_GOOD|| exit
 
 	# Perform all bisection computation, display and checkout
-	git bisect--helper --next-all $(test -f "$GIT_DIR/BISECT_HEAD" && echo --no-checkout)
+	git bisect--helper --next-all $(git rev-parse --verify -q BISECT_HEAD > /dev/null && echo --no-checkout)
 	res=$?
 
 	# Check if we should exit because bisection is finished
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 06/17] Treat CHERRY_PICK_HEAD as a pseudo ref
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (4 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 05/17] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 07/17] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
                                                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Check for existence and delete CHERRY_PICK_HEAD through pseudo ref functions.
This will help cherry-pick work with alternate ref storage backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/commit.c | 34 +++++++++++++++++++---------------
 builtin/merge.c  |  2 +-
 path.c           |  1 -
 path.h           |  7 ++++---
 sequencer.c      | 42 ++++++++++++++++++++++++++----------------
 wt-status.c      |  4 ++--
 6 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index d1b7396052a..e27120b982b 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -847,21 +847,25 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 			if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
 				!merge_contains_scissors)
 				wt_status_add_cut_line(s->fp);
-			status_printf_ln(s, GIT_COLOR_NORMAL,
-			    whence == FROM_MERGE
-				? _("\n"
-					"It looks like you may be committing a merge.\n"
-					"If this is not correct, please remove the file\n"
-					"	%s\n"
-					"and try again.\n")
-				: _("\n"
-					"It looks like you may be committing a cherry-pick.\n"
-					"If this is not correct, please remove the file\n"
-					"	%s\n"
-					"and try again.\n"),
-				whence == FROM_MERGE ?
-					git_path_merge_head(the_repository) :
-					git_path_cherry_pick_head(the_repository));
+			if (whence == FROM_MERGE)
+				status_printf_ln(
+					s, GIT_COLOR_NORMAL,
+
+					_("\n"
+					  "It looks like you may be committing a merge.\n"
+					  "If this is not correct, please remove the file\n"
+					  "	%s\n"
+					  "and try again.\n"),
+					git_path_merge_head(the_repository));
+			else
+				status_printf_ln(
+					s, GIT_COLOR_NORMAL,
+
+					_("\n"
+					  "It looks like you may be committing a cherry-pick.\n"
+					  "If this is not correct, please run\n"
+					  "	git cherry-pick --abort\n"
+					  "and try again.\n"));
 		}
 
 		fprintf(s->fp, "\n");
diff --git a/builtin/merge.c b/builtin/merge.c
index 7da707bf55d..93b0a7b6eda 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1352,7 +1352,7 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
 		else
 			die(_("You have not concluded your merge (MERGE_HEAD exists)."));
 	}
-	if (file_exists(git_path_cherry_pick_head(the_repository))) {
+	if (ref_exists("CHERRY_PICK_HEAD")) {
 		if (advice_resolve_conflict)
 			die(_("You have not concluded your cherry-pick (CHERRY_PICK_HEAD exists).\n"
 			    "Please, commit your changes before you merge."));
diff --git a/path.c b/path.c
index 8b2c7531919..783cc2ae819 100644
--- a/path.c
+++ b/path.c
@@ -1528,7 +1528,6 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
-REPO_GIT_PATH_FUNC(cherry_pick_head, "CHERRY_PICK_HEAD")
 REPO_GIT_PATH_FUNC(revert_head, "REVERT_HEAD")
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
diff --git a/path.h b/path.h
index 1f1bf8f87a8..8941c018a99 100644
--- a/path.h
+++ b/path.h
@@ -170,7 +170,6 @@ void report_linked_checkout_garbage(void);
 	}
 
 struct path_cache {
-	const char *cherry_pick_head;
 	const char *revert_head;
 	const char *squash_msg;
 	const char *merge_msg;
@@ -182,9 +181,11 @@ struct path_cache {
 	const char *shallow;
 };
 
-#define PATH_CACHE_INIT { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL }
+#define PATH_CACHE_INIT                                              \
+	{                                                            \
+		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
+	}
 
-const char *git_path_cherry_pick_head(struct repository *r);
 const char *git_path_revert_head(struct repository *r);
 const char *git_path_squash_msg(struct repository *r);
 const char *git_path_merge_msg(struct repository *r);
diff --git a/sequencer.c b/sequencer.c
index fd7701c88a8..26286ec8d08 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -381,7 +381,8 @@ static void print_advice(struct repository *r, int show_hint,
 		 * (typically rebase --interactive) wants to take care
 		 * of the commit itself so remove CHERRY_PICK_HEAD
 		 */
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		return;
 	}
 
@@ -1455,7 +1456,8 @@ static int do_commit(struct repository *r,
 				    author, opts, flags, &oid);
 		strbuf_release(&sb);
 		if (!res) {
-			unlink(git_path_cherry_pick_head(r));
+			refs_delete_pseudoref(get_main_ref_store(r),
+					      "CHERRY_PICK_HEAD", NULL);
 			unlink(git_path_merge_msg(r));
 			if (!is_rebase_i(opts))
 				print_commit_summary(r, NULL, &oid,
@@ -1966,7 +1968,8 @@ static int do_pick_commit(struct repository *r,
 		flags |= ALLOW_EMPTY;
 	} else if (allow == 2) {
 		drop_commit = 1;
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		unlink(git_path_merge_msg(r));
 		fprintf(stderr,
 			_("dropping %s %s -- patch contents already upstream\n"),
@@ -2305,8 +2308,10 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int need_cleanup = 0;
 
-	if (file_exists(git_path_cherry_pick_head(r))) {
-		if (!unlink(git_path_cherry_pick_head(r)) && verbose)
+	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
+		if (!refs_delete_pseudoref(get_main_ref_store(r),
+					   "CHERRY_PICK_HEAD", NULL) &&
+		    verbose)
 			warning(_("cancelling a cherry picking in progress"));
 		opts.action = REPLAY_PICK;
 		need_cleanup = 1;
@@ -2671,8 +2676,9 @@ static int create_seq_dir(struct repository *r)
 	enum replay_action action;
 	const char *in_progress_error = NULL;
 	const char *in_progress_advice = NULL;
-	unsigned int advise_skip = file_exists(git_path_revert_head(r)) ||
-				file_exists(git_path_cherry_pick_head(r));
+	unsigned int advise_skip =
+		file_exists(git_path_revert_head(r)) ||
+		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
 
 	if (!sequencer_get_last_command(r, &action)) {
 		switch (action) {
@@ -2771,7 +2777,7 @@ static int rollback_single_pick(struct repository *r)
 {
 	struct object_id head_oid;
 
-	if (!file_exists(git_path_cherry_pick_head(r)) &&
+	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !file_exists(git_path_revert_head(r)))
 		return error(_("no cherry-pick or revert in progress"));
 	if (read_ref_full("HEAD", 0, &head_oid, NULL))
@@ -2874,7 +2880,8 @@ int sequencer_skip(struct repository *r, struct replay_opts *opts)
 		}
 		break;
 	case REPLAY_PICK:
-		if (!file_exists(git_path_cherry_pick_head(r))) {
+		if (!refs_ref_exists(get_main_ref_store(r),
+				     "CHERRY_PICK_HEAD")) {
 			if (action != REPLAY_PICK)
 				return error(_("no cherry-pick in progress"));
 			if (!rollback_is_safe())
@@ -3569,7 +3576,8 @@ static int do_merge(struct repository *r,
 					oid_to_hex(&j->item->object.oid));
 
 		strbuf_release(&ref_name);
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		rollback_lock_file(&lock);
 
 		rollback_lock_file(&lock);
@@ -4201,7 +4209,7 @@ static int continue_single_pick(struct repository *r)
 {
 	const char *argv[] = { "commit", NULL };
 
-	if (!file_exists(git_path_cherry_pick_head(r)) &&
+	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !file_exists(git_path_revert_head(r)))
 		return error(_("no cherry-pick or revert in progress"));
 	return run_command_v_opt(argv, RUN_GIT_CMD);
@@ -4318,9 +4326,10 @@ static int commit_staged_changes(struct repository *r,
 	}
 
 	if (is_clean) {
-		const char *cherry_pick_head = git_path_cherry_pick_head(r);
-
-		if (file_exists(cherry_pick_head) && unlink(cherry_pick_head))
+		if (refs_ref_exists(get_main_ref_store(r),
+				    "CHERRY_PICK_HEAD") &&
+		    refs_delete_pseudoref(get_main_ref_store(r),
+					  "CHERRY_PICK_HEAD", NULL))
 			return error(_("could not remove CHERRY_PICK_HEAD"));
 		if (!final_fixup)
 			return 0;
@@ -4379,7 +4388,8 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 
 	if (!is_rebase_i(opts)) {
 		/* Verify that the conflict has been resolved */
-		if (file_exists(git_path_cherry_pick_head(r)) ||
+		if (refs_ref_exists(get_main_ref_store(r),
+				    "CHERRY_PICK_HEAD") ||
 		    file_exists(git_path_revert_head(r))) {
 			res = continue_single_pick(r);
 			if (res)
@@ -5442,7 +5452,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
 
 int sequencer_determine_whence(struct repository *r, enum commit_whence *whence)
 {
-	if (file_exists(git_path_cherry_pick_head(r))) {
+	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
 		struct object_id cherry_pick_head, rebase_head;
 
 		if (file_exists(git_path_seq_dir()))
diff --git a/wt-status.c b/wt-status.c
index 98dfa6f73f9..96302be030b 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1636,8 +1636,8 @@ void wt_status_get_state(struct repository *r,
 		state->merge_in_progress = 1;
 	} else if (wt_status_check_rebase(NULL, state)) {
 		;		/* all set */
-	} else if (!stat(git_path_cherry_pick_head(r), &st) &&
-			!get_oid("CHERRY_PICK_HEAD", &oid)) {
+	} else if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
+		   !get_oid("CHERRY_PICK_HEAD", &oid)) {
 		state->cherry_pick_in_progress = 1;
 		oidcpy(&state->cherry_pick_head_oid, &oid);
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 07/17] Treat REVERT_HEAD as a pseudo ref
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (5 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 06/17] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 08/17] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
                                                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 path.c      |  1 -
 path.h      |  8 +++-----
 sequencer.c | 16 +++++++++-------
 wt-status.c |  2 +-
 4 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/path.c b/path.c
index 783cc2ae819..7b385e5eb28 100644
--- a/path.c
+++ b/path.c
@@ -1528,7 +1528,6 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
-REPO_GIT_PATH_FUNC(revert_head, "REVERT_HEAD")
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
 REPO_GIT_PATH_FUNC(merge_rr, "MERGE_RR")
diff --git a/path.h b/path.h
index 8941c018a99..e7e77da6aaa 100644
--- a/path.h
+++ b/path.h
@@ -170,7 +170,6 @@ void report_linked_checkout_garbage(void);
 	}
 
 struct path_cache {
-	const char *revert_head;
 	const char *squash_msg;
 	const char *merge_msg;
 	const char *merge_rr;
@@ -181,12 +180,11 @@ struct path_cache {
 	const char *shallow;
 };
 
-#define PATH_CACHE_INIT                                              \
-	{                                                            \
-		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
+#define PATH_CACHE_INIT                                        \
+	{                                                      \
+		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
 	}
 
-const char *git_path_revert_head(struct repository *r);
 const char *git_path_squash_msg(struct repository *r);
 const char *git_path_merge_msg(struct repository *r);
 const char *git_path_merge_rr(struct repository *r);
diff --git a/sequencer.c b/sequencer.c
index 26286ec8d08..25ef9fc773d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2317,8 +2317,10 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
 		need_cleanup = 1;
 	}
 
-	if (file_exists(git_path_revert_head(r))) {
-		if (!unlink(git_path_revert_head(r)) && verbose)
+	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
+		if (!refs_delete_pseudoref(get_main_ref_store(r), "REVERT_HEAD",
+					   NULL) &&
+		    verbose)
 			warning(_("cancelling a revert in progress"));
 		opts.action = REPLAY_REVERT;
 		need_cleanup = 1;
@@ -2677,7 +2679,7 @@ static int create_seq_dir(struct repository *r)
 	const char *in_progress_error = NULL;
 	const char *in_progress_advice = NULL;
 	unsigned int advise_skip =
-		file_exists(git_path_revert_head(r)) ||
+		refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD") ||
 		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
 
 	if (!sequencer_get_last_command(r, &action)) {
@@ -2778,7 +2780,7 @@ static int rollback_single_pick(struct repository *r)
 	struct object_id head_oid;
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
-	    !file_exists(git_path_revert_head(r)))
+	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
 	if (read_ref_full("HEAD", 0, &head_oid, NULL))
 		return error(_("cannot resolve HEAD"));
@@ -2872,7 +2874,7 @@ int sequencer_skip(struct repository *r, struct replay_opts *opts)
 	 */
 	switch (opts->action) {
 	case REPLAY_REVERT:
-		if (!file_exists(git_path_revert_head(r))) {
+		if (!refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
 			if (action != REPLAY_REVERT)
 				return error(_("no revert in progress"));
 			if (!rollback_is_safe())
@@ -4210,7 +4212,7 @@ static int continue_single_pick(struct repository *r)
 	const char *argv[] = { "commit", NULL };
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
-	    !file_exists(git_path_revert_head(r)))
+	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
 	return run_command_v_opt(argv, RUN_GIT_CMD);
 }
@@ -4390,7 +4392,7 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 		/* Verify that the conflict has been resolved */
 		if (refs_ref_exists(get_main_ref_store(r),
 				    "CHERRY_PICK_HEAD") ||
-		    file_exists(git_path_revert_head(r))) {
+		    refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
 			res = continue_single_pick(r);
 			if (res)
 				goto release_todo_list;
diff --git a/wt-status.c b/wt-status.c
index 96302be030b..81b768504a1 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1642,7 +1642,7 @@ void wt_status_get_state(struct repository *r,
 		oidcpy(&state->cherry_pick_head_oid, &oid);
 	}
 	wt_status_check_bisect(NULL, state);
-	if (!stat(git_path_revert_head(r), &st) &&
+	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD") &&
 	    !get_oid("REVERT_HEAD", &oid)) {
 		state->revert_in_progress = 1;
 		oidcpy(&state->revert_head_oid, &oid);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 08/17] Move REF_LOG_ONLY to refs-internal.h
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (6 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 07/17] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 09/17] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
                                                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

REF_LOG_ONLY is used in the transaction preparation: if a symref is involved in
a transaction, the referent of the symref should be updated, and the symref
itself should only be updated in the reflog. Other ref backends will need to
duplicate this logic.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/files-backend.c | 7 -------
 refs/refs-internal.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index df7553f4cc3..141b6b08816 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -38,13 +38,6 @@
  */
 #define REF_NEEDS_COMMIT (1 << 6)
 
-/*
- * Used as a flag in ref_update::flags when we want to log a ref
- * update but not actually perform it.  This is used when a symbolic
- * ref update is split up.
- */
-#define REF_LOG_ONLY (1 << 7)
-
 /*
  * Used as a flag in ref_update::flags when the ref_update was via an
  * update to HEAD.
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 59b053d53a2..dc9e8d3a92b 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -31,6 +31,13 @@ struct ref_transaction;
  */
 #define REF_HAVE_OLD (1 << 3)
 
+/*
+ * Used as a flag in ref_update::flags when we want to log a ref
+ * update but not actually perform it.  This is used when a symbolic
+ * ref update is split up.
+ */
+#define REF_LOG_ONLY (1 << 7)
+
 /*
  * Return the length of time to retry acquiring a loose reference lock
  * before giving up, in milliseconds:
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 09/17] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (7 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 08/17] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 10/17] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 812fee47108..29e710a29e9 100644
--- a/refs.c
+++ b/refs.c
@@ -1450,7 +1450,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1510,8 +1510,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 10/17] Add .gitattributes for the reftable/ directory
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (8 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 09/17] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 11/17] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 00000000000..f44451a3795
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 11/17] Add reftable library
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (9 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 10/17] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 12/17] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. Its rationale and
specification is in the preceding commit.

This imports the upstream library as one big commit. For understanding
the code, it is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* record.{c,h} - reading and writing records

* block.{c,h} - reading and writing blocks.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE          |   31 +
 reftable/README.md        |   11 +
 reftable/VERSION          |    1 +
 reftable/basics.c         |  215 +++++++
 reftable/basics.h         |   53 ++
 reftable/block.c          |  437 +++++++++++++
 reftable/block.h          |  129 ++++
 reftable/block_test.c     |  157 +++++
 reftable/compat.c         |   97 +++
 reftable/compat.h         |   48 ++
 reftable/constants.h      |   21 +
 reftable/dump.c           |  212 +++++++
 reftable/file.c           |   95 +++
 reftable/iter.c           |  242 ++++++++
 reftable/iter.h           |   72 +++
 reftable/merged.c         |  320 ++++++++++
 reftable/merged.h         |   39 ++
 reftable/merged_test.c    |  273 +++++++++
 reftable/pq.c             |  113 ++++
 reftable/pq.h             |   34 ++
 reftable/reader.c         |  742 +++++++++++++++++++++++
 reftable/reader.h         |   65 ++
 reftable/record.c         | 1118 ++++++++++++++++++++++++++++++++++
 reftable/record.h         |  128 ++++
 reftable/record_test.c    |  403 ++++++++++++
 reftable/refname.c        |  211 +++++++
 reftable/refname.h        |   38 ++
 reftable/refname_test.c   |   99 +++
 reftable/reftable-tests.h |   22 +
 reftable/reftable.c       |   90 +++
 reftable/reftable.h       |  571 +++++++++++++++++
 reftable/reftable_test.c  |  631 +++++++++++++++++++
 reftable/slice.c          |  217 +++++++
 reftable/slice.h          |   85 +++
 reftable/slice_test.c     |   39 ++
 reftable/stack.c          | 1214 +++++++++++++++++++++++++++++++++++++
 reftable/stack.h          |   48 ++
 reftable/stack_test.c     |  787 ++++++++++++++++++++++++
 reftable/system.h         |   55 ++
 reftable/test_framework.c |   69 +++
 reftable/test_framework.h |   64 ++
 reftable/tree.c           |   63 ++
 reftable/tree.h           |   34 ++
 reftable/tree_test.c      |   62 ++
 reftable/update.sh        |   22 +
 reftable/writer.c         |  661 ++++++++++++++++++++
 reftable/writer.h         |   60 ++
 reftable/zlib-compat.c    |   92 +++
 48 files changed, 10290 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/compat.c
 create mode 100644 reftable/compat.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/refname_test.c
 create mode 100644 reftable/reftable-tests.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/slice.c
 create mode 100644 reftable/slice.h
 create mode 100644 reftable/slice_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..21500563fd4
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,11 @@
+
+The source code in this directory comes from https://github.com/google/reftable.
+
+The VERSION file keeps track of the current version of the reftable library.
+
+To update the library, do:
+
+   sh reftable/update.sh
+
+Bugfixes should be accompanied by a test and applied to upstream project at
+https://github.com/google/reftable.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..1d6c51b7d42
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1 @@
+2c91c4b305dcbf6500c0806bb1a7fbcfc668510c C: include system.h in compat.h
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..14c4dfaf5a6
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,215 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(byte *out, uint32_t i)
+{
+	out[0] = (byte)((i >> 16) & 0xff);
+	out[1] = (byte)((i >> 8) & 0xff);
+	out[2] = (byte)(i & 0xff);
+}
+
+uint32_t get_be24(byte *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)(i & 0xff);
+}
+
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
+{
+	size_t lo = 0;
+	size_t hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		size_t mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	static char buf[250];
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return "I/O error";
+	case REFTABLE_FORMAT_ERROR:
+		return "corrupt reftable file";
+	case REFTABLE_NOT_EXIST_ERROR:
+		return "file does not exist";
+	case REFTABLE_LOCK_ERROR:
+		return "data is outdated";
+	case REFTABLE_API_ERROR:
+		return "misuse of the reftable API";
+	case REFTABLE_ZLIB_ERROR:
+		return "zlib failure";
+	case REFTABLE_NAME_CONFLICT:
+		return "file/directory conflict";
+	case REFTABLE_REFNAME_ERROR:
+		return "invalid refname";
+	case -1:
+		return "general error";
+	default:
+		snprintf(buf, sizeof(buf), "unknown error code %d", err);
+		return buf;
+	}
+}
+
+int reftable_error_to_errno(int err)
+{
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return EIO;
+	case REFTABLE_FORMAT_ERROR:
+		return EFAULT;
+	case REFTABLE_NOT_EXIST_ERROR:
+		return ENOENT;
+	case REFTABLE_LOCK_ERROR:
+		return EBUSY;
+	case REFTABLE_API_ERROR:
+		return EINVAL;
+	case REFTABLE_ZLIB_ERROR:
+		return EDOM;
+	default:
+		return ERANGE;
+	}
+}
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..1b003485c9c
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+/* Bigendian en/decoding of integers */
+
+void put_be24(byte *out, uint32_t i);
+uint32_t get_be24(byte *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);
+
+/*
+  Frees a NULL terminated array of malloced strings. The array itself is also
+  freed.
+ */
+void free_names(char **a);
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp);
+
+/* compares two NULL-terminated arrays of strings. */
+int names_equal(char **a, char **b);
+
+/* returns the array size of a NULL-terminated array of strings. */
+int names_length(char **names);
+
+/* Allocation routines; they invoke the functions set through
+ * reftable_set_alloc() */
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..ffd47c9cb90
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,437 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice *key);
+
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+	bw->last_key.canary = SLICE_CANARY;
+}
+
+byte block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the reftable_record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec)
+{
+	struct slice empty = SLICE_INIT;
+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
+								    w->last_key;
+	struct slice out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+		.canary = SLICE_CANARY,
+	};
+
+	struct slice start = out;
+
+	bool restart = false;
+	struct slice key = SLICE_INIT;
+	int n = 0;
+
+	reftable_record_key(rec, &key);
+	n = reftable_encode_key(&restart, out, last, key,
+				reftable_record_val_type(rec));
+	if (n < 0)
+		goto done;
+	slice_consume(&out, n);
+
+	n = reftable_record_encode(rec, out, w->hash_size);
+	if (n < 0)
+		goto done;
+	slice_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  &key) < 0)
+		goto done;
+
+	slice_release(&key);
+	return 0;
+
+done:
+	slice_release(&key);
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct slice *key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next)
+		return -1;
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+
+	slice_reset(&w->last_key);
+	slice_addbuf(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		byte *compressed = NULL;
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		size_t dest_cap = src_len;
+
+		compressed = reftable_malloc(dest_cap);
+		while (1) {
+			uLongf out_dest_len = dest_cap;
+
+			zresult = compress2(compressed, &out_dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				dest_cap *= 2;
+				compressed =
+					reftable_realloc(compressed, dest_cap);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				reftable_free(compressed);
+				return REFTABLE_ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed,
+			       out_dest_len);
+			w->next = out_dest_len + block_header_skip;
+			reftable_free(compressed);
+			break;
+		}
+	}
+	return w->next;
+}
+
+byte block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct reftable_block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	byte typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	uint16_t restart_count = 0;
+	uint32_t restart_start = 0;
+	byte *restart_bytes = NULL;
+
+	if (!reftable_is_block_type(typ))
+		return REFTABLE_FORMAT_ERROR;
+
+	if (typ == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip; /* total size of dest
+							    buffer. */
+		uLongf src_len = block->len - block_header_skip;
+		/* Log blocks specify the *uncompressed* size in their header.
+		 */
+		byte *uncompressed = reftable_malloc(sz);
+
+		/* Copy over the block header verbatim. It's not compressed. */
+		memcpy(uncompressed, block->data, block_header_skip);
+
+		/* Uncompress */
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed + block_header_skip, &dst_len,
+				    block->data + block_header_skip,
+				    &src_len)) {
+			reftable_free(uncompressed);
+			return REFTABLE_ZLIB_ERROR;
+		}
+
+		if (dst_len + block_header_skip != sz)
+			return REFTABLE_FORMAT_ERROR;
+
+		/* We're done with the input data. */
+		reftable_block_done(block);
+		block->data = uncompressed;
+		block->len = sz;
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	restart_count = get_be16(block->data + sz - 2);
+	restart_start = sz - 2 - 3 * restart_count;
+	restart_bytes = block->data + restart_start;
+
+	/* transfer ownership. */
+	br->block = *block;
+	block->data = NULL;
+	block->len = 0;
+
+	br->hash_size = hash_size;
+	br->block_len = restart_start;
+	br->full_block_size = full_block_size;
+	br->header_off = header_off;
+	br->restart_count = restart_count;
+	br->restart_bytes = restart_bytes;
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	slice_reset(&it->last_key);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct slice key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(size_t idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct slice in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+		.canary = SLICE_CANARY,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct slice rkey = SLICE_INIT;
+	struct slice last_key = SLICE_INIT;
+	byte unused_extra;
+	int n = reftable_decode_key(&rkey, &unused_extra, last_key, in);
+	int result;
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	result = slice_cmp(&a->key, &rkey);
+	slice_release(&rkey);
+	return result;
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	slice_reset(&dest->last_key);
+	slice_addbuf(&dest->last_key, &src->last_key);
+}
+
+int block_iter_next(struct block_iter *it, struct reftable_record *rec)
+{
+	struct slice in = {
+		.buf = it->br->block.data + it->next_off,
+		.len = it->br->block_len - it->next_off,
+		.canary = SLICE_CANARY,
+	};
+	struct slice start = in;
+	struct slice key = SLICE_INIT;
+	byte extra = 0;
+	int n = 0;
+
+	if (it->next_off >= it->br->block_len)
+		return 1;
+
+	n = reftable_decode_key(&key, &extra, it->last_key, in);
+	if (n < 0)
+		return -1;
+
+	slice_consume(&in, n);
+	n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
+	if (n < 0)
+		return -1;
+	slice_consume(&in, n);
+
+	slice_reset(&it->last_key);
+	slice_addbuf(&it->last_key, &key);
+	it->next_off += start.len - in.len;
+	slice_release(&key);
+	return 0;
+}
+
+int block_reader_first_key(struct block_reader *br, struct slice *key)
+{
+	struct slice empty = SLICE_INIT;
+	int off = br->header_off + 4;
+	struct slice in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+		.canary = SLICE_CANARY,
+	};
+
+	byte extra = 0;
+	int n = reftable_decode_key(key, &extra, empty, in);
+	if (n < 0)
+		return n;
+
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct slice *want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	slice_release(&it->last_key);
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice *want)
+{
+	struct restart_find_args args = {
+		.key = *want,
+		.r = br,
+	};
+	struct reftable_record rec = reftable_new_record(block_reader_type(br));
+	struct slice key = SLICE_INIT;
+	int err = 0;
+	struct block_iter next = {
+		.last_key = SLICE_INIT,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	/* We're looking for the last entry less/equal than the wanted key, so
+	   we have to go one entry too far and then back up.
+	*/
+	while (true) {
+		block_iter_copy_from(&next, it);
+		err = block_iter_next(&next, &rec);
+		if (err < 0)
+			goto done;
+
+		reftable_record_key(&rec, &key);
+		if (err > 0 || slice_cmp(&key, want) >= 0) {
+			err = 0;
+			goto done;
+		}
+
+		block_iter_copy_from(it, &next);
+	}
+
+done:
+	slice_release(&key);
+	slice_release(&next.last_key);
+	reftable_record_destroy(&rec);
+
+	return err;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	slice_release(&bw->last_key);
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..c55f746c5c1
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,129 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+/*
+  Writes reftable blocks. The block_writer is reused across blocks to minimize
+  allocation overhead.
+*/
+struct block_writer {
+	byte *buf;
+	uint32_t block_size;
+
+	/* Offset ofof the global header. Nonzero in the first block only. */
+	uint32_t header_off;
+
+	/* How often to restart keys. */
+	int restart_interval;
+	int hash_size;
+
+	/* Offset of next byte to write. */
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+
+	struct slice last_key;
+	int entries;
+};
+
+/*
+  initializes the blockwriter to write `typ` entries, using `buf` as temporary
+  storage. `buf` is not owned by the block_writer. */
+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+
+/*
+  returns the block type (eg. 'r' for ref records.
+*/
+byte block_writer_type(struct block_writer *bw);
+
+/* appends the record, or -1 if it doesn't fit. */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec);
+
+/* appends the key restarts, and compress the block if necessary. */
+int block_writer_finish(struct block_writer *w);
+
+/* clears out internally allocated block_writer members. */
+void block_writer_clear(struct block_writer *bw);
+
+/* Read a block. */
+struct block_reader {
+	/* offset of the block header; nonzero for the first block in a
+	 * reftable. */
+	uint32_t header_off;
+
+	/* the memory block */
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	byte *restart_bytes;
+	uint16_t restart_count;
+
+	/* size of the data in the file. For log blocks, this is the compressed
+	 * size. */
+	uint32_t full_block_size;
+};
+
+/* Iterate over entries in a block */
+struct block_iter {
+	/* offset within the block of the next entry to read. */
+	uint32_t next_off;
+	struct block_reader *br;
+
+	/* key for last entry we read. */
+	struct slice last_key;
+};
+
+/* initializes a block reader. */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Position `it` at start of the block */
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+
+/* Position `it` to the `want` key in the block */
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct slice *want);
+
+/* Returns the block type (eg. 'r' for refs) */
+byte block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct slice *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct reftable_record *rec);
+
+/* Seek to `want` with in the block pointed to by `it` */
+int block_iter_seek(struct block_iter *it, struct slice *want);
+
+/* deallocate memory for `it`. The block reader and its block is left intact. */
+void block_iter_close(struct block_iter *it);
+
+/* size of file header, depending on format version */
+int header_size(int version);
+
+/* size of file footer, depending on format version */
+int footer_size(int version);
+
+/* returns a block to its source. */
+void reftable_block_done(struct reftable_block *ret);
+
+#endif
diff --git a/reftable/block_test.c b/reftable/block_test.c
new file mode 100644
index 00000000000..e677d8e28e0
--- /dev/null
+++ b/reftable/block_test.c
@@ -0,0 +1,157 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+struct binsearch_args {
+	int key;
+	int *arr;
+};
+
+static int binsearch_func(size_t i, void *void_args)
+{
+	struct binsearch_args *args = (struct binsearch_args *)void_args;
+
+	return args->key < args->arr[i];
+}
+
+static void test_binsearch(void)
+{
+	int arr[] = { 2, 4, 6, 8, 10 };
+	size_t sz = ARRAY_SIZE(arr);
+	struct binsearch_args args = {
+		.arr = arr,
+	};
+
+	int i = 0;
+	for (i = 1; i < 11; i++) {
+		int res;
+		args.key = i;
+		res = binsearch(sz, &binsearch_func, &args);
+
+		if (res < sz) {
+			assert(args.key < arr[res]);
+			if (res > 0) {
+				assert(args.key >= arr[res - 1]);
+			}
+		} else {
+			assert(args.key == 10 || args.key == 11);
+		}
+	}
+}
+
+static void test_block_read_write(void)
+{
+	const int header_off = 21; /* random */
+	char *names[30];
+	const int N = ARRAY_SIZE(names);
+	const int block_size = 1024;
+	struct reftable_block block = { 0 };
+	struct block_writer bw = {
+		.last_key = SLICE_INIT,
+	};
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_record rec = { 0 };
+	int i = 0;
+	int n;
+	struct block_reader br = { 0 };
+	struct block_iter it = { .last_key = SLICE_INIT };
+	int j = 0;
+	struct slice want = SLICE_INIT;
+
+	block.data = reftable_calloc(block_size);
+	block.len = block_size;
+	block.source = malloc_block_source();
+	block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size,
+			  header_off, hash_size(SHA1_ID));
+	reftable_record_from_ref(&rec, &ref);
+
+	for (i = 0; i < N; i++) {
+		char name[100];
+		byte hash[SHA1_SIZE];
+		snprintf(name, sizeof(name), "branch%02d", i);
+		memset(hash, i, sizeof(hash));
+
+		ref.ref_name = name;
+		ref.value = hash;
+		names[i] = xstrdup(name);
+		n = block_writer_add(&bw, &rec);
+		ref.ref_name = NULL;
+		ref.value = NULL;
+		assert(n == 0);
+	}
+
+	n = block_writer_finish(&bw);
+	assert(n > 0);
+
+	block_writer_clear(&bw);
+
+	block_reader_init(&br, &block, header_off, block_size, SHA1_SIZE);
+
+	block_reader_start(&br, &it);
+
+	while (true) {
+		int r = block_iter_next(&it, &rec);
+		assert(r >= 0);
+		if (r > 0) {
+			break;
+		}
+		assert_streq(names[j], ref.ref_name);
+		j++;
+	}
+
+	reftable_record_clear(&rec);
+	block_iter_close(&it);
+
+	for (i = 0; i < N; i++) {
+		struct block_iter it = { .last_key = SLICE_INIT };
+		slice_reset(&want);
+		slice_addstr(&want, names[i]);
+
+		n = block_reader_seek(&br, &it, &want);
+		assert(n == 0);
+
+		n = block_iter_next(&it, &rec);
+		assert(n == 0);
+
+		assert_streq(names[i], ref.ref_name);
+
+		want.len--;
+		n = block_reader_seek(&br, &it, &want);
+		assert(n == 0);
+
+		n = block_iter_next(&it, &rec);
+		assert(n == 0);
+		assert_streq(names[10 * (i / 10)], ref.ref_name);
+
+		block_iter_close(&it);
+	}
+
+	reftable_record_clear(&rec);
+	reftable_block_done(&br.block);
+	slice_release(&want);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+}
+
+int block_test_main(int argc, const char *argv[])
+{
+	add_test_case("binsearch", &test_binsearch);
+	add_test_case("block_read_write", &test_block_read_write);
+	return test_main(argc, argv);
+}
diff --git a/reftable/compat.c b/reftable/compat.c
new file mode 100644
index 00000000000..c83cc158311
--- /dev/null
+++ b/reftable/compat.c
@@ -0,0 +1,97 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+
+*/
+
+/* compat.c - compatibility functions for standalone compilation */
+
+#include "system.h"
+#include "basics.h"
+
+#ifndef REFTABLE_IN_GITCORE
+
+#include <dirent.h>
+
+void put_be32(void *p, uint32_t i)
+{
+	byte *out = (byte *)p;
+
+	out[0] = (uint8_t)((i >> 24) & 0xff);
+	out[1] = (uint8_t)((i >> 16) & 0xff);
+	out[2] = (uint8_t)((i >> 8) & 0xff);
+	out[3] = (uint8_t)((i)&0xff);
+}
+
+uint32_t get_be32(uint8_t *in)
+{
+	return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
+	       (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
+}
+
+void put_be64(void *p, uint64_t v)
+{
+	byte *out = (byte *)p;
+	int i = sizeof(uint64_t);
+	while (i--) {
+		out[i] = (uint8_t)(v & 0xff);
+		v >>= 8;
+	}
+}
+
+uint64_t get_be64(uint8_t *out)
+{
+	uint64_t v = 0;
+	int i = 0;
+	for (i = 0; i < sizeof(uint64_t); i++) {
+		v = (v << 8) | (uint8_t)(out[i] & 0xff);
+	}
+	return v;
+}
+
+uint16_t get_be16(uint8_t *in)
+{
+	return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
+}
+
+char *xstrdup(const char *s)
+{
+	int l = strlen(s);
+	char *dest = (char *)reftable_malloc(l + 1);
+	strncpy(dest, s, l + 1);
+	return dest;
+}
+
+void sleep_millisec(int millisecs)
+{
+	usleep(millisecs * 1000);
+}
+
+void reftable_clear_dir(const char *dirname)
+{
+	DIR *dir = opendir(dirname);
+	struct dirent *ent = NULL;
+	assert(dir);
+	while ((ent = readdir(dir)) != NULL) {
+		unlinkat(dirfd(dir), ent->d_name, 0);
+	}
+	closedir(dir);
+	rmdir(dirname);
+}
+
+#else
+
+#include "../dir.h"
+
+void reftable_clear_dir(const char *dirname)
+{
+	struct strbuf path = STRBUF_INIT;
+	strbuf_addstr(&path, dirname);
+	remove_dir_recursively(&path, 0);
+	strbuf_release(&path);
+}
+
+#endif
diff --git a/reftable/compat.h b/reftable/compat.h
new file mode 100644
index 00000000000..930f26070c7
--- /dev/null
+++ b/reftable/compat.h
@@ -0,0 +1,48 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef COMPAT_H
+#define COMPAT_H
+
+#include "system.h"
+
+#ifndef REFTABLE_IN_GITCORE
+
+/* functions that git-core provides, for standalone compilation */
+#include <stdint.h>
+
+uint64_t get_be64(uint8_t *in);
+void put_be64(void *out, uint64_t i);
+
+void put_be32(void *out, uint32_t i);
+uint32_t get_be32(uint8_t *in);
+
+uint16_t get_be16(uint8_t *in);
+
+#define ARRAY_SIZE(a) sizeof((a)) / sizeof((a)[0])
+#define FREE_AND_NULL(x)          \
+	do {                      \
+		reftable_free(x); \
+		(x) = NULL;       \
+	} while (0)
+#define QSORT(arr, n, cmp) qsort(arr, n, sizeof(arr[0]), cmp)
+#define SWAP(a, b)                              \
+	{                                       \
+		char tmp[sizeof(a)];            \
+		assert(sizeof(a) == sizeof(b)); \
+		memcpy(&tmp[0], &a, sizeof(a)); \
+		memcpy(&a, &b, sizeof(a));      \
+		memcpy(&b, &tmp[0], sizeof(a)); \
+	}
+
+char *xstrdup(const char *s);
+
+void sleep_millisec(int millisecs);
+
+#endif
+#endif
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..5eee72c4c11
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 00000000000..00b444e8c9b
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,212 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "reftable.h"
+#include "reftable-tests.h"
+
+static uint32_t hash_id;
+
+static int dump_table(const char *tablename)
+{
+	struct reftable_block_source src = { 0 };
+	int err = reftable_block_source_from_file(&src, tablename);
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+	struct reftable_reader *r = NULL;
+
+	if (err < 0)
+		return err;
+
+	err = reftable_new_reader(&r, &src, tablename);
+	if (err < 0)
+		return err;
+
+	err = reftable_reader_seek_ref(r, &it, "");
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_ref_record_print(&ref, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_reader_seek_log(r, &it, "");
+	if (err < 0) {
+		return err;
+	}
+	while (1) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_log_record_print(&log, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_log_record_clear(&log);
+
+	reftable_reader_free(r);
+	return 0;
+}
+
+static int compact_stack(const char *stackdir)
+{
+	struct reftable_stack *stack = NULL;
+	struct reftable_write_options cfg = {};
+
+	int err = reftable_new_stack(&stack, stackdir, cfg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_stack_compact_all(stack, NULL);
+	if (err < 0)
+		goto done;
+done:
+	if (stack != NULL) {
+		reftable_stack_destroy(stack);
+	}
+	return err;
+}
+
+static int dump_stack(const char *stackdir)
+{
+	struct reftable_stack *stack = NULL;
+	struct reftable_write_options cfg = {};
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+	struct reftable_merged_table *merged = NULL;
+
+	int err = reftable_new_stack(&stack, stackdir, cfg);
+	if (err < 0)
+		return err;
+
+	merged = reftable_stack_merged_table(stack);
+
+	err = reftable_merged_table_seek_ref(merged, &it, "");
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_ref_record_print(&ref, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_merged_table_seek_log(merged, &it, "");
+	if (err < 0) {
+		return err;
+	}
+	while (1) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_log_record_print(&log, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_log_record_clear(&log);
+
+	reftable_stack_destroy(stack);
+	return 0;
+}
+
+static void print_help(void)
+{
+	printf("usage: dump [-cst] arg\n\n"
+	       "options: \n"
+	       "  -c compact\n"
+	       "  -t dump table\n"
+	       "  -s dump stack\n"
+	       "  -h this help\n"
+	       "  -2 use SHA256\n"
+	       "\n");
+}
+
+int reftable_dump_main(int argc, char *const *argv)
+{
+	int err = 0;
+	int opt;
+	int opt_dump_table = 0;
+	int opt_dump_stack = 0;
+	int opt_compact = 0;
+	const char *arg = NULL;
+	while ((opt = getopt(argc, argv, "2chts")) != -1) {
+		switch (opt) {
+		case '2':
+			hash_id = 0x73323536;
+			break;
+		case 't':
+			opt_dump_table = 1;
+			break;
+		case 's':
+			opt_dump_stack = 1;
+			break;
+		case 'c':
+			opt_compact = 1;
+			break;
+		case '?':
+		case 'h':
+			print_help();
+			return 2;
+			break;
+		}
+	}
+
+	if (argv[optind] == NULL) {
+		fprintf(stderr, "need argument\n");
+		print_help();
+		return 2;
+	}
+
+	arg = argv[optind];
+
+	if (opt_dump_table) {
+		err = dump_table(arg);
+	} else if (opt_dump_stack) {
+		err = dump_stack(arg);
+	} else if (opt_compact) {
+		err = compact_stack(arg);
+	}
+
+	if (err < 0) {
+		fprintf(stderr, "%s: %s: %s\n", argv[0], arg,
+			reftable_error_str(err));
+		return 1;
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..63843a640b0
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,95 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size)
+		return -1;
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	struct file_block_source *p = NULL;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return REFTABLE_NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0)
+		return -1;
+
+	p = reftable_calloc(sizeof(struct file_block_source));
+	p->size = st.st_size;
+	p->fd = fd;
+
+	assert(bs->ops == NULL);
+	bs->ops = &file_vtable;
+	bs->arg = p;
+	return 0;
+}
+
+int reftable_fd_write(void *arg, byte *data, size_t sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..9f014c6948a
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,242 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator *it)
+{
+	return it->ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct reftable_record *rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec)
+{
+	return it->ops->next(it->iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, ref);
+	return iterator_next(it, &rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, log);
+	return iterator_next(it, &rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	slice_release(&fri->oid);
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg,
+				       struct reftable_record *rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+	int err = 0;
+	while (true) {
+		err = reftable_iterator_next_ref(&fri->it, ref);
+		if (err != 0) {
+			break;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			err = reftable_table_seek_ref(&fri->tab, &it,
+						      ref->ref_name);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(&it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				break;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+
+	reftable_ref_record_clear(ref);
+	return err;
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reftable_block_done(&it->block_reader.block);
+	slice_release(&it->oid);
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	uint64_t off;
+	int err = 0;
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reftable_block_done(&it->block_reader.block);
+
+	off = it->offsets[it->offset_idx++];
+	err = reader_init_block_reader(it->r, &it->block_reader, off,
+				       BLOCK_TYPE_REF);
+	if (err < 0) {
+		return err;
+	}
+	if (err > 0) {
+		/* indexed block does not exist. */
+		return REFTABLE_FORMAT_ERROR;
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter empty = INDEXED_TABLE_REF_ITER_INIT;
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	*itr = empty;
+	itr->r = r;
+	slice_add(&itr->oid, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..b6294cf3206
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,72 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "slice.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct reftable_record *rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec);
+
+/* Returns true for a zeroed out iterator, such as the one returned from
+   iterator_destroy. */
+bool iterator_is_null(struct reftable_iterator *it);
+
+/* iterator that produces only ref records that point to `oid` */
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_table tab;
+	struct slice oid;
+	struct reftable_iterator it;
+};
+#define FILTERING_REF_ITERATOR_INIT \
+	{                           \
+		.oid = SLICE_INIT   \
+	}
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+/* iterator that produces only ref records that point to `oid`,
+   but using the object index.
+ */
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct slice oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+#define INDEXED_TABLE_REF_ITER_INIT                                   \
+	{                                                             \
+		.cur = { .last_key = SLICE_INIT }, .oid = SLICE_INIT, \
+	}
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, byte *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..5670a52a80d
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,320 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct reftable_record rec = reftable_new_record(mi->typ);
+		int err = iterator_next(&mi->stack[i], &rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			reftable_record_destroy(&rec);
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
+					       size_t idx)
+{
+	struct reftable_record rec = reftable_new_record(mi->typ);
+	struct pq_entry e = {
+		.rec = rec,
+		.index = idx,
+	};
+	int err = iterator_next(&mi->stack[idx], &rec);
+	if (err < 0)
+		return err;
+
+	if (err > 0) {
+		reftable_iterator_destroy(&mi->stack[idx]);
+		reftable_record_destroy(&rec);
+		return 0;
+	}
+
+	merged_iter_pqueue_add(&mi->pq, e);
+	return 0;
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
+{
+	if (iterator_is_null(&mi->stack[idx]))
+		return 0;
+	return merged_iter_advance_nonnull_subiter(mi, idx);
+}
+
+static int merged_iter_next_entry(struct merged_iter *mi,
+				  struct reftable_record *rec)
+{
+	struct slice entry_key = SLICE_INIT;
+	struct pq_entry entry = { 0 };
+	int err = 0;
+
+	if (merged_iter_pqueue_is_empty(mi->pq))
+		return 1;
+
+	entry = merged_iter_pqueue_remove(&mi->pq);
+	err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0)
+		return err;
+
+	/*
+	  One can also use reftable as datacenter-local storage, where the ref
+	  database is maintained in globally consistent database (eg.
+	  CockroachDB or Spanner). In this scenario, replication delays together
+	  with compaction may cause newer tables to contain older entries. In
+	  such a deployment, the loop below must be changed to collect all
+	  entries for the same key, and return new the newest one.
+	*/
+	reftable_record_key(&entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct slice k = SLICE_INIT;
+		int err = 0, cmp = 0;
+
+		reftable_record_key(&top.rec, &k);
+
+		cmp = slice_cmp(&k, &entry_key);
+		slice_release(&k);
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		reftable_record_destroy(&top.rec);
+	}
+
+	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+	reftable_record_destroy(&entry.rec);
+	slice_release(&entry_key);
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
+{
+	while (true) {
+		int err = merged_iter_next_entry(mi, rec);
+		if (err == 0 && mi->suppress_deletions &&
+		    reftable_record_is_deletion(rec)) {
+			continue;
+		}
+
+		return err;
+	}
+}
+
+static int merged_iter_next_void(void *p, struct reftable_record *rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq))
+		return 1;
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id)
+{
+	struct reftable_merged_table *m = NULL;
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reftable_reader *r = stack[i];
+		if (r->hash_id != hash_id) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reftable_reader_min_update_index(r);
+		}
+
+		last_max = reftable_reader_max_update_index(r);
+	}
+
+	m = (struct reftable_merged_table *)reftable_calloc(
+		sizeof(struct reftable_merged_table));
+	m->stack = stack;
+	m->stack_len = n;
+	m->min = first_min;
+	m->max = last_max;
+	m->hash_id = hash_id;
+	*dest = m;
+	return 0;
+}
+
+void reftable_merged_table_close(struct reftable_merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reftable_reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = reftable_record_type(rec),
+		.hash_id = mt->hash_id,
+		.suppress_deletions = mt->suppress_deletions,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n;
+	err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	} else {
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..4e03642bdbe
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_reader **stack;
+	int stack_len;
+	uint32_t hash_id;
+	bool suppress_deletions;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	int stack_len;
+	byte typ;
+	bool suppress_deletions;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec);
+
+#endif
diff --git a/reftable/merged_test.c b/reftable/merged_test.c
new file mode 100644
index 00000000000..180ebcb1158
--- /dev/null
+++ b/reftable/merged_test.c
@@ -0,0 +1,273 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "pq.h"
+#include "reader.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_pq(void)
+{
+	char *names[54] = { 0 };
+	int N = ARRAY_SIZE(names) - 1;
+
+	struct merged_iter_pqueue pq = { 0 };
+	const char *last = NULL;
+
+	int i = 0;
+	for (i = 0; i < N; i++) {
+		char name[100];
+		snprintf(name, sizeof(name), "%02d", i);
+		names[i] = xstrdup(name);
+	}
+
+	i = 1;
+	do {
+		struct reftable_record rec =
+			reftable_new_record(BLOCK_TYPE_REF);
+		struct pq_entry e = { 0 };
+
+		reftable_record_as_ref(&rec)->ref_name = names[i];
+		e.rec = rec;
+		merged_iter_pqueue_add(&pq, e);
+		merged_iter_pqueue_check(pq);
+		i = (i * 7) % N;
+	} while (i != 1);
+
+	while (!merged_iter_pqueue_is_empty(pq)) {
+		struct pq_entry e = merged_iter_pqueue_remove(&pq);
+		struct reftable_ref_record *ref =
+			reftable_record_as_ref(&e.rec);
+
+		merged_iter_pqueue_check(pq);
+
+		if (last != NULL) {
+			assert(strcmp(last, ref->ref_name) < 0);
+		}
+		last = ref->ref_name;
+		ref->ref_name = NULL;
+		reftable_free(ref);
+	}
+
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+
+	merged_iter_pqueue_clear(&pq);
+}
+
+static void write_test_table(struct slice *buf,
+			     struct reftable_ref_record refs[], int n)
+{
+	int min = 0xffffffff;
+	int max = 0;
+	int i = 0;
+	int err;
+
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_writer *w = NULL;
+	for (i = 0; i < n; i++) {
+		uint64_t ui = refs[i].update_index;
+		if (ui > max) {
+			max = ui;
+		}
+		if (ui < min) {
+			min = ui;
+		}
+	}
+
+	w = reftable_new_writer(&slice_add_void, buf, &opts);
+	reftable_writer_set_limits(w, min, max);
+
+	for (i = 0; i < n; i++) {
+		uint64_t before = refs[i].update_index;
+		int n = reftable_writer_add_ref(w, &refs[i]);
+		assert(n == 0);
+		assert(before == refs[i].update_index);
+	}
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+
+	reftable_writer_free(w);
+}
+
+static struct reftable_merged_table *
+merged_table_from_records(struct reftable_ref_record **refs,
+			  struct reftable_block_source **source, int *sizes,
+			  struct slice *buf, int n)
+{
+	struct reftable_reader **rd = reftable_calloc(n * sizeof(*rd));
+	int i = 0;
+	struct reftable_merged_table *mt = NULL;
+	int err;
+	*source = reftable_calloc(n * sizeof(**source));
+	for (i = 0; i < n; i++) {
+		write_test_table(&buf[i], refs[i], sizes[i]);
+		block_source_from_slice(&(*source)[i], &buf[i]);
+
+		err = reftable_new_reader(&rd[i], &(*source)[i], "name");
+		assert_err(err);
+	}
+
+	err = reftable_new_merged_table(&mt, rd, n, SHA1_ID);
+	assert_err(err);
+	return mt;
+}
+
+static void test_merged_between(void)
+{
+	byte hash1[SHA1_SIZE] = { 1, 2, 3, 0 };
+
+	struct reftable_ref_record r1[] = { {
+		.ref_name = "b",
+		.update_index = 1,
+		.value = hash1,
+	} };
+	struct reftable_ref_record r2[] = { {
+		.ref_name = "a",
+		.update_index = 2,
+	} };
+
+	struct reftable_ref_record *refs[] = { r1, r2 };
+	int sizes[] = { 1, 1 };
+	struct slice bufs[2] = { SLICE_INIT, SLICE_INIT };
+	struct reftable_block_source *bs = NULL;
+	struct reftable_merged_table *mt =
+		merged_table_from_records(refs, &bs, sizes, bufs, 2);
+	int i;
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_iterator it = { 0 };
+	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert_err(err);
+	assert(ref.update_index == 2);
+	reftable_ref_record_clear(&ref);
+
+	reftable_iterator_destroy(&it);
+	reftable_merged_table_close(mt);
+	reftable_merged_table_free(mt);
+	for (i = 0; i < ARRAY_SIZE(bufs); i++) {
+		slice_release(&bufs[i]);
+	}
+	reftable_free(bs);
+}
+
+static void test_merged(void)
+{
+	byte hash1[SHA1_SIZE] = { 1 };
+	byte hash2[SHA1_SIZE] = { 2 };
+	struct reftable_ref_record r1[] = { {
+						    .ref_name = "a",
+						    .update_index = 1,
+						    .value = hash1,
+					    },
+					    {
+						    .ref_name = "b",
+						    .update_index = 1,
+						    .value = hash1,
+					    },
+					    {
+						    .ref_name = "c",
+						    .update_index = 1,
+						    .value = hash1,
+					    } };
+	struct reftable_ref_record r2[] = { {
+		.ref_name = "a",
+		.update_index = 2,
+	} };
+	struct reftable_ref_record r3[] = {
+		{
+			.ref_name = "c",
+			.update_index = 3,
+			.value = hash2,
+		},
+		{
+			.ref_name = "d",
+			.update_index = 3,
+			.value = hash1,
+		},
+	};
+
+	struct reftable_ref_record want[] = {
+		r2[0],
+		r1[1],
+		r3[0],
+		r3[1],
+	};
+
+	struct reftable_ref_record *refs[] = { r1, r2, r3 };
+	int sizes[3] = { 3, 1, 2 };
+	struct slice bufs[3] = { SLICE_INIT, SLICE_INIT, SLICE_INIT };
+	struct reftable_block_source *bs = NULL;
+
+	struct reftable_merged_table *mt =
+		merged_table_from_records(refs, &bs, sizes, bufs, 3);
+
+	struct reftable_iterator it = { 0 };
+	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	struct reftable_ref_record *out = NULL;
+	int len = 0;
+	int cap = 0;
+	int i = 0;
+
+	assert_err(err);
+	while (len < 100) { /* cap loops/recursion. */
+		struct reftable_ref_record ref = { 0 };
+		int err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			out = reftable_realloc(
+				out, sizeof(struct reftable_ref_record) * cap);
+		}
+		out[len++] = ref;
+	}
+	reftable_iterator_destroy(&it);
+
+	assert(ARRAY_SIZE(want) == len);
+	for (i = 0; i < len; i++) {
+		assert(reftable_ref_record_equal(&want[i], &out[i], SHA1_SIZE));
+	}
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&out[i]);
+	}
+	reftable_free(out);
+
+	for (i = 0; i < 3; i++) {
+		slice_release(&bufs[i]);
+	}
+	reftable_merged_table_close(mt);
+	reftable_merged_table_free(mt);
+	reftable_free(bs);
+}
+
+/* XXX test refs_for(oid) */
+
+int merged_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_merged_between", &test_merged_between);
+	add_test_case("test_pq", &test_pq);
+	add_test_case("test_merged", &test_merged);
+	return test_main(argc, argv);
+}
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..3910d497113
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,113 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "system.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct slice ak = SLICE_INIT;
+	struct slice bk = SLICE_INIT;
+	int cmp = 0;
+	reftable_record_key(&a.rec, &ak);
+	reftable_record_key(&b.rec, &bk);
+
+	cmp = slice_cmp(&ak, &bk);
+
+	slice_release(&ak);
+	slice_release(&bk);
+
+	if (cmp == 0)
+		return a.index > b.index;
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		reftable_record_destroy(&pq->heap[i].rec);
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..43c0974f162
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct reftable_record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..a35189bf534
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,742 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source *source)
+{
+	return source->ops->size(source->arg);
+}
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source->ops->read_block(source->arg, dest, off, size);
+	dest->source = *source;
+	return result;
+}
+
+void reftable_block_done(struct reftable_block *blockp)
+{
+	struct reftable_block_source source = blockp->source;
+	if (blockp != NULL && source.ops != NULL)
+		source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size)
+		return 0;
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(&r->source, dest, off, sz);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
+{
+	byte *f = footer;
+	byte first_block_typ;
+	int err = 0;
+	uint32_t computed_crc;
+	uint32_t file_crc;
+
+	if (memcmp(f, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = REFTABLE_FORMAT_ERROR;
+			goto done;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	computed_crc = crc32(0, footer, f - footer);
+	file_crc = get_be32(f);
+	f += 4;
+	if (computed_crc != file_crc) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	first_block_typ = header[header_size(r->version)];
+	r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+	r->ref_offsets.offset = 0;
+	r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+				  r->log_offsets.offset > 0);
+	r->obj_offsets.present = r->obj_offsets.offset > 0;
+	err = 0;
+done:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = *source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+done:
+	reftable_block_done(&footer);
+	reftable_block_done(&header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	byte typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+#define TABLE_ITER_INIT                         \
+	{                                       \
+		.bi = {.last_key = SLICE_INIT } \
+	}
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti,
+				    struct reftable_record *rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && reftable_record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec->data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reftable_block_done(&ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (reftable_is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	byte block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size)
+		return 1;
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0)
+		return err;
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0)
+		return block_size;
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reftable_block_done(&block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reftable_block_done(&block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0)
+		return err;
+	else {
+		struct block_reader *brp =
+			reftable_malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
+{
+	if (reftable_record_type(rec) != ti->typ)
+		return REFTABLE_API_ERROR;
+
+	while (true) {
+		struct table_iter next = TABLE_ITER_INIT;
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct reftable_record *rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off, byte typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0)
+		return err;
+
+	brp = reftable_malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			byte typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct reftable_record *want)
+{
+	struct reftable_record rec =
+		reftable_new_record(reftable_record_type(want));
+	struct slice want_key = SLICE_INIT;
+	struct slice got_key = SLICE_INIT;
+	struct table_iter next = TABLE_ITER_INIT;
+	int err = -1;
+
+	reftable_record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0)
+			goto done;
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0)
+			goto done;
+
+		if (slice_cmp(&got_key, &want_key) > 0) {
+			table_iter_block_done(&next);
+			break;
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, &want_key);
+	if (err < 0)
+		goto done;
+	err = 0;
+
+done:
+	block_iter_close(&next.bi);
+	reftable_record_destroy(&rec);
+	slice_release(&want_key);
+	slice_release(&got_key);
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it,
+			       struct reftable_record *rec)
+{
+	struct reftable_index_record want_index = { .last_key = SLICE_INIT };
+	struct reftable_record want_index_rec = { 0 };
+	struct reftable_index_record index_result = { .last_key = SLICE_INIT };
+	struct reftable_record index_result_rec = { 0 };
+	struct table_iter index_iter = TABLE_ITER_INIT;
+	struct table_iter next = TABLE_ITER_INIT;
+	int err = 0;
+
+	reftable_record_key(rec, &want_index.last_key);
+	reftable_record_from_index(&want_index_rec, &want_index);
+	reftable_record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, reftable_record_type(rec), true);
+	if (err < 0)
+		goto done;
+
+	err = reader_seek_linear(r, &index_iter, &want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, &index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0)
+			goto done;
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0)
+			goto done;
+
+		err = block_iter_seek(&next.bi, &want_index.last_key);
+		if (err < 0)
+			goto done;
+
+		if (next.typ == reftable_record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = REFTABLE_FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter empty = TABLE_ITER_INIT;
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		*malloced = empty;
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+done:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	reftable_record_clear(&want_index_rec);
+	reftable_record_clear(&index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it,
+				struct reftable_record *rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, reftable_record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = TABLE_ITER_INIT;
+	int err = 0;
+	if (idx > 0)
+		return reader_seek_indexed(r, it, rec);
+
+	err = reader_start(r, &ti, reftable_record_type(rec), false);
+	if (err < 0)
+		return err;
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0)
+		return err;
+	else {
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec)
+{
+	byte typ = reftable_record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source *src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		block_source_close(src);
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    byte *oid)
+{
+	struct reftable_obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct reftable_record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct reftable_obj_record got = { 0 };
+	struct reftable_record got_rec = { 0 };
+	int err = 0;
+	struct indexed_table_ref_iter *itr = NULL;
+
+	/* Look through the reverse index. */
+	reftable_record_from_obj(&want_rec, &want);
+	err = reader_seek(r, &oit, &want_rec);
+	if (err != 0)
+		goto done;
+
+	/* read out the reftable_obj_record */
+	reftable_record_from_obj(&got_rec, &got);
+	err = iterator_next(&oit, &got_rec);
+	if (err < 0)
+		goto done;
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		/* didn't find it; return empty iterator */
+		iterator_set_empty(it);
+		err = 0;
+		goto done;
+	}
+
+	err = new_indexed_table_ref_iter(&itr, r, oid, hash_size(r->hash_id),
+					 got.offsets, got.offset_len);
+	if (err < 0)
+		goto done;
+	got.offsets = NULL;
+	iterator_from_indexed_table_ref_iter(it, itr);
+
+done:
+	reftable_iterator_destroy(&oit);
+	reftable_record_clear(&got_rec);
+	return err;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      byte *oid)
+{
+	struct table_iter ti_empty = TABLE_ITER_INIT;
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	struct filtering_ref_iterator empty = FILTERING_REF_ITERATOR_INIT;
+	int oid_len = hash_size(r->hash_id);
+	int err;
+
+	*ti = ti_empty;
+	err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_malloc(sizeof(struct filtering_ref_iterator));
+	*filter = empty;
+
+	slice_add(&filter->oid, oid, oid_len);
+	reftable_table_from_reader(&filter->tab, r);
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, byte *oid)
+{
+	if (r->obj_offsets.present)
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	return reftable_reader_refs_for_unindexed(r, it, oid);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..cc87aa49ea5
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,65 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source *source);
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_close(struct reftable_block_source *source);
+
+/* metadata for a block type */
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+/* The state for reading a reftable file. */
+struct reftable_reader {
+	/* for convience, associate a name with the instance. */
+	char *name;
+	struct reftable_block_source source;
+
+	/* Size of the file, excluding the footer. */
+	uint64_t size;
+
+	/* 'sha1' for SHA1, 's256' for SHA-256 */
+	uint32_t hash_id;
+
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	/* Length of the OID keys in the 'o' section */
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+
+/* initialize a block reader to read from `r` */
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, byte want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..78d4d30cbb9
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1118 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "reftable.h"
+
+int get_var_int(uint64_t *dest, struct slice *in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in->len == 0)
+		return -1;
+	val = in->buf[ptr] & 0x7f;
+
+	while (in->buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in->len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in->buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct slice *dest, uint64_t val)
+{
+	byte buf[10] = { 0 };
+	int i = 9;
+	int n = 0;
+	buf[i] = (byte)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (byte)(val & 0x7f);
+		i--;
+	}
+
+	n = sizeof(buf) - i - 1;
+	if (dest->len < n)
+		return -1;
+	memcpy(dest->buf, &buf[i + 1], n);
+	return n;
+}
+
+int reftable_is_block_type(byte typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct slice *dest, struct slice in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, &in);
+	if (n <= 0)
+		return -1;
+	slice_consume(&in, n);
+	if (in.len < tsize)
+		return -1;
+
+	slice_reset(dest);
+	slice_add(dest, in.buf, tsize);
+	slice_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct slice s)
+{
+	struct slice start = s;
+	int l = strlen(str);
+	int n = put_var_int(&s, l);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+	if (s.len < l)
+		return -1;
+	memcpy(s.buf, str, l);
+	slice_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int reftable_encode_key(bool *restart, struct slice dest, struct slice prev_key,
+			struct slice key, byte extra)
+{
+	struct slice start = dest;
+	int prefix_len = common_prefix_size(&prev_key, &key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(&dest, (uint64_t)prefix_len);
+	if (n < 0)
+		return -1;
+	slice_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(&dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0)
+		return -1;
+	slice_consume(&dest, n);
+
+	if (dest.len < suffix_len)
+		return -1;
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	slice_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int reftable_decode_key(struct slice *key, byte *extra, struct slice last_key,
+			struct slice in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, &in);
+	if (n < 0)
+		return -1;
+	slice_consume(&in, n);
+
+	if (prefix_len > last_key.len)
+		return -1;
+
+	n = get_var_int(&suffix_len, &in);
+	if (n <= 0)
+		return -1;
+	slice_consume(&in, n);
+
+	*extra = (byte)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len)
+		return -1;
+
+	slice_reset(key);
+	slice_add(key, last_key.buf, prefix_len);
+	slice_add(key, in.buf, suffix_len);
+	slice_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static void reftable_ref_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	slice_reset(dest);
+	slice_addstr(dest, rec->ref_name);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = xstrdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9)
+		return '0' + c;
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, byte *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size(hash_id));
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size(hash_id));
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->ref_name);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static byte reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL)
+		return 3;
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct slice start = s;
+	int n = put_var_int(&s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		slice_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct slice start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, &in);
+	if (n < 0)
+		return n;
+	assert(hash_size > 0);
+
+	slice_consume(&in, n);
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		slice_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct slice dest = SLICE_INIT;
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&in, n);
+		seen_target = true;
+		if (r->target != NULL) {
+			reftable_free(r->target);
+		}
+		r->target = (char *)slice_as_string(&dest);
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+static bool reftable_ref_record_is_deletion_void(const void *p)
+{
+	return reftable_ref_record_is_deletion(
+		(const struct reftable_ref_record *)p);
+}
+
+struct reftable_record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = BLOCK_TYPE_REF,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+	.is_deletion = &reftable_ref_record_is_deletion_void,
+};
+
+static void reftable_obj_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_obj_record *rec =
+		(const struct reftable_obj_record *)r;
+	slice_reset(dest);
+	slice_add(dest, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void reftable_obj_record_clear(void *rec)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	FREE_AND_NULL(obj->hash_prefix);
+	FREE_AND_NULL(obj->offsets);
+	memset(obj, 0, sizeof(struct reftable_obj_record));
+}
+
+static void reftable_obj_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	const struct reftable_obj_record *src =
+		(const struct reftable_obj_record *)src_rec;
+	int olen;
+
+	reftable_obj_record_clear(obj);
+	*obj = *src;
+	obj->hash_prefix = reftable_malloc(obj->hash_prefix_len);
+	memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len);
+
+	olen = obj->offset_len * sizeof(uint64_t);
+	obj->offsets = reftable_malloc(olen);
+	memcpy(obj->offsets, src->offsets, olen);
+}
+
+static byte reftable_obj_record_val_type(const void *rec)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8)
+		return r->offset_len;
+	return 0;
+}
+
+static int reftable_obj_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	struct slice start = s;
+	int i = 0;
+	int n = 0;
+	uint64_t last = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(&s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+	}
+	if (r->offset_len == 0)
+		return start.len - s.len;
+	n = put_var_int(&s, r->offsets[0]);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	last = r->offsets[0];
+	for (i = 1; i < r->offset_len; i++) {
+		int n = put_var_int(&s, r->offsets[i] - last);
+		if (n < 0) {
+			return -1;
+		}
+		slice_consume(&s, n);
+		last = r->offsets[i];
+	}
+	return start.len - s.len;
+}
+
+static int reftable_obj_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	uint64_t last;
+	int j;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, &in);
+		if (n < 0) {
+			return n;
+		}
+
+		slice_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0)
+		return start.len - in.len;
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], &in);
+	if (n < 0)
+		return n;
+	slice_consume(&in, n);
+
+	last = r->offsets[0];
+	j = 1;
+	while (j < count) {
+		uint64_t delta = 0;
+		int n = get_var_int(&delta, &in);
+		if (n < 0) {
+			return n;
+		}
+		slice_consume(&in, n);
+
+		last = r->offsets[j] = (delta + last);
+		j++;
+	}
+	return start.len - in.len;
+}
+
+static bool not_a_deletion(const void *p)
+{
+	return false;
+}
+
+struct reftable_record_vtable reftable_obj_record_vtable = {
+	.key = &reftable_obj_record_key,
+	.type = BLOCK_TYPE_OBJ,
+	.copy_from = &reftable_obj_record_copy_from,
+	.val_type = &reftable_obj_record_val_type,
+	.encode = &reftable_obj_record_encode,
+	.decode = &reftable_obj_record_decode,
+	.clear = &reftable_obj_record_clear,
+	.is_deletion = not_a_deletion,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size(hash_id));
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size(hash_id));
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static void reftable_log_record_key(const void *r, struct slice *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->ref_name);
+	byte i64[8];
+	uint64_t ts = 0;
+	slice_reset(dest);
+	slice_add(dest, (byte *)rec->ref_name, len + 1);
+
+	ts = (~ts) - rec->update_index;
+	put_be64(&i64[0], ts);
+	slice_add(dest, i64, sizeof(i64));
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	reftable_log_record_clear(dst);
+	*dst = *src;
+	if (dst->ref_name != NULL) {
+		dst->ref_name = xstrdup(dst->ref_name);
+	}
+	if (dst->email != NULL) {
+		dst->email = xstrdup(dst->email);
+	}
+	if (dst->name != NULL) {
+		dst->name = xstrdup(dst->name);
+	}
+	if (dst->message != NULL) {
+		dst->message = xstrdup(dst->message);
+	}
+
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->ref_name);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static byte reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static byte zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct slice s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct slice start = s;
+	int n = 0;
+	byte *oldh = r->old_hash;
+	byte *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r))
+		return 0;
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size)
+		return -1;
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	slice_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	n = put_var_int(&s, r->time);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	if (s.len < 2)
+		return -1;
+
+	put_be16(s.buf, r->tz_offset);
+	slice_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0)
+		return -1;
+	slice_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct slice key,
+				      byte val_type, struct slice in,
+				      int hash_size)
+{
+	struct slice start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct slice dest = SLICE_INIT;
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0)
+		return REFTABLE_FORMAT_ERROR;
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		FREE_AND_NULL(r->old_hash);
+		FREE_AND_NULL(r->new_hash);
+		FREE_AND_NULL(r->message);
+		FREE_AND_NULL(r->email);
+		FREE_AND_NULL(r->name);
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size)
+		return REFTABLE_FORMAT_ERROR;
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	slice_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	slice_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	slice_reset(&dest);
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	slice_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, &in);
+	if (n < 0)
+		goto done;
+	slice_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2)
+		goto done;
+
+	r->tz_offset = get_be16(in.buf);
+	slice_consume(&in, 2);
+
+	slice_reset(&dest);
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	slice_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	slice_release(&dest);
+	return start.len - in.len;
+
+done:
+	slice_release(&dest);
+	return REFTABLE_FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL)
+		a = empty;
+
+	if (b == NULL)
+		b = empty;
+
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(byte *a, byte *b, int sz)
+{
+	if (a == NULL)
+		a = zero;
+
+	if (b == NULL)
+		b = zero;
+
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+static bool reftable_log_record_is_deletion_void(const void *p)
+{
+	return reftable_log_record_is_deletion(
+		(const struct reftable_log_record *)p);
+}
+
+struct reftable_record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = BLOCK_TYPE_LOG,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+	.is_deletion = &reftable_log_record_is_deletion_void,
+};
+
+struct reftable_record reftable_new_record(byte typ)
+{
+	struct reftable_record rec = { NULL };
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		reftable_record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct reftable_obj_record *r =
+			reftable_calloc(sizeof(struct reftable_obj_record));
+		reftable_record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		reftable_record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct reftable_index_record empty = { .last_key = SLICE_INIT };
+		struct reftable_index_record *r =
+			reftable_calloc(sizeof(struct reftable_index_record));
+		*r = empty;
+		reftable_record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+void *reftable_record_yield(struct reftable_record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+void reftable_record_destroy(struct reftable_record *rec)
+{
+	reftable_record_clear(rec);
+	reftable_free(reftable_record_yield(rec));
+}
+
+static void reftable_index_record_key(const void *r, struct slice *dest)
+{
+	struct reftable_index_record *rec = (struct reftable_index_record *)r;
+	slice_reset(dest);
+	slice_addbuf(dest, &rec->last_key);
+}
+
+static void reftable_index_record_copy_from(void *rec, const void *src_rec,
+					    int hash_size)
+{
+	struct reftable_index_record *dst = (struct reftable_index_record *)rec;
+	struct reftable_index_record *src =
+		(struct reftable_index_record *)src_rec;
+
+	slice_reset(&dst->last_key);
+	slice_addbuf(&dst->last_key, &src->last_key);
+	dst->offset = src->offset;
+}
+
+static void reftable_index_record_clear(void *rec)
+{
+	struct reftable_index_record *idx = (struct reftable_index_record *)rec;
+	slice_release(&idx->last_key);
+}
+
+static byte reftable_index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int reftable_index_record_encode(const void *rec, struct slice out,
+					int hash_size)
+{
+	const struct reftable_index_record *r =
+		(const struct reftable_index_record *)rec;
+	struct slice start = out;
+
+	int n = put_var_int(&out, r->offset);
+	if (n < 0)
+		return n;
+
+	slice_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int reftable_index_record_decode(void *rec, struct slice key,
+					byte val_type, struct slice in,
+					int hash_size)
+{
+	struct slice start = in;
+	struct reftable_index_record *r = (struct reftable_index_record *)rec;
+	int n = 0;
+
+	slice_reset(&r->last_key);
+	slice_addbuf(&r->last_key, &key);
+
+	n = get_var_int(&r->offset, &in);
+	if (n < 0)
+		return n;
+
+	slice_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct reftable_record_vtable reftable_index_record_vtable = {
+	.key = &reftable_index_record_key,
+	.type = BLOCK_TYPE_INDEX,
+	.copy_from = &reftable_index_record_copy_from,
+	.val_type = &reftable_index_record_val_type,
+	.encode = &reftable_index_record_encode,
+	.decode = &reftable_index_record_decode,
+	.clear = &reftable_index_record_clear,
+	.is_deletion = &not_a_deletion,
+};
+
+void reftable_record_key(struct reftable_record *rec, struct slice *dest)
+{
+	rec->ops->key(rec->data, dest);
+}
+
+byte reftable_record_type(struct reftable_record *rec)
+{
+	return rec->ops->type;
+}
+
+int reftable_record_encode(struct reftable_record *rec, struct slice dest,
+			   int hash_size)
+{
+	return rec->ops->encode(rec->data, dest, hash_size);
+}
+
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size)
+{
+	assert(src->ops->type == rec->ops->type);
+
+	rec->ops->copy_from(rec->data, src->data, hash_size);
+}
+
+byte reftable_record_val_type(struct reftable_record *rec)
+{
+	return rec->ops->val_type(rec->data);
+}
+
+int reftable_record_decode(struct reftable_record *rec, struct slice key,
+			   byte extra, struct slice src, int hash_size)
+{
+	return rec->ops->decode(rec->data, key, extra, src, hash_size);
+}
+
+void reftable_record_clear(struct reftable_record *rec)
+{
+	rec->ops->clear(rec->data);
+}
+
+bool reftable_record_is_deletion(struct reftable_record *rec)
+{
+	return rec->ops->is_deletion(rec->data);
+}
+
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *ref_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *obj_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = obj_rec;
+	rec->ops = &reftable_obj_record_vtable;
+}
+
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *index_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = index_rec;
+	rec->ops = &reftable_index_record_vtable;
+}
+
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *log_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec->data;
+}
+
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_LOG);
+	return (struct reftable_log_record *)rec->data;
+}
+
+static bool hash_equal(byte *a, byte *b, int hash_size)
+{
+	if (a != NULL && b != NULL)
+		return !memcmp(a, b, hash_size);
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL)
+		return 0 == strcmp(a, b);
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->ref_name,
+		      ((struct reftable_ref_record *)b)->ref_name);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp)
+		return cmp;
+	if (la->update_index > lb->update_index)
+		return -1;
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..f9cd6cc3ce2
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,128 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "slice.h"
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct slice *in);
+int put_var_int(struct slice *dest, uint64_t val);
+
+/* Methods for records. */
+struct reftable_record_vtable {
+	/* encode the key of to a byte slice. */
+	void (*key)(const void *rec, struct slice *dest);
+
+	/* The record type of ('r' for ref). */
+	byte type;
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	byte (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct slice dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
+		      int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+
+	/* is this a tombstone? */
+	bool (*is_deletion)(const void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct reftable_record {
+	void *data;
+	struct reftable_record_vtable *ops;
+};
+
+/* returns true for recognized block types. Block start with the block type. */
+int reftable_is_block_type(byte typ);
+
+/* creates a malloced record of the given type. Dispose with record_destroy */
+struct reftable_record reftable_new_record(byte typ);
+
+extern struct reftable_record_vtable reftable_ref_record_vtable;
+
+/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
+   number of bytes written. */
+int reftable_encode_key(bool *restart, struct slice dest, struct slice prev_key,
+			struct slice key, byte extra);
+
+/* Decode into `key` and `extra` from `in` */
+int reftable_decode_key(struct slice *key, byte *extra, struct slice last_key,
+			struct slice in);
+
+/* reftable_index_record are used internally to speed up lookups. */
+struct reftable_index_record {
+	uint64_t offset; /* Offset of block */
+	struct slice last_key; /* Last key of the block. */
+};
+
+/* reftable_obj_record stores an object ID => ref mapping. */
+struct reftable_obj_record {
+	byte *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void reftable_record_key(struct reftable_record *rec, struct slice *dest);
+byte reftable_record_type(struct reftable_record *rec);
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size);
+byte reftable_record_val_type(struct reftable_record *rec);
+int reftable_record_encode(struct reftable_record *rec, struct slice dest,
+			   int hash_size);
+int reftable_record_decode(struct reftable_record *rec, struct slice key,
+			   byte extra, struct slice src, int hash_size);
+bool reftable_record_is_deletion(struct reftable_record *rec);
+
+/* zeroes out the embedded record */
+void reftable_record_clear(struct reftable_record *rec);
+
+/* clear out the record, yielding the reftable_record data that was
+ * encapsulated. */
+void *reftable_record_yield(struct reftable_record *rec);
+
+/* clear and deallocate embedded record, and zero `rec`. */
+void reftable_record_destroy(struct reftable_record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *objrec);
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *idxrec);
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *refrec);
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *logrec);
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *ref);
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/record_test.c b/reftable/record_test.c
new file mode 100644
index 00000000000..8e988f5c00a
--- /dev/null
+++ b/reftable/record_test.c
@@ -0,0 +1,403 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include "system.h"
+#include "basics.h"
+#include "constants.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_copy(struct reftable_record *rec)
+{
+	struct reftable_record copy =
+		reftable_new_record(reftable_record_type(rec));
+	reftable_record_copy_from(&copy, rec, SHA1_SIZE);
+	/* do it twice to catch memory leaks */
+	reftable_record_copy_from(&copy, rec, SHA1_SIZE);
+	switch (reftable_record_type(&copy)) {
+	case BLOCK_TYPE_REF:
+		assert(reftable_ref_record_equal(reftable_record_as_ref(&copy),
+						 reftable_record_as_ref(rec),
+						 SHA1_SIZE));
+		break;
+	case BLOCK_TYPE_LOG:
+		assert(reftable_log_record_equal(reftable_record_as_log(&copy),
+						 reftable_record_as_log(rec),
+						 SHA1_SIZE));
+		break;
+	}
+	reftable_record_destroy(&copy);
+}
+
+static void test_varint_roundtrip(void)
+{
+	uint64_t inputs[] = { 0,
+			      1,
+			      27,
+			      127,
+			      128,
+			      257,
+			      4096,
+			      ((uint64_t)1 << 63),
+			      ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1 };
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(inputs); i++) {
+		byte dest[10];
+
+		struct slice out = { .buf = dest, .len = 10, .cap = 10 };
+
+		uint64_t in = inputs[i];
+		int n = put_var_int(&out, in);
+		uint64_t got = 0;
+
+		assert(n > 0);
+		out.len = n;
+		n = get_var_int(&got, &out);
+		assert(n > 0);
+
+		assert(got == in);
+	}
+}
+
+static void test_common_prefix(void)
+{
+	struct {
+		const char *a, *b;
+		int want;
+	} cases[] = {
+		{ "abc", "ab", 2 },
+		{ "", "abc", 0 },
+		{ "abc", "abd", 2 },
+		{ "abc", "pqr", 0 },
+	};
+
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		struct slice a = SLICE_INIT;
+		struct slice b = SLICE_INIT;
+		slice_addstr(&a, cases[i].a);
+		slice_addstr(&b, cases[i].b);
+		assert(common_prefix_size(&a, &b) == cases[i].want);
+
+		slice_release(&a);
+		slice_release(&b);
+	}
+}
+
+static void set_hash(byte *h, int j)
+{
+	int i = 0;
+	for (i = 0; i < hash_size(SHA1_ID); i++) {
+		h[i] = (j >> i) & 0xff;
+	}
+}
+
+static void test_reftable_ref_record_roundtrip(void)
+{
+	int i = 0;
+
+	for (i = 0; i <= 3; i++) {
+		struct reftable_ref_record in = { 0 };
+		struct reftable_ref_record out = {
+			.ref_name = xstrdup("old name"),
+			.value = reftable_calloc(SHA1_SIZE),
+			.target_value = reftable_calloc(SHA1_SIZE),
+			.target = xstrdup("old value"),
+		};
+		struct reftable_record rec_out = { 0 };
+		struct slice key = SLICE_INIT;
+		struct reftable_record rec = { 0 };
+		struct slice dest = SLICE_INIT;
+		int n, m;
+
+		switch (i) {
+		case 0:
+			break;
+		case 1:
+			in.value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.value, 1);
+			break;
+		case 2:
+			in.value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.value, 1);
+			in.target_value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.target_value, 2);
+			break;
+		case 3:
+			in.target = xstrdup("target");
+			break;
+		}
+		in.ref_name = xstrdup("refs/heads/master");
+
+		reftable_record_from_ref(&rec, &in);
+		test_copy(&rec);
+
+		assert(reftable_record_val_type(&rec) == i);
+
+		reftable_record_key(&rec, &key);
+		slice_grow(&dest, 1024);
+		slice_setlen(&dest, 1024);
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n > 0);
+
+		/* decode into a non-zero reftable_record to test for leaks. */
+
+		reftable_record_from_ref(&rec_out, &out);
+		m = reftable_record_decode(&rec_out, key, i, dest, SHA1_SIZE);
+		assert(n == m);
+
+		assert((out.value != NULL) == (in.value != NULL));
+		assert((out.target_value != NULL) == (in.target_value != NULL));
+		assert((out.target != NULL) == (in.target != NULL));
+		reftable_record_clear(&rec_out);
+
+		slice_release(&key);
+		slice_release(&dest);
+		reftable_ref_record_clear(&in);
+	}
+}
+
+static void test_reftable_log_record_equal(void)
+{
+	struct reftable_log_record in[2] = {
+		{
+			.ref_name = xstrdup("refs/heads/master"),
+			.update_index = 42,
+		},
+		{
+			.ref_name = xstrdup("refs/heads/master"),
+			.update_index = 22,
+		}
+	};
+
+	assert(!reftable_log_record_equal(&in[0], &in[1], SHA1_SIZE));
+	in[1].update_index = in[0].update_index;
+	assert(reftable_log_record_equal(&in[0], &in[1], SHA1_SIZE));
+	reftable_log_record_clear(&in[0]);
+	reftable_log_record_clear(&in[1]);
+}
+
+static void test_reftable_log_record_roundtrip(void)
+{
+	struct reftable_log_record in[2] = {
+		{
+			.ref_name = xstrdup("refs/heads/master"),
+			.old_hash = reftable_malloc(SHA1_SIZE),
+			.new_hash = reftable_malloc(SHA1_SIZE),
+			.name = xstrdup("han-wen"),
+			.email = xstrdup("hanwen@google.com"),
+			.message = xstrdup("test"),
+			.update_index = 42,
+			.time = 1577123507,
+			.tz_offset = 100,
+		},
+		{
+			.ref_name = xstrdup("refs/heads/master"),
+			.update_index = 22,
+		}
+	};
+	set_test_hash(in[0].new_hash, 1);
+	set_test_hash(in[0].old_hash, 2);
+	for (int i = 0; i < ARRAY_SIZE(in); i++) {
+		struct reftable_record rec = { 0 };
+		struct slice key = SLICE_INIT;
+		struct slice dest = SLICE_INIT;
+		/* populate out, to check for leaks. */
+		struct reftable_log_record out = {
+			.ref_name = xstrdup("old name"),
+			.new_hash = reftable_calloc(SHA1_SIZE),
+			.old_hash = reftable_calloc(SHA1_SIZE),
+			.name = xstrdup("old name"),
+			.email = xstrdup("old@email"),
+			.message = xstrdup("old message"),
+		};
+		struct reftable_record rec_out = { 0 };
+		int n, m, valtype;
+
+		reftable_record_from_log(&rec, &in[i]);
+
+		test_copy(&rec);
+
+		reftable_record_key(&rec, &key);
+
+		slice_grow(&dest, 1024);
+		slice_setlen(&dest, 1024);
+
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n >= 0);
+		reftable_record_from_log(&rec_out, &out);
+		valtype = reftable_record_val_type(&rec);
+		m = reftable_record_decode(&rec_out, key, valtype, dest,
+					   SHA1_SIZE);
+		assert(n == m);
+
+		assert(reftable_log_record_equal(&in[i], &out, SHA1_SIZE));
+		reftable_log_record_clear(&in[i]);
+		slice_release(&key);
+		slice_release(&dest);
+		reftable_record_clear(&rec_out);
+	}
+}
+
+static void test_u24_roundtrip(void)
+{
+	uint32_t in = 0x112233;
+	byte dest[3];
+	uint32_t out;
+	put_be24(dest, in);
+	out = get_be24(dest);
+	assert(in == out);
+}
+
+static void test_key_roundtrip(void)
+{
+	struct slice dest = SLICE_INIT;
+	struct slice last_key = SLICE_INIT;
+	struct slice key = SLICE_INIT;
+	struct slice roundtrip = SLICE_INIT;
+	bool restart;
+	byte extra;
+	int n, m;
+	byte rt_extra;
+
+	slice_grow(&dest, 1024);
+	slice_setlen(&dest, 1024);
+	slice_addstr(&last_key, "refs/heads/master");
+	slice_addstr(&key, "refs/tags/bla");
+	extra = 6;
+	n = reftable_encode_key(&restart, dest, last_key, key, extra);
+	assert(!restart);
+	assert(n > 0);
+
+	m = reftable_decode_key(&roundtrip, &rt_extra, last_key, dest);
+	assert(n == m);
+	assert(0 == slice_cmp(&key, &roundtrip));
+	assert(rt_extra == extra);
+
+	slice_release(&last_key);
+	slice_release(&key);
+	slice_release(&dest);
+	slice_release(&roundtrip);
+}
+
+static void test_reftable_obj_record_roundtrip(void)
+{
+	byte testHash1[SHA1_SIZE] = { 1, 2, 3, 4, 0 };
+	uint64_t till9[] = { 1, 2, 3, 4, 500, 600, 700, 800, 9000 };
+	struct reftable_obj_record recs[3] = { {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+						       .offsets = till9,
+						       .offset_len = 3,
+					       },
+					       {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+						       .offsets = till9,
+						       .offset_len = 9,
+					       },
+					       {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+					       } };
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(recs); i++) {
+		struct reftable_obj_record in = recs[i];
+		struct slice dest = SLICE_INIT;
+		struct reftable_record rec = { 0 };
+		struct slice key = SLICE_INIT;
+		struct reftable_obj_record out = { 0 };
+		struct reftable_record rec_out = { 0 };
+		int n, m;
+		byte extra;
+
+		reftable_record_from_obj(&rec, &in);
+		test_copy(&rec);
+		reftable_record_key(&rec, &key);
+		slice_grow(&dest, 1024);
+		slice_setlen(&dest, 1024);
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n > 0);
+		extra = reftable_record_val_type(&rec);
+		reftable_record_from_obj(&rec_out, &out);
+		m = reftable_record_decode(&rec_out, key, extra, dest,
+					   SHA1_SIZE);
+		assert(n == m);
+
+		assert(in.hash_prefix_len == out.hash_prefix_len);
+		assert(in.offset_len == out.offset_len);
+
+		assert(!memcmp(in.hash_prefix, out.hash_prefix,
+			       in.hash_prefix_len));
+		assert(0 == memcmp(in.offsets, out.offsets,
+				   sizeof(uint64_t) * in.offset_len));
+		slice_release(&key);
+		slice_release(&dest);
+		reftable_record_clear(&rec_out);
+	}
+}
+
+static void test_reftable_index_record_roundtrip(void)
+{
+	struct reftable_index_record in = {
+		.offset = 42,
+		.last_key = SLICE_INIT,
+	};
+	struct slice dest = SLICE_INIT;
+	struct slice key = SLICE_INIT;
+	struct reftable_record rec = { 0 };
+	struct reftable_index_record out = { .last_key = SLICE_INIT };
+	struct reftable_record out_rec = { NULL };
+	int n, m;
+	byte extra;
+
+	slice_addstr(&in.last_key, "refs/heads/master");
+	reftable_record_from_index(&rec, &in);
+	reftable_record_key(&rec, &key);
+	test_copy(&rec);
+
+	assert(0 == slice_cmp(&key, &in.last_key));
+	slice_grow(&dest, 1024);
+	slice_setlen(&dest, 1024);
+	n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+	assert(n > 0);
+
+	extra = reftable_record_val_type(&rec);
+	reftable_record_from_index(&out_rec, &out);
+	m = reftable_record_decode(&out_rec, key, extra, dest, SHA1_SIZE);
+	assert(m == n);
+
+	assert(in.offset == out.offset);
+
+	reftable_record_clear(&out_rec);
+	slice_release(&key);
+	slice_release(&in.last_key);
+	slice_release(&dest);
+}
+
+int record_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_reftable_log_record_equal",
+		      &test_reftable_log_record_equal);
+	add_test_case("test_reftable_log_record_roundtrip",
+		      &test_reftable_log_record_roundtrip);
+	add_test_case("test_reftable_ref_record_roundtrip",
+		      &test_reftable_ref_record_roundtrip);
+	add_test_case("test_varint_roundtrip", &test_varint_roundtrip);
+	add_test_case("test_key_roundtrip", &test_key_roundtrip);
+	add_test_case("test_common_prefix", &test_common_prefix);
+	add_test_case("test_reftable_obj_record_roundtrip",
+		      &test_reftable_obj_record_roundtrip);
+	add_test_case("test_reftable_index_record_roundtrip",
+		      &test_reftable_index_record_roundtrip);
+	add_test_case("test_u24_roundtrip", &test_u24_roundtrip);
+	return test_main(argc, argv);
+}
diff --git a/reftable/refname.c b/reftable/refname.c
new file mode 100644
index 00000000000..0f1615d5385
--- /dev/null
+++ b/reftable/refname.c
@@ -0,0 +1,211 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+#include "refname.h"
+#include "slice.h"
+
+struct find_arg {
+	char **names;
+	const char *want;
+};
+
+static int find_name(size_t k, void *arg)
+{
+	struct find_arg *f_arg = (struct find_arg *)arg;
+	return strcmp(f_arg->names[k], f_arg->want) >= 0;
+}
+
+int modification_has_ref(struct modification *mod, const char *name)
+{
+	struct reftable_ref_record ref = { 0 };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = name,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
+			return 0;
+		}
+	}
+
+	if (mod->del_len > 0) {
+		struct find_arg arg = {
+			.names = mod->del,
+			.want = name,
+		};
+		int idx = binsearch(mod->del_len, find_name, &arg);
+		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
+			return 1;
+		}
+	}
+
+	err = reftable_table_read_ref(&mod->tab, name, &ref);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static void modification_clear(struct modification *mod)
+{
+	/* don't delete the strings themselves; they're owned by ref records.
+	 */
+	FREE_AND_NULL(mod->add);
+	FREE_AND_NULL(mod->del);
+	mod->add_len = 0;
+	mod->del_len = 0;
+}
+
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix)
+{
+	struct reftable_iterator it = { NULL };
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = prefix,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len &&
+		    !strncmp(prefix, mod->add[idx], strlen(prefix)))
+			goto done;
+	}
+	err = reftable_table_seek_ref(&mod->tab, &it, prefix);
+	if (err)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err)
+			goto done;
+
+		if (mod->del_len > 0) {
+			struct find_arg arg = {
+				.names = mod->del,
+				.want = ref.ref_name,
+			};
+			int idx = binsearch(mod->del_len, find_name, &arg);
+			if (idx < mod->del_len &&
+			    !strcmp(ref.ref_name, mod->del[idx])) {
+				continue;
+			}
+		}
+
+		if (strncmp(ref.ref_name, prefix, strlen(prefix))) {
+			err = 1;
+			goto done;
+		}
+		err = 0;
+		goto done;
+	}
+
+done:
+	reftable_ref_record_clear(&ref);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int validate_ref_name(const char *name)
+{
+	while (true) {
+		char *next = strchr(name, '/');
+		if (!*name) {
+			return REFTABLE_REFNAME_ERROR;
+		}
+		if (!next) {
+			return 0;
+		}
+		if (next - name == 0 || (next - name == 1 && *name == '.') ||
+		    (next - name == 2 && name[0] == '.' && name[1] == '.'))
+			return REFTABLE_REFNAME_ERROR;
+		name = next + 1;
+	}
+	return 0;
+}
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz)
+{
+	struct modification mod = {
+		.tab = tab,
+		.add = reftable_calloc(sizeof(char *) * sz),
+		.del = reftable_calloc(sizeof(char *) * sz),
+	};
+	int i = 0;
+	int err = 0;
+	for (; i < sz; i++) {
+		if (reftable_ref_record_is_deletion(&recs[i])) {
+			mod.del[mod.del_len++] = recs[i].ref_name;
+		} else {
+			mod.add[mod.add_len++] = recs[i].ref_name;
+		}
+	}
+
+	err = modification_validate(&mod);
+	modification_clear(&mod);
+	return err;
+}
+
+static void slice_trim_component(struct slice *sl)
+{
+	while (sl->len > 0) {
+		bool is_slash = (sl->buf[sl->len - 1] == '/');
+		slice_setlen(sl, sl->len - 1);
+		if (is_slash)
+			break;
+	}
+}
+
+int modification_validate(struct modification *mod)
+{
+	struct slice slashed = SLICE_INIT;
+	int err = 0;
+	int i = 0;
+	for (; i < mod->add_len; i++) {
+		err = validate_ref_name(mod->add[i]);
+		if (err)
+			goto done;
+		slice_reset(&slashed);
+		slice_addstr(&slashed, mod->add[i]);
+		slice_addstr(&slashed, "/");
+
+		err = modification_has_ref_with_prefix(
+			mod, slice_as_string(&slashed));
+		if (err == 0) {
+			err = REFTABLE_NAME_CONFLICT;
+			goto done;
+		}
+		if (err < 0)
+			goto done;
+
+		slice_reset(&slashed);
+		slice_addstr(&slashed, mod->add[i]);
+		while (slashed.len) {
+			slice_trim_component(&slashed);
+			err = modification_has_ref(mod,
+						   slice_as_string(&slashed));
+			if (err == 0) {
+				err = REFTABLE_NAME_CONFLICT;
+				goto done;
+			}
+			if (err < 0)
+				goto done;
+		}
+	}
+	err = 0;
+done:
+	slice_release(&slashed);
+	return err;
+}
diff --git a/reftable/refname.h b/reftable/refname.h
new file mode 100644
index 00000000000..f335287fcdb
--- /dev/null
+++ b/reftable/refname.h
@@ -0,0 +1,38 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+#ifndef REFNAME_H
+#define REFNAME_H
+
+#include "reftable.h"
+
+struct modification {
+	struct reftable_table tab;
+
+	char **add;
+	size_t add_len;
+
+	char **del;
+	size_t del_len;
+};
+
+// -1 = error, 0 = found, 1 = not found
+int modification_has_ref(struct modification *mod, const char *name);
+
+// -1 = error, 0 = found, 1 = not found.
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix);
+
+// 0 = OK.
+int validate_ref_name(const char *name);
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz);
+
+int modification_validate(struct modification *mod);
+
+#endif
diff --git a/reftable/refname_test.c b/reftable/refname_test.c
new file mode 100644
index 00000000000..c8a91ccaa18
--- /dev/null
+++ b/reftable/refname_test.c
@@ -0,0 +1,99 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "record.h"
+#include "refname.h"
+#include "system.h"
+
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+struct testcase {
+	char *add;
+	char *del;
+	int error_code;
+};
+
+static void test_conflict(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct slice buf = SLICE_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, &buf, &opts);
+	struct reftable_ref_record rec = {
+		.ref_name = "a/b",
+		.target = "destination", /* make sure it's not a symref. */
+		.update_index = 1,
+	};
+	int err;
+	int i;
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct testcase cases[] = {
+		{ "a/b/c", NULL, REFTABLE_NAME_CONFLICT },
+		{ "b", NULL, 0 },
+		{ "a", NULL, REFTABLE_NAME_CONFLICT },
+		{ "a", "a/b", 0 },
+
+		{ "p/", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p//q", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p/./q", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p/../q", NULL, REFTABLE_REFNAME_ERROR },
+
+		{ "a/b/c", "a/b", 0 },
+		{ NULL, "a//b", 0 },
+	};
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_add_ref(w, &rec);
+	assert_err(err);
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+
+	block_source_from_slice(&source, &buf);
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	reftable_table_from_reader(&tab, rd);
+
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		struct modification mod = {
+			.tab = tab,
+		};
+
+		if (cases[i].add != NULL) {
+			mod.add = &cases[i].add;
+			mod.add_len = 1;
+		}
+		if (cases[i].del != NULL) {
+			mod.del = &cases[i].del;
+			mod.del_len = 1;
+		}
+
+		err = modification_validate(&mod);
+		assert(err == cases[i].error_code);
+	}
+
+	reftable_reader_free(rd);
+	slice_release(&buf);
+}
+
+int refname_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_conflict", &test_conflict);
+	return test_main(argc, argv);
+}
diff --git a/reftable/reftable-tests.h b/reftable/reftable-tests.h
new file mode 100644
index 00000000000..e5955e2cecf
--- /dev/null
+++ b/reftable/reftable-tests.h
@@ -0,0 +1,22 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_TESTS_H
+#define REFTABLE_TESTS_H
+
+int block_test_main(int argc, const char **argv);
+int merged_test_main(int argc, const char **argv);
+int record_test_main(int argc, const char **argv);
+int refname_test_main(int argc, const char **argv);
+int reftable_test_main(int argc, const char **argv);
+int slice_test_main(int argc, const char **argv);
+int stack_test_main(int argc, const char **argv);
+int tree_test_main(int argc, const char **argv);
+int reftable_dump_main(int argc, char *const *argv);
+
+#endif
diff --git a/reftable/reftable.c b/reftable/reftable.c
new file mode 100644
index 00000000000..6e7d1da10e7
--- /dev/null
+++ b/reftable/reftable.c
@@ -0,0 +1,90 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+#include "record.h"
+#include "reader.h"
+#include "merged.h"
+
+struct reftable_table_vtable {
+	int (*seek)(void *tab, struct reftable_iterator *it,
+		    struct reftable_record *);
+};
+
+static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
+				     struct reftable_record *rec)
+{
+	return reader_seek((struct reftable_reader *)tab, it, rec);
+}
+
+static struct reftable_table_vtable reader_vtable = {
+	.seek = reftable_reader_seek_void,
+};
+
+static int reftable_merged_table_seek_void(void *tab,
+					   struct reftable_iterator *it,
+					   struct reftable_record *rec)
+{
+	return merged_table_seek_record((struct reftable_merged_table *)tab, it,
+					rec);
+}
+
+static struct reftable_table_vtable merged_table_vtable = {
+	.seek = reftable_merged_table_seek_void,
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return tab->ops->seek(tab->table_arg, it, &rec);
+}
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &reader_vtable;
+	tab->table_arg = reader;
+}
+
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *merged)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &merged_table_vtable;
+	tab->table_arg = merged;
+}
+
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	int err = reftable_table_seek_ref(tab, &it, name);
+	if (err)
+		goto done;
+
+	err = reftable_iterator_next_ref(&it, ref);
+	if (err)
+		goto done;
+
+	if (strcmp(ref->ref_name, name) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		reftable_ref_record_clear(ref);
+		err = 1;
+		goto done;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..3038dbe9564
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,571 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *ref_name;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	REFTABLE_IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	REFTABLE_FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	REFTABLE_NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	REFTABLE_LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL ref_name.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on writing a log record with multiline message with
+	   exact_log_message unset
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	REFTABLE_API_ERROR = -6,
+
+	/* Decompression error */
+	REFTABLE_ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	REFTABLE_EMPTY_TABLE_ERROR = -8,
+
+	/* Dir/file conflict. */
+	REFTABLE_NAME_CONFLICT = -9,
+
+	/* Illegal ref name. */
+	REFTABLE_REFNAME_ERROR = -10,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/*
+ * Convert the numeric error code to an equivalent errno code.
+ */
+int reftable_error_to_errno(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+
+	/* boolean: do not check ref names for validity or dir/file conflicts.
+	 */
+	int skip_name_check;
+
+	/* boolean: copy log messages exactly. If unset, check that the message
+	 *   is a single line, and add '\n' if missing.
+	 */
+	int exact_log_message;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, uint8_t *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, uint8_t *data, size_t size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp. The name is used for creating a stack. Typically, it is the
+ * basename of the file. The block source `src` is owned by the reader, and is
+ * closed on calling reftable_reader_destroy().
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source *src, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, &src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(&it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to `oid`. */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* closes readers for the merged tables */
+void reftable_merged_table_close(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/****************************************************************
+ Generic tables
+
+ A unified API for reading tables, either merged tables, or single readers.
+ ****************************************************************/
+
+struct reftable_table {
+	struct reftable_table_vtable *ops;
+	void *table_arg;
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name);
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader);
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *table);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction, and deallocate the
+   transaction. Releases the lock if held. */
+void reftable_addition_destroy(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* Reloads the stack if necessary. This is very cheap to run if the stack was up
+ * to date */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	uint64_t entries_written; /* total number of entries written, including
+				     failures. */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/reftable_test.c b/reftable/reftable_test.c
new file mode 100644
index 00000000000..f7ce66a00e9
--- /dev/null
+++ b/reftable/reftable_test.c
@@ -0,0 +1,631 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "record.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static const int update_index = 5;
+
+static void test_buffer(void)
+{
+	struct slice buf = SLICE_INIT;
+	struct reftable_block_source source = { NULL };
+	struct reftable_block out = { 0 };
+	int n;
+	byte in[] = "hello";
+	slice_add(&buf, in, sizeof(in));
+	block_source_from_slice(&source, &buf);
+	assert(block_source_size(&source) == 6);
+	n = block_source_read_block(&source, &out, 0, sizeof(in));
+	assert(n == sizeof(in));
+	assert(!memcmp(in, out.data, n));
+	reftable_block_done(&out);
+
+	n = block_source_read_block(&source, &out, 1, 2);
+	assert(n == 2);
+	assert(!memcmp(out.data, "el", 2));
+
+	reftable_block_done(&out);
+	block_source_close(&source);
+	slice_release(&buf);
+}
+
+static void test_default_write_opts(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct slice buf = SLICE_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, &buf, &opts);
+
+	struct reftable_ref_record rec = {
+		.ref_name = "master",
+		.update_index = 1,
+	};
+	int err;
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader **readers = malloc(sizeof(*readers) * 1);
+	uint32_t hash_id;
+	struct reftable_reader *rd = NULL;
+	struct reftable_merged_table *merged = NULL;
+
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_add_ref(w, &rec);
+	assert_err(err);
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+
+	block_source_from_slice(&source, &buf);
+
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	hash_id = reftable_reader_hash_id(rd);
+	assert(hash_id == SHA1_ID);
+
+	readers[0] = rd;
+
+	err = reftable_new_merged_table(&merged, readers, 1, SHA1_ID);
+	assert_err(err);
+
+	reftable_merged_table_close(merged);
+	reftable_merged_table_free(merged);
+	slice_release(&buf);
+}
+
+static void write_table(char ***names, struct slice *buf, int N, int block_size,
+			uint32_t hash_id)
+{
+	struct reftable_write_options opts = {
+		.block_size = block_size,
+		.hash_id = hash_id,
+	};
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, buf, &opts);
+	struct reftable_ref_record ref = { 0 };
+	int i = 0, n;
+	struct reftable_log_record log = { 0 };
+	const struct reftable_stats *stats = NULL;
+	*names = reftable_calloc(sizeof(char *) * (N + 1));
+	reftable_writer_set_limits(w, update_index, update_index);
+	for (i = 0; i < N; i++) {
+		byte hash[SHA256_SIZE] = { 0 };
+		char name[100];
+		int n;
+
+		set_test_hash(hash, i);
+
+		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
+
+		ref.ref_name = name;
+		ref.value = hash;
+		ref.update_index = update_index;
+		(*names)[i] = xstrdup(name);
+
+		n = reftable_writer_add_ref(w, &ref);
+		assert(n == 0);
+	}
+
+	for (i = 0; i < N; i++) {
+		byte hash[SHA256_SIZE] = { 0 };
+		char name[100];
+		int n;
+
+		set_test_hash(hash, i);
+
+		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
+
+		log.ref_name = name;
+		log.new_hash = hash;
+		log.update_index = update_index;
+		log.message = "message";
+
+		n = reftable_writer_add_log(w, &log);
+		assert(n == 0);
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	stats = writer_stats(w);
+	for (i = 0; i < stats->ref_stats.blocks; i++) {
+		int off = i * opts.block_size;
+		if (off == 0) {
+			off = header_size((hash_id == SHA256_ID) ? 2 : 1);
+		}
+		assert(buf->buf[off] == 'r');
+	}
+
+	assert(stats->log_stats.blocks > 0);
+	reftable_writer_free(w);
+}
+
+static void test_log_buffer_size(void)
+{
+	struct slice buf = SLICE_INIT;
+	struct reftable_write_options opts = {
+		.block_size = 4096,
+	};
+	int err;
+	struct reftable_log_record log = {
+		.ref_name = "refs/heads/master",
+		.name = "Han-Wen Nienhuys",
+		.email = "hanwen@google.com",
+		.tz_offset = 100,
+		.time = 0x5e430672,
+		.update_index = 0xa,
+		.message = "commit: 9\n",
+	};
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, &buf, &opts);
+
+	/* This tests buffer extension for log compression. Must use a random
+	   hash, to ensure that the compressed part is larger than the original.
+	*/
+	byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
+	for (int i = 0; i < SHA1_SIZE; i++) {
+		hash1[i] = (byte)(rand() % 256);
+		hash2[i] = (byte)(rand() % 256);
+	}
+	log.old_hash = hash1;
+	log.new_hash = hash2;
+	reftable_writer_set_limits(w, update_index, update_index);
+	err = reftable_writer_add_log(w, &log);
+	assert_err(err);
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+	slice_release(&buf);
+}
+
+static void test_log_write_read(void)
+{
+	int N = 2;
+	char **names = reftable_calloc(sizeof(char *) * (N + 1));
+	int err;
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_ref_record ref = { 0 };
+	int i = 0;
+	struct reftable_log_record log = { 0 };
+	int n;
+	struct reftable_iterator it = { 0 };
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	struct slice buf = SLICE_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, &buf, &opts);
+	const struct reftable_stats *stats = NULL;
+	reftable_writer_set_limits(w, 0, N);
+	for (i = 0; i < N; i++) {
+		char name[256];
+		struct reftable_ref_record ref = { 0 };
+		snprintf(name, sizeof(name), "b%02d%0*d", i, 130, 7);
+		names[i] = xstrdup(name);
+		puts(name);
+		ref.ref_name = name;
+		ref.update_index = i;
+
+		err = reftable_writer_add_ref(w, &ref);
+		assert_err(err);
+	}
+	for (i = 0; i < N; i++) {
+		byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
+		struct reftable_log_record log = { 0 };
+		set_test_hash(hash1, i);
+		set_test_hash(hash2, i + 1);
+
+		log.ref_name = names[i];
+		log.update_index = i;
+		log.old_hash = hash1;
+		log.new_hash = hash2;
+
+		err = reftable_writer_add_log(w, &log);
+		assert_err(err);
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	stats = writer_stats(w);
+	assert(stats->log_stats.blocks > 0);
+	reftable_writer_free(w);
+	w = NULL;
+
+	block_source_from_slice(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.log");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, names[N - 1]);
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert_err(err);
+
+	/* end of iteration. */
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert(0 < err);
+
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_reader_seek_log(&rd, &it, "");
+	assert_err(err);
+
+	i = 0;
+	while (true) {
+		int err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+
+		assert_err(err);
+		assert_streq(names[i], log.ref_name);
+		assert(i == log.update_index);
+		i++;
+		reftable_log_record_clear(&log);
+	}
+
+	assert(i == N);
+	reftable_iterator_destroy(&it);
+
+	/* cleanup. */
+	slice_release(&buf);
+	free_names(names);
+	reader_close(&rd);
+}
+
+static void test_table_read_write_sequential(void)
+{
+	char **names;
+	struct slice buf = SLICE_INIT;
+	int N = 50;
+	struct reftable_iterator it = { 0 };
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader rd = { 0 };
+	int err = 0;
+	int j = 0;
+
+	write_table(&names, &buf, N, 256, SHA1_ID);
+
+	block_source_from_slice(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, "");
+	assert_err(err);
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		int r = reftable_iterator_next_ref(&it, &ref);
+		assert(r >= 0);
+		if (r > 0) {
+			break;
+		}
+		assert(0 == strcmp(names[j], ref.ref_name));
+		assert(update_index == ref.update_index);
+
+		j++;
+		reftable_ref_record_clear(&ref);
+	}
+	assert(j == N);
+	reftable_iterator_destroy(&it);
+	slice_release(&buf);
+	free_names(names);
+
+	reader_close(&rd);
+}
+
+static void test_table_write_small_table(void)
+{
+	char **names;
+	struct slice buf = SLICE_INIT;
+	int N = 1;
+	write_table(&names, &buf, N, 4096, SHA1_ID);
+	assert(buf.len < 200);
+	slice_release(&buf);
+	free_names(names);
+}
+
+static void test_table_read_api(void)
+{
+	char **names;
+	struct slice buf = SLICE_INIT;
+	int N = 50;
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	int err;
+	int i;
+	struct reftable_log_record log = { 0 };
+	struct reftable_iterator it = { 0 };
+
+	write_table(&names, &buf, N, 256, SHA1_ID);
+
+	block_source_from_slice(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, names[0]);
+	assert_err(err);
+
+	err = reftable_iterator_next_log(&it, &log);
+	assert(err == REFTABLE_API_ERROR);
+
+	slice_release(&buf);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_free(names);
+	reader_close(&rd);
+	slice_release(&buf);
+}
+
+static void test_table_read_write_seek(bool index, int hash_id)
+{
+	char **names;
+	struct slice buf = SLICE_INIT;
+	int N = 50;
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	int err;
+	int i = 0;
+
+	struct reftable_iterator it = { 0 };
+	struct slice pastLast = SLICE_INIT;
+	struct reftable_ref_record ref = { 0 };
+
+	write_table(&names, &buf, N, 256, hash_id);
+
+	block_source_from_slice(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+	assert(hash_id == reftable_reader_hash_id(&rd));
+
+	if (!index) {
+		rd.ref_offsets.index_offset = 0;
+	} else {
+		assert(rd.ref_offsets.index_offset > 0);
+	}
+
+	for (i = 1; i < N; i++) {
+		int err = reftable_reader_seek_ref(&rd, &it, names[i]);
+		assert_err(err);
+		err = reftable_iterator_next_ref(&it, &ref);
+		assert_err(err);
+		assert(0 == strcmp(names[i], ref.ref_name));
+		assert(i == ref.value[0]);
+
+		reftable_ref_record_clear(&ref);
+		reftable_iterator_destroy(&it);
+	}
+
+	slice_addstr(&pastLast, names[N - 1]);
+	slice_addstr(&pastLast, "/");
+
+	err = reftable_reader_seek_ref(&rd, &it, slice_as_string(&pastLast));
+	if (err == 0) {
+		struct reftable_ref_record ref = { 0 };
+		int err = reftable_iterator_next_ref(&it, &ref);
+		assert(err > 0);
+	} else {
+		assert(err > 0);
+	}
+
+	slice_release(&pastLast);
+	reftable_iterator_destroy(&it);
+
+	slice_release(&buf);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+	reftable_free(names);
+	reader_close(&rd);
+}
+
+static void test_table_read_write_seek_linear(void)
+{
+	test_table_read_write_seek(false, SHA1_ID);
+}
+
+static void test_table_read_write_seek_linear_sha256(void)
+{
+	test_table_read_write_seek(false, SHA256_ID);
+}
+
+static void test_table_read_write_seek_index(void)
+{
+	test_table_read_write_seek(true, SHA1_ID);
+}
+
+static void test_table_refs_for(bool indexed)
+{
+	int N = 50;
+	char **want_names = reftable_calloc(sizeof(char *) * (N + 1));
+	int want_names_len = 0;
+	byte want_hash[SHA1_SIZE];
+
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_ref_record ref = { 0 };
+	int i = 0;
+	int n;
+	int err;
+	struct reftable_reader rd;
+	struct reftable_block_source source = { 0 };
+
+	struct slice buf = SLICE_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, &buf, &opts);
+
+	struct reftable_iterator it = { 0 };
+	int j;
+
+	set_test_hash(want_hash, 4);
+
+	for (i = 0; i < N; i++) {
+		byte hash[SHA1_SIZE];
+		char fill[51] = { 0 };
+		char name[100];
+		byte hash1[SHA1_SIZE];
+		byte hash2[SHA1_SIZE];
+		struct reftable_ref_record ref = { 0 };
+
+		memset(hash, i, sizeof(hash));
+		memset(fill, 'x', 50);
+		/* Put the variable part in the start */
+		snprintf(name, sizeof(name), "br%02d%s", i, fill);
+		name[40] = 0;
+		ref.ref_name = name;
+
+		set_test_hash(hash1, i / 4);
+		set_test_hash(hash2, 3 + i / 4);
+		ref.value = hash1;
+		ref.target_value = hash2;
+
+		/* 80 bytes / entry, so 3 entries per block. Yields 17
+		 */
+		/* blocks. */
+		n = reftable_writer_add_ref(w, &ref);
+		assert(n == 0);
+
+		if (!memcmp(hash1, want_hash, SHA1_SIZE) ||
+		    !memcmp(hash2, want_hash, SHA1_SIZE)) {
+			want_names[want_names_len++] = xstrdup(name);
+		}
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	reftable_writer_free(w);
+	w = NULL;
+
+	block_source_from_slice(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+	if (!indexed) {
+		rd.obj_offsets.present = 0;
+	}
+
+	err = reftable_reader_seek_ref(&rd, &it, "");
+	assert_err(err);
+	reftable_iterator_destroy(&it);
+
+	err = reftable_reader_refs_for(&rd, &it, want_hash);
+	assert_err(err);
+
+	j = 0;
+	while (true) {
+		int err = reftable_iterator_next_ref(&it, &ref);
+		assert(err >= 0);
+		if (err > 0) {
+			break;
+		}
+
+		assert(j < want_names_len);
+		assert(0 == strcmp(ref.ref_name, want_names[j]));
+		j++;
+		reftable_ref_record_clear(&ref);
+	}
+	assert(j == want_names_len);
+
+	slice_release(&buf);
+	free_names(want_names);
+	reftable_iterator_destroy(&it);
+	reader_close(&rd);
+}
+
+static void test_table_refs_for_no_index(void)
+{
+	test_table_refs_for(false);
+}
+
+static void test_table_refs_for_obj_index(void)
+{
+	test_table_refs_for(true);
+}
+
+static void test_table_empty(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct slice buf = SLICE_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&slice_add_void, &buf, &opts);
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_ref_record rec = { 0 };
+	struct reftable_iterator it = { 0 };
+	int err;
+
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_close(w);
+	assert(err == REFTABLE_EMPTY_TABLE_ERROR);
+	reftable_writer_free(w);
+
+	assert(buf.len == header_size(1) + footer_size(1));
+
+	block_source_from_slice(&source, &buf);
+
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &rec);
+	assert(err > 0);
+
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	slice_release(&buf);
+}
+
+int reftable_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_default_write_opts", test_default_write_opts);
+	add_test_case("test_log_write_read", test_log_write_read);
+	add_test_case("test_table_read_write_seek_linear_sha256",
+		      &test_table_read_write_seek_linear_sha256);
+	add_test_case("test_log_buffer_size", test_log_buffer_size);
+	add_test_case("test_table_write_small_table",
+		      &test_table_write_small_table);
+	add_test_case("test_buffer", &test_buffer);
+	add_test_case("test_table_read_api", &test_table_read_api);
+	add_test_case("test_table_read_write_sequential",
+		      &test_table_read_write_sequential);
+	add_test_case("test_table_read_write_seek_linear",
+		      &test_table_read_write_seek_linear);
+	add_test_case("test_table_read_write_seek_index",
+		      &test_table_read_write_seek_index);
+	add_test_case("test_table_read_write_refs_for_no_index",
+		      &test_table_refs_for_no_index);
+	add_test_case("test_table_read_write_refs_for_obj_index",
+		      &test_table_refs_for_obj_index);
+	add_test_case("test_table_empty", &test_table_empty);
+	return test_main(argc, argv);
+}
diff --git a/reftable/slice.c b/reftable/slice.c
new file mode 100644
index 00000000000..2b0140359f4
--- /dev/null
+++ b/reftable/slice.c
@@ -0,0 +1,217 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "reftable.h"
+
+struct slice reftable_empty_slice = SLICE_INIT;
+
+void slice_init(struct slice *s)
+{
+	struct slice empty = SLICE_INIT;
+	*s = empty;
+}
+
+void slice_grow(struct slice *s, size_t extra)
+{
+	size_t newcap = s->len + extra + 1;
+	if (newcap > s->cap) {
+		s->buf = reftable_realloc(s->buf, newcap);
+		s->cap = newcap;
+	}
+}
+
+static void slice_resize(struct slice *s, int l)
+{
+	int zl = l + 1; /* one byte for 0 termination. */
+	assert(s->canary == SLICE_CANARY);
+	if (s->cap < zl) {
+		int c = s->cap * 2;
+		if (c < zl) {
+			c = zl;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+	s->buf[l] = 0;
+}
+
+void slice_setlen(struct slice *s, size_t l)
+{
+	assert(s->cap >= l + 1);
+	s->len = l;
+	s->buf[l] = 0;
+}
+
+void slice_reset(struct slice *s)
+{
+	slice_resize(s, 0);
+}
+
+void slice_addstr(struct slice *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+	assert(d->canary == SLICE_CANARY);
+
+	slice_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void slice_addbuf(struct slice *s, struct slice *a)
+{
+	int end = s->len;
+	assert(s->canary == SLICE_CANARY);
+	slice_resize(s, s->len + a->len);
+	memcpy(s->buf + end, a->buf, a->len);
+}
+
+void slice_consume(struct slice *s, int n)
+{
+	assert(s->canary == SLICE_CANARY);
+	s->buf += n;
+	s->len -= n;
+}
+
+char *slice_detach(struct slice *s)
+{
+	char *p = NULL;
+	slice_as_string(s);
+	p = (char *)s->buf;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void slice_release(struct slice *s)
+{
+	byte *ptr = s->buf;
+	assert(s->canary == SLICE_CANARY);
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	reftable_free(ptr);
+}
+
+/* return the underlying data as char*. len is left unchanged, but
+   a \0 is added at the end. */
+const char *slice_as_string(struct slice *s)
+{
+	return (const char *)s->buf;
+}
+
+int slice_cmp(const struct slice *a, const struct slice *b)
+{
+	int min = a->len < b->len ? a->len : b->len;
+	int res = memcmp(a->buf, b->buf, min);
+	assert(a->canary == SLICE_CANARY);
+	assert(b->canary == SLICE_CANARY);
+	if (res != 0)
+		return res;
+	if (a->len < b->len)
+		return -1;
+	else if (a->len > b->len)
+		return 1;
+	else
+		return 0;
+}
+
+int slice_add(struct slice *b, const byte *data, size_t sz)
+{
+	assert(b->canary == SLICE_CANARY);
+	slice_grow(b, sz);
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	b->buf[b->len] = 0;
+	return sz;
+}
+
+int slice_add_void(void *b, byte *data, size_t sz)
+{
+	return slice_add((struct slice *)b, data, sz);
+}
+
+static uint64_t slice_size(void *b)
+{
+	return ((struct slice *)b)->len;
+}
+
+static void slice_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void slice_close(void *b)
+{
+}
+
+static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	struct slice *b = (struct slice *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable slice_vtable = {
+	.size = &slice_size,
+	.read_block = &slice_read_block,
+	.return_block = &slice_return_block,
+	.close = &slice_close,
+};
+
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf)
+{
+	assert(bs->ops == NULL);
+	bs->ops = &slice_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct slice *a, struct slice *b)
+{
+	int p = 0;
+	assert(a->canary == SLICE_CANARY);
+	assert(b->canary == SLICE_CANARY);
+	while (p < a->len && p < b->len) {
+		if (a->buf[p] != b->buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
diff --git a/reftable/slice.h b/reftable/slice.h
new file mode 100644
index 00000000000..b3f910fa007
--- /dev/null
+++ b/reftable/slice.h
@@ -0,0 +1,85 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#include "basics.h"
+#include "reftable.h"
+
+/*
+  Provides a bounds-checked, growable byte ranges. To use, initialize as "slice
+  x = SLICE_INIT;"
+ */
+struct slice {
+	int len;
+	int cap;
+	byte *buf;
+
+	/* Used to enforce initialization with SLICE_INIT */
+	byte canary;
+};
+#define SLICE_CANARY 0x42
+#define SLICE_INIT                       \
+	{                                \
+		0, 0, NULL, SLICE_CANARY \
+	}
+extern struct slice reftable_empty_slice;
+
+void slice_addstr(struct slice *dest, const char *src);
+
+/* Deallocate and clear slice */
+void slice_release(struct slice *slice);
+
+/* Set slice to 0 length, but retain buffer. */
+void slice_reset(struct slice *slice);
+
+/* Initializes a slice. Accepts a slice with random garbage. */
+void slice_init(struct slice *slice);
+
+/* Ensure that `buf` is \0 terminated. */
+const char *slice_as_string(struct slice *src);
+
+/* Return `buf`, clearing out `s` */
+char *slice_detach(struct slice *s);
+
+/* Set length of the slace to `l`, but don't reallocated. */
+void slice_setlen(struct slice *s, size_t l);
+
+/* Ensure `l` bytes beyond current length are available */
+void slice_grow(struct slice *s, size_t l);
+
+/* Signed comparison */
+int slice_cmp(const struct slice *a, const struct slice *b);
+
+/* Append `data` to the `dest` slice.  */
+int slice_add(struct slice *dest, const byte *data, size_t sz);
+
+/* Append `add` to `dest. */
+void slice_addbuf(struct slice *dest, struct slice *add);
+
+/* Like slice_add, but suitable for passing to reftable_new_writer
+ */
+int slice_add_void(void *b, byte *data, size_t sz);
+
+/* Find the longest shared prefix size of `a` and `b` */
+int common_prefix_size(struct slice *a, struct slice *b);
+
+struct reftable_block_source;
+
+/* Create an in-memory block source for reading reftables */
+void block_source_from_slice(struct reftable_block_source *bs,
+			     struct slice *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+/* Advance `buf` by `n`, and decrease length. A copy of the slice
+   should be kept for deallocating the slice. */
+void slice_consume(struct slice *s, int n);
+
+#endif
diff --git a/reftable/slice_test.c b/reftable/slice_test.c
new file mode 100644
index 00000000000..1f16f2f5090
--- /dev/null
+++ b/reftable/slice_test.c
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "slice.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_slice(void)
+{
+	struct slice s = SLICE_INIT;
+	struct slice t = SLICE_INIT;
+
+	slice_addstr(&s, "abc");
+	assert(0 == strcmp("abc", slice_as_string(&s)));
+
+	slice_addstr(&t, "pqr");
+	slice_addbuf(&s, &t);
+	assert(0 == strcmp("abcpqr", slice_as_string(&s)));
+
+	slice_release(&s);
+	slice_release(&t);
+}
+
+int slice_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_slice", &test_slice);
+	return test_main(argc, argv);
+}
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..43155af81fa
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1214 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "refname.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct slice list_file_name = SLICE_INIT;
+	int err = 0;
+
+	if (config.hash_id == 0) {
+		config.hash_id = SHA1_ID;
+	}
+
+	*dest = NULL;
+
+	slice_reset(&list_file_name);
+	slice_addstr(&list_file_name, dir);
+	slice_addstr(&list_file_name, "/tables.list");
+
+	p->list_file = slice_detach(&list_file_name);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload_maybe_reuse(p, true);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fd_read_lines(int fd, char ***namesp)
+{
+	off_t size = lseek(fd, 0, SEEK_END);
+	char *buf = NULL;
+	int err = 0;
+	if (size < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+	err = lseek(fd, 0, SEEK_SET);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (read(fd, buf, size) != size) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+
+done:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	int fd = open(filename, O_RDONLY, 0644);
+	int err = 0;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return REFTABLE_IO_ERROR;
+	}
+	err = fd_read_lines(fd, namesp);
+	close(fd);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged != NULL) {
+		reftable_merged_table_close(st->merged);
+		reftable_merged_table_free(st->merged);
+		st->merged = NULL;
+	}
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_tables =
+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
+	int new_tables_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+	int i;
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			struct slice table_path = SLICE_INIT;
+			slice_addstr(&table_path, st->reftable_dir);
+			slice_addstr(&table_path, "/");
+			slice_addstr(&table_path, name);
+
+			err = reftable_block_source_from_file(
+				&src, slice_as_string(&table_path));
+			slice_release(&table_path);
+
+			if (err < 0)
+				goto done;
+
+			err = reftable_new_reader(&rd, &src, name);
+			if (err < 0)
+				goto done;
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
+					st->config.hash_id);
+	if (err < 0)
+		goto done;
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	new_merged->suppress_deletions = true;
+	st->merged = new_merged;
+
+	for (i = 0; i < cur_len; i++) {
+		if (cur[i] != NULL) {
+			reader_close(cur[i]);
+			reftable_reader_free(cur[i]);
+		}
+	}
+
+done:
+	for (i = 0; i < new_tables_len; i++) {
+		reader_close(new_tables[i]);
+		reftable_reader_free(new_tables[i]);
+	}
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0)
+		return diff;
+
+	return udiff;
+}
+
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0)
+		return err;
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		int err2 = 0;
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != REFTABLE_NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
+		   writer. Check if there was one by checking if the name list
+		   changed.
+		*/
+		err2 = read_lines(st->list_file, &names_after);
+		if (err2 < 0) {
+			free_names(names);
+			return err2;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return err;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 1;
+		sleep_millisec(delay);
+	}
+
+	return 0;
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0)
+		return err;
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto done;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto done;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto done;
+	}
+
+done:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	int err = stack_uptodate(st);
+	if (err > 0)
+		return reftable_stack_reload_maybe_reuse(st, true);
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == REFTABLE_LOCK_ERROR) {
+			/* Ignore error return, we want to propagate
+			   REFTABLE_LOCK_ERROR.
+			*/
+			reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	if (!st->disable_auto_compact)
+		return reftable_stack_auto_compact(st);
+
+	return 0;
+}
+
+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
+	slice_reset(dest);
+	slice_addstr(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct slice lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+#define REFTABLE_ADDITION_INIT               \
+	{                                    \
+		.lock_file_name = SLICE_INIT \
+	}
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	slice_reset(&add->lock_file_name);
+	slice_addstr(&add->lock_file_name, st->list_file);
+	slice_addstr(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = REFTABLE_LOCK_ERROR;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	err = stack_uptodate(st);
+	if (err < 0)
+		goto done;
+
+	if (err > 1) {
+		err = REFTABLE_LOCK_ERROR;
+		goto done;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+done:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct slice nm = SLICE_INIT;
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_reset(&nm);
+		slice_addstr(&nm, add->stack->list_file);
+		slice_addstr(&nm, "/");
+		slice_addstr(&nm, add->new_tables[i]);
+		unlink(slice_as_string(&nm));
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(slice_as_string(&add->lock_file_name));
+		slice_release(&add->lock_file_name);
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+	slice_release(&nm);
+}
+
+void reftable_addition_destroy(struct reftable_addition *add)
+{
+	if (add == NULL) {
+		return;
+	}
+	reftable_addition_close(add);
+	reftable_free(add);
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct slice table_list = SLICE_INIT;
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0)
+		goto done;
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		slice_addstr(&table_list, add->stack->merged->stack[i]->name);
+		slice_addstr(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		slice_addstr(&table_list, add->new_tables[i]);
+		slice_addstr(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	slice_release(&table_list);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = rename(slice_as_string(&add->lock_file_name),
+		     add->stack->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+done:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
+	*dest = reftable_calloc(sizeof(**dest));
+	**dest = empty;
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = REFTABLE_ADDITION_INIT;
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0)
+		goto done;
+	if (err > 0) {
+		err = REFTABLE_LOCK_ERROR;
+		goto done;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_addition_commit(&add);
+done:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct slice temp_tab_file_name = SLICE_INIT;
+	struct slice tab_file_name = SLICE_INIT;
+	struct slice next_name = SLICE_INIT;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+
+	slice_reset(&next_name);
+	format_name(&next_name, add->next_update_index, add->next_update_index);
+
+	slice_addstr(&temp_tab_file_name, add->stack->reftable_dir);
+	slice_addstr(&temp_tab_file_name, "/");
+	slice_addbuf(&temp_tab_file_name, &next_name);
+	slice_addstr(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
+	if (tab_fd < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_writer_close(wr);
+	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto done;
+	}
+	if (err < 0)
+		goto done;
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = stack_check_addition(add->stack,
+				   slice_as_string(&temp_tab_file_name));
+	if (err < 0)
+		goto done;
+
+	if (wr->min_update_index < add->next_update_index) {
+		err = REFTABLE_API_ERROR;
+		goto done;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	slice_addstr(&next_name, ".ref");
+
+	slice_addstr(&tab_file_name, add->stack->reftable_dir);
+	slice_addstr(&tab_file_name, "/");
+	slice_addbuf(&tab_file_name, &next_name);
+
+	/* TODO: should check destination out of paranoia */
+	err = rename(slice_as_string(&temp_tab_file_name),
+		     slice_as_string(&tab_file_name));
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = slice_detach(&next_name);
+	add->new_tables_len++;
+done:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(slice_as_string(&temp_tab_file_name));
+	}
+
+	slice_release(&temp_tab_file_name);
+	slice_release(&tab_file_name);
+	slice_release(&next_name);
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0)
+		return reftable_reader_max_update_index(
+			       st->merged->stack[sz - 1]) +
+		       1;
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct slice *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct slice next_name = SLICE_INIT;
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->merged->stack[first]),
+		    reftable_reader_max_update_index(st->merged->stack[first]));
+
+	slice_reset(temp_tab);
+	slice_addstr(temp_tab, st->reftable_dir);
+	slice_addstr(temp_tab, "/");
+	slice_addbuf(temp_tab, &next_name);
+	slice_addstr(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0)
+		goto done;
+	err = reftable_writer_close(wr);
+	if (err < 0)
+		goto done;
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+done:
+	reftable_writer_free(wr);
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(slice_as_string(temp_tab));
+		slice_release(temp_tab);
+	}
+	slice_release(&next_name);
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_reader **subtabs = reftable_calloc(
+		sizeof(struct reftable_reader *) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	uint64_t entries = 0;
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr,
+				   st->merged->stack[first]->min_update_index,
+				   st->merged->stack[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto done;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+	reftable_iterator_destroy(&it);
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+	reftable_log_record_clear(&log);
+	st->stats.entries_written += entries;
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct slice temp_tab_file_name = SLICE_INIT;
+	struct slice new_table_name = SLICE_INIT;
+	struct slice lock_file_name = SLICE_INIT;
+	struct slice ref_list_contents = SLICE_INIT;
+	struct slice new_table_path = SLICE_INIT;
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **listp = NULL;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto done;
+	}
+
+	st->stats.attempts++;
+
+	slice_reset(&lock_file_name);
+	slice_addstr(&lock_file_name, st->list_file);
+	slice_addstr(&lock_file_name, ".lock");
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	/* Don't want to write to the lock for now.  */
+	close(lock_file_fd);
+	lock_file_fd = 0;
+
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0)
+		goto done;
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct slice subtab_file_name = SLICE_INIT;
+		struct slice subtab_lock = SLICE_INIT;
+		int sublock_file_fd = -1;
+
+		slice_addstr(&subtab_file_name, st->reftable_dir);
+		slice_addstr(&subtab_file_name, "/");
+		slice_addstr(&subtab_file_name,
+			     reader_name(st->merged->stack[i]));
+
+		slice_reset(&subtab_lock);
+		slice_addbuf(&subtab_lock, &subtab_file_name);
+		slice_addstr(&subtab_lock, ".lock");
+
+		sublock_file_fd = open(slice_as_string(&subtab_lock),
+				       O_EXCL | O_CREAT | O_WRONLY, 0644);
+		if (sublock_file_fd > 0) {
+			close(sublock_file_fd);
+		} else if (sublock_file_fd < 0) {
+			if (errno == EEXIST) {
+				err = 1;
+			} else {
+				err = REFTABLE_IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
+		delete_on_success[j] =
+			(char *)slice_as_string(&subtab_file_name);
+		j++;
+
+		if (err != 0)
+			goto done;
+	}
+
+	err = unlink(slice_as_string(&lock_file_name));
+	if (err < 0)
+		goto done;
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0)
+		goto done;
+
+	lock_file_fd = open(slice_as_string(&lock_file_name),
+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	slice_addstr(&new_table_name, ".ref");
+
+	slice_reset(&new_table_path);
+	slice_addstr(&new_table_path, st->reftable_dir);
+	slice_addstr(&new_table_path, "/");
+	slice_addbuf(&new_table_path, &new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(slice_as_string(&temp_tab_file_name),
+			     slice_as_string(&new_table_path));
+		if (err < 0) {
+			err = REFTABLE_IO_ERROR;
+			goto done;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
+		slice_addstr(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		slice_addbuf(&ref_list_contents, &new_table_name);
+		slice_addstr(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
+		slice_addstr(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto done;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto done;
+	}
+
+	err = rename(slice_as_string(&lock_file_name), st->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(slice_as_string(&new_table_path));
+		goto done;
+	}
+	have_lock = false;
+
+	/* Reload the stack before deleting. On windows, we can only delete the
+	   files after we closed them.
+	*/
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+
+	listp = delete_on_success;
+	while (*listp) {
+		if (strcmp(*listp, slice_as_string(&new_table_path))) {
+			unlink(*listp);
+		}
+		listp++;
+	}
+
+done:
+	free_names(delete_on_success);
+
+	listp = subtable_locks;
+	while (*listp) {
+		unlink(*listp);
+		listp++;
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(slice_as_string(&lock_file_name));
+	}
+	slice_release(&new_table_name);
+	slice_release(&new_table_path);
+	slice_release(&ref_list_contents);
+	slice_release(&temp_tab_file_name);
+	slice_release(&lock_file_name);
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0)
+		return 0;
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+
+	if (n == 0) {
+		*seglen = 0;
+		return segs;
+	}
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
+	int overhead = header_size(version) - 1;
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		sizes[i] = st->merged->stack[i]->size - overhead;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0)
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_table tab = { NULL };
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+	return reftable_table_read_ref(&tab, refname, ref);
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err)
+		goto done;
+
+	err = reftable_iterator_next_log(&it, log);
+	if (err)
+		goto done;
+
+	if (strcmp(log->ref_name, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto done;
+	}
+
+done:
+	if (err) {
+		reftable_log_record_clear(log);
+	}
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
+{
+	int err = 0;
+	struct reftable_block_source src = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct reftable_ref_record *refs = NULL;
+	struct reftable_iterator it = { NULL };
+	int cap = 0;
+	int len = 0;
+	int i = 0;
+
+	if (st->config.skip_name_check)
+		return 0;
+
+	err = reftable_block_source_from_file(&src, new_tab_name);
+	if (err < 0)
+		goto done;
+
+	err = reftable_new_reader(&rd, &src, new_tab_name);
+	if (err < 0)
+		goto done;
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	if (err > 0) {
+		err = 0;
+		goto done;
+	}
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0)
+			goto done;
+
+		if (len >= cap) {
+			cap = 2 * cap + 1;
+			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
+		}
+
+		refs[len++] = ref;
+	}
+
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+
+	err = validate_ref_record_addition(tab, refs, len);
+
+done:
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&refs[i]);
+	}
+
+	free(refs);
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..60bca3b3e58
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,48 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+#include "system.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+	bool disable_auto_compact;
+
+	struct reftable_write_options config;
+
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
+void reftable_addition_close(struct reftable_addition *add);
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
new file mode 100644
index 00000000000..7d44a1450f0
--- /dev/null
+++ b/reftable/stack_test.c
@@ -0,0 +1,787 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+
+#include "merged.h"
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+#include <sys/types.h>
+#include <dirent.h>
+
+static void test_read_file(void)
+{
+	char fn[256] = "/tmp/stack.test_read_file.XXXXXX";
+	int fd = mkstemp(fn);
+	char out[1024] = "line1\n\nline2\nline3";
+	int n, err;
+	char **names = NULL;
+	char *want[] = { "line1", "line2", "line3" };
+	int i = 0;
+
+	assert(fd > 0);
+	n = write(fd, out, strlen(out));
+	assert(n == strlen(out));
+	err = close(fd);
+	assert(err >= 0);
+
+	err = read_lines(fn, &names);
+	assert_err(err);
+
+	for (i = 0; names[i] != NULL; i++) {
+		assert(0 == strcmp(want[i], names[i]));
+	}
+	free_names(names);
+	remove(fn);
+}
+
+static void test_parse_names(void)
+{
+	char buf[] = "line\n";
+	char **names = NULL;
+	parse_names(buf, strlen(buf), &names);
+
+	assert(NULL != names[0]);
+	assert(0 == strcmp(names[0], "line"));
+	assert(NULL == names[1]);
+	free_names(names);
+}
+
+static void test_names_equal(void)
+{
+	char *a[] = { "a", "b", "c", NULL };
+	char *b[] = { "a", "b", "d", NULL };
+	char *c[] = { "a", "b", NULL };
+
+	assert(names_equal(a, a));
+	assert(!names_equal(a, b));
+	assert(!names_equal(a, c));
+}
+
+static int write_test_ref(struct reftable_writer *wr, void *arg)
+{
+	struct reftable_ref_record *ref = arg;
+	reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
+	return reftable_writer_add_ref(wr, ref);
+}
+
+struct write_log_arg {
+	struct reftable_log_record *log;
+	uint64_t update_index;
+};
+
+static int write_test_log(struct reftable_writer *wr, void *arg)
+{
+	struct write_log_arg *wla = arg;
+
+	reftable_writer_set_limits(wr, wla->update_index, wla->update_index);
+	return reftable_writer_add_log(wr, wla->log);
+}
+
+static void test_reftable_stack_add_one(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record ref = {
+		.ref_name = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st, ref.ref_name, &dest);
+	assert_err(err);
+	assert(0 == strcmp("master", dest.target));
+
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_uptodate(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st1 = NULL;
+	struct reftable_stack *st2 = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int err;
+	struct reftable_ref_record ref1 = {
+		.ref_name = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record ref2 = {
+		.ref_name = "branch2",
+		.update_index = 2,
+		.target = "master",
+	};
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st1, dir, cfg);
+	assert_err(err);
+
+	err = reftable_new_stack(&st2, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st1, &write_test_ref, &ref1);
+	assert_err(err);
+
+	err = reftable_stack_add(st2, &write_test_ref, &ref2);
+	assert(err == REFTABLE_LOCK_ERROR);
+
+	err = reftable_stack_reload(st2);
+	assert_err(err);
+
+	err = reftable_stack_add(st2, &write_test_ref, &ref2);
+	assert_err(err);
+	reftable_stack_destroy(st1);
+	reftable_stack_destroy(st2);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_transaction_api(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_addition *add = NULL;
+
+	struct reftable_ref_record ref = {
+		.ref_name = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	reftable_addition_destroy(add);
+
+	err = reftable_stack_new_addition(&add, st);
+	assert_err(err);
+
+	err = reftable_addition_add(add, &write_test_ref, &ref);
+	assert_err(err);
+
+	err = reftable_addition_commit(add);
+	assert_err(err);
+
+	reftable_addition_destroy(add);
+
+	err = reftable_stack_read_ref(st, ref.ref_name, &dest);
+	assert_err(err);
+	assert(0 == strcmp("master", dest.target));
+
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_validate_refname(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int i;
+	struct reftable_ref_record ref = {
+		.ref_name = "a/b",
+		.update_index = 1,
+		.target = "master",
+	};
+	char *additions[] = { "a", "a/b/c" };
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	for (i = 0; i < ARRAY_SIZE(additions); i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = additions[i],
+			.update_index = 1,
+			.target = "master",
+		};
+
+		err = reftable_stack_add(st, &write_test_ref, &ref);
+		assert(err == REFTABLE_NAME_CONFLICT);
+	}
+
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static int write_error(struct reftable_writer *wr, void *arg)
+{
+	return *((int *)arg);
+}
+
+static void test_reftable_stack_update_index_check(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record ref1 = {
+		.ref_name = "name1",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record ref2 = {
+		.ref_name = "name2",
+		.update_index = 1,
+		.target = "master",
+	};
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref1);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref2);
+	assert(err == REFTABLE_API_ERROR);
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_lock_failure(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err, i;
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+	for (i = -1; i != REFTABLE_EMPTY_TABLE_ERROR; i--) {
+		err = reftable_stack_add(st, &write_error, &i);
+		assert(err == i);
+	}
+
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_add(void)
+{
+	int i = 0;
+	int err = 0;
+	struct reftable_write_options cfg = {
+		.exact_log_message = true,
+	};
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_ref_record refs[2] = { { 0 } };
+	struct reftable_log_record logs[2] = { { 0 } };
+	int N = ARRAY_SIZE(refs);
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+	st->disable_auto_compact = true;
+
+	for (i = 0; i < N; i++) {
+		char buf[256];
+		snprintf(buf, sizeof(buf), "branch%02d", i);
+		refs[i].ref_name = xstrdup(buf);
+		refs[i].value = reftable_malloc(SHA1_SIZE);
+		refs[i].update_index = i + 1;
+		set_test_hash(refs[i].value, i);
+
+		logs[i].ref_name = xstrdup(buf);
+		logs[i].update_index = N + i + 1;
+		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+		logs[i].email = xstrdup("identity@invalid");
+		set_test_hash(logs[i].new_hash, i);
+	}
+
+	for (i = 0; i < N; i++) {
+		int err = reftable_stack_add(st, &write_test_ref, &refs[i]);
+		assert_err(err);
+	}
+
+	for (i = 0; i < N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		struct reftable_ref_record dest = { 0 };
+		int err = reftable_stack_read_ref(st, refs[i].ref_name, &dest);
+		assert_err(err);
+		assert(reftable_ref_record_equal(&dest, refs + i, SHA1_SIZE));
+		reftable_ref_record_clear(&dest);
+	}
+
+	for (i = 0; i < N; i++) {
+		struct reftable_log_record dest = { 0 };
+		int err = reftable_stack_read_log(st, refs[i].ref_name, &dest);
+		assert_err(err);
+		assert(reftable_log_record_equal(&dest, logs + i, SHA1_SIZE));
+		reftable_log_record_clear(&dest);
+	}
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i < N; i++) {
+		reftable_ref_record_clear(&refs[i]);
+		reftable_log_record_clear(&logs[i]);
+	}
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_log_normalize(void)
+{
+	int err = 0;
+	struct reftable_write_options cfg = {
+		0,
+	};
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+
+	byte h1[SHA1_SIZE] = { 0x01 }, h2[SHA1_SIZE] = { 0x02 };
+
+	struct reftable_log_record input = {
+		.ref_name = "branch",
+		.update_index = 1,
+		.new_hash = h1,
+		.old_hash = h2,
+	};
+	struct reftable_log_record dest = {
+		.update_index = 0,
+	};
+	struct write_log_arg arg = {
+		.log = &input,
+		.update_index = 1,
+	};
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	input.message = "one\ntwo";
+	err = reftable_stack_add(st, &write_test_log, &arg);
+	assert(err == REFTABLE_API_ERROR);
+
+	input.message = "one";
+	err = reftable_stack_add(st, &write_test_log, &arg);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, input.ref_name, &dest);
+	assert_err(err);
+	assert(0 == strcmp(dest.message, "one\n"));
+
+	input.message = "two\n";
+	arg.update_index = 2;
+	err = reftable_stack_add(st, &write_test_log, &arg);
+	assert_err(err);
+	err = reftable_stack_read_log(st, input.ref_name, &dest);
+	assert_err(err);
+	assert(0 == strcmp(dest.message, "two\n"));
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	reftable_log_record_clear(&dest);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_tombstone(void)
+{
+	int i = 0;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record refs[2] = { { 0 } };
+	struct reftable_log_record logs[2] = { { 0 } };
+	int N = ARRAY_SIZE(refs);
+	struct reftable_ref_record dest = { 0 };
+	struct reftable_log_record log_dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		const char *buf = "branch";
+		refs[i].ref_name = xstrdup(buf);
+		refs[i].update_index = i + 1;
+		if (i % 2 == 0) {
+			refs[i].value = reftable_malloc(SHA1_SIZE);
+			set_test_hash(refs[i].value, i);
+		}
+		logs[i].ref_name = xstrdup(buf);
+		/* update_index is part of the key. */
+		logs[i].update_index = 42;
+		if (i % 2 == 0) {
+			logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+			set_test_hash(logs[i].new_hash, i);
+			logs[i].email = xstrdup("identity@invalid");
+		}
+	}
+	for (i = 0; i < N; i++) {
+		int err = reftable_stack_add(st, &write_test_ref, &refs[i]);
+		assert_err(err);
+	}
+	for (i = 0; i < N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_read_ref(st, "branch", &dest);
+	assert(err == 1);
+	reftable_ref_record_clear(&dest);
+
+	err = reftable_stack_read_log(st, "branch", &log_dest);
+	assert(err == 1);
+	reftable_log_record_clear(&log_dest);
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st, "branch", &dest);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, "branch", &log_dest);
+	assert(err == 1);
+	reftable_ref_record_clear(&dest);
+	reftable_log_record_clear(&log_dest);
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i < N; i++) {
+		reftable_ref_record_clear(&refs[i]);
+		reftable_log_record_clear(&logs[i]);
+	}
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_hash_id(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+
+	struct reftable_ref_record ref = {
+		.ref_name = "master",
+		.target = "target",
+		.update_index = 1,
+	};
+	struct reftable_write_options cfg32 = { .hash_id = SHA256_ID };
+	struct reftable_stack *st32 = NULL;
+	struct reftable_write_options cfg_default = { 0 };
+	struct reftable_stack *st_default = NULL;
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	/* can't read it with the wrong hash ID. */
+	err = reftable_new_stack(&st32, dir, cfg32);
+	assert(err == REFTABLE_FORMAT_ERROR);
+
+	/* check that we can read it back with default config too. */
+	err = reftable_new_stack(&st_default, dir, cfg_default);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st_default, "master", &dest);
+	assert_err(err);
+
+	assert(!strcmp(dest.target, ref.target));
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_stack_destroy(st_default);
+	reftable_clear_dir(dir);
+}
+
+static void test_log2(void)
+{
+	assert(1 == fastlog2(3));
+	assert(2 == fastlog2(4));
+	assert(2 == fastlog2(5));
+}
+
+static void test_sizes_to_segments(void)
+{
+	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
+	/* .................0  1  2  3  4  5 */
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(segs[2].log == 3);
+	assert(segs[2].start == 5);
+	assert(segs[2].end == 6);
+
+	assert(segs[1].log == 2);
+	assert(segs[1].start == 2);
+	assert(segs[1].end == 5);
+	reftable_free(segs);
+}
+
+static void test_sizes_to_segments_empty(void)
+{
+	uint64_t sizes[0];
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(seglen == 0);
+	reftable_free(segs);
+}
+
+static void test_sizes_to_segments_all_equal(void)
+{
+	uint64_t sizes[] = { 5, 5 };
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(seglen == 1);
+	assert(segs[0].start == 0);
+	assert(segs[0].end == 2);
+	reftable_free(segs);
+}
+
+static void test_suggest_compaction_segment(void)
+{
+	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
+	/* .................0    1    2  3   4  5  6 */
+	struct segment min =
+		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
+	assert(min.start == 2);
+	assert(min.end == 7);
+}
+
+static void test_suggest_compaction_segment_nothing(void)
+{
+	uint64_t sizes[] = { 64, 32, 16, 8, 4, 2 };
+	struct segment result =
+		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
+	assert(result.start == result.end);
+}
+
+static void test_reflog_expire(void)
+{
+	char dir[256] = "/tmp/stack.test_reflog_expire.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	struct reftable_log_record logs[20] = { { 0 } };
+	int N = ARRAY_SIZE(logs) - 1;
+	int i = 0;
+	int err;
+	struct reftable_log_expiry_config expiry = {
+		.time = 10,
+	};
+	struct reftable_log_record log = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 1; i <= N; i++) {
+		char buf[256];
+		snprintf(buf, sizeof(buf), "branch%02d", i);
+
+		logs[i].ref_name = xstrdup(buf);
+		logs[i].update_index = i;
+		logs[i].time = i;
+		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+		logs[i].email = xstrdup("identity@invalid");
+		set_test_hash(logs[i].new_hash, i);
+	}
+
+	for (i = 1; i <= N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	err = reftable_stack_compact_all(st, &expiry);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, logs[9].ref_name, &log);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, logs[11].ref_name, &log);
+	assert_err(err);
+
+	expiry.min_update_index = 15;
+	err = reftable_stack_compact_all(st, &expiry);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, logs[14].ref_name, &log);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, logs[16].ref_name, &log);
+	assert_err(err);
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i <= N; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	reftable_clear_dir(dir);
+	reftable_log_record_clear(&log);
+}
+
+static int write_nothing(struct reftable_writer *wr, void *arg)
+{
+	reftable_writer_set_limits(wr, 1, 1);
+	return 0;
+}
+
+static void test_empty_add(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_stack *st2 = NULL;
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_nothing, NULL);
+	assert_err(err);
+
+	err = reftable_new_stack(&st2, dir, cfg);
+	assert_err(err);
+	reftable_clear_dir(dir);
+	reftable_stack_destroy(st);
+	reftable_stack_destroy(st2);
+}
+
+static void test_reftable_stack_auto_compaction(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int err, i;
+	int N = 100;
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		char name[100];
+		struct reftable_ref_record ref = {
+			.ref_name = name,
+			.update_index = reftable_stack_next_update_index(st),
+			.target = "master",
+		};
+		snprintf(name, sizeof(name), "branch%04d", i);
+
+		err = reftable_stack_add(st, &write_test_ref, &ref);
+		assert_err(err);
+
+		assert(i < 3 || st->merged->stack_len < 2 * fastlog2(i));
+	}
+
+	assert(reftable_stack_compaction_stats(st)->entries_written <
+	       (uint64_t)(N * fastlog2(N)));
+
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+int stack_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_reftable_stack_uptodate",
+		      &test_reftable_stack_uptodate);
+	add_test_case("test_reftable_stack_transaction_api",
+		      &test_reftable_stack_transaction_api);
+	add_test_case("test_reftable_stack_hash_id",
+		      &test_reftable_stack_hash_id);
+	add_test_case("test_sizes_to_segments_all_equal",
+		      &test_sizes_to_segments_all_equal);
+	add_test_case("test_reftable_stack_auto_compaction",
+		      &test_reftable_stack_auto_compaction);
+	add_test_case("test_reftable_stack_validate_refname",
+		      &test_reftable_stack_validate_refname);
+	add_test_case("test_reftable_stack_update_index_check",
+		      &test_reftable_stack_update_index_check);
+	add_test_case("test_reftable_stack_lock_failure",
+		      &test_reftable_stack_lock_failure);
+	add_test_case("test_reftable_stack_log_normalize",
+		      &test_reftable_stack_log_normalize);
+	add_test_case("test_reftable_stack_tombstone",
+		      &test_reftable_stack_tombstone);
+	add_test_case("test_reftable_stack_add_one",
+		      &test_reftable_stack_add_one);
+	add_test_case("test_empty_add", test_empty_add);
+	add_test_case("test_reflog_expire", test_reflog_expire);
+	add_test_case("test_suggest_compaction_segment",
+		      &test_suggest_compaction_segment);
+	add_test_case("test_suggest_compaction_segment_nothing",
+		      &test_suggest_compaction_segment_nothing);
+	add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
+	add_test_case("test_sizes_to_segments_empty",
+		      &test_sizes_to_segments_empty);
+	add_test_case("test_log2", &test_log2);
+	add_test_case("test_parse_names", &test_parse_names);
+	add_test_case("test_read_file", &test_read_file);
+	add_test_case("test_names_equal", &test_names_equal);
+	add_test_case("test_reftable_stack_add", &test_reftable_stack_add);
+	return test_main(argc, argv);
+}
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..9435436c04f
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,55 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#if 1 /* REFTABLE_IN_GITCORE */
+#define REFTABLE_IN_GITCORE
+
+#include "git-compat-util.h"
+#include "cache.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_IN_GITCORE */
+
+void reftable_clear_dir(const char *dirname);
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef uint8_t byte;
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/test_framework.c b/reftable/test_framework.c
new file mode 100644
index 00000000000..44de31b7338
--- /dev/null
+++ b/reftable/test_framework.c
@@ -0,0 +1,69 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "test_framework.h"
+
+#include "system.h"
+#include "basics.h"
+#include "constants.h"
+
+struct test_case **test_cases;
+int test_case_len;
+int test_case_cap;
+
+struct test_case *new_test_case(const char *name, void (*testfunc)(void))
+{
+	struct test_case *tc = reftable_malloc(sizeof(struct test_case));
+	tc->name = name;
+	tc->testfunc = testfunc;
+	return tc;
+}
+
+struct test_case *add_test_case(const char *name, void (*testfunc)(void))
+{
+	struct test_case *tc = new_test_case(name, testfunc);
+	if (test_case_len == test_case_cap) {
+		test_case_cap = 2 * test_case_cap + 1;
+		test_cases = reftable_realloc(
+			test_cases, sizeof(struct test_case) * test_case_cap);
+	}
+
+	test_cases[test_case_len++] = tc;
+	return tc;
+}
+
+int test_main(int argc, const char *argv[])
+{
+	const char *filter = NULL;
+	int i = 0;
+	if (argc > 1) {
+		filter = argv[1];
+	}
+
+	for (i = 0; i < test_case_len; i++) {
+		const char *name = test_cases[i]->name;
+		if (filter == NULL || strstr(name, filter) != NULL) {
+			printf("case %s\n", name);
+			test_cases[i]->testfunc();
+		} else {
+			printf("skip %s\n", name);
+		}
+
+		reftable_free(test_cases[i]);
+	}
+	reftable_free(test_cases);
+	test_cases = 0;
+	test_case_len = 0;
+	test_case_cap = 0;
+	return 0;
+}
+
+void set_test_hash(byte *p, int i)
+{
+	memset(p, (byte)i, hash_size(SHA1_ID));
+}
diff --git a/reftable/test_framework.h b/reftable/test_framework.h
new file mode 100644
index 00000000000..a3ba7d9a6e6
--- /dev/null
+++ b/reftable/test_framework.h
@@ -0,0 +1,64 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TEST_FRAMEWORK_H
+#define TEST_FRAMEWORK_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#ifdef NDEBUG
+#undef NDEBUG
+#endif
+
+#include "system.h"
+
+#ifdef assert
+#undef assert
+#endif
+
+#define assert_err(c)                                                  \
+	if (c != 0) {                                                  \
+		fflush(stderr);                                        \
+		fflush(stdout);                                        \
+		fprintf(stderr, "%s: %d: error == %d (%s), want 0\n",  \
+			__FILE__, __LINE__, c, reftable_error_str(c)); \
+		abort();                                               \
+	}
+
+#define assert_streq(a, b)                                               \
+	if (strcmp(a, b)) {                                              \
+		fflush(stderr);                                          \
+		fflush(stdout);                                          \
+		fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, \
+			__LINE__, #a, a, #b, b);                         \
+		abort();                                                 \
+	}
+
+#define assert(c)                                                          \
+	if (!(c)) {                                                        \
+		fflush(stderr);                                            \
+		fflush(stdout);                                            \
+		fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, \
+			__LINE__, #c);                                     \
+		abort();                                                   \
+	}
+
+struct test_case {
+	const char *name;
+	void (*testfunc)(void);
+};
+
+struct test_case *new_test_case(const char *name, void (*testfunc)(void));
+struct test_case *add_test_case(const char *name, void (*testfunc)(void));
+int test_main(int argc, const char *argv[]);
+
+void set_test_hash(byte *p, int i);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..0061d14e306
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,63 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	int res;
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	res = compare(key, (*rootp)->key);
+	if (res < 0)
+		return tree_search(key, &(*rootp)->left, compare, insert);
+	else if (res > 0)
+		return tree_search(key, &(*rootp)->right, compare, insert);
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..954512e9a3a
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+/* tree_node is a generic binary search tree. */
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+/* looks for `key` in `rootp` using `compare` as comparison function. If insert
+   is set, insert the key if it's not found. Else, return NULL.
+*/
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+
+/* performs an infix walk of the tree. */
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+
+/*
+  deallocates the tree nodes recursively. Keys should be deallocated separately
+  by walking over the tree. */
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/tree_test.c b/reftable/tree_test.c
new file mode 100644
index 00000000000..c6d448cbe8a
--- /dev/null
+++ b/reftable/tree_test.c
@@ -0,0 +1,62 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static int test_compare(const void *a, const void *b)
+{
+	return (char *)a - (char *)b;
+}
+
+struct curry {
+	void *last;
+};
+
+static void check_increasing(void *arg, void *key)
+{
+	struct curry *c = (struct curry *)arg;
+	if (c->last != NULL) {
+		assert(test_compare(c->last, key) < 0);
+	}
+	c->last = key;
+}
+
+static void test_tree(void)
+{
+	struct tree_node *root = NULL;
+
+	void *values[11] = { 0 };
+	struct tree_node *nodes[11] = { 0 };
+	int i = 1;
+	struct curry c = { 0 };
+	do {
+		nodes[i] = tree_search(values + i, &root, &test_compare, 1);
+		i = (i * 7) % 11;
+	} while (i != 1);
+
+	for (i = 1; i < ARRAY_SIZE(nodes); i++) {
+		assert(values + i == nodes[i]->key);
+		assert(nodes[i] ==
+		       tree_search(values + i, &root, &test_compare, 0));
+	}
+
+	infix_walk(root, check_increasing, &c);
+	tree_free(root);
+}
+
+int tree_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_tree", &test_tree);
+	return test_main(argc, argv);
+}
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100755
index 00000000000..4dadd875f4a
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,22 @@
+#!/bin/sh
+
+set -eu
+
+# Override this to import from somewhere else, say "../reftable".
+SRC=${SRC:-origin}
+BRANCH=${BRANCH:-master}
+
+((git --git-dir reftable-repo/.git fetch -f ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
+   git clone https://github.com/google/reftable reftable-repo)
+
+cp reftable-repo/c/*.[ch] reftable/
+cp reftable-repo/c/include/*.[ch] reftable/
+cp reftable-repo/LICENSE reftable/
+
+git --git-dir reftable-repo/.git show --no-patch --format=oneline HEAD \
+  > reftable/VERSION
+
+mv reftable/system.h reftable/system.h~
+sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
+
+git add reftable/*.[ch] reftable/LICENSE reftable/VERSION
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..eae89a6bf34
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,661 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, byte typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, byte *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		byte *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0)
+			return n;
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0)
+		return n;
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, byte *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, byte typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	slice_release(&w->last_key);
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, byte *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	slice_init(&wp->block_writer_data.last_key);
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->last_key = reftable_empty_slice;
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct slice hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+#define OBJ_INDEX_TREE_NODE_INIT   \
+	{                          \
+		.hash = SLICE_INIT \
+	}
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return slice_cmp(&((const struct obj_index_tree_node *)a)->hash,
+			 &((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct slice *hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = *hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		struct obj_index_tree_node empty = OBJ_INDEX_TREE_NODE_INIT;
+		key = reftable_malloc(sizeof(struct obj_index_tree_node));
+		*key = empty;
+
+		slice_reset(&key->hash);
+		slice_addbuf(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w,
+			     struct reftable_record *rec)
+{
+	int result = -1;
+	struct slice key = SLICE_INIT;
+	int err = 0;
+	reftable_record_key(rec, &key);
+	if (slice_cmp(&w->last_key, &key) >= 0)
+		goto done;
+
+	slice_reset(&w->last_key);
+	slice_addbuf(&w->last_key, &key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, reftable_record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == reftable_record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto done;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto done;
+	}
+
+	writer_reinit_block_writer(w, reftable_record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto done;
+	}
+
+	result = 0;
+done:
+	slice_release(&key);
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL)
+		return REFTABLE_API_ERROR;
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index)
+		return REFTABLE_API_ERROR;
+
+	reftable_record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, &rec);
+	if (err < 0)
+		return err;
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct slice h = SLICE_INIT;
+		slice_add(&h, ref->value, hash_size(w->opts.hash_id));
+		writer_index_hash(w, &h);
+		slice_release(&h);
+	}
+
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct slice h = SLICE_INIT;
+		slice_add(&h, ref->target_value, hash_size(w->opts.hash_id));
+		writer_index_hash(w, &h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	char *input_log_message = log->message;
+	struct slice cleaned_message = SLICE_INIT;
+	int err;
+	if (log->ref_name == NULL)
+		return REFTABLE_API_ERROR;
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0)
+			return err;
+	}
+
+	if (!w->opts.exact_log_message && log->message != NULL) {
+		slice_addstr(&cleaned_message, log->message);
+		while (cleaned_message.len &&
+		       cleaned_message.buf[cleaned_message.len - 1] == '\n')
+			slice_setlen(&cleaned_message, cleaned_message.len - 1);
+		if (strchr(slice_as_string(&cleaned_message), '\n')) {
+			// multiple lines not allowed.
+			err = REFTABLE_API_ERROR;
+			goto done;
+		}
+		slice_addstr(&cleaned_message, "\n");
+		log->message = (char *)slice_as_string(&cleaned_message);
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	reftable_record_from_log(&rec, log);
+	err = writer_add_record(w, &rec);
+
+done:
+	log->message = input_log_message;
+	slice_release(&cleaned_message);
+	return err;
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	struct reftable_block_stats *bstats = NULL;
+	if (err < 0)
+		return err;
+
+	while (w->index_len > threshold) {
+		struct reftable_index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct reftable_record rec = { 0 };
+			reftable_record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, &rec) == 0) {
+				continue;
+			}
+
+			err = writer_flush_block(w);
+			if (err < 0)
+				return err;
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, &rec);
+			if (err != 0) {
+				/* write into fresh block should always succeed
+				 */
+				abort();
+			}
+		}
+		for (i = 0; i < idx_len; i++) {
+			slice_release(&idx[i].last_key);
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0)
+		return err;
+
+	bstats = writer_reftable_block_stats(w, typ);
+	bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
+	bstats->index_offset = index_start;
+	bstats->max_index_level = max_level;
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct slice *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(&entry->hash, arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct reftable_obj_record obj_rec = {
+		.hash_prefix = entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct reftable_record rec = { 0 };
+	if (arg->err < 0)
+		goto done;
+
+	reftable_record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0)
+		goto done;
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0)
+		goto done;
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0)
+		goto done;
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+done:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	slice_release(&entry->hash);
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0)
+		return closure.err;
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	byte typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL)
+		return 0;
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0)
+		return err;
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0)
+			return err;
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	byte footer[72];
+	byte *p = footer;
+	int err = writer_finish_public_section(w);
+	int empty_table = w->next == 0;
+	if (err != 0)
+		goto done;
+	w->pending_padding = 0;
+	if (empty_table) {
+		/* Empty tables need a header anyway. */
+		byte header[28];
+		int n = writer_write_header(w, header);
+		err = padded_write(w, header, n, 0);
+		if (err < 0)
+			goto done;
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0)
+		goto done;
+
+	if (empty_table) {
+		err = REFTABLE_EMPTY_TABLE_ERROR;
+		goto done;
+	}
+
+done:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	slice_release(&w->last_key);
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		slice_release(&w->index[i].last_key);
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	byte typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	struct reftable_index_record ir = { .last_key = SLICE_INIT };
+	if (raw_bytes < 0)
+		return raw_bytes;
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0)
+		return err;
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index,
+			sizeof(struct reftable_index_record) * w->index_cap);
+	}
+
+	ir.offset = w->next;
+	slice_reset(&ir.last_key);
+	slice_addbuf(&ir.last_key, &w->block_writer->last_key);
+	w->index[w->index_len] = ir;
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL)
+		return 0;
+	if (w->block_writer->entries == 0)
+		return 0;
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..c56d67f7116
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "slice.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, byte *, size_t);
+	void *write_arg;
+	int pending_padding;
+	struct slice last_key;
+
+	/* offset of next block to write. */
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	/* memory buffer for writing */
+	byte *block;
+
+	/* writer for the current section. NULL or points to
+	 * block_writer_data */
+	struct block_writer *block_writer;
+
+	struct block_writer block_writer_data;
+
+	/* pending index records for the current section */
+	struct reftable_index_record *index;
+	int index_len;
+	int index_cap;
+
+	/*
+	  tree for use with tsearch; used to populate the 'o' inverse OID
+	  map */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+/* finishes a block, and writes it to storage */
+int writer_flush_block(struct reftable_writer *w);
+
+/* deallocates memory related to the index */
+void writer_clear_index(struct reftable_writer *w);
+
+/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 12/17] Reftable support for git-core
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (10 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 11/17] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-19 14:24                                   ` SZEDER Gábor
  2020-06-16 19:20                                 ` [PATCH v17 13/17] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
                                                   ` (5 subsequent siblings)
  17 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

This introduces the refs/reftable-backend.c containing reftable powered ref
storage backend.

It can be activated by passing --ref-storage=reftable to "git init".

TODO:

* Fix worktree commands

* Spots marked XXX

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   28 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   56 +-
 cache.h                                       |    6 +-
 refs.c                                        |   27 +-
 refs.h                                        |    3 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1327 +++++++++++++++++
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |  155 ++
 13 files changed, 1600 insertions(+), 30 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 372139f1f24..7a42362aa42 100644
--- a/Makefile
+++ b/Makefile
@@ -807,6 +807,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -959,6 +960,7 @@ LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1162,7 +1164,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2352,11 +2354,30 @@ VCSSVN_OBJS += vcs-svn/sliding_window.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/compat.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/refname.o
+REFTABLE_OBJS += reftable/reftable.o
+REFTABLE_OBJS += reftable/slice.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2497,6 +2518,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -3112,7 +3136,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index 2a8e3aaaed3..8910af03d95 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1111,7 +1111,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
+		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0b7222e7188..da5b4670c84 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +189,8 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 ||
+	    !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -238,6 +240,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -247,6 +250,24 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
+	/*
+	 * refs/heads is a file when using reftable. We can't reinitialize with
+	 * a reftable because it will overwrite HEAD
+	 */
+	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
+			      is_directory(git_path_buf(&buf, "refs/heads"))) {
+		die("cannot switch ref storage format.");
+	}
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -261,15 +282,12 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
@@ -383,7 +401,8 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash, unsigned int flags)
+	    const char *template_dir, int hash, const char *ref_storage_format,
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -422,6 +441,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+	repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -450,6 +470,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -523,6 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = default_ref_storage();
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -530,15 +553,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *object_format = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
@@ -648,9 +674,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
+		       ref_storage_format, flags);
 }
diff --git a/cache.h b/cache.h
index 0f0485ecfe2..8cb884773c3 100644
--- a/cache.h
+++ b/cache.h
@@ -628,8 +628,9 @@ int path_inside_repo(const char *prefix, const char *path);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
-	    unsigned int flags);
-void initialize_repository_version(int hash_algo);
+	    const char *ref_storage_format, unsigned int flags);
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1043,6 +1044,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 29e710a29e9..4409080dfd8 100644
--- a/refs.c
+++ b/refs.c
@@ -17,10 +17,16 @@
 #include "argv-array.h"
 #include "repository.h"
 
+const char *default_ref_storage(void)
+{
+	const char *test = getenv("GIT_TEST_REFTABLE");
+	return test ? "reftable" : "files";
+}
+
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1717,13 +1723,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1739,7 +1745,11 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(r->gitdir,
+					 r->ref_storage_format ?
+						 r->ref_storage_format :
+						 DEFAULT_REF_STORAGE,
+					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
 
@@ -1794,7 +1804,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1808,6 +1818,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
 	struct ref_store *refs;
 	const char *id;
 
@@ -1821,9 +1832,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index 7aaa1226551..ddcc940d5c7 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* Returns the ref storage backend to use by default. */
+const char *default_ref_storage(void);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index dc9e8d3a92b..7afe4c28310 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -693,6 +693,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..6064fd7f58f
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1327 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *repo_dir;
+	char *reftable_dir;
+	struct reftable_stack *stack;
+};
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type);
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = { NULL };
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						       unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+	refs->repo_dir = xstrdup(path);
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
+	assert(refs->err != REFTABLE_API_ERROR);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct strbuf sb = STRBUF_INIT;
+
+	safe_create_dir(refs->reftable_dir, 1);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
+	write_file(sb.buf, "ref: refs/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	const char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(&ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		/*
+		   We could filter pseudo refs here explicitly, but HEAD is not
+		   a PSEUDOREF, but a PER_WORKTREE, b/c each worktree can have
+		   its own HEAD.
+		 */
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct reftable_merged_table *mt = NULL;
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else {
+		mt = reftable_stack_merged_table(refs->stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->prefix = prefix;
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int fixup_symrefs(struct ref_store *ref_store,
+			 struct ref_transaction *transaction)
+{
+	struct strbuf referent = STRBUF_INIT;
+	int i = 0;
+	int err = 0;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+		struct object_id old_oid;
+
+		err = reftable_read_raw_ref(ref_store, update->refname,
+					    &old_oid, &referent,
+					    /* mutate input, like
+					       files-backend.c */
+					    &update->type);
+		if (err < 0 && errno == ENOENT &&
+		    is_null_oid(&update->old_oid)) {
+			err = 0;
+		}
+		if (err < 0)
+			goto done;
+
+		if (!(update->type & REF_ISSYMREF))
+			continue;
+
+		if (update->flags & REF_NO_DEREF) {
+			/* what should happen here? See files-backend.c
+			 * lock_ref_for_update. */
+		} else {
+			/*
+			  If we are updating a symref (eg. HEAD), we should also
+			  update the branch that the symref points to.
+
+			  This is generic functionality, and would be better
+			  done in refs.c, but the current implementation is
+			  intertwined with the locking in files-backend.c.
+			*/
+			int new_flags = update->flags;
+			struct ref_update *new_update = NULL;
+
+			/* if this is an update for HEAD, should also record a
+			   log entry for HEAD? See files-backend.c,
+			   split_head_update()
+			*/
+			new_update = ref_transaction_add_update(
+				transaction, referent.buf, new_flags,
+				&update->new_oid, &update->old_oid,
+				update->msg);
+			new_update->parent_update = update;
+
+			/* files-backend sets REF_LOG_ONLY here. */
+			update->flags |= REF_NO_DEREF | REF_LOG_ONLY;
+			update->flags &= ~REF_HAVE_OLD;
+		}
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	strbuf_release(&referent);
+	return err;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_addition *add = NULL;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_new_addition(&add, refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = fixup_symrefs(ref_store, transaction);
+	if (err) {
+		goto done;
+	}
+
+	transaction->backend_data = add;
+	transaction->state = REF_TRANSACTION_PREPARED;
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	if (err < 0) {
+		transaction->state = REF_TRANSACTION_CLOSED;
+		strbuf_addf(errbuf, "reftable: transaction prepare: %s",
+			    reftable_error_str(err));
+	}
+
+	return err;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	reftable_addition_destroy(add);
+	transaction->backend_data = NULL;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid;
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp((*(struct ref_update **)a)->refname,
+		      (*(struct ref_update **)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = reftable_stack_next_update_index(refs->stack);
+	int err = 0;
+	int i = 0;
+	struct reftable_log_record *logs =
+		calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_LOG_ONLY) {
+			continue;
+		}
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct reftable_ref_record ref = { NULL };
+			struct object_id peeled;
+
+			int peel_error = peel_object(&u->new_oid, &peeled);
+			ref.ref_name = (char *)u->refname;
+
+			if (!is_null_oid(&u->new_oid)) {
+				ref.value = u->new_oid.hash;
+			}
+			ref.update_index = ts;
+			if (!peel_error) {
+				ref.target_value = peeled.hash;
+			}
+
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	int err = 0;
+	int i;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+
+	err = reftable_addition_add(add, &write_transaction_table, transaction);
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_addition_commit(add);
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_addition_destroy(add);
+	transaction->state = REF_TRANSACTION_CLOSED;
+	transaction->backend_data = NULL;
+	if (err) {
+		strbuf_addf(errmsg, "reftable: transaction failure: %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+	return err;
+}
+
+static int
+reftable_transaction_initial_commit(struct ref_store *ref_store,
+				    struct ref_transaction *transaction,
+				    struct strbuf *errmsg)
+{
+	int err = reftable_transaction_prepare(ref_store, transaction, errmsg);
+	if (err)
+		return err;
+
+	return reftable_transaction_finish(ref_store, transaction, errmsg);
+}
+
+struct write_pseudoref_arg {
+	struct reftable_stack *stack;
+	const char *pseudoref;
+	const struct object_id *new_oid;
+	const struct object_id *old_oid;
+};
+
+static int write_pseudoref_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_pseudoref_arg *arg = (struct write_pseudoref_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	struct reftable_ref_record read_ref = { NULL };
+	struct reftable_ref_record write_ref = { NULL };
+
+	reftable_writer_set_limits(writer, ts, ts);
+	if (arg->old_oid) {
+		struct object_id read_oid;
+		err = reftable_stack_read_ref(arg->stack, arg->pseudoref,
+					      &read_ref);
+		if (err < 0)
+			goto done;
+
+		if ((err > 0) != is_null_oid(arg->old_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		/* XXX If old_oid is set, and we have a symref? */
+
+		if (err == 0 && read_ref.value == NULL) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		hashcpy(read_oid.hash, read_ref.value);
+		if (!oideq(arg->old_oid, &read_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+	}
+
+	write_ref.ref_name = (char *)arg->pseudoref;
+	write_ref.update_index = ts;
+	if (!is_null_oid(arg->new_oid))
+		write_ref.value = (uint8_t *)arg->new_oid->hash;
+
+	err = reftable_writer_add_ref(writer, &write_ref);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&read_ref);
+	return err;
+}
+
+static int reftable_write_pseudoref(struct ref_store *ref_store,
+				    const char *pseudoref,
+				    const struct object_id *oid,
+				    const struct object_id *old_oid,
+				    struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_pseudoref_arg arg = {
+		.stack = refs->stack,
+		.pseudoref = pseudoref,
+		.new_oid = oid,
+	};
+	struct reftable_addition *add = NULL;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_new_addition(&add, refs->stack);
+	if (err) {
+		goto done;
+	}
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		/* XXX this is cut & paste from files-backend - should factor
+		 * out? */
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(errbuf,
+					    _("could not read ref '%s'"),
+					    pseudoref);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(errbuf, _("ref '%s' already exists"),
+				    pseudoref);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(errbuf,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			goto done;
+		}
+	}
+
+	err = reftable_addition_add(add, &write_pseudoref_table, &arg);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref update failure: %s",
+			    reftable_error_str(err));
+	}
+
+	err = reftable_addition_commit(add);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref commit failure: %s",
+			    reftable_error_str(err));
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_addition_destroy(add);
+	return err;
+}
+
+static int reftable_delete_pseudoref(struct ref_store *ref_store,
+				     const char *pseudoref,
+				     const struct object_id *old_oid)
+{
+	struct strbuf errbuf = STRBUF_INIT;
+	int ret = reftable_write_pseudoref(ref_store, pseudoref, &null_oid,
+					   old_oid, &errbuf);
+	/* XXX what to do with the error message? */
+	strbuf_release(&errbuf);
+	return ret;
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	int i = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = {
+			.update_index = ts,
+		};
+		struct reftable_ref_record current = { NULL };
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.ref_name,
+					    &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	string_list_sort(refnames);
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err == 0) {
+		struct reftable_log_record log = { NULL };
+		struct object_id new_oid;
+		struct object_id old_oid;
+
+		fill_reftable_log_record(&log);
+		log.ref_name = (char *)create->refname;
+		log.message = (char *)create->logmsg;
+		log.update_index = ts;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			err = reftable_writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.message = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return err;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(refs->stack, &write_create_symref_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = { NULL };
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto done;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto done;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = { { NULL } };
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = { { NULL } };
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto done;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_add(refs->stack, &write_rename_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(&ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			/* we want the refnames that we have reflogs for, so we
+			 * skip if we've already produced this name. This could
+			 * be faster by seeking directly to
+			 * reflog@update_index==0.
+			 */
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(refs->stack);
+	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = { NULL };
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		hashcpy(old_oid.hash, log.old_hash);
+		hashcpy(new_oid.hash, log.new_hash);
+
+		full_committer = fmt_ident(log.name, log.email,
+					   WANT_COMMITTER_IDENT,
+					   /*date*/ NULL, IDENT_NO_DATE);
+		err = fn(&old_oid, &new_oid, full_committer, log.time,
+			 log.tz_offset, log.message, cb_data);
+		if (err)
+			break;
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+	int i = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = { NULL };
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		err = fn(&old_oid, &new_oid, full_committer, log->time,
+			 log->tz_offset, log->message, cb_data);
+		if (err) {
+			break;
+		}
+	}
+
+	for (i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct reftable_log_record log = { NULL };
+	struct reftable_iterator it = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		goto done;
+	}
+
+	while (1) {
+		struct object_id ooid;
+		struct object_id noid;
+
+		int err = reftable_iterator_next_log(&it, &log);
+		if (err < 0) {
+			goto done;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	err = reftable_stack_read_ref(refs->stack, refname, &ref);
+	if (err > 0) {
+		errno = ENOENT;
+		err = -1;
+		goto done;
+	}
+	if (err < 0) {
+		errno = reftable_error_to_errno(err);
+		err = -1;
+		goto done;
+	}
+	if (ref.target != NULL) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else if (ref.value != NULL) {
+		hashcpy(oid->hash, ref.value);
+	} else {
+		*type |= REF_ISBROKEN;
+		errno = EINVAL;
+		err = -1;
+	}
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_initial_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_write_pseudoref,
+	reftable_delete_pseudoref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire,
+};
diff --git a/repository.c b/repository.c
index 6f7f6f002b1..087760bc184 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 6534fbb7b31..f57b73f4a27 100644
--- a/repository.h
+++ b/repository.h
@@ -74,6 +74,9 @@ struct repository {
 	 */
 	struct ref_store *refs_private;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 65fe5ecefbe..2ef970f9f88 100644
--- a/setup.c
+++ b/setup.c
@@ -468,9 +468,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -559,6 +561,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1204,8 +1207,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 00000000000..802afb40e70
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,155 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+INVALID_SHA1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+initialize ()  {
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	mv .git/hooks .git/hooks-disabled
+}
+
+test_expect_success 'delete ref' '
+	initialize &&
+	test_commit file &&
+	SHA=$(git show-ref -s --verify HEAD) &&
+	test_write_lines "$SHA refs/heads/master" "$SHA refs/tags/file" >expect &&
+	git show-ref > actual &&
+	! git update-ref -d refs/tags/file $INVALID_SHA1 &&
+	test_cmp expect actual &&
+	git update-ref -d refs/tags/file $SHA  &&
+	test_write_lines "$SHA refs/heads/master" >expect &&
+	git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone calls transaction_initial_commit' '
+	test_commit message1 file1 &&
+	git clone . cloned &&
+	(test  -f cloned/file1 || echo "Fixme.")
+'
+
+test_expect_success 'basic operation of reftable storage: commit, show-ref' '
+	initialize &&
+	test_commit file &&
+	test_write_lines refs/heads/master refs/tags/file >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect
+'
+
+test_expect_success 'reflog, repack' '
+	initialize &&
+	for count in $(test_seq 1 10)
+	do
+		test_commit "number $count" file.t $count number-$count ||
+	        return 1
+	done &&
+	git pack-refs &&
+	ls -1 .git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 10 output &&
+	grep "commit (initial): number 1" output &&
+	grep "commit: number 10" output &&
+	git gc &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 0 output
+'
+
+test_expect_success 'branch switch in reflog output' '
+	initialize &&
+	test_commit file1 &&
+	git checkout -b branch1 &&
+	test_commit file2 &&
+	git checkout -b branch2 &&
+	git switch - &&
+	git rev-parse --symbolic-full-name HEAD > actual &&
+	echo refs/heads/branch1 > expect &&
+	test_cmp actual expect
+'
+
+
+# This matches show-ref's output
+print_ref() {
+	echo "$(git rev-parse "$1") $1"
+}
+
+test_expect_success 'peeled tags are stored' '
+	initialize &&
+	test_commit file &&
+	git tag -m "annotated tag" test_tag HEAD &&
+	{
+		print_ref "refs/heads/master" &&
+		print_ref "refs/tags/file" &&
+		print_ref "refs/tags/test_tag" &&
+		print_ref "refs/tags/test_tag^{}"
+	} >expect &&
+	git show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'show-ref works on fresh repo' '
+	initialize &&
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	>expect &&
+	! git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'checkout unborn branch' '
+	initialize &&
+	git checkout -b master
+'
+
+
+test_expect_success 'dir/file conflict' '
+	initialize &&
+	test_commit file &&
+	! git branch master/forbidden
+'
+
+
+test_expect_success 'do not clobber existing repo' '
+	rm -rf .git &&
+	git init --ref-storage=files &&
+	cat .git/HEAD > expect &&
+	test_commit file &&
+	(git init --ref-storage=reftable || true) &&
+	cat .git/HEAD > actual &&
+	test_cmp expect actual
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'pseudo refs' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+	git cherry-pick source &&
+	test -f file2
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'rebase' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+ 	git rebase source &&
+	test -f file2
+'
+
+test_done
+
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 13/17] Hookup unittests for the reftable library.
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (11 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 12/17] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 14/17] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
                                                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The unittests are under reftable/*_test.c, so all of the reftable code stays in
one directory. They are called from t/helpers/test-reftable.c in t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile                 | 17 ++++++++++++++++-
 t/helper/test-reftable.c | 15 +++++++++++++++
 t/helper/test-tool.c     |  1 +
 t/helper/test-tool.h     |  1 +
 t/t0031-reftable.sh      |  5 +++++
 5 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-reftable.c

diff --git a/Makefile b/Makefile
index 7a42362aa42..feb2e52ad22 100644
--- a/Makefile
+++ b/Makefile
@@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-read-cache.o
 TEST_BUILTINS_OBJS += test-read-graph.o
 TEST_BUILTINS_OBJS += test-read-midx.o
 TEST_BUILTINS_OBJS += test-ref-store.o
+TEST_BUILTINS_OBJS += test-reftable.o
 TEST_BUILTINS_OBJS += test-regex.o
 TEST_BUILTINS_OBJS += test-repository.o
 TEST_BUILTINS_OBJS += test-revision-walking.o
@@ -808,6 +809,7 @@ LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
 REFTABLE_LIB = reftable/libreftable.a
+REFTABLE_TEST_LIB = reftable/libreftable_test.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -2371,6 +2373,15 @@ REFTABLE_OBJS += reftable/tree.o
 REFTABLE_OBJS += reftable/writer.o
 REFTABLE_OBJS += reftable/zlib-compat.o
 
+REFTABLE_TEST_OBJS += reftable/block_test.o
+REFTABLE_TEST_OBJS += reftable/merged_test.o
+REFTABLE_TEST_OBJS += reftable/record_test.o
+REFTABLE_TEST_OBJS += reftable/refname_test.o
+REFTABLE_TEST_OBJS += reftable/reftable_test.o
+REFTABLE_TEST_OBJS += reftable/slice_test.o
+REFTABLE_TEST_OBJS += reftable/stack_test.o
+REFTABLE_TEST_OBJS += reftable/tree_test.o
+REFTABLE_TEST_OBJS += reftable/test_framework.o
 
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
@@ -2378,6 +2389,7 @@ OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
 	$(REFTABLE_OBJS) \
+	$(REFTABLE_TEST_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2521,6 +2533,9 @@ $(VCSSVN_LIB): $(VCSSVN_OBJS)
 $(REFTABLE_LIB): $(REFTABLE_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_TEST_LIB): $(REFTABLE_TEST_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -2803,7 +2818,7 @@ t/helper/test-svn-fe$X: $(VCSSVN_LIB)
 
 t/helper/test-tool$X: $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 
-t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS)
+t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS) $(REFTABLE_TEST_LIB)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS)
 
 check-sha1:: t/helper/test-tool$X
diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c
new file mode 100644
index 00000000000..95d18ba1fa9
--- /dev/null
+++ b/t/helper/test-reftable.c
@@ -0,0 +1,15 @@
+#include "reftable/reftable-tests.h"
+#include "test-tool.h"
+
+int cmd__reftable(int argc, const char **argv)
+{
+	block_test_main(argc, argv);
+	merged_test_main(argc, argv);
+	record_test_main(argc, argv);
+	refname_test_main(argc, argv);
+	reftable_test_main(argc, argv);
+	slice_test_main(argc, argv);
+	stack_test_main(argc, argv);
+	tree_test_main(argc, argv);
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 590b2efca70..10366b7b762 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -52,6 +52,7 @@ static struct test_cmd cmds[] = {
 	{ "read-graph", cmd__read_graph },
 	{ "read-midx", cmd__read_midx },
 	{ "ref-store", cmd__ref_store },
+	{ "reftable", cmd__reftable },
 	{ "regex", cmd__regex },
 	{ "repository", cmd__repository },
 	{ "revision-walking", cmd__revision_walking },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ddc8e990e91..d52ba2f5e57 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -41,6 +41,7 @@ int cmd__read_cache(int argc, const char **argv);
 int cmd__read_graph(int argc, const char **argv);
 int cmd__read_midx(int argc, const char **argv);
 int cmd__ref_store(int argc, const char **argv);
+int cmd__reftable(int argc, const char **argv);
 int cmd__regex(int argc, const char **argv);
 int cmd__repository(int argc, const char **argv);
 int cmd__revision_walking(int argc, const char **argv);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
index 802afb40e70..eeca76965a5 100755
--- a/t/t0031-reftable.sh
+++ b/t/t0031-reftable.sh
@@ -15,6 +15,11 @@ initialize ()  {
 	mv .git/hooks .git/hooks-disabled
 }
 
+test_expect_success 'unittests' '
+	test-tool reftable
+'
+
+
 test_expect_success 'delete ref' '
 	initialize &&
 	test_commit file &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 14/17] Add GIT_DEBUG_REFS debugging mechanism
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (12 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 13/17] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 15/17] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
                                                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

When set in the environment, GIT_DEBUG_REFS makes git print operations and
results as they flow through the ref storage backend. This helps debug
discrepancies between different ref backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile              |   1 +
 refs.c                |   3 +
 refs/debug.c          | 358 ++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h  |   6 +
 t/t0033-debug-refs.sh |  18 +++
 5 files changed, 386 insertions(+)
 create mode 100644 refs/debug.c
 create mode 100755 t/t0033-debug-refs.sh

diff --git a/Makefile b/Makefile
index feb2e52ad22..9c71bbdcbb8 100644
--- a/Makefile
+++ b/Makefile
@@ -961,6 +961,7 @@ LIB_OBJS += rebase.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
+LIB_OBJS += refs/debug.o
 LIB_OBJS += refs/files-backend.o
 LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
diff --git a/refs.c b/refs.c
index 4409080dfd8..bd1f3cc0e45 100644
--- a/refs.c
+++ b/refs.c
@@ -1750,6 +1750,9 @@ struct ref_store *get_main_ref_store(struct repository *r)
 						 r->ref_storage_format :
 						 DEFAULT_REF_STORAGE,
 					 REF_STORE_ALL_CAPS);
+	if (getenv("GIT_DEBUG_REFS")) {
+		r->refs_private = debug_wrap(r->refs_private);
+	}
 	return r->refs_private;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
new file mode 100644
index 00000000000..c33e684f5fc
--- /dev/null
+++ b/refs/debug.c
@@ -0,0 +1,358 @@
+
+#include "refs-internal.h"
+
+struct debug_ref_store {
+	struct ref_store base;
+	struct ref_store *refs;
+};
+
+extern struct ref_storage_be refs_be_debug;
+struct ref_store *debug_wrap(struct ref_store *store);
+
+struct ref_store *debug_wrap(struct ref_store *store)
+{
+	struct debug_ref_store *res = malloc(sizeof(struct debug_ref_store));
+	res->refs = store;
+	base_ref_store_init((struct ref_store *)res, &refs_be_debug);
+	return (struct ref_store *)res;
+}
+
+static int debug_init_db(struct ref_store *refs, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res = drefs->refs->be->init_db(drefs->refs, err);
+	return res;
+}
+
+static int debug_transaction_prepare(struct ref_store *refs,
+				     struct ref_transaction *transaction,
+				     struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_prepare(drefs->refs, transaction,
+						   err);
+	return res;
+}
+
+static void print_update(int i, const char *refname,
+			 const struct object_id *old_oid,
+			 const struct object_id *new_oid, unsigned int flags,
+			 unsigned int type, const char *msg)
+{
+	char o[200] = "null";
+	char n[200] = "null";
+	if (old_oid)
+		oid_to_hex_r(o, old_oid);
+	if (new_oid)
+		oid_to_hex_r(n, new_oid);
+
+	type &= 0xf; /* see refs.h REF_* */
+	flags &= REF_HAVE_NEW | REF_HAVE_OLD | REF_NO_DEREF |
+		 REF_FORCE_CREATE_REFLOG | REF_LOG_ONLY;
+	printf("%d: %s %s -> %s (F=0x%x, T=0x%x) \"%s\"\n", i, refname, o, n,
+	       flags, type, msg);
+}
+
+static void print_transaction(struct ref_transaction *transaction)
+{
+	printf("transaction {\n");
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		print_update(i, u->refname, &u->old_oid, &u->new_oid, u->flags,
+			     u->type, u->msg);
+	}
+	printf("}\n");
+}
+
+static int debug_transaction_finish(struct ref_store *refs,
+				    struct ref_transaction *transaction,
+				    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_finish(drefs->refs, transaction,
+						  err);
+	print_transaction(transaction);
+	printf("finish: %d\n", res);
+	return res;
+}
+
+static int debug_transaction_abort(struct ref_store *refs,
+				   struct ref_transaction *transaction,
+				   struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_abort(drefs->refs, transaction, err);
+	return res;
+}
+
+static int debug_initial_transaction_commit(struct ref_store *refs,
+					    struct ref_transaction *transaction,
+					    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->initial_transaction_commit(drefs->refs,
+							  transaction, err);
+	return res;
+}
+
+static int debug_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->pack_refs(drefs->refs, flags);
+	return res;
+}
+
+static int debug_create_symref(struct ref_store *ref_store,
+			       const char *ref_name, const char *target,
+			       const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_symref(drefs->refs, ref_name, target,
+						 logmsg);
+	printf("create_symref: %s -> %s \"%s\": %d\n", ref_name, target, logmsg,
+	       res);
+	return res;
+}
+
+static int debug_delete_refs(struct ref_store *ref_store, const char *msg,
+			     struct string_list *refnames, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->delete_refs(drefs->refs, msg, refnames, flags);
+	return res;
+}
+
+static int debug_rename_ref(struct ref_store *ref_store, const char *oldref,
+			    const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->rename_ref(drefs->refs, oldref, newref,
+					      logmsg);
+	printf("rename_ref: %s -> %s \"%s\": %d\n", oldref, newref, logmsg,
+	       res);
+	return res;
+}
+
+static int debug_copy_ref(struct ref_store *ref_store, const char *oldref,
+			  const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->copy_ref(drefs->refs, oldref, newref, logmsg);
+	printf("copy_ref: %s -> %s \"%s\": %d\n", oldref, newref, logmsg, res);
+	return res;
+}
+
+static int debug_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->write_pseudoref(drefs->refs, pseudoref, oid,
+						   old_oid, err);
+	char o[100] = "null";
+	char n[100] = "null";
+	if (oid)
+		oid_to_hex_r(o, oid);
+	if (old_oid)
+		oid_to_hex_r(n, old_oid);
+	printf("write_pseudoref: %s, %s => %s, err %s: %d\n", pseudoref, o, n,
+	       err->buf, res);
+	return res;
+}
+
+static int debug_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_pseudoref(drefs->refs, pseudoref,
+						    old_oid);
+	char hex[100] = "null";
+	if (old_oid)
+		oid_to_hex_r(hex, old_oid);
+	printf("delete_pseudoref: %s (%s): %d\n", pseudoref, hex, res);
+	return res;
+}
+
+static struct ref_iterator *
+debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			 unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
+	return res;
+}
+
+static int debug_read_raw_ref(struct ref_store *ref_store, const char *refname,
+			      struct object_id *oid, struct strbuf *referent,
+			      unsigned int *type)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = 0;
+
+	oidcpy(oid, &null_oid);
+	res = drefs->refs->be->read_raw_ref(drefs->refs, refname, oid, referent,
+					    type);
+
+	if (res == 0) {
+		printf("read_raw_ref: %s: %s (=> %s) type %x: %d\n", refname,
+		       oid_to_hex(oid), referent->buf, *type, res);
+	} else {
+		printf("read_raw_ref: %s err %d\n", refname, res);
+	}
+	return res;
+}
+
+static struct ref_iterator *
+debug_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->reflog_iterator_begin(drefs->refs);
+	printf("for_each_reflog_iterator_begin\n");
+	return res;
+}
+
+struct debug_reflog {
+	const char *refname;
+	each_reflog_ent_fn *fn;
+	void *cb_data;
+};
+
+static int debug_print_reflog_ent(struct object_id *old_oid,
+				  struct object_id *new_oid,
+				  const char *committer, timestamp_t timestamp,
+				  int tz, const char *msg, void *cb_data)
+{
+	struct debug_reflog *dbg = (struct debug_reflog *)cb_data;
+	int ret;
+	char o[100] = "null";
+	char n[100] = "null";
+	if (old_oid)
+		oid_to_hex_r(o, old_oid);
+	if (new_oid)
+		oid_to_hex_r(n, new_oid);
+
+	ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg,
+		      dbg->cb_data);
+	printf("reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
+	       dbg->refname, ret, o, n, committer, (long int)timestamp, msg);
+	return ret;
+}
+
+static int debug_for_each_reflog_ent(struct ref_store *ref_store,
+				     const char *refname, each_reflog_ent_fn fn,
+				     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct debug_reflog dbg = {
+		.refname = refname,
+		.fn = fn,
+		.cb_data = cb_data,
+	};
+
+	int res = drefs->refs->be->for_each_reflog_ent(
+		drefs->refs, refname, &debug_print_reflog_ent, &dbg);
+	printf("for_each_reflog: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_for_each_reflog_ent_reverse(struct ref_store *ref_store,
+					     const char *refname,
+					     each_reflog_ent_fn fn,
+					     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct debug_reflog dbg = {
+		.refname = refname,
+		.fn = fn,
+		.cb_data = cb_data,
+	};
+	int res = drefs->refs->be->for_each_reflog_ent_reverse(
+		drefs->refs, refname, &debug_print_reflog_ent, &dbg);
+	printf("for_each_reflog_reverse: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_reflog_exists(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_exists(drefs->refs, refname);
+	printf("reflog_exists: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_create_reflog(struct ref_store *ref_store, const char *refname,
+			       int force_create, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_reflog(drefs->refs, refname,
+						 force_create, err);
+	return res;
+}
+
+static int debug_delete_reflog(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_reflog(drefs->refs, refname);
+	return res;
+}
+
+static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
+			       const struct object_id *oid, unsigned int flags,
+			       reflog_expiry_prepare_fn prepare_fn,
+			       reflog_expiry_should_prune_fn should_prune_fn,
+			       reflog_expiry_cleanup_fn cleanup_fn,
+			       void *policy_cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_expire(drefs->refs, refname, oid,
+						 flags, prepare_fn,
+						 should_prune_fn, cleanup_fn,
+						 policy_cb_data);
+	return res;
+}
+
+struct ref_storage_be refs_be_debug = {
+	NULL,
+	"debug",
+	NULL,
+	debug_init_db,
+	debug_transaction_prepare,
+	debug_transaction_finish,
+	debug_transaction_abort,
+	debug_initial_transaction_commit,
+
+	debug_pack_refs,
+	debug_create_symref,
+	debug_delete_refs,
+	debug_rename_ref,
+	debug_copy_ref,
+
+	debug_write_pseudoref,
+	debug_delete_pseudoref,
+
+	debug_ref_iterator_begin,
+	debug_read_raw_ref,
+
+	debug_reflog_iterator_begin,
+	debug_for_each_reflog_ent,
+	debug_for_each_reflog_ent_reverse,
+	debug_reflog_exists,
+	debug_create_reflog,
+	debug_delete_reflog,
+	debug_reflog_expire,
+};
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 7afe4c28310..5417623c86e 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -713,4 +713,10 @@ struct ref_store {
 void base_ref_store_init(struct ref_store *refs,
 			 const struct ref_storage_be *be);
 
+/*
+ * Print out ref operations as they occur. Useful for debugging alternate ref
+ * backends.
+ */
+struct ref_store *debug_wrap(struct ref_store *store);
+
 #endif /* REFS_REFS_INTERNAL_H */
diff --git a/t/t0033-debug-refs.sh b/t/t0033-debug-refs.sh
new file mode 100755
index 00000000000..ed76c2c84a1
--- /dev/null
+++ b/t/t0033-debug-refs.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='cross-check reftable with files, using GIT_DEBUG_REFS output'
+
+. ./test-lib.sh
+
+test_expect_success 'GIT_DEBUG_REFS' '
+	git init --ref-storage=files files &&
+	git init --ref-storage=reftable reftable &&
+	(cd files && GIT_DEBUG_REFS=1 test_commit message file) > files.txt &&
+	(cd reftable && GIT_DEBUG_REFS=1 test_commit message file) > reftable.txt &&
+	test_cmp files.txt reftable.txt
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 15/17] vcxproj: adjust for the reftable changes
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (13 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 14/17] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Johannes Schindelin via GitGitGadget
  2020-06-16 19:20                                 ` [PATCH v17 16/17] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
                                                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 409+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This allows Git to be compiled via Visual Studio again after integrating
the `hn/reftable` branch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.mak.uname                           |  2 +-
 contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index c7eba69e54e..ae4e25a1a42 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -709,7 +709,7 @@ vcxproj:
 	# Make .vcxproj files and add them
 	unset QUIET_GEN QUIET_BUILT_IN; \
 	perl contrib/buildsystems/generate -g Vcxproj
-	git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
+	git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
 
 	# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
 	(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
index 5c666f9ac03..33a08d31652 100644
--- a/contrib/buildsystems/Generators/Vcxproj.pm
+++ b/contrib/buildsystems/Generators/Vcxproj.pm
@@ -77,7 +77,7 @@ sub createProject {
     my $libs_release = "\n    ";
     my $libs_debug = "\n    ";
     if (!$static_library) {
-      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
+      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
       $libs_debug = $libs_release;
       $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
       $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
@@ -231,6 +231,7 @@ sub createProject {
 EOM
     if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
       my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
+      my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
       my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
 
       print F << "EOM";
@@ -240,6 +241,14 @@ sub createProject {
       <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
     </ProjectReference>
 EOM
+      if (!($name =~ /xdiff|libreftable/)) {
+        print F << "EOM";
+    <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
+      <Project>$uuid_libreftable</Project>
+      <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
+    </ProjectReference>
+EOM
+      }
       if (!($name =~ 'xdiff')) {
         print F << "EOM";
     <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 16/17] Add reftable testing infrastructure
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (14 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 15/17] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-19 16:03                                   ` SZEDER Gábor
  2020-06-16 19:20                                 ` [PATCH v17 17/17] Add "test-tool dump-reftable" command Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  17 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* Add GIT_TEST_REFTABLE environment var to control default ref storage

* Add test_prerequisite REFTABLE.

* Skip some tests that are incompatible:

  * t3210-pack-refs.sh - does not apply
  * t9903-bash-prompt - The bash mode reads .git/HEAD directly
  * t1450-fsck.sh - manipulates .git/ directly to create invalid state

Major test failures:

 * t1400-update-ref.sh - Reads from .git/{refs,logs} directly
 * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
 * t1405 - inspecs .git/ directly.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/clone.c               | 2 +-
 refs.c                        | 6 +++---
 t/t1409-avoid-packing-refs.sh | 6 ++++++
 t/t1450-fsck.sh               | 6 ++++++
 t/t3210-pack-refs.sh          | 6 ++++++
 t/t9903-bash-prompt.sh        | 6 ++++++
 t/test-lib.sh                 | 5 +++++
 7 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 8910af03d95..3deb24a7040 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1112,7 +1112,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
-		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
+		default_ref_storage(), INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/refs.c b/refs.c
index bd1f3cc0e45..ba5239b8421 100644
--- a/refs.c
+++ b/refs.c
@@ -1748,7 +1748,7 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	r->refs_private = ref_store_init(r->gitdir,
 					 r->ref_storage_format ?
 						 r->ref_storage_format :
-						 DEFAULT_REF_STORAGE,
+						 default_ref_storage(),
 					 REF_STORE_ALL_CAPS);
 	if (getenv("GIT_DEBUG_REFS")) {
 		r->refs_private = debug_wrap(r->refs_private);
@@ -1807,7 +1807,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
+	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1821,7 +1821,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
-	const char *format = DEFAULT_REF_STORAGE; /* XXX */
+	const char *format = default_ref_storage();
 	struct ref_store *refs;
 	const char *id;
 
diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
index be12fb63506..c6f78325563 100755
--- a/t/t1409-avoid-packing-refs.sh
+++ b/t/t1409-avoid-packing-refs.sh
@@ -4,6 +4,12 @@ test_description='avoid rewriting packed-refs unnecessarily'
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 # Add an identifying mark to the packed-refs file header line. This
 # shouldn't upset readers, and it should be omitted if the file is
 # ever rewritten.
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 344a2aad82f..09669203249 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -8,6 +8,12 @@ test_description='git fsck random collection of tests
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success setup '
 	test_oid_init &&
 	git config gc.auto 0 &&
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index f41b2afb996..edaef2c175a 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -11,6 +11,12 @@ semantic is still the same.
 '
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success 'enable reflogs' '
 	git config core.logallrefupdates true
 '
diff --git a/t/t9903-bash-prompt.sh b/t/t9903-bash-prompt.sh
index ab5da2cabc4..5deaaf968b0 100755
--- a/t/t9903-bash-prompt.sh
+++ b/t/t9903-bash-prompt.sh
@@ -7,6 +7,12 @@ test_description='test git-specific bash prompt functions'
 
 . ./lib-bash.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping tests; incompatible with reftable'
+  test_done
+fi
+
 . "$GIT_BUILD_DIR/contrib/completion/git-prompt.sh"
 
 actual="$TRASH_DIRECTORY/actual"
diff --git a/t/test-lib.sh b/t/test-lib.sh
index dbc027ff267..3ce9b957b1b 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1504,6 +1504,11 @@ parisc* | hppa*)
 	;;
 esac
 
+if test -n "$GIT_TEST_REFTABLE"
+then
+  test_set_prereq REFTABLE
+fi
+
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v17 17/17] Add "test-tool dump-reftable" command.
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (15 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 16/17] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-06-16 19:20                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  17 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-16 19:20 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This command dumps individual tables or a stack of of tables.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile                 | 1 +
 t/helper/test-reftable.c | 5 +++++
 t/helper/test-tool.c     | 1 +
 t/helper/test-tool.h     | 1 +
 4 files changed, 8 insertions(+)

diff --git a/Makefile b/Makefile
index 9c71bbdcbb8..c79b3d83876 100644
--- a/Makefile
+++ b/Makefile
@@ -2375,6 +2375,7 @@ REFTABLE_OBJS += reftable/writer.o
 REFTABLE_OBJS += reftable/zlib-compat.o
 
 REFTABLE_TEST_OBJS += reftable/block_test.o
+REFTABLE_TEST_OBJS += reftable/dump.o
 REFTABLE_TEST_OBJS += reftable/merged_test.o
 REFTABLE_TEST_OBJS += reftable/record_test.o
 REFTABLE_TEST_OBJS += reftable/refname_test.o
diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c
index 95d18ba1fa9..339f4e148d7 100644
--- a/t/helper/test-reftable.c
+++ b/t/helper/test-reftable.c
@@ -13,3 +13,8 @@ int cmd__reftable(int argc, const char **argv)
 	tree_test_main(argc, argv);
 	return 0;
 }
+
+int cmd__dump_reftable(int argc, const char **argv)
+{
+	return reftable_dump_main(argc, (char *const *)argv);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 10366b7b762..9e689f9d2b3 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -53,6 +53,7 @@ static struct test_cmd cmds[] = {
 	{ "read-midx", cmd__read_midx },
 	{ "ref-store", cmd__ref_store },
 	{ "reftable", cmd__reftable },
+	{ "dump-reftable", cmd__dump_reftable },
 	{ "regex", cmd__regex },
 	{ "repository", cmd__repository },
 	{ "revision-walking", cmd__revision_walking },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index d52ba2f5e57..bf833e01d45 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -17,6 +17,7 @@ int cmd__dump_cache_tree(int argc, const char **argv);
 int cmd__dump_fsmonitor(int argc, const char **argv);
 int cmd__dump_split_index(int argc, const char **argv);
 int cmd__dump_untracked_cache(int argc, const char **argv);
+int cmd__dump_reftable(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__genzeros(int argc, const char **argv);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v17 12/17] Reftable support for git-core
  2020-06-16 19:20                                 ` [PATCH v17 12/17] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-06-19 14:24                                   ` SZEDER Gábor
  0 siblings, 0 replies; 409+ messages in thread
From: SZEDER Gábor @ 2020-06-19 14:24 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

On Tue, Jun 16, 2020 at 07:20:37PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> +static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
> +{
> +	struct git_reftable_ref_store *refs =
> +		(struct git_reftable_ref_store *)ref_store;
> +	struct strbuf sb = STRBUF_INIT;
> +
> +	safe_create_dir(refs->reftable_dir, 1);
> +
> +	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
> +	write_file(sb.buf, "ref: refs/.invalid");

The "Backward compatibility" section of the specs says:

  * `.git/HEAD`, a regular file containing `ref: refs/heads/.invalid`.

not "refs/.invalid".

> +	strbuf_reset(&sb);
> +
> +	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
> +	safe_create_dir(sb.buf, 1);
> +	strbuf_reset(&sb);
> +
> +	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
> +	write_file(sb.buf, "this repository uses the reftable format");
> +
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v17 16/17] Add reftable testing infrastructure
  2020-06-16 19:20                                 ` [PATCH v17 16/17] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-06-19 16:03                                   ` SZEDER Gábor
  0 siblings, 0 replies; 409+ messages in thread
From: SZEDER Gábor @ 2020-06-19 16:03 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

On Tue, Jun 16, 2020 at 07:20:41PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> * Add GIT_TEST_REFTABLE environment var to control default ref storage
> 
> * Add test_prerequisite REFTABLE.
> 
> * Skip some tests that are incompatible:

>   * t9903-bash-prompt - The bash mode reads .git/HEAD directly

The patch below fixes this incompatibility, relying on the fact that
the reftable specs clearly specify what must be written to the
placeholder '.git/HEAD'.  You can queue this as a preparatory patch to
this series.

Note, however, that, as poined out in my other email, the current
reftable code doesn't follow the specs WRT what must be written to
'.git/HEAD'.  The patch below follows the code, not the specs, so it
will have to be updated as well.

  ---  >8  ---

Subject: [PATCH] git-prompt: prepare for reftable refs backend

In our git-prompt script we strive to use Bash builtins wherever
possible, because fork()-ing subshells for command substitutions and
fork()+exec()-ing Git commands are expensive on some platforms.  We
even read and parse '.git/HEAD' using Bash builtins to get the name of
the current branch [1].  However, the upcoming reftable refs backend
won't use '.git/HEAD' at all, but will write an invalid refname as
placeholder for backwards compatibility instead, which will break our
git-prompt script.

Update the git-prompt script to recognize the placeholder '.git/HEAD'
written by the reftable backend (its content is specified in the
reftable specs), and then fall back to use 'git symbolic-ref' to get
the name of the current branch.

[1] 3a43c4b5bd (bash prompt: use bash builtins to find out current
    branch, 2011-03-31)

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 contrib/completion/git-prompt.sh | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/contrib/completion/git-prompt.sh b/contrib/completion/git-prompt.sh
index 014cd7c3cf..708d0b4b95 100644
--- a/contrib/completion/git-prompt.sh
+++ b/contrib/completion/git-prompt.sh
@@ -460,10 +460,15 @@ __git_ps1 ()
 			if ! __git_eread "$g/HEAD" head; then
 				return $exit
 			fi
-			# is it a symbolic ref?
 			b="${head#ref: }"
 			if [ "$head" = "$b" ]; then
 				detached=yes
+			elif [ "$b" = "refs/.invalid" ]; then
+				# Reftable
+				b="$(git symbolic-ref HEAD 2>/dev/null)" ||
+				detached=yes
+			fi
+			if [ "$detached" = yes ]; then
 				b="$(
 				case "${GIT_PS1_DESCRIBE_STYLE-}" in
 				(contains)
-- 
2.27.0.365.g477ada09ef


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 00/19] Reftable support git-core
  2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
                                                   ` (16 preceding siblings ...)
  2020-06-16 19:20                                 ` [PATCH v17 17/17] Add "test-tool dump-reftable" command Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                 ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 01/19] lib-t6000.sh: write tag using git-update-ref Han-Wen Nienhuys via GitGitGadget
                                                     ` (19 more replies)
  17 siblings, 20 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend. Based on
hn/refs-cleanup.

Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1

Summary 21561 tests pass 899 tests fail

Some issues:

 * many tests inspect .git/{logs,heads}/ directly.
 * worktrees broken. 

v21

 * reftable/ is now fully imported from upstream.
 * reftable/ uses strbuf
 * fix of the HEAD file.
 * check for reload on every read

Han-Wen Nienhuys (17):
  lib-t6000.sh: write tag using git-update-ref
  checkout: add '\n' to reflog message
  Write pseudorefs through ref backends.
  Make refs_ref_exists public
  Treat BISECT_HEAD as a pseudo ref
  Treat CHERRY_PICK_HEAD as a pseudo ref
  Treat REVERT_HEAD as a pseudo ref
  Move REF_LOG_ONLY to refs-internal.h
  Iterate over the "refs/" namespace in for_each_[raw]ref
  Add .gitattributes for the reftable/ directory
  Add reftable library
  Add standalone build infrastructure for reftable
  Reftable support for git-core
  Hookup unittests for the reftable library.
  Add GIT_DEBUG_REFS debugging mechanism
  Add reftable testing infrastructure
  Add "test-tool dump-reftable" command.

Johannes Schindelin (1):
  vcxproj: adjust for the reftable changes

SZEDER Gábor (1):
  git-prompt: prepare for reftable refs backend

 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   47 +-
 builtin/bisect--helper.c                      |    3 +-
 builtin/checkout.c                            |    5 +-
 builtin/clone.c                               |    3 +-
 builtin/commit.c                              |   34 +-
 builtin/init-db.c                             |   56 +-
 builtin/merge.c                               |    2 +-
 cache.h                                       |    6 +-
 config.mak.uname                              |    2 +-
 contrib/buildsystems/Generators/Vcxproj.pm    |   11 +-
 contrib/completion/git-prompt.sh              |    7 +-
 git-bisect.sh                                 |    4 +-
 path.c                                        |    2 -
 path.h                                        |    9 +-
 refs.c                                        |  157 +-
 refs.h                                        |   16 +
 refs/debug.c                                  |  411 +++++
 refs/files-backend.c                          |  121 +-
 refs/packed-backend.c                         |   21 +-
 refs/refs-internal.h                          |   32 +
 refs/reftable-backend.c                       | 1335 +++++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/BUILD                                |  203 +++
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   33 +
 reftable/VERSION                              |    1 +
 reftable/WORKSPACE                            |   14 +
 reftable/basics.c                             |  215 +++
 reftable/basics.h                             |   53 +
 reftable/block.c                              |  432 ++++++
 reftable/block.h                              |  129 ++
 reftable/block_test.c                         |  157 ++
 reftable/compat.c                             |   98 ++
 reftable/compat.h                             |   48 +
 reftable/constants.h                          |   21 +
 reftable/dump.c                               |  212 +++
 reftable/file.c                               |   95 ++
 reftable/iter.c                               |  242 +++
 reftable/iter.h                               |   72 +
 reftable/merged.c                             |  321 ++++
 reftable/merged.h                             |   39 +
 reftable/merged_test.c                        |  273 ++++
 reftable/pq.c                                 |  115 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  744 +++++++++
 reftable/reader.h                             |   65 +
 reftable/record.c                             | 1126 ++++++++++++++
 reftable/record.h                             |  143 ++
 reftable/record_test.c                        |  410 +++++
 reftable/refname.c                            |  209 +++
 reftable/refname.h                            |   38 +
 reftable/refname_test.c                       |   99 ++
 reftable/reftable-tests.h                     |   22 +
 reftable/reftable.c                           |   90 ++
 reftable/reftable.h                           |  571 +++++++
 reftable/reftable_test.c                      |  631 ++++++++
 reftable/stack.c                              | 1209 +++++++++++++++
 reftable/stack.h                              |   48 +
 reftable/stack_test.c                         |  787 ++++++++++
 reftable/strbuf.c                             |  206 +++
 reftable/strbuf.h                             |   88 ++
 reftable/strbuf_test.c                        |   39 +
 reftable/system.h                             |   53 +
 reftable/test_framework.c                     |   69 +
 reftable/test_framework.h                     |   64 +
 reftable/tree.c                               |   63 +
 reftable/tree.h                               |   34 +
 reftable/tree_test.c                          |   62 +
 reftable/update.sh                            |   19 +
 reftable/writer.c                             |  662 ++++++++
 reftable/writer.h                             |   60 +
 reftable/zlib-compat.c                        |   92 ++
 reftable/zlib.BUILD                           |   36 +
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 sequencer.c                                   |   56 +-
 setup.c                                       |   12 +-
 t/helper/test-reftable.c                      |   20 +
 t/helper/test-tool.c                          |    2 +
 t/helper/test-tool.h                          |    2 +
 t/lib-t6000.sh                                |    5 +-
 t/t0031-reftable.sh                           |  160 ++
 t/t0033-debug-refs.sh                         |   18 +
 t/t1409-avoid-packing-refs.sh                 |    6 +
 t/t1450-fsck.sh                               |    6 +
 t/t3210-pack-refs.sh                          |    6 +
 t/test-lib.sh                                 |    5 +
 wt-status.c                                   |    6 +-
 89 files changed, 12977 insertions(+), 201 deletions(-)
 create mode 100644 refs/debug.c
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/BUILD
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/WORKSPACE
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/compat.c
 create mode 100644 reftable/compat.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/refname_test.c
 create mode 100644 reftable/reftable-tests.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/strbuf.c
 create mode 100644 reftable/strbuf.h
 create mode 100644 reftable/strbuf_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100644 reftable/zlib.BUILD
 create mode 100644 t/helper/test-reftable.c
 create mode 100755 t/t0031-reftable.sh
 create mode 100755 t/t0033-debug-refs.sh


base-commit: 4d328a12d95bbea4e742463974ca03e49933bf7e
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v18
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v18
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v17:

  1:  8304c3d6379 =  1:  b968b795af9 lib-t6000.sh: write tag using git-update-ref
  2:  4012d801e3c =  2:  52102259773 checkout: add '\n' to reflog message
  3:  95a6a1d968e =  3:  15c9dd66e17 Write pseudorefs through ref backends.
  4:  1f8865f4b3e =  4:  a5bce2e3fe6 Make refs_ref_exists public
  5:  7f376a76d84 =  5:  a29d898907a Treat BISECT_HEAD as a pseudo ref
  6:  959c69b5ee4 =  6:  11a690d2b8e Treat CHERRY_PICK_HEAD as a pseudo ref
  7:  3f18475d0d3 =  7:  4e52ec0dbc1 Treat REVERT_HEAD as a pseudo ref
  8:  4981e5395c6 =  8:  37e350af159 Move REF_LOG_ONLY to refs-internal.h
  9:  f452c48ae44 =  9:  468f00eaf67 Iterate over the "refs/" namespace in for_each_[raw]ref
 10:  1fa68d5d34f = 10:  21febeaa81f Add .gitattributes for the reftable/ directory
 11:  86646c834c2 ! 11:  3c84f43cfa0 Add reftable library
     @@ Metadata
       ## Commit message ##
          Add reftable library
      
     -    Reftable is a format for storing the ref database. Its rationale and
     -    specification is in the preceding commit.
     +    Reftable is a format for storing the ref database. The rationale and format
     +    layout is detailed in commit 35e6c47404 ("reftable: file format documentation",
     +    2020-05-20).
      
     -    This imports the upstream library as one big commit. For understanding
     -    the code, it is suggested to read the code in the following order:
     +    This commits imports a fully functioning library for reading and writing
     +    reftables, along with unittests.
     +
     +    It is suggested to read the code in the following order:
      
          * The specification under Documentation/technical/reftable.txt
      
     @@ Commit message
      
          * block.{c,h} - reading and writing blocks.
      
     +    * reader.{c,h} - reading a complete reftable file.
     +
          * writer.{c,h} - writing a complete reftable file.
      
          * merged.{c,h} and pq.{c,h} - reading a stack of reftables
     @@ reftable/LICENSE (new)
      +(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
      +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
      
     - ## reftable/README.md (new) ##
     -@@
     -+
     -+The source code in this directory comes from https://github.com/google/reftable.
     -+
     -+The VERSION file keeps track of the current version of the reftable library.
     -+
     -+To update the library, do:
     -+
     -+   sh reftable/update.sh
     -+
     -+Bugfixes should be accompanied by a test and applied to upstream project at
     -+https://github.com/google/reftable.
     -
       ## reftable/VERSION (new) ##
      @@
     -+2c91c4b305dcbf6500c0806bb1a7fbcfc668510c C: include system.h in compat.h
     ++d7472cbd1afe6bf3da53e94971b1cc79ce183fa8 Remove outdated bits of the README file
      
       ## reftable/basics.c (new) ##
      @@
     @@ reftable/basics.c (new)
      +
      +#include "system.h"
      +
     -+void put_be24(byte *out, uint32_t i)
     ++void put_be24(uint8_t *out, uint32_t i)
      +{
     -+	out[0] = (byte)((i >> 16) & 0xff);
     -+	out[1] = (byte)((i >> 8) & 0xff);
     -+	out[2] = (byte)(i & 0xff);
     ++	out[0] = (uint8_t)((i >> 16) & 0xff);
     ++	out[1] = (uint8_t)((i >> 8) & 0xff);
     ++	out[2] = (uint8_t)(i & 0xff);
      +}
      +
     -+uint32_t get_be24(byte *in)
     ++uint32_t get_be24(uint8_t *in)
      +{
      +	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
      +	       (uint32_t)(in[2]);
     @@ reftable/basics.h (new)
      +
      +/* Bigendian en/decoding of integers */
      +
     -+void put_be24(byte *out, uint32_t i);
     -+uint32_t get_be24(byte *in);
     ++void put_be24(uint8_t *out, uint32_t i);
     ++uint32_t get_be24(uint8_t *in);
      +void put_be16(uint8_t *out, uint16_t i);
      +
      +/*
     @@ reftable/block.c (new)
      +}
      +
      +int block_writer_register_restart(struct block_writer *w, int n, bool restart,
     -+				  struct slice *key);
     ++				  struct strbuf *key);
      +
     -+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
     ++void block_writer_init(struct block_writer *bw, uint8_t typ, uint8_t *buf,
      +		       uint32_t block_size, uint32_t header_off, int hash_size)
      +{
      +	bw->buf = buf;
     @@ reftable/block.c (new)
      +	bw->entries = 0;
      +	bw->restart_len = 0;
      +	bw->last_key.len = 0;
     -+	bw->last_key.canary = SLICE_CANARY;
      +}
      +
     -+byte block_writer_type(struct block_writer *bw)
     ++uint8_t block_writer_type(struct block_writer *bw)
      +{
      +	return bw->buf[bw->header_off];
      +}
     @@ reftable/block.c (new)
      +   success */
      +int block_writer_add(struct block_writer *w, struct reftable_record *rec)
      +{
     -+	struct slice empty = SLICE_INIT;
     -+	struct slice last = w->entries % w->restart_interval == 0 ? empty :
     -+								    w->last_key;
     -+	struct slice out = {
     ++	struct strbuf empty = STRBUF_INIT;
     ++	struct strbuf last =
     ++		w->entries % w->restart_interval == 0 ? empty : w->last_key;
     ++	struct string_view out = {
      +		.buf = w->buf + w->next,
      +		.len = w->block_size - w->next,
     -+		.canary = SLICE_CANARY,
      +	};
      +
     -+	struct slice start = out;
     ++	struct string_view start = out;
      +
      +	bool restart = false;
     -+	struct slice key = SLICE_INIT;
     ++	struct strbuf key = STRBUF_INIT;
      +	int n = 0;
      +
      +	reftable_record_key(rec, &key);
     @@ reftable/block.c (new)
      +				reftable_record_val_type(rec));
      +	if (n < 0)
      +		goto done;
     -+	slice_consume(&out, n);
     ++	string_view_consume(&out, n);
      +
      +	n = reftable_record_encode(rec, out, w->hash_size);
      +	if (n < 0)
      +		goto done;
     -+	slice_consume(&out, n);
     ++	string_view_consume(&out, n);
      +
      +	if (block_writer_register_restart(w, start.len - out.len, restart,
      +					  &key) < 0)
      +		goto done;
      +
     -+	slice_release(&key);
     ++	strbuf_release(&key);
      +	return 0;
      +
      +done:
     -+	slice_release(&key);
     ++	strbuf_release(&key);
      +	return -1;
      +}
      +
      +int block_writer_register_restart(struct block_writer *w, int n, bool restart,
     -+				  struct slice *key)
     ++				  struct strbuf *key)
      +{
      +	int rlen = w->restart_len;
      +	if (rlen >= MAX_RESTARTS) {
     @@ reftable/block.c (new)
      +
      +	w->next += n;
      +
     -+	slice_reset(&w->last_key);
     -+	slice_addbuf(&w->last_key, key);
     ++	strbuf_reset(&w->last_key);
     ++	strbuf_addbuf(&w->last_key, key);
      +	w->entries++;
      +	return 0;
      +}
     @@ reftable/block.c (new)
      +
      +	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
      +		int block_header_skip = 4 + w->header_off;
     -+		byte *compressed = NULL;
     ++		uint8_t *compressed = NULL;
      +		int zresult = 0;
      +		uLongf src_len = w->next - block_header_skip;
      +		size_t dest_cap = src_len;
     @@ reftable/block.c (new)
      +	return w->next;
      +}
      +
     -+byte block_reader_type(struct block_reader *r)
     ++uint8_t block_reader_type(struct block_reader *r)
      +{
      +	return r->block.data[r->header_off];
      +}
     @@ reftable/block.c (new)
      +		      int hash_size)
      +{
      +	uint32_t full_block_size = table_block_size;
     -+	byte typ = block->data[header_off];
     ++	uint8_t typ = block->data[header_off];
      +	uint32_t sz = get_be24(block->data + header_off + 1);
      +
      +	uint16_t restart_count = 0;
      +	uint32_t restart_start = 0;
     -+	byte *restart_bytes = NULL;
     ++	uint8_t *restart_bytes = NULL;
      +
      +	if (!reftable_is_block_type(typ))
      +		return REFTABLE_FORMAT_ERROR;
     @@ reftable/block.c (new)
      +		uLongf src_len = block->len - block_header_skip;
      +		/* Log blocks specify the *uncompressed* size in their header.
      +		 */
     -+		byte *uncompressed = reftable_malloc(sz);
     ++		uint8_t *uncompressed = reftable_malloc(sz);
      +
      +		/* Copy over the block header verbatim. It's not compressed. */
      +		memcpy(uncompressed, block->data, block_header_skip);
     @@ reftable/block.c (new)
      +void block_reader_start(struct block_reader *br, struct block_iter *it)
      +{
      +	it->br = br;
     -+	slice_reset(&it->last_key);
     ++	strbuf_reset(&it->last_key);
      +	it->next_off = br->header_off + 4;
      +}
      +
      +struct restart_find_args {
      +	int error;
     -+	struct slice key;
     ++	struct strbuf key;
      +	struct block_reader *r;
      +};
      +
     @@ reftable/block.c (new)
      +{
      +	struct restart_find_args *a = (struct restart_find_args *)args;
      +	uint32_t off = block_reader_restart_offset(a->r, idx);
     -+	struct slice in = {
     ++	struct string_view in = {
      +		.buf = a->r->block.data + off,
      +		.len = a->r->block_len - off,
     -+		.canary = SLICE_CANARY,
      +	};
      +
      +	/* the restart key is verbatim in the block, so this could avoid the
      +	   alloc for decoding the key */
     -+	struct slice rkey = SLICE_INIT;
     -+	struct slice last_key = SLICE_INIT;
     -+	byte unused_extra;
     ++	struct strbuf rkey = STRBUF_INIT;
     ++	struct strbuf last_key = STRBUF_INIT;
     ++	uint8_t unused_extra;
      +	int n = reftable_decode_key(&rkey, &unused_extra, last_key, in);
      +	int result;
      +	if (n < 0) {
     @@ reftable/block.c (new)
      +		return -1;
      +	}
      +
     -+	result = slice_cmp(&a->key, &rkey);
     -+	slice_release(&rkey);
     ++	result = strbuf_cmp(&a->key, &rkey);
     ++	strbuf_release(&rkey);
      +	return result;
      +}
      +
     @@ reftable/block.c (new)
      +{
      +	dest->br = src->br;
      +	dest->next_off = src->next_off;
     -+	slice_reset(&dest->last_key);
     -+	slice_addbuf(&dest->last_key, &src->last_key);
     ++	strbuf_reset(&dest->last_key);
     ++	strbuf_addbuf(&dest->last_key, &src->last_key);
      +}
      +
      +int block_iter_next(struct block_iter *it, struct reftable_record *rec)
      +{
     -+	struct slice in = {
     ++	struct string_view in = {
      +		.buf = it->br->block.data + it->next_off,
      +		.len = it->br->block_len - it->next_off,
     -+		.canary = SLICE_CANARY,
      +	};
     -+	struct slice start = in;
     -+	struct slice key = SLICE_INIT;
     -+	byte extra = 0;
     ++	struct string_view start = in;
     ++	struct strbuf key = STRBUF_INIT;
     ++	uint8_t extra = 0;
      +	int n = 0;
      +
      +	if (it->next_off >= it->br->block_len)
     @@ reftable/block.c (new)
      +	if (n < 0)
      +		return -1;
      +
     -+	slice_consume(&in, n);
     ++	string_view_consume(&in, n);
      +	n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
      +	if (n < 0)
      +		return -1;
     -+	slice_consume(&in, n);
     ++	string_view_consume(&in, n);
      +
     -+	slice_reset(&it->last_key);
     -+	slice_addbuf(&it->last_key, &key);
     ++	strbuf_reset(&it->last_key);
     ++	strbuf_addbuf(&it->last_key, &key);
      +	it->next_off += start.len - in.len;
     -+	slice_release(&key);
     ++	strbuf_release(&key);
      +	return 0;
      +}
      +
     -+int block_reader_first_key(struct block_reader *br, struct slice *key)
     ++int block_reader_first_key(struct block_reader *br, struct strbuf *key)
      +{
     -+	struct slice empty = SLICE_INIT;
     ++	struct strbuf empty = STRBUF_INIT;
      +	int off = br->header_off + 4;
     -+	struct slice in = {
     ++	struct string_view in = {
      +		.buf = br->block.data + off,
      +		.len = br->block_len - off,
     -+		.canary = SLICE_CANARY,
      +	};
      +
     -+	byte extra = 0;
     ++	uint8_t extra = 0;
      +	int n = reftable_decode_key(key, &extra, empty, in);
      +	if (n < 0)
      +		return n;
     @@ reftable/block.c (new)
      +	return 0;
      +}
      +
     -+int block_iter_seek(struct block_iter *it, struct slice *want)
     ++int block_iter_seek(struct block_iter *it, struct strbuf *want)
      +{
      +	return block_reader_seek(it->br, it, want);
      +}
      +
      +void block_iter_close(struct block_iter *it)
      +{
     -+	slice_release(&it->last_key);
     ++	strbuf_release(&it->last_key);
      +}
      +
      +int block_reader_seek(struct block_reader *br, struct block_iter *it,
     -+		      struct slice *want)
     ++		      struct strbuf *want)
      +{
      +	struct restart_find_args args = {
      +		.key = *want,
      +		.r = br,
      +	};
      +	struct reftable_record rec = reftable_new_record(block_reader_type(br));
     -+	struct slice key = SLICE_INIT;
     ++	struct strbuf key = STRBUF_INIT;
      +	int err = 0;
      +	struct block_iter next = {
     -+		.last_key = SLICE_INIT,
     ++		.last_key = STRBUF_INIT,
      +	};
      +
      +	int i = binsearch(br->restart_count, &restart_key_less, &args);
     @@ reftable/block.c (new)
      +			goto done;
      +
      +		reftable_record_key(&rec, &key);
     -+		if (err > 0 || slice_cmp(&key, want) >= 0) {
     ++		if (err > 0 || strbuf_cmp(&key, want) >= 0) {
      +			err = 0;
      +			goto done;
      +		}
     @@ reftable/block.c (new)
      +	}
      +
      +done:
     -+	slice_release(&key);
     -+	slice_release(&next.last_key);
     ++	strbuf_release(&key);
     ++	strbuf_release(&next.last_key);
      +	reftable_record_destroy(&rec);
      +
      +	return err;
     @@ reftable/block.c (new)
      +void block_writer_clear(struct block_writer *bw)
      +{
      +	FREE_AND_NULL(bw->restarts);
     -+	slice_release(&bw->last_key);
     ++	strbuf_release(&bw->last_key);
      +	/* the block is not owned. */
      +}
      
     @@ reftable/block.h (new)
      +  allocation overhead.
      +*/
      +struct block_writer {
     -+	byte *buf;
     ++	uint8_t *buf;
      +	uint32_t block_size;
      +
      +	/* Offset ofof the global header. Nonzero in the first block only. */
     @@ reftable/block.h (new)
      +	int restart_interval;
      +	int hash_size;
      +
     -+	/* Offset of next byte to write. */
     ++	/* Offset of next uint8_t to write. */
      +	uint32_t next;
      +	uint32_t *restarts;
      +	uint32_t restart_len;
      +	uint32_t restart_cap;
      +
     -+	struct slice last_key;
     ++	struct strbuf last_key;
      +	int entries;
      +};
      +
      +/*
      +  initializes the blockwriter to write `typ` entries, using `buf` as temporary
      +  storage. `buf` is not owned by the block_writer. */
     -+void block_writer_init(struct block_writer *bw, byte typ, byte *buf,
     ++void block_writer_init(struct block_writer *bw, uint8_t typ, uint8_t *buf,
      +		       uint32_t block_size, uint32_t header_off, int hash_size);
      +
      +/*
      +  returns the block type (eg. 'r' for ref records.
      +*/
     -+byte block_writer_type(struct block_writer *bw);
     ++uint8_t block_writer_type(struct block_writer *bw);
      +
      +/* appends the record, or -1 if it doesn't fit. */
      +int block_writer_add(struct block_writer *w, struct reftable_record *rec);
     @@ reftable/block.h (new)
      +
      +	/* size of the data, excluding restart data. */
      +	uint32_t block_len;
     -+	byte *restart_bytes;
     ++	uint8_t *restart_bytes;
      +	uint16_t restart_count;
      +
      +	/* size of the data in the file. For log blocks, this is the compressed
     @@ reftable/block.h (new)
      +	struct block_reader *br;
      +
      +	/* key for last entry we read. */
     -+	struct slice last_key;
     ++	struct strbuf last_key;
      +};
      +
      +/* initializes a block reader. */
     @@ reftable/block.h (new)
      +
      +/* Position `it` to the `want` key in the block */
      +int block_reader_seek(struct block_reader *br, struct block_iter *it,
     -+		      struct slice *want);
     ++		      struct strbuf *want);
      +
      +/* Returns the block type (eg. 'r' for refs) */
     -+byte block_reader_type(struct block_reader *r);
     ++uint8_t block_reader_type(struct block_reader *r);
      +
      +/* Decodes the first key in the block */
     -+int block_reader_first_key(struct block_reader *br, struct slice *key);
     ++int block_reader_first_key(struct block_reader *br, struct strbuf *key);
      +
      +void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
      +
     @@ reftable/block.h (new)
      +int block_iter_next(struct block_iter *it, struct reftable_record *rec);
      +
      +/* Seek to `want` with in the block pointed to by `it` */
     -+int block_iter_seek(struct block_iter *it, struct slice *want);
     ++int block_iter_seek(struct block_iter *it, struct strbuf *want);
      +
      +/* deallocate memory for `it`. The block reader and its block is left intact. */
      +void block_iter_close(struct block_iter *it);
     @@ reftable/block_test.c (new)
      +	const int block_size = 1024;
      +	struct reftable_block block = { 0 };
      +	struct block_writer bw = {
     -+		.last_key = SLICE_INIT,
     ++		.last_key = STRBUF_INIT,
      +	};
      +	struct reftable_ref_record ref = { 0 };
      +	struct reftable_record rec = { 0 };
      +	int i = 0;
      +	int n;
      +	struct block_reader br = { 0 };
     -+	struct block_iter it = { .last_key = SLICE_INIT };
     ++	struct block_iter it = { .last_key = STRBUF_INIT };
      +	int j = 0;
     -+	struct slice want = SLICE_INIT;
     ++	struct strbuf want = STRBUF_INIT;
      +
      +	block.data = reftable_calloc(block_size);
      +	block.len = block_size;
     @@ reftable/block_test.c (new)
      +
      +	for (i = 0; i < N; i++) {
      +		char name[100];
     -+		byte hash[SHA1_SIZE];
     ++		uint8_t hash[SHA1_SIZE];
      +		snprintf(name, sizeof(name), "branch%02d", i);
      +		memset(hash, i, sizeof(hash));
      +
     @@ reftable/block_test.c (new)
      +	block_iter_close(&it);
      +
      +	for (i = 0; i < N; i++) {
     -+		struct block_iter it = { .last_key = SLICE_INIT };
     -+		slice_reset(&want);
     -+		slice_addstr(&want, names[i]);
     ++		struct block_iter it = { .last_key = STRBUF_INIT };
     ++		strbuf_reset(&want);
     ++		strbuf_addstr(&want, names[i]);
      +
      +		n = block_reader_seek(&br, &it, &want);
      +		assert(n == 0);
     @@ reftable/block_test.c (new)
      +
      +	reftable_record_clear(&rec);
      +	reftable_block_done(&br.block);
     -+	slice_release(&want);
     ++	strbuf_release(&want);
      +	for (i = 0; i < N; i++) {
      +		reftable_free(names[i]);
      +	}
     @@ reftable/compat.c (new)
      +#include "system.h"
      +#include "basics.h"
      +
     -+#ifndef REFTABLE_IN_GITCORE
     ++#ifdef REFTABLE_STANDALONE
      +
      +#include <dirent.h>
      +
      +void put_be32(void *p, uint32_t i)
      +{
     -+	byte *out = (byte *)p;
     ++	uint8_t *out = (uint8_t *)p;
      +
      +	out[0] = (uint8_t)((i >> 24) & 0xff);
      +	out[1] = (uint8_t)((i >> 16) & 0xff);
     @@ reftable/compat.c (new)
      +
      +void put_be64(void *p, uint64_t v)
      +{
     -+	byte *out = (byte *)p;
     ++	uint8_t *out = (uint8_t *)p;
      +	int i = sizeof(uint64_t);
      +	while (i--) {
      +		out[i] = (uint8_t)(v & 0xff);
     @@ reftable/compat.c (new)
      +	}
      +}
      +
     -+uint64_t get_be64(uint8_t *out)
     ++uint64_t get_be64(void *out)
      +{
     ++	uint8_t *bytes = (uint8_t *)out;
      +	uint64_t v = 0;
      +	int i = 0;
      +	for (i = 0; i < sizeof(uint64_t); i++) {
     -+		v = (v << 8) | (uint8_t)(out[i] & 0xff);
     ++		v = (v << 8) | (uint8_t)(bytes[i] & 0xff);
      +	}
      +	return v;
      +}
     @@ reftable/compat.h (new)
      +
      +#include "system.h"
      +
     -+#ifndef REFTABLE_IN_GITCORE
     ++#ifdef REFTABLE_STANDALONE
      +
      +/* functions that git-core provides, for standalone compilation */
      +#include <stdint.h>
      +
     -+uint64_t get_be64(uint8_t *in);
     ++uint64_t get_be64(void *in);
      +void put_be64(void *out, uint64_t i);
      +
      +void put_be32(void *out, uint32_t i);
     @@ reftable/file.c (new)
      +	return 0;
      +}
      +
     -+int reftable_fd_write(void *arg, byte *data, size_t sz)
     ++int reftable_fd_write(void *arg, const void *data, size_t sz)
      +{
      +	int *fdp = (int *)arg;
      +	return write(*fdp, data, sz);
     @@ reftable/iter.c (new)
      +{
      +	struct filtering_ref_iterator *fri =
      +		(struct filtering_ref_iterator *)iter_arg;
     -+	slice_release(&fri->oid);
     ++	strbuf_release(&fri->oid);
      +	reftable_iterator_destroy(&fri->it);
      +}
      +
     @@ reftable/iter.c (new)
      +	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
      +	block_iter_close(&it->cur);
      +	reftable_block_done(&it->block_reader.block);
     -+	slice_release(&it->oid);
     ++	strbuf_release(&it->oid);
      +}
      +
      +static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
     @@ reftable/iter.c (new)
      +}
      +
      +int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
     -+			       struct reftable_reader *r, byte *oid,
     ++			       struct reftable_reader *r, uint8_t *oid,
      +			       int oid_len, uint64_t *offsets, int offset_len)
      +{
      +	struct indexed_table_ref_iter empty = INDEXED_TABLE_REF_ITER_INIT;
     @@ reftable/iter.c (new)
      +
      +	*itr = empty;
      +	itr->r = r;
     -+	slice_add(&itr->oid, oid, oid_len);
     ++	strbuf_add(&itr->oid, oid, oid_len);
      +
      +	itr->offsets = offsets;
      +	itr->offset_len = offset_len;
     @@ reftable/iter.h (new)
      +
      +#include "block.h"
      +#include "record.h"
     -+#include "slice.h"
     ++#include "strbuf.h"
      +
      +struct reftable_iterator_vtable {
      +	int (*next)(void *iter_arg, struct reftable_record *rec);
     @@ reftable/iter.h (new)
      +struct filtering_ref_iterator {
      +	bool double_check;
      +	struct reftable_table tab;
     -+	struct slice oid;
     ++	struct strbuf oid;
      +	struct reftable_iterator it;
      +};
      +#define FILTERING_REF_ITERATOR_INIT \
      +	{                           \
     -+		.oid = SLICE_INIT   \
     ++		.oid = STRBUF_INIT  \
      +	}
      +
      +void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
     @@ reftable/iter.h (new)
      + */
      +struct indexed_table_ref_iter {
      +	struct reftable_reader *r;
     -+	struct slice oid;
     ++	struct strbuf oid;
      +
      +	/* mutable */
      +	uint64_t *offsets;
     @@ reftable/iter.h (new)
      +	bool finished;
      +};
      +
     -+#define INDEXED_TABLE_REF_ITER_INIT                                   \
     -+	{                                                             \
     -+		.cur = { .last_key = SLICE_INIT }, .oid = SLICE_INIT, \
     ++#define INDEXED_TABLE_REF_ITER_INIT                                     \
     ++	{                                                               \
     ++		.cur = { .last_key = STRBUF_INIT }, .oid = STRBUF_INIT, \
      +	}
      +
      +void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
      +					  struct indexed_table_ref_iter *itr);
      +int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
     -+			       struct reftable_reader *r, byte *oid,
     ++			       struct reftable_reader *r, uint8_t *oid,
      +			       int oid_len, uint64_t *offsets, int offset_len);
      +
      +#endif
     @@ reftable/merged.c (new)
      +#include "iter.h"
      +#include "pq.h"
      +#include "reader.h"
     ++#include "record.h"
      +
      +static int merged_iter_init(struct merged_iter *mi)
      +{
     @@ reftable/merged.c (new)
      +static int merged_iter_next_entry(struct merged_iter *mi,
      +				  struct reftable_record *rec)
      +{
     -+	struct slice entry_key = SLICE_INIT;
     ++	struct strbuf entry_key = STRBUF_INIT;
      +	struct pq_entry entry = { 0 };
      +	int err = 0;
      +
     @@ reftable/merged.c (new)
      +	reftable_record_key(&entry.rec, &entry_key);
      +	while (!merged_iter_pqueue_is_empty(mi->pq)) {
      +		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
     -+		struct slice k = SLICE_INIT;
     ++		struct strbuf k = STRBUF_INIT;
      +		int err = 0, cmp = 0;
      +
      +		reftable_record_key(&top.rec, &k);
      +
     -+		cmp = slice_cmp(&k, &entry_key);
     -+		slice_release(&k);
     ++		cmp = strbuf_cmp(&k, &entry_key);
     ++		strbuf_release(&k);
      +
      +		if (cmp > 0) {
      +			break;
     @@ reftable/merged.c (new)
      +
      +	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
      +	reftable_record_destroy(&entry.rec);
     -+	slice_release(&entry_key);
     ++	strbuf_release(&entry_key);
      +	return 0;
      +}
      +
     @@ reftable/merged.h (new)
      +	struct reftable_iterator *stack;
      +	uint32_t hash_id;
      +	int stack_len;
     -+	byte typ;
     ++	uint8_t typ;
      +	bool suppress_deletions;
      +	struct merged_iter_pqueue pq;
      +};
     @@ reftable/merged_test.c (new)
      +	merged_iter_pqueue_clear(&pq);
      +}
      +
     -+static void write_test_table(struct slice *buf,
     ++static void write_test_table(struct strbuf *buf,
      +			     struct reftable_ref_record refs[], int n)
      +{
      +	int min = 0xffffffff;
     @@ reftable/merged_test.c (new)
      +		}
      +	}
      +
     -+	w = reftable_new_writer(&slice_add_void, buf, &opts);
     ++	w = reftable_new_writer(&strbuf_add_void, buf, &opts);
      +	reftable_writer_set_limits(w, min, max);
      +
      +	for (i = 0; i < n; i++) {
     @@ reftable/merged_test.c (new)
      +static struct reftable_merged_table *
      +merged_table_from_records(struct reftable_ref_record **refs,
      +			  struct reftable_block_source **source, int *sizes,
     -+			  struct slice *buf, int n)
     ++			  struct strbuf *buf, int n)
      +{
      +	struct reftable_reader **rd = reftable_calloc(n * sizeof(*rd));
      +	int i = 0;
     @@ reftable/merged_test.c (new)
      +	*source = reftable_calloc(n * sizeof(**source));
      +	for (i = 0; i < n; i++) {
      +		write_test_table(&buf[i], refs[i], sizes[i]);
     -+		block_source_from_slice(&(*source)[i], &buf[i]);
     ++		block_source_from_strbuf(&(*source)[i], &buf[i]);
      +
      +		err = reftable_new_reader(&rd[i], &(*source)[i], "name");
      +		assert_err(err);
     @@ reftable/merged_test.c (new)
      +
      +static void test_merged_between(void)
      +{
     -+	byte hash1[SHA1_SIZE] = { 1, 2, 3, 0 };
     ++	uint8_t hash1[SHA1_SIZE] = { 1, 2, 3, 0 };
      +
      +	struct reftable_ref_record r1[] = { {
      +		.ref_name = "b",
     @@ reftable/merged_test.c (new)
      +
      +	struct reftable_ref_record *refs[] = { r1, r2 };
      +	int sizes[] = { 1, 1 };
     -+	struct slice bufs[2] = { SLICE_INIT, SLICE_INIT };
     ++	struct strbuf bufs[2] = { STRBUF_INIT, STRBUF_INIT };
      +	struct reftable_block_source *bs = NULL;
      +	struct reftable_merged_table *mt =
      +		merged_table_from_records(refs, &bs, sizes, bufs, 2);
     @@ reftable/merged_test.c (new)
      +	reftable_merged_table_close(mt);
      +	reftable_merged_table_free(mt);
      +	for (i = 0; i < ARRAY_SIZE(bufs); i++) {
     -+		slice_release(&bufs[i]);
     ++		strbuf_release(&bufs[i]);
      +	}
      +	reftable_free(bs);
      +}
      +
      +static void test_merged(void)
      +{
     -+	byte hash1[SHA1_SIZE] = { 1 };
     -+	byte hash2[SHA1_SIZE] = { 2 };
     ++	uint8_t hash1[SHA1_SIZE] = { 1 };
     ++	uint8_t hash2[SHA1_SIZE] = { 2 };
      +	struct reftable_ref_record r1[] = { {
      +						    .ref_name = "a",
      +						    .update_index = 1,
     @@ reftable/merged_test.c (new)
      +
      +	struct reftable_ref_record *refs[] = { r1, r2, r3 };
      +	int sizes[3] = { 3, 1, 2 };
     -+	struct slice bufs[3] = { SLICE_INIT, SLICE_INIT, SLICE_INIT };
     ++	struct strbuf bufs[3] = { STRBUF_INIT, STRBUF_INIT, STRBUF_INIT };
      +	struct reftable_block_source *bs = NULL;
      +
      +	struct reftable_merged_table *mt =
     @@ reftable/merged_test.c (new)
      +	reftable_free(out);
      +
      +	for (i = 0; i < 3; i++) {
     -+		slice_release(&bufs[i]);
     ++		strbuf_release(&bufs[i]);
      +	}
      +	reftable_merged_table_close(mt);
      +	reftable_merged_table_free(mt);
     @@ reftable/pq.c (new)
      +
      +#include "pq.h"
      +
     ++#include "reftable.h"
      +#include "system.h"
     ++#include "basics.h"
      +
      +int pq_less(struct pq_entry a, struct pq_entry b)
      +{
     -+	struct slice ak = SLICE_INIT;
     -+	struct slice bk = SLICE_INIT;
     ++	struct strbuf ak = STRBUF_INIT;
     ++	struct strbuf bk = STRBUF_INIT;
      +	int cmp = 0;
      +	reftable_record_key(&a.rec, &ak);
      +	reftable_record_key(&b.rec, &bk);
      +
     -+	cmp = slice_cmp(&ak, &bk);
     ++	cmp = strbuf_cmp(&ak, &bk);
      +
     -+	slice_release(&ak);
     -+	slice_release(&bk);
     ++	strbuf_release(&ak);
     ++	strbuf_release(&bk);
      +
      +	if (cmp == 0)
      +		return a.index > b.index;
     @@ reftable/reader.c (new)
      +}
      +
      +static struct reftable_reader_offsets *
     -+reader_offsets_for(struct reftable_reader *r, byte typ)
     ++reader_offsets_for(struct reftable_reader *r, uint8_t typ)
      +{
      +	switch (typ) {
      +	case BLOCK_TYPE_REF:
     @@ reftable/reader.c (new)
      +	return r->name;
      +}
      +
     -+static int parse_footer(struct reftable_reader *r, byte *footer, byte *header)
     ++static int parse_footer(struct reftable_reader *r, uint8_t *footer,
     ++			uint8_t *header)
      +{
     -+	byte *f = footer;
     -+	byte first_block_typ;
     ++	uint8_t *f = footer;
     ++	uint8_t first_block_typ;
      +	int err = 0;
      +	uint32_t computed_crc;
      +	uint32_t file_crc;
     @@ reftable/reader.c (new)
      +
      +struct table_iter {
      +	struct reftable_reader *r;
     -+	byte typ;
     ++	uint8_t typ;
      +	uint64_t block_off;
      +	struct block_iter bi;
      +	bool finished;
      +};
     -+#define TABLE_ITER_INIT                         \
     -+	{                                       \
     -+		.bi = {.last_key = SLICE_INIT } \
     ++#define TABLE_ITER_INIT                          \
     ++	{                                        \
     ++		.bi = {.last_key = STRBUF_INIT } \
      +	}
      +
      +static void table_iter_copy_from(struct table_iter *dest,
     @@ reftable/reader.c (new)
      +	ti->bi.next_off = 0;
      +}
      +
     -+static int32_t extract_block_size(byte *data, byte *typ, uint64_t off,
     ++static int32_t extract_block_size(uint8_t *data, uint8_t *typ, uint64_t off,
      +				  int version)
      +{
      +	int32_t result = 0;
     @@ reftable/reader.c (new)
      +}
      +
      +int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
     -+			     uint64_t next_off, byte want_typ)
     ++			     uint64_t next_off, uint8_t want_typ)
      +{
      +	int32_t guess_block_size = r->block_size ? r->block_size :
      +						   DEFAULT_BLOCK_SIZE;
      +	struct reftable_block block = { 0 };
     -+	byte block_typ = 0;
     ++	uint8_t block_typ = 0;
      +	int err = 0;
      +	uint32_t header_off = next_off ? 0 : header_size(r->version);
      +	int32_t block_size = 0;
     @@ reftable/reader.c (new)
      +}
      +
      +static int reader_table_iter_at(struct reftable_reader *r,
     -+				struct table_iter *ti, uint64_t off, byte typ)
     ++				struct table_iter *ti, uint64_t off,
     ++				uint8_t typ)
      +{
      +	struct block_reader br = { 0 };
      +	struct block_reader *brp = NULL;
     @@ reftable/reader.c (new)
      +}
      +
      +static int reader_start(struct reftable_reader *r, struct table_iter *ti,
     -+			byte typ, bool index)
     ++			uint8_t typ, bool index)
      +{
      +	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
      +	uint64_t off = offs->offset;
     @@ reftable/reader.c (new)
      +{
      +	struct reftable_record rec =
      +		reftable_new_record(reftable_record_type(want));
     -+	struct slice want_key = SLICE_INIT;
     -+	struct slice got_key = SLICE_INIT;
     ++	struct strbuf want_key = STRBUF_INIT;
     ++	struct strbuf got_key = STRBUF_INIT;
      +	struct table_iter next = TABLE_ITER_INIT;
      +	int err = -1;
      +
     @@ reftable/reader.c (new)
      +		if (err < 0)
      +			goto done;
      +
     -+		if (slice_cmp(&got_key, &want_key) > 0) {
     ++		if (strbuf_cmp(&got_key, &want_key) > 0) {
      +			table_iter_block_done(&next);
      +			break;
      +		}
     @@ reftable/reader.c (new)
      +done:
      +	block_iter_close(&next.bi);
      +	reftable_record_destroy(&rec);
     -+	slice_release(&want_key);
     -+	slice_release(&got_key);
     ++	strbuf_release(&want_key);
     ++	strbuf_release(&got_key);
      +	return err;
      +}
      +
     @@ reftable/reader.c (new)
      +			       struct reftable_iterator *it,
      +			       struct reftable_record *rec)
      +{
     -+	struct reftable_index_record want_index = { .last_key = SLICE_INIT };
     ++	struct reftable_index_record want_index = { .last_key = STRBUF_INIT };
      +	struct reftable_record want_index_rec = { 0 };
     -+	struct reftable_index_record index_result = { .last_key = SLICE_INIT };
     ++	struct reftable_index_record index_result = { .last_key = STRBUF_INIT };
      +	struct reftable_record index_result_rec = { 0 };
      +	struct table_iter index_iter = TABLE_ITER_INIT;
      +	struct table_iter next = TABLE_ITER_INIT;
     @@ reftable/reader.c (new)
      +int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
      +		struct reftable_record *rec)
      +{
     -+	byte typ = reftable_record_type(rec);
     ++	uint8_t typ = reftable_record_type(rec);
      +
      +	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
      +	if (!offs->present) {
     @@ reftable/reader.c (new)
      +
      +static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
      +					    struct reftable_iterator *it,
     -+					    byte *oid)
     ++					    uint8_t *oid)
      +{
      +	struct reftable_obj_record want = {
      +		.hash_prefix = oid,
     @@ reftable/reader.c (new)
      +
      +static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
      +					      struct reftable_iterator *it,
     -+					      byte *oid)
     ++					      uint8_t *oid)
      +{
      +	struct table_iter ti_empty = TABLE_ITER_INIT;
      +	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
     @@ reftable/reader.c (new)
      +	filter = reftable_malloc(sizeof(struct filtering_ref_iterator));
      +	*filter = empty;
      +
     -+	slice_add(&filter->oid, oid, oid_len);
     ++	strbuf_add(&filter->oid, oid, oid_len);
      +	reftable_table_from_reader(&filter->tab, r);
      +	filter->double_check = false;
      +	iterator_from_table_iter(&filter->it, ti);
     @@ reftable/reader.c (new)
      +}
      +
      +int reftable_reader_refs_for(struct reftable_reader *r,
     -+			     struct reftable_iterator *it, byte *oid)
     ++			     struct reftable_iterator *it, uint8_t *oid)
      +{
      +	if (r->obj_offsets.present)
      +		return reftable_reader_refs_for_indexed(r, it, oid);
     @@ reftable/reader.h (new)
      +
      +/* initialize a block reader to read from `r` */
      +int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
     -+			     uint64_t next_off, byte want_typ);
     ++			     uint64_t next_off, uint8_t want_typ);
      +
      +#endif
      
     @@ reftable/record.c (new)
      +#include "record.h"
      +
      +#include "system.h"
     -+
      +#include "constants.h"
      +#include "reftable.h"
     ++#include "basics.h"
      +
     -+int get_var_int(uint64_t *dest, struct slice *in)
     ++int get_var_int(uint64_t *dest, struct string_view *in)
      +{
      +	int ptr = 0;
      +	uint64_t val;
     @@ reftable/record.c (new)
      +	return ptr + 1;
      +}
      +
     -+int put_var_int(struct slice *dest, uint64_t val)
     ++int put_var_int(struct string_view *dest, uint64_t val)
      +{
     -+	byte buf[10] = { 0 };
     ++	uint8_t buf[10] = { 0 };
      +	int i = 9;
      +	int n = 0;
     -+	buf[i] = (byte)(val & 0x7f);
     ++	buf[i] = (uint8_t)(val & 0x7f);
      +	i--;
      +	while (true) {
      +		val >>= 7;
     @@ reftable/record.c (new)
      +			break;
      +		}
      +		val--;
     -+		buf[i] = 0x80 | (byte)(val & 0x7f);
     ++		buf[i] = 0x80 | (uint8_t)(val & 0x7f);
      +		i--;
      +	}
      +
     @@ reftable/record.c (new)
      +	return n;
      +}
      +
     -+int reftable_is_block_type(byte typ)
     ++int reftable_is_block_type(uint8_t typ)
      +{
      +	switch (typ) {
      +	case BLOCK_TYPE_REF:
     @@ reftable/record.c (new)
      +	return false;
      +}
      +
     -+static int decode_string(struct slice *dest, struct slice in)
     ++static int decode_string(struct strbuf *dest, struct string_view in)
      +{
      +	int start_len = in.len;
      +	uint64_t tsize = 0;
      +	int n = get_var_int(&tsize, &in);
      +	if (n <= 0)
      +		return -1;
     -+	slice_consume(&in, n);
     ++	string_view_consume(&in, n);
      +	if (in.len < tsize)
      +		return -1;
      +
     -+	slice_reset(dest);
     -+	slice_add(dest, in.buf, tsize);
     -+	slice_consume(&in, tsize);
     ++	strbuf_reset(dest);
     ++	strbuf_add(dest, in.buf, tsize);
     ++	string_view_consume(&in, tsize);
      +
      +	return start_len - in.len;
      +}
      +
     -+static int encode_string(char *str, struct slice s)
     ++static int encode_string(char *str, struct string_view s)
      +{
     -+	struct slice start = s;
     ++	struct string_view start = s;
      +	int l = strlen(str);
      +	int n = put_var_int(&s, l);
      +	if (n < 0)
      +		return -1;
     -+	slice_consume(&s, n);
     ++	string_view_consume(&s, n);
      +	if (s.len < l)
      +		return -1;
      +	memcpy(s.buf, str, l);
     -+	slice_consume(&s, l);
     ++	string_view_consume(&s, l);
      +
      +	return start.len - s.len;
      +}
      +
     -+int reftable_encode_key(bool *restart, struct slice dest, struct slice prev_key,
     -+			struct slice key, byte extra)
     ++int reftable_encode_key(bool *restart, struct string_view dest,
     ++			struct strbuf prev_key, struct strbuf key,
     ++			uint8_t extra)
      +{
     -+	struct slice start = dest;
     ++	struct string_view start = dest;
      +	int prefix_len = common_prefix_size(&prev_key, &key);
      +	uint64_t suffix_len = key.len - prefix_len;
      +	int n = put_var_int(&dest, (uint64_t)prefix_len);
      +	if (n < 0)
      +		return -1;
     -+	slice_consume(&dest, n);
     ++	string_view_consume(&dest, n);
      +
      +	*restart = (prefix_len == 0);
      +
      +	n = put_var_int(&dest, suffix_len << 3 | (uint64_t)extra);
      +	if (n < 0)
      +		return -1;
     -+	slice_consume(&dest, n);
     ++	string_view_consume(&dest, n);
      +
      +	if (dest.len < suffix_len)
      +		return -1;
      +	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
     -+	slice_consume(&dest, suffix_len);
     ++	string_view_consume(&dest, suffix_len);
      +
      +	return start.len - dest.len;
      +}
      +
     -+int reftable_decode_key(struct slice *key, byte *extra, struct slice last_key,
     -+			struct slice in)
     ++int reftable_decode_key(struct strbuf *key, uint8_t *extra,
     ++			struct strbuf last_key, struct string_view in)
      +{
      +	int start_len = in.len;
      +	uint64_t prefix_len = 0;
     @@ reftable/record.c (new)
      +	int n = get_var_int(&prefix_len, &in);
      +	if (n < 0)
      +		return -1;
     -+	slice_consume(&in, n);
     ++	string_view_consume(&in, n);
      +
      +	if (prefix_len > last_key.len)
      +		return -1;
     @@ reftable/record.c (new)
      +	n = get_var_int(&suffix_len, &in);
      +	if (n <= 0)
      +		return -1;
     -+	slice_consume(&in, n);
     ++	string_view_consume(&in, n);
      +
     -+	*extra = (byte)(suffix_len & 0x7);
     ++	*extra = (uint8_t)(suffix_len & 0x7);
      +	suffix_len >>= 3;
      +
      +	if (in.len < suffix_len)
      +		return -1;
      +
     -+	slice_reset(key);
     -+	slice_add(key, last_key.buf, prefix_len);
     -+	slice_add(key, in.buf, suffix_len);
     -+	slice_consume(&in, suffix_len);
     ++	strbuf_reset(key);
     ++	strbuf_add(key, last_key.buf, prefix_len);
     ++	strbuf_add(key, in.buf, suffix_len);
     ++	string_view_consume(&in, suffix_len);
      +
      +	return start_len - in.len;
      +}
      +
     -+static void reftable_ref_record_key(const void *r, struct slice *dest)
     ++static void reftable_ref_record_key(const void *r, struct strbuf *dest)
      +{
      +	const struct reftable_ref_record *rec =
      +		(const struct reftable_ref_record *)r;
     -+	slice_reset(dest);
     -+	slice_addstr(dest, rec->ref_name);
     ++	strbuf_reset(dest);
     ++	strbuf_addstr(dest, rec->ref_name);
      +}
      +
      +static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
     @@ reftable/record.c (new)
      +	return 'a' + (c - 10);
      +}
      +
     -+static void hex_format(char *dest, byte *src, int hash_size)
     ++static void hex_format(char *dest, uint8_t *src, int hash_size)
      +{
      +	assert(hash_size > 0);
      +	if (src != NULL) {
     @@ reftable/record.c (new)
      +	memset(ref, 0, sizeof(struct reftable_ref_record));
      +}
      +
     -+static byte reftable_ref_record_val_type(const void *rec)
     ++static uint8_t reftable_ref_record_val_type(const void *rec)
      +{
      +	const struct reftable_ref_record *r =
      +		(const struct reftable_ref_record *)rec;
     @@ reftable/record.c (new)
      +	return 0;
      +}
      +
     -+static int reftable_ref_record_encode(const void *rec, struct slice s,
     ++static int reftable_ref_record_encode(const void *rec, struct string_view s,
      +				      int hash_size)
      +{
      +	const struct reftable_ref_record *r =
      +		(const struct reftable_ref_record *)rec;
     -+	struct slice start = s;
     ++	struct string_view start = s;
      +	int n = put_var_int(&s, r->update_index);
      +	assert(hash_size > 0);
      +	if (n < 0)
      +		return -1;
     -+	slice_consume(&s, n);
     ++	string_view_consume(&s, n);
      +
      +	if (r->value != NULL) {
      +		if (s.len < hash_size) {
      +			return -1;
      +		}
      +		memcpy(s.buf, r->value, hash_size);
     -+		slice_consume(&s, hash_size);
     ++		string_view_consume(&s, hash_size);
      +	}
      +
      +	if (r->target_value != NULL) {
     @@ reftable/record.c (new)
      +			return -1;
      +		}
      +		memcpy(s.buf, r->target_value, hash_size);
     -+		slice_consume(&s, hash_size);
     ++		string_view_consume(&s, hash_size);
      +	}
      +
      +	if (r->target != NULL) {
     @@ reftable/record.c (new)
      +		if (n < 0) {
      +			return -1;
      +		}
     -+		slice_consume(&s, n);
     ++		string_view_consume(&s, n);
      +	}
      +
      +	return start.len - s.len;
      +}
      +
     -+static int reftable_ref_record_decode(void *rec, struct slice key,
     -+				      byte val_type, struct slice in,
     ++static int reftable_ref_record_decode(void *rec, struct strbuf key,
     ++				      uint8_t val_type, struct string_view in,
      +				      int hash_size)
      +{
      +	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
     -+	struct slice start = in;
     ++	struct string_view start = in;
      +	bool seen_value = false;
      +	bool seen_target_value = false;
      +	bool seen_target = false;
     @@ reftable/record.c (new)
      +		return n;
      +	assert(hash_size > 0);
      +
     -+	slice_consume(&in, n);
     ++	string_view_consume(&in, n);
      +
      +	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
      +	memcpy(r->ref_name, key.buf, key.len);
     @@ reftable/record.c (new)
      +		}
      +		seen_value = true;
      +		memcpy(r->value, in.buf, hash_size);
     -+		slice_consume(&in, hash_size);
     ++		string_view_consume(&in, hash_size);
      +		if (val_type == 1) {
      +			break;
      +		}
     @@ reftable/record.c (new)
      +		}
      +		seen_target_value = true;
      +		memcpy(r->target_value, in.buf, hash_size);
     -+		slice_consume(&in, hash_size);
     ++		string_view_consume(&in, hash_size);
      +		break;
      +	case 3: {
     -+		struct slice dest = SLICE_INIT;
     ++		struct strbuf dest = STRBUF_INIT;
      +		int n = decode_string(&dest, in);
      +		if (n < 0) {
      +			return -1;
      +		}
     -+		slice_consume(&in, n);
     ++		string_view_consume(&in, n);
      +		seen_target = true;
      +		if (r->target != NULL) {
      +			reftable_free(r->target);
      +		}
     -+		r->target = (char *)slice_as_string(&dest);
     ++		r->target = dest.buf;
      +	} break;
      +
      +	case 0:
     @@ reftable/record.c (new)
      +	.is_deletion = &reftable_ref_record_is_deletion_void,
      +};
      +
     -+static void reftable_obj_record_key(const void *r, struct slice *dest)
     ++static void reftable_obj_record_key(const void *r, struct strbuf *dest)
      +{
      +	const struct reftable_obj_record *rec =
      +		(const struct reftable_obj_record *)r;
     -+	slice_reset(dest);
     -+	slice_add(dest, rec->hash_prefix, rec->hash_prefix_len);
     ++	strbuf_reset(dest);
     ++	strbuf_add(dest, rec->hash_prefix, rec->hash_prefix_len);
      +}
      +
      +static void reftable_obj_record_clear(void *rec)
     @@ reftable/record.c (new)
      +	memcpy(obj->offsets, src->offsets, olen);
      +}
      +
     -+static byte reftable_obj_record_val_type(const void *rec)
     ++static uint8_t reftable_obj_record_val_type(const void *rec)
      +{
      +	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
      +	if (r->offset_len > 0 && r->offset_len < 8)
     @@ reftable/record.c (new)
      +	return 0;
      +}
      +
     -+static int reftable_obj_record_encode(const void *rec, struct slice s,
     ++static int reftable_obj_record_encode(const void *rec, struct string_view s,
      +				      int hash_size)
      +{
      +	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
     -+	struct slice start = s;
     ++	struct string_view start = s;
      +	int i = 0;
      +	int n = 0;
      +	uint64_t last = 0;
     @@ reftable/record.c (new)
      +		if (n < 0) {
      +			return -1;
      +		}
     -+		slice_consume(&s, n);
     ++		string_view_consume(&s, n);
      +	}
      +	if (r->offset_len == 0)
      +		return start.len - s.len;
      +	n = put_var_int(&s, r->offsets[0]);
      +	if (n < 0)
      +		return -1;
     -+	slice_consume(&s, n);
     ++	string_view_consume(&s, n);
      +
      +	last = r->offsets[0];
      +	for (i = 1; i < r->offset_len; i++) {
     @@ reftable/record.c (new)
      +		if (n < 0) {
      +			return -1;
      +		}
     -+		slice_consume(&s, n);
     ++		string_view_consume(&s, n);
      +		last = r->offsets[i];
      +	}
      +	return start.len - s.len;
      +}
      +
     -+static int reftable_obj_record_decode(void *rec, struct slice key,
     -+				      byte val_type, struct slice in,
     ++static int reftable_obj_record_decode(void *rec, struct strbuf key,
     ++				      uint8_t val_type, struct string_view in,
      +				      int hash_size)
      +{
     -+	struct slice start = in;
     ++	struct string_view start = in;
      +	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
      +	uint64_t count = val_type;
      +	int n = 0;
     @@ reftable/record.c (new)
      +			return n;
      +		}
      +
     -+		slice_consume(&in, n);
     ++		string_view_consume(&in, n);
      +	}
      +
      +	r->offsets = NULL;
     @@ reftable/record.c (new)
      +	n = get_var_int(&r->offsets[0], &in);
      +	if (n < 0)
      +		return n;
     -+	slice_consume(&in, n);
     ++	string_view_consume(&in, n);
      +
      +	last = r->offsets[0];
      +	j = 1;
     @@ reftable/record.c (new)
      +		if (n < 0) {
      +			return n;
      +		}
     -+		slice_consume(&in, n);
     ++		string_view_consume(&in, n);
      +
      +		last = r->offsets[j] = (delta + last);
      +		j++;
     @@ reftable/record.c (new)
      +	printf("%s\n\n%s\n}\n", hex, log->message);
      +}
      +
     -+static void reftable_log_record_key(const void *r, struct slice *dest)
     ++static void reftable_log_record_key(const void *r, struct strbuf *dest)
      +{
      +	const struct reftable_log_record *rec =
      +		(const struct reftable_log_record *)r;
      +	int len = strlen(rec->ref_name);
     -+	byte i64[8];
     ++	uint8_t i64[8];
      +	uint64_t ts = 0;
     -+	slice_reset(dest);
     -+	slice_add(dest, (byte *)rec->ref_name, len + 1);
     ++	strbuf_reset(dest);
     ++	strbuf_add(dest, (uint8_t *)rec->ref_name, len + 1);
      +
      +	ts = (~ts) - rec->update_index;
      +	put_be64(&i64[0], ts);
     -+	slice_add(dest, i64, sizeof(i64));
     ++	strbuf_add(dest, i64, sizeof(i64));
      +}
      +
      +static void reftable_log_record_copy_from(void *rec, const void *src_rec,
     @@ reftable/record.c (new)
      +	memset(r, 0, sizeof(struct reftable_log_record));
      +}
      +
     -+static byte reftable_log_record_val_type(const void *rec)
     ++static uint8_t reftable_log_record_val_type(const void *rec)
      +{
      +	const struct reftable_log_record *log =
      +		(const struct reftable_log_record *)rec;
     @@ reftable/record.c (new)
      +	return reftable_log_record_is_deletion(log) ? 0 : 1;
      +}
      +
     -+static byte zero[SHA256_SIZE] = { 0 };
     ++static uint8_t zero[SHA256_SIZE] = { 0 };
      +
     -+static int reftable_log_record_encode(const void *rec, struct slice s,
     ++static int reftable_log_record_encode(const void *rec, struct string_view s,
      +				      int hash_size)
      +{
      +	struct reftable_log_record *r = (struct reftable_log_record *)rec;
     -+	struct slice start = s;
     ++	struct string_view start = s;
      +	int n = 0;
     -+	byte *oldh = r->old_hash;
     -+	byte *newh = r->new_hash;
     ++	uint8_t *oldh = r->old_hash;
     ++	uint8_t *newh = r->new_hash;
      +	if (reftable_log_record_is_deletion(r))
      +		return 0;
      +
     @@ reftable/record.c (new)
      +
      +	memcpy(s.buf, oldh, hash_size);
      +	memcpy(s.buf + hash_size, newh, hash_size);
     -+	slice_consume(&s, 2 * hash_size);
     ++	string_view_consume(&s, 2 * hash_size);
      +
      +	n = encode_string(r->name ? r->name : "", s);
      +	if (n < 0)
      +		return -1;
     -+	slice_consume(&s, n);
     ++	string_view_consume(&s, n);
      +
      +	n = encode_string(r->email ? r->email : "", s);
      +	if (n < 0)
      +		return -1;
     -+	slice_consume(&s, n);
     ++	string_view_consume(&s, n);
      +
      +	n = put_var_int(&s, r->time);
      +	if (n < 0)
      +		return -1;
     -+	slice_consume(&s, n);
     ++	string_view_consume(&s, n);
      +
      +	if (s.len < 2)
      +		return -1;
      +
      +	put_be16(s.buf, r->tz_offset);
     -+	slice_consume(&s, 2);
     ++	string_view_consume(&s, 2);
      +
      +	n = encode_string(r->message ? r->message : "", s);
      +	if (n < 0)
      +		return -1;
     -+	slice_consume(&s, n);
     ++	string_view_consume(&s, n);
      +
      +	return start.len - s.len;
      +}
      +
     -+static int reftable_log_record_decode(void *rec, struct slice key,
     -+				      byte val_type, struct slice in,
     ++static int reftable_log_record_decode(void *rec, struct strbuf key,
     ++				      uint8_t val_type, struct string_view in,
      +				      int hash_size)
      +{
     -+	struct slice start = in;
     ++	struct string_view start = in;
      +	struct reftable_log_record *r = (struct reftable_log_record *)rec;
      +	uint64_t max = 0;
      +	uint64_t ts = 0;
     -+	struct slice dest = SLICE_INIT;
     ++	struct strbuf dest = STRBUF_INIT;
      +	int n;
      +
      +	if (key.len <= 9 || key.buf[key.len - 9] != 0)
     @@ reftable/record.c (new)
      +	memcpy(r->old_hash, in.buf, hash_size);
      +	memcpy(r->new_hash, in.buf + hash_size, hash_size);
      +
     -+	slice_consume(&in, 2 * hash_size);
     ++	string_view_consume(&in, 2 * hash_size);
      +
      +	n = decode_string(&dest, in);
      +	if (n < 0)
      +		goto done;
     -+	slice_consume(&in, n);
     ++	string_view_consume(&in, n);
      +
      +	r->name = reftable_realloc(r->name, dest.len + 1);
      +	memcpy(r->name, dest.buf, dest.len);
      +	r->name[dest.len] = 0;
      +
     -+	slice_reset(&dest);
     ++	strbuf_reset(&dest);
      +	n = decode_string(&dest, in);
      +	if (n < 0)
      +		goto done;
     -+	slice_consume(&in, n);
     ++	string_view_consume(&in, n);
      +
      +	r->email = reftable_realloc(r->email, dest.len + 1);
      +	memcpy(r->email, dest.buf, dest.len);
     @@ reftable/record.c (new)
      +	n = get_var_int(&ts, &in);
      +	if (n < 0)
      +		goto done;
     -+	slice_consume(&in, n);
     ++	string_view_consume(&in, n);
      +	r->time = ts;
      +	if (in.len < 2)
      +		goto done;
      +
      +	r->tz_offset = get_be16(in.buf);
     -+	slice_consume(&in, 2);
     ++	string_view_consume(&in, 2);
      +
     -+	slice_reset(&dest);
     ++	strbuf_reset(&dest);
      +	n = decode_string(&dest, in);
      +	if (n < 0)
      +		goto done;
     -+	slice_consume(&in, n);
     ++	string_view_consume(&in, n);
      +
      +	r->message = reftable_realloc(r->message, dest.len + 1);
      +	memcpy(r->message, dest.buf, dest.len);
      +	r->message[dest.len] = 0;
      +
     -+	slice_release(&dest);
     ++	strbuf_release(&dest);
      +	return start.len - in.len;
      +
      +done:
     -+	slice_release(&dest);
     ++	strbuf_release(&dest);
      +	return REFTABLE_FORMAT_ERROR;
      +}
      +
     @@ reftable/record.c (new)
      +	return 0 == strcmp(a, b);
      +}
      +
     -+static bool zero_hash_eq(byte *a, byte *b, int sz)
     ++static bool zero_hash_eq(uint8_t *a, uint8_t *b, int sz)
      +{
      +	if (a == NULL)
      +		a = zero;
     @@ reftable/record.c (new)
      +	.is_deletion = &reftable_log_record_is_deletion_void,
      +};
      +
     -+struct reftable_record reftable_new_record(byte typ)
     ++struct reftable_record reftable_new_record(uint8_t typ)
      +{
      +	struct reftable_record rec = { NULL };
      +	switch (typ) {
     @@ reftable/record.c (new)
      +		return rec;
      +	}
      +	case BLOCK_TYPE_INDEX: {
     -+		struct reftable_index_record empty = { .last_key = SLICE_INIT };
     ++		struct reftable_index_record empty = { .last_key =
     ++							       STRBUF_INIT };
      +		struct reftable_index_record *r =
      +			reftable_calloc(sizeof(struct reftable_index_record));
      +		*r = empty;
     @@ reftable/record.c (new)
      +	reftable_free(reftable_record_yield(rec));
      +}
      +
     -+static void reftable_index_record_key(const void *r, struct slice *dest)
     ++static void reftable_index_record_key(const void *r, struct strbuf *dest)
      +{
      +	struct reftable_index_record *rec = (struct reftable_index_record *)r;
     -+	slice_reset(dest);
     -+	slice_addbuf(dest, &rec->last_key);
     ++	strbuf_reset(dest);
     ++	strbuf_addbuf(dest, &rec->last_key);
      +}
      +
      +static void reftable_index_record_copy_from(void *rec, const void *src_rec,
     @@ reftable/record.c (new)
      +	struct reftable_index_record *src =
      +		(struct reftable_index_record *)src_rec;
      +
     -+	slice_reset(&dst->last_key);
     -+	slice_addbuf(&dst->last_key, &src->last_key);
     ++	strbuf_reset(&dst->last_key);
     ++	strbuf_addbuf(&dst->last_key, &src->last_key);
      +	dst->offset = src->offset;
      +}
      +
      +static void reftable_index_record_clear(void *rec)
      +{
      +	struct reftable_index_record *idx = (struct reftable_index_record *)rec;
     -+	slice_release(&idx->last_key);
     ++	strbuf_release(&idx->last_key);
      +}
      +
     -+static byte reftable_index_record_val_type(const void *rec)
     ++static uint8_t reftable_index_record_val_type(const void *rec)
      +{
      +	return 0;
      +}
      +
     -+static int reftable_index_record_encode(const void *rec, struct slice out,
     ++static int reftable_index_record_encode(const void *rec, struct string_view out,
      +					int hash_size)
      +{
      +	const struct reftable_index_record *r =
      +		(const struct reftable_index_record *)rec;
     -+	struct slice start = out;
     ++	struct string_view start = out;
      +
      +	int n = put_var_int(&out, r->offset);
      +	if (n < 0)
      +		return n;
      +
     -+	slice_consume(&out, n);
     ++	string_view_consume(&out, n);
      +
      +	return start.len - out.len;
      +}
      +
     -+static int reftable_index_record_decode(void *rec, struct slice key,
     -+					byte val_type, struct slice in,
     ++static int reftable_index_record_decode(void *rec, struct strbuf key,
     ++					uint8_t val_type, struct string_view in,
      +					int hash_size)
      +{
     -+	struct slice start = in;
     ++	struct string_view start = in;
      +	struct reftable_index_record *r = (struct reftable_index_record *)rec;
      +	int n = 0;
      +
     -+	slice_reset(&r->last_key);
     -+	slice_addbuf(&r->last_key, &key);
     ++	strbuf_reset(&r->last_key);
     ++	strbuf_addbuf(&r->last_key, &key);
      +
      +	n = get_var_int(&r->offset, &in);
      +	if (n < 0)
      +		return n;
      +
     -+	slice_consume(&in, n);
     ++	string_view_consume(&in, n);
      +	return start.len - in.len;
      +}
      +
     @@ reftable/record.c (new)
      +	.is_deletion = &not_a_deletion,
      +};
      +
     -+void reftable_record_key(struct reftable_record *rec, struct slice *dest)
     ++void reftable_record_key(struct reftable_record *rec, struct strbuf *dest)
      +{
      +	rec->ops->key(rec->data, dest);
      +}
      +
     -+byte reftable_record_type(struct reftable_record *rec)
     ++uint8_t reftable_record_type(struct reftable_record *rec)
      +{
      +	return rec->ops->type;
      +}
      +
     -+int reftable_record_encode(struct reftable_record *rec, struct slice dest,
     ++int reftable_record_encode(struct reftable_record *rec, struct string_view dest,
      +			   int hash_size)
      +{
      +	return rec->ops->encode(rec->data, dest, hash_size);
     @@ reftable/record.c (new)
      +	rec->ops->copy_from(rec->data, src->data, hash_size);
      +}
      +
     -+byte reftable_record_val_type(struct reftable_record *rec)
     ++uint8_t reftable_record_val_type(struct reftable_record *rec)
      +{
      +	return rec->ops->val_type(rec->data);
      +}
      +
     -+int reftable_record_decode(struct reftable_record *rec, struct slice key,
     -+			   byte extra, struct slice src, int hash_size)
     ++int reftable_record_decode(struct reftable_record *rec, struct strbuf key,
     ++			   uint8_t extra, struct string_view src, int hash_size)
      +{
      +	return rec->ops->decode(rec->data, key, extra, src, hash_size);
      +}
     @@ reftable/record.c (new)
      +	return (struct reftable_log_record *)rec->data;
      +}
      +
     -+static bool hash_equal(byte *a, byte *b, int hash_size)
     ++static bool hash_equal(uint8_t *a, uint8_t *b, int hash_size)
      +{
      +	if (a != NULL && b != NULL)
      +		return !memcmp(a, b, hash_size);
     @@ reftable/record.c (new)
      +		return SHA256_SIZE;
      +	}
      +	abort();
     ++}
     ++
     ++void string_view_consume(struct string_view *s, int n)
     ++{
     ++	s->buf += n;
     ++	s->len -= n;
      +}
      
       ## reftable/record.h (new) ##
     @@ reftable/record.h (new)
      +#define RECORD_H
      +
      +#include "reftable.h"
     -+#include "slice.h"
     ++#include "strbuf.h"
     ++#include "system.h"
     ++
     ++/*
     ++  A substring of existing string data. This structure takes no responsibility
     ++  for the lifetime of the data it points to.
     ++*/
     ++struct string_view {
     ++	uint8_t *buf;
     ++	int len;
     ++};
     ++
     ++/* Advance `s.buf` by `n`, and decrease length. */
     ++void string_view_consume(struct string_view *s, int n);
      +
      +/* utilities for de/encoding varints */
      +
     -+int get_var_int(uint64_t *dest, struct slice *in);
     -+int put_var_int(struct slice *dest, uint64_t val);
     ++int get_var_int(uint64_t *dest, struct string_view *in);
     ++int put_var_int(struct string_view *dest, uint64_t val);
      +
      +/* Methods for records. */
      +struct reftable_record_vtable {
     -+	/* encode the key of to a byte slice. */
     -+	void (*key)(const void *rec, struct slice *dest);
     ++	/* encode the key of to a uint8_t strbuf. */
     ++	void (*key)(const void *rec, struct strbuf *dest);
      +
      +	/* The record type of ('r' for ref). */
     -+	byte type;
     ++	uint8_t type;
      +
      +	void (*copy_from)(void *dest, const void *src, int hash_size);
      +
      +	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
      +	 * vs ref deletion) */
     -+	byte (*val_type)(const void *rec);
     ++	uint8_t (*val_type)(const void *rec);
      +
      +	/* encodes rec into dest, returning how much space was used. */
     -+	int (*encode)(const void *rec, struct slice dest, int hash_size);
     ++	int (*encode)(const void *rec, struct string_view dest, int hash_size);
      +
      +	/* decode data from `src` into the record. */
     -+	int (*decode)(void *rec, struct slice key, byte extra, struct slice src,
     -+		      int hash_size);
     ++	int (*decode)(void *rec, struct strbuf key, uint8_t extra,
     ++		      struct string_view src, int hash_size);
      +
      +	/* deallocate and null the record. */
      +	void (*clear)(void *rec);
     @@ reftable/record.h (new)
      +};
      +
      +/* returns true for recognized block types. Block start with the block type. */
     -+int reftable_is_block_type(byte typ);
     ++int reftable_is_block_type(uint8_t typ);
      +
      +/* creates a malloced record of the given type. Dispose with record_destroy */
     -+struct reftable_record reftable_new_record(byte typ);
     ++struct reftable_record reftable_new_record(uint8_t typ);
      +
      +extern struct reftable_record_vtable reftable_ref_record_vtable;
      +
      +/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
      +   number of bytes written. */
     -+int reftable_encode_key(bool *restart, struct slice dest, struct slice prev_key,
     -+			struct slice key, byte extra);
     ++int reftable_encode_key(bool *restart, struct string_view dest,
     ++			struct strbuf prev_key, struct strbuf key,
     ++			uint8_t extra);
      +
      +/* Decode into `key` and `extra` from `in` */
     -+int reftable_decode_key(struct slice *key, byte *extra, struct slice last_key,
     -+			struct slice in);
     ++int reftable_decode_key(struct strbuf *key, uint8_t *extra,
     ++			struct strbuf last_key, struct string_view in);
      +
      +/* reftable_index_record are used internally to speed up lookups. */
      +struct reftable_index_record {
      +	uint64_t offset; /* Offset of block */
     -+	struct slice last_key; /* Last key of the block. */
     ++	struct strbuf last_key; /* Last key of the block. */
      +};
      +
      +/* reftable_obj_record stores an object ID => ref mapping. */
      +struct reftable_obj_record {
     -+	byte *hash_prefix; /* leading bytes of the object ID */
     ++	uint8_t *hash_prefix; /* leading bytes of the object ID */
      +	int hash_prefix_len; /* number of leading bytes. Constant
      +			      * across a single table. */
      +	uint64_t *offsets; /* a vector of file offsets. */
     @@ reftable/record.h (new)
      +
      +/* see struct record_vtable */
      +
     -+void reftable_record_key(struct reftable_record *rec, struct slice *dest);
     -+byte reftable_record_type(struct reftable_record *rec);
     ++void reftable_record_key(struct reftable_record *rec, struct strbuf *dest);
     ++uint8_t reftable_record_type(struct reftable_record *rec);
      +void reftable_record_copy_from(struct reftable_record *rec,
      +			       struct reftable_record *src, int hash_size);
     -+byte reftable_record_val_type(struct reftable_record *rec);
     -+int reftable_record_encode(struct reftable_record *rec, struct slice dest,
     ++uint8_t reftable_record_val_type(struct reftable_record *rec);
     ++int reftable_record_encode(struct reftable_record *rec, struct string_view dest,
     ++			   int hash_size);
     ++int reftable_record_decode(struct reftable_record *rec, struct strbuf key,
     ++			   uint8_t extra, struct string_view src,
      +			   int hash_size);
     -+int reftable_record_decode(struct reftable_record *rec, struct slice key,
     -+			   byte extra, struct slice src, int hash_size);
      +bool reftable_record_is_deletion(struct reftable_record *rec);
      +
      +/* zeroes out the embedded record */
     @@ reftable/record_test.c (new)
      +			      ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1 };
      +	int i = 0;
      +	for (i = 0; i < ARRAY_SIZE(inputs); i++) {
     -+		byte dest[10];
     -+
     -+		struct slice out = { .buf = dest, .len = 10, .cap = 10 };
     ++		uint8_t dest[10];
      +
     ++		struct string_view out = {
     ++			.buf = dest,
     ++			.len = sizeof(dest),
     ++		};
      +		uint64_t in = inputs[i];
      +		int n = put_var_int(&out, in);
      +		uint64_t got = 0;
     @@ reftable/record_test.c (new)
      +
      +	int i = 0;
      +	for (i = 0; i < ARRAY_SIZE(cases); i++) {
     -+		struct slice a = SLICE_INIT;
     -+		struct slice b = SLICE_INIT;
     -+		slice_addstr(&a, cases[i].a);
     -+		slice_addstr(&b, cases[i].b);
     ++		struct strbuf a = STRBUF_INIT;
     ++		struct strbuf b = STRBUF_INIT;
     ++		strbuf_addstr(&a, cases[i].a);
     ++		strbuf_addstr(&b, cases[i].b);
      +		assert(common_prefix_size(&a, &b) == cases[i].want);
      +
     -+		slice_release(&a);
     -+		slice_release(&b);
     ++		strbuf_release(&a);
     ++		strbuf_release(&b);
      +	}
      +}
      +
     -+static void set_hash(byte *h, int j)
     ++static void set_hash(uint8_t *h, int j)
      +{
      +	int i = 0;
      +	for (i = 0; i < hash_size(SHA1_ID); i++) {
     @@ reftable/record_test.c (new)
      +			.target = xstrdup("old value"),
      +		};
      +		struct reftable_record rec_out = { 0 };
     -+		struct slice key = SLICE_INIT;
     ++		struct strbuf key = STRBUF_INIT;
      +		struct reftable_record rec = { 0 };
     -+		struct slice dest = SLICE_INIT;
     ++		uint8_t buffer[1024] = { 0 };
     ++		struct string_view dest = {
     ++			.buf = buffer,
     ++			.len = sizeof(buffer),
     ++		};
     ++
      +		int n, m;
      +
      +		switch (i) {
     @@ reftable/record_test.c (new)
      +		assert(reftable_record_val_type(&rec) == i);
      +
      +		reftable_record_key(&rec, &key);
     -+		slice_grow(&dest, 1024);
     -+		slice_setlen(&dest, 1024);
      +		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
      +		assert(n > 0);
      +
     @@ reftable/record_test.c (new)
      +		assert((out.target != NULL) == (in.target != NULL));
      +		reftable_record_clear(&rec_out);
      +
     -+		slice_release(&key);
     -+		slice_release(&dest);
     ++		strbuf_release(&key);
      +		reftable_ref_record_clear(&in);
      +	}
      +}
     @@ reftable/record_test.c (new)
      +	set_test_hash(in[0].old_hash, 2);
      +	for (int i = 0; i < ARRAY_SIZE(in); i++) {
      +		struct reftable_record rec = { 0 };
     -+		struct slice key = SLICE_INIT;
     -+		struct slice dest = SLICE_INIT;
     ++		struct strbuf key = STRBUF_INIT;
     ++		uint8_t buffer[1024] = { 0 };
     ++		struct string_view dest = {
     ++			.buf = buffer,
     ++			.len = sizeof(buffer),
     ++		};
      +		/* populate out, to check for leaks. */
      +		struct reftable_log_record out = {
      +			.ref_name = xstrdup("old name"),
     @@ reftable/record_test.c (new)
      +
      +		reftable_record_key(&rec, &key);
      +
     -+		slice_grow(&dest, 1024);
     -+		slice_setlen(&dest, 1024);
     -+
      +		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
      +		assert(n >= 0);
      +		reftable_record_from_log(&rec_out, &out);
     @@ reftable/record_test.c (new)
      +
      +		assert(reftable_log_record_equal(&in[i], &out, SHA1_SIZE));
      +		reftable_log_record_clear(&in[i]);
     -+		slice_release(&key);
     -+		slice_release(&dest);
     ++		strbuf_release(&key);
      +		reftable_record_clear(&rec_out);
      +	}
      +}
     @@ reftable/record_test.c (new)
      +static void test_u24_roundtrip(void)
      +{
      +	uint32_t in = 0x112233;
     -+	byte dest[3];
     ++	uint8_t dest[3];
      +	uint32_t out;
      +	put_be24(dest, in);
      +	out = get_be24(dest);
     @@ reftable/record_test.c (new)
      +
      +static void test_key_roundtrip(void)
      +{
     -+	struct slice dest = SLICE_INIT;
     -+	struct slice last_key = SLICE_INIT;
     -+	struct slice key = SLICE_INIT;
     -+	struct slice roundtrip = SLICE_INIT;
     ++	uint8_t buffer[1024] = { 0 };
     ++	struct string_view dest = {
     ++		.buf = buffer,
     ++		.len = sizeof(buffer),
     ++	};
     ++	struct strbuf last_key = STRBUF_INIT;
     ++	struct strbuf key = STRBUF_INIT;
     ++	struct strbuf roundtrip = STRBUF_INIT;
      +	bool restart;
     -+	byte extra;
     ++	uint8_t extra;
      +	int n, m;
     -+	byte rt_extra;
     ++	uint8_t rt_extra;
      +
     -+	slice_grow(&dest, 1024);
     -+	slice_setlen(&dest, 1024);
     -+	slice_addstr(&last_key, "refs/heads/master");
     -+	slice_addstr(&key, "refs/tags/bla");
     ++	strbuf_addstr(&last_key, "refs/heads/master");
     ++	strbuf_addstr(&key, "refs/tags/bla");
      +	extra = 6;
      +	n = reftable_encode_key(&restart, dest, last_key, key, extra);
      +	assert(!restart);
     @@ reftable/record_test.c (new)
      +
      +	m = reftable_decode_key(&roundtrip, &rt_extra, last_key, dest);
      +	assert(n == m);
     -+	assert(0 == slice_cmp(&key, &roundtrip));
     ++	assert(0 == strbuf_cmp(&key, &roundtrip));
      +	assert(rt_extra == extra);
      +
     -+	slice_release(&last_key);
     -+	slice_release(&key);
     -+	slice_release(&dest);
     -+	slice_release(&roundtrip);
     ++	strbuf_release(&last_key);
     ++	strbuf_release(&key);
     ++	strbuf_release(&roundtrip);
      +}
      +
      +static void test_reftable_obj_record_roundtrip(void)
      +{
     -+	byte testHash1[SHA1_SIZE] = { 1, 2, 3, 4, 0 };
     ++	uint8_t testHash1[SHA1_SIZE] = { 1, 2, 3, 4, 0 };
      +	uint64_t till9[] = { 1, 2, 3, 4, 500, 600, 700, 800, 9000 };
      +	struct reftable_obj_record recs[3] = { {
      +						       .hash_prefix = testHash1,
     @@ reftable/record_test.c (new)
      +	int i = 0;
      +	for (i = 0; i < ARRAY_SIZE(recs); i++) {
      +		struct reftable_obj_record in = recs[i];
     -+		struct slice dest = SLICE_INIT;
     ++		uint8_t buffer[1024] = { 0 };
     ++		struct string_view dest = {
     ++			.buf = buffer,
     ++			.len = sizeof(buffer),
     ++		};
      +		struct reftable_record rec = { 0 };
     -+		struct slice key = SLICE_INIT;
     ++		struct strbuf key = STRBUF_INIT;
      +		struct reftable_obj_record out = { 0 };
      +		struct reftable_record rec_out = { 0 };
      +		int n, m;
     -+		byte extra;
     ++		uint8_t extra;
      +
      +		reftable_record_from_obj(&rec, &in);
      +		test_copy(&rec);
      +		reftable_record_key(&rec, &key);
     -+		slice_grow(&dest, 1024);
     -+		slice_setlen(&dest, 1024);
      +		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
      +		assert(n > 0);
      +		extra = reftable_record_val_type(&rec);
     @@ reftable/record_test.c (new)
      +			       in.hash_prefix_len));
      +		assert(0 == memcmp(in.offsets, out.offsets,
      +				   sizeof(uint64_t) * in.offset_len));
     -+		slice_release(&key);
     -+		slice_release(&dest);
     ++		strbuf_release(&key);
      +		reftable_record_clear(&rec_out);
      +	}
      +}
     @@ reftable/record_test.c (new)
      +{
      +	struct reftable_index_record in = {
      +		.offset = 42,
     -+		.last_key = SLICE_INIT,
     ++		.last_key = STRBUF_INIT,
     ++	};
     ++	uint8_t buffer[1024] = { 0 };
     ++	struct string_view dest = {
     ++		.buf = buffer,
     ++		.len = sizeof(buffer),
      +	};
     -+	struct slice dest = SLICE_INIT;
     -+	struct slice key = SLICE_INIT;
     ++	struct strbuf key = STRBUF_INIT;
      +	struct reftable_record rec = { 0 };
     -+	struct reftable_index_record out = { .last_key = SLICE_INIT };
     ++	struct reftable_index_record out = { .last_key = STRBUF_INIT };
      +	struct reftable_record out_rec = { NULL };
      +	int n, m;
     -+	byte extra;
     ++	uint8_t extra;
      +
     -+	slice_addstr(&in.last_key, "refs/heads/master");
     ++	strbuf_addstr(&in.last_key, "refs/heads/master");
      +	reftable_record_from_index(&rec, &in);
      +	reftable_record_key(&rec, &key);
      +	test_copy(&rec);
      +
     -+	assert(0 == slice_cmp(&key, &in.last_key));
     -+	slice_grow(&dest, 1024);
     -+	slice_setlen(&dest, 1024);
     ++	assert(0 == strbuf_cmp(&key, &in.last_key));
      +	n = reftable_record_encode(&rec, dest, SHA1_SIZE);
      +	assert(n > 0);
      +
     @@ reftable/record_test.c (new)
      +	assert(in.offset == out.offset);
      +
      +	reftable_record_clear(&out_rec);
     -+	slice_release(&key);
     -+	slice_release(&in.last_key);
     -+	slice_release(&dest);
     ++	strbuf_release(&key);
     ++	strbuf_release(&in.last_key);
      +}
      +
      +int record_test_main(int argc, const char *argv[])
     @@ reftable/refname.c (new)
      +#include "reftable.h"
      +#include "basics.h"
      +#include "refname.h"
     -+#include "slice.h"
     ++#include "strbuf.h"
      +
      +struct find_arg {
      +	char **names;
     @@ reftable/refname.c (new)
      +	return err;
      +}
      +
     -+static void slice_trim_component(struct slice *sl)
     ++static void strbuf_trim_component(struct strbuf *sl)
      +{
      +	while (sl->len > 0) {
      +		bool is_slash = (sl->buf[sl->len - 1] == '/');
     -+		slice_setlen(sl, sl->len - 1);
     ++		strbuf_setlen(sl, sl->len - 1);
      +		if (is_slash)
      +			break;
      +	}
     @@ reftable/refname.c (new)
      +
      +int modification_validate(struct modification *mod)
      +{
     -+	struct slice slashed = SLICE_INIT;
     ++	struct strbuf slashed = STRBUF_INIT;
      +	int err = 0;
      +	int i = 0;
      +	for (; i < mod->add_len; i++) {
      +		err = validate_ref_name(mod->add[i]);
      +		if (err)
      +			goto done;
     -+		slice_reset(&slashed);
     -+		slice_addstr(&slashed, mod->add[i]);
     -+		slice_addstr(&slashed, "/");
     ++		strbuf_reset(&slashed);
     ++		strbuf_addstr(&slashed, mod->add[i]);
     ++		strbuf_addstr(&slashed, "/");
      +
     -+		err = modification_has_ref_with_prefix(
     -+			mod, slice_as_string(&slashed));
     ++		err = modification_has_ref_with_prefix(mod, slashed.buf);
      +		if (err == 0) {
      +			err = REFTABLE_NAME_CONFLICT;
      +			goto done;
     @@ reftable/refname.c (new)
      +		if (err < 0)
      +			goto done;
      +
     -+		slice_reset(&slashed);
     -+		slice_addstr(&slashed, mod->add[i]);
     ++		strbuf_reset(&slashed);
     ++		strbuf_addstr(&slashed, mod->add[i]);
      +		while (slashed.len) {
     -+			slice_trim_component(&slashed);
     -+			err = modification_has_ref(mod,
     -+						   slice_as_string(&slashed));
     ++			strbuf_trim_component(&slashed);
     ++			err = modification_has_ref(mod, slashed.buf);
      +			if (err == 0) {
      +				err = REFTABLE_NAME_CONFLICT;
      +				goto done;
     @@ reftable/refname.c (new)
      +	}
      +	err = 0;
      +done:
     -+	slice_release(&slashed);
     ++	strbuf_release(&slashed);
      +	return err;
      +}
      
     @@ reftable/refname_test.c (new)
      +static void test_conflict(void)
      +{
      +	struct reftable_write_options opts = { 0 };
     -+	struct slice buf = SLICE_INIT;
     ++	struct strbuf buf = STRBUF_INIT;
      +	struct reftable_writer *w =
     -+		reftable_new_writer(&slice_add_void, &buf, &opts);
     ++		reftable_new_writer(&strbuf_add_void, &buf, &opts);
      +	struct reftable_ref_record rec = {
      +		.ref_name = "a/b",
      +		.target = "destination", /* make sure it's not a symref. */
     @@ reftable/refname_test.c (new)
      +	assert_err(err);
      +	reftable_writer_free(w);
      +
     -+	block_source_from_slice(&source, &buf);
     ++	block_source_from_strbuf(&source, &buf);
      +	err = reftable_new_reader(&rd, &source, "filename");
      +	assert_err(err);
      +
     @@ reftable/refname_test.c (new)
      +	}
      +
      +	reftable_reader_free(rd);
     -+	slice_release(&buf);
     ++	strbuf_release(&buf);
      +}
      +
      +int refname_test_main(int argc, const char *argv[])
     @@ reftable/reftable-tests.h (new)
      +int record_test_main(int argc, const char **argv);
      +int refname_test_main(int argc, const char **argv);
      +int reftable_test_main(int argc, const char **argv);
     -+int slice_test_main(int argc, const char **argv);
     ++int strbuf_test_main(int argc, const char **argv);
      +int stack_test_main(int argc, const char **argv);
      +int tree_test_main(int argc, const char **argv);
      +int reftable_dump_main(int argc, char *const *argv);
     @@ reftable/reftable.h (new)
      +
      +/* reftable_new_writer creates a new writer */
      +struct reftable_writer *
     -+reftable_new_writer(int (*writer_func)(void *, uint8_t *, size_t),
     ++reftable_new_writer(int (*writer_func)(void *, const void *, size_t),
      +		    void *writer_arg, struct reftable_write_options *opts);
      +
      +/* write to a file descriptor. fdp should be an int* pointing to the fd. */
     -+int reftable_fd_write(void *fdp, uint8_t *data, size_t size);
     ++int reftable_fd_write(void *fdp, const void *data, size_t size);
      +
      +/* Set the range of update indices for the records we will add.  When
      +   writing a table into a stack, the min should be at least
     @@ reftable/reftable_test.c (new)
      +
      +static void test_buffer(void)
      +{
     -+	struct slice buf = SLICE_INIT;
     ++	struct strbuf buf = STRBUF_INIT;
      +	struct reftable_block_source source = { NULL };
      +	struct reftable_block out = { 0 };
      +	int n;
     -+	byte in[] = "hello";
     -+	slice_add(&buf, in, sizeof(in));
     -+	block_source_from_slice(&source, &buf);
     ++	uint8_t in[] = "hello";
     ++	strbuf_add(&buf, in, sizeof(in));
     ++	block_source_from_strbuf(&source, &buf);
      +	assert(block_source_size(&source) == 6);
      +	n = block_source_read_block(&source, &out, 0, sizeof(in));
      +	assert(n == sizeof(in));
     @@ reftable/reftable_test.c (new)
      +
      +	reftable_block_done(&out);
      +	block_source_close(&source);
     -+	slice_release(&buf);
     ++	strbuf_release(&buf);
      +}
      +
      +static void test_default_write_opts(void)
      +{
      +	struct reftable_write_options opts = { 0 };
     -+	struct slice buf = SLICE_INIT;
     ++	struct strbuf buf = STRBUF_INIT;
      +	struct reftable_writer *w =
     -+		reftable_new_writer(&slice_add_void, &buf, &opts);
     ++		reftable_new_writer(&strbuf_add_void, &buf, &opts);
      +
      +	struct reftable_ref_record rec = {
      +		.ref_name = "master",
     @@ reftable/reftable_test.c (new)
      +	assert_err(err);
      +	reftable_writer_free(w);
      +
     -+	block_source_from_slice(&source, &buf);
     ++	block_source_from_strbuf(&source, &buf);
      +
      +	err = reftable_new_reader(&rd, &source, "filename");
      +	assert_err(err);
     @@ reftable/reftable_test.c (new)
      +
      +	reftable_merged_table_close(merged);
      +	reftable_merged_table_free(merged);
     -+	slice_release(&buf);
     ++	strbuf_release(&buf);
      +}
      +
     -+static void write_table(char ***names, struct slice *buf, int N, int block_size,
     -+			uint32_t hash_id)
     ++static void write_table(char ***names, struct strbuf *buf, int N,
     ++			int block_size, uint32_t hash_id)
      +{
      +	struct reftable_write_options opts = {
      +		.block_size = block_size,
      +		.hash_id = hash_id,
      +	};
      +	struct reftable_writer *w =
     -+		reftable_new_writer(&slice_add_void, buf, &opts);
     ++		reftable_new_writer(&strbuf_add_void, buf, &opts);
      +	struct reftable_ref_record ref = { 0 };
      +	int i = 0, n;
      +	struct reftable_log_record log = { 0 };
     @@ reftable/reftable_test.c (new)
      +	*names = reftable_calloc(sizeof(char *) * (N + 1));
      +	reftable_writer_set_limits(w, update_index, update_index);
      +	for (i = 0; i < N; i++) {
     -+		byte hash[SHA256_SIZE] = { 0 };
     ++		uint8_t hash[SHA256_SIZE] = { 0 };
      +		char name[100];
      +		int n;
      +
     @@ reftable/reftable_test.c (new)
      +	}
      +
      +	for (i = 0; i < N; i++) {
     -+		byte hash[SHA256_SIZE] = { 0 };
     ++		uint8_t hash[SHA256_SIZE] = { 0 };
      +		char name[100];
      +		int n;
      +
     @@ reftable/reftable_test.c (new)
      +
      +static void test_log_buffer_size(void)
      +{
     -+	struct slice buf = SLICE_INIT;
     ++	struct strbuf buf = STRBUF_INIT;
      +	struct reftable_write_options opts = {
      +		.block_size = 4096,
      +	};
     @@ reftable/reftable_test.c (new)
      +		.message = "commit: 9\n",
      +	};
      +	struct reftable_writer *w =
     -+		reftable_new_writer(&slice_add_void, &buf, &opts);
     ++		reftable_new_writer(&strbuf_add_void, &buf, &opts);
      +
      +	/* This tests buffer extension for log compression. Must use a random
      +	   hash, to ensure that the compressed part is larger than the original.
      +	*/
     -+	byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
     ++	uint8_t hash1[SHA1_SIZE], hash2[SHA1_SIZE];
      +	for (int i = 0; i < SHA1_SIZE; i++) {
     -+		hash1[i] = (byte)(rand() % 256);
     -+		hash2[i] = (byte)(rand() % 256);
     ++		hash1[i] = (uint8_t)(rand() % 256);
     ++		hash2[i] = (uint8_t)(rand() % 256);
      +	}
      +	log.old_hash = hash1;
      +	log.new_hash = hash2;
     @@ reftable/reftable_test.c (new)
      +	err = reftable_writer_close(w);
      +	assert_err(err);
      +	reftable_writer_free(w);
     -+	slice_release(&buf);
     ++	strbuf_release(&buf);
      +}
      +
      +static void test_log_write_read(void)
     @@ reftable/reftable_test.c (new)
      +	struct reftable_iterator it = { 0 };
      +	struct reftable_reader rd = { 0 };
      +	struct reftable_block_source source = { 0 };
     -+	struct slice buf = SLICE_INIT;
     ++	struct strbuf buf = STRBUF_INIT;
      +	struct reftable_writer *w =
     -+		reftable_new_writer(&slice_add_void, &buf, &opts);
     ++		reftable_new_writer(&strbuf_add_void, &buf, &opts);
      +	const struct reftable_stats *stats = NULL;
      +	reftable_writer_set_limits(w, 0, N);
      +	for (i = 0; i < N; i++) {
     @@ reftable/reftable_test.c (new)
      +		assert_err(err);
      +	}
      +	for (i = 0; i < N; i++) {
     -+		byte hash1[SHA1_SIZE], hash2[SHA1_SIZE];
     ++		uint8_t hash1[SHA1_SIZE], hash2[SHA1_SIZE];
      +		struct reftable_log_record log = { 0 };
      +		set_test_hash(hash1, i);
      +		set_test_hash(hash2, i + 1);
     @@ reftable/reftable_test.c (new)
      +	reftable_writer_free(w);
      +	w = NULL;
      +
     -+	block_source_from_slice(&source, &buf);
     ++	block_source_from_strbuf(&source, &buf);
      +
      +	err = init_reader(&rd, &source, "file.log");
      +	assert_err(err);
     @@ reftable/reftable_test.c (new)
      +	reftable_iterator_destroy(&it);
      +
      +	/* cleanup. */
     -+	slice_release(&buf);
     ++	strbuf_release(&buf);
      +	free_names(names);
      +	reader_close(&rd);
      +}
     @@ reftable/reftable_test.c (new)
      +static void test_table_read_write_sequential(void)
      +{
      +	char **names;
     -+	struct slice buf = SLICE_INIT;
     ++	struct strbuf buf = STRBUF_INIT;
      +	int N = 50;
      +	struct reftable_iterator it = { 0 };
      +	struct reftable_block_source source = { 0 };
     @@ reftable/reftable_test.c (new)
      +
      +	write_table(&names, &buf, N, 256, SHA1_ID);
      +
     -+	block_source_from_slice(&source, &buf);
     ++	block_source_from_strbuf(&source, &buf);
      +
      +	err = init_reader(&rd, &source, "file.ref");
      +	assert_err(err);
     @@ reftable/reftable_test.c (new)
      +	}
      +	assert(j == N);
      +	reftable_iterator_destroy(&it);
     -+	slice_release(&buf);
     ++	strbuf_release(&buf);
      +	free_names(names);
      +
      +	reader_close(&rd);
     @@ reftable/reftable_test.c (new)
      +static void test_table_write_small_table(void)
      +{
      +	char **names;
     -+	struct slice buf = SLICE_INIT;
     ++	struct strbuf buf = STRBUF_INIT;
      +	int N = 1;
      +	write_table(&names, &buf, N, 4096, SHA1_ID);
      +	assert(buf.len < 200);
     -+	slice_release(&buf);
     ++	strbuf_release(&buf);
      +	free_names(names);
      +}
      +
      +static void test_table_read_api(void)
      +{
      +	char **names;
     -+	struct slice buf = SLICE_INIT;
     ++	struct strbuf buf = STRBUF_INIT;
      +	int N = 50;
      +	struct reftable_reader rd = { 0 };
      +	struct reftable_block_source source = { 0 };
     @@ reftable/reftable_test.c (new)
      +
      +	write_table(&names, &buf, N, 256, SHA1_ID);
      +
     -+	block_source_from_slice(&source, &buf);
     ++	block_source_from_strbuf(&source, &buf);
      +
      +	err = init_reader(&rd, &source, "file.ref");
      +	assert_err(err);
     @@ reftable/reftable_test.c (new)
      +	err = reftable_iterator_next_log(&it, &log);
      +	assert(err == REFTABLE_API_ERROR);
      +
     -+	slice_release(&buf);
     ++	strbuf_release(&buf);
      +	for (i = 0; i < N; i++) {
      +		reftable_free(names[i]);
      +	}
      +	reftable_iterator_destroy(&it);
      +	reftable_free(names);
      +	reader_close(&rd);
     -+	slice_release(&buf);
     ++	strbuf_release(&buf);
      +}
      +
      +static void test_table_read_write_seek(bool index, int hash_id)
      +{
      +	char **names;
     -+	struct slice buf = SLICE_INIT;
     ++	struct strbuf buf = STRBUF_INIT;
      +	int N = 50;
      +	struct reftable_reader rd = { 0 };
      +	struct reftable_block_source source = { 0 };
     @@ reftable/reftable_test.c (new)
      +	int i = 0;
      +
      +	struct reftable_iterator it = { 0 };
     -+	struct slice pastLast = SLICE_INIT;
     ++	struct strbuf pastLast = STRBUF_INIT;
      +	struct reftable_ref_record ref = { 0 };
      +
      +	write_table(&names, &buf, N, 256, hash_id);
      +
     -+	block_source_from_slice(&source, &buf);
     ++	block_source_from_strbuf(&source, &buf);
      +
      +	err = init_reader(&rd, &source, "file.ref");
      +	assert_err(err);
     @@ reftable/reftable_test.c (new)
      +		reftable_iterator_destroy(&it);
      +	}
      +
     -+	slice_addstr(&pastLast, names[N - 1]);
     -+	slice_addstr(&pastLast, "/");
     ++	strbuf_addstr(&pastLast, names[N - 1]);
     ++	strbuf_addstr(&pastLast, "/");
      +
     -+	err = reftable_reader_seek_ref(&rd, &it, slice_as_string(&pastLast));
     ++	err = reftable_reader_seek_ref(&rd, &it, pastLast.buf);
      +	if (err == 0) {
      +		struct reftable_ref_record ref = { 0 };
      +		int err = reftable_iterator_next_ref(&it, &ref);
     @@ reftable/reftable_test.c (new)
      +		assert(err > 0);
      +	}
      +
     -+	slice_release(&pastLast);
     ++	strbuf_release(&pastLast);
      +	reftable_iterator_destroy(&it);
      +
     -+	slice_release(&buf);
     ++	strbuf_release(&buf);
      +	for (i = 0; i < N; i++) {
      +		reftable_free(names[i]);
      +	}
     @@ reftable/reftable_test.c (new)
      +	int N = 50;
      +	char **want_names = reftable_calloc(sizeof(char *) * (N + 1));
      +	int want_names_len = 0;
     -+	byte want_hash[SHA1_SIZE];
     ++	uint8_t want_hash[SHA1_SIZE];
      +
      +	struct reftable_write_options opts = {
      +		.block_size = 256,
     @@ reftable/reftable_test.c (new)
      +	struct reftable_reader rd;
      +	struct reftable_block_source source = { 0 };
      +
     -+	struct slice buf = SLICE_INIT;
     ++	struct strbuf buf = STRBUF_INIT;
      +	struct reftable_writer *w =
     -+		reftable_new_writer(&slice_add_void, &buf, &opts);
     ++		reftable_new_writer(&strbuf_add_void, &buf, &opts);
      +
      +	struct reftable_iterator it = { 0 };
      +	int j;
     @@ reftable/reftable_test.c (new)
      +	set_test_hash(want_hash, 4);
      +
      +	for (i = 0; i < N; i++) {
     -+		byte hash[SHA1_SIZE];
     ++		uint8_t hash[SHA1_SIZE];
      +		char fill[51] = { 0 };
      +		char name[100];
     -+		byte hash1[SHA1_SIZE];
     -+		byte hash2[SHA1_SIZE];
     ++		uint8_t hash1[SHA1_SIZE];
     ++		uint8_t hash2[SHA1_SIZE];
      +		struct reftable_ref_record ref = { 0 };
      +
      +		memset(hash, i, sizeof(hash));
     @@ reftable/reftable_test.c (new)
      +	reftable_writer_free(w);
      +	w = NULL;
      +
     -+	block_source_from_slice(&source, &buf);
     ++	block_source_from_strbuf(&source, &buf);
      +
      +	err = init_reader(&rd, &source, "file.ref");
      +	assert_err(err);
     @@ reftable/reftable_test.c (new)
      +	}
      +	assert(j == want_names_len);
      +
     -+	slice_release(&buf);
     ++	strbuf_release(&buf);
      +	free_names(want_names);
      +	reftable_iterator_destroy(&it);
      +	reader_close(&rd);
     @@ reftable/reftable_test.c (new)
      +static void test_table_empty(void)
      +{
      +	struct reftable_write_options opts = { 0 };
     -+	struct slice buf = SLICE_INIT;
     ++	struct strbuf buf = STRBUF_INIT;
      +	struct reftable_writer *w =
     -+		reftable_new_writer(&slice_add_void, &buf, &opts);
     ++		reftable_new_writer(&strbuf_add_void, &buf, &opts);
      +	struct reftable_block_source source = { 0 };
      +	struct reftable_reader *rd = NULL;
      +	struct reftable_ref_record rec = { 0 };
     @@ reftable/reftable_test.c (new)
      +
      +	assert(buf.len == header_size(1) + footer_size(1));
      +
     -+	block_source_from_slice(&source, &buf);
     ++	block_source_from_strbuf(&source, &buf);
      +
      +	err = reftable_new_reader(&rd, &source, "filename");
      +	assert_err(err);
     @@ reftable/reftable_test.c (new)
      +
      +	reftable_iterator_destroy(&it);
      +	reftable_reader_free(rd);
     -+	slice_release(&buf);
     ++	strbuf_release(&buf);
      +}
      +
      +int reftable_test_main(int argc, const char *argv[])
     @@ reftable/reftable_test.c (new)
      +	return test_main(argc, argv);
      +}
      
     - ## reftable/slice.c (new) ##
     + ## reftable/stack.c (new) ##
      @@
      +/*
      +Copyright 2020 Google LLC
     @@ reftable/slice.c (new)
      +https://developers.google.com/open-source/licenses/bsd
      +*/
      +
     -+#include "slice.h"
     ++#include "stack.h"
      +
      +#include "system.h"
     -+
     ++#include "merged.h"
     ++#include "reader.h"
     ++#include "refname.h"
      +#include "reftable.h"
     ++#include "writer.h"
      +
     -+struct slice reftable_empty_slice = SLICE_INIT;
     -+
     -+void slice_init(struct slice *s)
     -+{
     -+	struct slice empty = SLICE_INIT;
     -+	*s = empty;
     -+}
     -+
     -+void slice_grow(struct slice *s, size_t extra)
     ++int reftable_new_stack(struct reftable_stack **dest, const char *dir,
     ++		       struct reftable_write_options config)
      +{
     -+	size_t newcap = s->len + extra + 1;
     -+	if (newcap > s->cap) {
     -+		s->buf = reftable_realloc(s->buf, newcap);
     -+		s->cap = newcap;
     -+	}
     -+}
     ++	struct reftable_stack *p =
     ++		reftable_calloc(sizeof(struct reftable_stack));
     ++	struct strbuf list_file_name = STRBUF_INIT;
     ++	int err = 0;
      +
     -+static void slice_resize(struct slice *s, int l)
     -+{
     -+	int zl = l + 1; /* one byte for 0 termination. */
     -+	assert(s->canary == SLICE_CANARY);
     -+	if (s->cap < zl) {
     -+		int c = s->cap * 2;
     -+		if (c < zl) {
     -+			c = zl;
     -+		}
     -+		s->cap = c;
     -+		s->buf = reftable_realloc(s->buf, s->cap);
     ++	if (config.hash_id == 0) {
     ++		config.hash_id = SHA1_ID;
      +	}
     -+	s->len = l;
     -+	s->buf[l] = 0;
     -+}
     -+
     -+void slice_setlen(struct slice *s, size_t l)
     -+{
     -+	assert(s->cap >= l + 1);
     -+	s->len = l;
     -+	s->buf[l] = 0;
     -+}
      +
     -+void slice_reset(struct slice *s)
     -+{
     -+	slice_resize(s, 0);
     -+}
     ++	*dest = NULL;
      +
     -+void slice_addstr(struct slice *d, const char *s)
     -+{
     -+	int l1 = d->len;
     -+	int l2 = strlen(s);
     -+	assert(d->canary == SLICE_CANARY);
     ++	strbuf_reset(&list_file_name);
     ++	strbuf_addstr(&list_file_name, dir);
     ++	strbuf_addstr(&list_file_name, "/tables.list");
      +
     -+	slice_resize(d, l2 + l1);
     -+	memcpy(d->buf + l1, s, l2);
     -+}
     ++	p->list_file = strbuf_detach(&list_file_name, NULL);
     ++	p->reftable_dir = xstrdup(dir);
     ++	p->config = config;
      +
     -+void slice_addbuf(struct slice *s, struct slice *a)
     -+{
     -+	int end = s->len;
     -+	assert(s->canary == SLICE_CANARY);
     -+	slice_resize(s, s->len + a->len);
     -+	memcpy(s->buf + end, a->buf, a->len);
     ++	err = reftable_stack_reload_maybe_reuse(p, true);
     ++	if (err < 0) {
     ++		reftable_stack_destroy(p);
     ++	} else {
     ++		*dest = p;
     ++	}
     ++	return err;
      +}
      +
     -+void slice_consume(struct slice *s, int n)
     ++static int fd_read_lines(int fd, char ***namesp)
      +{
     -+	assert(s->canary == SLICE_CANARY);
     -+	s->buf += n;
     -+	s->len -= n;
     -+}
     ++	off_t size = lseek(fd, 0, SEEK_END);
     ++	char *buf = NULL;
     ++	int err = 0;
     ++	if (size < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		goto done;
     ++	}
     ++	err = lseek(fd, 0, SEEK_SET);
     ++	if (err < 0) {
     ++		err = REFTABLE_IO_ERROR;
     ++		goto done;
     ++	}
      +
     -+char *slice_detach(struct slice *s)
     -+{
     -+	char *p = NULL;
     -+	slice_as_string(s);
     -+	p = (char *)s->buf;
     -+	s->buf = NULL;
     -+	s->cap = 0;
     -+	s->len = 0;
     -+	return p;
     -+}
     ++	buf = reftable_malloc(size + 1);
     ++	if (read(fd, buf, size) != size) {
     ++		err = REFTABLE_IO_ERROR;
     ++		goto done;
     ++	}
     ++	buf[size] = 0;
      +
     -+void slice_release(struct slice *s)
     -+{
     -+	byte *ptr = s->buf;
     -+	assert(s->canary == SLICE_CANARY);
     -+	s->buf = NULL;
     -+	s->cap = 0;
     -+	s->len = 0;
     -+	reftable_free(ptr);
     -+}
     ++	parse_names(buf, size, namesp);
      +
     -+/* return the underlying data as char*. len is left unchanged, but
     -+   a \0 is added at the end. */
     -+const char *slice_as_string(struct slice *s)
     -+{
     -+	return (const char *)s->buf;
     ++done:
     ++	reftable_free(buf);
     ++	return err;
      +}
      +
     -+int slice_cmp(const struct slice *a, const struct slice *b)
     ++int read_lines(const char *filename, char ***namesp)
      +{
     -+	int min = a->len < b->len ? a->len : b->len;
     -+	int res = memcmp(a->buf, b->buf, min);
     -+	assert(a->canary == SLICE_CANARY);
     -+	assert(b->canary == SLICE_CANARY);
     -+	if (res != 0)
     -+		return res;
     -+	if (a->len < b->len)
     -+		return -1;
     -+	else if (a->len > b->len)
     -+		return 1;
     -+	else
     -+		return 0;
     -+}
     ++	int fd = open(filename, O_RDONLY, 0644);
     ++	int err = 0;
     ++	if (fd < 0) {
     ++		if (errno == ENOENT) {
     ++			*namesp = reftable_calloc(sizeof(char *));
     ++			return 0;
     ++		}
      +
     -+int slice_add(struct slice *b, const byte *data, size_t sz)
     -+{
     -+	assert(b->canary == SLICE_CANARY);
     -+	slice_grow(b, sz);
     -+	memcpy(b->buf + b->len, data, sz);
     -+	b->len += sz;
     -+	b->buf[b->len] = 0;
     -+	return sz;
     ++		return REFTABLE_IO_ERROR;
     ++	}
     ++	err = fd_read_lines(fd, namesp);
     ++	close(fd);
     ++	return err;
      +}
      +
     -+int slice_add_void(void *b, byte *data, size_t sz)
     ++struct reftable_merged_table *
     ++reftable_stack_merged_table(struct reftable_stack *st)
      +{
     -+	return slice_add((struct slice *)b, data, sz);
     ++	return st->merged;
      +}
      +
     -+static uint64_t slice_size(void *b)
     ++/* Close and free the stack */
     ++void reftable_stack_destroy(struct reftable_stack *st)
      +{
     -+	return ((struct slice *)b)->len;
     ++	if (st->merged != NULL) {
     ++		reftable_merged_table_close(st->merged);
     ++		reftable_merged_table_free(st->merged);
     ++		st->merged = NULL;
     ++	}
     ++	FREE_AND_NULL(st->list_file);
     ++	FREE_AND_NULL(st->reftable_dir);
     ++	reftable_free(st);
      +}
      +
     -+static void slice_return_block(void *b, struct reftable_block *dest)
     ++static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
     ++						   int cur_len)
      +{
     -+	memset(dest->data, 0xff, dest->len);
     -+	reftable_free(dest->data);
     ++	struct reftable_reader **cur =
     ++		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
     ++	int i = 0;
     ++	for (i = 0; i < cur_len; i++) {
     ++		cur[i] = st->merged->stack[i];
     ++	}
     ++	return cur;
      +}
      +
     -+static void slice_close(void *b)
     ++static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
     ++				      bool reuse_open)
      +{
     -+}
     ++	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
     ++	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
     ++	int err = 0;
     ++	int names_len = names_length(names);
     ++	struct reftable_reader **new_tables =
     ++		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
     ++	int new_tables_len = 0;
     ++	struct reftable_merged_table *new_merged = NULL;
     ++	int i;
      +
     -+static int slice_read_block(void *v, struct reftable_block *dest, uint64_t off,
     -+			    uint32_t size)
     -+{
     -+	struct slice *b = (struct slice *)v;
     -+	assert(off + size <= b->len);
     -+	dest->data = reftable_calloc(size);
     -+	memcpy(dest->data, b->buf + off, size);
     -+	dest->len = size;
     -+	return size;
     -+}
     ++	while (*names) {
     ++		struct reftable_reader *rd = NULL;
     ++		char *name = *names++;
      +
     -+struct reftable_block_source_vtable slice_vtable = {
     -+	.size = &slice_size,
     -+	.read_block = &slice_read_block,
     -+	.return_block = &slice_return_block,
     -+	.close = &slice_close,
     -+};
     ++		/* this is linear; we assume compaction keeps the number of
     ++		   tables under control so this is not quadratic. */
     ++		int j = 0;
     ++		for (j = 0; reuse_open && j < cur_len; j++) {
     ++			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
     ++				rd = cur[j];
     ++				cur[j] = NULL;
     ++				break;
     ++			}
     ++		}
      +
     -+void block_source_from_slice(struct reftable_block_source *bs,
     -+			     struct slice *buf)
     -+{
     -+	assert(bs->ops == NULL);
     -+	bs->ops = &slice_vtable;
     -+	bs->arg = buf;
     -+}
     ++		if (rd == NULL) {
     ++			struct reftable_block_source src = { 0 };
     ++			struct strbuf table_path = STRBUF_INIT;
     ++			strbuf_addstr(&table_path, st->reftable_dir);
     ++			strbuf_addstr(&table_path, "/");
     ++			strbuf_addstr(&table_path, name);
      +
     -+static void malloc_return_block(void *b, struct reftable_block *dest)
     -+{
     -+	memset(dest->data, 0xff, dest->len);
     -+	reftable_free(dest->data);
     -+}
     ++			err = reftable_block_source_from_file(&src,
     ++							      table_path.buf);
     ++			strbuf_release(&table_path);
      +
     -+struct reftable_block_source_vtable malloc_vtable = {
     -+	.return_block = &malloc_return_block,
     -+};
     ++			if (err < 0)
     ++				goto done;
      +
     -+struct reftable_block_source malloc_block_source_instance = {
     -+	.ops = &malloc_vtable,
     -+};
     ++			err = reftable_new_reader(&rd, &src, name);
     ++			if (err < 0)
     ++				goto done;
     ++		}
      +
     -+struct reftable_block_source malloc_block_source(void)
     -+{
     -+	return malloc_block_source_instance;
     -+}
     -+
     -+int common_prefix_size(struct slice *a, struct slice *b)
     -+{
     -+	int p = 0;
     -+	assert(a->canary == SLICE_CANARY);
     -+	assert(b->canary == SLICE_CANARY);
     -+	while (p < a->len && p < b->len) {
     -+		if (a->buf[p] != b->buf[p]) {
     -+			break;
     -+		}
     -+		p++;
     -+	}
     -+
     -+	return p;
     -+}
     -
     - ## reftable/slice.h (new) ##
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#ifndef SLICE_H
     -+#define SLICE_H
     -+
     -+#include "basics.h"
     -+#include "reftable.h"
     -+
     -+/*
     -+  Provides a bounds-checked, growable byte ranges. To use, initialize as "slice
     -+  x = SLICE_INIT;"
     -+ */
     -+struct slice {
     -+	int len;
     -+	int cap;
     -+	byte *buf;
     -+
     -+	/* Used to enforce initialization with SLICE_INIT */
     -+	byte canary;
     -+};
     -+#define SLICE_CANARY 0x42
     -+#define SLICE_INIT                       \
     -+	{                                \
     -+		0, 0, NULL, SLICE_CANARY \
     -+	}
     -+extern struct slice reftable_empty_slice;
     -+
     -+void slice_addstr(struct slice *dest, const char *src);
     -+
     -+/* Deallocate and clear slice */
     -+void slice_release(struct slice *slice);
     -+
     -+/* Set slice to 0 length, but retain buffer. */
     -+void slice_reset(struct slice *slice);
     -+
     -+/* Initializes a slice. Accepts a slice with random garbage. */
     -+void slice_init(struct slice *slice);
     -+
     -+/* Ensure that `buf` is \0 terminated. */
     -+const char *slice_as_string(struct slice *src);
     -+
     -+/* Return `buf`, clearing out `s` */
     -+char *slice_detach(struct slice *s);
     -+
     -+/* Set length of the slace to `l`, but don't reallocated. */
     -+void slice_setlen(struct slice *s, size_t l);
     -+
     -+/* Ensure `l` bytes beyond current length are available */
     -+void slice_grow(struct slice *s, size_t l);
     -+
     -+/* Signed comparison */
     -+int slice_cmp(const struct slice *a, const struct slice *b);
     -+
     -+/* Append `data` to the `dest` slice.  */
     -+int slice_add(struct slice *dest, const byte *data, size_t sz);
     -+
     -+/* Append `add` to `dest. */
     -+void slice_addbuf(struct slice *dest, struct slice *add);
     -+
     -+/* Like slice_add, but suitable for passing to reftable_new_writer
     -+ */
     -+int slice_add_void(void *b, byte *data, size_t sz);
     -+
     -+/* Find the longest shared prefix size of `a` and `b` */
     -+int common_prefix_size(struct slice *a, struct slice *b);
     -+
     -+struct reftable_block_source;
     -+
     -+/* Create an in-memory block source for reading reftables */
     -+void block_source_from_slice(struct reftable_block_source *bs,
     -+			     struct slice *buf);
     -+
     -+struct reftable_block_source malloc_block_source(void);
     -+
     -+/* Advance `buf` by `n`, and decrease length. A copy of the slice
     -+   should be kept for deallocating the slice. */
     -+void slice_consume(struct slice *s, int n);
     -+
     -+#endif
     -
     - ## reftable/slice_test.c (new) ##
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#include "slice.h"
     -+
     -+#include "system.h"
     -+
     -+#include "basics.h"
     -+#include "record.h"
     -+#include "reftable.h"
     -+#include "test_framework.h"
     -+#include "reftable-tests.h"
     -+
     -+static void test_slice(void)
     -+{
     -+	struct slice s = SLICE_INIT;
     -+	struct slice t = SLICE_INIT;
     -+
     -+	slice_addstr(&s, "abc");
     -+	assert(0 == strcmp("abc", slice_as_string(&s)));
     -+
     -+	slice_addstr(&t, "pqr");
     -+	slice_addbuf(&s, &t);
     -+	assert(0 == strcmp("abcpqr", slice_as_string(&s)));
     -+
     -+	slice_release(&s);
     -+	slice_release(&t);
     -+}
     -+
     -+int slice_test_main(int argc, const char *argv[])
     -+{
     -+	add_test_case("test_slice", &test_slice);
     -+	return test_main(argc, argv);
     -+}
     -
     - ## reftable/stack.c (new) ##
     -@@
     -+/*
     -+Copyright 2020 Google LLC
     -+
     -+Use of this source code is governed by a BSD-style
     -+license that can be found in the LICENSE file or at
     -+https://developers.google.com/open-source/licenses/bsd
     -+*/
     -+
     -+#include "stack.h"
     -+
     -+#include "system.h"
     -+#include "merged.h"
     -+#include "reader.h"
     -+#include "refname.h"
     -+#include "reftable.h"
     -+#include "writer.h"
     -+
     -+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
     -+		       struct reftable_write_options config)
     -+{
     -+	struct reftable_stack *p =
     -+		reftable_calloc(sizeof(struct reftable_stack));
     -+	struct slice list_file_name = SLICE_INIT;
     -+	int err = 0;
     -+
     -+	if (config.hash_id == 0) {
     -+		config.hash_id = SHA1_ID;
     -+	}
     -+
     -+	*dest = NULL;
     -+
     -+	slice_reset(&list_file_name);
     -+	slice_addstr(&list_file_name, dir);
     -+	slice_addstr(&list_file_name, "/tables.list");
     -+
     -+	p->list_file = slice_detach(&list_file_name);
     -+	p->reftable_dir = xstrdup(dir);
     -+	p->config = config;
     -+
     -+	err = reftable_stack_reload_maybe_reuse(p, true);
     -+	if (err < 0) {
     -+		reftable_stack_destroy(p);
     -+	} else {
     -+		*dest = p;
     -+	}
     -+	return err;
     -+}
     -+
     -+static int fd_read_lines(int fd, char ***namesp)
     -+{
     -+	off_t size = lseek(fd, 0, SEEK_END);
     -+	char *buf = NULL;
     -+	int err = 0;
     -+	if (size < 0) {
     -+		err = REFTABLE_IO_ERROR;
     -+		goto done;
     -+	}
     -+	err = lseek(fd, 0, SEEK_SET);
     -+	if (err < 0) {
     -+		err = REFTABLE_IO_ERROR;
     -+		goto done;
     -+	}
     -+
     -+	buf = reftable_malloc(size + 1);
     -+	if (read(fd, buf, size) != size) {
     -+		err = REFTABLE_IO_ERROR;
     -+		goto done;
     -+	}
     -+	buf[size] = 0;
     -+
     -+	parse_names(buf, size, namesp);
     -+
     -+done:
     -+	reftable_free(buf);
     -+	return err;
     -+}
     -+
     -+int read_lines(const char *filename, char ***namesp)
     -+{
     -+	int fd = open(filename, O_RDONLY, 0644);
     -+	int err = 0;
     -+	if (fd < 0) {
     -+		if (errno == ENOENT) {
     -+			*namesp = reftable_calloc(sizeof(char *));
     -+			return 0;
     -+		}
     -+
     -+		return REFTABLE_IO_ERROR;
     -+	}
     -+	err = fd_read_lines(fd, namesp);
     -+	close(fd);
     -+	return err;
     -+}
     -+
     -+struct reftable_merged_table *
     -+reftable_stack_merged_table(struct reftable_stack *st)
     -+{
     -+	return st->merged;
     -+}
     -+
     -+/* Close and free the stack */
     -+void reftable_stack_destroy(struct reftable_stack *st)
     -+{
     -+	if (st->merged != NULL) {
     -+		reftable_merged_table_close(st->merged);
     -+		reftable_merged_table_free(st->merged);
     -+		st->merged = NULL;
     -+	}
     -+	FREE_AND_NULL(st->list_file);
     -+	FREE_AND_NULL(st->reftable_dir);
     -+	reftable_free(st);
     -+}
     -+
     -+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
     -+						   int cur_len)
     -+{
     -+	struct reftable_reader **cur =
     -+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
     -+	int i = 0;
     -+	for (i = 0; i < cur_len; i++) {
     -+		cur[i] = st->merged->stack[i];
     -+	}
     -+	return cur;
     -+}
     -+
     -+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
     -+				      bool reuse_open)
     -+{
     -+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
     -+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
     -+	int err = 0;
     -+	int names_len = names_length(names);
     -+	struct reftable_reader **new_tables =
     -+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
     -+	int new_tables_len = 0;
     -+	struct reftable_merged_table *new_merged = NULL;
     -+	int i;
     -+
     -+	while (*names) {
     -+		struct reftable_reader *rd = NULL;
     -+		char *name = *names++;
     -+
     -+		/* this is linear; we assume compaction keeps the number of
     -+		   tables under control so this is not quadratic. */
     -+		int j = 0;
     -+		for (j = 0; reuse_open && j < cur_len; j++) {
     -+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
     -+				rd = cur[j];
     -+				cur[j] = NULL;
     -+				break;
     -+			}
     -+		}
     -+
     -+		if (rd == NULL) {
     -+			struct reftable_block_source src = { 0 };
     -+			struct slice table_path = SLICE_INIT;
     -+			slice_addstr(&table_path, st->reftable_dir);
     -+			slice_addstr(&table_path, "/");
     -+			slice_addstr(&table_path, name);
     -+
     -+			err = reftable_block_source_from_file(
     -+				&src, slice_as_string(&table_path));
     -+			slice_release(&table_path);
     -+
     -+			if (err < 0)
     -+				goto done;
     -+
     -+			err = reftable_new_reader(&rd, &src, name);
     -+			if (err < 0)
     -+				goto done;
     -+		}
     -+
     -+		new_tables[new_tables_len++] = rd;
     -+	}
     ++		new_tables[new_tables_len++] = rd;
     ++	}
      +
      +	/* success! */
      +	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
     @@ reftable/stack.c (new)
      +	return 0;
      +}
      +
     -+static void format_name(struct slice *dest, uint64_t min, uint64_t max)
     ++static void format_name(struct strbuf *dest, uint64_t min, uint64_t max)
      +{
      +	char buf[100];
      +	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
     -+	slice_reset(dest);
     -+	slice_addstr(dest, buf);
     ++	strbuf_reset(dest);
     ++	strbuf_addstr(dest, buf);
      +}
      +
      +struct reftable_addition {
      +	int lock_file_fd;
     -+	struct slice lock_file_name;
     ++	struct strbuf lock_file_name;
      +	struct reftable_stack *stack;
      +	char **names;
      +	char **new_tables;
     @@ reftable/stack.c (new)
      +	uint64_t next_update_index;
      +};
      +
     -+#define REFTABLE_ADDITION_INIT               \
     -+	{                                    \
     -+		.lock_file_name = SLICE_INIT \
     ++#define REFTABLE_ADDITION_INIT                \
     ++	{                                     \
     ++		.lock_file_name = STRBUF_INIT \
      +	}
      +
      +static int reftable_stack_init_addition(struct reftable_addition *add,
     @@ reftable/stack.c (new)
      +	int err = 0;
      +	add->stack = st;
      +
     -+	slice_reset(&add->lock_file_name);
     -+	slice_addstr(&add->lock_file_name, st->list_file);
     -+	slice_addstr(&add->lock_file_name, ".lock");
     ++	strbuf_reset(&add->lock_file_name);
     ++	strbuf_addstr(&add->lock_file_name, st->list_file);
     ++	strbuf_addstr(&add->lock_file_name, ".lock");
      +
     -+	add->lock_file_fd = open(slice_as_string(&add->lock_file_name),
     ++	add->lock_file_fd = open(add->lock_file_name.buf,
      +				 O_EXCL | O_CREAT | O_WRONLY, 0644);
      +	if (add->lock_file_fd < 0) {
      +		if (errno == EEXIST) {
     @@ reftable/stack.c (new)
      +void reftable_addition_close(struct reftable_addition *add)
      +{
      +	int i = 0;
     -+	struct slice nm = SLICE_INIT;
     ++	struct strbuf nm = STRBUF_INIT;
      +	for (i = 0; i < add->new_tables_len; i++) {
     -+		slice_reset(&nm);
     -+		slice_addstr(&nm, add->stack->list_file);
     -+		slice_addstr(&nm, "/");
     -+		slice_addstr(&nm, add->new_tables[i]);
     -+		unlink(slice_as_string(&nm));
     ++		strbuf_reset(&nm);
     ++		strbuf_addstr(&nm, add->stack->list_file);
     ++		strbuf_addstr(&nm, "/");
     ++		strbuf_addstr(&nm, add->new_tables[i]);
     ++		unlink(nm.buf);
      +		reftable_free(add->new_tables[i]);
      +		add->new_tables[i] = NULL;
      +	}
     @@ reftable/stack.c (new)
      +		add->lock_file_fd = 0;
      +	}
      +	if (add->lock_file_name.len > 0) {
     -+		unlink(slice_as_string(&add->lock_file_name));
     -+		slice_release(&add->lock_file_name);
     ++		unlink(add->lock_file_name.buf);
     ++		strbuf_release(&add->lock_file_name);
      +	}
      +
      +	free_names(add->names);
      +	add->names = NULL;
     -+	slice_release(&nm);
     ++	strbuf_release(&nm);
      +}
      +
      +void reftable_addition_destroy(struct reftable_addition *add)
     @@ reftable/stack.c (new)
      +
      +int reftable_addition_commit(struct reftable_addition *add)
      +{
     -+	struct slice table_list = SLICE_INIT;
     ++	struct strbuf table_list = STRBUF_INIT;
      +	int i = 0;
      +	int err = 0;
      +	if (add->new_tables_len == 0)
      +		goto done;
      +
      +	for (i = 0; i < add->stack->merged->stack_len; i++) {
     -+		slice_addstr(&table_list, add->stack->merged->stack[i]->name);
     -+		slice_addstr(&table_list, "\n");
     ++		strbuf_addstr(&table_list, add->stack->merged->stack[i]->name);
     ++		strbuf_addstr(&table_list, "\n");
      +	}
      +	for (i = 0; i < add->new_tables_len; i++) {
     -+		slice_addstr(&table_list, add->new_tables[i]);
     -+		slice_addstr(&table_list, "\n");
     ++		strbuf_addstr(&table_list, add->new_tables[i]);
     ++		strbuf_addstr(&table_list, "\n");
      +	}
      +
      +	err = write(add->lock_file_fd, table_list.buf, table_list.len);
     -+	slice_release(&table_list);
     ++	strbuf_release(&table_list);
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
      +		goto done;
     @@ reftable/stack.c (new)
      +		goto done;
      +	}
      +
     -+	err = rename(slice_as_string(&add->lock_file_name),
     -+		     add->stack->list_file);
     ++	err = rename(add->lock_file_name.buf, add->stack->list_file);
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
      +		goto done;
     @@ reftable/stack.c (new)
      +					     void *arg),
      +			  void *arg)
      +{
     -+	struct slice temp_tab_file_name = SLICE_INIT;
     -+	struct slice tab_file_name = SLICE_INIT;
     -+	struct slice next_name = SLICE_INIT;
     ++	struct strbuf temp_tab_file_name = STRBUF_INIT;
     ++	struct strbuf tab_file_name = STRBUF_INIT;
     ++	struct strbuf next_name = STRBUF_INIT;
      +	struct reftable_writer *wr = NULL;
      +	int err = 0;
      +	int tab_fd = 0;
      +
     -+	slice_reset(&next_name);
     ++	strbuf_reset(&next_name);
      +	format_name(&next_name, add->next_update_index, add->next_update_index);
      +
     -+	slice_addstr(&temp_tab_file_name, add->stack->reftable_dir);
     -+	slice_addstr(&temp_tab_file_name, "/");
     -+	slice_addbuf(&temp_tab_file_name, &next_name);
     -+	slice_addstr(&temp_tab_file_name, ".temp.XXXXXX");
     ++	strbuf_addstr(&temp_tab_file_name, add->stack->reftable_dir);
     ++	strbuf_addstr(&temp_tab_file_name, "/");
     ++	strbuf_addbuf(&temp_tab_file_name, &next_name);
     ++	strbuf_addstr(&temp_tab_file_name, ".temp.XXXXXX");
      +
     -+	tab_fd = mkstemp((char *)slice_as_string(&temp_tab_file_name));
     ++	tab_fd = mkstemp(temp_tab_file_name.buf);
      +	if (tab_fd < 0) {
      +		err = REFTABLE_IO_ERROR;
      +		goto done;
     @@ reftable/stack.c (new)
      +		goto done;
      +	}
      +
     -+	err = stack_check_addition(add->stack,
     -+				   slice_as_string(&temp_tab_file_name));
     ++	err = stack_check_addition(add->stack, temp_tab_file_name.buf);
      +	if (err < 0)
      +		goto done;
      +
     @@ reftable/stack.c (new)
      +	}
      +
      +	format_name(&next_name, wr->min_update_index, wr->max_update_index);
     -+	slice_addstr(&next_name, ".ref");
     ++	strbuf_addstr(&next_name, ".ref");
      +
     -+	slice_addstr(&tab_file_name, add->stack->reftable_dir);
     -+	slice_addstr(&tab_file_name, "/");
     -+	slice_addbuf(&tab_file_name, &next_name);
     ++	strbuf_addstr(&tab_file_name, add->stack->reftable_dir);
     ++	strbuf_addstr(&tab_file_name, "/");
     ++	strbuf_addbuf(&tab_file_name, &next_name);
      +
      +	/* TODO: should check destination out of paranoia */
     -+	err = rename(slice_as_string(&temp_tab_file_name),
     -+		     slice_as_string(&tab_file_name));
     ++	err = rename(temp_tab_file_name.buf, tab_file_name.buf);
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
      +		goto done;
     @@ reftable/stack.c (new)
      +	add->new_tables = reftable_realloc(add->new_tables,
      +					   sizeof(*add->new_tables) *
      +						   (add->new_tables_len + 1));
     -+	add->new_tables[add->new_tables_len] = slice_detach(&next_name);
     ++	add->new_tables[add->new_tables_len] = strbuf_detach(&next_name, NULL);
      +	add->new_tables_len++;
      +done:
      +	if (tab_fd > 0) {
     @@ reftable/stack.c (new)
      +		tab_fd = 0;
      +	}
      +	if (temp_tab_file_name.len > 0) {
     -+		unlink(slice_as_string(&temp_tab_file_name));
     ++		unlink(temp_tab_file_name.buf);
      +	}
      +
     -+	slice_release(&temp_tab_file_name);
     -+	slice_release(&tab_file_name);
     -+	slice_release(&next_name);
     ++	strbuf_release(&temp_tab_file_name);
     ++	strbuf_release(&tab_file_name);
     ++	strbuf_release(&next_name);
      +	reftable_writer_free(wr);
      +	return err;
      +}
     @@ reftable/stack.c (new)
      +}
      +
      +static int stack_compact_locked(struct reftable_stack *st, int first, int last,
     -+				struct slice *temp_tab,
     ++				struct strbuf *temp_tab,
      +				struct reftable_log_expiry_config *config)
      +{
     -+	struct slice next_name = SLICE_INIT;
     ++	struct strbuf next_name = STRBUF_INIT;
      +	int tab_fd = -1;
      +	struct reftable_writer *wr = NULL;
      +	int err = 0;
     @@ reftable/stack.c (new)
      +		    reftable_reader_min_update_index(st->merged->stack[first]),
      +		    reftable_reader_max_update_index(st->merged->stack[first]));
      +
     -+	slice_reset(temp_tab);
     -+	slice_addstr(temp_tab, st->reftable_dir);
     -+	slice_addstr(temp_tab, "/");
     -+	slice_addbuf(temp_tab, &next_name);
     -+	slice_addstr(temp_tab, ".temp.XXXXXX");
     ++	strbuf_reset(temp_tab);
     ++	strbuf_addstr(temp_tab, st->reftable_dir);
     ++	strbuf_addstr(temp_tab, "/");
     ++	strbuf_addbuf(temp_tab, &next_name);
     ++	strbuf_addstr(temp_tab, ".temp.XXXXXX");
      +
     -+	tab_fd = mkstemp((char *)slice_as_string(temp_tab));
     ++	tab_fd = mkstemp(temp_tab->buf);
      +	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
      +
      +	err = stack_write_compact(st, wr, first, last, config);
     @@ reftable/stack.c (new)
      +		tab_fd = 0;
      +	}
      +	if (err != 0 && temp_tab->len > 0) {
     -+		unlink(slice_as_string(temp_tab));
     -+		slice_release(temp_tab);
     ++		unlink(temp_tab->buf);
     ++		strbuf_release(temp_tab);
      +	}
     -+	slice_release(&next_name);
     ++	strbuf_release(&next_name);
      +	return err;
      +}
      +
     @@ reftable/stack.c (new)
      +static int stack_compact_range(struct reftable_stack *st, int first, int last,
      +			       struct reftable_log_expiry_config *expiry)
      +{
     -+	struct slice temp_tab_file_name = SLICE_INIT;
     -+	struct slice new_table_name = SLICE_INIT;
     -+	struct slice lock_file_name = SLICE_INIT;
     -+	struct slice ref_list_contents = SLICE_INIT;
     -+	struct slice new_table_path = SLICE_INIT;
     ++	struct strbuf temp_tab_file_name = STRBUF_INIT;
     ++	struct strbuf new_table_name = STRBUF_INIT;
     ++	struct strbuf lock_file_name = STRBUF_INIT;
     ++	struct strbuf ref_list_contents = STRBUF_INIT;
     ++	struct strbuf new_table_path = STRBUF_INIT;
      +	int err = 0;
      +	bool have_lock = false;
      +	int lock_file_fd = 0;
     @@ reftable/stack.c (new)
      +
      +	st->stats.attempts++;
      +
     -+	slice_reset(&lock_file_name);
     -+	slice_addstr(&lock_file_name, st->list_file);
     -+	slice_addstr(&lock_file_name, ".lock");
     ++	strbuf_reset(&lock_file_name);
     ++	strbuf_addstr(&lock_file_name, st->list_file);
     ++	strbuf_addstr(&lock_file_name, ".lock");
      +
     -+	lock_file_fd = open(slice_as_string(&lock_file_name),
     -+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
     ++	lock_file_fd =
     ++		open(lock_file_name.buf, O_EXCL | O_CREAT | O_WRONLY, 0644);
      +	if (lock_file_fd < 0) {
      +		if (errno == EEXIST) {
      +			err = 1;
     @@ reftable/stack.c (new)
      +		goto done;
      +
      +	for (i = first, j = 0; i <= last; i++) {
     -+		struct slice subtab_file_name = SLICE_INIT;
     -+		struct slice subtab_lock = SLICE_INIT;
     ++		struct strbuf subtab_file_name = STRBUF_INIT;
     ++		struct strbuf subtab_lock = STRBUF_INIT;
      +		int sublock_file_fd = -1;
      +
     -+		slice_addstr(&subtab_file_name, st->reftable_dir);
     -+		slice_addstr(&subtab_file_name, "/");
     -+		slice_addstr(&subtab_file_name,
     -+			     reader_name(st->merged->stack[i]));
     ++		strbuf_addstr(&subtab_file_name, st->reftable_dir);
     ++		strbuf_addstr(&subtab_file_name, "/");
     ++		strbuf_addstr(&subtab_file_name,
     ++			      reader_name(st->merged->stack[i]));
      +
     -+		slice_reset(&subtab_lock);
     -+		slice_addbuf(&subtab_lock, &subtab_file_name);
     -+		slice_addstr(&subtab_lock, ".lock");
     ++		strbuf_reset(&subtab_lock);
     ++		strbuf_addbuf(&subtab_lock, &subtab_file_name);
     ++		strbuf_addstr(&subtab_lock, ".lock");
      +
     -+		sublock_file_fd = open(slice_as_string(&subtab_lock),
     ++		sublock_file_fd = open(subtab_lock.buf,
      +				       O_EXCL | O_CREAT | O_WRONLY, 0644);
      +		if (sublock_file_fd > 0) {
      +			close(sublock_file_fd);
     @@ reftable/stack.c (new)
      +			}
      +		}
      +
     -+		subtable_locks[j] = (char *)slice_as_string(&subtab_lock);
     -+		delete_on_success[j] =
     -+			(char *)slice_as_string(&subtab_file_name);
     ++		subtable_locks[j] = subtab_lock.buf;
     ++		delete_on_success[j] = subtab_file_name.buf;
      +		j++;
      +
      +		if (err != 0)
      +			goto done;
      +	}
      +
     -+	err = unlink(slice_as_string(&lock_file_name));
     ++	err = unlink(lock_file_name.buf);
      +	if (err < 0)
      +		goto done;
      +	have_lock = false;
     @@ reftable/stack.c (new)
      +	if (err < 0)
      +		goto done;
      +
     -+	lock_file_fd = open(slice_as_string(&lock_file_name),
     -+			    O_EXCL | O_CREAT | O_WRONLY, 0644);
     ++	lock_file_fd =
     ++		open(lock_file_name.buf, O_EXCL | O_CREAT | O_WRONLY, 0644);
      +	if (lock_file_fd < 0) {
      +		if (errno == EEXIST) {
      +			err = 1;
     @@ reftable/stack.c (new)
      +
      +	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
      +		    st->merged->stack[last]->max_update_index);
     -+	slice_addstr(&new_table_name, ".ref");
     ++	strbuf_addstr(&new_table_name, ".ref");
      +
     -+	slice_reset(&new_table_path);
     -+	slice_addstr(&new_table_path, st->reftable_dir);
     -+	slice_addstr(&new_table_path, "/");
     -+	slice_addbuf(&new_table_path, &new_table_name);
     ++	strbuf_reset(&new_table_path);
     ++	strbuf_addstr(&new_table_path, st->reftable_dir);
     ++	strbuf_addstr(&new_table_path, "/");
     ++	strbuf_addbuf(&new_table_path, &new_table_name);
      +
      +	if (!is_empty_table) {
     -+		err = rename(slice_as_string(&temp_tab_file_name),
     -+			     slice_as_string(&new_table_path));
     ++		err = rename(temp_tab_file_name.buf, new_table_path.buf);
      +		if (err < 0) {
      +			err = REFTABLE_IO_ERROR;
      +			goto done;
     @@ reftable/stack.c (new)
      +	}
      +
      +	for (i = 0; i < first; i++) {
     -+		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
     -+		slice_addstr(&ref_list_contents, "\n");
     ++		strbuf_addstr(&ref_list_contents, st->merged->stack[i]->name);
     ++		strbuf_addstr(&ref_list_contents, "\n");
      +	}
      +	if (!is_empty_table) {
     -+		slice_addbuf(&ref_list_contents, &new_table_name);
     -+		slice_addstr(&ref_list_contents, "\n");
     ++		strbuf_addbuf(&ref_list_contents, &new_table_name);
     ++		strbuf_addstr(&ref_list_contents, "\n");
      +	}
      +	for (i = last + 1; i < st->merged->stack_len; i++) {
     -+		slice_addstr(&ref_list_contents, st->merged->stack[i]->name);
     -+		slice_addstr(&ref_list_contents, "\n");
     ++		strbuf_addstr(&ref_list_contents, st->merged->stack[i]->name);
     ++		strbuf_addstr(&ref_list_contents, "\n");
      +	}
      +
      +	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
     -+		unlink(slice_as_string(&new_table_path));
     ++		unlink(new_table_path.buf);
      +		goto done;
      +	}
      +	err = close(lock_file_fd);
      +	lock_file_fd = 0;
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
     -+		unlink(slice_as_string(&new_table_path));
     ++		unlink(new_table_path.buf);
      +		goto done;
      +	}
      +
     -+	err = rename(slice_as_string(&lock_file_name), st->list_file);
     ++	err = rename(lock_file_name.buf, st->list_file);
      +	if (err < 0) {
      +		err = REFTABLE_IO_ERROR;
     -+		unlink(slice_as_string(&new_table_path));
     ++		unlink(new_table_path.buf);
      +		goto done;
      +	}
      +	have_lock = false;
     @@ reftable/stack.c (new)
      +
      +	listp = delete_on_success;
      +	while (*listp) {
     -+		if (strcmp(*listp, slice_as_string(&new_table_path))) {
     ++		if (strcmp(*listp, new_table_path.buf)) {
      +			unlink(*listp);
      +		}
      +		listp++;
     @@ reftable/stack.c (new)
      +		lock_file_fd = 0;
      +	}
      +	if (have_lock) {
     -+		unlink(slice_as_string(&lock_file_name));
     ++		unlink(lock_file_name.buf);
      +	}
     -+	slice_release(&new_table_name);
     -+	slice_release(&new_table_path);
     -+	slice_release(&ref_list_contents);
     -+	slice_release(&temp_tab_file_name);
     -+	slice_release(&lock_file_name);
     ++	strbuf_release(&new_table_name);
     ++	strbuf_release(&new_table_path);
     ++	strbuf_release(&ref_list_contents);
     ++	strbuf_release(&temp_tab_file_name);
     ++	strbuf_release(&lock_file_name);
      +	return err;
      +}
      +
     @@ reftable/stack_test.c (new)
      +	struct reftable_stack *st = NULL;
      +	char dir[256] = "/tmp/stack_test.XXXXXX";
      +
     -+	byte h1[SHA1_SIZE] = { 0x01 }, h2[SHA1_SIZE] = { 0x02 };
     ++	uint8_t h1[SHA1_SIZE] = { 0x01 }, h2[SHA1_SIZE] = { 0x02 };
      +
      +	struct reftable_log_record input = {
      +		.ref_name = "branch",
     @@ reftable/stack_test.c (new)
      +	add_test_case("test_names_equal", &test_names_equal);
      +	add_test_case("test_reftable_stack_add", &test_reftable_stack_add);
      +	return test_main(argc, argv);
     ++}
     +
     + ## reftable/strbuf.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "strbuf.h"
     ++
     ++#include "system.h"
     ++#include "reftable.h"
     ++#include "basics.h"
     ++
     ++#ifdef REFTABLE_STANDALONE
     ++
     ++void strbuf_init(struct strbuf *s, size_t alloc)
     ++{
     ++	struct strbuf empty = STRBUF_INIT;
     ++	*s = empty;
     ++}
     ++
     ++void strbuf_grow(struct strbuf *s, size_t extra)
     ++{
     ++	size_t newcap = s->len + extra + 1;
     ++	if (newcap > s->cap) {
     ++		s->buf = reftable_realloc(s->buf, newcap);
     ++		s->cap = newcap;
     ++	}
     ++}
     ++
     ++static void strbuf_resize(struct strbuf *s, int l)
     ++{
     ++	int zl = l + 1; /* one uint8_t for 0 termination. */
     ++	assert(s->canary == STRBUF_CANARY);
     ++	if (s->cap < zl) {
     ++		int c = s->cap * 2;
     ++		if (c < zl) {
     ++			c = zl;
     ++		}
     ++		s->cap = c;
     ++		s->buf = reftable_realloc(s->buf, s->cap);
     ++	}
     ++	s->len = l;
     ++	s->buf[l] = 0;
     ++}
     ++
     ++void strbuf_setlen(struct strbuf *s, size_t l)
     ++{
     ++	assert(s->cap >= l + 1);
     ++	s->len = l;
     ++	s->buf[l] = 0;
     ++}
     ++
     ++void strbuf_reset(struct strbuf *s)
     ++{
     ++	strbuf_resize(s, 0);
     ++}
     ++
     ++void strbuf_addstr(struct strbuf *d, const char *s)
     ++{
     ++	int l1 = d->len;
     ++	int l2 = strlen(s);
     ++	assert(d->canary == STRBUF_CANARY);
     ++
     ++	strbuf_resize(d, l2 + l1);
     ++	memcpy(d->buf + l1, s, l2);
     ++}
     ++
     ++void strbuf_addbuf(struct strbuf *s, struct strbuf *a)
     ++{
     ++	int end = s->len;
     ++	assert(s->canary == STRBUF_CANARY);
     ++	strbuf_resize(s, s->len + a->len);
     ++	memcpy(s->buf + end, a->buf, a->len);
     ++}
     ++
     ++char *strbuf_detach(struct strbuf *s, size_t *sz)
     ++{
     ++	char *p = NULL;
     ++	p = (char *)s->buf;
     ++	if (sz)
     ++		*sz = s->len;
     ++	s->buf = NULL;
     ++	s->cap = 0;
     ++	s->len = 0;
     ++	return p;
     ++}
     ++
     ++void strbuf_release(struct strbuf *s)
     ++{
     ++	assert(s->canary == STRBUF_CANARY);
     ++	s->cap = 0;
     ++	s->len = 0;
     ++	reftable_free(s->buf);
     ++	s->buf = NULL;
     ++}
     ++
     ++int strbuf_cmp(const struct strbuf *a, const struct strbuf *b)
     ++{
     ++	int min = a->len < b->len ? a->len : b->len;
     ++	int res = memcmp(a->buf, b->buf, min);
     ++	assert(a->canary == STRBUF_CANARY);
     ++	assert(b->canary == STRBUF_CANARY);
     ++	if (res != 0)
     ++		return res;
     ++	if (a->len < b->len)
     ++		return -1;
     ++	else if (a->len > b->len)
     ++		return 1;
     ++	else
     ++		return 0;
     ++}
     ++
     ++int strbuf_add(struct strbuf *b, const void *data, size_t sz)
     ++{
     ++	assert(b->canary == STRBUF_CANARY);
     ++	strbuf_grow(b, sz);
     ++	memcpy(b->buf + b->len, data, sz);
     ++	b->len += sz;
     ++	b->buf[b->len] = 0;
     ++	return sz;
     ++}
     ++
     ++#endif
     ++
     ++static uint64_t strbuf_size(void *b)
     ++{
     ++	return ((struct strbuf *)b)->len;
     ++}
     ++
     ++int strbuf_add_void(void *b, const void *data, size_t sz)
     ++{
     ++	strbuf_add((struct strbuf *)b, data, sz);
     ++	return sz;
     ++}
     ++
     ++static void strbuf_return_block(void *b, struct reftable_block *dest)
     ++{
     ++	memset(dest->data, 0xff, dest->len);
     ++	reftable_free(dest->data);
     ++}
     ++
     ++static void strbuf_close(void *b)
     ++{
     ++}
     ++
     ++static int strbuf_read_block(void *v, struct reftable_block *dest, uint64_t off,
     ++			     uint32_t size)
     ++{
     ++	struct strbuf *b = (struct strbuf *)v;
     ++	assert(off + size <= b->len);
     ++	dest->data = reftable_calloc(size);
     ++	memcpy(dest->data, b->buf + off, size);
     ++	dest->len = size;
     ++	return size;
     ++}
     ++
     ++struct reftable_block_source_vtable strbuf_vtable = {
     ++	.size = &strbuf_size,
     ++	.read_block = &strbuf_read_block,
     ++	.return_block = &strbuf_return_block,
     ++	.close = &strbuf_close,
     ++};
     ++
     ++void block_source_from_strbuf(struct reftable_block_source *bs,
     ++			      struct strbuf *buf)
     ++{
     ++	assert(bs->ops == NULL);
     ++	bs->ops = &strbuf_vtable;
     ++	bs->arg = buf;
     ++}
     ++
     ++static void malloc_return_block(void *b, struct reftable_block *dest)
     ++{
     ++	memset(dest->data, 0xff, dest->len);
     ++	reftable_free(dest->data);
     ++}
     ++
     ++struct reftable_block_source_vtable malloc_vtable = {
     ++	.return_block = &malloc_return_block,
     ++};
     ++
     ++struct reftable_block_source malloc_block_source_instance = {
     ++	.ops = &malloc_vtable,
     ++};
     ++
     ++struct reftable_block_source malloc_block_source(void)
     ++{
     ++	return malloc_block_source_instance;
     ++}
     ++
     ++int common_prefix_size(struct strbuf *a, struct strbuf *b)
     ++{
     ++	int p = 0;
     ++	while (p < a->len && p < b->len) {
     ++		if (a->buf[p] != b->buf[p]) {
     ++			break;
     ++		}
     ++		p++;
     ++	}
     ++
     ++	return p;
     ++}
     ++
     ++struct strbuf reftable_empty_strbuf = STRBUF_INIT;
     +
     + ## reftable/strbuf.h (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#ifndef SLICE_H
     ++#define SLICE_H
     ++
     ++#ifdef REFTABLE_STANDALONE
     ++
     ++#include "basics.h"
     ++#include "reftable.h"
     ++
     ++/*
     ++  Provides a bounds-checked, growable byte ranges. To use, initialize as "strbuf
     ++  x = STRBUF_INIT;"
     ++ */
     ++struct strbuf {
     ++	int len;
     ++	int cap;
     ++	char *buf;
     ++
     ++	/* Used to enforce initialization with STRBUF_INIT */
     ++	uint8_t canary;
     ++};
     ++#define STRBUF_CANARY 0x42
     ++#define STRBUF_INIT                       \
     ++	{                                 \
     ++		0, 0, NULL, STRBUF_CANARY \
     ++	}
     ++
     ++void strbuf_addstr(struct strbuf *dest, const char *src);
     ++
     ++/* Deallocate and clear strbuf */
     ++void strbuf_release(struct strbuf *strbuf);
     ++
     ++/* Set strbuf to 0 length, but retain buffer. */
     ++void strbuf_reset(struct strbuf *strbuf);
     ++
     ++/* Initializes a strbuf. Accepts a strbuf with random garbage. */
     ++void strbuf_init(struct strbuf *strbuf, size_t alloc);
     ++
     ++/* Return `buf`, clearing out `s`. Optionally return len (not cap) in `sz`.  */
     ++char *strbuf_detach(struct strbuf *s, size_t *sz);
     ++
     ++/* Set length of the slace to `l`, but don't reallocated. */
     ++void strbuf_setlen(struct strbuf *s, size_t l);
     ++
     ++/* Ensure `l` bytes beyond current length are available */
     ++void strbuf_grow(struct strbuf *s, size_t l);
     ++
     ++/* Signed comparison */
     ++int strbuf_cmp(const struct strbuf *a, const struct strbuf *b);
     ++
     ++/* Append `data` to the `dest` strbuf.  */
     ++int strbuf_add(struct strbuf *dest, const void *data, size_t sz);
     ++
     ++/* Append `add` to `dest. */
     ++void strbuf_addbuf(struct strbuf *dest, struct strbuf *add);
     ++
     ++#else
     ++
     ++#include "../git-compat-util.h"
     ++#include "../strbuf.h"
     ++
     ++#endif
     ++
     ++extern struct strbuf reftable_empty_strbuf;
     ++
     ++/* Like strbuf_add, but suitable for passing to reftable_new_writer
     ++ */
     ++int strbuf_add_void(void *b, const void *data, size_t sz);
     ++
     ++/* Find the longest shared prefix size of `a` and `b` */
     ++int common_prefix_size(struct strbuf *a, struct strbuf *b);
     ++
     ++struct reftable_block_source;
     ++
     ++/* Create an in-memory block source for reading reftables */
     ++void block_source_from_strbuf(struct reftable_block_source *bs,
     ++			      struct strbuf *buf);
     ++
     ++struct reftable_block_source malloc_block_source(void);
     ++
     ++#endif
     +
     + ## reftable/strbuf_test.c (new) ##
     +@@
     ++/*
     ++Copyright 2020 Google LLC
     ++
     ++Use of this source code is governed by a BSD-style
     ++license that can be found in the LICENSE file or at
     ++https://developers.google.com/open-source/licenses/bsd
     ++*/
     ++
     ++#include "strbuf.h"
     ++
     ++#include "system.h"
     ++
     ++#include "basics.h"
     ++#include "record.h"
     ++#include "reftable.h"
     ++#include "test_framework.h"
     ++#include "reftable-tests.h"
     ++
     ++static void test_strbuf(void)
     ++{
     ++	struct strbuf s = STRBUF_INIT;
     ++	struct strbuf t = STRBUF_INIT;
     ++
     ++	strbuf_addstr(&s, "abc");
     ++	assert(0 == strcmp("abc", s.buf));
     ++
     ++	strbuf_addstr(&t, "pqr");
     ++	strbuf_addbuf(&s, &t);
     ++	assert(0 == strcmp("abcpqr", s.buf));
     ++
     ++	strbuf_release(&s);
     ++	strbuf_release(&t);
     ++}
     ++
     ++int strbuf_test_main(int argc, const char *argv[])
     ++{
     ++	add_test_case("test_strbuf", &test_strbuf);
     ++	return test_main(argc, argv);
      +}
      
       ## reftable/system.h (new) ##
     @@ reftable/system.h (new)
      +#ifndef SYSTEM_H
      +#define SYSTEM_H
      +
     -+#if 1 /* REFTABLE_IN_GITCORE */
     -+#define REFTABLE_IN_GITCORE
     ++#ifndef REFTABLE_STANDALONE
      +
      +#include "git-compat-util.h"
      +#include "cache.h"
     @@ reftable/system.h (new)
      +
      +#include "compat.h"
      +
     -+#endif /* REFTABLE_IN_GITCORE */
     ++#endif /* REFTABLE_STANDALONE */
      +
      +void reftable_clear_dir(const char *dirname);
      +
     @@ reftable/system.h (new)
      +#define SHA1_SIZE 20
      +#define SHA256_SIZE 32
      +
     -+typedef uint8_t byte;
      +typedef int bool;
      +
      +/* This is uncompress2, which is only available in zlib as of 2017.
     @@ reftable/test_framework.c (new)
      +	return 0;
      +}
      +
     -+void set_test_hash(byte *p, int i)
     ++void set_test_hash(uint8_t *p, int i)
      +{
     -+	memset(p, (byte)i, hash_size(SHA1_ID));
     ++	memset(p, (uint8_t)i, hash_size(SHA1_ID));
      +}
      
       ## reftable/test_framework.h (new) ##
     @@ reftable/test_framework.h (new)
      +struct test_case *add_test_case(const char *name, void (*testfunc)(void));
      +int test_main(int argc, const char *argv[]);
      +
     -+void set_test_hash(byte *p, int i);
     ++void set_test_hash(uint8_t *p, int i);
      +
      +#endif
      
     @@ reftable/writer.c (new)
      +#include "tree.h"
      +
      +static struct reftable_block_stats *
     -+writer_reftable_block_stats(struct reftable_writer *w, byte typ)
     ++writer_reftable_block_stats(struct reftable_writer *w, uint8_t typ)
      +{
      +	switch (typ) {
      +	case 'r':
     @@ reftable/writer.c (new)
      +
      +/* write data, queuing the padding for the next write. Returns negative for
      + * error. */
     -+static int padded_write(struct reftable_writer *w, byte *data, size_t len,
     ++static int padded_write(struct reftable_writer *w, uint8_t *data, size_t len,
      +			int padding)
      +{
      +	int n = 0;
      +	if (w->pending_padding > 0) {
     -+		byte *zeroed = reftable_calloc(w->pending_padding);
     ++		uint8_t *zeroed = reftable_calloc(w->pending_padding);
      +		int n = w->write(w->write_arg, zeroed, w->pending_padding);
      +		if (n < 0)
      +			return n;
     @@ reftable/writer.c (new)
      +	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
      +}
      +
     -+static int writer_write_header(struct reftable_writer *w, byte *dest)
     ++static int writer_write_header(struct reftable_writer *w, uint8_t *dest)
      +{
      +	memcpy((char *)dest, "REFT", 4);
      +
     @@ reftable/writer.c (new)
      +	return header_size(writer_version(w));
      +}
      +
     -+static void writer_reinit_block_writer(struct reftable_writer *w, byte typ)
     ++static void writer_reinit_block_writer(struct reftable_writer *w, uint8_t typ)
      +{
      +	int block_start = 0;
      +	if (w->next == 0) {
      +		block_start = header_size(writer_version(w));
      +	}
      +
     -+	slice_release(&w->last_key);
     ++	strbuf_release(&w->last_key);
      +	block_writer_init(&w->block_writer_data, typ, w->block,
      +			  w->opts.block_size, block_start,
      +			  hash_size(w->opts.hash_id));
     @@ reftable/writer.c (new)
      +}
      +
      +struct reftable_writer *
     -+reftable_new_writer(int (*writer_func)(void *, byte *, size_t),
     ++reftable_new_writer(int (*writer_func)(void *, const void *, size_t),
      +		    void *writer_arg, struct reftable_write_options *opts)
      +{
      +	struct reftable_writer *wp =
      +		reftable_calloc(sizeof(struct reftable_writer));
     -+	slice_init(&wp->block_writer_data.last_key);
     ++	strbuf_init(&wp->block_writer_data.last_key, 0);
      +	options_set_defaults(opts);
      +	if (opts->block_size >= (1 << 24)) {
      +		/* TODO - error return? */
      +		abort();
      +	}
     -+	wp->last_key = reftable_empty_slice;
     ++	wp->last_key = reftable_empty_strbuf;
      +	wp->block = reftable_calloc(opts->block_size);
      +	wp->write = writer_func;
      +	wp->write_arg = writer_arg;
     @@ reftable/writer.c (new)
      +}
      +
      +struct obj_index_tree_node {
     -+	struct slice hash;
     ++	struct strbuf hash;
      +	uint64_t *offsets;
      +	int offset_len;
      +	int offset_cap;
      +};
     -+#define OBJ_INDEX_TREE_NODE_INIT   \
     -+	{                          \
     -+		.hash = SLICE_INIT \
     ++#define OBJ_INDEX_TREE_NODE_INIT    \
     ++	{                           \
     ++		.hash = STRBUF_INIT \
      +	}
      +
      +static int obj_index_tree_node_compare(const void *a, const void *b)
      +{
     -+	return slice_cmp(&((const struct obj_index_tree_node *)a)->hash,
     -+			 &((const struct obj_index_tree_node *)b)->hash);
     ++	return strbuf_cmp(&((const struct obj_index_tree_node *)a)->hash,
     ++			  &((const struct obj_index_tree_node *)b)->hash);
      +}
      +
     -+static void writer_index_hash(struct reftable_writer *w, struct slice *hash)
     ++static void writer_index_hash(struct reftable_writer *w, struct strbuf *hash)
      +{
      +	uint64_t off = w->next;
      +
     @@ reftable/writer.c (new)
      +		key = reftable_malloc(sizeof(struct obj_index_tree_node));
      +		*key = empty;
      +
     -+		slice_reset(&key->hash);
     -+		slice_addbuf(&key->hash, hash);
     ++		strbuf_reset(&key->hash);
     ++		strbuf_addbuf(&key->hash, hash);
      +		tree_search((void *)key, &w->obj_index_tree,
      +			    &obj_index_tree_node_compare, 1);
      +	} else {
     @@ reftable/writer.c (new)
      +			     struct reftable_record *rec)
      +{
      +	int result = -1;
     -+	struct slice key = SLICE_INIT;
     ++	struct strbuf key = STRBUF_INIT;
      +	int err = 0;
      +	reftable_record_key(rec, &key);
     -+	if (slice_cmp(&w->last_key, &key) >= 0)
     ++	if (strbuf_cmp(&w->last_key, &key) >= 0)
      +		goto done;
      +
     -+	slice_reset(&w->last_key);
     -+	slice_addbuf(&w->last_key, &key);
     ++	strbuf_reset(&w->last_key);
     ++	strbuf_addbuf(&w->last_key, &key);
      +	if (w->block_writer == NULL) {
      +		writer_reinit_block_writer(w, reftable_record_type(rec));
      +	}
     @@ reftable/writer.c (new)
      +
      +	result = 0;
      +done:
     -+	slice_release(&key);
     ++	strbuf_release(&key);
      +	return result;
      +}
      +
     @@ reftable/writer.c (new)
      +		return err;
      +
      +	if (!w->opts.skip_index_objects && ref->value != NULL) {
     -+		struct slice h = SLICE_INIT;
     -+		slice_add(&h, ref->value, hash_size(w->opts.hash_id));
     ++		struct strbuf h = STRBUF_INIT;
     ++		strbuf_add(&h, (char *)ref->value, hash_size(w->opts.hash_id));
      +		writer_index_hash(w, &h);
     -+		slice_release(&h);
     ++		strbuf_release(&h);
      +	}
      +
      +	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
     -+		struct slice h = SLICE_INIT;
     -+		slice_add(&h, ref->target_value, hash_size(w->opts.hash_id));
     ++		struct strbuf h = STRBUF_INIT;
     ++		strbuf_add(&h, ref->target_value, hash_size(w->opts.hash_id));
      +		writer_index_hash(w, &h);
      +	}
      +	return 0;
     @@ reftable/writer.c (new)
      +{
      +	struct reftable_record rec = { 0 };
      +	char *input_log_message = log->message;
     -+	struct slice cleaned_message = SLICE_INIT;
     ++	struct strbuf cleaned_message = STRBUF_INIT;
      +	int err;
      +	if (log->ref_name == NULL)
      +		return REFTABLE_API_ERROR;
     @@ reftable/writer.c (new)
      +	}
      +
      +	if (!w->opts.exact_log_message && log->message != NULL) {
     -+		slice_addstr(&cleaned_message, log->message);
     ++		strbuf_addstr(&cleaned_message, log->message);
      +		while (cleaned_message.len &&
      +		       cleaned_message.buf[cleaned_message.len - 1] == '\n')
     -+			slice_setlen(&cleaned_message, cleaned_message.len - 1);
     -+		if (strchr(slice_as_string(&cleaned_message), '\n')) {
     ++			strbuf_setlen(&cleaned_message,
     ++				      cleaned_message.len - 1);
     ++		if (strchr(cleaned_message.buf, '\n')) {
      +			// multiple lines not allowed.
      +			err = REFTABLE_API_ERROR;
      +			goto done;
      +		}
     -+		slice_addstr(&cleaned_message, "\n");
     -+		log->message = (char *)slice_as_string(&cleaned_message);
     ++		strbuf_addstr(&cleaned_message, "\n");
     ++		log->message = cleaned_message.buf;
      +	}
      +
      +	w->next -= w->pending_padding;
     @@ reftable/writer.c (new)
      +
      +done:
      +	log->message = input_log_message;
     -+	slice_release(&cleaned_message);
     ++	strbuf_release(&cleaned_message);
      +	return err;
      +}
      +
     @@ reftable/writer.c (new)
      +
      +static int writer_finish_section(struct reftable_writer *w)
      +{
     -+	byte typ = block_writer_type(w->block_writer);
     ++	uint8_t typ = block_writer_type(w->block_writer);
      +	uint64_t index_start = 0;
      +	int max_level = 0;
      +	int threshold = w->opts.unpadded ? 1 : 3;
     @@ reftable/writer.c (new)
      +			}
      +		}
      +		for (i = 0; i < idx_len; i++) {
     -+			slice_release(&idx[i].last_key);
     ++			strbuf_release(&idx[i].last_key);
      +		}
      +		reftable_free(idx);
      +	}
     @@ reftable/writer.c (new)
      +}
      +
      +struct common_prefix_arg {
     -+	struct slice *last;
     ++	struct strbuf *last;
      +	int max;
      +};
      +
     @@ reftable/writer.c (new)
      +	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
      +	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
      +	struct reftable_obj_record obj_rec = {
     -+		.hash_prefix = entry->hash.buf,
     ++		.hash_prefix = (uint8_t *)entry->hash.buf,
      +		.hash_prefix_len = arg->w->stats.object_id_len,
      +		.offsets = entry->offsets,
      +		.offset_len = entry->offset_len,
     @@ reftable/writer.c (new)
      +	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
      +
      +	FREE_AND_NULL(entry->offsets);
     -+	slice_release(&entry->hash);
     ++	strbuf_release(&entry->hash);
      +	reftable_free(entry);
      +}
      +
     @@ reftable/writer.c (new)
      +
      +int writer_finish_public_section(struct reftable_writer *w)
      +{
     -+	byte typ = 0;
     ++	uint8_t typ = 0;
      +	int err = 0;
      +
      +	if (w->block_writer == NULL)
     @@ reftable/writer.c (new)
      +
      +int reftable_writer_close(struct reftable_writer *w)
      +{
     -+	byte footer[72];
     -+	byte *p = footer;
     ++	uint8_t footer[72];
     ++	uint8_t *p = footer;
      +	int err = writer_finish_public_section(w);
      +	int empty_table = w->next == 0;
      +	if (err != 0)
     @@ reftable/writer.c (new)
      +	w->pending_padding = 0;
      +	if (empty_table) {
      +		/* Empty tables need a header anyway. */
     -+		byte header[28];
     ++		uint8_t header[28];
      +		int n = writer_write_header(w, header);
      +		err = padded_write(w, header, n, 0);
      +		if (err < 0)
     @@ reftable/writer.c (new)
      +	/* free up memory. */
      +	block_writer_clear(&w->block_writer_data);
      +	writer_clear_index(w);
     -+	slice_release(&w->last_key);
     ++	strbuf_release(&w->last_key);
      +	return err;
      +}
      +
     @@ reftable/writer.c (new)
      +{
      +	int i = 0;
      +	for (i = 0; i < w->index_len; i++) {
     -+		slice_release(&w->index[i].last_key);
     ++		strbuf_release(&w->index[i].last_key);
      +	}
      +
      +	FREE_AND_NULL(w->index);
     @@ reftable/writer.c (new)
      +
      +static int writer_flush_nonempty_block(struct reftable_writer *w)
      +{
     -+	byte typ = block_writer_type(w->block_writer);
     ++	uint8_t typ = block_writer_type(w->block_writer);
      +	struct reftable_block_stats *bstats =
      +		writer_reftable_block_stats(w, typ);
      +	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
      +	int raw_bytes = block_writer_finish(w->block_writer);
      +	int padding = 0;
      +	int err = 0;
     -+	struct reftable_index_record ir = { .last_key = SLICE_INIT };
     ++	struct reftable_index_record ir = { .last_key = STRBUF_INIT };
      +	if (raw_bytes < 0)
      +		return raw_bytes;
      +
     @@ reftable/writer.c (new)
      +	}
      +
      +	ir.offset = w->next;
     -+	slice_reset(&ir.last_key);
     -+	slice_addbuf(&ir.last_key, &w->block_writer->last_key);
     ++	strbuf_reset(&ir.last_key);
     ++	strbuf_addbuf(&ir.last_key, &w->block_writer->last_key);
      +	w->index[w->index_len] = ir;
      +
      +	w->index_len++;
     @@ reftable/writer.h (new)
      +#include "basics.h"
      +#include "block.h"
      +#include "reftable.h"
     -+#include "slice.h"
     ++#include "strbuf.h"
      +#include "tree.h"
      +
      +struct reftable_writer {
     -+	int (*write)(void *, byte *, size_t);
     ++	int (*write)(void *, const void *, size_t);
      +	void *write_arg;
      +	int pending_padding;
     -+	struct slice last_key;
     ++	struct strbuf last_key;
      +
      +	/* offset of next block to write. */
      +	uint64_t next;
     @@ reftable/writer.h (new)
      +	struct reftable_write_options opts;
      +
      +	/* memory buffer for writing */
     -+	byte *block;
     ++	uint8_t *block;
      +
      +	/* writer for the current section. NULL or points to
      +	 * block_writer_data */
  -:  ----------- > 12:  c92b8d12ec6 Add standalone build infrastructure for reftable
 12:  e1b01927454 ! 13:  479fe884e98 Reftable support for git-core
     @@ Commit message
      
          It can be activated by passing --ref-storage=reftable to "git init".
      
     -    TODO:
     -
     -    * Fix worktree commands
     -
     -    * Spots marked XXX
     -
          Example use: see t/t0031-reftable.sh
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
     @@ Makefile: VCSSVN_OBJS += vcs-svn/sliding_window.o
      +REFTABLE_OBJS += reftable/record.o
      +REFTABLE_OBJS += reftable/refname.o
      +REFTABLE_OBJS += reftable/reftable.o
     -+REFTABLE_OBJS += reftable/slice.o
     ++REFTABLE_OBJS += reftable/strbuf.o
      +REFTABLE_OBJS += reftable/stack.o
      +REFTABLE_OBJS += reftable/tree.o
      +REFTABLE_OBJS += reftable/writer.o
     @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
       
      -	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
      +	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
     -+		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
     ++		default_ref_storage(), INIT_DB_QUIET);
       
       	if (real_git_dir)
       		git_dir = real_git_dir;
     @@ cache.h: int path_inside_repo(const char *prefix, const char *path);
       void sanitize_stdfds(void);
       int daemonize(void);
      @@ cache.h: struct repository_format {
     - 	int is_bare;
       	int hash_algo;
     + 	int has_extensions;
       	char *work_tree;
      +	char *ref_storage;
       	struct string_list unknown_extensions;
     @@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
      +	r->refs_private = ref_store_init(r->gitdir,
      +					 r->ref_storage_format ?
      +						 r->ref_storage_format :
     -+						 DEFAULT_REF_STORAGE,
     ++						 default_ref_storage(),
      +					 REF_STORE_ALL_CAPS);
       	return r->refs_private;
       }
     @@ refs.c: struct ref_store *get_submodule_ref_store(const char *submodule)
       
       	/* assume that add_submodule_odb() has been called */
      -	refs = ref_store_init(submodule_sb.buf,
     -+	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
     ++	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
       			      REF_STORE_READ | REF_STORE_ODB);
       	register_ref_store_map(&submodule_ref_stores, "submodule",
       			       refs, submodule);
     @@ refs.c: struct ref_store *get_submodule_ref_store(const char *submodule)
       
       struct ref_store *get_worktree_ref_store(const struct worktree *wt)
       {
     -+	const char *format = DEFAULT_REF_STORAGE; /* XXX */
     ++	const char *format = default_ref_storage();
       	struct ref_store *refs;
       	const char *id;
       
     @@ refs/reftable-backend.c (new)
      +	safe_create_dir(refs->reftable_dir, 1);
      +
      +	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
     -+	write_file(sb.buf, "ref: refs/.invalid");
     ++	write_file(sb.buf, "ref: refs/heads/.invalid");
      +	strbuf_reset(&sb);
      +
      +	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
     @@ refs/reftable-backend.c (new)
      +		return refs->err;
      +	}
      +
     ++	/* This is usually not needed, but Git doesn't signal to ref backend if
     ++	   a subprocess updated the ref DB.  So we always check.
     ++	*/
     ++	err = reftable_stack_reload(refs->stack);
     ++	if (err) {
     ++		goto done;
     ++	}
     ++
      +	err = reftable_stack_read_ref(refs->stack, refname, &ref);
      +	if (err > 0) {
      +		errno = ENOENT;
     @@ refs/reftable-backend.c (new)
      +	reftable_reflog_expire,
      +};
      
     + ## reftable/update.sh ##
     +@@ reftable/update.sh: cp reftable-repo/LICENSE reftable/
     + git --git-dir reftable-repo/.git show --no-patch --format=oneline HEAD \
     +   > reftable/VERSION
     + 
     +-mv reftable/system.h reftable/system.h~
     +-sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
     +-
     + git add reftable/*.[ch] reftable/LICENSE reftable/VERSION
     +
       ## repository.c ##
      @@ repository.c: int repo_init(struct repository *repo,
       	if (worktree)
     @@ t/t0031-reftable.sh (new)
      +	for count in $(test_seq 1 10)
      +	do
      +		test_commit "number $count" file.t $count number-$count ||
     -+	        return 1
     ++		return 1
      +	done &&
      +	git pack-refs &&
      +	ls -1 .git/reftable >table-files &&
     @@ t/t0031-reftable.sh (new)
      +	git branch source &&
      +	git checkout HEAD^ &&
      +	test_commit message3 file3 &&
     -+ 	git rebase source &&
     ++	git rebase source &&
      +	test -f file2
      +'
      +
 13:  0359fe416fa ! 14:  eafd8eeefcc Hookup unittests for the reftable library.
     @@ Makefile: REFTABLE_OBJS += reftable/tree.o
      +REFTABLE_TEST_OBJS += reftable/record_test.o
      +REFTABLE_TEST_OBJS += reftable/refname_test.o
      +REFTABLE_TEST_OBJS += reftable/reftable_test.o
     -+REFTABLE_TEST_OBJS += reftable/slice_test.o
     ++REFTABLE_TEST_OBJS += reftable/strbuf_test.o
      +REFTABLE_TEST_OBJS += reftable/stack_test.o
      +REFTABLE_TEST_OBJS += reftable/tree_test.o
      +REFTABLE_TEST_OBJS += reftable/test_framework.o
     @@ t/helper/test-reftable.c (new)
      +	record_test_main(argc, argv);
      +	refname_test_main(argc, argv);
      +	reftable_test_main(argc, argv);
     -+	slice_test_main(argc, argv);
     ++	strbuf_test_main(argc, argv);
      +	stack_test_main(argc, argv);
      +	tree_test_main(argc, argv);
      +	return 0;
 14:  88640ea13f9 ! 15:  46af142f33b Add GIT_DEBUG_REFS debugging mechanism
     @@ Makefile: LIB_OBJS += rebase.o
       ## refs.c ##
      @@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
       						 r->ref_storage_format :
     - 						 DEFAULT_REF_STORAGE,
     + 						 default_ref_storage(),
       					 REF_STORE_ALL_CAPS);
      +	if (getenv("GIT_DEBUG_REFS")) {
      +		r->refs_private = debug_wrap(r->refs_private);
     @@ refs/debug.c (new)
      +{
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
      +	int res = drefs->refs->be->init_db(drefs->refs, err);
     ++	fprintf(stderr, "init_db: %d\n", res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +	transaction->ref_store = drefs->refs;
      +	res = drefs->refs->be->transaction_prepare(drefs->refs, transaction,
      +						   err);
     ++	fprintf(stderr, "transaction_prepare: %d\n", res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +	type &= 0xf; /* see refs.h REF_* */
      +	flags &= REF_HAVE_NEW | REF_HAVE_OLD | REF_NO_DEREF |
      +		 REF_FORCE_CREATE_REFLOG | REF_LOG_ONLY;
     -+	printf("%d: %s %s -> %s (F=0x%x, T=0x%x) \"%s\"\n", i, refname, o, n,
     -+	       flags, type, msg);
     ++	fprintf(stderr, "%d: %s %s -> %s (F=0x%x, T=0x%x) \"%s\"\n", i, refname,
     ++		o, n, flags, type, msg);
      +}
      +
      +static void print_transaction(struct ref_transaction *transaction)
      +{
     -+	printf("transaction {\n");
     ++	fprintf(stderr, "transaction {\n");
      +	for (int i = 0; i < transaction->nr; i++) {
      +		struct ref_update *u = transaction->updates[i];
      +		print_update(i, u->refname, &u->old_oid, &u->new_oid, u->flags,
      +			     u->type, u->msg);
      +	}
     -+	printf("}\n");
     ++	fprintf(stderr, "}\n");
      +}
      +
      +static int debug_transaction_finish(struct ref_store *refs,
     @@ refs/debug.c (new)
      +	res = drefs->refs->be->transaction_finish(drefs->refs, transaction,
      +						  err);
      +	print_transaction(transaction);
     -+	printf("finish: %d\n", res);
     ++	fprintf(stderr, "finish: %d\n", res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +{
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	int res = drefs->refs->be->pack_refs(drefs->refs, flags);
     ++	fprintf(stderr, "pack_refs: %d\n", res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	int res = drefs->refs->be->create_symref(drefs->refs, ref_name, target,
      +						 logmsg);
     -+	printf("create_symref: %s -> %s \"%s\": %d\n", ref_name, target, logmsg,
     -+	       res);
     ++	fprintf(stderr, "create_symref: %s -> %s \"%s\": %d\n", ref_name,
     ++		target, logmsg, res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	int res = drefs->refs->be->rename_ref(drefs->refs, oldref, newref,
      +					      logmsg);
     -+	printf("rename_ref: %s -> %s \"%s\": %d\n", oldref, newref, logmsg,
     -+	       res);
     ++	fprintf(stderr, "rename_ref: %s -> %s \"%s\": %d\n", oldref, newref,
     ++		logmsg, res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	int res =
      +		drefs->refs->be->copy_ref(drefs->refs, oldref, newref, logmsg);
     -+	printf("copy_ref: %s -> %s \"%s\": %d\n", oldref, newref, logmsg, res);
     ++	fprintf(stderr, "copy_ref: %s -> %s \"%s\": %d\n", oldref, newref,
     ++		logmsg, res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +		oid_to_hex_r(o, oid);
      +	if (old_oid)
      +		oid_to_hex_r(n, old_oid);
     -+	printf("write_pseudoref: %s, %s => %s, err %s: %d\n", pseudoref, o, n,
     -+	       err->buf, res);
     ++	fprintf(stderr, "write_pseudoref: %s, %s => %s, err %s: %d\n",
     ++		pseudoref, o, n, err->buf, res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +	char hex[100] = "null";
      +	if (old_oid)
      +		oid_to_hex_r(hex, old_oid);
     -+	printf("delete_pseudoref: %s (%s): %d\n", pseudoref, hex, res);
     ++	fprintf(stderr, "delete_pseudoref: %s (%s): %d\n", pseudoref, hex, res);
      +	return res;
      +}
      +
     ++struct debug_ref_iterator {
     ++	struct ref_iterator base;
     ++	struct ref_iterator *iter;
     ++};
     ++
     ++static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
     ++{
     ++	struct debug_ref_iterator *diter =
     ++		(struct debug_ref_iterator *)ref_iterator;
     ++	int res = diter->iter->vtable->advance(diter->iter);
     ++	fprintf(stderr, "iterator_advance: %s: %d\n", diter->iter->refname,
     ++		res);
     ++	diter->base.ordered = diter->iter->ordered;
     ++	diter->base.refname = diter->iter->refname;
     ++	diter->base.oid = diter->iter->oid;
     ++	diter->base.flags = diter->iter->flags;
     ++	return res;
     ++}
     ++static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
     ++				   struct object_id *peeled)
     ++{
     ++	struct debug_ref_iterator *diter =
     ++		(struct debug_ref_iterator *)ref_iterator;
     ++	int res = diter->iter->vtable->peel(diter->iter, peeled);
     ++	fprintf(stderr, "iterator_peel: %s: %d\n", diter->iter->refname, res);
     ++	return res;
     ++}
     ++
     ++static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
     ++{
     ++	struct debug_ref_iterator *diter =
     ++		(struct debug_ref_iterator *)ref_iterator;
     ++	int res = diter->iter->vtable->abort(diter->iter);
     ++	fprintf(stderr, "iterator_abort: %d\n", res);
     ++	return res;
     ++}
     ++
     ++static struct ref_iterator_vtable debug_ref_iterator_vtable = {
     ++	debug_ref_iterator_advance, debug_ref_iterator_peel,
     ++	debug_ref_iterator_abort
     ++};
     ++
      +static struct ref_iterator *
      +debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
      +			 unsigned int flags)
     @@ refs/debug.c (new)
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	struct ref_iterator *res =
      +		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
     -+	return res;
     ++	struct debug_ref_iterator *diter = xcalloc(1, sizeof(*diter));
     ++	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable, 1);
     ++	diter->iter = res;
     ++	fprintf(stderr, "ref_iterator_begin: %s (0x%x)\n", prefix, flags);
     ++	return &diter->base;
      +}
      +
      +static int debug_read_raw_ref(struct ref_store *ref_store, const char *refname,
     @@ refs/debug.c (new)
      +					    type);
      +
      +	if (res == 0) {
     -+		printf("read_raw_ref: %s: %s (=> %s) type %x: %d\n", refname,
     -+		       oid_to_hex(oid), referent->buf, *type, res);
     ++		fprintf(stderr, "read_raw_ref: %s: %s (=> %s) type %x: %d\n",
     ++			refname, oid_to_hex(oid), referent->buf, *type, res);
      +	} else {
     -+		printf("read_raw_ref: %s err %d\n", refname, res);
     ++		fprintf(stderr, "read_raw_ref: %s err %d\n", refname, res);
      +	}
      +	return res;
      +}
     @@ refs/debug.c (new)
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	struct ref_iterator *res =
      +		drefs->refs->be->reflog_iterator_begin(drefs->refs);
     -+	printf("for_each_reflog_iterator_begin\n");
     ++	fprintf(stderr, "for_each_reflog_iterator_begin\n");
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +
      +	ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg,
      +		      dbg->cb_data);
     -+	printf("reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
     -+	       dbg->refname, ret, o, n, committer, (long int)timestamp, msg);
     ++	fprintf(stderr, "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
     ++		dbg->refname, ret, o, n, committer, (long int)timestamp, msg);
      +	return ret;
      +}
      +
     @@ refs/debug.c (new)
      +
      +	int res = drefs->refs->be->for_each_reflog_ent(
      +		drefs->refs, refname, &debug_print_reflog_ent, &dbg);
     -+	printf("for_each_reflog: %s: %d\n", refname, res);
     ++	fprintf(stderr, "for_each_reflog: %s: %d\n", refname, res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +	};
      +	int res = drefs->refs->be->for_each_reflog_ent_reverse(
      +		drefs->refs, refname, &debug_print_reflog_ent, &dbg);
     -+	printf("for_each_reflog_reverse: %s: %d\n", refname, res);
     ++	fprintf(stderr, "for_each_reflog_reverse: %s: %d\n", refname, res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +{
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	int res = drefs->refs->be->reflog_exists(drefs->refs, refname);
     -+	printf("reflog_exists: %s: %d\n", refname, res);
     ++	fprintf(stderr, "reflog_exists: %s: %d\n", refname, res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	int res = drefs->refs->be->create_reflog(drefs->refs, refname,
      +						 force_create, err);
     ++	fprintf(stderr, "create_reflog: %s: %d\n", refname, res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +{
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	int res = drefs->refs->be->delete_reflog(drefs->refs, refname);
     ++	fprintf(stderr, "delete_reflog: %s: %d\n", refname, res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +						 flags, prepare_fn,
      +						 should_prune_fn, cleanup_fn,
      +						 policy_cb_data);
     ++	fprintf(stderr, "reflog_expire: %s: %d\n", refname, res);
      +	return res;
      +}
      +
 15:  1c0cc646084 = 16:  5211c643104 vcxproj: adjust for the reftable changes
  -:  ----------- > 17:  9724854088c git-prompt: prepare for reftable refs backend
 16:  4f24b5f73de ! 18:  ece1fa1f625 Add reftable testing infrastructure
     @@ Commit message
          * Skip some tests that are incompatible:
      
            * t3210-pack-refs.sh - does not apply
     -      * t9903-bash-prompt - The bash mode reads .git/HEAD directly
            * t1450-fsck.sh - manipulates .git/ directly to create invalid state
      
          Major test failures:
     @@ Commit message
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
     - ## builtin/clone.c ##
     -@@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
     - 	}
     - 
     - 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
     --		DEFAULT_REF_STORAGE, INIT_DB_QUIET);
     -+		default_ref_storage(), INIT_DB_QUIET);
     - 
     - 	if (real_git_dir)
     - 		git_dir = real_git_dir;
     -
     - ## refs.c ##
     -@@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
     - 	r->refs_private = ref_store_init(r->gitdir,
     - 					 r->ref_storage_format ?
     - 						 r->ref_storage_format :
     --						 DEFAULT_REF_STORAGE,
     -+						 default_ref_storage(),
     - 					 REF_STORE_ALL_CAPS);
     - 	if (getenv("GIT_DEBUG_REFS")) {
     - 		r->refs_private = debug_wrap(r->refs_private);
     -@@ refs.c: struct ref_store *get_submodule_ref_store(const char *submodule)
     - 		goto done;
     - 
     - 	/* assume that add_submodule_odb() has been called */
     --	refs = ref_store_init(submodule_sb.buf, DEFAULT_REF_STORAGE, /* XXX */
     -+	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
     - 			      REF_STORE_READ | REF_STORE_ODB);
     - 	register_ref_store_map(&submodule_ref_stores, "submodule",
     - 			       refs, submodule);
     -@@ refs.c: struct ref_store *get_submodule_ref_store(const char *submodule)
     - 
     - struct ref_store *get_worktree_ref_store(const struct worktree *wt)
     - {
     --	const char *format = DEFAULT_REF_STORAGE; /* XXX */
     -+	const char *format = default_ref_storage();
     - 	struct ref_store *refs;
     - 	const char *id;
     - 
     -
       ## t/t1409-avoid-packing-refs.sh ##
      @@ t/t1409-avoid-packing-refs.sh: test_description='avoid rewriting packed-refs unnecessarily'
       
     @@ t/t3210-pack-refs.sh: semantic is still the same.
       	git config core.logallrefupdates true
       '
      
     - ## t/t9903-bash-prompt.sh ##
     -@@ t/t9903-bash-prompt.sh: test_description='test git-specific bash prompt functions'
     - 
     - . ./lib-bash.sh
     - 
     -+if test_have_prereq REFTABLE
     -+then
     -+  skip_all='skipping tests; incompatible with reftable'
     -+  test_done
     -+fi
     -+
     - . "$GIT_BUILD_DIR/contrib/completion/git-prompt.sh"
     - 
     - actual="$TRASH_DIRECTORY/actual"
     -
       ## t/test-lib.sh ##
      @@ t/test-lib.sh: parisc* | hppa*)
       	;;
 17:  ad5658ffc51 = 19:  991abf9e1b2 Add "test-tool dump-reftable" command.

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v18 01/19] lib-t6000.sh: write tag using git-update-ref
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 02/19] checkout: add '\n' to reflog message Han-Wen Nienhuys via GitGitGadget
                                                     ` (18 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/lib-t6000.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/lib-t6000.sh b/t/lib-t6000.sh
index b0ed4767e32..fba6778ca35 100644
--- a/t/lib-t6000.sh
+++ b/t/lib-t6000.sh
@@ -1,7 +1,5 @@
 : included from 6002 and others
 
-mkdir -p .git/refs/tags
-
 >sed.script
 
 # Answer the sha1 has associated with the tag. The tag must exist under refs/tags
@@ -26,7 +24,8 @@ save_tag () {
 	_tag=$1
 	test -n "$_tag" || error "usage: save_tag tag commit-args ..."
 	shift 1
-	"$@" >".git/refs/tags/$_tag"
+
+	git update-ref "refs/tags/$_tag" $("$@")
 
 	echo "s/$(tag $_tag)/$_tag/g" >sed.script.tmp
 	cat sed.script >>sed.script.tmp
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 02/19] checkout: add '\n' to reflog message
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 01/19] lib-t6000.sh: write tag using git-update-ref Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 03/19] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
                                                     ` (17 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable precisely reproduces the given message. This leads to differences,
because the files backend implicitly adds a trailing '\n' to all messages.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/checkout.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index af849c644fe..bb11fcc4e99 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -884,8 +884,9 @@ static void update_refs_for_switch(const struct checkout_opts *opts,
 
 	reflog_msg = getenv("GIT_REFLOG_ACTION");
 	if (!reflog_msg)
-		strbuf_addf(&msg, "checkout: moving from %s to %s",
-			old_desc ? old_desc : "(invalid)", new_branch_info->name);
+		strbuf_addf(&msg, "checkout: moving from %s to %s\n",
+			    old_desc ? old_desc : "(invalid)",
+			    new_branch_info->name);
 	else
 		strbuf_insertstr(&msg, 0, reflog_msg);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 03/19] Write pseudorefs through ref backends.
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 01/19] lib-t6000.sh: write tag using git-update-ref Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 02/19] checkout: add '\n' to reflog message Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 04/19] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
                                                     ` (16 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Pseudorefs store transient data in in the repository. Examples are HEAD,
CHERRY_PICK_HEAD, etc.

These refs have always been read through the ref backends, but they were written
in a one-off routine that wrote an object ID or symref directly into
.git/<pseudo_ref_name>.

This causes problems when introducing a new ref storage backend. To remedy this,
extend the ref backend implementation with a write_pseudoref_fn and
update_pseudoref_fn.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c                | 119 ++++++++----------------------------------
 refs.h                |  11 ++++
 refs/files-backend.c  | 114 +++++++++++++++++++++++++++++++++++++++-
 refs/packed-backend.c |  21 +++++++-
 refs/refs-internal.h  |  18 +++++++
 5 files changed, 184 insertions(+), 99 deletions(-)

diff --git a/refs.c b/refs.c
index 224ff66c7bb..12908066b13 100644
--- a/refs.c
+++ b/refs.c
@@ -321,6 +321,12 @@ int ref_exists(const char *refname)
 	return refs_ref_exists(get_main_ref_store(the_repository), refname);
 }
 
+int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
+{
+	return refs_delete_pseudoref(get_main_ref_store(the_repository),
+				     pseudoref, old_oid);
+}
+
 static int filter_refs(const char *refname, const struct object_id *oid,
 			   int flags, void *data)
 {
@@ -739,101 +745,6 @@ long get_files_ref_lock_timeout_ms(void)
 	return timeout_ms;
 }
 
-static int write_pseudoref(const char *pseudoref, const struct object_id *oid,
-			   const struct object_id *old_oid, struct strbuf *err)
-{
-	const char *filename;
-	int fd;
-	struct lock_file lock = LOCK_INIT;
-	struct strbuf buf = STRBUF_INIT;
-	int ret = -1;
-
-	if (!oid)
-		return 0;
-
-	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
-
-	filename = git_path("%s", pseudoref);
-	fd = hold_lock_file_for_update_timeout(&lock, filename, 0,
-					       get_files_ref_lock_timeout_ms());
-	if (fd < 0) {
-		strbuf_addf(err, _("could not open '%s' for writing: %s"),
-			    filename, strerror(errno));
-		goto done;
-	}
-
-	if (old_oid) {
-		struct object_id actual_old_oid;
-
-		if (read_ref(pseudoref, &actual_old_oid)) {
-			if (!is_null_oid(old_oid)) {
-				strbuf_addf(err, _("could not read ref '%s'"),
-					    pseudoref);
-				rollback_lock_file(&lock);
-				goto done;
-			}
-		} else if (is_null_oid(old_oid)) {
-			strbuf_addf(err, _("ref '%s' already exists"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		} else if (!oideq(&actual_old_oid, old_oid)) {
-			strbuf_addf(err, _("unexpected object ID when writing '%s'"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		}
-	}
-
-	if (write_in_full(fd, buf.buf, buf.len) < 0) {
-		strbuf_addf(err, _("could not write to '%s'"), filename);
-		rollback_lock_file(&lock);
-		goto done;
-	}
-
-	commit_lock_file(&lock);
-	ret = 0;
-done:
-	strbuf_release(&buf);
-	return ret;
-}
-
-static int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
-{
-	const char *filename;
-
-	filename = git_path("%s", pseudoref);
-
-	if (old_oid && !is_null_oid(old_oid)) {
-		struct lock_file lock = LOCK_INIT;
-		int fd;
-		struct object_id actual_old_oid;
-
-		fd = hold_lock_file_for_update_timeout(
-				&lock, filename, 0,
-				get_files_ref_lock_timeout_ms());
-		if (fd < 0) {
-			error_errno(_("could not open '%s' for writing"),
-				    filename);
-			return -1;
-		}
-		if (read_ref(pseudoref, &actual_old_oid))
-			die(_("could not read ref '%s'"), pseudoref);
-		if (!oideq(&actual_old_oid, old_oid)) {
-			error(_("unexpected object ID when deleting '%s'"),
-			      pseudoref);
-			rollback_lock_file(&lock);
-			return -1;
-		}
-
-		unlink(filename);
-		rollback_lock_file(&lock);
-	} else {
-		unlink(filename);
-	}
-
-	return 0;
-}
 
 int refs_delete_ref(struct ref_store *refs, const char *msg,
 		    const char *refname,
@@ -845,7 +756,7 @@ int refs_delete_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		return delete_pseudoref(refname, old_oid);
+		return refs_delete_pseudoref(refs, refname, old_oid);
 	}
 
 	transaction = ref_store_transaction_begin(refs, &err);
@@ -1172,7 +1083,8 @@ int refs_update_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		ret = write_pseudoref(refname, new_oid, old_oid, &err);
+		ret = refs_write_pseudoref(refs, refname, new_oid, old_oid,
+					   &err);
 	} else {
 		t = ref_store_transaction_begin(refs, &err);
 		if (!t ||
@@ -1441,6 +1353,19 @@ int head_ref(each_ref_fn fn, void *cb_data)
 	return refs_head_ref(get_main_ref_store(the_repository), fn, cb_data);
 }
 
+int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			 const struct object_id *oid,
+			 const struct object_id *old_oid, struct strbuf *err)
+{
+	return refs->be->write_pseudoref(refs, pseudoref, oid, old_oid, err);
+}
+
+int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			  const struct object_id *old_oid)
+{
+	return refs->be->delete_pseudoref(refs, pseudoref, old_oid);
+}
+
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
 		const char *prefix, int trim, int flags)
diff --git a/refs.h b/refs.h
index e010f8aec28..4dad8f24914 100644
--- a/refs.h
+++ b/refs.h
@@ -732,6 +732,17 @@ int update_ref(const char *msg, const char *refname,
 	       const struct object_id *new_oid, const struct object_id *old_oid,
 	       unsigned int flags, enum action_on_err onerr);
 
+/* Pseudorefs (eg. HEAD, CHERRY_PICK_HEAD) have a separate routines for updating
+   and deletion as they cannot take part in normal transactional updates.
+   Pseudorefs should only be written for the main repository.
+*/
+int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			 const struct object_id *oid,
+			 const struct object_id *old_oid, struct strbuf *err);
+int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			  const struct object_id *old_oid);
+int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid);
+
 int parse_hide_refs_config(const char *var, const char *value, const char *);
 
 /*
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6516c7bc8c8..df7553f4cc3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -731,6 +731,115 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	return ret;
 }
 
+static int files_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "write_pseudoref");
+	int fd;
+	struct lock_file lock = LOCK_INIT;
+	struct strbuf filename = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = -1;
+
+	if (!oid)
+		return 0;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+	fd = hold_lock_file_for_update_timeout(&lock, filename.buf, 0,
+					       get_files_ref_lock_timeout_ms());
+	if (fd < 0) {
+		strbuf_addf(err, _("could not open '%s' for writing: %s"),
+			    buf.buf, strerror(errno));
+		goto done;
+	}
+
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(err, _("could not read ref '%s'"),
+					    pseudoref);
+				rollback_lock_file(&lock);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(err, _("ref '%s' already exists"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(err,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+	}
+
+	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
+	if (write_in_full(fd, buf.buf, buf.len) < 0) {
+		strbuf_addf(err, _("could not write to '%s'"), filename.buf);
+		rollback_lock_file(&lock);
+		goto done;
+	}
+
+	commit_lock_file(&lock);
+	ret = 0;
+done:
+	strbuf_release(&buf);
+	strbuf_release(&filename);
+	return ret;
+}
+
+static int files_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "delete_pseudoref");
+	struct strbuf filename = STRBUF_INIT;
+	int ret = -1;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+
+	if (old_oid && !is_null_oid(old_oid)) {
+		struct lock_file lock = LOCK_INIT;
+		int fd;
+		struct object_id actual_old_oid;
+
+		fd = hold_lock_file_for_update_timeout(
+			&lock, filename.buf, 0,
+			get_files_ref_lock_timeout_ms());
+		if (fd < 0) {
+			error_errno(_("could not open '%s' for writing"),
+				    filename.buf);
+			goto done;
+		}
+		if (read_ref(pseudoref, &actual_old_oid))
+			die(_("could not read ref '%s'"), pseudoref);
+		if (!oideq(&actual_old_oid, old_oid)) {
+			error(_("unexpected object ID when deleting '%s'"),
+			      pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+
+		unlink(filename.buf);
+		rollback_lock_file(&lock);
+	} else {
+		unlink(filename.buf);
+	}
+	ret = 0;
+done:
+	strbuf_release(&filename);
+	return ret;
+}
+
 struct files_ref_iterator {
 	struct ref_iterator base;
 
@@ -3189,6 +3298,9 @@ struct ref_storage_be refs_be_files = {
 	files_rename_ref,
 	files_copy_ref,
 
+	files_write_pseudoref,
+	files_delete_pseudoref,
+
 	files_ref_iterator_begin,
 	files_read_raw_ref,
 
@@ -3198,5 +3310,5 @@ struct ref_storage_be refs_be_files = {
 	files_reflog_exists,
 	files_create_reflog,
 	files_delete_reflog,
-	files_reflog_expire
+	files_reflog_expire,
 };
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 4458a0f69cc..08e8253a893 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1590,6 +1590,22 @@ static int packed_copy_ref(struct ref_store *ref_store,
 	BUG("packed reference store does not support copying references");
 }
 
+static int packed_write_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *oid,
+				  const struct object_id *old_oid,
+				  struct strbuf *err)
+{
+	BUG("packed reference store does not support writing pseudo-references");
+}
+
+static int packed_delete_pseudoref(struct ref_store *ref_store,
+				   const char *pseudoref,
+				   const struct object_id *old_oid)
+{
+	BUG("packed reference store does not support deleting pseudo-references");
+}
+
 static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_store)
 {
 	return empty_ref_iterator_begin();
@@ -1656,6 +1672,9 @@ struct ref_storage_be refs_be_packed = {
 	packed_rename_ref,
 	packed_copy_ref,
 
+	packed_write_pseudoref,
+	packed_delete_pseudoref,
+
 	packed_ref_iterator_begin,
 	packed_read_raw_ref,
 
@@ -1665,5 +1684,5 @@ struct ref_storage_be refs_be_packed = {
 	packed_reflog_exists,
 	packed_create_reflog,
 	packed_delete_reflog,
-	packed_reflog_expire
+	packed_reflog_expire,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 4271362d264..59b053d53a2 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -556,6 +556,21 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
 			  const char *oldref, const char *newref,
 			  const char *logmsg);
 
+typedef int write_pseudoref_fn(struct ref_store *ref_store,
+			       const char *pseudoref,
+			       const struct object_id *oid,
+			       const struct object_id *old_oid,
+			       struct strbuf *err);
+
+/*
+ * Deletes a pseudoref. Deletion always succeeds (even if the pseudoref doesn't
+ * exist.), except if old_oid is specified. If it is, it can fail due to lock
+ * failure, failure reading the old OID, or an OID mismatch
+ */
+typedef int delete_pseudoref_fn(struct ref_store *ref_store,
+				const char *pseudoref,
+				const struct object_id *old_oid);
+
 /*
  * Iterate over the references in `ref_store` whose names start with
  * `prefix`. `prefix` is matched as a literal string, without regard
@@ -655,6 +670,9 @@ struct ref_storage_be {
 	rename_ref_fn *rename_ref;
 	copy_ref_fn *copy_ref;
 
+	write_pseudoref_fn *write_pseudoref;
+	delete_pseudoref_fn *delete_pseudoref;
+
 	ref_iterator_begin_fn *iterator_begin;
 	read_raw_ref_fn *read_raw_ref;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 04/19] Make refs_ref_exists public
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (2 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 03/19] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 05/19] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
                                                     ` (15 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 2 +-
 refs.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 12908066b13..812fee47108 100644
--- a/refs.c
+++ b/refs.c
@@ -311,7 +311,7 @@ int read_ref(const char *refname, struct object_id *oid)
 	return read_ref_full(refname, RESOLVE_REF_READING, oid, NULL);
 }
 
-static int refs_ref_exists(struct ref_store *refs, const char *refname)
+int refs_ref_exists(struct ref_store *refs, const char *refname)
 {
 	return !!refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING, NULL, NULL);
 }
diff --git a/refs.h b/refs.h
index 4dad8f24914..7aaa1226551 100644
--- a/refs.h
+++ b/refs.h
@@ -105,6 +105,8 @@ int refs_verify_refname_available(struct ref_store *refs,
 				  const struct string_list *skip,
 				  struct strbuf *err);
 
+int refs_ref_exists(struct ref_store *refs, const char *refname);
+
 int ref_exists(const char *refname);
 
 int should_autocreate_reflog(const char *refname);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 05/19] Treat BISECT_HEAD as a pseudo ref
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (3 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 04/19] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 06/19] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
                                                     ` (14 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Both the git-bisect.sh as bisect--helper inspected the file system directly.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/bisect--helper.c | 3 +--
 git-bisect.sh            | 4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index ec4996282e3..73f9324ad7d 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -13,7 +13,6 @@ static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
 static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
 static GIT_PATH_FUNC(git_path_bisect_ancestors_ok, "BISECT_ANCESTORS_OK")
 static GIT_PATH_FUNC(git_path_bisect_start, "BISECT_START")
-static GIT_PATH_FUNC(git_path_bisect_head, "BISECT_HEAD")
 static GIT_PATH_FUNC(git_path_bisect_log, "BISECT_LOG")
 static GIT_PATH_FUNC(git_path_head_name, "head-name")
 static GIT_PATH_FUNC(git_path_bisect_names, "BISECT_NAMES")
@@ -164,7 +163,7 @@ static int bisect_reset(const char *commit)
 		strbuf_addstr(&branch, commit);
 	}
 
-	if (!file_exists(git_path_bisect_head())) {
+	if (!ref_exists("BISECT_HEAD")) {
 		struct argv_array argv = ARGV_ARRAY_INIT;
 
 		argv_array_pushl(&argv, "checkout", branch.buf, "--", NULL);
diff --git a/git-bisect.sh b/git-bisect.sh
index 08a6ed57ddb..f03fbb18f00 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -41,7 +41,7 @@ TERM_GOOD=good
 
 bisect_head()
 {
-	if test -f "$GIT_DIR/BISECT_HEAD"
+	if git rev-parse --verify -q BISECT_HEAD > /dev/null
 	then
 		echo BISECT_HEAD
 	else
@@ -153,7 +153,7 @@ bisect_next() {
 	git bisect--helper --bisect-next-check $TERM_GOOD $TERM_BAD $TERM_GOOD|| exit
 
 	# Perform all bisection computation, display and checkout
-	git bisect--helper --next-all $(test -f "$GIT_DIR/BISECT_HEAD" && echo --no-checkout)
+	git bisect--helper --next-all $(git rev-parse --verify -q BISECT_HEAD > /dev/null && echo --no-checkout)
 	res=$?
 
 	# Check if we should exit because bisection is finished
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 06/19] Treat CHERRY_PICK_HEAD as a pseudo ref
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (4 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 05/19] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 07/19] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
                                                     ` (13 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Check for existence and delete CHERRY_PICK_HEAD through pseudo ref functions.
This will help cherry-pick work with alternate ref storage backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/commit.c | 34 +++++++++++++++++++---------------
 builtin/merge.c  |  2 +-
 path.c           |  1 -
 path.h           |  7 ++++---
 sequencer.c      | 42 ++++++++++++++++++++++++++----------------
 wt-status.c      |  4 ++--
 6 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index d1b7396052a..e27120b982b 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -847,21 +847,25 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 			if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
 				!merge_contains_scissors)
 				wt_status_add_cut_line(s->fp);
-			status_printf_ln(s, GIT_COLOR_NORMAL,
-			    whence == FROM_MERGE
-				? _("\n"
-					"It looks like you may be committing a merge.\n"
-					"If this is not correct, please remove the file\n"
-					"	%s\n"
-					"and try again.\n")
-				: _("\n"
-					"It looks like you may be committing a cherry-pick.\n"
-					"If this is not correct, please remove the file\n"
-					"	%s\n"
-					"and try again.\n"),
-				whence == FROM_MERGE ?
-					git_path_merge_head(the_repository) :
-					git_path_cherry_pick_head(the_repository));
+			if (whence == FROM_MERGE)
+				status_printf_ln(
+					s, GIT_COLOR_NORMAL,
+
+					_("\n"
+					  "It looks like you may be committing a merge.\n"
+					  "If this is not correct, please remove the file\n"
+					  "	%s\n"
+					  "and try again.\n"),
+					git_path_merge_head(the_repository));
+			else
+				status_printf_ln(
+					s, GIT_COLOR_NORMAL,
+
+					_("\n"
+					  "It looks like you may be committing a cherry-pick.\n"
+					  "If this is not correct, please run\n"
+					  "	git cherry-pick --abort\n"
+					  "and try again.\n"));
 		}
 
 		fprintf(s->fp, "\n");
diff --git a/builtin/merge.c b/builtin/merge.c
index 7da707bf55d..93b0a7b6eda 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1352,7 +1352,7 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
 		else
 			die(_("You have not concluded your merge (MERGE_HEAD exists)."));
 	}
-	if (file_exists(git_path_cherry_pick_head(the_repository))) {
+	if (ref_exists("CHERRY_PICK_HEAD")) {
 		if (advice_resolve_conflict)
 			die(_("You have not concluded your cherry-pick (CHERRY_PICK_HEAD exists).\n"
 			    "Please, commit your changes before you merge."));
diff --git a/path.c b/path.c
index 8b2c7531919..783cc2ae819 100644
--- a/path.c
+++ b/path.c
@@ -1528,7 +1528,6 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
-REPO_GIT_PATH_FUNC(cherry_pick_head, "CHERRY_PICK_HEAD")
 REPO_GIT_PATH_FUNC(revert_head, "REVERT_HEAD")
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
diff --git a/path.h b/path.h
index 1f1bf8f87a8..8941c018a99 100644
--- a/path.h
+++ b/path.h
@@ -170,7 +170,6 @@ void report_linked_checkout_garbage(void);
 	}
 
 struct path_cache {
-	const char *cherry_pick_head;
 	const char *revert_head;
 	const char *squash_msg;
 	const char *merge_msg;
@@ -182,9 +181,11 @@ struct path_cache {
 	const char *shallow;
 };
 
-#define PATH_CACHE_INIT { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL }
+#define PATH_CACHE_INIT                                              \
+	{                                                            \
+		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
+	}
 
-const char *git_path_cherry_pick_head(struct repository *r);
 const char *git_path_revert_head(struct repository *r);
 const char *git_path_squash_msg(struct repository *r);
 const char *git_path_merge_msg(struct repository *r);
diff --git a/sequencer.c b/sequencer.c
index fd7701c88a8..26286ec8d08 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -381,7 +381,8 @@ static void print_advice(struct repository *r, int show_hint,
 		 * (typically rebase --interactive) wants to take care
 		 * of the commit itself so remove CHERRY_PICK_HEAD
 		 */
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		return;
 	}
 
@@ -1455,7 +1456,8 @@ static int do_commit(struct repository *r,
 				    author, opts, flags, &oid);
 		strbuf_release(&sb);
 		if (!res) {
-			unlink(git_path_cherry_pick_head(r));
+			refs_delete_pseudoref(get_main_ref_store(r),
+					      "CHERRY_PICK_HEAD", NULL);
 			unlink(git_path_merge_msg(r));
 			if (!is_rebase_i(opts))
 				print_commit_summary(r, NULL, &oid,
@@ -1966,7 +1968,8 @@ static int do_pick_commit(struct repository *r,
 		flags |= ALLOW_EMPTY;
 	} else if (allow == 2) {
 		drop_commit = 1;
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		unlink(git_path_merge_msg(r));
 		fprintf(stderr,
 			_("dropping %s %s -- patch contents already upstream\n"),
@@ -2305,8 +2308,10 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int need_cleanup = 0;
 
-	if (file_exists(git_path_cherry_pick_head(r))) {
-		if (!unlink(git_path_cherry_pick_head(r)) && verbose)
+	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
+		if (!refs_delete_pseudoref(get_main_ref_store(r),
+					   "CHERRY_PICK_HEAD", NULL) &&
+		    verbose)
 			warning(_("cancelling a cherry picking in progress"));
 		opts.action = REPLAY_PICK;
 		need_cleanup = 1;
@@ -2671,8 +2676,9 @@ static int create_seq_dir(struct repository *r)
 	enum replay_action action;
 	const char *in_progress_error = NULL;
 	const char *in_progress_advice = NULL;
-	unsigned int advise_skip = file_exists(git_path_revert_head(r)) ||
-				file_exists(git_path_cherry_pick_head(r));
+	unsigned int advise_skip =
+		file_exists(git_path_revert_head(r)) ||
+		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
 
 	if (!sequencer_get_last_command(r, &action)) {
 		switch (action) {
@@ -2771,7 +2777,7 @@ static int rollback_single_pick(struct repository *r)
 {
 	struct object_id head_oid;
 
-	if (!file_exists(git_path_cherry_pick_head(r)) &&
+	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !file_exists(git_path_revert_head(r)))
 		return error(_("no cherry-pick or revert in progress"));
 	if (read_ref_full("HEAD", 0, &head_oid, NULL))
@@ -2874,7 +2880,8 @@ int sequencer_skip(struct repository *r, struct replay_opts *opts)
 		}
 		break;
 	case REPLAY_PICK:
-		if (!file_exists(git_path_cherry_pick_head(r))) {
+		if (!refs_ref_exists(get_main_ref_store(r),
+				     "CHERRY_PICK_HEAD")) {
 			if (action != REPLAY_PICK)
 				return error(_("no cherry-pick in progress"));
 			if (!rollback_is_safe())
@@ -3569,7 +3576,8 @@ static int do_merge(struct repository *r,
 					oid_to_hex(&j->item->object.oid));
 
 		strbuf_release(&ref_name);
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		rollback_lock_file(&lock);
 
 		rollback_lock_file(&lock);
@@ -4201,7 +4209,7 @@ static int continue_single_pick(struct repository *r)
 {
 	const char *argv[] = { "commit", NULL };
 
-	if (!file_exists(git_path_cherry_pick_head(r)) &&
+	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !file_exists(git_path_revert_head(r)))
 		return error(_("no cherry-pick or revert in progress"));
 	return run_command_v_opt(argv, RUN_GIT_CMD);
@@ -4318,9 +4326,10 @@ static int commit_staged_changes(struct repository *r,
 	}
 
 	if (is_clean) {
-		const char *cherry_pick_head = git_path_cherry_pick_head(r);
-
-		if (file_exists(cherry_pick_head) && unlink(cherry_pick_head))
+		if (refs_ref_exists(get_main_ref_store(r),
+				    "CHERRY_PICK_HEAD") &&
+		    refs_delete_pseudoref(get_main_ref_store(r),
+					  "CHERRY_PICK_HEAD", NULL))
 			return error(_("could not remove CHERRY_PICK_HEAD"));
 		if (!final_fixup)
 			return 0;
@@ -4379,7 +4388,8 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 
 	if (!is_rebase_i(opts)) {
 		/* Verify that the conflict has been resolved */
-		if (file_exists(git_path_cherry_pick_head(r)) ||
+		if (refs_ref_exists(get_main_ref_store(r),
+				    "CHERRY_PICK_HEAD") ||
 		    file_exists(git_path_revert_head(r))) {
 			res = continue_single_pick(r);
 			if (res)
@@ -5442,7 +5452,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
 
 int sequencer_determine_whence(struct repository *r, enum commit_whence *whence)
 {
-	if (file_exists(git_path_cherry_pick_head(r))) {
+	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
 		struct object_id cherry_pick_head, rebase_head;
 
 		if (file_exists(git_path_seq_dir()))
diff --git a/wt-status.c b/wt-status.c
index 98dfa6f73f9..96302be030b 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1636,8 +1636,8 @@ void wt_status_get_state(struct repository *r,
 		state->merge_in_progress = 1;
 	} else if (wt_status_check_rebase(NULL, state)) {
 		;		/* all set */
-	} else if (!stat(git_path_cherry_pick_head(r), &st) &&
-			!get_oid("CHERRY_PICK_HEAD", &oid)) {
+	} else if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
+		   !get_oid("CHERRY_PICK_HEAD", &oid)) {
 		state->cherry_pick_in_progress = 1;
 		oidcpy(&state->cherry_pick_head_oid, &oid);
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 07/19] Treat REVERT_HEAD as a pseudo ref
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (5 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 06/19] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 08/19] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
                                                     ` (12 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 path.c      |  1 -
 path.h      |  8 +++-----
 sequencer.c | 16 +++++++++-------
 wt-status.c |  2 +-
 4 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/path.c b/path.c
index 783cc2ae819..7b385e5eb28 100644
--- a/path.c
+++ b/path.c
@@ -1528,7 +1528,6 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
-REPO_GIT_PATH_FUNC(revert_head, "REVERT_HEAD")
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
 REPO_GIT_PATH_FUNC(merge_rr, "MERGE_RR")
diff --git a/path.h b/path.h
index 8941c018a99..e7e77da6aaa 100644
--- a/path.h
+++ b/path.h
@@ -170,7 +170,6 @@ void report_linked_checkout_garbage(void);
 	}
 
 struct path_cache {
-	const char *revert_head;
 	const char *squash_msg;
 	const char *merge_msg;
 	const char *merge_rr;
@@ -181,12 +180,11 @@ struct path_cache {
 	const char *shallow;
 };
 
-#define PATH_CACHE_INIT                                              \
-	{                                                            \
-		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
+#define PATH_CACHE_INIT                                        \
+	{                                                      \
+		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
 	}
 
-const char *git_path_revert_head(struct repository *r);
 const char *git_path_squash_msg(struct repository *r);
 const char *git_path_merge_msg(struct repository *r);
 const char *git_path_merge_rr(struct repository *r);
diff --git a/sequencer.c b/sequencer.c
index 26286ec8d08..25ef9fc773d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2317,8 +2317,10 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
 		need_cleanup = 1;
 	}
 
-	if (file_exists(git_path_revert_head(r))) {
-		if (!unlink(git_path_revert_head(r)) && verbose)
+	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
+		if (!refs_delete_pseudoref(get_main_ref_store(r), "REVERT_HEAD",
+					   NULL) &&
+		    verbose)
 			warning(_("cancelling a revert in progress"));
 		opts.action = REPLAY_REVERT;
 		need_cleanup = 1;
@@ -2677,7 +2679,7 @@ static int create_seq_dir(struct repository *r)
 	const char *in_progress_error = NULL;
 	const char *in_progress_advice = NULL;
 	unsigned int advise_skip =
-		file_exists(git_path_revert_head(r)) ||
+		refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD") ||
 		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
 
 	if (!sequencer_get_last_command(r, &action)) {
@@ -2778,7 +2780,7 @@ static int rollback_single_pick(struct repository *r)
 	struct object_id head_oid;
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
-	    !file_exists(git_path_revert_head(r)))
+	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
 	if (read_ref_full("HEAD", 0, &head_oid, NULL))
 		return error(_("cannot resolve HEAD"));
@@ -2872,7 +2874,7 @@ int sequencer_skip(struct repository *r, struct replay_opts *opts)
 	 */
 	switch (opts->action) {
 	case REPLAY_REVERT:
-		if (!file_exists(git_path_revert_head(r))) {
+		if (!refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
 			if (action != REPLAY_REVERT)
 				return error(_("no revert in progress"));
 			if (!rollback_is_safe())
@@ -4210,7 +4212,7 @@ static int continue_single_pick(struct repository *r)
 	const char *argv[] = { "commit", NULL };
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
-	    !file_exists(git_path_revert_head(r)))
+	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
 	return run_command_v_opt(argv, RUN_GIT_CMD);
 }
@@ -4390,7 +4392,7 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 		/* Verify that the conflict has been resolved */
 		if (refs_ref_exists(get_main_ref_store(r),
 				    "CHERRY_PICK_HEAD") ||
-		    file_exists(git_path_revert_head(r))) {
+		    refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
 			res = continue_single_pick(r);
 			if (res)
 				goto release_todo_list;
diff --git a/wt-status.c b/wt-status.c
index 96302be030b..81b768504a1 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1642,7 +1642,7 @@ void wt_status_get_state(struct repository *r,
 		oidcpy(&state->cherry_pick_head_oid, &oid);
 	}
 	wt_status_check_bisect(NULL, state);
-	if (!stat(git_path_revert_head(r), &st) &&
+	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD") &&
 	    !get_oid("REVERT_HEAD", &oid)) {
 		state->revert_in_progress = 1;
 		oidcpy(&state->revert_head_oid, &oid);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 08/19] Move REF_LOG_ONLY to refs-internal.h
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (6 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 07/19] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 09/19] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
                                                     ` (11 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

REF_LOG_ONLY is used in the transaction preparation: if a symref is involved in
a transaction, the referent of the symref should be updated, and the symref
itself should only be updated in the reflog. Other ref backends will need to
duplicate this logic.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/files-backend.c | 7 -------
 refs/refs-internal.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index df7553f4cc3..141b6b08816 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -38,13 +38,6 @@
  */
 #define REF_NEEDS_COMMIT (1 << 6)
 
-/*
- * Used as a flag in ref_update::flags when we want to log a ref
- * update but not actually perform it.  This is used when a symbolic
- * ref update is split up.
- */
-#define REF_LOG_ONLY (1 << 7)
-
 /*
  * Used as a flag in ref_update::flags when the ref_update was via an
  * update to HEAD.
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 59b053d53a2..dc9e8d3a92b 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -31,6 +31,13 @@ struct ref_transaction;
  */
 #define REF_HAVE_OLD (1 << 3)
 
+/*
+ * Used as a flag in ref_update::flags when we want to log a ref
+ * update but not actually perform it.  This is used when a symbolic
+ * ref update is split up.
+ */
+#define REF_LOG_ONLY (1 << 7)
+
 /*
  * Return the length of time to retry acquiring a loose reference lock
  * before giving up, in milliseconds:
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 09/19] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (7 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 08/19] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 10/19] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                                     ` (10 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 812fee47108..29e710a29e9 100644
--- a/refs.c
+++ b/refs.c
@@ -1450,7 +1450,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1510,8 +1510,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 10/19] Add .gitattributes for the reftable/ directory
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (8 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 09/19] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 11/19] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                                     ` (9 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 00000000000..f44451a3795
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 11/19] Add reftable library
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (9 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 10/19] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 12/19] Add standalone build infrastructure for reftable Han-Wen Nienhuys via GitGitGadget
                                                     ` (8 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. The rationale and format
layout is detailed in commit 35e6c47404 ("reftable: file format documentation",
2020-05-20).

This commits imports a fully functioning library for reading and writing
reftables, along with unittests.

It is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* record.{c,h} - reading and writing records

* block.{c,h} - reading and writing blocks.

* reader.{c,h} - reading a complete reftable file.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE          |   31 +
 reftable/VERSION          |    1 +
 reftable/basics.c         |  215 +++++++
 reftable/basics.h         |   53 ++
 reftable/block.c          |  432 +++++++++++++
 reftable/block.h          |  129 ++++
 reftable/block_test.c     |  157 +++++
 reftable/compat.c         |   98 +++
 reftable/compat.h         |   48 ++
 reftable/constants.h      |   21 +
 reftable/dump.c           |  212 +++++++
 reftable/file.c           |   95 +++
 reftable/iter.c           |  242 ++++++++
 reftable/iter.h           |   72 +++
 reftable/merged.c         |  321 ++++++++++
 reftable/merged.h         |   39 ++
 reftable/merged_test.c    |  273 +++++++++
 reftable/pq.c             |  115 ++++
 reftable/pq.h             |   34 ++
 reftable/reader.c         |  744 +++++++++++++++++++++++
 reftable/reader.h         |   65 ++
 reftable/record.c         | 1126 ++++++++++++++++++++++++++++++++++
 reftable/record.h         |  143 +++++
 reftable/record_test.c    |  410 +++++++++++++
 reftable/refname.c        |  209 +++++++
 reftable/refname.h        |   38 ++
 reftable/refname_test.c   |   99 +++
 reftable/reftable-tests.h |   22 +
 reftable/reftable.c       |   90 +++
 reftable/reftable.h       |  571 ++++++++++++++++++
 reftable/reftable_test.c  |  631 +++++++++++++++++++
 reftable/stack.c          | 1209 +++++++++++++++++++++++++++++++++++++
 reftable/stack.h          |   48 ++
 reftable/stack_test.c     |  787 ++++++++++++++++++++++++
 reftable/strbuf.c         |  206 +++++++
 reftable/strbuf.h         |   88 +++
 reftable/strbuf_test.c    |   39 ++
 reftable/system.h         |   53 ++
 reftable/test_framework.c |   69 +++
 reftable/test_framework.h |   64 ++
 reftable/tree.c           |   63 ++
 reftable/tree.h           |   34 ++
 reftable/tree_test.c      |   62 ++
 reftable/update.sh        |   22 +
 reftable/writer.c         |  662 ++++++++++++++++++++
 reftable/writer.h         |   60 ++
 reftable/zlib-compat.c    |   92 +++
 47 files changed, 10294 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/compat.c
 create mode 100644 reftable/compat.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/refname_test.c
 create mode 100644 reftable/reftable-tests.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/strbuf.c
 create mode 100644 reftable/strbuf.h
 create mode 100644 reftable/strbuf_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 00000000000..402e0f9356b
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 00000000000..21aced024b7
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1 @@
+d7472cbd1afe6bf3da53e94971b1cc79ce183fa8 Remove outdated bits of the README file
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 00000000000..e5110ab6ac4
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,215 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(uint8_t *out, uint32_t i)
+{
+	out[0] = (uint8_t)((i >> 16) & 0xff);
+	out[1] = (uint8_t)((i >> 8) & 0xff);
+	out[2] = (uint8_t)(i & 0xff);
+}
+
+uint32_t get_be24(uint8_t *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)(i & 0xff);
+}
+
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
+{
+	size_t lo = 0;
+	size_t hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		size_t mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	int names_cap = 0;
+	int names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	static char buf[250];
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return "I/O error";
+	case REFTABLE_FORMAT_ERROR:
+		return "corrupt reftable file";
+	case REFTABLE_NOT_EXIST_ERROR:
+		return "file does not exist";
+	case REFTABLE_LOCK_ERROR:
+		return "data is outdated";
+	case REFTABLE_API_ERROR:
+		return "misuse of the reftable API";
+	case REFTABLE_ZLIB_ERROR:
+		return "zlib failure";
+	case REFTABLE_NAME_CONFLICT:
+		return "file/directory conflict";
+	case REFTABLE_REFNAME_ERROR:
+		return "invalid refname";
+	case -1:
+		return "general error";
+	default:
+		snprintf(buf, sizeof(buf), "unknown error code %d", err);
+		return buf;
+	}
+}
+
+int reftable_error_to_errno(int err)
+{
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return EIO;
+	case REFTABLE_FORMAT_ERROR:
+		return EFAULT;
+	case REFTABLE_NOT_EXIST_ERROR:
+		return ENOENT;
+	case REFTABLE_LOCK_ERROR:
+		return EBUSY;
+	case REFTABLE_API_ERROR:
+		return EINVAL;
+	case REFTABLE_ZLIB_ERROR:
+		return EDOM;
+	default:
+		return ERANGE;
+	}
+}
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 00000000000..e1b4c9fa1a1
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+/* Bigendian en/decoding of integers */
+
+void put_be24(uint8_t *out, uint32_t i);
+uint32_t get_be24(uint8_t *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);
+
+/*
+  Frees a NULL terminated array of malloced strings. The array itself is also
+  freed.
+ */
+void free_names(char **a);
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp);
+
+/* compares two NULL-terminated arrays of strings. */
+int names_equal(char **a, char **b);
+
+/* returns the array size of a NULL-terminated array of strings. */
+int names_length(char **names);
+
+/* Allocation routines; they invoke the functions set through
+ * reftable_set_alloc() */
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 00000000000..d6a1a42a2ba
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,432 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct strbuf *key);
+
+void block_writer_init(struct block_writer *bw, uint8_t typ, uint8_t *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+uint8_t block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the reftable_record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec)
+{
+	struct strbuf empty = STRBUF_INIT;
+	struct strbuf last =
+		w->entries % w->restart_interval == 0 ? empty : w->last_key;
+	struct string_view out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct string_view start = out;
+
+	bool restart = false;
+	struct strbuf key = STRBUF_INIT;
+	int n = 0;
+
+	reftable_record_key(rec, &key);
+	n = reftable_encode_key(&restart, out, last, key,
+				reftable_record_val_type(rec));
+	if (n < 0)
+		goto done;
+	string_view_consume(&out, n);
+
+	n = reftable_record_encode(rec, out, w->hash_size);
+	if (n < 0)
+		goto done;
+	string_view_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  &key) < 0)
+		goto done;
+
+	strbuf_release(&key);
+	return 0;
+
+done:
+	strbuf_release(&key);
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct strbuf *key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next)
+		return -1;
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+
+	strbuf_reset(&w->last_key);
+	strbuf_addbuf(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		uint8_t *compressed = NULL;
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		size_t dest_cap = src_len;
+
+		compressed = reftable_malloc(dest_cap);
+		while (1) {
+			uLongf out_dest_len = dest_cap;
+
+			zresult = compress2(compressed, &out_dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				dest_cap *= 2;
+				compressed =
+					reftable_realloc(compressed, dest_cap);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				reftable_free(compressed);
+				return REFTABLE_ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed,
+			       out_dest_len);
+			w->next = out_dest_len + block_header_skip;
+			reftable_free(compressed);
+			break;
+		}
+	}
+	return w->next;
+}
+
+uint8_t block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct reftable_block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	uint8_t typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	uint16_t restart_count = 0;
+	uint32_t restart_start = 0;
+	uint8_t *restart_bytes = NULL;
+
+	if (!reftable_is_block_type(typ))
+		return REFTABLE_FORMAT_ERROR;
+
+	if (typ == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip; /* total size of dest
+							    buffer. */
+		uLongf src_len = block->len - block_header_skip;
+		/* Log blocks specify the *uncompressed* size in their header.
+		 */
+		uint8_t *uncompressed = reftable_malloc(sz);
+
+		/* Copy over the block header verbatim. It's not compressed. */
+		memcpy(uncompressed, block->data, block_header_skip);
+
+		/* Uncompress */
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed + block_header_skip, &dst_len,
+				    block->data + block_header_skip,
+				    &src_len)) {
+			reftable_free(uncompressed);
+			return REFTABLE_ZLIB_ERROR;
+		}
+
+		if (dst_len + block_header_skip != sz)
+			return REFTABLE_FORMAT_ERROR;
+
+		/* We're done with the input data. */
+		reftable_block_done(block);
+		block->data = uncompressed;
+		block->len = sz;
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	restart_count = get_be16(block->data + sz - 2);
+	restart_start = sz - 2 - 3 * restart_count;
+	restart_bytes = block->data + restart_start;
+
+	/* transfer ownership. */
+	br->block = *block;
+	block->data = NULL;
+	block->len = 0;
+
+	br->hash_size = hash_size;
+	br->block_len = restart_start;
+	br->full_block_size = full_block_size;
+	br->header_off = header_off;
+	br->restart_count = restart_count;
+	br->restart_bytes = restart_bytes;
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	strbuf_reset(&it->last_key);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct strbuf key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(size_t idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct string_view in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct strbuf rkey = STRBUF_INIT;
+	struct strbuf last_key = STRBUF_INIT;
+	uint8_t unused_extra;
+	int n = reftable_decode_key(&rkey, &unused_extra, last_key, in);
+	int result;
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	result = strbuf_cmp(&a->key, &rkey);
+	strbuf_release(&rkey);
+	return result;
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	strbuf_reset(&dest->last_key);
+	strbuf_addbuf(&dest->last_key, &src->last_key);
+}
+
+int block_iter_next(struct block_iter *it, struct reftable_record *rec)
+{
+	struct string_view in = {
+		.buf = it->br->block.data + it->next_off,
+		.len = it->br->block_len - it->next_off,
+	};
+	struct string_view start = in;
+	struct strbuf key = STRBUF_INIT;
+	uint8_t extra = 0;
+	int n = 0;
+
+	if (it->next_off >= it->br->block_len)
+		return 1;
+
+	n = reftable_decode_key(&key, &extra, it->last_key, in);
+	if (n < 0)
+		return -1;
+
+	string_view_consume(&in, n);
+	n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
+	if (n < 0)
+		return -1;
+	string_view_consume(&in, n);
+
+	strbuf_reset(&it->last_key);
+	strbuf_addbuf(&it->last_key, &key);
+	it->next_off += start.len - in.len;
+	strbuf_release(&key);
+	return 0;
+}
+
+int block_reader_first_key(struct block_reader *br, struct strbuf *key)
+{
+	struct strbuf empty = STRBUF_INIT;
+	int off = br->header_off + 4;
+	struct string_view in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	uint8_t extra = 0;
+	int n = reftable_decode_key(key, &extra, empty, in);
+	if (n < 0)
+		return n;
+
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct strbuf *want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	strbuf_release(&it->last_key);
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct strbuf *want)
+{
+	struct restart_find_args args = {
+		.key = *want,
+		.r = br,
+	};
+	struct reftable_record rec = reftable_new_record(block_reader_type(br));
+	struct strbuf key = STRBUF_INIT;
+	int err = 0;
+	struct block_iter next = {
+		.last_key = STRBUF_INIT,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	/* We're looking for the last entry less/equal than the wanted key, so
+	   we have to go one entry too far and then back up.
+	*/
+	while (true) {
+		block_iter_copy_from(&next, it);
+		err = block_iter_next(&next, &rec);
+		if (err < 0)
+			goto done;
+
+		reftable_record_key(&rec, &key);
+		if (err > 0 || strbuf_cmp(&key, want) >= 0) {
+			err = 0;
+			goto done;
+		}
+
+		block_iter_copy_from(it, &next);
+	}
+
+done:
+	strbuf_release(&key);
+	strbuf_release(&next.last_key);
+	reftable_record_destroy(&rec);
+
+	return err;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	strbuf_release(&bw->last_key);
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 00000000000..14a6f835f46
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,129 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+/*
+  Writes reftable blocks. The block_writer is reused across blocks to minimize
+  allocation overhead.
+*/
+struct block_writer {
+	uint8_t *buf;
+	uint32_t block_size;
+
+	/* Offset ofof the global header. Nonzero in the first block only. */
+	uint32_t header_off;
+
+	/* How often to restart keys. */
+	int restart_interval;
+	int hash_size;
+
+	/* Offset of next uint8_t to write. */
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+
+	struct strbuf last_key;
+	int entries;
+};
+
+/*
+  initializes the blockwriter to write `typ` entries, using `buf` as temporary
+  storage. `buf` is not owned by the block_writer. */
+void block_writer_init(struct block_writer *bw, uint8_t typ, uint8_t *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+
+/*
+  returns the block type (eg. 'r' for ref records.
+*/
+uint8_t block_writer_type(struct block_writer *bw);
+
+/* appends the record, or -1 if it doesn't fit. */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec);
+
+/* appends the key restarts, and compress the block if necessary. */
+int block_writer_finish(struct block_writer *w);
+
+/* clears out internally allocated block_writer members. */
+void block_writer_clear(struct block_writer *bw);
+
+/* Read a block. */
+struct block_reader {
+	/* offset of the block header; nonzero for the first block in a
+	 * reftable. */
+	uint32_t header_off;
+
+	/* the memory block */
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	uint8_t *restart_bytes;
+	uint16_t restart_count;
+
+	/* size of the data in the file. For log blocks, this is the compressed
+	 * size. */
+	uint32_t full_block_size;
+};
+
+/* Iterate over entries in a block */
+struct block_iter {
+	/* offset within the block of the next entry to read. */
+	uint32_t next_off;
+	struct block_reader *br;
+
+	/* key for last entry we read. */
+	struct strbuf last_key;
+};
+
+/* initializes a block reader. */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Position `it` at start of the block */
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+
+/* Position `it` to the `want` key in the block */
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct strbuf *want);
+
+/* Returns the block type (eg. 'r' for refs) */
+uint8_t block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct strbuf *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct reftable_record *rec);
+
+/* Seek to `want` with in the block pointed to by `it` */
+int block_iter_seek(struct block_iter *it, struct strbuf *want);
+
+/* deallocate memory for `it`. The block reader and its block is left intact. */
+void block_iter_close(struct block_iter *it);
+
+/* size of file header, depending on format version */
+int header_size(int version);
+
+/* size of file footer, depending on format version */
+int footer_size(int version);
+
+/* returns a block to its source. */
+void reftable_block_done(struct reftable_block *ret);
+
+#endif
diff --git a/reftable/block_test.c b/reftable/block_test.c
new file mode 100644
index 00000000000..d9b347e31fa
--- /dev/null
+++ b/reftable/block_test.c
@@ -0,0 +1,157 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+struct binsearch_args {
+	int key;
+	int *arr;
+};
+
+static int binsearch_func(size_t i, void *void_args)
+{
+	struct binsearch_args *args = (struct binsearch_args *)void_args;
+
+	return args->key < args->arr[i];
+}
+
+static void test_binsearch(void)
+{
+	int arr[] = { 2, 4, 6, 8, 10 };
+	size_t sz = ARRAY_SIZE(arr);
+	struct binsearch_args args = {
+		.arr = arr,
+	};
+
+	int i = 0;
+	for (i = 1; i < 11; i++) {
+		int res;
+		args.key = i;
+		res = binsearch(sz, &binsearch_func, &args);
+
+		if (res < sz) {
+			assert(args.key < arr[res]);
+			if (res > 0) {
+				assert(args.key >= arr[res - 1]);
+			}
+		} else {
+			assert(args.key == 10 || args.key == 11);
+		}
+	}
+}
+
+static void test_block_read_write(void)
+{
+	const int header_off = 21; /* random */
+	char *names[30];
+	const int N = ARRAY_SIZE(names);
+	const int block_size = 1024;
+	struct reftable_block block = { 0 };
+	struct block_writer bw = {
+		.last_key = STRBUF_INIT,
+	};
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_record rec = { 0 };
+	int i = 0;
+	int n;
+	struct block_reader br = { 0 };
+	struct block_iter it = { .last_key = STRBUF_INIT };
+	int j = 0;
+	struct strbuf want = STRBUF_INIT;
+
+	block.data = reftable_calloc(block_size);
+	block.len = block_size;
+	block.source = malloc_block_source();
+	block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size,
+			  header_off, hash_size(SHA1_ID));
+	reftable_record_from_ref(&rec, &ref);
+
+	for (i = 0; i < N; i++) {
+		char name[100];
+		uint8_t hash[SHA1_SIZE];
+		snprintf(name, sizeof(name), "branch%02d", i);
+		memset(hash, i, sizeof(hash));
+
+		ref.ref_name = name;
+		ref.value = hash;
+		names[i] = xstrdup(name);
+		n = block_writer_add(&bw, &rec);
+		ref.ref_name = NULL;
+		ref.value = NULL;
+		assert(n == 0);
+	}
+
+	n = block_writer_finish(&bw);
+	assert(n > 0);
+
+	block_writer_clear(&bw);
+
+	block_reader_init(&br, &block, header_off, block_size, SHA1_SIZE);
+
+	block_reader_start(&br, &it);
+
+	while (true) {
+		int r = block_iter_next(&it, &rec);
+		assert(r >= 0);
+		if (r > 0) {
+			break;
+		}
+		assert_streq(names[j], ref.ref_name);
+		j++;
+	}
+
+	reftable_record_clear(&rec);
+	block_iter_close(&it);
+
+	for (i = 0; i < N; i++) {
+		struct block_iter it = { .last_key = STRBUF_INIT };
+		strbuf_reset(&want);
+		strbuf_addstr(&want, names[i]);
+
+		n = block_reader_seek(&br, &it, &want);
+		assert(n == 0);
+
+		n = block_iter_next(&it, &rec);
+		assert(n == 0);
+
+		assert_streq(names[i], ref.ref_name);
+
+		want.len--;
+		n = block_reader_seek(&br, &it, &want);
+		assert(n == 0);
+
+		n = block_iter_next(&it, &rec);
+		assert(n == 0);
+		assert_streq(names[10 * (i / 10)], ref.ref_name);
+
+		block_iter_close(&it);
+	}
+
+	reftable_record_clear(&rec);
+	reftable_block_done(&br.block);
+	strbuf_release(&want);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+}
+
+int block_test_main(int argc, const char *argv[])
+{
+	add_test_case("binsearch", &test_binsearch);
+	add_test_case("block_read_write", &test_block_read_write);
+	return test_main(argc, argv);
+}
diff --git a/reftable/compat.c b/reftable/compat.c
new file mode 100644
index 00000000000..be0fd65d2f0
--- /dev/null
+++ b/reftable/compat.c
@@ -0,0 +1,98 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+
+*/
+
+/* compat.c - compatibility functions for standalone compilation */
+
+#include "system.h"
+#include "basics.h"
+
+#ifdef REFTABLE_STANDALONE
+
+#include <dirent.h>
+
+void put_be32(void *p, uint32_t i)
+{
+	uint8_t *out = (uint8_t *)p;
+
+	out[0] = (uint8_t)((i >> 24) & 0xff);
+	out[1] = (uint8_t)((i >> 16) & 0xff);
+	out[2] = (uint8_t)((i >> 8) & 0xff);
+	out[3] = (uint8_t)((i)&0xff);
+}
+
+uint32_t get_be32(uint8_t *in)
+{
+	return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
+	       (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
+}
+
+void put_be64(void *p, uint64_t v)
+{
+	uint8_t *out = (uint8_t *)p;
+	int i = sizeof(uint64_t);
+	while (i--) {
+		out[i] = (uint8_t)(v & 0xff);
+		v >>= 8;
+	}
+}
+
+uint64_t get_be64(void *out)
+{
+	uint8_t *bytes = (uint8_t *)out;
+	uint64_t v = 0;
+	int i = 0;
+	for (i = 0; i < sizeof(uint64_t); i++) {
+		v = (v << 8) | (uint8_t)(bytes[i] & 0xff);
+	}
+	return v;
+}
+
+uint16_t get_be16(uint8_t *in)
+{
+	return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
+}
+
+char *xstrdup(const char *s)
+{
+	int l = strlen(s);
+	char *dest = (char *)reftable_malloc(l + 1);
+	strncpy(dest, s, l + 1);
+	return dest;
+}
+
+void sleep_millisec(int millisecs)
+{
+	usleep(millisecs * 1000);
+}
+
+void reftable_clear_dir(const char *dirname)
+{
+	DIR *dir = opendir(dirname);
+	struct dirent *ent = NULL;
+	assert(dir);
+	while ((ent = readdir(dir)) != NULL) {
+		unlinkat(dirfd(dir), ent->d_name, 0);
+	}
+	closedir(dir);
+	rmdir(dirname);
+}
+
+#else
+
+#include "../dir.h"
+
+void reftable_clear_dir(const char *dirname)
+{
+	struct strbuf path = STRBUF_INIT;
+	strbuf_addstr(&path, dirname);
+	remove_dir_recursively(&path, 0);
+	strbuf_release(&path);
+}
+
+#endif
diff --git a/reftable/compat.h b/reftable/compat.h
new file mode 100644
index 00000000000..a765c57e966
--- /dev/null
+++ b/reftable/compat.h
@@ -0,0 +1,48 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef COMPAT_H
+#define COMPAT_H
+
+#include "system.h"
+
+#ifdef REFTABLE_STANDALONE
+
+/* functions that git-core provides, for standalone compilation */
+#include <stdint.h>
+
+uint64_t get_be64(void *in);
+void put_be64(void *out, uint64_t i);
+
+void put_be32(void *out, uint32_t i);
+uint32_t get_be32(uint8_t *in);
+
+uint16_t get_be16(uint8_t *in);
+
+#define ARRAY_SIZE(a) sizeof((a)) / sizeof((a)[0])
+#define FREE_AND_NULL(x)          \
+	do {                      \
+		reftable_free(x); \
+		(x) = NULL;       \
+	} while (0)
+#define QSORT(arr, n, cmp) qsort(arr, n, sizeof(arr[0]), cmp)
+#define SWAP(a, b)                              \
+	{                                       \
+		char tmp[sizeof(a)];            \
+		assert(sizeof(a) == sizeof(b)); \
+		memcpy(&tmp[0], &a, sizeof(a)); \
+		memcpy(&a, &b, sizeof(a));      \
+		memcpy(&b, &tmp[0], sizeof(a)); \
+	}
+
+char *xstrdup(const char *s);
+
+void sleep_millisec(int millisecs);
+
+#endif
+#endif
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 00000000000..5eee72c4c11
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 00000000000..00b444e8c9b
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,212 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "reftable.h"
+#include "reftable-tests.h"
+
+static uint32_t hash_id;
+
+static int dump_table(const char *tablename)
+{
+	struct reftable_block_source src = { 0 };
+	int err = reftable_block_source_from_file(&src, tablename);
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+	struct reftable_reader *r = NULL;
+
+	if (err < 0)
+		return err;
+
+	err = reftable_new_reader(&r, &src, tablename);
+	if (err < 0)
+		return err;
+
+	err = reftable_reader_seek_ref(r, &it, "");
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_ref_record_print(&ref, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_reader_seek_log(r, &it, "");
+	if (err < 0) {
+		return err;
+	}
+	while (1) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_log_record_print(&log, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_log_record_clear(&log);
+
+	reftable_reader_free(r);
+	return 0;
+}
+
+static int compact_stack(const char *stackdir)
+{
+	struct reftable_stack *stack = NULL;
+	struct reftable_write_options cfg = {};
+
+	int err = reftable_new_stack(&stack, stackdir, cfg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_stack_compact_all(stack, NULL);
+	if (err < 0)
+		goto done;
+done:
+	if (stack != NULL) {
+		reftable_stack_destroy(stack);
+	}
+	return err;
+}
+
+static int dump_stack(const char *stackdir)
+{
+	struct reftable_stack *stack = NULL;
+	struct reftable_write_options cfg = {};
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+	struct reftable_merged_table *merged = NULL;
+
+	int err = reftable_new_stack(&stack, stackdir, cfg);
+	if (err < 0)
+		return err;
+
+	merged = reftable_stack_merged_table(stack);
+
+	err = reftable_merged_table_seek_ref(merged, &it, "");
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_ref_record_print(&ref, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_merged_table_seek_log(merged, &it, "");
+	if (err < 0) {
+		return err;
+	}
+	while (1) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_log_record_print(&log, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_log_record_clear(&log);
+
+	reftable_stack_destroy(stack);
+	return 0;
+}
+
+static void print_help(void)
+{
+	printf("usage: dump [-cst] arg\n\n"
+	       "options: \n"
+	       "  -c compact\n"
+	       "  -t dump table\n"
+	       "  -s dump stack\n"
+	       "  -h this help\n"
+	       "  -2 use SHA256\n"
+	       "\n");
+}
+
+int reftable_dump_main(int argc, char *const *argv)
+{
+	int err = 0;
+	int opt;
+	int opt_dump_table = 0;
+	int opt_dump_stack = 0;
+	int opt_compact = 0;
+	const char *arg = NULL;
+	while ((opt = getopt(argc, argv, "2chts")) != -1) {
+		switch (opt) {
+		case '2':
+			hash_id = 0x73323536;
+			break;
+		case 't':
+			opt_dump_table = 1;
+			break;
+		case 's':
+			opt_dump_stack = 1;
+			break;
+		case 'c':
+			opt_compact = 1;
+			break;
+		case '?':
+		case 'h':
+			print_help();
+			return 2;
+			break;
+		}
+	}
+
+	if (argv[optind] == NULL) {
+		fprintf(stderr, "need argument\n");
+		print_help();
+		return 2;
+	}
+
+	arg = argv[optind];
+
+	if (opt_dump_table) {
+		err = dump_table(arg);
+	} else if (opt_dump_stack) {
+		err = dump_stack(arg);
+	} else if (opt_compact) {
+		err = compact_stack(arg);
+	}
+
+	if (err < 0) {
+		fprintf(stderr, "%s: %s: %s\n", argv[0], arg,
+			reftable_error_str(err));
+		return 1;
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 00000000000..eacf41f23b4
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,95 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size)
+		return -1;
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	struct file_block_source *p = NULL;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return REFTABLE_NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0)
+		return -1;
+
+	p = reftable_calloc(sizeof(struct file_block_source));
+	p->size = st.st_size;
+	p->fd = fd;
+
+	assert(bs->ops == NULL);
+	bs->ops = &file_vtable;
+	bs->arg = p;
+	return 0;
+}
+
+int reftable_fd_write(void *arg, const void *data, size_t sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 00000000000..8914e09ac07
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,242 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator *it)
+{
+	return it->ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct reftable_record *rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec)
+{
+	return it->ops->next(it->iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, ref);
+	return iterator_next(it, &rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, log);
+	return iterator_next(it, &rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	strbuf_release(&fri->oid);
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg,
+				       struct reftable_record *rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+	int err = 0;
+	while (true) {
+		err = reftable_iterator_next_ref(&fri->it, ref);
+		if (err != 0) {
+			break;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			err = reftable_table_seek_ref(&fri->tab, &it,
+						      ref->ref_name);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(&it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				break;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+
+	reftable_ref_record_clear(ref);
+	return err;
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reftable_block_done(&it->block_reader.block);
+	strbuf_release(&it->oid);
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	uint64_t off;
+	int err = 0;
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reftable_block_done(&it->block_reader.block);
+
+	off = it->offsets[it->offset_idx++];
+	err = reader_init_block_reader(it->r, &it->block_reader, off,
+				       BLOCK_TYPE_REF);
+	if (err < 0) {
+		return err;
+	}
+	if (err > 0) {
+		/* indexed block does not exist. */
+		return REFTABLE_FORMAT_ERROR;
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, uint8_t *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter empty = INDEXED_TABLE_REF_ITER_INIT;
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	*itr = empty;
+	itr->r = r;
+	strbuf_add(&itr->oid, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 00000000000..c96aaed5839
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,72 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "strbuf.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct reftable_record *rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec);
+
+/* Returns true for a zeroed out iterator, such as the one returned from
+   iterator_destroy. */
+bool iterator_is_null(struct reftable_iterator *it);
+
+/* iterator that produces only ref records that point to `oid` */
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_table tab;
+	struct strbuf oid;
+	struct reftable_iterator it;
+};
+#define FILTERING_REF_ITERATOR_INIT \
+	{                           \
+		.oid = STRBUF_INIT  \
+	}
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+/* iterator that produces only ref records that point to `oid`,
+   but using the object index.
+ */
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct strbuf oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+#define INDEXED_TABLE_REF_ITER_INIT                                     \
+	{                                                               \
+		.cur = { .last_key = STRBUF_INIT }, .oid = STRBUF_INIT, \
+	}
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, uint8_t *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 00000000000..9ca4fd42c11
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,321 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+#include "record.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct reftable_record rec = reftable_new_record(mi->typ);
+		int err = iterator_next(&mi->stack[i], &rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			reftable_record_destroy(&rec);
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
+					       size_t idx)
+{
+	struct reftable_record rec = reftable_new_record(mi->typ);
+	struct pq_entry e = {
+		.rec = rec,
+		.index = idx,
+	};
+	int err = iterator_next(&mi->stack[idx], &rec);
+	if (err < 0)
+		return err;
+
+	if (err > 0) {
+		reftable_iterator_destroy(&mi->stack[idx]);
+		reftable_record_destroy(&rec);
+		return 0;
+	}
+
+	merged_iter_pqueue_add(&mi->pq, e);
+	return 0;
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
+{
+	if (iterator_is_null(&mi->stack[idx]))
+		return 0;
+	return merged_iter_advance_nonnull_subiter(mi, idx);
+}
+
+static int merged_iter_next_entry(struct merged_iter *mi,
+				  struct reftable_record *rec)
+{
+	struct strbuf entry_key = STRBUF_INIT;
+	struct pq_entry entry = { 0 };
+	int err = 0;
+
+	if (merged_iter_pqueue_is_empty(mi->pq))
+		return 1;
+
+	entry = merged_iter_pqueue_remove(&mi->pq);
+	err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0)
+		return err;
+
+	/*
+	  One can also use reftable as datacenter-local storage, where the ref
+	  database is maintained in globally consistent database (eg.
+	  CockroachDB or Spanner). In this scenario, replication delays together
+	  with compaction may cause newer tables to contain older entries. In
+	  such a deployment, the loop below must be changed to collect all
+	  entries for the same key, and return new the newest one.
+	*/
+	reftable_record_key(&entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct strbuf k = STRBUF_INIT;
+		int err = 0, cmp = 0;
+
+		reftable_record_key(&top.rec, &k);
+
+		cmp = strbuf_cmp(&k, &entry_key);
+		strbuf_release(&k);
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		reftable_record_destroy(&top.rec);
+	}
+
+	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+	reftable_record_destroy(&entry.rec);
+	strbuf_release(&entry_key);
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
+{
+	while (true) {
+		int err = merged_iter_next_entry(mi, rec);
+		if (err == 0 && mi->suppress_deletions &&
+		    reftable_record_is_deletion(rec)) {
+			continue;
+		}
+
+		return err;
+	}
+}
+
+static int merged_iter_next_void(void *p, struct reftable_record *rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq))
+		return 1;
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id)
+{
+	struct reftable_merged_table *m = NULL;
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		struct reftable_reader *r = stack[i];
+		if (r->hash_id != hash_id) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i == 0) {
+			first_min = reftable_reader_min_update_index(r);
+		}
+
+		last_max = reftable_reader_max_update_index(r);
+	}
+
+	m = (struct reftable_merged_table *)reftable_calloc(
+		sizeof(struct reftable_merged_table));
+	m->stack = stack;
+	m->stack_len = n;
+	m->min = first_min;
+	m->max = last_max;
+	m->hash_id = hash_id;
+	*dest = m;
+	return 0;
+}
+
+void reftable_merged_table_close(struct reftable_merged_table *mt)
+{
+	int i = 0;
+	for (i = 0; i < mt->stack_len; i++) {
+		reftable_reader_free(mt->stack[i]);
+	}
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = reftable_record_type(rec),
+		.hash_id = mt->hash_id,
+		.suppress_deletions = mt->suppress_deletions,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reader_seek(mt->stack[i], &iters[n], rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n;
+	err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	} else {
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 00000000000..fe80556c1dc
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_reader **stack;
+	int stack_len;
+	uint32_t hash_id;
+	bool suppress_deletions;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	int stack_len;
+	uint8_t typ;
+	bool suppress_deletions;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec);
+
+#endif
diff --git a/reftable/merged_test.c b/reftable/merged_test.c
new file mode 100644
index 00000000000..6f696918bbe
--- /dev/null
+++ b/reftable/merged_test.c
@@ -0,0 +1,273 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "pq.h"
+#include "reader.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_pq(void)
+{
+	char *names[54] = { 0 };
+	int N = ARRAY_SIZE(names) - 1;
+
+	struct merged_iter_pqueue pq = { 0 };
+	const char *last = NULL;
+
+	int i = 0;
+	for (i = 0; i < N; i++) {
+		char name[100];
+		snprintf(name, sizeof(name), "%02d", i);
+		names[i] = xstrdup(name);
+	}
+
+	i = 1;
+	do {
+		struct reftable_record rec =
+			reftable_new_record(BLOCK_TYPE_REF);
+		struct pq_entry e = { 0 };
+
+		reftable_record_as_ref(&rec)->ref_name = names[i];
+		e.rec = rec;
+		merged_iter_pqueue_add(&pq, e);
+		merged_iter_pqueue_check(pq);
+		i = (i * 7) % N;
+	} while (i != 1);
+
+	while (!merged_iter_pqueue_is_empty(pq)) {
+		struct pq_entry e = merged_iter_pqueue_remove(&pq);
+		struct reftable_ref_record *ref =
+			reftable_record_as_ref(&e.rec);
+
+		merged_iter_pqueue_check(pq);
+
+		if (last != NULL) {
+			assert(strcmp(last, ref->ref_name) < 0);
+		}
+		last = ref->ref_name;
+		ref->ref_name = NULL;
+		reftable_free(ref);
+	}
+
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+
+	merged_iter_pqueue_clear(&pq);
+}
+
+static void write_test_table(struct strbuf *buf,
+			     struct reftable_ref_record refs[], int n)
+{
+	int min = 0xffffffff;
+	int max = 0;
+	int i = 0;
+	int err;
+
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_writer *w = NULL;
+	for (i = 0; i < n; i++) {
+		uint64_t ui = refs[i].update_index;
+		if (ui > max) {
+			max = ui;
+		}
+		if (ui < min) {
+			min = ui;
+		}
+	}
+
+	w = reftable_new_writer(&strbuf_add_void, buf, &opts);
+	reftable_writer_set_limits(w, min, max);
+
+	for (i = 0; i < n; i++) {
+		uint64_t before = refs[i].update_index;
+		int n = reftable_writer_add_ref(w, &refs[i]);
+		assert(n == 0);
+		assert(before == refs[i].update_index);
+	}
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+
+	reftable_writer_free(w);
+}
+
+static struct reftable_merged_table *
+merged_table_from_records(struct reftable_ref_record **refs,
+			  struct reftable_block_source **source, int *sizes,
+			  struct strbuf *buf, int n)
+{
+	struct reftable_reader **rd = reftable_calloc(n * sizeof(*rd));
+	int i = 0;
+	struct reftable_merged_table *mt = NULL;
+	int err;
+	*source = reftable_calloc(n * sizeof(**source));
+	for (i = 0; i < n; i++) {
+		write_test_table(&buf[i], refs[i], sizes[i]);
+		block_source_from_strbuf(&(*source)[i], &buf[i]);
+
+		err = reftable_new_reader(&rd[i], &(*source)[i], "name");
+		assert_err(err);
+	}
+
+	err = reftable_new_merged_table(&mt, rd, n, SHA1_ID);
+	assert_err(err);
+	return mt;
+}
+
+static void test_merged_between(void)
+{
+	uint8_t hash1[SHA1_SIZE] = { 1, 2, 3, 0 };
+
+	struct reftable_ref_record r1[] = { {
+		.ref_name = "b",
+		.update_index = 1,
+		.value = hash1,
+	} };
+	struct reftable_ref_record r2[] = { {
+		.ref_name = "a",
+		.update_index = 2,
+	} };
+
+	struct reftable_ref_record *refs[] = { r1, r2 };
+	int sizes[] = { 1, 1 };
+	struct strbuf bufs[2] = { STRBUF_INIT, STRBUF_INIT };
+	struct reftable_block_source *bs = NULL;
+	struct reftable_merged_table *mt =
+		merged_table_from_records(refs, &bs, sizes, bufs, 2);
+	int i;
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_iterator it = { 0 };
+	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert_err(err);
+	assert(ref.update_index == 2);
+	reftable_ref_record_clear(&ref);
+
+	reftable_iterator_destroy(&it);
+	reftable_merged_table_close(mt);
+	reftable_merged_table_free(mt);
+	for (i = 0; i < ARRAY_SIZE(bufs); i++) {
+		strbuf_release(&bufs[i]);
+	}
+	reftable_free(bs);
+}
+
+static void test_merged(void)
+{
+	uint8_t hash1[SHA1_SIZE] = { 1 };
+	uint8_t hash2[SHA1_SIZE] = { 2 };
+	struct reftable_ref_record r1[] = { {
+						    .ref_name = "a",
+						    .update_index = 1,
+						    .value = hash1,
+					    },
+					    {
+						    .ref_name = "b",
+						    .update_index = 1,
+						    .value = hash1,
+					    },
+					    {
+						    .ref_name = "c",
+						    .update_index = 1,
+						    .value = hash1,
+					    } };
+	struct reftable_ref_record r2[] = { {
+		.ref_name = "a",
+		.update_index = 2,
+	} };
+	struct reftable_ref_record r3[] = {
+		{
+			.ref_name = "c",
+			.update_index = 3,
+			.value = hash2,
+		},
+		{
+			.ref_name = "d",
+			.update_index = 3,
+			.value = hash1,
+		},
+	};
+
+	struct reftable_ref_record want[] = {
+		r2[0],
+		r1[1],
+		r3[0],
+		r3[1],
+	};
+
+	struct reftable_ref_record *refs[] = { r1, r2, r3 };
+	int sizes[3] = { 3, 1, 2 };
+	struct strbuf bufs[3] = { STRBUF_INIT, STRBUF_INIT, STRBUF_INIT };
+	struct reftable_block_source *bs = NULL;
+
+	struct reftable_merged_table *mt =
+		merged_table_from_records(refs, &bs, sizes, bufs, 3);
+
+	struct reftable_iterator it = { 0 };
+	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	struct reftable_ref_record *out = NULL;
+	int len = 0;
+	int cap = 0;
+	int i = 0;
+
+	assert_err(err);
+	while (len < 100) { /* cap loops/recursion. */
+		struct reftable_ref_record ref = { 0 };
+		int err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			out = reftable_realloc(
+				out, sizeof(struct reftable_ref_record) * cap);
+		}
+		out[len++] = ref;
+	}
+	reftable_iterator_destroy(&it);
+
+	assert(ARRAY_SIZE(want) == len);
+	for (i = 0; i < len; i++) {
+		assert(reftable_ref_record_equal(&want[i], &out[i], SHA1_SIZE));
+	}
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&out[i]);
+	}
+	reftable_free(out);
+
+	for (i = 0; i < 3; i++) {
+		strbuf_release(&bufs[i]);
+	}
+	reftable_merged_table_close(mt);
+	reftable_merged_table_free(mt);
+	reftable_free(bs);
+}
+
+/* XXX test refs_for(oid) */
+
+int merged_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_merged_between", &test_merged_between);
+	add_test_case("test_pq", &test_pq);
+	add_test_case("test_merged", &test_merged);
+	return test_main(argc, argv);
+}
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 00000000000..de7ed658a42
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,115 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "reftable.h"
+#include "system.h"
+#include "basics.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct strbuf ak = STRBUF_INIT;
+	struct strbuf bk = STRBUF_INIT;
+	int cmp = 0;
+	reftable_record_key(&a.rec, &ak);
+	reftable_record_key(&b.rec, &bk);
+
+	cmp = strbuf_cmp(&ak, &bk);
+
+	strbuf_release(&ak);
+	strbuf_release(&bk);
+
+	if (cmp == 0)
+		return a.index > b.index;
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		reftable_record_destroy(&pq->heap[i].rec);
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 00000000000..43c0974f162
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct reftable_record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	int len;
+	int cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 00000000000..3c8854c12d9
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,744 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source *source)
+{
+	return source->ops->size(source->arg);
+}
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source->ops->read_block(source->arg, dest, off, size);
+	dest->source = *source;
+	return result;
+}
+
+void reftable_block_done(struct reftable_block *blockp)
+{
+	struct reftable_block_source source = blockp->source;
+	if (blockp != NULL && source.ops != NULL)
+		source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, uint8_t typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size)
+		return 0;
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(&r->source, dest, off, sz);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, uint8_t *footer,
+			uint8_t *header)
+{
+	uint8_t *f = footer;
+	uint8_t first_block_typ;
+	int err = 0;
+	uint32_t computed_crc;
+	uint32_t file_crc;
+
+	if (memcmp(f, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = REFTABLE_FORMAT_ERROR;
+			goto done;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	computed_crc = crc32(0, footer, f - footer);
+	file_crc = get_be32(f);
+	f += 4;
+	if (computed_crc != file_crc) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	first_block_typ = header[header_size(r->version)];
+	r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+	r->ref_offsets.offset = 0;
+	r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+				  r->log_offsets.offset > 0);
+	r->obj_offsets.present = r->obj_offsets.offset > 0;
+	err = 0;
+done:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = *source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+done:
+	reftable_block_done(&footer);
+	reftable_block_done(&header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	uint8_t typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+#define TABLE_ITER_INIT                          \
+	{                                        \
+		.bi = {.last_key = STRBUF_INIT } \
+	}
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti,
+				    struct reftable_record *rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && reftable_record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec->data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reftable_block_done(&ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(uint8_t *data, uint8_t *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (reftable_is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, uint8_t want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+						   DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	uint8_t block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size)
+		return 1;
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0)
+		return err;
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0)
+		return block_size;
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reftable_block_done(&block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reftable_block_done(&block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0)
+		return err;
+	else {
+		struct block_reader *brp =
+			reftable_malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
+{
+	if (reftable_record_type(rec) != ti->typ)
+		return REFTABLE_API_ERROR;
+
+	while (true) {
+		struct table_iter next = TABLE_ITER_INIT;
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct reftable_record *rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off,
+				uint8_t typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0)
+		return err;
+
+	brp = reftable_malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			uint8_t typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct reftable_record *want)
+{
+	struct reftable_record rec =
+		reftable_new_record(reftable_record_type(want));
+	struct strbuf want_key = STRBUF_INIT;
+	struct strbuf got_key = STRBUF_INIT;
+	struct table_iter next = TABLE_ITER_INIT;
+	int err = -1;
+
+	reftable_record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0)
+			goto done;
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0)
+			goto done;
+
+		if (strbuf_cmp(&got_key, &want_key) > 0) {
+			table_iter_block_done(&next);
+			break;
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, &want_key);
+	if (err < 0)
+		goto done;
+	err = 0;
+
+done:
+	block_iter_close(&next.bi);
+	reftable_record_destroy(&rec);
+	strbuf_release(&want_key);
+	strbuf_release(&got_key);
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it,
+			       struct reftable_record *rec)
+{
+	struct reftable_index_record want_index = { .last_key = STRBUF_INIT };
+	struct reftable_record want_index_rec = { 0 };
+	struct reftable_index_record index_result = { .last_key = STRBUF_INIT };
+	struct reftable_record index_result_rec = { 0 };
+	struct table_iter index_iter = TABLE_ITER_INIT;
+	struct table_iter next = TABLE_ITER_INIT;
+	int err = 0;
+
+	reftable_record_key(rec, &want_index.last_key);
+	reftable_record_from_index(&want_index_rec, &want_index);
+	reftable_record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, reftable_record_type(rec), true);
+	if (err < 0)
+		goto done;
+
+	err = reader_seek_linear(r, &index_iter, &want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, &index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0)
+			goto done;
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0)
+			goto done;
+
+		err = block_iter_seek(&next.bi, &want_index.last_key);
+		if (err < 0)
+			goto done;
+
+		if (next.typ == reftable_record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = REFTABLE_FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter empty = TABLE_ITER_INIT;
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		*malloced = empty;
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+done:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	reftable_record_clear(&want_index_rec);
+	reftable_record_clear(&index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it,
+				struct reftable_record *rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, reftable_record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = TABLE_ITER_INIT;
+	int err = 0;
+	if (idx > 0)
+		return reader_seek_indexed(r, it, rec);
+
+	err = reader_start(r, &ti, reftable_record_type(rec), false);
+	if (err < 0)
+		return err;
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0)
+		return err;
+	else {
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec)
+{
+	uint8_t typ = reftable_record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.ref_name = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source *src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		block_source_close(src);
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    uint8_t *oid)
+{
+	struct reftable_obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct reftable_record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct reftable_obj_record got = { 0 };
+	struct reftable_record got_rec = { 0 };
+	int err = 0;
+	struct indexed_table_ref_iter *itr = NULL;
+
+	/* Look through the reverse index. */
+	reftable_record_from_obj(&want_rec, &want);
+	err = reader_seek(r, &oit, &want_rec);
+	if (err != 0)
+		goto done;
+
+	/* read out the reftable_obj_record */
+	reftable_record_from_obj(&got_rec, &got);
+	err = iterator_next(&oit, &got_rec);
+	if (err < 0)
+		goto done;
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		/* didn't find it; return empty iterator */
+		iterator_set_empty(it);
+		err = 0;
+		goto done;
+	}
+
+	err = new_indexed_table_ref_iter(&itr, r, oid, hash_size(r->hash_id),
+					 got.offsets, got.offset_len);
+	if (err < 0)
+		goto done;
+	got.offsets = NULL;
+	iterator_from_indexed_table_ref_iter(it, itr);
+
+done:
+	reftable_iterator_destroy(&oit);
+	reftable_record_clear(&got_rec);
+	return err;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      uint8_t *oid)
+{
+	struct table_iter ti_empty = TABLE_ITER_INIT;
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	struct filtering_ref_iterator empty = FILTERING_REF_ITERATOR_INIT;
+	int oid_len = hash_size(r->hash_id);
+	int err;
+
+	*ti = ti_empty;
+	err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_malloc(sizeof(struct filtering_ref_iterator));
+	*filter = empty;
+
+	strbuf_add(&filter->oid, oid, oid_len);
+	reftable_table_from_reader(&filter->tab, r);
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid)
+{
+	if (r->obj_offsets.present)
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	return reftable_reader_refs_for_unindexed(r, it, oid);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 00000000000..213e8fe847c
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,65 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source *source);
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_close(struct reftable_block_source *source);
+
+/* metadata for a block type */
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+/* The state for reading a reftable file. */
+struct reftable_reader {
+	/* for convience, associate a name with the instance. */
+	char *name;
+	struct reftable_block_source source;
+
+	/* Size of the file, excluding the footer. */
+	uint64_t size;
+
+	/* 'sha1' for SHA1, 's256' for SHA-256 */
+	uint32_t hash_id;
+
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	/* Length of the OID keys in the 'o' section */
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+
+/* initialize a block reader to read from `r` */
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, uint8_t want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 00000000000..1e87d437035
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1126 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+#include "constants.h"
+#include "reftable.h"
+#include "basics.h"
+
+int get_var_int(uint64_t *dest, struct string_view *in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in->len == 0)
+		return -1;
+	val = in->buf[ptr] & 0x7f;
+
+	while (in->buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in->len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in->buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct string_view *dest, uint64_t val)
+{
+	uint8_t buf[10] = { 0 };
+	int i = 9;
+	int n = 0;
+	buf[i] = (uint8_t)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (uint8_t)(val & 0x7f);
+		i--;
+	}
+
+	n = sizeof(buf) - i - 1;
+	if (dest->len < n)
+		return -1;
+	memcpy(dest->buf, &buf[i + 1], n);
+	return n;
+}
+
+int reftable_is_block_type(uint8_t typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct strbuf *dest, struct string_view in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, &in);
+	if (n <= 0)
+		return -1;
+	string_view_consume(&in, n);
+	if (in.len < tsize)
+		return -1;
+
+	strbuf_reset(dest);
+	strbuf_add(dest, in.buf, tsize);
+	string_view_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct string_view s)
+{
+	struct string_view start = s;
+	int l = strlen(str);
+	int n = put_var_int(&s, l);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+	if (s.len < l)
+		return -1;
+	memcpy(s.buf, str, l);
+	string_view_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int reftable_encode_key(bool *restart, struct string_view dest,
+			struct strbuf prev_key, struct strbuf key,
+			uint8_t extra)
+{
+	struct string_view start = dest;
+	int prefix_len = common_prefix_size(&prev_key, &key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(&dest, (uint64_t)prefix_len);
+	if (n < 0)
+		return -1;
+	string_view_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(&dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0)
+		return -1;
+	string_view_consume(&dest, n);
+
+	if (dest.len < suffix_len)
+		return -1;
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	string_view_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int reftable_decode_key(struct strbuf *key, uint8_t *extra,
+			struct strbuf last_key, struct string_view in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, &in);
+	if (n < 0)
+		return -1;
+	string_view_consume(&in, n);
+
+	if (prefix_len > last_key.len)
+		return -1;
+
+	n = get_var_int(&suffix_len, &in);
+	if (n <= 0)
+		return -1;
+	string_view_consume(&in, n);
+
+	*extra = (uint8_t)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len)
+		return -1;
+
+	strbuf_reset(key);
+	strbuf_add(key, last_key.buf, prefix_len);
+	strbuf_add(key, in.buf, suffix_len);
+	string_view_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static void reftable_ref_record_key(const void *r, struct strbuf *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	strbuf_reset(dest);
+	strbuf_addstr(dest, rec->ref_name);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->ref_name != NULL) {
+		ref->ref_name = xstrdup(src->ref_name);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9)
+		return '0' + c;
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, uint8_t *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size(hash_id));
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size(hash_id));
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->ref_name);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static uint8_t reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL)
+		return 3;
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct string_view s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct string_view start = s;
+	int n = put_var_int(&s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		string_view_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		string_view_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		string_view_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct strbuf key,
+				      uint8_t val_type, struct string_view in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct string_view start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, &in);
+	if (n < 0)
+		return n;
+	assert(hash_size > 0);
+
+	string_view_consume(&in, n);
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
+	memcpy(r->ref_name, key.buf, key.len);
+	r->ref_name[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		string_view_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		string_view_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct strbuf dest = STRBUF_INIT;
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		string_view_consume(&in, n);
+		seen_target = true;
+		if (r->target != NULL) {
+			reftable_free(r->target);
+		}
+		r->target = dest.buf;
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+static bool reftable_ref_record_is_deletion_void(const void *p)
+{
+	return reftable_ref_record_is_deletion(
+		(const struct reftable_ref_record *)p);
+}
+
+struct reftable_record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = BLOCK_TYPE_REF,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+	.is_deletion = &reftable_ref_record_is_deletion_void,
+};
+
+static void reftable_obj_record_key(const void *r, struct strbuf *dest)
+{
+	const struct reftable_obj_record *rec =
+		(const struct reftable_obj_record *)r;
+	strbuf_reset(dest);
+	strbuf_add(dest, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void reftable_obj_record_clear(void *rec)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	FREE_AND_NULL(obj->hash_prefix);
+	FREE_AND_NULL(obj->offsets);
+	memset(obj, 0, sizeof(struct reftable_obj_record));
+}
+
+static void reftable_obj_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	const struct reftable_obj_record *src =
+		(const struct reftable_obj_record *)src_rec;
+	int olen;
+
+	reftable_obj_record_clear(obj);
+	*obj = *src;
+	obj->hash_prefix = reftable_malloc(obj->hash_prefix_len);
+	memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len);
+
+	olen = obj->offset_len * sizeof(uint64_t);
+	obj->offsets = reftable_malloc(olen);
+	memcpy(obj->offsets, src->offsets, olen);
+}
+
+static uint8_t reftable_obj_record_val_type(const void *rec)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8)
+		return r->offset_len;
+	return 0;
+}
+
+static int reftable_obj_record_encode(const void *rec, struct string_view s,
+				      int hash_size)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	struct string_view start = s;
+	int i = 0;
+	int n = 0;
+	uint64_t last = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(&s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		string_view_consume(&s, n);
+	}
+	if (r->offset_len == 0)
+		return start.len - s.len;
+	n = put_var_int(&s, r->offsets[0]);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	last = r->offsets[0];
+	for (i = 1; i < r->offset_len; i++) {
+		int n = put_var_int(&s, r->offsets[i] - last);
+		if (n < 0) {
+			return -1;
+		}
+		string_view_consume(&s, n);
+		last = r->offsets[i];
+	}
+	return start.len - s.len;
+}
+
+static int reftable_obj_record_decode(void *rec, struct strbuf key,
+				      uint8_t val_type, struct string_view in,
+				      int hash_size)
+{
+	struct string_view start = in;
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	uint64_t last;
+	int j;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, &in);
+		if (n < 0) {
+			return n;
+		}
+
+		string_view_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0)
+		return start.len - in.len;
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], &in);
+	if (n < 0)
+		return n;
+	string_view_consume(&in, n);
+
+	last = r->offsets[0];
+	j = 1;
+	while (j < count) {
+		uint64_t delta = 0;
+		int n = get_var_int(&delta, &in);
+		if (n < 0) {
+			return n;
+		}
+		string_view_consume(&in, n);
+
+		last = r->offsets[j] = (delta + last);
+		j++;
+	}
+	return start.len - in.len;
+}
+
+static bool not_a_deletion(const void *p)
+{
+	return false;
+}
+
+struct reftable_record_vtable reftable_obj_record_vtable = {
+	.key = &reftable_obj_record_key,
+	.type = BLOCK_TYPE_OBJ,
+	.copy_from = &reftable_obj_record_copy_from,
+	.val_type = &reftable_obj_record_val_type,
+	.encode = &reftable_obj_record_encode,
+	.decode = &reftable_obj_record_decode,
+	.clear = &reftable_obj_record_clear,
+	.is_deletion = not_a_deletion,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size(hash_id));
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size(hash_id));
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static void reftable_log_record_key(const void *r, struct strbuf *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->ref_name);
+	uint8_t i64[8];
+	uint64_t ts = 0;
+	strbuf_reset(dest);
+	strbuf_add(dest, (uint8_t *)rec->ref_name, len + 1);
+
+	ts = (~ts) - rec->update_index;
+	put_be64(&i64[0], ts);
+	strbuf_add(dest, i64, sizeof(i64));
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	reftable_log_record_clear(dst);
+	*dst = *src;
+	if (dst->ref_name != NULL) {
+		dst->ref_name = xstrdup(dst->ref_name);
+	}
+	if (dst->email != NULL) {
+		dst->email = xstrdup(dst->email);
+	}
+	if (dst->name != NULL) {
+		dst->name = xstrdup(dst->name);
+	}
+	if (dst->message != NULL) {
+		dst->message = xstrdup(dst->message);
+	}
+
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->ref_name);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static uint8_t reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static uint8_t zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct string_view s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct string_view start = s;
+	int n = 0;
+	uint8_t *oldh = r->old_hash;
+	uint8_t *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r))
+		return 0;
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size)
+		return -1;
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	string_view_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	n = put_var_int(&s, r->time);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	if (s.len < 2)
+		return -1;
+
+	put_be16(s.buf, r->tz_offset);
+	string_view_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct strbuf key,
+				      uint8_t val_type, struct string_view in,
+				      int hash_size)
+{
+	struct string_view start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct strbuf dest = STRBUF_INIT;
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0)
+		return REFTABLE_FORMAT_ERROR;
+
+	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
+	memcpy(r->ref_name, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		FREE_AND_NULL(r->old_hash);
+		FREE_AND_NULL(r->new_hash);
+		FREE_AND_NULL(r->message);
+		FREE_AND_NULL(r->email);
+		FREE_AND_NULL(r->name);
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size)
+		return REFTABLE_FORMAT_ERROR;
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	string_view_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	string_view_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	strbuf_reset(&dest);
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	string_view_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, &in);
+	if (n < 0)
+		goto done;
+	string_view_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2)
+		goto done;
+
+	r->tz_offset = get_be16(in.buf);
+	string_view_consume(&in, 2);
+
+	strbuf_reset(&dest);
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	string_view_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	strbuf_release(&dest);
+	return start.len - in.len;
+
+done:
+	strbuf_release(&dest);
+	return REFTABLE_FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL)
+		a = empty;
+
+	if (b == NULL)
+		b = empty;
+
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(uint8_t *a, uint8_t *b, int sz)
+{
+	if (a == NULL)
+		a = zero;
+
+	if (b == NULL)
+		b = zero;
+
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+static bool reftable_log_record_is_deletion_void(const void *p)
+{
+	return reftable_log_record_is_deletion(
+		(const struct reftable_log_record *)p);
+}
+
+struct reftable_record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = BLOCK_TYPE_LOG,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+	.is_deletion = &reftable_log_record_is_deletion_void,
+};
+
+struct reftable_record reftable_new_record(uint8_t typ)
+{
+	struct reftable_record rec = { NULL };
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		reftable_record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct reftable_obj_record *r =
+			reftable_calloc(sizeof(struct reftable_obj_record));
+		reftable_record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		reftable_record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct reftable_index_record empty = { .last_key =
+							       STRBUF_INIT };
+		struct reftable_index_record *r =
+			reftable_calloc(sizeof(struct reftable_index_record));
+		*r = empty;
+		reftable_record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+void *reftable_record_yield(struct reftable_record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+void reftable_record_destroy(struct reftable_record *rec)
+{
+	reftable_record_clear(rec);
+	reftable_free(reftable_record_yield(rec));
+}
+
+static void reftable_index_record_key(const void *r, struct strbuf *dest)
+{
+	struct reftable_index_record *rec = (struct reftable_index_record *)r;
+	strbuf_reset(dest);
+	strbuf_addbuf(dest, &rec->last_key);
+}
+
+static void reftable_index_record_copy_from(void *rec, const void *src_rec,
+					    int hash_size)
+{
+	struct reftable_index_record *dst = (struct reftable_index_record *)rec;
+	struct reftable_index_record *src =
+		(struct reftable_index_record *)src_rec;
+
+	strbuf_reset(&dst->last_key);
+	strbuf_addbuf(&dst->last_key, &src->last_key);
+	dst->offset = src->offset;
+}
+
+static void reftable_index_record_clear(void *rec)
+{
+	struct reftable_index_record *idx = (struct reftable_index_record *)rec;
+	strbuf_release(&idx->last_key);
+}
+
+static uint8_t reftable_index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int reftable_index_record_encode(const void *rec, struct string_view out,
+					int hash_size)
+{
+	const struct reftable_index_record *r =
+		(const struct reftable_index_record *)rec;
+	struct string_view start = out;
+
+	int n = put_var_int(&out, r->offset);
+	if (n < 0)
+		return n;
+
+	string_view_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int reftable_index_record_decode(void *rec, struct strbuf key,
+					uint8_t val_type, struct string_view in,
+					int hash_size)
+{
+	struct string_view start = in;
+	struct reftable_index_record *r = (struct reftable_index_record *)rec;
+	int n = 0;
+
+	strbuf_reset(&r->last_key);
+	strbuf_addbuf(&r->last_key, &key);
+
+	n = get_var_int(&r->offset, &in);
+	if (n < 0)
+		return n;
+
+	string_view_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct reftable_record_vtable reftable_index_record_vtable = {
+	.key = &reftable_index_record_key,
+	.type = BLOCK_TYPE_INDEX,
+	.copy_from = &reftable_index_record_copy_from,
+	.val_type = &reftable_index_record_val_type,
+	.encode = &reftable_index_record_encode,
+	.decode = &reftable_index_record_decode,
+	.clear = &reftable_index_record_clear,
+	.is_deletion = &not_a_deletion,
+};
+
+void reftable_record_key(struct reftable_record *rec, struct strbuf *dest)
+{
+	rec->ops->key(rec->data, dest);
+}
+
+uint8_t reftable_record_type(struct reftable_record *rec)
+{
+	return rec->ops->type;
+}
+
+int reftable_record_encode(struct reftable_record *rec, struct string_view dest,
+			   int hash_size)
+{
+	return rec->ops->encode(rec->data, dest, hash_size);
+}
+
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size)
+{
+	assert(src->ops->type == rec->ops->type);
+
+	rec->ops->copy_from(rec->data, src->data, hash_size);
+}
+
+uint8_t reftable_record_val_type(struct reftable_record *rec)
+{
+	return rec->ops->val_type(rec->data);
+}
+
+int reftable_record_decode(struct reftable_record *rec, struct strbuf key,
+			   uint8_t extra, struct string_view src, int hash_size)
+{
+	return rec->ops->decode(rec->data, key, extra, src, hash_size);
+}
+
+void reftable_record_clear(struct reftable_record *rec)
+{
+	rec->ops->clear(rec->data);
+}
+
+bool reftable_record_is_deletion(struct reftable_record *rec)
+{
+	return rec->ops->is_deletion(rec->data);
+}
+
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *ref_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *obj_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = obj_rec;
+	rec->ops = &reftable_obj_record_vtable;
+}
+
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *index_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = index_rec;
+	rec->ops = &reftable_index_record_vtable;
+}
+
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *log_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec->data;
+}
+
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_LOG);
+	return (struct reftable_log_record *)rec->data;
+}
+
+static bool hash_equal(uint8_t *a, uint8_t *b, int hash_size)
+{
+	if (a != NULL && b != NULL)
+		return !memcmp(a, b, hash_size);
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL)
+		return 0 == strcmp(a, b);
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->ref_name, b->ref_name) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->ref_name,
+		      ((struct reftable_ref_record *)b)->ref_name);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->ref_name, lb->ref_name);
+	if (cmp)
+		return cmp;
+	if (la->update_index > lb->update_index)
+		return -1;
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
+
+void string_view_consume(struct string_view *s, int n)
+{
+	s->buf += n;
+	s->len -= n;
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 00000000000..ba753dcec05
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,143 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "strbuf.h"
+#include "system.h"
+
+/*
+  A substring of existing string data. This structure takes no responsibility
+  for the lifetime of the data it points to.
+*/
+struct string_view {
+	uint8_t *buf;
+	int len;
+};
+
+/* Advance `s.buf` by `n`, and decrease length. */
+void string_view_consume(struct string_view *s, int n);
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct string_view *in);
+int put_var_int(struct string_view *dest, uint64_t val);
+
+/* Methods for records. */
+struct reftable_record_vtable {
+	/* encode the key of to a uint8_t strbuf. */
+	void (*key)(const void *rec, struct strbuf *dest);
+
+	/* The record type of ('r' for ref). */
+	uint8_t type;
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	uint8_t (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct string_view dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct strbuf key, uint8_t extra,
+		      struct string_view src, int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+
+	/* is this a tombstone? */
+	bool (*is_deletion)(const void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct reftable_record {
+	void *data;
+	struct reftable_record_vtable *ops;
+};
+
+/* returns true for recognized block types. Block start with the block type. */
+int reftable_is_block_type(uint8_t typ);
+
+/* creates a malloced record of the given type. Dispose with record_destroy */
+struct reftable_record reftable_new_record(uint8_t typ);
+
+extern struct reftable_record_vtable reftable_ref_record_vtable;
+
+/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
+   number of bytes written. */
+int reftable_encode_key(bool *restart, struct string_view dest,
+			struct strbuf prev_key, struct strbuf key,
+			uint8_t extra);
+
+/* Decode into `key` and `extra` from `in` */
+int reftable_decode_key(struct strbuf *key, uint8_t *extra,
+			struct strbuf last_key, struct string_view in);
+
+/* reftable_index_record are used internally to speed up lookups. */
+struct reftable_index_record {
+	uint64_t offset; /* Offset of block */
+	struct strbuf last_key; /* Last key of the block. */
+};
+
+/* reftable_obj_record stores an object ID => ref mapping. */
+struct reftable_obj_record {
+	uint8_t *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void reftable_record_key(struct reftable_record *rec, struct strbuf *dest);
+uint8_t reftable_record_type(struct reftable_record *rec);
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size);
+uint8_t reftable_record_val_type(struct reftable_record *rec);
+int reftable_record_encode(struct reftable_record *rec, struct string_view dest,
+			   int hash_size);
+int reftable_record_decode(struct reftable_record *rec, struct strbuf key,
+			   uint8_t extra, struct string_view src,
+			   int hash_size);
+bool reftable_record_is_deletion(struct reftable_record *rec);
+
+/* zeroes out the embedded record */
+void reftable_record_clear(struct reftable_record *rec);
+
+/* clear out the record, yielding the reftable_record data that was
+ * encapsulated. */
+void *reftable_record_yield(struct reftable_record *rec);
+
+/* clear and deallocate embedded record, and zero `rec`. */
+void reftable_record_destroy(struct reftable_record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *objrec);
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *idxrec);
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *refrec);
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *logrec);
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *ref);
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/record_test.c b/reftable/record_test.c
new file mode 100644
index 00000000000..232e939e66d
--- /dev/null
+++ b/reftable/record_test.c
@@ -0,0 +1,410 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include "system.h"
+#include "basics.h"
+#include "constants.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_copy(struct reftable_record *rec)
+{
+	struct reftable_record copy =
+		reftable_new_record(reftable_record_type(rec));
+	reftable_record_copy_from(&copy, rec, SHA1_SIZE);
+	/* do it twice to catch memory leaks */
+	reftable_record_copy_from(&copy, rec, SHA1_SIZE);
+	switch (reftable_record_type(&copy)) {
+	case BLOCK_TYPE_REF:
+		assert(reftable_ref_record_equal(reftable_record_as_ref(&copy),
+						 reftable_record_as_ref(rec),
+						 SHA1_SIZE));
+		break;
+	case BLOCK_TYPE_LOG:
+		assert(reftable_log_record_equal(reftable_record_as_log(&copy),
+						 reftable_record_as_log(rec),
+						 SHA1_SIZE));
+		break;
+	}
+	reftable_record_destroy(&copy);
+}
+
+static void test_varint_roundtrip(void)
+{
+	uint64_t inputs[] = { 0,
+			      1,
+			      27,
+			      127,
+			      128,
+			      257,
+			      4096,
+			      ((uint64_t)1 << 63),
+			      ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1 };
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(inputs); i++) {
+		uint8_t dest[10];
+
+		struct string_view out = {
+			.buf = dest,
+			.len = sizeof(dest),
+		};
+		uint64_t in = inputs[i];
+		int n = put_var_int(&out, in);
+		uint64_t got = 0;
+
+		assert(n > 0);
+		out.len = n;
+		n = get_var_int(&got, &out);
+		assert(n > 0);
+
+		assert(got == in);
+	}
+}
+
+static void test_common_prefix(void)
+{
+	struct {
+		const char *a, *b;
+		int want;
+	} cases[] = {
+		{ "abc", "ab", 2 },
+		{ "", "abc", 0 },
+		{ "abc", "abd", 2 },
+		{ "abc", "pqr", 0 },
+	};
+
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		struct strbuf a = STRBUF_INIT;
+		struct strbuf b = STRBUF_INIT;
+		strbuf_addstr(&a, cases[i].a);
+		strbuf_addstr(&b, cases[i].b);
+		assert(common_prefix_size(&a, &b) == cases[i].want);
+
+		strbuf_release(&a);
+		strbuf_release(&b);
+	}
+}
+
+static void set_hash(uint8_t *h, int j)
+{
+	int i = 0;
+	for (i = 0; i < hash_size(SHA1_ID); i++) {
+		h[i] = (j >> i) & 0xff;
+	}
+}
+
+static void test_reftable_ref_record_roundtrip(void)
+{
+	int i = 0;
+
+	for (i = 0; i <= 3; i++) {
+		struct reftable_ref_record in = { 0 };
+		struct reftable_ref_record out = {
+			.ref_name = xstrdup("old name"),
+			.value = reftable_calloc(SHA1_SIZE),
+			.target_value = reftable_calloc(SHA1_SIZE),
+			.target = xstrdup("old value"),
+		};
+		struct reftable_record rec_out = { 0 };
+		struct strbuf key = STRBUF_INIT;
+		struct reftable_record rec = { 0 };
+		uint8_t buffer[1024] = { 0 };
+		struct string_view dest = {
+			.buf = buffer,
+			.len = sizeof(buffer),
+		};
+
+		int n, m;
+
+		switch (i) {
+		case 0:
+			break;
+		case 1:
+			in.value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.value, 1);
+			break;
+		case 2:
+			in.value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.value, 1);
+			in.target_value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.target_value, 2);
+			break;
+		case 3:
+			in.target = xstrdup("target");
+			break;
+		}
+		in.ref_name = xstrdup("refs/heads/master");
+
+		reftable_record_from_ref(&rec, &in);
+		test_copy(&rec);
+
+		assert(reftable_record_val_type(&rec) == i);
+
+		reftable_record_key(&rec, &key);
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n > 0);
+
+		/* decode into a non-zero reftable_record to test for leaks. */
+
+		reftable_record_from_ref(&rec_out, &out);
+		m = reftable_record_decode(&rec_out, key, i, dest, SHA1_SIZE);
+		assert(n == m);
+
+		assert((out.value != NULL) == (in.value != NULL));
+		assert((out.target_value != NULL) == (in.target_value != NULL));
+		assert((out.target != NULL) == (in.target != NULL));
+		reftable_record_clear(&rec_out);
+
+		strbuf_release(&key);
+		reftable_ref_record_clear(&in);
+	}
+}
+
+static void test_reftable_log_record_equal(void)
+{
+	struct reftable_log_record in[2] = {
+		{
+			.ref_name = xstrdup("refs/heads/master"),
+			.update_index = 42,
+		},
+		{
+			.ref_name = xstrdup("refs/heads/master"),
+			.update_index = 22,
+		}
+	};
+
+	assert(!reftable_log_record_equal(&in[0], &in[1], SHA1_SIZE));
+	in[1].update_index = in[0].update_index;
+	assert(reftable_log_record_equal(&in[0], &in[1], SHA1_SIZE));
+	reftable_log_record_clear(&in[0]);
+	reftable_log_record_clear(&in[1]);
+}
+
+static void test_reftable_log_record_roundtrip(void)
+{
+	struct reftable_log_record in[2] = {
+		{
+			.ref_name = xstrdup("refs/heads/master"),
+			.old_hash = reftable_malloc(SHA1_SIZE),
+			.new_hash = reftable_malloc(SHA1_SIZE),
+			.name = xstrdup("han-wen"),
+			.email = xstrdup("hanwen@google.com"),
+			.message = xstrdup("test"),
+			.update_index = 42,
+			.time = 1577123507,
+			.tz_offset = 100,
+		},
+		{
+			.ref_name = xstrdup("refs/heads/master"),
+			.update_index = 22,
+		}
+	};
+	set_test_hash(in[0].new_hash, 1);
+	set_test_hash(in[0].old_hash, 2);
+	for (int i = 0; i < ARRAY_SIZE(in); i++) {
+		struct reftable_record rec = { 0 };
+		struct strbuf key = STRBUF_INIT;
+		uint8_t buffer[1024] = { 0 };
+		struct string_view dest = {
+			.buf = buffer,
+			.len = sizeof(buffer),
+		};
+		/* populate out, to check for leaks. */
+		struct reftable_log_record out = {
+			.ref_name = xstrdup("old name"),
+			.new_hash = reftable_calloc(SHA1_SIZE),
+			.old_hash = reftable_calloc(SHA1_SIZE),
+			.name = xstrdup("old name"),
+			.email = xstrdup("old@email"),
+			.message = xstrdup("old message"),
+		};
+		struct reftable_record rec_out = { 0 };
+		int n, m, valtype;
+
+		reftable_record_from_log(&rec, &in[i]);
+
+		test_copy(&rec);
+
+		reftable_record_key(&rec, &key);
+
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n >= 0);
+		reftable_record_from_log(&rec_out, &out);
+		valtype = reftable_record_val_type(&rec);
+		m = reftable_record_decode(&rec_out, key, valtype, dest,
+					   SHA1_SIZE);
+		assert(n == m);
+
+		assert(reftable_log_record_equal(&in[i], &out, SHA1_SIZE));
+		reftable_log_record_clear(&in[i]);
+		strbuf_release(&key);
+		reftable_record_clear(&rec_out);
+	}
+}
+
+static void test_u24_roundtrip(void)
+{
+	uint32_t in = 0x112233;
+	uint8_t dest[3];
+	uint32_t out;
+	put_be24(dest, in);
+	out = get_be24(dest);
+	assert(in == out);
+}
+
+static void test_key_roundtrip(void)
+{
+	uint8_t buffer[1024] = { 0 };
+	struct string_view dest = {
+		.buf = buffer,
+		.len = sizeof(buffer),
+	};
+	struct strbuf last_key = STRBUF_INIT;
+	struct strbuf key = STRBUF_INIT;
+	struct strbuf roundtrip = STRBUF_INIT;
+	bool restart;
+	uint8_t extra;
+	int n, m;
+	uint8_t rt_extra;
+
+	strbuf_addstr(&last_key, "refs/heads/master");
+	strbuf_addstr(&key, "refs/tags/bla");
+	extra = 6;
+	n = reftable_encode_key(&restart, dest, last_key, key, extra);
+	assert(!restart);
+	assert(n > 0);
+
+	m = reftable_decode_key(&roundtrip, &rt_extra, last_key, dest);
+	assert(n == m);
+	assert(0 == strbuf_cmp(&key, &roundtrip));
+	assert(rt_extra == extra);
+
+	strbuf_release(&last_key);
+	strbuf_release(&key);
+	strbuf_release(&roundtrip);
+}
+
+static void test_reftable_obj_record_roundtrip(void)
+{
+	uint8_t testHash1[SHA1_SIZE] = { 1, 2, 3, 4, 0 };
+	uint64_t till9[] = { 1, 2, 3, 4, 500, 600, 700, 800, 9000 };
+	struct reftable_obj_record recs[3] = { {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+						       .offsets = till9,
+						       .offset_len = 3,
+					       },
+					       {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+						       .offsets = till9,
+						       .offset_len = 9,
+					       },
+					       {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+					       } };
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(recs); i++) {
+		struct reftable_obj_record in = recs[i];
+		uint8_t buffer[1024] = { 0 };
+		struct string_view dest = {
+			.buf = buffer,
+			.len = sizeof(buffer),
+		};
+		struct reftable_record rec = { 0 };
+		struct strbuf key = STRBUF_INIT;
+		struct reftable_obj_record out = { 0 };
+		struct reftable_record rec_out = { 0 };
+		int n, m;
+		uint8_t extra;
+
+		reftable_record_from_obj(&rec, &in);
+		test_copy(&rec);
+		reftable_record_key(&rec, &key);
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n > 0);
+		extra = reftable_record_val_type(&rec);
+		reftable_record_from_obj(&rec_out, &out);
+		m = reftable_record_decode(&rec_out, key, extra, dest,
+					   SHA1_SIZE);
+		assert(n == m);
+
+		assert(in.hash_prefix_len == out.hash_prefix_len);
+		assert(in.offset_len == out.offset_len);
+
+		assert(!memcmp(in.hash_prefix, out.hash_prefix,
+			       in.hash_prefix_len));
+		assert(0 == memcmp(in.offsets, out.offsets,
+				   sizeof(uint64_t) * in.offset_len));
+		strbuf_release(&key);
+		reftable_record_clear(&rec_out);
+	}
+}
+
+static void test_reftable_index_record_roundtrip(void)
+{
+	struct reftable_index_record in = {
+		.offset = 42,
+		.last_key = STRBUF_INIT,
+	};
+	uint8_t buffer[1024] = { 0 };
+	struct string_view dest = {
+		.buf = buffer,
+		.len = sizeof(buffer),
+	};
+	struct strbuf key = STRBUF_INIT;
+	struct reftable_record rec = { 0 };
+	struct reftable_index_record out = { .last_key = STRBUF_INIT };
+	struct reftable_record out_rec = { NULL };
+	int n, m;
+	uint8_t extra;
+
+	strbuf_addstr(&in.last_key, "refs/heads/master");
+	reftable_record_from_index(&rec, &in);
+	reftable_record_key(&rec, &key);
+	test_copy(&rec);
+
+	assert(0 == strbuf_cmp(&key, &in.last_key));
+	n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+	assert(n > 0);
+
+	extra = reftable_record_val_type(&rec);
+	reftable_record_from_index(&out_rec, &out);
+	m = reftable_record_decode(&out_rec, key, extra, dest, SHA1_SIZE);
+	assert(m == n);
+
+	assert(in.offset == out.offset);
+
+	reftable_record_clear(&out_rec);
+	strbuf_release(&key);
+	strbuf_release(&in.last_key);
+}
+
+int record_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_reftable_log_record_equal",
+		      &test_reftable_log_record_equal);
+	add_test_case("test_reftable_log_record_roundtrip",
+		      &test_reftable_log_record_roundtrip);
+	add_test_case("test_reftable_ref_record_roundtrip",
+		      &test_reftable_ref_record_roundtrip);
+	add_test_case("test_varint_roundtrip", &test_varint_roundtrip);
+	add_test_case("test_key_roundtrip", &test_key_roundtrip);
+	add_test_case("test_common_prefix", &test_common_prefix);
+	add_test_case("test_reftable_obj_record_roundtrip",
+		      &test_reftable_obj_record_roundtrip);
+	add_test_case("test_reftable_index_record_roundtrip",
+		      &test_reftable_index_record_roundtrip);
+	add_test_case("test_u24_roundtrip", &test_u24_roundtrip);
+	return test_main(argc, argv);
+}
diff --git a/reftable/refname.c b/reftable/refname.c
new file mode 100644
index 00000000000..6b2658fc158
--- /dev/null
+++ b/reftable/refname.c
@@ -0,0 +1,209 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+#include "refname.h"
+#include "strbuf.h"
+
+struct find_arg {
+	char **names;
+	const char *want;
+};
+
+static int find_name(size_t k, void *arg)
+{
+	struct find_arg *f_arg = (struct find_arg *)arg;
+	return strcmp(f_arg->names[k], f_arg->want) >= 0;
+}
+
+int modification_has_ref(struct modification *mod, const char *name)
+{
+	struct reftable_ref_record ref = { 0 };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = name,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
+			return 0;
+		}
+	}
+
+	if (mod->del_len > 0) {
+		struct find_arg arg = {
+			.names = mod->del,
+			.want = name,
+		};
+		int idx = binsearch(mod->del_len, find_name, &arg);
+		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
+			return 1;
+		}
+	}
+
+	err = reftable_table_read_ref(&mod->tab, name, &ref);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static void modification_clear(struct modification *mod)
+{
+	/* don't delete the strings themselves; they're owned by ref records.
+	 */
+	FREE_AND_NULL(mod->add);
+	FREE_AND_NULL(mod->del);
+	mod->add_len = 0;
+	mod->del_len = 0;
+}
+
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix)
+{
+	struct reftable_iterator it = { NULL };
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = prefix,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len &&
+		    !strncmp(prefix, mod->add[idx], strlen(prefix)))
+			goto done;
+	}
+	err = reftable_table_seek_ref(&mod->tab, &it, prefix);
+	if (err)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err)
+			goto done;
+
+		if (mod->del_len > 0) {
+			struct find_arg arg = {
+				.names = mod->del,
+				.want = ref.ref_name,
+			};
+			int idx = binsearch(mod->del_len, find_name, &arg);
+			if (idx < mod->del_len &&
+			    !strcmp(ref.ref_name, mod->del[idx])) {
+				continue;
+			}
+		}
+
+		if (strncmp(ref.ref_name, prefix, strlen(prefix))) {
+			err = 1;
+			goto done;
+		}
+		err = 0;
+		goto done;
+	}
+
+done:
+	reftable_ref_record_clear(&ref);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int validate_ref_name(const char *name)
+{
+	while (true) {
+		char *next = strchr(name, '/');
+		if (!*name) {
+			return REFTABLE_REFNAME_ERROR;
+		}
+		if (!next) {
+			return 0;
+		}
+		if (next - name == 0 || (next - name == 1 && *name == '.') ||
+		    (next - name == 2 && name[0] == '.' && name[1] == '.'))
+			return REFTABLE_REFNAME_ERROR;
+		name = next + 1;
+	}
+	return 0;
+}
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz)
+{
+	struct modification mod = {
+		.tab = tab,
+		.add = reftable_calloc(sizeof(char *) * sz),
+		.del = reftable_calloc(sizeof(char *) * sz),
+	};
+	int i = 0;
+	int err = 0;
+	for (; i < sz; i++) {
+		if (reftable_ref_record_is_deletion(&recs[i])) {
+			mod.del[mod.del_len++] = recs[i].ref_name;
+		} else {
+			mod.add[mod.add_len++] = recs[i].ref_name;
+		}
+	}
+
+	err = modification_validate(&mod);
+	modification_clear(&mod);
+	return err;
+}
+
+static void strbuf_trim_component(struct strbuf *sl)
+{
+	while (sl->len > 0) {
+		bool is_slash = (sl->buf[sl->len - 1] == '/');
+		strbuf_setlen(sl, sl->len - 1);
+		if (is_slash)
+			break;
+	}
+}
+
+int modification_validate(struct modification *mod)
+{
+	struct strbuf slashed = STRBUF_INIT;
+	int err = 0;
+	int i = 0;
+	for (; i < mod->add_len; i++) {
+		err = validate_ref_name(mod->add[i]);
+		if (err)
+			goto done;
+		strbuf_reset(&slashed);
+		strbuf_addstr(&slashed, mod->add[i]);
+		strbuf_addstr(&slashed, "/");
+
+		err = modification_has_ref_with_prefix(mod, slashed.buf);
+		if (err == 0) {
+			err = REFTABLE_NAME_CONFLICT;
+			goto done;
+		}
+		if (err < 0)
+			goto done;
+
+		strbuf_reset(&slashed);
+		strbuf_addstr(&slashed, mod->add[i]);
+		while (slashed.len) {
+			strbuf_trim_component(&slashed);
+			err = modification_has_ref(mod, slashed.buf);
+			if (err == 0) {
+				err = REFTABLE_NAME_CONFLICT;
+				goto done;
+			}
+			if (err < 0)
+				goto done;
+		}
+	}
+	err = 0;
+done:
+	strbuf_release(&slashed);
+	return err;
+}
diff --git a/reftable/refname.h b/reftable/refname.h
new file mode 100644
index 00000000000..f335287fcdb
--- /dev/null
+++ b/reftable/refname.h
@@ -0,0 +1,38 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+#ifndef REFNAME_H
+#define REFNAME_H
+
+#include "reftable.h"
+
+struct modification {
+	struct reftable_table tab;
+
+	char **add;
+	size_t add_len;
+
+	char **del;
+	size_t del_len;
+};
+
+// -1 = error, 0 = found, 1 = not found
+int modification_has_ref(struct modification *mod, const char *name);
+
+// -1 = error, 0 = found, 1 = not found.
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix);
+
+// 0 = OK.
+int validate_ref_name(const char *name);
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz);
+
+int modification_validate(struct modification *mod);
+
+#endif
diff --git a/reftable/refname_test.c b/reftable/refname_test.c
new file mode 100644
index 00000000000..4b1b6912539
--- /dev/null
+++ b/reftable/refname_test.c
@@ -0,0 +1,99 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "record.h"
+#include "refname.h"
+#include "system.h"
+
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+struct testcase {
+	char *add;
+	char *del;
+	int error_code;
+};
+
+static void test_conflict(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+	struct reftable_ref_record rec = {
+		.ref_name = "a/b",
+		.target = "destination", /* make sure it's not a symref. */
+		.update_index = 1,
+	};
+	int err;
+	int i;
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct testcase cases[] = {
+		{ "a/b/c", NULL, REFTABLE_NAME_CONFLICT },
+		{ "b", NULL, 0 },
+		{ "a", NULL, REFTABLE_NAME_CONFLICT },
+		{ "a", "a/b", 0 },
+
+		{ "p/", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p//q", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p/./q", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p/../q", NULL, REFTABLE_REFNAME_ERROR },
+
+		{ "a/b/c", "a/b", 0 },
+		{ NULL, "a//b", 0 },
+	};
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_add_ref(w, &rec);
+	assert_err(err);
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+
+	block_source_from_strbuf(&source, &buf);
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	reftable_table_from_reader(&tab, rd);
+
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		struct modification mod = {
+			.tab = tab,
+		};
+
+		if (cases[i].add != NULL) {
+			mod.add = &cases[i].add;
+			mod.add_len = 1;
+		}
+		if (cases[i].del != NULL) {
+			mod.del = &cases[i].del;
+			mod.del_len = 1;
+		}
+
+		err = modification_validate(&mod);
+		assert(err == cases[i].error_code);
+	}
+
+	reftable_reader_free(rd);
+	strbuf_release(&buf);
+}
+
+int refname_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_conflict", &test_conflict);
+	return test_main(argc, argv);
+}
diff --git a/reftable/reftable-tests.h b/reftable/reftable-tests.h
new file mode 100644
index 00000000000..e38471888f7
--- /dev/null
+++ b/reftable/reftable-tests.h
@@ -0,0 +1,22 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_TESTS_H
+#define REFTABLE_TESTS_H
+
+int block_test_main(int argc, const char **argv);
+int merged_test_main(int argc, const char **argv);
+int record_test_main(int argc, const char **argv);
+int refname_test_main(int argc, const char **argv);
+int reftable_test_main(int argc, const char **argv);
+int strbuf_test_main(int argc, const char **argv);
+int stack_test_main(int argc, const char **argv);
+int tree_test_main(int argc, const char **argv);
+int reftable_dump_main(int argc, char *const *argv);
+
+#endif
diff --git a/reftable/reftable.c b/reftable/reftable.c
new file mode 100644
index 00000000000..6e7d1da10e7
--- /dev/null
+++ b/reftable/reftable.c
@@ -0,0 +1,90 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+#include "record.h"
+#include "reader.h"
+#include "merged.h"
+
+struct reftable_table_vtable {
+	int (*seek)(void *tab, struct reftable_iterator *it,
+		    struct reftable_record *);
+};
+
+static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
+				     struct reftable_record *rec)
+{
+	return reader_seek((struct reftable_reader *)tab, it, rec);
+}
+
+static struct reftable_table_vtable reader_vtable = {
+	.seek = reftable_reader_seek_void,
+};
+
+static int reftable_merged_table_seek_void(void *tab,
+					   struct reftable_iterator *it,
+					   struct reftable_record *rec)
+{
+	return merged_table_seek_record((struct reftable_merged_table *)tab, it,
+					rec);
+}
+
+static struct reftable_table_vtable merged_table_vtable = {
+	.seek = reftable_merged_table_seek_void,
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return tab->ops->seek(tab->table_arg, it, &rec);
+}
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &reader_vtable;
+	tab->table_arg = reader;
+}
+
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *merged)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &merged_table_vtable;
+	tab->table_arg = merged;
+}
+
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	int err = reftable_table_seek_ref(tab, &it, name);
+	if (err)
+		goto done;
+
+	err = reftable_iterator_next_ref(&it, ref);
+	if (err)
+		goto done;
+
+	if (strcmp(ref->ref_name, name) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		reftable_ref_record_clear(ref);
+		err = 1;
+		goto done;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	return err;
+}
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 00000000000..52e869c4a47
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,571 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *ref_name; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *ref_name;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	REFTABLE_IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	REFTABLE_FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	REFTABLE_NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	REFTABLE_LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL ref_name.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on writing a log record with multiline message with
+	   exact_log_message unset
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	REFTABLE_API_ERROR = -6,
+
+	/* Decompression error */
+	REFTABLE_ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	REFTABLE_EMPTY_TABLE_ERROR = -8,
+
+	/* Dir/file conflict. */
+	REFTABLE_NAME_CONFLICT = -9,
+
+	/* Illegal ref name. */
+	REFTABLE_REFNAME_ERROR = -10,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/*
+ * Convert the numeric error code to an equivalent errno code.
+ */
+int reftable_error_to_errno(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+
+	/* boolean: do not check ref names for validity or dir/file conflicts.
+	 */
+	int skip_name_check;
+
+	/* boolean: copy log messages exactly. If unset, check that the message
+	 *   is a single line, and add '\n' if missing.
+	 */
+	int exact_log_message;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, const void *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, const void *data, size_t size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp. The name is used for creating a stack. Typically, it is the
+ * basename of the file. The block source `src` is owned by the reader, and is
+ * closed on calling reftable_reader_destroy().
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source *src, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, &src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(&it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to `oid`. */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_reader **stack, int n,
+			      uint32_t hash_id);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* closes readers for the merged tables */
+void reftable_merged_table_close(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/****************************************************************
+ Generic tables
+
+ A unified API for reading tables, either merged tables, or single readers.
+ ****************************************************************/
+
+struct reftable_table {
+	struct reftable_table_vtable *ops;
+	void *table_arg;
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name);
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader);
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *table);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction, and deallocate the
+   transaction. Releases the lock if held. */
+void reftable_addition_destroy(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* Reloads the stack if necessary. This is very cheap to run if the stack was up
+ * to date */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	uint64_t entries_written; /* total number of entries written, including
+				     failures. */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/reftable_test.c b/reftable/reftable_test.c
new file mode 100644
index 00000000000..3b598a56f1a
--- /dev/null
+++ b/reftable/reftable_test.c
@@ -0,0 +1,631 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "record.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static const int update_index = 5;
+
+static void test_buffer(void)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_block_source source = { NULL };
+	struct reftable_block out = { 0 };
+	int n;
+	uint8_t in[] = "hello";
+	strbuf_add(&buf, in, sizeof(in));
+	block_source_from_strbuf(&source, &buf);
+	assert(block_source_size(&source) == 6);
+	n = block_source_read_block(&source, &out, 0, sizeof(in));
+	assert(n == sizeof(in));
+	assert(!memcmp(in, out.data, n));
+	reftable_block_done(&out);
+
+	n = block_source_read_block(&source, &out, 1, 2);
+	assert(n == 2);
+	assert(!memcmp(out.data, "el", 2));
+
+	reftable_block_done(&out);
+	block_source_close(&source);
+	strbuf_release(&buf);
+}
+
+static void test_default_write_opts(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+
+	struct reftable_ref_record rec = {
+		.ref_name = "master",
+		.update_index = 1,
+	};
+	int err;
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader **readers = malloc(sizeof(*readers) * 1);
+	uint32_t hash_id;
+	struct reftable_reader *rd = NULL;
+	struct reftable_merged_table *merged = NULL;
+
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_add_ref(w, &rec);
+	assert_err(err);
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	hash_id = reftable_reader_hash_id(rd);
+	assert(hash_id == SHA1_ID);
+
+	readers[0] = rd;
+
+	err = reftable_new_merged_table(&merged, readers, 1, SHA1_ID);
+	assert_err(err);
+
+	reftable_merged_table_close(merged);
+	reftable_merged_table_free(merged);
+	strbuf_release(&buf);
+}
+
+static void write_table(char ***names, struct strbuf *buf, int N,
+			int block_size, uint32_t hash_id)
+{
+	struct reftable_write_options opts = {
+		.block_size = block_size,
+		.hash_id = hash_id,
+	};
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, buf, &opts);
+	struct reftable_ref_record ref = { 0 };
+	int i = 0, n;
+	struct reftable_log_record log = { 0 };
+	const struct reftable_stats *stats = NULL;
+	*names = reftable_calloc(sizeof(char *) * (N + 1));
+	reftable_writer_set_limits(w, update_index, update_index);
+	for (i = 0; i < N; i++) {
+		uint8_t hash[SHA256_SIZE] = { 0 };
+		char name[100];
+		int n;
+
+		set_test_hash(hash, i);
+
+		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
+
+		ref.ref_name = name;
+		ref.value = hash;
+		ref.update_index = update_index;
+		(*names)[i] = xstrdup(name);
+
+		n = reftable_writer_add_ref(w, &ref);
+		assert(n == 0);
+	}
+
+	for (i = 0; i < N; i++) {
+		uint8_t hash[SHA256_SIZE] = { 0 };
+		char name[100];
+		int n;
+
+		set_test_hash(hash, i);
+
+		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
+
+		log.ref_name = name;
+		log.new_hash = hash;
+		log.update_index = update_index;
+		log.message = "message";
+
+		n = reftable_writer_add_log(w, &log);
+		assert(n == 0);
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	stats = writer_stats(w);
+	for (i = 0; i < stats->ref_stats.blocks; i++) {
+		int off = i * opts.block_size;
+		if (off == 0) {
+			off = header_size((hash_id == SHA256_ID) ? 2 : 1);
+		}
+		assert(buf->buf[off] == 'r');
+	}
+
+	assert(stats->log_stats.blocks > 0);
+	reftable_writer_free(w);
+}
+
+static void test_log_buffer_size(void)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_write_options opts = {
+		.block_size = 4096,
+	};
+	int err;
+	struct reftable_log_record log = {
+		.ref_name = "refs/heads/master",
+		.name = "Han-Wen Nienhuys",
+		.email = "hanwen@google.com",
+		.tz_offset = 100,
+		.time = 0x5e430672,
+		.update_index = 0xa,
+		.message = "commit: 9\n",
+	};
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+
+	/* This tests buffer extension for log compression. Must use a random
+	   hash, to ensure that the compressed part is larger than the original.
+	*/
+	uint8_t hash1[SHA1_SIZE], hash2[SHA1_SIZE];
+	for (int i = 0; i < SHA1_SIZE; i++) {
+		hash1[i] = (uint8_t)(rand() % 256);
+		hash2[i] = (uint8_t)(rand() % 256);
+	}
+	log.old_hash = hash1;
+	log.new_hash = hash2;
+	reftable_writer_set_limits(w, update_index, update_index);
+	err = reftable_writer_add_log(w, &log);
+	assert_err(err);
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+	strbuf_release(&buf);
+}
+
+static void test_log_write_read(void)
+{
+	int N = 2;
+	char **names = reftable_calloc(sizeof(char *) * (N + 1));
+	int err;
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_ref_record ref = { 0 };
+	int i = 0;
+	struct reftable_log_record log = { 0 };
+	int n;
+	struct reftable_iterator it = { 0 };
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+	const struct reftable_stats *stats = NULL;
+	reftable_writer_set_limits(w, 0, N);
+	for (i = 0; i < N; i++) {
+		char name[256];
+		struct reftable_ref_record ref = { 0 };
+		snprintf(name, sizeof(name), "b%02d%0*d", i, 130, 7);
+		names[i] = xstrdup(name);
+		puts(name);
+		ref.ref_name = name;
+		ref.update_index = i;
+
+		err = reftable_writer_add_ref(w, &ref);
+		assert_err(err);
+	}
+	for (i = 0; i < N; i++) {
+		uint8_t hash1[SHA1_SIZE], hash2[SHA1_SIZE];
+		struct reftable_log_record log = { 0 };
+		set_test_hash(hash1, i);
+		set_test_hash(hash2, i + 1);
+
+		log.ref_name = names[i];
+		log.update_index = i;
+		log.old_hash = hash1;
+		log.new_hash = hash2;
+
+		err = reftable_writer_add_log(w, &log);
+		assert_err(err);
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	stats = writer_stats(w);
+	assert(stats->log_stats.blocks > 0);
+	reftable_writer_free(w);
+	w = NULL;
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.log");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, names[N - 1]);
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert_err(err);
+
+	/* end of iteration. */
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert(0 < err);
+
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_reader_seek_log(&rd, &it, "");
+	assert_err(err);
+
+	i = 0;
+	while (true) {
+		int err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+
+		assert_err(err);
+		assert_streq(names[i], log.ref_name);
+		assert(i == log.update_index);
+		i++;
+		reftable_log_record_clear(&log);
+	}
+
+	assert(i == N);
+	reftable_iterator_destroy(&it);
+
+	/* cleanup. */
+	strbuf_release(&buf);
+	free_names(names);
+	reader_close(&rd);
+}
+
+static void test_table_read_write_sequential(void)
+{
+	char **names;
+	struct strbuf buf = STRBUF_INIT;
+	int N = 50;
+	struct reftable_iterator it = { 0 };
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader rd = { 0 };
+	int err = 0;
+	int j = 0;
+
+	write_table(&names, &buf, N, 256, SHA1_ID);
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, "");
+	assert_err(err);
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		int r = reftable_iterator_next_ref(&it, &ref);
+		assert(r >= 0);
+		if (r > 0) {
+			break;
+		}
+		assert(0 == strcmp(names[j], ref.ref_name));
+		assert(update_index == ref.update_index);
+
+		j++;
+		reftable_ref_record_clear(&ref);
+	}
+	assert(j == N);
+	reftable_iterator_destroy(&it);
+	strbuf_release(&buf);
+	free_names(names);
+
+	reader_close(&rd);
+}
+
+static void test_table_write_small_table(void)
+{
+	char **names;
+	struct strbuf buf = STRBUF_INIT;
+	int N = 1;
+	write_table(&names, &buf, N, 4096, SHA1_ID);
+	assert(buf.len < 200);
+	strbuf_release(&buf);
+	free_names(names);
+}
+
+static void test_table_read_api(void)
+{
+	char **names;
+	struct strbuf buf = STRBUF_INIT;
+	int N = 50;
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	int err;
+	int i;
+	struct reftable_log_record log = { 0 };
+	struct reftable_iterator it = { 0 };
+
+	write_table(&names, &buf, N, 256, SHA1_ID);
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, names[0]);
+	assert_err(err);
+
+	err = reftable_iterator_next_log(&it, &log);
+	assert(err == REFTABLE_API_ERROR);
+
+	strbuf_release(&buf);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_free(names);
+	reader_close(&rd);
+	strbuf_release(&buf);
+}
+
+static void test_table_read_write_seek(bool index, int hash_id)
+{
+	char **names;
+	struct strbuf buf = STRBUF_INIT;
+	int N = 50;
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	int err;
+	int i = 0;
+
+	struct reftable_iterator it = { 0 };
+	struct strbuf pastLast = STRBUF_INIT;
+	struct reftable_ref_record ref = { 0 };
+
+	write_table(&names, &buf, N, 256, hash_id);
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+	assert(hash_id == reftable_reader_hash_id(&rd));
+
+	if (!index) {
+		rd.ref_offsets.index_offset = 0;
+	} else {
+		assert(rd.ref_offsets.index_offset > 0);
+	}
+
+	for (i = 1; i < N; i++) {
+		int err = reftable_reader_seek_ref(&rd, &it, names[i]);
+		assert_err(err);
+		err = reftable_iterator_next_ref(&it, &ref);
+		assert_err(err);
+		assert(0 == strcmp(names[i], ref.ref_name));
+		assert(i == ref.value[0]);
+
+		reftable_ref_record_clear(&ref);
+		reftable_iterator_destroy(&it);
+	}
+
+	strbuf_addstr(&pastLast, names[N - 1]);
+	strbuf_addstr(&pastLast, "/");
+
+	err = reftable_reader_seek_ref(&rd, &it, pastLast.buf);
+	if (err == 0) {
+		struct reftable_ref_record ref = { 0 };
+		int err = reftable_iterator_next_ref(&it, &ref);
+		assert(err > 0);
+	} else {
+		assert(err > 0);
+	}
+
+	strbuf_release(&pastLast);
+	reftable_iterator_destroy(&it);
+
+	strbuf_release(&buf);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+	reftable_free(names);
+	reader_close(&rd);
+}
+
+static void test_table_read_write_seek_linear(void)
+{
+	test_table_read_write_seek(false, SHA1_ID);
+}
+
+static void test_table_read_write_seek_linear_sha256(void)
+{
+	test_table_read_write_seek(false, SHA256_ID);
+}
+
+static void test_table_read_write_seek_index(void)
+{
+	test_table_read_write_seek(true, SHA1_ID);
+}
+
+static void test_table_refs_for(bool indexed)
+{
+	int N = 50;
+	char **want_names = reftable_calloc(sizeof(char *) * (N + 1));
+	int want_names_len = 0;
+	uint8_t want_hash[SHA1_SIZE];
+
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_ref_record ref = { 0 };
+	int i = 0;
+	int n;
+	int err;
+	struct reftable_reader rd;
+	struct reftable_block_source source = { 0 };
+
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+
+	struct reftable_iterator it = { 0 };
+	int j;
+
+	set_test_hash(want_hash, 4);
+
+	for (i = 0; i < N; i++) {
+		uint8_t hash[SHA1_SIZE];
+		char fill[51] = { 0 };
+		char name[100];
+		uint8_t hash1[SHA1_SIZE];
+		uint8_t hash2[SHA1_SIZE];
+		struct reftable_ref_record ref = { 0 };
+
+		memset(hash, i, sizeof(hash));
+		memset(fill, 'x', 50);
+		/* Put the variable part in the start */
+		snprintf(name, sizeof(name), "br%02d%s", i, fill);
+		name[40] = 0;
+		ref.ref_name = name;
+
+		set_test_hash(hash1, i / 4);
+		set_test_hash(hash2, 3 + i / 4);
+		ref.value = hash1;
+		ref.target_value = hash2;
+
+		/* 80 bytes / entry, so 3 entries per block. Yields 17
+		 */
+		/* blocks. */
+		n = reftable_writer_add_ref(w, &ref);
+		assert(n == 0);
+
+		if (!memcmp(hash1, want_hash, SHA1_SIZE) ||
+		    !memcmp(hash2, want_hash, SHA1_SIZE)) {
+			want_names[want_names_len++] = xstrdup(name);
+		}
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	reftable_writer_free(w);
+	w = NULL;
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+	if (!indexed) {
+		rd.obj_offsets.present = 0;
+	}
+
+	err = reftable_reader_seek_ref(&rd, &it, "");
+	assert_err(err);
+	reftable_iterator_destroy(&it);
+
+	err = reftable_reader_refs_for(&rd, &it, want_hash);
+	assert_err(err);
+
+	j = 0;
+	while (true) {
+		int err = reftable_iterator_next_ref(&it, &ref);
+		assert(err >= 0);
+		if (err > 0) {
+			break;
+		}
+
+		assert(j < want_names_len);
+		assert(0 == strcmp(ref.ref_name, want_names[j]));
+		j++;
+		reftable_ref_record_clear(&ref);
+	}
+	assert(j == want_names_len);
+
+	strbuf_release(&buf);
+	free_names(want_names);
+	reftable_iterator_destroy(&it);
+	reader_close(&rd);
+}
+
+static void test_table_refs_for_no_index(void)
+{
+	test_table_refs_for(false);
+}
+
+static void test_table_refs_for_obj_index(void)
+{
+	test_table_refs_for(true);
+}
+
+static void test_table_empty(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_ref_record rec = { 0 };
+	struct reftable_iterator it = { 0 };
+	int err;
+
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_close(w);
+	assert(err == REFTABLE_EMPTY_TABLE_ERROR);
+	reftable_writer_free(w);
+
+	assert(buf.len == header_size(1) + footer_size(1));
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &rec);
+	assert(err > 0);
+
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	strbuf_release(&buf);
+}
+
+int reftable_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_default_write_opts", test_default_write_opts);
+	add_test_case("test_log_write_read", test_log_write_read);
+	add_test_case("test_table_read_write_seek_linear_sha256",
+		      &test_table_read_write_seek_linear_sha256);
+	add_test_case("test_log_buffer_size", test_log_buffer_size);
+	add_test_case("test_table_write_small_table",
+		      &test_table_write_small_table);
+	add_test_case("test_buffer", &test_buffer);
+	add_test_case("test_table_read_api", &test_table_read_api);
+	add_test_case("test_table_read_write_sequential",
+		      &test_table_read_write_sequential);
+	add_test_case("test_table_read_write_seek_linear",
+		      &test_table_read_write_seek_linear);
+	add_test_case("test_table_read_write_seek_index",
+		      &test_table_read_write_seek_index);
+	add_test_case("test_table_read_write_refs_for_no_index",
+		      &test_table_refs_for_no_index);
+	add_test_case("test_table_read_write_refs_for_obj_index",
+		      &test_table_refs_for_obj_index);
+	add_test_case("test_table_empty", &test_table_empty);
+	return test_main(argc, argv);
+}
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 00000000000..fad12ae301f
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1209 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "refname.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct strbuf list_file_name = STRBUF_INIT;
+	int err = 0;
+
+	if (config.hash_id == 0) {
+		config.hash_id = SHA1_ID;
+	}
+
+	*dest = NULL;
+
+	strbuf_reset(&list_file_name);
+	strbuf_addstr(&list_file_name, dir);
+	strbuf_addstr(&list_file_name, "/tables.list");
+
+	p->list_file = strbuf_detach(&list_file_name, NULL);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload_maybe_reuse(p, true);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fd_read_lines(int fd, char ***namesp)
+{
+	off_t size = lseek(fd, 0, SEEK_END);
+	char *buf = NULL;
+	int err = 0;
+	if (size < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+	err = lseek(fd, 0, SEEK_SET);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (read(fd, buf, size) != size) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+
+done:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	int fd = open(filename, O_RDONLY, 0644);
+	int err = 0;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return REFTABLE_IO_ERROR;
+	}
+	err = fd_read_lines(fd, namesp);
+	close(fd);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged != NULL) {
+		reftable_merged_table_close(st->merged);
+		reftable_merged_table_free(st->merged);
+		st->merged = NULL;
+	}
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->merged->stack[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_tables =
+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
+	int new_tables_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+	int i;
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			struct strbuf table_path = STRBUF_INIT;
+			strbuf_addstr(&table_path, st->reftable_dir);
+			strbuf_addstr(&table_path, "/");
+			strbuf_addstr(&table_path, name);
+
+			err = reftable_block_source_from_file(&src,
+							      table_path.buf);
+			strbuf_release(&table_path);
+
+			if (err < 0)
+				goto done;
+
+			err = reftable_new_reader(&rd, &src, name);
+			if (err < 0)
+				goto done;
+		}
+
+		new_tables[new_tables_len++] = rd;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
+					st->config.hash_id);
+	if (err < 0)
+		goto done;
+
+	new_tables = NULL;
+	new_tables_len = 0;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	new_merged->suppress_deletions = true;
+	st->merged = new_merged;
+
+	for (i = 0; i < cur_len; i++) {
+		if (cur[i] != NULL) {
+			reader_close(cur[i]);
+			reftable_reader_free(cur[i]);
+		}
+	}
+
+done:
+	for (i = 0; i < new_tables_len; i++) {
+		reader_close(new_tables[i]);
+		reftable_reader_free(new_tables[i]);
+	}
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0)
+		return diff;
+
+	return udiff;
+}
+
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0)
+		return err;
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		int err2 = 0;
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != REFTABLE_NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
+		   writer. Check if there was one by checking if the name list
+		   changed.
+		*/
+		err2 = read_lines(st->list_file, &names_after);
+		if (err2 < 0) {
+			free_names(names);
+			return err2;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return err;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 1;
+		sleep_millisec(delay);
+	}
+
+	return 0;
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0)
+		return err;
+
+	for (i = 0; i < st->merged->stack_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto done;
+		}
+
+		if (strcmp(st->merged->stack[i]->name, names[i])) {
+			err = 1;
+			goto done;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto done;
+	}
+
+done:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	int err = stack_uptodate(st);
+	if (err > 0)
+		return reftable_stack_reload_maybe_reuse(st, true);
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == REFTABLE_LOCK_ERROR) {
+			/* Ignore error return, we want to propagate
+			   REFTABLE_LOCK_ERROR.
+			*/
+			reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	if (!st->disable_auto_compact)
+		return reftable_stack_auto_compact(st);
+
+	return 0;
+}
+
+static void format_name(struct strbuf *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
+	strbuf_reset(dest);
+	strbuf_addstr(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct strbuf lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+#define REFTABLE_ADDITION_INIT                \
+	{                                     \
+		.lock_file_name = STRBUF_INIT \
+	}
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	strbuf_reset(&add->lock_file_name);
+	strbuf_addstr(&add->lock_file_name, st->list_file);
+	strbuf_addstr(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(add->lock_file_name.buf,
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = REFTABLE_LOCK_ERROR;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	err = stack_uptodate(st);
+	if (err < 0)
+		goto done;
+
+	if (err > 1) {
+		err = REFTABLE_LOCK_ERROR;
+		goto done;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+done:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct strbuf nm = STRBUF_INIT;
+	for (i = 0; i < add->new_tables_len; i++) {
+		strbuf_reset(&nm);
+		strbuf_addstr(&nm, add->stack->list_file);
+		strbuf_addstr(&nm, "/");
+		strbuf_addstr(&nm, add->new_tables[i]);
+		unlink(nm.buf);
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(add->lock_file_name.buf);
+		strbuf_release(&add->lock_file_name);
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+	strbuf_release(&nm);
+}
+
+void reftable_addition_destroy(struct reftable_addition *add)
+{
+	if (add == NULL) {
+		return;
+	}
+	reftable_addition_close(add);
+	reftable_free(add);
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct strbuf table_list = STRBUF_INIT;
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0)
+		goto done;
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		strbuf_addstr(&table_list, add->stack->merged->stack[i]->name);
+		strbuf_addstr(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		strbuf_addstr(&table_list, add->new_tables[i]);
+		strbuf_addstr(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	strbuf_release(&table_list);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = rename(add->lock_file_name.buf, add->stack->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+done:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
+	*dest = reftable_calloc(sizeof(**dest));
+	**dest = empty;
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = REFTABLE_ADDITION_INIT;
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0)
+		goto done;
+	if (err > 0) {
+		err = REFTABLE_LOCK_ERROR;
+		goto done;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_addition_commit(&add);
+done:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct strbuf temp_tab_file_name = STRBUF_INIT;
+	struct strbuf tab_file_name = STRBUF_INIT;
+	struct strbuf next_name = STRBUF_INIT;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+
+	strbuf_reset(&next_name);
+	format_name(&next_name, add->next_update_index, add->next_update_index);
+
+	strbuf_addstr(&temp_tab_file_name, add->stack->reftable_dir);
+	strbuf_addstr(&temp_tab_file_name, "/");
+	strbuf_addbuf(&temp_tab_file_name, &next_name);
+	strbuf_addstr(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp(temp_tab_file_name.buf);
+	if (tab_fd < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_writer_close(wr);
+	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto done;
+	}
+	if (err < 0)
+		goto done;
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = stack_check_addition(add->stack, temp_tab_file_name.buf);
+	if (err < 0)
+		goto done;
+
+	if (wr->min_update_index < add->next_update_index) {
+		err = REFTABLE_API_ERROR;
+		goto done;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	strbuf_addstr(&next_name, ".ref");
+
+	strbuf_addstr(&tab_file_name, add->stack->reftable_dir);
+	strbuf_addstr(&tab_file_name, "/");
+	strbuf_addbuf(&tab_file_name, &next_name);
+
+	/* TODO: should check destination out of paranoia */
+	err = rename(temp_tab_file_name.buf, tab_file_name.buf);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = strbuf_detach(&next_name, NULL);
+	add->new_tables_len++;
+done:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(temp_tab_file_name.buf);
+	}
+
+	strbuf_release(&temp_tab_file_name);
+	strbuf_release(&tab_file_name);
+	strbuf_release(&next_name);
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0)
+		return reftable_reader_max_update_index(
+			       st->merged->stack[sz - 1]) +
+		       1;
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct strbuf *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct strbuf next_name = STRBUF_INIT;
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->merged->stack[first]),
+		    reftable_reader_max_update_index(st->merged->stack[first]));
+
+	strbuf_reset(temp_tab);
+	strbuf_addstr(temp_tab, st->reftable_dir);
+	strbuf_addstr(temp_tab, "/");
+	strbuf_addbuf(temp_tab, &next_name);
+	strbuf_addstr(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp(temp_tab->buf);
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0)
+		goto done;
+	err = reftable_writer_close(wr);
+	if (err < 0)
+		goto done;
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+done:
+	reftable_writer_free(wr);
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(temp_tab->buf);
+		strbuf_release(temp_tab);
+	}
+	strbuf_release(&next_name);
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_reader **subtabs = reftable_calloc(
+		sizeof(struct reftable_reader *) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	uint64_t entries = 0;
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->merged->stack[i];
+		subtabs[j++] = t;
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr,
+				   st->merged->stack[first]->min_update_index,
+				   st->merged->stack[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto done;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+	reftable_iterator_destroy(&it);
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+	reftable_log_record_clear(&log);
+	st->stats.entries_written += entries;
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct strbuf temp_tab_file_name = STRBUF_INIT;
+	struct strbuf new_table_name = STRBUF_INIT;
+	struct strbuf lock_file_name = STRBUF_INIT;
+	struct strbuf ref_list_contents = STRBUF_INIT;
+	struct strbuf new_table_path = STRBUF_INIT;
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **listp = NULL;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto done;
+	}
+
+	st->stats.attempts++;
+
+	strbuf_reset(&lock_file_name);
+	strbuf_addstr(&lock_file_name, st->list_file);
+	strbuf_addstr(&lock_file_name, ".lock");
+
+	lock_file_fd =
+		open(lock_file_name.buf, O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	/* Don't want to write to the lock for now.  */
+	close(lock_file_fd);
+	lock_file_fd = 0;
+
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0)
+		goto done;
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct strbuf subtab_file_name = STRBUF_INIT;
+		struct strbuf subtab_lock = STRBUF_INIT;
+		int sublock_file_fd = -1;
+
+		strbuf_addstr(&subtab_file_name, st->reftable_dir);
+		strbuf_addstr(&subtab_file_name, "/");
+		strbuf_addstr(&subtab_file_name,
+			      reader_name(st->merged->stack[i]));
+
+		strbuf_reset(&subtab_lock);
+		strbuf_addbuf(&subtab_lock, &subtab_file_name);
+		strbuf_addstr(&subtab_lock, ".lock");
+
+		sublock_file_fd = open(subtab_lock.buf,
+				       O_EXCL | O_CREAT | O_WRONLY, 0644);
+		if (sublock_file_fd > 0) {
+			close(sublock_file_fd);
+		} else if (sublock_file_fd < 0) {
+			if (errno == EEXIST) {
+				err = 1;
+			} else {
+				err = REFTABLE_IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = subtab_lock.buf;
+		delete_on_success[j] = subtab_file_name.buf;
+		j++;
+
+		if (err != 0)
+			goto done;
+	}
+
+	err = unlink(lock_file_name.buf);
+	if (err < 0)
+		goto done;
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0)
+		goto done;
+
+	lock_file_fd =
+		open(lock_file_name.buf, O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
+		    st->merged->stack[last]->max_update_index);
+	strbuf_addstr(&new_table_name, ".ref");
+
+	strbuf_reset(&new_table_path);
+	strbuf_addstr(&new_table_path, st->reftable_dir);
+	strbuf_addstr(&new_table_path, "/");
+	strbuf_addbuf(&new_table_path, &new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(temp_tab_file_name.buf, new_table_path.buf);
+		if (err < 0) {
+			err = REFTABLE_IO_ERROR;
+			goto done;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		strbuf_addstr(&ref_list_contents, st->merged->stack[i]->name);
+		strbuf_addstr(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		strbuf_addbuf(&ref_list_contents, &new_table_name);
+		strbuf_addstr(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		strbuf_addstr(&ref_list_contents, st->merged->stack[i]->name);
+		strbuf_addstr(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(new_table_path.buf);
+		goto done;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(new_table_path.buf);
+		goto done;
+	}
+
+	err = rename(lock_file_name.buf, st->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(new_table_path.buf);
+		goto done;
+	}
+	have_lock = false;
+
+	/* Reload the stack before deleting. On windows, we can only delete the
+	   files after we closed them.
+	*/
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+
+	listp = delete_on_success;
+	while (*listp) {
+		if (strcmp(*listp, new_table_path.buf)) {
+			unlink(*listp);
+		}
+		listp++;
+	}
+
+done:
+	free_names(delete_on_success);
+
+	listp = subtable_locks;
+	while (*listp) {
+		unlink(*listp);
+		listp++;
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(lock_file_name.buf);
+	}
+	strbuf_release(&new_table_name);
+	strbuf_release(&new_table_path);
+	strbuf_release(&ref_list_contents);
+	strbuf_release(&temp_tab_file_name);
+	strbuf_release(&lock_file_name);
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0)
+		return 0;
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+
+	if (n == 0) {
+		*seglen = 0;
+		return segs;
+	}
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
+	int overhead = header_size(version) - 1;
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		sizes[i] = st->merged->stack[i]->size - overhead;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0)
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_table tab = { NULL };
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+	return reftable_table_read_ref(&tab, refname, ref);
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err)
+		goto done;
+
+	err = reftable_iterator_next_log(&it, log);
+	if (err)
+		goto done;
+
+	if (strcmp(log->ref_name, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto done;
+	}
+
+done:
+	if (err) {
+		reftable_log_record_clear(log);
+	}
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
+{
+	int err = 0;
+	struct reftable_block_source src = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct reftable_ref_record *refs = NULL;
+	struct reftable_iterator it = { NULL };
+	int cap = 0;
+	int len = 0;
+	int i = 0;
+
+	if (st->config.skip_name_check)
+		return 0;
+
+	err = reftable_block_source_from_file(&src, new_tab_name);
+	if (err < 0)
+		goto done;
+
+	err = reftable_new_reader(&rd, &src, new_tab_name);
+	if (err < 0)
+		goto done;
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	if (err > 0) {
+		err = 0;
+		goto done;
+	}
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0)
+			goto done;
+
+		if (len >= cap) {
+			cap = 2 * cap + 1;
+			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
+		}
+
+		refs[len++] = ref;
+	}
+
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+
+	err = validate_ref_record_addition(tab, refs, len);
+
+done:
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&refs[i]);
+	}
+
+	free(refs);
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 00000000000..60bca3b3e58
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,48 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+#include "system.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+	bool disable_auto_compact;
+
+	struct reftable_write_options config;
+
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
+void reftable_addition_close(struct reftable_addition *add);
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
new file mode 100644
index 00000000000..9e3ebf939be
--- /dev/null
+++ b/reftable/stack_test.c
@@ -0,0 +1,787 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+
+#include "merged.h"
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+#include <sys/types.h>
+#include <dirent.h>
+
+static void test_read_file(void)
+{
+	char fn[256] = "/tmp/stack.test_read_file.XXXXXX";
+	int fd = mkstemp(fn);
+	char out[1024] = "line1\n\nline2\nline3";
+	int n, err;
+	char **names = NULL;
+	char *want[] = { "line1", "line2", "line3" };
+	int i = 0;
+
+	assert(fd > 0);
+	n = write(fd, out, strlen(out));
+	assert(n == strlen(out));
+	err = close(fd);
+	assert(err >= 0);
+
+	err = read_lines(fn, &names);
+	assert_err(err);
+
+	for (i = 0; names[i] != NULL; i++) {
+		assert(0 == strcmp(want[i], names[i]));
+	}
+	free_names(names);
+	remove(fn);
+}
+
+static void test_parse_names(void)
+{
+	char buf[] = "line\n";
+	char **names = NULL;
+	parse_names(buf, strlen(buf), &names);
+
+	assert(NULL != names[0]);
+	assert(0 == strcmp(names[0], "line"));
+	assert(NULL == names[1]);
+	free_names(names);
+}
+
+static void test_names_equal(void)
+{
+	char *a[] = { "a", "b", "c", NULL };
+	char *b[] = { "a", "b", "d", NULL };
+	char *c[] = { "a", "b", NULL };
+
+	assert(names_equal(a, a));
+	assert(!names_equal(a, b));
+	assert(!names_equal(a, c));
+}
+
+static int write_test_ref(struct reftable_writer *wr, void *arg)
+{
+	struct reftable_ref_record *ref = arg;
+	reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
+	return reftable_writer_add_ref(wr, ref);
+}
+
+struct write_log_arg {
+	struct reftable_log_record *log;
+	uint64_t update_index;
+};
+
+static int write_test_log(struct reftable_writer *wr, void *arg)
+{
+	struct write_log_arg *wla = arg;
+
+	reftable_writer_set_limits(wr, wla->update_index, wla->update_index);
+	return reftable_writer_add_log(wr, wla->log);
+}
+
+static void test_reftable_stack_add_one(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record ref = {
+		.ref_name = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st, ref.ref_name, &dest);
+	assert_err(err);
+	assert(0 == strcmp("master", dest.target));
+
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_uptodate(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st1 = NULL;
+	struct reftable_stack *st2 = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int err;
+	struct reftable_ref_record ref1 = {
+		.ref_name = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record ref2 = {
+		.ref_name = "branch2",
+		.update_index = 2,
+		.target = "master",
+	};
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st1, dir, cfg);
+	assert_err(err);
+
+	err = reftable_new_stack(&st2, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st1, &write_test_ref, &ref1);
+	assert_err(err);
+
+	err = reftable_stack_add(st2, &write_test_ref, &ref2);
+	assert(err == REFTABLE_LOCK_ERROR);
+
+	err = reftable_stack_reload(st2);
+	assert_err(err);
+
+	err = reftable_stack_add(st2, &write_test_ref, &ref2);
+	assert_err(err);
+	reftable_stack_destroy(st1);
+	reftable_stack_destroy(st2);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_transaction_api(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_addition *add = NULL;
+
+	struct reftable_ref_record ref = {
+		.ref_name = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	reftable_addition_destroy(add);
+
+	err = reftable_stack_new_addition(&add, st);
+	assert_err(err);
+
+	err = reftable_addition_add(add, &write_test_ref, &ref);
+	assert_err(err);
+
+	err = reftable_addition_commit(add);
+	assert_err(err);
+
+	reftable_addition_destroy(add);
+
+	err = reftable_stack_read_ref(st, ref.ref_name, &dest);
+	assert_err(err);
+	assert(0 == strcmp("master", dest.target));
+
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_validate_refname(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int i;
+	struct reftable_ref_record ref = {
+		.ref_name = "a/b",
+		.update_index = 1,
+		.target = "master",
+	};
+	char *additions[] = { "a", "a/b/c" };
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	for (i = 0; i < ARRAY_SIZE(additions); i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = additions[i],
+			.update_index = 1,
+			.target = "master",
+		};
+
+		err = reftable_stack_add(st, &write_test_ref, &ref);
+		assert(err == REFTABLE_NAME_CONFLICT);
+	}
+
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static int write_error(struct reftable_writer *wr, void *arg)
+{
+	return *((int *)arg);
+}
+
+static void test_reftable_stack_update_index_check(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record ref1 = {
+		.ref_name = "name1",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record ref2 = {
+		.ref_name = "name2",
+		.update_index = 1,
+		.target = "master",
+	};
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref1);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref2);
+	assert(err == REFTABLE_API_ERROR);
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_lock_failure(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err, i;
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+	for (i = -1; i != REFTABLE_EMPTY_TABLE_ERROR; i--) {
+		err = reftable_stack_add(st, &write_error, &i);
+		assert(err == i);
+	}
+
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_add(void)
+{
+	int i = 0;
+	int err = 0;
+	struct reftable_write_options cfg = {
+		.exact_log_message = true,
+	};
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_ref_record refs[2] = { { 0 } };
+	struct reftable_log_record logs[2] = { { 0 } };
+	int N = ARRAY_SIZE(refs);
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+	st->disable_auto_compact = true;
+
+	for (i = 0; i < N; i++) {
+		char buf[256];
+		snprintf(buf, sizeof(buf), "branch%02d", i);
+		refs[i].ref_name = xstrdup(buf);
+		refs[i].value = reftable_malloc(SHA1_SIZE);
+		refs[i].update_index = i + 1;
+		set_test_hash(refs[i].value, i);
+
+		logs[i].ref_name = xstrdup(buf);
+		logs[i].update_index = N + i + 1;
+		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+		logs[i].email = xstrdup("identity@invalid");
+		set_test_hash(logs[i].new_hash, i);
+	}
+
+	for (i = 0; i < N; i++) {
+		int err = reftable_stack_add(st, &write_test_ref, &refs[i]);
+		assert_err(err);
+	}
+
+	for (i = 0; i < N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		struct reftable_ref_record dest = { 0 };
+		int err = reftable_stack_read_ref(st, refs[i].ref_name, &dest);
+		assert_err(err);
+		assert(reftable_ref_record_equal(&dest, refs + i, SHA1_SIZE));
+		reftable_ref_record_clear(&dest);
+	}
+
+	for (i = 0; i < N; i++) {
+		struct reftable_log_record dest = { 0 };
+		int err = reftable_stack_read_log(st, refs[i].ref_name, &dest);
+		assert_err(err);
+		assert(reftable_log_record_equal(&dest, logs + i, SHA1_SIZE));
+		reftable_log_record_clear(&dest);
+	}
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i < N; i++) {
+		reftable_ref_record_clear(&refs[i]);
+		reftable_log_record_clear(&logs[i]);
+	}
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_log_normalize(void)
+{
+	int err = 0;
+	struct reftable_write_options cfg = {
+		0,
+	};
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+
+	uint8_t h1[SHA1_SIZE] = { 0x01 }, h2[SHA1_SIZE] = { 0x02 };
+
+	struct reftable_log_record input = {
+		.ref_name = "branch",
+		.update_index = 1,
+		.new_hash = h1,
+		.old_hash = h2,
+	};
+	struct reftable_log_record dest = {
+		.update_index = 0,
+	};
+	struct write_log_arg arg = {
+		.log = &input,
+		.update_index = 1,
+	};
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	input.message = "one\ntwo";
+	err = reftable_stack_add(st, &write_test_log, &arg);
+	assert(err == REFTABLE_API_ERROR);
+
+	input.message = "one";
+	err = reftable_stack_add(st, &write_test_log, &arg);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, input.ref_name, &dest);
+	assert_err(err);
+	assert(0 == strcmp(dest.message, "one\n"));
+
+	input.message = "two\n";
+	arg.update_index = 2;
+	err = reftable_stack_add(st, &write_test_log, &arg);
+	assert_err(err);
+	err = reftable_stack_read_log(st, input.ref_name, &dest);
+	assert_err(err);
+	assert(0 == strcmp(dest.message, "two\n"));
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	reftable_log_record_clear(&dest);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_tombstone(void)
+{
+	int i = 0;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record refs[2] = { { 0 } };
+	struct reftable_log_record logs[2] = { { 0 } };
+	int N = ARRAY_SIZE(refs);
+	struct reftable_ref_record dest = { 0 };
+	struct reftable_log_record log_dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		const char *buf = "branch";
+		refs[i].ref_name = xstrdup(buf);
+		refs[i].update_index = i + 1;
+		if (i % 2 == 0) {
+			refs[i].value = reftable_malloc(SHA1_SIZE);
+			set_test_hash(refs[i].value, i);
+		}
+		logs[i].ref_name = xstrdup(buf);
+		/* update_index is part of the key. */
+		logs[i].update_index = 42;
+		if (i % 2 == 0) {
+			logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+			set_test_hash(logs[i].new_hash, i);
+			logs[i].email = xstrdup("identity@invalid");
+		}
+	}
+	for (i = 0; i < N; i++) {
+		int err = reftable_stack_add(st, &write_test_ref, &refs[i]);
+		assert_err(err);
+	}
+	for (i = 0; i < N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_read_ref(st, "branch", &dest);
+	assert(err == 1);
+	reftable_ref_record_clear(&dest);
+
+	err = reftable_stack_read_log(st, "branch", &log_dest);
+	assert(err == 1);
+	reftable_log_record_clear(&log_dest);
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st, "branch", &dest);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, "branch", &log_dest);
+	assert(err == 1);
+	reftable_ref_record_clear(&dest);
+	reftable_log_record_clear(&log_dest);
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i < N; i++) {
+		reftable_ref_record_clear(&refs[i]);
+		reftable_log_record_clear(&logs[i]);
+	}
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_hash_id(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+
+	struct reftable_ref_record ref = {
+		.ref_name = "master",
+		.target = "target",
+		.update_index = 1,
+	};
+	struct reftable_write_options cfg32 = { .hash_id = SHA256_ID };
+	struct reftable_stack *st32 = NULL;
+	struct reftable_write_options cfg_default = { 0 };
+	struct reftable_stack *st_default = NULL;
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	/* can't read it with the wrong hash ID. */
+	err = reftable_new_stack(&st32, dir, cfg32);
+	assert(err == REFTABLE_FORMAT_ERROR);
+
+	/* check that we can read it back with default config too. */
+	err = reftable_new_stack(&st_default, dir, cfg_default);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st_default, "master", &dest);
+	assert_err(err);
+
+	assert(!strcmp(dest.target, ref.target));
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_stack_destroy(st_default);
+	reftable_clear_dir(dir);
+}
+
+static void test_log2(void)
+{
+	assert(1 == fastlog2(3));
+	assert(2 == fastlog2(4));
+	assert(2 == fastlog2(5));
+}
+
+static void test_sizes_to_segments(void)
+{
+	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
+	/* .................0  1  2  3  4  5 */
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(segs[2].log == 3);
+	assert(segs[2].start == 5);
+	assert(segs[2].end == 6);
+
+	assert(segs[1].log == 2);
+	assert(segs[1].start == 2);
+	assert(segs[1].end == 5);
+	reftable_free(segs);
+}
+
+static void test_sizes_to_segments_empty(void)
+{
+	uint64_t sizes[0];
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(seglen == 0);
+	reftable_free(segs);
+}
+
+static void test_sizes_to_segments_all_equal(void)
+{
+	uint64_t sizes[] = { 5, 5 };
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(seglen == 1);
+	assert(segs[0].start == 0);
+	assert(segs[0].end == 2);
+	reftable_free(segs);
+}
+
+static void test_suggest_compaction_segment(void)
+{
+	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
+	/* .................0    1    2  3   4  5  6 */
+	struct segment min =
+		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
+	assert(min.start == 2);
+	assert(min.end == 7);
+}
+
+static void test_suggest_compaction_segment_nothing(void)
+{
+	uint64_t sizes[] = { 64, 32, 16, 8, 4, 2 };
+	struct segment result =
+		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
+	assert(result.start == result.end);
+}
+
+static void test_reflog_expire(void)
+{
+	char dir[256] = "/tmp/stack.test_reflog_expire.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	struct reftable_log_record logs[20] = { { 0 } };
+	int N = ARRAY_SIZE(logs) - 1;
+	int i = 0;
+	int err;
+	struct reftable_log_expiry_config expiry = {
+		.time = 10,
+	};
+	struct reftable_log_record log = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 1; i <= N; i++) {
+		char buf[256];
+		snprintf(buf, sizeof(buf), "branch%02d", i);
+
+		logs[i].ref_name = xstrdup(buf);
+		logs[i].update_index = i;
+		logs[i].time = i;
+		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+		logs[i].email = xstrdup("identity@invalid");
+		set_test_hash(logs[i].new_hash, i);
+	}
+
+	for (i = 1; i <= N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	err = reftable_stack_compact_all(st, &expiry);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, logs[9].ref_name, &log);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, logs[11].ref_name, &log);
+	assert_err(err);
+
+	expiry.min_update_index = 15;
+	err = reftable_stack_compact_all(st, &expiry);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, logs[14].ref_name, &log);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, logs[16].ref_name, &log);
+	assert_err(err);
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i <= N; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	reftable_clear_dir(dir);
+	reftable_log_record_clear(&log);
+}
+
+static int write_nothing(struct reftable_writer *wr, void *arg)
+{
+	reftable_writer_set_limits(wr, 1, 1);
+	return 0;
+}
+
+static void test_empty_add(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_stack *st2 = NULL;
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_nothing, NULL);
+	assert_err(err);
+
+	err = reftable_new_stack(&st2, dir, cfg);
+	assert_err(err);
+	reftable_clear_dir(dir);
+	reftable_stack_destroy(st);
+	reftable_stack_destroy(st2);
+}
+
+static void test_reftable_stack_auto_compaction(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int err, i;
+	int N = 100;
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		char name[100];
+		struct reftable_ref_record ref = {
+			.ref_name = name,
+			.update_index = reftable_stack_next_update_index(st),
+			.target = "master",
+		};
+		snprintf(name, sizeof(name), "branch%04d", i);
+
+		err = reftable_stack_add(st, &write_test_ref, &ref);
+		assert_err(err);
+
+		assert(i < 3 || st->merged->stack_len < 2 * fastlog2(i));
+	}
+
+	assert(reftable_stack_compaction_stats(st)->entries_written <
+	       (uint64_t)(N * fastlog2(N)));
+
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+int stack_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_reftable_stack_uptodate",
+		      &test_reftable_stack_uptodate);
+	add_test_case("test_reftable_stack_transaction_api",
+		      &test_reftable_stack_transaction_api);
+	add_test_case("test_reftable_stack_hash_id",
+		      &test_reftable_stack_hash_id);
+	add_test_case("test_sizes_to_segments_all_equal",
+		      &test_sizes_to_segments_all_equal);
+	add_test_case("test_reftable_stack_auto_compaction",
+		      &test_reftable_stack_auto_compaction);
+	add_test_case("test_reftable_stack_validate_refname",
+		      &test_reftable_stack_validate_refname);
+	add_test_case("test_reftable_stack_update_index_check",
+		      &test_reftable_stack_update_index_check);
+	add_test_case("test_reftable_stack_lock_failure",
+		      &test_reftable_stack_lock_failure);
+	add_test_case("test_reftable_stack_log_normalize",
+		      &test_reftable_stack_log_normalize);
+	add_test_case("test_reftable_stack_tombstone",
+		      &test_reftable_stack_tombstone);
+	add_test_case("test_reftable_stack_add_one",
+		      &test_reftable_stack_add_one);
+	add_test_case("test_empty_add", test_empty_add);
+	add_test_case("test_reflog_expire", test_reflog_expire);
+	add_test_case("test_suggest_compaction_segment",
+		      &test_suggest_compaction_segment);
+	add_test_case("test_suggest_compaction_segment_nothing",
+		      &test_suggest_compaction_segment_nothing);
+	add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
+	add_test_case("test_sizes_to_segments_empty",
+		      &test_sizes_to_segments_empty);
+	add_test_case("test_log2", &test_log2);
+	add_test_case("test_parse_names", &test_parse_names);
+	add_test_case("test_read_file", &test_read_file);
+	add_test_case("test_names_equal", &test_names_equal);
+	add_test_case("test_reftable_stack_add", &test_reftable_stack_add);
+	return test_main(argc, argv);
+}
diff --git a/reftable/strbuf.c b/reftable/strbuf.c
new file mode 100644
index 00000000000..eec446fb958
--- /dev/null
+++ b/reftable/strbuf.c
@@ -0,0 +1,206 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "strbuf.h"
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+
+#ifdef REFTABLE_STANDALONE
+
+void strbuf_init(struct strbuf *s, size_t alloc)
+{
+	struct strbuf empty = STRBUF_INIT;
+	*s = empty;
+}
+
+void strbuf_grow(struct strbuf *s, size_t extra)
+{
+	size_t newcap = s->len + extra + 1;
+	if (newcap > s->cap) {
+		s->buf = reftable_realloc(s->buf, newcap);
+		s->cap = newcap;
+	}
+}
+
+static void strbuf_resize(struct strbuf *s, int l)
+{
+	int zl = l + 1; /* one uint8_t for 0 termination. */
+	assert(s->canary == STRBUF_CANARY);
+	if (s->cap < zl) {
+		int c = s->cap * 2;
+		if (c < zl) {
+			c = zl;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+	s->buf[l] = 0;
+}
+
+void strbuf_setlen(struct strbuf *s, size_t l)
+{
+	assert(s->cap >= l + 1);
+	s->len = l;
+	s->buf[l] = 0;
+}
+
+void strbuf_reset(struct strbuf *s)
+{
+	strbuf_resize(s, 0);
+}
+
+void strbuf_addstr(struct strbuf *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+	assert(d->canary == STRBUF_CANARY);
+
+	strbuf_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void strbuf_addbuf(struct strbuf *s, struct strbuf *a)
+{
+	int end = s->len;
+	assert(s->canary == STRBUF_CANARY);
+	strbuf_resize(s, s->len + a->len);
+	memcpy(s->buf + end, a->buf, a->len);
+}
+
+char *strbuf_detach(struct strbuf *s, size_t *sz)
+{
+	char *p = NULL;
+	p = (char *)s->buf;
+	if (sz)
+		*sz = s->len;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void strbuf_release(struct strbuf *s)
+{
+	assert(s->canary == STRBUF_CANARY);
+	s->cap = 0;
+	s->len = 0;
+	reftable_free(s->buf);
+	s->buf = NULL;
+}
+
+int strbuf_cmp(const struct strbuf *a, const struct strbuf *b)
+{
+	int min = a->len < b->len ? a->len : b->len;
+	int res = memcmp(a->buf, b->buf, min);
+	assert(a->canary == STRBUF_CANARY);
+	assert(b->canary == STRBUF_CANARY);
+	if (res != 0)
+		return res;
+	if (a->len < b->len)
+		return -1;
+	else if (a->len > b->len)
+		return 1;
+	else
+		return 0;
+}
+
+int strbuf_add(struct strbuf *b, const void *data, size_t sz)
+{
+	assert(b->canary == STRBUF_CANARY);
+	strbuf_grow(b, sz);
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	b->buf[b->len] = 0;
+	return sz;
+}
+
+#endif
+
+static uint64_t strbuf_size(void *b)
+{
+	return ((struct strbuf *)b)->len;
+}
+
+int strbuf_add_void(void *b, const void *data, size_t sz)
+{
+	strbuf_add((struct strbuf *)b, data, sz);
+	return sz;
+}
+
+static void strbuf_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void strbuf_close(void *b)
+{
+}
+
+static int strbuf_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			     uint32_t size)
+{
+	struct strbuf *b = (struct strbuf *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable strbuf_vtable = {
+	.size = &strbuf_size,
+	.read_block = &strbuf_read_block,
+	.return_block = &strbuf_return_block,
+	.close = &strbuf_close,
+};
+
+void block_source_from_strbuf(struct reftable_block_source *bs,
+			      struct strbuf *buf)
+{
+	assert(bs->ops == NULL);
+	bs->ops = &strbuf_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct strbuf *a, struct strbuf *b)
+{
+	int p = 0;
+	while (p < a->len && p < b->len) {
+		if (a->buf[p] != b->buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
+
+struct strbuf reftable_empty_strbuf = STRBUF_INIT;
diff --git a/reftable/strbuf.h b/reftable/strbuf.h
new file mode 100644
index 00000000000..c9a77b1fb55
--- /dev/null
+++ b/reftable/strbuf.h
@@ -0,0 +1,88 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#ifdef REFTABLE_STANDALONE
+
+#include "basics.h"
+#include "reftable.h"
+
+/*
+  Provides a bounds-checked, growable byte ranges. To use, initialize as "strbuf
+  x = STRBUF_INIT;"
+ */
+struct strbuf {
+	int len;
+	int cap;
+	char *buf;
+
+	/* Used to enforce initialization with STRBUF_INIT */
+	uint8_t canary;
+};
+#define STRBUF_CANARY 0x42
+#define STRBUF_INIT                       \
+	{                                 \
+		0, 0, NULL, STRBUF_CANARY \
+	}
+
+void strbuf_addstr(struct strbuf *dest, const char *src);
+
+/* Deallocate and clear strbuf */
+void strbuf_release(struct strbuf *strbuf);
+
+/* Set strbuf to 0 length, but retain buffer. */
+void strbuf_reset(struct strbuf *strbuf);
+
+/* Initializes a strbuf. Accepts a strbuf with random garbage. */
+void strbuf_init(struct strbuf *strbuf, size_t alloc);
+
+/* Return `buf`, clearing out `s`. Optionally return len (not cap) in `sz`.  */
+char *strbuf_detach(struct strbuf *s, size_t *sz);
+
+/* Set length of the slace to `l`, but don't reallocated. */
+void strbuf_setlen(struct strbuf *s, size_t l);
+
+/* Ensure `l` bytes beyond current length are available */
+void strbuf_grow(struct strbuf *s, size_t l);
+
+/* Signed comparison */
+int strbuf_cmp(const struct strbuf *a, const struct strbuf *b);
+
+/* Append `data` to the `dest` strbuf.  */
+int strbuf_add(struct strbuf *dest, const void *data, size_t sz);
+
+/* Append `add` to `dest. */
+void strbuf_addbuf(struct strbuf *dest, struct strbuf *add);
+
+#else
+
+#include "../git-compat-util.h"
+#include "../strbuf.h"
+
+#endif
+
+extern struct strbuf reftable_empty_strbuf;
+
+/* Like strbuf_add, but suitable for passing to reftable_new_writer
+ */
+int strbuf_add_void(void *b, const void *data, size_t sz);
+
+/* Find the longest shared prefix size of `a` and `b` */
+int common_prefix_size(struct strbuf *a, struct strbuf *b);
+
+struct reftable_block_source;
+
+/* Create an in-memory block source for reading reftables */
+void block_source_from_strbuf(struct reftable_block_source *bs,
+			      struct strbuf *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/strbuf_test.c b/reftable/strbuf_test.c
new file mode 100644
index 00000000000..862ed867916
--- /dev/null
+++ b/reftable/strbuf_test.c
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "strbuf.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_strbuf(void)
+{
+	struct strbuf s = STRBUF_INIT;
+	struct strbuf t = STRBUF_INIT;
+
+	strbuf_addstr(&s, "abc");
+	assert(0 == strcmp("abc", s.buf));
+
+	strbuf_addstr(&t, "pqr");
+	strbuf_addbuf(&s, &t);
+	assert(0 == strcmp("abcpqr", s.buf));
+
+	strbuf_release(&s);
+	strbuf_release(&t);
+}
+
+int strbuf_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_strbuf", &test_strbuf);
+	return test_main(argc, argv);
+}
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 00000000000..6629b1104cf
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#ifndef REFTABLE_STANDALONE
+
+#include "git-compat-util.h"
+#include "cache.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_STANDALONE */
+
+void reftable_clear_dir(const char *dirname);
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/test_framework.c b/reftable/test_framework.c
new file mode 100644
index 00000000000..f97c2d6df8b
--- /dev/null
+++ b/reftable/test_framework.c
@@ -0,0 +1,69 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "test_framework.h"
+
+#include "system.h"
+#include "basics.h"
+#include "constants.h"
+
+struct test_case **test_cases;
+int test_case_len;
+int test_case_cap;
+
+struct test_case *new_test_case(const char *name, void (*testfunc)(void))
+{
+	struct test_case *tc = reftable_malloc(sizeof(struct test_case));
+	tc->name = name;
+	tc->testfunc = testfunc;
+	return tc;
+}
+
+struct test_case *add_test_case(const char *name, void (*testfunc)(void))
+{
+	struct test_case *tc = new_test_case(name, testfunc);
+	if (test_case_len == test_case_cap) {
+		test_case_cap = 2 * test_case_cap + 1;
+		test_cases = reftable_realloc(
+			test_cases, sizeof(struct test_case) * test_case_cap);
+	}
+
+	test_cases[test_case_len++] = tc;
+	return tc;
+}
+
+int test_main(int argc, const char *argv[])
+{
+	const char *filter = NULL;
+	int i = 0;
+	if (argc > 1) {
+		filter = argv[1];
+	}
+
+	for (i = 0; i < test_case_len; i++) {
+		const char *name = test_cases[i]->name;
+		if (filter == NULL || strstr(name, filter) != NULL) {
+			printf("case %s\n", name);
+			test_cases[i]->testfunc();
+		} else {
+			printf("skip %s\n", name);
+		}
+
+		reftable_free(test_cases[i]);
+	}
+	reftable_free(test_cases);
+	test_cases = 0;
+	test_case_len = 0;
+	test_case_cap = 0;
+	return 0;
+}
+
+void set_test_hash(uint8_t *p, int i)
+{
+	memset(p, (uint8_t)i, hash_size(SHA1_ID));
+}
diff --git a/reftable/test_framework.h b/reftable/test_framework.h
new file mode 100644
index 00000000000..6a0009c1e54
--- /dev/null
+++ b/reftable/test_framework.h
@@ -0,0 +1,64 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TEST_FRAMEWORK_H
+#define TEST_FRAMEWORK_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#ifdef NDEBUG
+#undef NDEBUG
+#endif
+
+#include "system.h"
+
+#ifdef assert
+#undef assert
+#endif
+
+#define assert_err(c)                                                  \
+	if (c != 0) {                                                  \
+		fflush(stderr);                                        \
+		fflush(stdout);                                        \
+		fprintf(stderr, "%s: %d: error == %d (%s), want 0\n",  \
+			__FILE__, __LINE__, c, reftable_error_str(c)); \
+		abort();                                               \
+	}
+
+#define assert_streq(a, b)                                               \
+	if (strcmp(a, b)) {                                              \
+		fflush(stderr);                                          \
+		fflush(stdout);                                          \
+		fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, \
+			__LINE__, #a, a, #b, b);                         \
+		abort();                                                 \
+	}
+
+#define assert(c)                                                          \
+	if (!(c)) {                                                        \
+		fflush(stderr);                                            \
+		fflush(stdout);                                            \
+		fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, \
+			__LINE__, #c);                                     \
+		abort();                                                   \
+	}
+
+struct test_case {
+	const char *name;
+	void (*testfunc)(void);
+};
+
+struct test_case *new_test_case(const char *name, void (*testfunc)(void));
+struct test_case *add_test_case(const char *name, void (*testfunc)(void));
+int test_main(int argc, const char *argv[]);
+
+void set_test_hash(uint8_t *p, int i);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 00000000000..0061d14e306
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,63 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	int res;
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	res = compare(key, (*rootp)->key);
+	if (res < 0)
+		return tree_search(key, &(*rootp)->left, compare, insert);
+	else if (res > 0)
+		return tree_search(key, &(*rootp)->right, compare, insert);
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 00000000000..954512e9a3a
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+/* tree_node is a generic binary search tree. */
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+/* looks for `key` in `rootp` using `compare` as comparison function. If insert
+   is set, insert the key if it's not found. Else, return NULL.
+*/
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+
+/* performs an infix walk of the tree. */
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+
+/*
+  deallocates the tree nodes recursively. Keys should be deallocated separately
+  by walking over the tree. */
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/tree_test.c b/reftable/tree_test.c
new file mode 100644
index 00000000000..c6d448cbe8a
--- /dev/null
+++ b/reftable/tree_test.c
@@ -0,0 +1,62 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static int test_compare(const void *a, const void *b)
+{
+	return (char *)a - (char *)b;
+}
+
+struct curry {
+	void *last;
+};
+
+static void check_increasing(void *arg, void *key)
+{
+	struct curry *c = (struct curry *)arg;
+	if (c->last != NULL) {
+		assert(test_compare(c->last, key) < 0);
+	}
+	c->last = key;
+}
+
+static void test_tree(void)
+{
+	struct tree_node *root = NULL;
+
+	void *values[11] = { 0 };
+	struct tree_node *nodes[11] = { 0 };
+	int i = 1;
+	struct curry c = { 0 };
+	do {
+		nodes[i] = tree_search(values + i, &root, &test_compare, 1);
+		i = (i * 7) % 11;
+	} while (i != 1);
+
+	for (i = 1; i < ARRAY_SIZE(nodes); i++) {
+		assert(values + i == nodes[i]->key);
+		assert(nodes[i] ==
+		       tree_search(values + i, &root, &test_compare, 0));
+	}
+
+	infix_walk(root, check_increasing, &c);
+	tree_free(root);
+}
+
+int tree_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_tree", &test_tree);
+	return test_main(argc, argv);
+}
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100755
index 00000000000..4dadd875f4a
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,22 @@
+#!/bin/sh
+
+set -eu
+
+# Override this to import from somewhere else, say "../reftable".
+SRC=${SRC:-origin}
+BRANCH=${BRANCH:-master}
+
+((git --git-dir reftable-repo/.git fetch -f ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
+   git clone https://github.com/google/reftable reftable-repo)
+
+cp reftable-repo/c/*.[ch] reftable/
+cp reftable-repo/c/include/*.[ch] reftable/
+cp reftable-repo/LICENSE reftable/
+
+git --git-dir reftable-repo/.git show --no-patch --format=oneline HEAD \
+  > reftable/VERSION
+
+mv reftable/system.h reftable/system.h~
+sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
+
+git add reftable/*.[ch] reftable/LICENSE reftable/VERSION
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 00000000000..670009fac62
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,662 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, uint8_t typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, uint8_t *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		uint8_t *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0)
+			return n;
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0)
+		return n;
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, uint8_t *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, uint8_t typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	strbuf_release(&w->last_key);
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, const void *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	strbuf_init(&wp->block_writer_data.last_key, 0);
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->last_key = reftable_empty_strbuf;
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct strbuf hash;
+	uint64_t *offsets;
+	int offset_len;
+	int offset_cap;
+};
+#define OBJ_INDEX_TREE_NODE_INIT    \
+	{                           \
+		.hash = STRBUF_INIT \
+	}
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return strbuf_cmp(&((const struct obj_index_tree_node *)a)->hash,
+			  &((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct strbuf *hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = *hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		struct obj_index_tree_node empty = OBJ_INDEX_TREE_NODE_INIT;
+		key = reftable_malloc(sizeof(struct obj_index_tree_node));
+		*key = empty;
+
+		strbuf_reset(&key->hash);
+		strbuf_addbuf(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w,
+			     struct reftable_record *rec)
+{
+	int result = -1;
+	struct strbuf key = STRBUF_INIT;
+	int err = 0;
+	reftable_record_key(rec, &key);
+	if (strbuf_cmp(&w->last_key, &key) >= 0)
+		goto done;
+
+	strbuf_reset(&w->last_key);
+	strbuf_addbuf(&w->last_key, &key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, reftable_record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == reftable_record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto done;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto done;
+	}
+
+	writer_reinit_block_writer(w, reftable_record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto done;
+	}
+
+	result = 0;
+done:
+	strbuf_release(&key);
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->ref_name == NULL)
+		return REFTABLE_API_ERROR;
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index)
+		return REFTABLE_API_ERROR;
+
+	reftable_record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, &rec);
+	if (err < 0)
+		return err;
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct strbuf h = STRBUF_INIT;
+		strbuf_add(&h, (char *)ref->value, hash_size(w->opts.hash_id));
+		writer_index_hash(w, &h);
+		strbuf_release(&h);
+	}
+
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct strbuf h = STRBUF_INIT;
+		strbuf_add(&h, ref->target_value, hash_size(w->opts.hash_id));
+		writer_index_hash(w, &h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	char *input_log_message = log->message;
+	struct strbuf cleaned_message = STRBUF_INIT;
+	int err;
+	if (log->ref_name == NULL)
+		return REFTABLE_API_ERROR;
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0)
+			return err;
+	}
+
+	if (!w->opts.exact_log_message && log->message != NULL) {
+		strbuf_addstr(&cleaned_message, log->message);
+		while (cleaned_message.len &&
+		       cleaned_message.buf[cleaned_message.len - 1] == '\n')
+			strbuf_setlen(&cleaned_message,
+				      cleaned_message.len - 1);
+		if (strchr(cleaned_message.buf, '\n')) {
+			// multiple lines not allowed.
+			err = REFTABLE_API_ERROR;
+			goto done;
+		}
+		strbuf_addstr(&cleaned_message, "\n");
+		log->message = cleaned_message.buf;
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	reftable_record_from_log(&rec, log);
+	err = writer_add_record(w, &rec);
+
+done:
+	log->message = input_log_message;
+	strbuf_release(&cleaned_message);
+	return err;
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	uint8_t typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	struct reftable_block_stats *bstats = NULL;
+	if (err < 0)
+		return err;
+
+	while (w->index_len > threshold) {
+		struct reftable_index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct reftable_record rec = { 0 };
+			reftable_record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, &rec) == 0) {
+				continue;
+			}
+
+			err = writer_flush_block(w);
+			if (err < 0)
+				return err;
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, &rec);
+			if (err != 0) {
+				/* write into fresh block should always succeed
+				 */
+				abort();
+			}
+		}
+		for (i = 0; i < idx_len; i++) {
+			strbuf_release(&idx[i].last_key);
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0)
+		return err;
+
+	bstats = writer_reftable_block_stats(w, typ);
+	bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
+	bstats->index_offset = index_start;
+	bstats->max_index_level = max_level;
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct strbuf *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(&entry->hash, arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct reftable_obj_record obj_rec = {
+		.hash_prefix = (uint8_t *)entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct reftable_record rec = { 0 };
+	if (arg->err < 0)
+		goto done;
+
+	reftable_record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0)
+		goto done;
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0)
+		goto done;
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0)
+		goto done;
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+done:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	strbuf_release(&entry->hash);
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0)
+		return closure.err;
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	uint8_t typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL)
+		return 0;
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0)
+		return err;
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0)
+			return err;
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	uint8_t footer[72];
+	uint8_t *p = footer;
+	int err = writer_finish_public_section(w);
+	int empty_table = w->next == 0;
+	if (err != 0)
+		goto done;
+	w->pending_padding = 0;
+	if (empty_table) {
+		/* Empty tables need a header anyway. */
+		uint8_t header[28];
+		int n = writer_write_header(w, header);
+		err = padded_write(w, header, n, 0);
+		if (err < 0)
+			goto done;
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0)
+		goto done;
+
+	if (empty_table) {
+		err = REFTABLE_EMPTY_TABLE_ERROR;
+		goto done;
+	}
+
+done:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	strbuf_release(&w->last_key);
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		strbuf_release(&w->index[i].last_key);
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	uint8_t typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	struct reftable_index_record ir = { .last_key = STRBUF_INIT };
+	if (raw_bytes < 0)
+		return raw_bytes;
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0)
+		return err;
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index,
+			sizeof(struct reftable_index_record) * w->index_cap);
+	}
+
+	ir.offset = w->next;
+	strbuf_reset(&ir.last_key);
+	strbuf_addbuf(&ir.last_key, &w->block_writer->last_key);
+	w->index[w->index_len] = ir;
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL)
+		return 0;
+	if (w->block_writer->entries == 0)
+		return 0;
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 00000000000..4b722c3f74c
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "strbuf.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, const void *, size_t);
+	void *write_arg;
+	int pending_padding;
+	struct strbuf last_key;
+
+	/* offset of next block to write. */
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	/* memory buffer for writing */
+	uint8_t *block;
+
+	/* writer for the current section. NULL or points to
+	 * block_writer_data */
+	struct block_writer *block_writer;
+
+	struct block_writer block_writer_data;
+
+	/* pending index records for the current section */
+	struct reftable_index_record *index;
+	int index_len;
+	int index_cap;
+
+	/*
+	  tree for use with tsearch; used to populate the 'o' inverse OID
+	  map */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+/* finishes a block, and writes it to storage */
+int writer_flush_block(struct reftable_writer *w);
+
+/* deallocates memory related to the index */
+void writer_clear_index(struct reftable_writer *w);
+
+/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 00000000000..3e0b0f24f1c
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 12/19] Add standalone build infrastructure for reftable
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (10 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 11/19] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 13/19] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (7 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The reftable library is integrates with Git, so it become a first class
supported storage mechanism, but is sufficiently complex that other
projects (e.g. libgit2) may want to consume the same source code.

To make this possible, the library also compiles completely standalone, which is
demonstrated with the Bazel BUILD/WORKSPACE files that this commit adds.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/BUILD      | 203 ++++++++++++++++++++++++++++++++++++++++++++
 reftable/README.md  |  33 +++++++
 reftable/WORKSPACE  |  14 +++
 reftable/zlib.BUILD |  36 ++++++++
 4 files changed, 286 insertions(+)
 create mode 100644 reftable/BUILD
 create mode 100644 reftable/README.md
 create mode 100644 reftable/WORKSPACE
 create mode 100644 reftable/zlib.BUILD

diff --git a/reftable/BUILD b/reftable/BUILD
new file mode 100644
index 00000000000..d3946a717c0
--- /dev/null
+++ b/reftable/BUILD
@@ -0,0 +1,203 @@
+# Mirror core-git COPTS so reftable compiles without warning in CGit.
+GIT_COPTS = [
+    "-DREFTABLE_STANDALONE",
+    "-Wall",
+    "-Werror",
+    "-Wdeclaration-after-statement",
+    "-Wstrict-prototypes",
+    "-Wformat-security",
+    "-Wno-format-zero-length",
+    "-Wold-style-definition",
+    "-Woverflow",
+    "-Wpointer-arith",
+    "-Wstrict-prototypes",
+    "-Wunused",
+    "-Wvla",
+    "-Wextra",
+    "-Wmissing-prototypes",
+    "-Wno-empty-body",
+    "-Wno-missing-field-initializers",
+    "-Wno-sign-compare",
+    "-Werror=strict-aliasing",
+    "-Wno-unused-parameter",
+]
+
+cc_library(
+    name = "reftable",
+    srcs = [
+        "basics.c",
+        "block.c",
+        "compat.c",
+        "file.c",
+        "iter.c",
+        "merged.c",
+        "pq.c",
+        "reader.c",
+        "record.c",
+        "refname.c",
+        "reftable.c",
+        "strbuf.c",
+        "stack.c",
+        "tree.c",
+        "writer.c",
+        "zlib-compat.c",
+        "basics.h",
+        "block.h",
+        "compat.h",
+        "constants.h",
+        "iter.h",
+        "merged.h",
+        "pq.h",
+        "reader.h",
+        "refname.h",
+        "record.h",
+        "strbuf.h",
+        "stack.h",
+        "system.h",
+        "tree.h",
+        "writer.h",
+    ],
+    hdrs = [
+        "reftable.h",
+    ],
+    includes = [
+        "include",
+    ],
+    copts = [
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+    deps = ["@zlib"],
+    visibility = ["//visibility:public"]
+)
+
+cc_library(
+    name = "testlib",
+    srcs = [
+        "test_framework.c",
+        "dump.c",
+	"test_framework.h",
+    ],
+    hdrs = [
+            "reftable-tests.h",
+    ],
+    copts = GIT_COPTS,
+    deps = [":reftable"],
+    visibility = ["//visibility:public"]
+)
+
+cc_test(
+    name = "record_test",
+    srcs = ["record_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Drecord_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "reftable_test",
+    srcs = ["reftable_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dreftable_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "strbuf_test",
+    srcs = ["strbuf_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ] ,
+    copts = [
+        "-Dstrbuf_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "stack_test",
+    srcs = ["stack_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dstack_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "tree_test",
+    srcs = ["tree_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dtree_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "block_test",
+    srcs = ["block_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dblock_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "refname_test",
+    srcs = ["refname_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Drefname_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "merged_test",
+    srcs = ["merged_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dmerged_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+[sh_test(
+    name = "%s_valgrind_test" % t,
+    srcs = [ "valgrind_test.sh" ],
+    args = [ t ],
+    data = [ t ])
+ for t in ["record_test",
+           "merged_test",
+           "refname_test",
+           "tree_test",
+           "block_test",
+           "strbuf_test",
+           "stack_test"]]
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 00000000000..0345015c575
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,33 @@
+This directory contains an implementation of the reftable library.
+
+The library is integrated into the git-core project, but can also be built
+standalone, by compiling with -DREFTABLE_STANDALONE. The standalone build is
+exercised by the accompanying BUILD/WORKSPACE files, and can be run as
+
+  bazel test :all
+
+It includes a fragment of the zlib library to provide uncompress2(), which is a
+recent addition to the API. zlib is licensed as follows:
+
+```
+ (C) 1995-2017 Jean-loup Gailly and Mark Adler
+
+  This software is provided 'as-is', without any express or implied
+  warranty.  In no event will the authors be held liable for any damages
+  arising from the use of this software.
+
+  Permission is granted to anyone to use this software for any purpose,
+  including commercial applications, and to alter it and redistribute it
+  freely, subject to the following restrictions:
+
+  1. The origin of this software must not be misrepresented; you must not
+     claim that you wrote the original software. If you use this software
+     in a product, an acknowledgment in the product documentation would be
+     appreciated but is not required.
+  2. Altered source versions must be plainly marked as such, and must not be
+     misrepresented as being the original software.
+  3. This notice may not be removed or altered from any source distribution.
+
+  Jean-loup Gailly        Mark Adler
+  jloup@gzip.org          madler@alumni.caltech.edu
+```
diff --git a/reftable/WORKSPACE b/reftable/WORKSPACE
new file mode 100644
index 00000000000..9f91d16208a
--- /dev/null
+++ b/reftable/WORKSPACE
@@ -0,0 +1,14 @@
+workspace(name = "reftable")
+
+load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
+
+http_archive(
+    name = "zlib",
+    build_file = "@//:zlib.BUILD",
+    sha256 = "c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1",
+    strip_prefix = "zlib-1.2.11",
+    urls = [
+        "https://mirror.bazel.build/zlib.net/zlib-1.2.11.tar.gz",
+        "https://zlib.net/zlib-1.2.11.tar.gz",
+    ],
+)
diff --git a/reftable/zlib.BUILD b/reftable/zlib.BUILD
new file mode 100644
index 00000000000..edb77fdf8ee
--- /dev/null
+++ b/reftable/zlib.BUILD
@@ -0,0 +1,36 @@
+package(default_visibility = ["//visibility:public"])
+
+licenses(["notice"])  # BSD/MIT-like license (for zlib)
+
+cc_library(
+    name = "zlib",
+    srcs = [
+        "adler32.c",
+        "compress.c",
+        "crc32.c",
+        "crc32.h",
+        "deflate.c",
+        "deflate.h",
+        "gzclose.c",
+        "gzguts.h",
+        "gzlib.c",
+        "gzread.c",
+        "gzwrite.c",
+        "infback.c",
+        "inffast.c",
+        "inffast.h",
+        "inffixed.h",
+        "inflate.c",
+        "inflate.h",
+        "inftrees.c",
+        "inftrees.h",
+        "trees.c",
+        "trees.h",
+        "uncompr.c",
+        "zconf.h",
+        "zutil.c",
+        "zutil.h",
+    ],
+    hdrs = ["zlib.h"],
+    includes = ["."],
+)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 13/19] Reftable support for git-core
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (11 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 12/19] Add standalone build infrastructure for reftable Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 14/19] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
                                                     ` (6 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

This introduces the refs/reftable-backend.c containing reftable powered ref
storage backend.

It can be activated by passing --ref-storage=reftable to "git init".

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   28 +-
 builtin/clone.c                               |    3 +-
 builtin/init-db.c                             |   56 +-
 cache.h                                       |    6 +-
 refs.c                                        |   27 +-
 refs.h                                        |    3 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1335 +++++++++++++++++
 reftable/update.sh                            |    3 -
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |  155 ++
 14 files changed, 1608 insertions(+), 33 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ffd..72576235833 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 372139f1f24..6571ed12468 100644
--- a/Makefile
+++ b/Makefile
@@ -807,6 +807,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -959,6 +960,7 @@ LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1162,7 +1164,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2352,11 +2354,30 @@ VCSSVN_OBJS += vcs-svn/sliding_window.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/compat.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/refname.o
+REFTABLE_OBJS += reftable/reftable.o
+REFTABLE_OBJS += reftable/strbuf.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2497,6 +2518,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -3112,7 +3136,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index 2a8e3aaaed3..3deb24a7040 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1111,7 +1111,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		}
 	}
 
-	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
+		default_ref_storage(), INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 0b7222e7188..da5b4670c84 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +189,8 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 ||
+	    !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -238,6 +240,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -247,6 +250,24 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
+	/*
+	 * refs/heads is a file when using reftable. We can't reinitialize with
+	 * a reftable because it will overwrite HEAD
+	 */
+	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
+			      is_directory(git_path_buf(&buf, "refs/heads"))) {
+		die("cannot switch ref storage format.");
+	}
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -261,15 +282,12 @@ static int create_default_files(const char *template_path,
 	 * Create the default symlink from ".git/HEAD" to the "master"
 	 * branch, if it does not exist yet.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
 			exit(1);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
@@ -383,7 +401,8 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 }
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash, unsigned int flags)
+	    const char *template_dir, int hash, const char *ref_storage_format,
+	    unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -422,6 +441,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+	repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -450,6 +470,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -523,6 +545,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = default_ref_storage();
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -530,15 +553,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *object_format = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
@@ -648,9 +674,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
-	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
+	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
+		       ref_storage_format, flags);
 }
diff --git a/cache.h b/cache.h
index e5885cc9ead..33f1b81ba99 100644
--- a/cache.h
+++ b/cache.h
@@ -628,8 +628,9 @@ int path_inside_repo(const char *prefix, const char *path);
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash_algo,
-	    unsigned int flags);
-void initialize_repository_version(int hash_algo);
+	    const char *ref_storage_format, unsigned int flags);
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1044,6 +1045,7 @@ struct repository_format {
 	int hash_algo;
 	int has_extensions;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 29e710a29e9..c4b704aa177 100644
--- a/refs.c
+++ b/refs.c
@@ -17,10 +17,16 @@
 #include "argv-array.h"
 #include "repository.h"
 
+const char *default_ref_storage(void)
+{
+	const char *test = getenv("GIT_TEST_REFTABLE");
+	return test ? "reftable" : "files";
+}
+
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1717,13 +1723,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1739,7 +1745,11 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(r->gitdir,
+					 r->ref_storage_format ?
+						 r->ref_storage_format :
+						 default_ref_storage(),
+					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
 
@@ -1794,7 +1804,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1808,6 +1818,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = default_ref_storage();
 	struct ref_store *refs;
 	const char *id;
 
@@ -1821,9 +1832,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index 7aaa1226551..ddcc940d5c7 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* Returns the ref storage backend to use by default. */
+const char *default_ref_storage(void);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index dc9e8d3a92b..7afe4c28310 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -693,6 +693,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 00000000000..5d3d6bb7ea6
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1335 @@
+#include "../cache.h"
+#include "../config.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../chdir-notify.h"
+
+#include "../reftable/reftable.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *repo_dir;
+	char *reftable_dir;
+	struct reftable_stack *stack;
+};
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type);
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->ref_name = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = { NULL };
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						       unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+	refs->repo_dir = xstrdup(path);
+	strbuf_addf(&sb, "%s/reftable", path);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
+	assert(refs->err != REFTABLE_API_ERROR);
+	strbuf_release(&sb);
+	return ref_store;
+}
+
+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct strbuf sb = STRBUF_INIT;
+
+	safe_create_dir(refs->reftable_dir, 1);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
+	write_file(sb.buf, "ref: refs/heads/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+	unsigned int flags;
+	int err;
+	const char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(&ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		/*
+		   We could filter pseudo refs here explicitly, but HEAD is not
+		   a PSEUDOREF, but a PER_WORKTREE, b/c each worktree can have
+		   its own HEAD.
+		 */
+		ri->base.refname = ri->ref.ref_name;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.ref_name,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+	struct reftable_merged_table *mt = NULL;
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else {
+		mt = reftable_stack_merged_table(refs->stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->prefix = prefix;
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int fixup_symrefs(struct ref_store *ref_store,
+			 struct ref_transaction *transaction)
+{
+	struct strbuf referent = STRBUF_INIT;
+	int i = 0;
+	int err = 0;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+		struct object_id old_oid;
+
+		err = reftable_read_raw_ref(ref_store, update->refname,
+					    &old_oid, &referent,
+					    /* mutate input, like
+					       files-backend.c */
+					    &update->type);
+		if (err < 0 && errno == ENOENT &&
+		    is_null_oid(&update->old_oid)) {
+			err = 0;
+		}
+		if (err < 0)
+			goto done;
+
+		if (!(update->type & REF_ISSYMREF))
+			continue;
+
+		if (update->flags & REF_NO_DEREF) {
+			/* what should happen here? See files-backend.c
+			 * lock_ref_for_update. */
+		} else {
+			/*
+			  If we are updating a symref (eg. HEAD), we should also
+			  update the branch that the symref points to.
+
+			  This is generic functionality, and would be better
+			  done in refs.c, but the current implementation is
+			  intertwined with the locking in files-backend.c.
+			*/
+			int new_flags = update->flags;
+			struct ref_update *new_update = NULL;
+
+			/* if this is an update for HEAD, should also record a
+			   log entry for HEAD? See files-backend.c,
+			   split_head_update()
+			*/
+			new_update = ref_transaction_add_update(
+				transaction, referent.buf, new_flags,
+				&update->new_oid, &update->old_oid,
+				update->msg);
+			new_update->parent_update = update;
+
+			/* files-backend sets REF_LOG_ONLY here. */
+			update->flags |= REF_NO_DEREF | REF_LOG_ONLY;
+			update->flags &= ~REF_HAVE_OLD;
+		}
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	strbuf_release(&referent);
+	return err;
+}
+
+static int reftable_transaction_prepare(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_addition *add = NULL;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_new_addition(&add, refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = fixup_symrefs(ref_store, transaction);
+	if (err) {
+		goto done;
+	}
+
+	transaction->backend_data = add;
+	transaction->state = REF_TRANSACTION_PREPARED;
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	if (err < 0) {
+		transaction->state = REF_TRANSACTION_CLOSED;
+		strbuf_addf(errbuf, "reftable: transaction prepare: %s",
+			    reftable_error_str(err));
+	}
+
+	return err;
+}
+
+static int reftable_transaction_abort(struct ref_store *ref_store,
+				      struct ref_transaction *transaction,
+				      struct strbuf *err)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	reftable_addition_destroy(add);
+	transaction->backend_data = NULL;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid;
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp((*(struct ref_update **)a)->refname,
+		      (*(struct ref_update **)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	uint64_t ts = reftable_stack_next_update_index(refs->stack);
+	int err = 0;
+	int i = 0;
+	struct reftable_log_record *logs =
+		calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->ref_name = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_LOG_ONLY) {
+			continue;
+		}
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct reftable_ref_record ref = { NULL };
+			struct object_id peeled;
+
+			int peel_error = peel_object(&u->new_oid, &peeled);
+			ref.ref_name = (char *)u->refname;
+
+			if (!is_null_oid(&u->new_oid)) {
+				ref.value = u->new_oid.hash;
+			}
+			ref.update_index = ts;
+			if (!peel_error) {
+				ref.target_value = peeled.hash;
+			}
+
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int reftable_transaction_finish(struct ref_store *ref_store,
+				       struct ref_transaction *transaction,
+				       struct strbuf *errmsg)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	int err = 0;
+	int i;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+
+	err = reftable_addition_add(add, &write_transaction_table, transaction);
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_addition_commit(add);
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_addition_destroy(add);
+	transaction->state = REF_TRANSACTION_CLOSED;
+	transaction->backend_data = NULL;
+	if (err) {
+		strbuf_addf(errmsg, "reftable: transaction failure: %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+	return err;
+}
+
+static int
+reftable_transaction_initial_commit(struct ref_store *ref_store,
+				    struct ref_transaction *transaction,
+				    struct strbuf *errmsg)
+{
+	int err = reftable_transaction_prepare(ref_store, transaction, errmsg);
+	if (err)
+		return err;
+
+	return reftable_transaction_finish(ref_store, transaction, errmsg);
+}
+
+struct write_pseudoref_arg {
+	struct reftable_stack *stack;
+	const char *pseudoref;
+	const struct object_id *new_oid;
+	const struct object_id *old_oid;
+};
+
+static int write_pseudoref_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_pseudoref_arg *arg = (struct write_pseudoref_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	struct reftable_ref_record read_ref = { NULL };
+	struct reftable_ref_record write_ref = { NULL };
+
+	reftable_writer_set_limits(writer, ts, ts);
+	if (arg->old_oid) {
+		struct object_id read_oid;
+		err = reftable_stack_read_ref(arg->stack, arg->pseudoref,
+					      &read_ref);
+		if (err < 0)
+			goto done;
+
+		if ((err > 0) != is_null_oid(arg->old_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		/* XXX If old_oid is set, and we have a symref? */
+
+		if (err == 0 && read_ref.value == NULL) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		hashcpy(read_oid.hash, read_ref.value);
+		if (!oideq(arg->old_oid, &read_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+	}
+
+	write_ref.ref_name = (char *)arg->pseudoref;
+	write_ref.update_index = ts;
+	if (!is_null_oid(arg->new_oid))
+		write_ref.value = (uint8_t *)arg->new_oid->hash;
+
+	err = reftable_writer_add_ref(writer, &write_ref);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&read_ref);
+	return err;
+}
+
+static int reftable_write_pseudoref(struct ref_store *ref_store,
+				    const char *pseudoref,
+				    const struct object_id *oid,
+				    const struct object_id *old_oid,
+				    struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_pseudoref_arg arg = {
+		.stack = refs->stack,
+		.pseudoref = pseudoref,
+		.new_oid = oid,
+	};
+	struct reftable_addition *add = NULL;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_new_addition(&add, refs->stack);
+	if (err) {
+		goto done;
+	}
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		/* XXX this is cut & paste from files-backend - should factor
+		 * out? */
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(errbuf,
+					    _("could not read ref '%s'"),
+					    pseudoref);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(errbuf, _("ref '%s' already exists"),
+				    pseudoref);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(errbuf,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			goto done;
+		}
+	}
+
+	err = reftable_addition_add(add, &write_pseudoref_table, &arg);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref update failure: %s",
+			    reftable_error_str(err));
+	}
+
+	err = reftable_addition_commit(add);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref commit failure: %s",
+			    reftable_error_str(err));
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_addition_destroy(add);
+	return err;
+}
+
+static int reftable_delete_pseudoref(struct ref_store *ref_store,
+				     const char *pseudoref,
+				     const struct object_id *old_oid)
+{
+	struct strbuf errbuf = STRBUF_INIT;
+	int ret = reftable_write_pseudoref(ref_store, pseudoref, &null_oid,
+					   old_oid, &errbuf);
+	/* XXX what to do with the error message? */
+	strbuf_release(&errbuf);
+	return ret;
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	int i = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.ref_name = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = {
+			.update_index = ts,
+		};
+		struct reftable_ref_record current = { NULL };
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.ref_name = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.ref_name,
+					    &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
+				struct string_list *refnames,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_delete_refs_arg arg = {
+		.stack = refs->stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	string_list_sort(refnames);
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	return reftable_stack_compact_all(refs->stack, NULL);
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.ref_name = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err == 0) {
+		struct reftable_log_record log = { NULL };
+		struct object_id new_oid;
+		struct object_id old_oid;
+
+		fill_reftable_log_record(&log);
+		log.ref_name = (char *)create->refname;
+		log.message = (char *)create->logmsg;
+		log.update_index = ts;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			err = reftable_writer_add_log(writer, &log);
+		}
+		log.ref_name = NULL;
+		log.message = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return err;
+}
+
+static int reftable_create_symref(struct ref_store *ref_store,
+				  const char *refname, const char *target,
+				  const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(refs->stack, &write_create_symref_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = { NULL };
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto done;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto done;
+	}
+
+	free(ref.ref_name);
+	ref.ref_name = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = { { NULL } };
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = { { NULL } };
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].ref_name = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].ref_name = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto done;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int reftable_rename_ref(struct ref_store *ref_store,
+			       const char *oldrefname, const char *newrefname,
+			       const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct write_rename_arg arg = {
+		.stack = refs->stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_add(refs->stack, &write_rename_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+static int reftable_copy_ref(struct ref_store *ref_store,
+			     const char *oldrefname, const char *newrefname,
+			     const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+	char *last_name;
+};
+
+static int
+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(&ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.ref_name;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.ref_name, ri->last_name)) {
+			/* we want the refnames that we have reflogs for, so we
+			 * skip if we've already produced this name. This could
+			 * be faster by seeking directly to
+			 * reflog@update_index==0.
+			 */
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.ref_name);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+					     struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct reftable_reflog_ref_iterator *ri =
+		(struct reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
+	reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	struct reftable_merged_table *mt =
+		reftable_stack_merged_table(refs->stack);
+	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+	if (err < 0) {
+		free(ri);
+		return NULL;
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
+			       1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int
+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = { NULL };
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		hashcpy(old_oid.hash, log.old_hash);
+		hashcpy(new_oid.hash, log.new_hash);
+
+		full_committer = fmt_ident(log.name, log.email,
+					   WANT_COMMITTER_IDENT,
+					   /*date*/ NULL, IDENT_NO_DATE);
+		err = fn(&old_oid, &new_oid, full_committer, log.time,
+			 log.tz_offset, log.message, cb_data);
+		if (err)
+			break;
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+static int
+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
+					  const char *refname,
+					  each_reflog_ent_fn fn, void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+	int i = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = { NULL };
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+
+		if (strcmp(log.ref_name, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		err = fn(&old_oid, &new_oid, full_committer, log->time,
+			 log->tz_offset, log->message, cb_data);
+		if (err) {
+			break;
+		}
+	}
+
+	for (i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+static int reftable_reflog_exists(struct ref_store *ref_store,
+				  const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int reftable_create_reflog(struct ref_store *ref_store,
+				  const char *refname, int force_create,
+				  struct strbuf *err)
+{
+	return 0;
+}
+
+static int reftable_delete_reflog(struct ref_store *ref_store,
+				  const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.ref_name = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int reftable_reflog_expire(struct ref_store *ref_store,
+				  const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags,
+				  reflog_expiry_prepare_fn prepare_fn,
+				  reflog_expiry_should_prune_fn should_prune_fn,
+				  reflog_expiry_cleanup_fn cleanup_fn,
+				  void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  reftable_pack_refs(), which will compact the entire stack and get rid
+	  of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.refs = refs,
+	};
+	struct reftable_log_record log = { NULL };
+	struct reftable_iterator it = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	mt = reftable_stack_merged_table(refs->stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		goto done;
+	}
+
+	while (1) {
+		struct object_id ooid;
+		struct object_id noid;
+
+		int err = reftable_iterator_next_log(&it, &log);
+		if (err < 0) {
+			goto done;
+		}
+
+		if (err > 0 || strcmp(log.ref_name, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int reftable_read_raw_ref(struct ref_store *ref_store,
+				 const char *refname, struct object_id *oid,
+				 struct strbuf *referent, unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	/* This is usually not needed, but Git doesn't signal to ref backend if
+	   a subprocess updated the ref DB.  So we always check.
+	*/
+	err = reftable_stack_reload(refs->stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_read_ref(refs->stack, refname, &ref);
+	if (err > 0) {
+		errno = ENOENT;
+		err = -1;
+		goto done;
+	}
+	if (err < 0) {
+		errno = reftable_error_to_errno(err);
+		err = -1;
+		goto done;
+	}
+	if (ref.target != NULL) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else if (ref.value != NULL) {
+		hashcpy(oid->hash, ref.value);
+	} else {
+		*type |= REF_ISBROKEN;
+		errno = EINVAL;
+		err = -1;
+	}
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	reftable_init_db,
+	reftable_transaction_prepare,
+	reftable_transaction_finish,
+	reftable_transaction_abort,
+	reftable_transaction_initial_commit,
+
+	reftable_pack_refs,
+	reftable_create_symref,
+	reftable_delete_refs,
+	reftable_rename_ref,
+	reftable_copy_ref,
+
+	reftable_write_pseudoref,
+	reftable_delete_pseudoref,
+
+	reftable_ref_iterator_begin,
+	reftable_read_raw_ref,
+
+	reftable_reflog_iterator_begin,
+	reftable_for_each_reflog_ent_oldest_first,
+	reftable_for_each_reflog_ent_newest_first,
+	reftable_reflog_exists,
+	reftable_create_reflog,
+	reftable_delete_reflog,
+	reftable_reflog_expire,
+};
diff --git a/reftable/update.sh b/reftable/update.sh
index 4dadd875f4a..d816005db68 100755
--- a/reftable/update.sh
+++ b/reftable/update.sh
@@ -16,7 +16,4 @@ cp reftable-repo/LICENSE reftable/
 git --git-dir reftable-repo/.git show --no-patch --format=oneline HEAD \
   > reftable/VERSION
 
-mv reftable/system.h reftable/system.h~
-sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
-
 git add reftable/*.[ch] reftable/LICENSE reftable/VERSION
diff --git a/repository.c b/repository.c
index 6f7f6f002b1..087760bc184 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 3c1f7d54bd3..293b3463049 100644
--- a/repository.h
+++ b/repository.h
@@ -74,6 +74,9 @@ struct repository {
 	 */
 	struct ref_store *refs_private;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index eb066db6d8c..dccd3970c04 100644
--- a/setup.c
+++ b/setup.c
@@ -469,9 +469,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -594,6 +596,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1239,8 +1242,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 00000000000..168d0b0b2f4
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,155 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+INVALID_SHA1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+initialize ()  {
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	mv .git/hooks .git/hooks-disabled
+}
+
+test_expect_success 'delete ref' '
+	initialize &&
+	test_commit file &&
+	SHA=$(git show-ref -s --verify HEAD) &&
+	test_write_lines "$SHA refs/heads/master" "$SHA refs/tags/file" >expect &&
+	git show-ref > actual &&
+	! git update-ref -d refs/tags/file $INVALID_SHA1 &&
+	test_cmp expect actual &&
+	git update-ref -d refs/tags/file $SHA  &&
+	test_write_lines "$SHA refs/heads/master" >expect &&
+	git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone calls transaction_initial_commit' '
+	test_commit message1 file1 &&
+	git clone . cloned &&
+	(test  -f cloned/file1 || echo "Fixme.")
+'
+
+test_expect_success 'basic operation of reftable storage: commit, show-ref' '
+	initialize &&
+	test_commit file &&
+	test_write_lines refs/heads/master refs/tags/file >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect
+'
+
+test_expect_success 'reflog, repack' '
+	initialize &&
+	for count in $(test_seq 1 10)
+	do
+		test_commit "number $count" file.t $count number-$count ||
+		return 1
+	done &&
+	git pack-refs &&
+	ls -1 .git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 10 output &&
+	grep "commit (initial): number 1" output &&
+	grep "commit: number 10" output &&
+	git gc &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 0 output
+'
+
+test_expect_success 'branch switch in reflog output' '
+	initialize &&
+	test_commit file1 &&
+	git checkout -b branch1 &&
+	test_commit file2 &&
+	git checkout -b branch2 &&
+	git switch - &&
+	git rev-parse --symbolic-full-name HEAD > actual &&
+	echo refs/heads/branch1 > expect &&
+	test_cmp actual expect
+'
+
+
+# This matches show-ref's output
+print_ref() {
+	echo "$(git rev-parse "$1") $1"
+}
+
+test_expect_success 'peeled tags are stored' '
+	initialize &&
+	test_commit file &&
+	git tag -m "annotated tag" test_tag HEAD &&
+	{
+		print_ref "refs/heads/master" &&
+		print_ref "refs/tags/file" &&
+		print_ref "refs/tags/test_tag" &&
+		print_ref "refs/tags/test_tag^{}"
+	} >expect &&
+	git show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'show-ref works on fresh repo' '
+	initialize &&
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	>expect &&
+	! git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'checkout unborn branch' '
+	initialize &&
+	git checkout -b master
+'
+
+
+test_expect_success 'dir/file conflict' '
+	initialize &&
+	test_commit file &&
+	! git branch master/forbidden
+'
+
+
+test_expect_success 'do not clobber existing repo' '
+	rm -rf .git &&
+	git init --ref-storage=files &&
+	cat .git/HEAD > expect &&
+	test_commit file &&
+	(git init --ref-storage=reftable || true) &&
+	cat .git/HEAD > actual &&
+	test_cmp expect actual
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'pseudo refs' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+	git cherry-pick source &&
+	test -f file2
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'rebase' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+	git rebase source &&
+	test -f file2
+'
+
+test_done
+
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 14/19] Hookup unittests for the reftable library.
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (12 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 13/19] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 15/19] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
                                                     ` (5 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The unittests are under reftable/*_test.c, so all of the reftable code stays in
one directory. They are called from t/helpers/test-reftable.c in t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile                 | 17 ++++++++++++++++-
 t/helper/test-reftable.c | 15 +++++++++++++++
 t/helper/test-tool.c     |  1 +
 t/helper/test-tool.h     |  1 +
 t/t0031-reftable.sh      |  5 +++++
 5 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-reftable.c

diff --git a/Makefile b/Makefile
index 6571ed12468..8f33ad899ae 100644
--- a/Makefile
+++ b/Makefile
@@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-read-cache.o
 TEST_BUILTINS_OBJS += test-read-graph.o
 TEST_BUILTINS_OBJS += test-read-midx.o
 TEST_BUILTINS_OBJS += test-ref-store.o
+TEST_BUILTINS_OBJS += test-reftable.o
 TEST_BUILTINS_OBJS += test-regex.o
 TEST_BUILTINS_OBJS += test-repository.o
 TEST_BUILTINS_OBJS += test-revision-walking.o
@@ -808,6 +809,7 @@ LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
 REFTABLE_LIB = reftable/libreftable.a
+REFTABLE_TEST_LIB = reftable/libreftable_test.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -2371,6 +2373,15 @@ REFTABLE_OBJS += reftable/tree.o
 REFTABLE_OBJS += reftable/writer.o
 REFTABLE_OBJS += reftable/zlib-compat.o
 
+REFTABLE_TEST_OBJS += reftable/block_test.o
+REFTABLE_TEST_OBJS += reftable/merged_test.o
+REFTABLE_TEST_OBJS += reftable/record_test.o
+REFTABLE_TEST_OBJS += reftable/refname_test.o
+REFTABLE_TEST_OBJS += reftable/reftable_test.o
+REFTABLE_TEST_OBJS += reftable/strbuf_test.o
+REFTABLE_TEST_OBJS += reftable/stack_test.o
+REFTABLE_TEST_OBJS += reftable/tree_test.o
+REFTABLE_TEST_OBJS += reftable/test_framework.o
 
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
@@ -2378,6 +2389,7 @@ OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
 	$(REFTABLE_OBJS) \
+	$(REFTABLE_TEST_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2521,6 +2533,9 @@ $(VCSSVN_LIB): $(VCSSVN_OBJS)
 $(REFTABLE_LIB): $(REFTABLE_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_TEST_LIB): $(REFTABLE_TEST_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -2803,7 +2818,7 @@ t/helper/test-svn-fe$X: $(VCSSVN_LIB)
 
 t/helper/test-tool$X: $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 
-t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS)
+t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS) $(REFTABLE_TEST_LIB)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS)
 
 check-sha1:: t/helper/test-tool$X
diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c
new file mode 100644
index 00000000000..def88834396
--- /dev/null
+++ b/t/helper/test-reftable.c
@@ -0,0 +1,15 @@
+#include "reftable/reftable-tests.h"
+#include "test-tool.h"
+
+int cmd__reftable(int argc, const char **argv)
+{
+	block_test_main(argc, argv);
+	merged_test_main(argc, argv);
+	record_test_main(argc, argv);
+	refname_test_main(argc, argv);
+	reftable_test_main(argc, argv);
+	strbuf_test_main(argc, argv);
+	stack_test_main(argc, argv);
+	tree_test_main(argc, argv);
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 590b2efca70..10366b7b762 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -52,6 +52,7 @@ static struct test_cmd cmds[] = {
 	{ "read-graph", cmd__read_graph },
 	{ "read-midx", cmd__read_midx },
 	{ "ref-store", cmd__ref_store },
+	{ "reftable", cmd__reftable },
 	{ "regex", cmd__regex },
 	{ "repository", cmd__repository },
 	{ "revision-walking", cmd__revision_walking },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ddc8e990e91..d52ba2f5e57 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -41,6 +41,7 @@ int cmd__read_cache(int argc, const char **argv);
 int cmd__read_graph(int argc, const char **argv);
 int cmd__read_midx(int argc, const char **argv);
 int cmd__ref_store(int argc, const char **argv);
+int cmd__reftable(int argc, const char **argv);
 int cmd__regex(int argc, const char **argv);
 int cmd__repository(int argc, const char **argv);
 int cmd__revision_walking(int argc, const char **argv);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
index 168d0b0b2f4..3615e364a51 100755
--- a/t/t0031-reftable.sh
+++ b/t/t0031-reftable.sh
@@ -15,6 +15,11 @@ initialize ()  {
 	mv .git/hooks .git/hooks-disabled
 }
 
+test_expect_success 'unittests' '
+	test-tool reftable
+'
+
+
 test_expect_success 'delete ref' '
 	initialize &&
 	test_commit file &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 15/19] Add GIT_DEBUG_REFS debugging mechanism
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (13 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 14/19] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 16/19] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
                                                     ` (4 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

When set in the environment, GIT_DEBUG_REFS makes git print operations and
results as they flow through the ref storage backend. This helps debug
discrepancies between different ref backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile              |   1 +
 refs.c                |   3 +
 refs/debug.c          | 411 ++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h  |   6 +
 t/t0033-debug-refs.sh |  18 ++
 5 files changed, 439 insertions(+)
 create mode 100644 refs/debug.c
 create mode 100755 t/t0033-debug-refs.sh

diff --git a/Makefile b/Makefile
index 8f33ad899ae..700ba10b01a 100644
--- a/Makefile
+++ b/Makefile
@@ -961,6 +961,7 @@ LIB_OBJS += rebase.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
+LIB_OBJS += refs/debug.o
 LIB_OBJS += refs/files-backend.o
 LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
diff --git a/refs.c b/refs.c
index c4b704aa177..ba5239b8421 100644
--- a/refs.c
+++ b/refs.c
@@ -1750,6 +1750,9 @@ struct ref_store *get_main_ref_store(struct repository *r)
 						 r->ref_storage_format :
 						 default_ref_storage(),
 					 REF_STORE_ALL_CAPS);
+	if (getenv("GIT_DEBUG_REFS")) {
+		r->refs_private = debug_wrap(r->refs_private);
+	}
 	return r->refs_private;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
new file mode 100644
index 00000000000..950cad18718
--- /dev/null
+++ b/refs/debug.c
@@ -0,0 +1,411 @@
+
+#include "refs-internal.h"
+
+struct debug_ref_store {
+	struct ref_store base;
+	struct ref_store *refs;
+};
+
+extern struct ref_storage_be refs_be_debug;
+struct ref_store *debug_wrap(struct ref_store *store);
+
+struct ref_store *debug_wrap(struct ref_store *store)
+{
+	struct debug_ref_store *res = malloc(sizeof(struct debug_ref_store));
+	res->refs = store;
+	base_ref_store_init((struct ref_store *)res, &refs_be_debug);
+	return (struct ref_store *)res;
+}
+
+static int debug_init_db(struct ref_store *refs, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res = drefs->refs->be->init_db(drefs->refs, err);
+	fprintf(stderr, "init_db: %d\n", res);
+	return res;
+}
+
+static int debug_transaction_prepare(struct ref_store *refs,
+				     struct ref_transaction *transaction,
+				     struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_prepare(drefs->refs, transaction,
+						   err);
+	fprintf(stderr, "transaction_prepare: %d\n", res);
+	return res;
+}
+
+static void print_update(int i, const char *refname,
+			 const struct object_id *old_oid,
+			 const struct object_id *new_oid, unsigned int flags,
+			 unsigned int type, const char *msg)
+{
+	char o[200] = "null";
+	char n[200] = "null";
+	if (old_oid)
+		oid_to_hex_r(o, old_oid);
+	if (new_oid)
+		oid_to_hex_r(n, new_oid);
+
+	type &= 0xf; /* see refs.h REF_* */
+	flags &= REF_HAVE_NEW | REF_HAVE_OLD | REF_NO_DEREF |
+		 REF_FORCE_CREATE_REFLOG | REF_LOG_ONLY;
+	fprintf(stderr, "%d: %s %s -> %s (F=0x%x, T=0x%x) \"%s\"\n", i, refname,
+		o, n, flags, type, msg);
+}
+
+static void print_transaction(struct ref_transaction *transaction)
+{
+	fprintf(stderr, "transaction {\n");
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		print_update(i, u->refname, &u->old_oid, &u->new_oid, u->flags,
+			     u->type, u->msg);
+	}
+	fprintf(stderr, "}\n");
+}
+
+static int debug_transaction_finish(struct ref_store *refs,
+				    struct ref_transaction *transaction,
+				    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_finish(drefs->refs, transaction,
+						  err);
+	print_transaction(transaction);
+	fprintf(stderr, "finish: %d\n", res);
+	return res;
+}
+
+static int debug_transaction_abort(struct ref_store *refs,
+				   struct ref_transaction *transaction,
+				   struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_abort(drefs->refs, transaction, err);
+	return res;
+}
+
+static int debug_initial_transaction_commit(struct ref_store *refs,
+					    struct ref_transaction *transaction,
+					    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->initial_transaction_commit(drefs->refs,
+							  transaction, err);
+	return res;
+}
+
+static int debug_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->pack_refs(drefs->refs, flags);
+	fprintf(stderr, "pack_refs: %d\n", res);
+	return res;
+}
+
+static int debug_create_symref(struct ref_store *ref_store,
+			       const char *ref_name, const char *target,
+			       const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_symref(drefs->refs, ref_name, target,
+						 logmsg);
+	fprintf(stderr, "create_symref: %s -> %s \"%s\": %d\n", ref_name,
+		target, logmsg, res);
+	return res;
+}
+
+static int debug_delete_refs(struct ref_store *ref_store, const char *msg,
+			     struct string_list *refnames, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->delete_refs(drefs->refs, msg, refnames, flags);
+	return res;
+}
+
+static int debug_rename_ref(struct ref_store *ref_store, const char *oldref,
+			    const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->rename_ref(drefs->refs, oldref, newref,
+					      logmsg);
+	fprintf(stderr, "rename_ref: %s -> %s \"%s\": %d\n", oldref, newref,
+		logmsg, res);
+	return res;
+}
+
+static int debug_copy_ref(struct ref_store *ref_store, const char *oldref,
+			  const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->copy_ref(drefs->refs, oldref, newref, logmsg);
+	fprintf(stderr, "copy_ref: %s -> %s \"%s\": %d\n", oldref, newref,
+		logmsg, res);
+	return res;
+}
+
+static int debug_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->write_pseudoref(drefs->refs, pseudoref, oid,
+						   old_oid, err);
+	char o[100] = "null";
+	char n[100] = "null";
+	if (oid)
+		oid_to_hex_r(o, oid);
+	if (old_oid)
+		oid_to_hex_r(n, old_oid);
+	fprintf(stderr, "write_pseudoref: %s, %s => %s, err %s: %d\n",
+		pseudoref, o, n, err->buf, res);
+	return res;
+}
+
+static int debug_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_pseudoref(drefs->refs, pseudoref,
+						    old_oid);
+	char hex[100] = "null";
+	if (old_oid)
+		oid_to_hex_r(hex, old_oid);
+	fprintf(stderr, "delete_pseudoref: %s (%s): %d\n", pseudoref, hex, res);
+	return res;
+}
+
+struct debug_ref_iterator {
+	struct ref_iterator base;
+	struct ref_iterator *iter;
+};
+
+static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->advance(diter->iter);
+	fprintf(stderr, "iterator_advance: %s: %d\n", diter->iter->refname,
+		res);
+	diter->base.ordered = diter->iter->ordered;
+	diter->base.refname = diter->iter->refname;
+	diter->base.oid = diter->iter->oid;
+	diter->base.flags = diter->iter->flags;
+	return res;
+}
+static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				   struct object_id *peeled)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->peel(diter->iter, peeled);
+	fprintf(stderr, "iterator_peel: %s: %d\n", diter->iter->refname, res);
+	return res;
+}
+
+static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->abort(diter->iter);
+	fprintf(stderr, "iterator_abort: %d\n", res);
+	return res;
+}
+
+static struct ref_iterator_vtable debug_ref_iterator_vtable = {
+	debug_ref_iterator_advance, debug_ref_iterator_peel,
+	debug_ref_iterator_abort
+};
+
+static struct ref_iterator *
+debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			 unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
+	struct debug_ref_iterator *diter = xcalloc(1, sizeof(*diter));
+	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable, 1);
+	diter->iter = res;
+	fprintf(stderr, "ref_iterator_begin: %s (0x%x)\n", prefix, flags);
+	return &diter->base;
+}
+
+static int debug_read_raw_ref(struct ref_store *ref_store, const char *refname,
+			      struct object_id *oid, struct strbuf *referent,
+			      unsigned int *type)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = 0;
+
+	oidcpy(oid, &null_oid);
+	res = drefs->refs->be->read_raw_ref(drefs->refs, refname, oid, referent,
+					    type);
+
+	if (res == 0) {
+		fprintf(stderr, "read_raw_ref: %s: %s (=> %s) type %x: %d\n",
+			refname, oid_to_hex(oid), referent->buf, *type, res);
+	} else {
+		fprintf(stderr, "read_raw_ref: %s err %d\n", refname, res);
+	}
+	return res;
+}
+
+static struct ref_iterator *
+debug_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->reflog_iterator_begin(drefs->refs);
+	fprintf(stderr, "for_each_reflog_iterator_begin\n");
+	return res;
+}
+
+struct debug_reflog {
+	const char *refname;
+	each_reflog_ent_fn *fn;
+	void *cb_data;
+};
+
+static int debug_print_reflog_ent(struct object_id *old_oid,
+				  struct object_id *new_oid,
+				  const char *committer, timestamp_t timestamp,
+				  int tz, const char *msg, void *cb_data)
+{
+	struct debug_reflog *dbg = (struct debug_reflog *)cb_data;
+	int ret;
+	char o[100] = "null";
+	char n[100] = "null";
+	if (old_oid)
+		oid_to_hex_r(o, old_oid);
+	if (new_oid)
+		oid_to_hex_r(n, new_oid);
+
+	ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg,
+		      dbg->cb_data);
+	fprintf(stderr, "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
+		dbg->refname, ret, o, n, committer, (long int)timestamp, msg);
+	return ret;
+}
+
+static int debug_for_each_reflog_ent(struct ref_store *ref_store,
+				     const char *refname, each_reflog_ent_fn fn,
+				     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct debug_reflog dbg = {
+		.refname = refname,
+		.fn = fn,
+		.cb_data = cb_data,
+	};
+
+	int res = drefs->refs->be->for_each_reflog_ent(
+		drefs->refs, refname, &debug_print_reflog_ent, &dbg);
+	fprintf(stderr, "for_each_reflog: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_for_each_reflog_ent_reverse(struct ref_store *ref_store,
+					     const char *refname,
+					     each_reflog_ent_fn fn,
+					     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct debug_reflog dbg = {
+		.refname = refname,
+		.fn = fn,
+		.cb_data = cb_data,
+	};
+	int res = drefs->refs->be->for_each_reflog_ent_reverse(
+		drefs->refs, refname, &debug_print_reflog_ent, &dbg);
+	fprintf(stderr, "for_each_reflog_reverse: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_reflog_exists(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_exists(drefs->refs, refname);
+	fprintf(stderr, "reflog_exists: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_create_reflog(struct ref_store *ref_store, const char *refname,
+			       int force_create, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_reflog(drefs->refs, refname,
+						 force_create, err);
+	fprintf(stderr, "create_reflog: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_delete_reflog(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_reflog(drefs->refs, refname);
+	fprintf(stderr, "delete_reflog: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
+			       const struct object_id *oid, unsigned int flags,
+			       reflog_expiry_prepare_fn prepare_fn,
+			       reflog_expiry_should_prune_fn should_prune_fn,
+			       reflog_expiry_cleanup_fn cleanup_fn,
+			       void *policy_cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_expire(drefs->refs, refname, oid,
+						 flags, prepare_fn,
+						 should_prune_fn, cleanup_fn,
+						 policy_cb_data);
+	fprintf(stderr, "reflog_expire: %s: %d\n", refname, res);
+	return res;
+}
+
+struct ref_storage_be refs_be_debug = {
+	NULL,
+	"debug",
+	NULL,
+	debug_init_db,
+	debug_transaction_prepare,
+	debug_transaction_finish,
+	debug_transaction_abort,
+	debug_initial_transaction_commit,
+
+	debug_pack_refs,
+	debug_create_symref,
+	debug_delete_refs,
+	debug_rename_ref,
+	debug_copy_ref,
+
+	debug_write_pseudoref,
+	debug_delete_pseudoref,
+
+	debug_ref_iterator_begin,
+	debug_read_raw_ref,
+
+	debug_reflog_iterator_begin,
+	debug_for_each_reflog_ent,
+	debug_for_each_reflog_ent_reverse,
+	debug_reflog_exists,
+	debug_create_reflog,
+	debug_delete_reflog,
+	debug_reflog_expire,
+};
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 7afe4c28310..5417623c86e 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -713,4 +713,10 @@ struct ref_store {
 void base_ref_store_init(struct ref_store *refs,
 			 const struct ref_storage_be *be);
 
+/*
+ * Print out ref operations as they occur. Useful for debugging alternate ref
+ * backends.
+ */
+struct ref_store *debug_wrap(struct ref_store *store);
+
 #endif /* REFS_REFS_INTERNAL_H */
diff --git a/t/t0033-debug-refs.sh b/t/t0033-debug-refs.sh
new file mode 100755
index 00000000000..ed76c2c84a1
--- /dev/null
+++ b/t/t0033-debug-refs.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='cross-check reftable with files, using GIT_DEBUG_REFS output'
+
+. ./test-lib.sh
+
+test_expect_success 'GIT_DEBUG_REFS' '
+	git init --ref-storage=files files &&
+	git init --ref-storage=reftable reftable &&
+	(cd files && GIT_DEBUG_REFS=1 test_commit message file) > files.txt &&
+	(cd reftable && GIT_DEBUG_REFS=1 test_commit message file) > reftable.txt &&
+	test_cmp files.txt reftable.txt
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 16/19] vcxproj: adjust for the reftable changes
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (14 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 15/19] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Johannes Schindelin via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 17/19] git-prompt: prepare for reftable refs backend SZEDER Gábor via GitGitGadget
                                                     ` (3 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This allows Git to be compiled via Visual Studio again after integrating
the `hn/reftable` branch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.mak.uname                           |  2 +-
 contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index c7eba69e54e..ae4e25a1a42 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -709,7 +709,7 @@ vcxproj:
 	# Make .vcxproj files and add them
 	unset QUIET_GEN QUIET_BUILT_IN; \
 	perl contrib/buildsystems/generate -g Vcxproj
-	git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
+	git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
 
 	# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
 	(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
index 5c666f9ac03..33a08d31652 100644
--- a/contrib/buildsystems/Generators/Vcxproj.pm
+++ b/contrib/buildsystems/Generators/Vcxproj.pm
@@ -77,7 +77,7 @@ sub createProject {
     my $libs_release = "\n    ";
     my $libs_debug = "\n    ";
     if (!$static_library) {
-      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
+      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
       $libs_debug = $libs_release;
       $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
       $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
@@ -231,6 +231,7 @@ sub createProject {
 EOM
     if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
       my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
+      my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
       my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
 
       print F << "EOM";
@@ -240,6 +241,14 @@ sub createProject {
       <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
     </ProjectReference>
 EOM
+      if (!($name =~ /xdiff|libreftable/)) {
+        print F << "EOM";
+    <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
+      <Project>$uuid_libreftable</Project>
+      <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
+    </ProjectReference>
+EOM
+      }
       if (!($name =~ 'xdiff')) {
         print F << "EOM";
     <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 17/19] git-prompt: prepare for reftable refs backend
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (15 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 16/19] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
@ 2020-06-22 21:55                                   ` SZEDER Gábor via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 18/19] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
                                                     ` (2 subsequent siblings)
  19 siblings, 0 replies; 409+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, SZEDER Gábor

From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= <szeder.dev@gmail.com>

In our git-prompt script we strive to use Bash builtins wherever
possible, because fork()-ing subshells for command substitutions and
fork()+exec()-ing Git commands are expensive on some platforms.  We
even read and parse '.git/HEAD' using Bash builtins to get the name of
the current branch [1].  However, the upcoming reftable refs backend
won't use '.git/HEAD' at all, but will write an invalid refname as
placeholder for backwards compatibility instead, which will break our
git-prompt script.

Update the git-prompt script to recognize the placeholder '.git/HEAD'
written by the reftable backend (its content is specified in the
reftable specs), and then fall back to use 'git symbolic-ref' to get
the name of the current branch.

[1] 3a43c4b5bd (bash prompt: use bash builtins to find out current
    branch, 2011-03-31)

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 contrib/completion/git-prompt.sh | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/contrib/completion/git-prompt.sh b/contrib/completion/git-prompt.sh
index 014cd7c3cfc..f96d666d87d 100644
--- a/contrib/completion/git-prompt.sh
+++ b/contrib/completion/git-prompt.sh
@@ -460,10 +460,15 @@ __git_ps1 ()
 			if ! __git_eread "$g/HEAD" head; then
 				return $exit
 			fi
-			# is it a symbolic ref?
 			b="${head#ref: }"
 			if [ "$head" = "$b" ]; then
 				detached=yes
+			elif [ "$b" = "refs/heads/.invalid" ]; then
+				# Reftable
+				b="$(git symbolic-ref HEAD 2>/dev/null)" ||
+				detached=yes
+			fi
+			if [ "$detached" = yes ]; then
 				b="$(
 				case "${GIT_PS1_DESCRIBE_STYLE-}" in
 				(contains)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 18/19] Add reftable testing infrastructure
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (16 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 17/19] git-prompt: prepare for reftable refs backend SZEDER Gábor via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-22 21:55                                   ` [PATCH v18 19/19] Add "test-tool dump-reftable" command Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* Add GIT_TEST_REFTABLE environment var to control default ref storage

* Add test_prerequisite REFTABLE.

* Skip some tests that are incompatible:

  * t3210-pack-refs.sh - does not apply
  * t1450-fsck.sh - manipulates .git/ directly to create invalid state

Major test failures:

 * t1400-update-ref.sh - Reads from .git/{refs,logs} directly
 * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
 * t1405 - inspecs .git/ directly.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/t1409-avoid-packing-refs.sh | 6 ++++++
 t/t1450-fsck.sh               | 6 ++++++
 t/t3210-pack-refs.sh          | 6 ++++++
 t/test-lib.sh                 | 5 +++++
 4 files changed, 23 insertions(+)

diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
index be12fb63506..c6f78325563 100755
--- a/t/t1409-avoid-packing-refs.sh
+++ b/t/t1409-avoid-packing-refs.sh
@@ -4,6 +4,12 @@ test_description='avoid rewriting packed-refs unnecessarily'
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 # Add an identifying mark to the packed-refs file header line. This
 # shouldn't upset readers, and it should be omitted if the file is
 # ever rewritten.
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 344a2aad82f..09669203249 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -8,6 +8,12 @@ test_description='git fsck random collection of tests
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success setup '
 	test_oid_init &&
 	git config gc.auto 0 &&
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index f41b2afb996..edaef2c175a 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -11,6 +11,12 @@ semantic is still the same.
 '
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success 'enable reflogs' '
 	git config core.logallrefupdates true
 '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index dbc027ff267..3ce9b957b1b 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1504,6 +1504,11 @@ parisc* | hppa*)
 	;;
 esac
 
+if test -n "$GIT_TEST_REFTABLE"
+then
+  test_set_prereq REFTABLE
+fi
+
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v18 19/19] Add "test-tool dump-reftable" command.
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (17 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 18/19] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-06-22 21:55                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  19 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-22 21:55 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This command dumps individual tables or a stack of of tables.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile                 | 1 +
 t/helper/test-reftable.c | 5 +++++
 t/helper/test-tool.c     | 1 +
 t/helper/test-tool.h     | 1 +
 4 files changed, 8 insertions(+)

diff --git a/Makefile b/Makefile
index 700ba10b01a..d949b79720c 100644
--- a/Makefile
+++ b/Makefile
@@ -2375,6 +2375,7 @@ REFTABLE_OBJS += reftable/writer.o
 REFTABLE_OBJS += reftable/zlib-compat.o
 
 REFTABLE_TEST_OBJS += reftable/block_test.o
+REFTABLE_TEST_OBJS += reftable/dump.o
 REFTABLE_TEST_OBJS += reftable/merged_test.o
 REFTABLE_TEST_OBJS += reftable/record_test.o
 REFTABLE_TEST_OBJS += reftable/refname_test.o
diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c
index def88834396..aff4fbccda1 100644
--- a/t/helper/test-reftable.c
+++ b/t/helper/test-reftable.c
@@ -13,3 +13,8 @@ int cmd__reftable(int argc, const char **argv)
 	tree_test_main(argc, argv);
 	return 0;
 }
+
+int cmd__dump_reftable(int argc, const char **argv)
+{
+	return reftable_dump_main(argc, (char *const *)argv);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 10366b7b762..9e689f9d2b3 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -53,6 +53,7 @@ static struct test_cmd cmds[] = {
 	{ "read-midx", cmd__read_midx },
 	{ "ref-store", cmd__ref_store },
 	{ "reftable", cmd__reftable },
+	{ "dump-reftable", cmd__dump_reftable },
 	{ "regex", cmd__regex },
 	{ "repository", cmd__repository },
 	{ "revision-walking", cmd__revision_walking },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index d52ba2f5e57..bf833e01d45 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -17,6 +17,7 @@ int cmd__dump_cache_tree(int argc, const char **argv);
 int cmd__dump_fsmonitor(int argc, const char **argv);
 int cmd__dump_split_index(int argc, const char **argv);
 int cmd__dump_untracked_cache(int argc, const char **argv);
+int cmd__dump_reftable(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__genzeros(int argc, const char **argv);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 00/20] Reftable support git-core
  2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                     ` (18 preceding siblings ...)
  2020-06-22 21:55                                   ` [PATCH v18 19/19] Add "test-tool dump-reftable" command Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                   ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 01/20] lib-t6000.sh: write tag using git-update-ref Han-Wen Nienhuys via GitGitGadget
                                                       ` (21 more replies)
  19 siblings, 22 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend. 

Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1

Summary 21852 tests pass 556 tests fail

v22

 * worktree support

Han-Wen Nienhuys (18):
  lib-t6000.sh: write tag using git-update-ref
  t3432: use git-reflog to inspect the reflog for HEAD
  checkout: add '\n' to reflog message
  Write pseudorefs through ref backends.
  Make refs_ref_exists public
  Treat BISECT_HEAD as a pseudo ref
  Treat CHERRY_PICK_HEAD as a pseudo ref
  Treat REVERT_HEAD as a pseudo ref
  Move REF_LOG_ONLY to refs-internal.h
  Iterate over the "refs/" namespace in for_each_[raw]ref
  Add .gitattributes for the reftable/ directory
  Add reftable library
  Add standalone build infrastructure for reftable
  Reftable support for git-core
  Hookup unittests for the reftable library.
  Add GIT_DEBUG_REFS debugging mechanism
  Add reftable testing infrastructure
  Add "test-tool dump-reftable" command.

Johannes Schindelin (1):
  vcxproj: adjust for the reftable changes

SZEDER Gábor (1):
  git-prompt: prepare for reftable refs backend

 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   47 +-
 builtin/bisect--helper.c                      |    3 +-
 builtin/checkout.c                            |    5 +-
 builtin/clone.c                               |    4 +-
 builtin/commit.c                              |   34 +-
 builtin/init-db.c                             |   54 +-
 builtin/merge.c                               |    2 +-
 builtin/worktree.c                            |   31 +-
 cache.h                                       |    8 +-
 config.mak.uname                              |    2 +-
 contrib/buildsystems/Generators/Vcxproj.pm    |   11 +-
 contrib/completion/git-prompt.sh              |    7 +-
 git-bisect.sh                                 |    4 +-
 path.c                                        |    2 -
 path.h                                        |    9 +-
 refs.c                                        |  157 +-
 refs.h                                        |   16 +
 refs/debug.c                                  |  416 +++++
 refs/files-backend.c                          |  121 +-
 refs/packed-backend.c                         |   21 +-
 refs/refs-internal.h                          |   32 +
 refs/reftable-backend.c                       | 1497 +++++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/BUILD                                |  203 +++
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   33 +
 reftable/VERSION                              |    1 +
 reftable/WORKSPACE                            |   14 +
 reftable/basics.c                             |  215 +++
 reftable/basics.h                             |   53 +
 reftable/block.c                              |  432 +++++
 reftable/block.h                              |  129 ++
 reftable/block_test.c                         |  157 ++
 reftable/compat.c                             |   98 ++
 reftable/compat.h                             |   48 +
 reftable/constants.h                          |   21 +
 reftable/dump.c                               |  212 +++
 reftable/file.c                               |   95 ++
 reftable/iter.c                               |  242 +++
 reftable/iter.h                               |   72 +
 reftable/merged.c                             |  317 ++++
 reftable/merged.h                             |   43 +
 reftable/merged_test.c                        |  286 ++++
 reftable/pq.c                                 |  115 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  744 ++++++++
 reftable/reader.h                             |   65 +
 reftable/record.c                             | 1126 +++++++++++++
 reftable/record.h                             |  143 ++
 reftable/record_test.c                        |  410 +++++
 reftable/refname.c                            |  209 +++
 reftable/refname.h                            |   38 +
 reftable/refname_test.c                       |   99 ++
 reftable/reftable-tests.h                     |   22 +
 reftable/reftable.c                           |  154 ++
 reftable/reftable.h                           |  585 +++++++
 reftable/reftable_test.c                      |  632 +++++++
 reftable/stack.c                              | 1225 ++++++++++++++
 reftable/stack.h                              |   50 +
 reftable/stack_test.c                         |  787 +++++++++
 reftable/strbuf.c                             |  206 +++
 reftable/strbuf.h                             |   88 +
 reftable/strbuf_test.c                        |   39 +
 reftable/system.h                             |   53 +
 reftable/test_framework.c                     |   69 +
 reftable/test_framework.h                     |   64 +
 reftable/tree.c                               |   63 +
 reftable/tree.h                               |   34 +
 reftable/tree_test.c                          |   62 +
 reftable/update.sh                            |   19 +
 reftable/writer.c                             |  663 ++++++++
 reftable/writer.h                             |   60 +
 reftable/zlib-compat.c                        |   92 +
 reftable/zlib.BUILD                           |   36 +
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 sequencer.c                                   |   56 +-
 setup.c                                       |   12 +-
 t/helper/test-reftable.c                      |   20 +
 t/helper/test-tool.c                          |    2 +
 t/helper/test-tool.h                          |    2 +
 t/lib-t6000.sh                                |    5 +-
 t/t0031-reftable.sh                           |  178 ++
 t/t0033-debug-refs.sh                         |   18 +
 t/t1409-avoid-packing-refs.sh                 |    6 +
 t/t1450-fsck.sh                               |    6 +
 t/t3210-pack-refs.sh                          |    6 +
 t/t3432-rebase-fast-forward.sh                |    7 +-
 t/test-lib.sh                                 |    5 +
 wt-status.c                                   |    6 +-
 91 files changed, 13304 insertions(+), 209 deletions(-)
 create mode 100644 refs/debug.c
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/BUILD
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/WORKSPACE
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/compat.c
 create mode 100644 reftable/compat.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/refname_test.c
 create mode 100644 reftable/reftable-tests.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/strbuf.c
 create mode 100644 reftable/strbuf.h
 create mode 100644 reftable/strbuf_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100644 reftable/zlib.BUILD
 create mode 100644 t/helper/test-reftable.c
 create mode 100755 t/t0031-reftable.sh
 create mode 100755 t/t0033-debug-refs.sh


base-commit: b9a2d1a0207fb9ded3fa524f54db3bc322a12cc4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v19
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v19
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v18:

  1:  b968b795af =  1:  596c316da4 lib-t6000.sh: write tag using git-update-ref
  -:  ---------- >  2:  277da0cf7e t3432: use git-reflog to inspect the reflog for HEAD
  2:  5210225977 =  3:  125695ce92 checkout: add '\n' to reflog message
  3:  15c9dd66e1 =  4:  5715681b3f Write pseudorefs through ref backends.
  4:  a5bce2e3fe =  5:  c8cf2d2ce1 Make refs_ref_exists public
  5:  a29d898907 =  6:  e36b29de79 Treat BISECT_HEAD as a pseudo ref
  6:  11a690d2b8 =  7:  1676df9851 Treat CHERRY_PICK_HEAD as a pseudo ref
  7:  4e52ec0dbc =  8:  0a5f97ea34 Treat REVERT_HEAD as a pseudo ref
  8:  37e350af15 =  9:  9463ed9093 Move REF_LOG_ONLY to refs-internal.h
  9:  468f00eaf6 = 10:  a116aebe11 Iterate over the "refs/" namespace in for_each_[raw]ref
 10:  21febeaa81 = 11:  e4545658ed Add .gitattributes for the reftable/ directory
 11:  3c84f43cfa ! 12:  169f6c7f54 Add reftable library
     @@ reftable/LICENSE (new)
      
       ## reftable/VERSION (new) ##
      @@
     -+d7472cbd1afe6bf3da53e94971b1cc79ce183fa8 Remove outdated bits of the README file
     ++994cc2f626ff626ff04be5240413775f832110fb C: formatting tweaks
      
       ## reftable/basics.c (new) ##
      @@
     @@ reftable/basics.c (new)
      +void parse_names(char *buf, int size, char ***namesp)
      +{
      +	char **names = NULL;
     -+	int names_cap = 0;
     -+	int names_len = 0;
     ++	size_t names_cap = 0;
     ++	size_t names_len = 0;
      +
      +	char *p = buf;
      +	char *end = buf + size;
     @@ reftable/block_test.c (new)
      +		snprintf(name, sizeof(name), "branch%02d", i);
      +		memset(hash, i, sizeof(hash));
      +
     -+		ref.ref_name = name;
     ++		ref.refname = name;
      +		ref.value = hash;
      +		names[i] = xstrdup(name);
      +		n = block_writer_add(&bw, &rec);
     -+		ref.ref_name = NULL;
     ++		ref.refname = NULL;
      +		ref.value = NULL;
      +		assert(n == 0);
      +	}
     @@ reftable/block_test.c (new)
      +		if (r > 0) {
      +			break;
      +		}
     -+		assert_streq(names[j], ref.ref_name);
     ++		assert_streq(names[j], ref.refname);
      +		j++;
      +	}
      +
     @@ reftable/block_test.c (new)
      +		n = block_iter_next(&it, &rec);
      +		assert(n == 0);
      +
     -+		assert_streq(names[i], ref.ref_name);
     ++		assert_streq(names[i], ref.refname);
      +
      +		want.len--;
      +		n = block_reader_seek(&br, &it, &want);
     @@ reftable/block_test.c (new)
      +
      +		n = block_iter_next(&it, &rec);
      +		assert(n == 0);
     -+		assert_streq(names[10 * (i / 10)], ref.ref_name);
     ++		assert_streq(names[10 * (i / 10)], ref.refname);
      +
      +		block_iter_close(&it);
      +	}
     @@ reftable/iter.c (new)
      +			struct reftable_iterator it = { 0 };
      +
      +			err = reftable_table_seek_ref(&fri->tab, &it,
     -+						      ref->ref_name);
     ++						      ref->refname);
      +			if (err == 0) {
      +				err = reftable_iterator_next_ref(&it, ref);
      +			}
     @@ reftable/merged.c (new)
      +}
      +
      +int reftable_new_merged_table(struct reftable_merged_table **dest,
     -+			      struct reftable_reader **stack, int n,
     ++			      struct reftable_table *stack, int n,
      +			      uint32_t hash_id)
      +{
      +	struct reftable_merged_table *m = NULL;
     @@ reftable/merged.c (new)
      +	uint64_t first_min = 0;
      +	int i = 0;
      +	for (i = 0; i < n; i++) {
     -+		struct reftable_reader *r = stack[i];
     -+		if (r->hash_id != hash_id) {
     ++		uint64_t min = reftable_table_min_update_index(&stack[i]);
     ++		uint64_t max = reftable_table_max_update_index(&stack[i]);
     ++
     ++		if (reftable_table_hash_id(&stack[i]) != hash_id) {
      +			return REFTABLE_FORMAT_ERROR;
      +		}
     -+		if (i > 0 && last_max >= reftable_reader_min_update_index(r)) {
     -+			return REFTABLE_FORMAT_ERROR;
     ++		if (i == 0 || min < first_min) {
     ++			first_min = min;
      +		}
     -+		if (i == 0) {
     -+			first_min = reftable_reader_min_update_index(r);
     ++		if (i == 0 || max > last_max) {
     ++			last_max = max;
      +		}
     -+
     -+		last_max = reftable_reader_max_update_index(r);
      +	}
      +
      +	m = (struct reftable_merged_table *)reftable_calloc(
     @@ reftable/merged.c (new)
      +	return 0;
      +}
      +
     -+void reftable_merged_table_close(struct reftable_merged_table *mt)
     -+{
     -+	int i = 0;
     -+	for (i = 0; i < mt->stack_len; i++) {
     -+		reftable_reader_free(mt->stack[i]);
     -+	}
     -+	FREE_AND_NULL(mt->stack);
     -+	mt->stack_len = 0;
     -+}
     -+
      +/* clears the list of subtable, without affecting the readers themselves. */
      +void merged_table_clear(struct reftable_merged_table *mt)
      +{
     @@ reftable/merged.c (new)
      +	int err = 0;
      +	int i = 0;
      +	for (i = 0; i < mt->stack_len && err == 0; i++) {
     -+		int e = reader_seek(mt->stack[i], &iters[n], rec);
     ++		int e = reftable_table_seek_record(&mt->stack[i], &iters[n],
     ++						   rec);
      +		if (e < 0) {
      +			err = e;
      +		}
     @@ reftable/merged.c (new)
      +				   const char *name)
      +{
      +	struct reftable_ref_record ref = {
     -+		.ref_name = (char *)name,
     ++		.refname = (char *)name,
      +	};
      +	struct reftable_record rec = { 0 };
      +	reftable_record_from_ref(&rec, &ref);
     @@ reftable/merged.c (new)
      +				      const char *name, uint64_t update_index)
      +{
      +	struct reftable_log_record log = {
     -+		.ref_name = (char *)name,
     ++		.refname = (char *)name,
      +		.update_index = update_index,
      +	};
      +	struct reftable_record rec = { 0 };
     @@ reftable/merged.c (new)
      +{
      +	uint64_t max = ~((uint64_t)0);
      +	return reftable_merged_table_seek_log_at(mt, it, name, max);
     ++}
     ++
     ++uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *mt)
     ++{
     ++	return mt->hash_id;
      +}
      
       ## reftable/merged.h (new) ##
     @@ reftable/merged.h (new)
      +#include "reftable.h"
      +
      +struct reftable_merged_table {
     -+	struct reftable_reader **stack;
     -+	int stack_len;
     ++	struct reftable_table *stack;
     ++	size_t stack_len;
      +	uint32_t hash_id;
      +	bool suppress_deletions;
      +
     @@ reftable/merged.h (new)
      +struct merged_iter {
      +	struct reftable_iterator *stack;
      +	uint32_t hash_id;
     -+	int stack_len;
     ++	size_t stack_len;
      +	uint8_t typ;
      +	bool suppress_deletions;
      +	struct merged_iter_pqueue pq;
     @@ reftable/merged.h (new)
      +			     struct reftable_iterator *it,
      +			     struct reftable_record *rec);
      +
     ++int reftable_table_seek_record(struct reftable_table *tab,
     ++			       struct reftable_iterator *it,
     ++			       struct reftable_record *rec);
     ++
      +#endif
      
       ## reftable/merged_test.c (new) ##
     @@ reftable/merged_test.c (new)
      +			reftable_new_record(BLOCK_TYPE_REF);
      +		struct pq_entry e = { 0 };
      +
     -+		reftable_record_as_ref(&rec)->ref_name = names[i];
     ++		reftable_record_as_ref(&rec)->refname = names[i];
      +		e.rec = rec;
      +		merged_iter_pqueue_add(&pq, e);
      +		merged_iter_pqueue_check(pq);
     @@ reftable/merged_test.c (new)
      +		merged_iter_pqueue_check(pq);
      +
      +		if (last != NULL) {
     -+			assert(strcmp(last, ref->ref_name) < 0);
     ++			assert(strcmp(last, ref->refname) < 0);
      +		}
     -+		last = ref->ref_name;
     -+		ref->ref_name = NULL;
     ++		last = ref->refname;
     ++		ref->refname = NULL;
      +		reftable_free(ref);
      +	}
      +
     @@ reftable/merged_test.c (new)
      +
      +static struct reftable_merged_table *
      +merged_table_from_records(struct reftable_ref_record **refs,
     -+			  struct reftable_block_source **source, int *sizes,
     ++			  struct reftable_block_source **source,
     ++			  struct reftable_reader ***readers, int *sizes,
      +			  struct strbuf *buf, int n)
      +{
     -+	struct reftable_reader **rd = reftable_calloc(n * sizeof(*rd));
      +	int i = 0;
      +	struct reftable_merged_table *mt = NULL;
      +	int err;
     ++	struct reftable_table *tabs =
     ++		reftable_calloc(n * sizeof(struct reftable_table));
     ++	*readers = reftable_calloc(n * sizeof(struct reftable_reader *));
      +	*source = reftable_calloc(n * sizeof(**source));
      +	for (i = 0; i < n; i++) {
      +		write_test_table(&buf[i], refs[i], sizes[i]);
      +		block_source_from_strbuf(&(*source)[i], &buf[i]);
      +
     -+		err = reftable_new_reader(&rd[i], &(*source)[i], "name");
     ++		err = reftable_new_reader(&(*readers)[i], &(*source)[i],
     ++					  "name");
      +		assert_err(err);
     ++		reftable_table_from_reader(&tabs[i], (*readers)[i]);
      +	}
      +
     -+	err = reftable_new_merged_table(&mt, rd, n, SHA1_ID);
     ++	err = reftable_new_merged_table(&mt, tabs, n, SHA1_ID);
      +	assert_err(err);
      +	return mt;
      +}
      +
     ++static void readers_destroy(struct reftable_reader **readers, size_t n)
     ++{
     ++	int i = 0;
     ++	for (; i < n; i++)
     ++		reftable_reader_free(readers[i]);
     ++	reftable_free(readers);
     ++}
     ++
      +static void test_merged_between(void)
      +{
      +	uint8_t hash1[SHA1_SIZE] = { 1, 2, 3, 0 };
      +
      +	struct reftable_ref_record r1[] = { {
     -+		.ref_name = "b",
     ++		.refname = "b",
      +		.update_index = 1,
      +		.value = hash1,
      +	} };
      +	struct reftable_ref_record r2[] = { {
     -+		.ref_name = "a",
     ++		.refname = "a",
      +		.update_index = 2,
      +	} };
      +
     @@ reftable/merged_test.c (new)
      +	int sizes[] = { 1, 1 };
      +	struct strbuf bufs[2] = { STRBUF_INIT, STRBUF_INIT };
      +	struct reftable_block_source *bs = NULL;
     ++	struct reftable_reader **readers = NULL;
      +	struct reftable_merged_table *mt =
     -+		merged_table_from_records(refs, &bs, sizes, bufs, 2);
     ++		merged_table_from_records(refs, &bs, &readers, sizes, bufs, 2);
      +	int i;
      +	struct reftable_ref_record ref = { 0 };
      +	struct reftable_iterator it = { 0 };
     @@ reftable/merged_test.c (new)
      +	assert_err(err);
      +	assert(ref.update_index == 2);
      +	reftable_ref_record_clear(&ref);
     -+
      +	reftable_iterator_destroy(&it);
     -+	reftable_merged_table_close(mt);
     ++	readers_destroy(readers, 2);
      +	reftable_merged_table_free(mt);
      +	for (i = 0; i < ARRAY_SIZE(bufs); i++) {
      +		strbuf_release(&bufs[i]);
     @@ reftable/merged_test.c (new)
      +	uint8_t hash1[SHA1_SIZE] = { 1 };
      +	uint8_t hash2[SHA1_SIZE] = { 2 };
      +	struct reftable_ref_record r1[] = { {
     -+						    .ref_name = "a",
     ++						    .refname = "a",
      +						    .update_index = 1,
      +						    .value = hash1,
      +					    },
      +					    {
     -+						    .ref_name = "b",
     ++						    .refname = "b",
      +						    .update_index = 1,
      +						    .value = hash1,
      +					    },
      +					    {
     -+						    .ref_name = "c",
     ++						    .refname = "c",
      +						    .update_index = 1,
      +						    .value = hash1,
      +					    } };
      +	struct reftable_ref_record r2[] = { {
     -+		.ref_name = "a",
     ++		.refname = "a",
      +		.update_index = 2,
      +	} };
      +	struct reftable_ref_record r3[] = {
      +		{
     -+			.ref_name = "c",
     ++			.refname = "c",
      +			.update_index = 3,
      +			.value = hash2,
      +		},
      +		{
     -+			.ref_name = "d",
     ++			.refname = "d",
      +			.update_index = 3,
      +			.value = hash1,
      +		},
     @@ reftable/merged_test.c (new)
      +	int sizes[3] = { 3, 1, 2 };
      +	struct strbuf bufs[3] = { STRBUF_INIT, STRBUF_INIT, STRBUF_INIT };
      +	struct reftable_block_source *bs = NULL;
     -+
     ++	struct reftable_reader **readers = NULL;
      +	struct reftable_merged_table *mt =
     -+		merged_table_from_records(refs, &bs, sizes, bufs, 3);
     ++		merged_table_from_records(refs, &bs, &readers, sizes, bufs, 3);
      +
      +	struct reftable_iterator it = { 0 };
      +	int err = reftable_merged_table_seek_ref(mt, &it, "a");
      +	struct reftable_ref_record *out = NULL;
     -+	int len = 0;
     -+	int cap = 0;
     ++	size_t len = 0;
     ++	size_t cap = 0;
      +	int i = 0;
      +
      +	assert_err(err);
     @@ reftable/merged_test.c (new)
      +	for (i = 0; i < 3; i++) {
      +		strbuf_release(&bufs[i]);
      +	}
     -+	reftable_merged_table_close(mt);
     ++	readers_destroy(readers, 3);
      +	reftable_merged_table_free(mt);
      +	reftable_free(bs);
      +}
     @@ reftable/pq.h (new)
      +
      +struct merged_iter_pqueue {
      +	struct pq_entry *heap;
     -+	int len;
     -+	int cap;
     ++	size_t len;
     ++	size_t cap;
      +};
      +
      +struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
     @@ reftable/reader.c (new)
      +			     uint64_t next_off, uint8_t want_typ)
      +{
      +	int32_t guess_block_size = r->block_size ? r->block_size :
     -+						   DEFAULT_BLOCK_SIZE;
     ++							 DEFAULT_BLOCK_SIZE;
      +	struct reftable_block block = { 0 };
      +	uint8_t block_typ = 0;
      +	int err = 0;
     @@ reftable/reader.c (new)
      +			     struct reftable_iterator *it, const char *name)
      +{
      +	struct reftable_ref_record ref = {
     -+		.ref_name = (char *)name,
     ++		.refname = (char *)name,
      +	};
      +	struct reftable_record rec = { 0 };
      +	reftable_record_from_ref(&rec, &ref);
     @@ reftable/reader.c (new)
      +				uint64_t update_index)
      +{
      +	struct reftable_log_record log = {
     -+		.ref_name = (char *)name,
     ++		.refname = (char *)name,
      +		.update_index = update_index,
      +	};
      +	struct reftable_record rec = { 0 };
     @@ reftable/record.c (new)
      +	const struct reftable_ref_record *rec =
      +		(const struct reftable_ref_record *)r;
      +	strbuf_reset(dest);
     -+	strbuf_addstr(dest, rec->ref_name);
     ++	strbuf_addstr(dest, rec->refname);
      +}
      +
      +static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
     @@ reftable/record.c (new)
      +	/* This is simple and correct, but we could probably reuse the hash
      +	   fields. */
      +	reftable_ref_record_clear(ref);
     -+	if (src->ref_name != NULL) {
     -+		ref->ref_name = xstrdup(src->ref_name);
     ++	if (src->refname != NULL) {
     ++		ref->refname = xstrdup(src->refname);
      +	}
      +
      +	if (src->target != NULL) {
     @@ reftable/record.c (new)
      +			       uint32_t hash_id)
      +{
      +	char hex[SHA256_SIZE + 1] = { 0 };
     -+	printf("ref{%s(%" PRIu64 ") ", ref->ref_name, ref->update_index);
     ++	printf("ref{%s(%" PRIu64 ") ", ref->refname, ref->update_index);
      +	if (ref->value != NULL) {
      +		hex_format(hex, ref->value, hash_size(hash_id));
      +		printf("%s", hex);
     @@ reftable/record.c (new)
      +
      +void reftable_ref_record_clear(struct reftable_ref_record *ref)
      +{
     -+	reftable_free(ref->ref_name);
     ++	reftable_free(ref->refname);
      +	reftable_free(ref->target);
      +	reftable_free(ref->target_value);
      +	reftable_free(ref->value);
     @@ reftable/record.c (new)
      +
      +	string_view_consume(&in, n);
      +
     -+	r->ref_name = reftable_realloc(r->ref_name, key.len + 1);
     -+	memcpy(r->ref_name, key.buf, key.len);
     -+	r->ref_name[key.len] = 0;
     ++	r->refname = reftable_realloc(r->refname, key.len + 1);
     ++	memcpy(r->refname, key.buf, key.len);
     ++	r->refname[key.len] = 0;
      +
      +	switch (val_type) {
      +	case 1:
     @@ reftable/record.c (new)
      +{
      +	char hex[SHA256_SIZE + 1] = { 0 };
      +
     -+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->ref_name,
     ++	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->refname,
      +	       log->update_index, log->name, log->email, log->time,
      +	       log->tz_offset);
      +	hex_format(hex, log->old_hash, hash_size(hash_id));
     @@ reftable/record.c (new)
      +{
      +	const struct reftable_log_record *rec =
      +		(const struct reftable_log_record *)r;
     -+	int len = strlen(rec->ref_name);
     ++	int len = strlen(rec->refname);
      +	uint8_t i64[8];
      +	uint64_t ts = 0;
      +	strbuf_reset(dest);
     -+	strbuf_add(dest, (uint8_t *)rec->ref_name, len + 1);
     ++	strbuf_add(dest, (uint8_t *)rec->refname, len + 1);
      +
      +	ts = (~ts) - rec->update_index;
      +	put_be64(&i64[0], ts);
     @@ reftable/record.c (new)
      +
      +	reftable_log_record_clear(dst);
      +	*dst = *src;
     -+	if (dst->ref_name != NULL) {
     -+		dst->ref_name = xstrdup(dst->ref_name);
     ++	if (dst->refname != NULL) {
     ++		dst->refname = xstrdup(dst->refname);
      +	}
      +	if (dst->email != NULL) {
      +		dst->email = xstrdup(dst->email);
     @@ reftable/record.c (new)
      +
      +void reftable_log_record_clear(struct reftable_log_record *r)
      +{
     -+	reftable_free(r->ref_name);
     ++	reftable_free(r->refname);
      +	reftable_free(r->new_hash);
      +	reftable_free(r->old_hash);
      +	reftable_free(r->name);
     @@ reftable/record.c (new)
      +	if (key.len <= 9 || key.buf[key.len - 9] != 0)
      +		return REFTABLE_FORMAT_ERROR;
      +
     -+	r->ref_name = reftable_realloc(r->ref_name, key.len - 8);
     -+	memcpy(r->ref_name, key.buf, key.len - 8);
     ++	r->refname = reftable_realloc(r->refname, key.len - 8);
     ++	memcpy(r->refname, key.buf, key.len - 8);
      +	ts = get_be64(key.buf + key.len - 8);
      +
      +	r->update_index = (~max) - ts;
     @@ reftable/record.c (new)
      +			       struct reftable_ref_record *b, int hash_size)
      +{
      +	assert(hash_size > 0);
     -+	return 0 == strcmp(a->ref_name, b->ref_name) &&
     ++	return 0 == strcmp(a->refname, b->refname) &&
      +	       a->update_index == b->update_index &&
      +	       hash_equal(a->value, b->value, hash_size) &&
      +	       hash_equal(a->target_value, b->target_value, hash_size) &&
     @@ reftable/record.c (new)
      +
      +int reftable_ref_record_compare_name(const void *a, const void *b)
      +{
     -+	return strcmp(((struct reftable_ref_record *)a)->ref_name,
     -+		      ((struct reftable_ref_record *)b)->ref_name);
     ++	return strcmp(((struct reftable_ref_record *)a)->refname,
     ++		      ((struct reftable_ref_record *)b)->refname);
      +}
      +
      +bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
     @@ reftable/record.c (new)
      +	struct reftable_log_record *la = (struct reftable_log_record *)a;
      +	struct reftable_log_record *lb = (struct reftable_log_record *)b;
      +
     -+	int cmp = strcmp(la->ref_name, lb->ref_name);
     ++	int cmp = strcmp(la->refname, lb->refname);
      +	if (cmp)
      +		return cmp;
      +	if (la->update_index > lb->update_index)
     @@ reftable/record.h (new)
      +*/
      +struct string_view {
      +	uint8_t *buf;
     -+	int len;
     ++	size_t len;
      +};
      +
      +/* Advance `s.buf` by `n`, and decrease length. */
     @@ reftable/record_test.c (new)
      +	for (i = 0; i <= 3; i++) {
      +		struct reftable_ref_record in = { 0 };
      +		struct reftable_ref_record out = {
     -+			.ref_name = xstrdup("old name"),
     ++			.refname = xstrdup("old name"),
      +			.value = reftable_calloc(SHA1_SIZE),
      +			.target_value = reftable_calloc(SHA1_SIZE),
      +			.target = xstrdup("old value"),
     @@ reftable/record_test.c (new)
      +			in.target = xstrdup("target");
      +			break;
      +		}
     -+		in.ref_name = xstrdup("refs/heads/master");
     ++		in.refname = xstrdup("refs/heads/master");
      +
      +		reftable_record_from_ref(&rec, &in);
      +		test_copy(&rec);
     @@ reftable/record_test.c (new)
      +{
      +	struct reftable_log_record in[2] = {
      +		{
     -+			.ref_name = xstrdup("refs/heads/master"),
     ++			.refname = xstrdup("refs/heads/master"),
      +			.update_index = 42,
      +		},
      +		{
     -+			.ref_name = xstrdup("refs/heads/master"),
     ++			.refname = xstrdup("refs/heads/master"),
      +			.update_index = 22,
      +		}
      +	};
     @@ reftable/record_test.c (new)
      +{
      +	struct reftable_log_record in[2] = {
      +		{
     -+			.ref_name = xstrdup("refs/heads/master"),
     ++			.refname = xstrdup("refs/heads/master"),
      +			.old_hash = reftable_malloc(SHA1_SIZE),
      +			.new_hash = reftable_malloc(SHA1_SIZE),
      +			.name = xstrdup("han-wen"),
     @@ reftable/record_test.c (new)
      +			.tz_offset = 100,
      +		},
      +		{
     -+			.ref_name = xstrdup("refs/heads/master"),
     ++			.refname = xstrdup("refs/heads/master"),
      +			.update_index = 22,
      +		}
      +	};
     @@ reftable/record_test.c (new)
      +		};
      +		/* populate out, to check for leaks. */
      +		struct reftable_log_record out = {
     -+			.ref_name = xstrdup("old name"),
     ++			.refname = xstrdup("old name"),
      +			.new_hash = reftable_calloc(SHA1_SIZE),
      +			.old_hash = reftable_calloc(SHA1_SIZE),
      +			.name = xstrdup("old name"),
     @@ reftable/refname.c (new)
      +		if (mod->del_len > 0) {
      +			struct find_arg arg = {
      +				.names = mod->del,
     -+				.want = ref.ref_name,
     ++				.want = ref.refname,
      +			};
      +			int idx = binsearch(mod->del_len, find_name, &arg);
      +			if (idx < mod->del_len &&
     -+			    !strcmp(ref.ref_name, mod->del[idx])) {
     ++			    !strcmp(ref.refname, mod->del[idx])) {
      +				continue;
      +			}
      +		}
      +
     -+		if (strncmp(ref.ref_name, prefix, strlen(prefix))) {
     ++		if (strncmp(ref.refname, prefix, strlen(prefix))) {
      +			err = 1;
      +			goto done;
      +		}
     @@ reftable/refname.c (new)
      +	return err;
      +}
      +
     -+int validate_ref_name(const char *name)
     ++int validate_refname(const char *name)
      +{
      +	while (true) {
      +		char *next = strchr(name, '/');
     @@ reftable/refname.c (new)
      +	int err = 0;
      +	for (; i < sz; i++) {
      +		if (reftable_ref_record_is_deletion(&recs[i])) {
     -+			mod.del[mod.del_len++] = recs[i].ref_name;
     ++			mod.del[mod.del_len++] = recs[i].refname;
      +		} else {
     -+			mod.add[mod.add_len++] = recs[i].ref_name;
     ++			mod.add[mod.add_len++] = recs[i].refname;
      +		}
      +	}
      +
     @@ reftable/refname.c (new)
      +	int err = 0;
      +	int i = 0;
      +	for (; i < mod->add_len; i++) {
     -+		err = validate_ref_name(mod->add[i]);
     ++		err = validate_refname(mod->add[i]);
      +		if (err)
      +			goto done;
      +		strbuf_reset(&slashed);
     @@ reftable/refname.h (new)
      +				     const char *prefix);
      +
      +// 0 = OK.
     -+int validate_ref_name(const char *name);
     ++int validate_refname(const char *name);
      +
      +int validate_ref_record_addition(struct reftable_table tab,
      +				 struct reftable_ref_record *recs, size_t sz);
     @@ reftable/refname_test.c (new)
      +	struct reftable_writer *w =
      +		reftable_new_writer(&strbuf_add_void, &buf, &opts);
      +	struct reftable_ref_record rec = {
     -+		.ref_name = "a/b",
     ++		.refname = "a/b",
      +		.target = "destination", /* make sure it's not a symref. */
      +		.update_index = 1,
      +	};
     @@ reftable/reftable.c (new)
      +#include "merged.h"
      +
      +struct reftable_table_vtable {
     -+	int (*seek)(void *tab, struct reftable_iterator *it,
     -+		    struct reftable_record *);
     ++	int (*seek_record)(void *tab, struct reftable_iterator *it,
     ++			   struct reftable_record *);
     ++	uint32_t (*hash_id)(void *tab);
     ++	uint64_t (*min_update_index)(void *tab);
     ++	uint64_t (*max_update_index)(void *tab);
      +};
      +
      +static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
     @@ reftable/reftable.c (new)
      +	return reader_seek((struct reftable_reader *)tab, it, rec);
      +}
      +
     ++static uint32_t reftable_reader_hash_id_void(void *tab)
     ++{
     ++	return reftable_reader_hash_id((struct reftable_reader *)tab);
     ++}
     ++
     ++static uint64_t reftable_reader_min_update_index_void(void *tab)
     ++{
     ++	return reftable_reader_min_update_index((struct reftable_reader *)tab);
     ++}
     ++
     ++static uint64_t reftable_reader_max_update_index_void(void *tab)
     ++{
     ++	return reftable_reader_max_update_index((struct reftable_reader *)tab);
     ++}
     ++
      +static struct reftable_table_vtable reader_vtable = {
     -+	.seek = reftable_reader_seek_void,
     ++	.seek_record = reftable_reader_seek_void,
     ++	.hash_id = reftable_reader_hash_id_void,
     ++	.min_update_index = reftable_reader_min_update_index_void,
     ++	.max_update_index = reftable_reader_max_update_index_void,
      +};
      +
      +static int reftable_merged_table_seek_void(void *tab,
     @@ reftable/reftable.c (new)
      +					rec);
      +}
      +
     ++static uint32_t reftable_merged_table_hash_id_void(void *tab)
     ++{
     ++	return reftable_merged_table_hash_id(
     ++		(struct reftable_merged_table *)tab);
     ++}
     ++
     ++static uint64_t reftable_merged_table_min_update_index_void(void *tab)
     ++{
     ++	return reftable_merged_table_min_update_index(
     ++		(struct reftable_merged_table *)tab);
     ++}
     ++
     ++static uint64_t reftable_merged_table_max_update_index_void(void *tab)
     ++{
     ++	return reftable_merged_table_max_update_index(
     ++		(struct reftable_merged_table *)tab);
     ++}
     ++
      +static struct reftable_table_vtable merged_table_vtable = {
     -+	.seek = reftable_merged_table_seek_void,
     ++	.seek_record = reftable_merged_table_seek_void,
     ++	.hash_id = reftable_merged_table_hash_id_void,
     ++	.min_update_index = reftable_merged_table_min_update_index_void,
     ++	.max_update_index = reftable_merged_table_max_update_index_void,
      +};
      +
      +int reftable_table_seek_ref(struct reftable_table *tab,
      +			    struct reftable_iterator *it, const char *name)
      +{
      +	struct reftable_ref_record ref = {
     -+		.ref_name = (char *)name,
     ++		.refname = (char *)name,
      +	};
      +	struct reftable_record rec = { 0 };
      +	reftable_record_from_ref(&rec, &ref);
     -+	return tab->ops->seek(tab->table_arg, it, &rec);
     ++	return tab->ops->seek_record(tab->table_arg, it, &rec);
      +}
      +
      +void reftable_table_from_reader(struct reftable_table *tab,
     @@ reftable/reftable.c (new)
      +	if (err)
      +		goto done;
      +
     -+	if (strcmp(ref->ref_name, name) ||
     ++	if (strcmp(ref->refname, name) ||
      +	    reftable_ref_record_is_deletion(ref)) {
      +		reftable_ref_record_clear(ref);
      +		err = 1;
     @@ reftable/reftable.c (new)
      +done:
      +	reftable_iterator_destroy(&it);
      +	return err;
     ++}
     ++
     ++int reftable_table_seek_record(struct reftable_table *tab,
     ++			       struct reftable_iterator *it,
     ++			       struct reftable_record *rec)
     ++{
     ++	return tab->ops->seek_record(tab->table_arg, it, rec);
     ++}
     ++
     ++uint64_t reftable_table_max_update_index(struct reftable_table *tab)
     ++{
     ++	return tab->ops->max_update_index(tab->table_arg);
     ++}
     ++
     ++uint64_t reftable_table_min_update_index(struct reftable_table *tab)
     ++{
     ++	return tab->ops->min_update_index(tab->table_arg);
     ++}
     ++
     ++uint32_t reftable_table_hash_id(struct reftable_table *tab)
     ++{
     ++	return tab->ops->hash_id(tab->table_arg);
      +}
      
       ## reftable/reftable.h (new) ##
     @@ reftable/reftable.h (new)
      +
      +/* reftable_ref_record holds a ref database entry target_value */
      +struct reftable_ref_record {
     -+	char *ref_name; /* Name of the ref, malloced. */
     ++	char *refname; /* Name of the ref, malloced. */
      +	uint64_t update_index; /* Logical timestamp at which this value is
      +				  written */
      +	uint8_t *value; /* SHA1, or NULL. malloced. */
     @@ reftable/reftable.h (new)
      +
      +/* reftable_log_record holds a reflog entry */
      +struct reftable_log_record {
     -+	char *ref_name;
     ++	char *refname;
      +	uint64_t update_index; /* logical timestamp of a transactional update.
      +				*/
      +	uint8_t *new_hash;
     @@ reftable/reftable.h (new)
      +	REFTABLE_LOCK_ERROR = -5,
      +
      +	/* Misuse of the API:
     -+	   - on writing a record with NULL ref_name.
     ++	   - on writing a record with NULL refname.
      +	   - on writing a reftable_ref_record outside the table limits
      +	   - on writing a ref or log record before the stack's next_update_index
      +	   - on writing a log record with multiline message with
     @@ reftable/reftable.h (new)
      +/* A merged table is implements seeking/iterating over a stack of tables. */
      +struct reftable_merged_table;
      +
     ++/* A generic reftable; see below. */
     ++struct reftable_table;
     ++
      +/* reftable_new_merged_table creates a new merged table. It takes ownership of
      +   the stack array.
      +*/
      +int reftable_new_merged_table(struct reftable_merged_table **dest,
     -+			      struct reftable_reader **stack, int n,
     ++			      struct reftable_table *stack, int n,
      +			      uint32_t hash_id);
      +
      +/* returns an iterator positioned just before 'name' */
     @@ reftable/reftable.h (new)
      +uint64_t
      +reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
      +
     -+/* closes readers for the merged tables */
     -+void reftable_merged_table_close(struct reftable_merged_table *mt);
     -+
      +/* releases memory for the merged_table */
      +void reftable_merged_table_free(struct reftable_merged_table *m);
      +
     ++/* return the hash ID of the merged table. */
     ++uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *m);
     ++
      +/****************************************************************
      + Generic tables
      +
     @@ reftable/reftable.h (new)
      +
      +void reftable_table_from_reader(struct reftable_table *tab,
      +				struct reftable_reader *reader);
     ++
     ++/* returns the hash ID from a generic reftable_table */
     ++uint32_t reftable_table_hash_id(struct reftable_table *tab);
     ++
     ++/* create a generic table from reftable_merged_table */
      +void reftable_table_from_merged_table(struct reftable_table *tab,
      +				      struct reftable_merged_table *table);
      +
     ++/* returns the max update_index covered by this table. */
     ++uint64_t reftable_table_max_update_index(struct reftable_table *tab);
     ++
     ++/* returns the min update_index covered by this table. */
     ++uint64_t reftable_table_min_update_index(struct reftable_table *tab);
     ++
      +/* convenience function to read a single ref. Returns < 0 for error, 0
      +   for success, and 1 if ref not found. */
      +int reftable_table_read_ref(struct reftable_table *tab, const char *name,
     @@ reftable/reftable_test.c (new)
      +		reftable_new_writer(&strbuf_add_void, &buf, &opts);
      +
      +	struct reftable_ref_record rec = {
     -+		.ref_name = "master",
     ++		.refname = "master",
      +		.update_index = 1,
      +	};
      +	int err;
      +	struct reftable_block_source source = { 0 };
     -+	struct reftable_reader **readers = malloc(sizeof(*readers) * 1);
     ++	struct reftable_reader **readers =
     ++		reftable_calloc(sizeof(*readers) * 1);
     ++	struct reftable_table *tab = reftable_calloc(sizeof(*tab) * 1);
      +	uint32_t hash_id;
      +	struct reftable_reader *rd = NULL;
      +	struct reftable_merged_table *merged = NULL;
     @@ reftable/reftable_test.c (new)
      +	hash_id = reftable_reader_hash_id(rd);
      +	assert(hash_id == SHA1_ID);
      +
     -+	readers[0] = rd;
     -+
     -+	err = reftable_new_merged_table(&merged, readers, 1, SHA1_ID);
     ++	reftable_table_from_reader(&tab[0], rd);
     ++	err = reftable_new_merged_table(&merged, tab, 1, SHA1_ID);
      +	assert_err(err);
      +
     -+	reftable_merged_table_close(merged);
     ++	reader_close(rd);
      +	reftable_merged_table_free(merged);
      +	strbuf_release(&buf);
      +}
     @@ reftable/reftable_test.c (new)
      +
      +		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
      +
     -+		ref.ref_name = name;
     ++		ref.refname = name;
      +		ref.value = hash;
      +		ref.update_index = update_index;
      +		(*names)[i] = xstrdup(name);
     @@ reftable/reftable_test.c (new)
      +
      +		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
      +
     -+		log.ref_name = name;
     ++		log.refname = name;
      +		log.new_hash = hash;
      +		log.update_index = update_index;
      +		log.message = "message";
     @@ reftable/reftable_test.c (new)
      +	};
      +	int err;
      +	struct reftable_log_record log = {
     -+		.ref_name = "refs/heads/master",
     ++		.refname = "refs/heads/master",
      +		.name = "Han-Wen Nienhuys",
      +		.email = "hanwen@google.com",
      +		.tz_offset = 100,
     @@ reftable/reftable_test.c (new)
      +		snprintf(name, sizeof(name), "b%02d%0*d", i, 130, 7);
      +		names[i] = xstrdup(name);
      +		puts(name);
     -+		ref.ref_name = name;
     ++		ref.refname = name;
      +		ref.update_index = i;
      +
      +		err = reftable_writer_add_ref(w, &ref);
     @@ reftable/reftable_test.c (new)
      +		set_test_hash(hash1, i);
      +		set_test_hash(hash2, i + 1);
      +
     -+		log.ref_name = names[i];
     ++		log.refname = names[i];
      +		log.update_index = i;
      +		log.old_hash = hash1;
      +		log.new_hash = hash2;
     @@ reftable/reftable_test.c (new)
      +		}
      +
      +		assert_err(err);
     -+		assert_streq(names[i], log.ref_name);
     ++		assert_streq(names[i], log.refname);
      +		assert(i == log.update_index);
      +		i++;
      +		reftable_log_record_clear(&log);
     @@ reftable/reftable_test.c (new)
      +		if (r > 0) {
      +			break;
      +		}
     -+		assert(0 == strcmp(names[j], ref.ref_name));
     ++		assert(0 == strcmp(names[j], ref.refname));
      +		assert(update_index == ref.update_index);
      +
      +		j++;
     @@ reftable/reftable_test.c (new)
      +		assert_err(err);
      +		err = reftable_iterator_next_ref(&it, &ref);
      +		assert_err(err);
     -+		assert(0 == strcmp(names[i], ref.ref_name));
     ++		assert(0 == strcmp(names[i], ref.refname));
      +		assert(i == ref.value[0]);
      +
      +		reftable_ref_record_clear(&ref);
     @@ reftable/reftable_test.c (new)
      +		/* Put the variable part in the start */
      +		snprintf(name, sizeof(name), "br%02d%s", i, fill);
      +		name[40] = 0;
     -+		ref.ref_name = name;
     ++		ref.refname = name;
      +
      +		set_test_hash(hash1, i / 4);
      +		set_test_hash(hash2, 3 + i / 4);
     @@ reftable/reftable_test.c (new)
      +		}
      +
      +		assert(j < want_names_len);
     -+		assert(0 == strcmp(ref.ref_name, want_names[j]));
     ++		assert(0 == strcmp(ref.refname, want_names[j]));
      +		j++;
      +		reftable_ref_record_clear(&ref);
      +	}
     @@ reftable/stack.c (new)
      +void reftable_stack_destroy(struct reftable_stack *st)
      +{
      +	if (st->merged != NULL) {
     -+		reftable_merged_table_close(st->merged);
      +		reftable_merged_table_free(st->merged);
      +		st->merged = NULL;
      +	}
     ++
     ++	if (st->readers != NULL) {
     ++		int i = 0;
     ++		for (i = 0; i < st->readers_len; i++) {
     ++			reftable_reader_free(st->readers[i]);
     ++		}
     ++		st->readers_len = 0;
     ++		FREE_AND_NULL(st->readers);
     ++	}
      +	FREE_AND_NULL(st->list_file);
      +	FREE_AND_NULL(st->reftable_dir);
      +	reftable_free(st);
     @@ reftable/stack.c (new)
      +		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
      +	int i = 0;
      +	for (i = 0; i < cur_len; i++) {
     -+		cur[i] = st->merged->stack[i];
     ++		cur[i] = st->readers[i];
      +	}
      +	return cur;
      +}
     @@ reftable/stack.c (new)
      +	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
      +	int err = 0;
      +	int names_len = names_length(names);
     -+	struct reftable_reader **new_tables =
     -+		reftable_malloc(sizeof(struct reftable_reader *) * names_len);
     -+	int new_tables_len = 0;
     ++	struct reftable_reader **new_readers =
     ++		reftable_calloc(sizeof(struct reftable_reader *) * names_len);
     ++	struct reftable_table *new_tables =
     ++		reftable_calloc(sizeof(struct reftable_table) * names_len);
     ++	int new_readers_len = 0;
      +	struct reftable_merged_table *new_merged = NULL;
      +	int i;
      +
     @@ reftable/stack.c (new)
      +				goto done;
      +		}
      +
     -+		new_tables[new_tables_len++] = rd;
     ++		new_readers[new_readers_len] = rd;
     ++		reftable_table_from_reader(&new_tables[new_readers_len], rd);
     ++		new_readers_len++;
      +	}
      +
      +	/* success! */
     -+	err = reftable_new_merged_table(&new_merged, new_tables, new_tables_len,
     -+					st->config.hash_id);
     ++	err = reftable_new_merged_table(&new_merged, new_tables,
     ++					new_readers_len, st->config.hash_id);
      +	if (err < 0)
      +		goto done;
      +
      +	new_tables = NULL;
     -+	new_tables_len = 0;
     ++	st->readers_len = new_readers_len;
      +	if (st->merged != NULL) {
      +		merged_table_clear(st->merged);
      +		reftable_merged_table_free(st->merged);
      +	}
     ++	if (st->readers != NULL) {
     ++		reftable_free(st->readers);
     ++	}
     ++	st->readers = new_readers;
     ++	new_readers = NULL;
     ++	new_readers_len = 0;
     ++
      +	new_merged->suppress_deletions = true;
      +	st->merged = new_merged;
     -+
      +	for (i = 0; i < cur_len; i++) {
      +		if (cur[i] != NULL) {
      +			reader_close(cur[i]);
     @@ reftable/stack.c (new)
      +	}
      +
      +done:
     -+	for (i = 0; i < new_tables_len; i++) {
     -+		reader_close(new_tables[i]);
     -+		reftable_reader_free(new_tables[i]);
     ++	for (i = 0; i < new_readers_len; i++) {
     ++		reader_close(new_readers[i]);
     ++		reftable_reader_free(new_readers[i]);
      +	}
     ++	reftable_free(new_readers);
      +	reftable_free(new_tables);
      +	reftable_free(cur);
      +	return err;
     @@ reftable/stack.c (new)
      +	if (err < 0)
      +		return err;
      +
     -+	for (i = 0; i < st->merged->stack_len; i++) {
     ++	for (i = 0; i < st->readers_len; i++) {
      +		if (names[i] == NULL) {
      +			err = 1;
      +			goto done;
      +		}
      +
     -+		if (strcmp(st->merged->stack[i]->name, names[i])) {
     ++		if (strcmp(st->readers[i]->name, names[i])) {
      +			err = 1;
      +			goto done;
      +		}
     @@ reftable/stack.c (new)
      +		goto done;
      +
      +	for (i = 0; i < add->stack->merged->stack_len; i++) {
     -+		strbuf_addstr(&table_list, add->stack->merged->stack[i]->name);
     ++		strbuf_addstr(&table_list, add->stack->readers[i]->name);
      +		strbuf_addstr(&table_list, "\n");
      +	}
      +	for (i = 0; i < add->new_tables_len; i++) {
     @@ reftable/stack.c (new)
      +{
      +	int sz = st->merged->stack_len;
      +	if (sz > 0)
     -+		return reftable_reader_max_update_index(
     -+			       st->merged->stack[sz - 1]) +
     ++		return reftable_reader_max_update_index(st->readers[sz - 1]) +
      +		       1;
      +	return 1;
      +}
     @@ reftable/stack.c (new)
      +	int err = 0;
      +
      +	format_name(&next_name,
     -+		    reftable_reader_min_update_index(st->merged->stack[first]),
     -+		    reftable_reader_max_update_index(st->merged->stack[first]));
     ++		    reftable_reader_min_update_index(st->readers[first]),
     ++		    reftable_reader_max_update_index(st->readers[first]));
      +
      +	strbuf_reset(temp_tab);
      +	strbuf_addstr(temp_tab, st->reftable_dir);
     @@ reftable/stack.c (new)
      +			struct reftable_log_expiry_config *config)
      +{
      +	int subtabs_len = last - first + 1;
     -+	struct reftable_reader **subtabs = reftable_calloc(
     -+		sizeof(struct reftable_reader *) * (last - first + 1));
     ++	struct reftable_table *subtabs = reftable_calloc(
     ++		sizeof(struct reftable_table) * (last - first + 1));
      +	struct reftable_merged_table *mt = NULL;
      +	int err = 0;
      +	struct reftable_iterator it = { 0 };
     @@ reftable/stack.c (new)
      +
      +	int i = 0, j = 0;
      +	for (i = first, j = 0; i <= last; i++) {
     -+		struct reftable_reader *t = st->merged->stack[i];
     -+		subtabs[j++] = t;
     ++		struct reftable_reader *t = st->readers[i];
     ++		reftable_table_from_reader(&subtabs[j++], t);
      +		st->stats.bytes += t->size;
      +	}
     -+	reftable_writer_set_limits(wr,
     -+				   st->merged->stack[first]->min_update_index,
     -+				   st->merged->stack[last]->max_update_index);
     ++	reftable_writer_set_limits(wr, st->readers[first]->min_update_index,
     ++				   st->readers[last]->max_update_index);
      +
      +	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
      +					st->config.hash_id);
     @@ reftable/stack.c (new)
      +
      +		strbuf_addstr(&subtab_file_name, st->reftable_dir);
      +		strbuf_addstr(&subtab_file_name, "/");
     -+		strbuf_addstr(&subtab_file_name,
     -+			      reader_name(st->merged->stack[i]));
     ++		strbuf_addstr(&subtab_file_name, reader_name(st->readers[i]));
      +
      +		strbuf_reset(&subtab_lock);
      +		strbuf_addbuf(&subtab_lock, &subtab_file_name);
     @@ reftable/stack.c (new)
      +	}
      +	have_lock = true;
      +
     -+	format_name(&new_table_name, st->merged->stack[first]->min_update_index,
     -+		    st->merged->stack[last]->max_update_index);
     ++	format_name(&new_table_name, st->readers[first]->min_update_index,
     ++		    st->readers[last]->max_update_index);
      +	strbuf_addstr(&new_table_name, ".ref");
      +
      +	strbuf_reset(&new_table_path);
     @@ reftable/stack.c (new)
      +	}
      +
      +	for (i = 0; i < first; i++) {
     -+		strbuf_addstr(&ref_list_contents, st->merged->stack[i]->name);
     ++		strbuf_addstr(&ref_list_contents, st->readers[i]->name);
      +		strbuf_addstr(&ref_list_contents, "\n");
      +	}
      +	if (!is_empty_table) {
     @@ reftable/stack.c (new)
      +		strbuf_addstr(&ref_list_contents, "\n");
      +	}
      +	for (i = last + 1; i < st->merged->stack_len; i++) {
     -+		strbuf_addstr(&ref_list_contents, st->merged->stack[i]->name);
     ++		strbuf_addstr(&ref_list_contents, st->readers[i]->name);
      +		strbuf_addstr(&ref_list_contents, "\n");
      +	}
      +
     @@ reftable/stack.c (new)
      +	int overhead = header_size(version) - 1;
      +	int i = 0;
      +	for (i = 0; i < st->merged->stack_len; i++) {
     -+		sizes[i] = st->merged->stack[i]->size - overhead;
     ++		sizes[i] = st->readers[i]->size - overhead;
      +	}
      +	return sizes;
      +}
     @@ reftable/stack.c (new)
      +	if (err)
      +		goto done;
      +
     -+	if (strcmp(log->ref_name, refname) ||
     ++	if (strcmp(log->refname, refname) ||
      +	    reftable_log_record_is_deletion(log)) {
      +		err = 1;
      +		goto done;
     @@ reftable/stack.h (new)
      +
      +	struct reftable_write_options config;
      +
     ++	struct reftable_reader **readers;
     ++	size_t readers_len;
      +	struct reftable_merged_table *merged;
      +	struct reftable_compaction_stats stats;
      +};
     @@ reftable/stack_test.c (new)
      +	struct reftable_stack *st = NULL;
      +	int err;
      +	struct reftable_ref_record ref = {
     -+		.ref_name = "HEAD",
     ++		.refname = "HEAD",
      +		.update_index = 1,
      +		.target = "master",
      +	};
     @@ reftable/stack_test.c (new)
      +	err = reftable_stack_add(st, &write_test_ref, &ref);
      +	assert_err(err);
      +
     -+	err = reftable_stack_read_ref(st, ref.ref_name, &dest);
     ++	err = reftable_stack_read_ref(st, ref.refname, &dest);
      +	assert_err(err);
      +	assert(0 == strcmp("master", dest.target));
      +
     @@ reftable/stack_test.c (new)
      +	char dir[256] = "/tmp/stack_test.XXXXXX";
      +	int err;
      +	struct reftable_ref_record ref1 = {
     -+		.ref_name = "HEAD",
     ++		.refname = "HEAD",
      +		.update_index = 1,
      +		.target = "master",
      +	};
      +	struct reftable_ref_record ref2 = {
     -+		.ref_name = "branch2",
     ++		.refname = "branch2",
      +		.update_index = 2,
      +		.target = "master",
      +	};
     @@ reftable/stack_test.c (new)
      +	struct reftable_addition *add = NULL;
      +
      +	struct reftable_ref_record ref = {
     -+		.ref_name = "HEAD",
     ++		.refname = "HEAD",
      +		.update_index = 1,
      +		.target = "master",
      +	};
     @@ reftable/stack_test.c (new)
      +
      +	reftable_addition_destroy(add);
      +
     -+	err = reftable_stack_read_ref(st, ref.ref_name, &dest);
     ++	err = reftable_stack_read_ref(st, ref.refname, &dest);
      +	assert_err(err);
      +	assert(0 == strcmp("master", dest.target));
      +
     @@ reftable/stack_test.c (new)
      +	char dir[256] = "/tmp/stack_test.XXXXXX";
      +	int i;
      +	struct reftable_ref_record ref = {
     -+		.ref_name = "a/b",
     ++		.refname = "a/b",
      +		.update_index = 1,
      +		.target = "master",
      +	};
     @@ reftable/stack_test.c (new)
      +
      +	for (i = 0; i < ARRAY_SIZE(additions); i++) {
      +		struct reftable_ref_record ref = {
     -+			.ref_name = additions[i],
     ++			.refname = additions[i],
      +			.update_index = 1,
      +			.target = "master",
      +		};
     @@ reftable/stack_test.c (new)
      +	struct reftable_stack *st = NULL;
      +	int err;
      +	struct reftable_ref_record ref1 = {
     -+		.ref_name = "name1",
     ++		.refname = "name1",
      +		.update_index = 1,
      +		.target = "master",
      +	};
      +	struct reftable_ref_record ref2 = {
     -+		.ref_name = "name2",
     ++		.refname = "name2",
      +		.update_index = 1,
      +		.target = "master",
      +	};
     @@ reftable/stack_test.c (new)
      +	for (i = 0; i < N; i++) {
      +		char buf[256];
      +		snprintf(buf, sizeof(buf), "branch%02d", i);
     -+		refs[i].ref_name = xstrdup(buf);
     ++		refs[i].refname = xstrdup(buf);
      +		refs[i].value = reftable_malloc(SHA1_SIZE);
      +		refs[i].update_index = i + 1;
      +		set_test_hash(refs[i].value, i);
      +
     -+		logs[i].ref_name = xstrdup(buf);
     ++		logs[i].refname = xstrdup(buf);
      +		logs[i].update_index = N + i + 1;
      +		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
      +		logs[i].email = xstrdup("identity@invalid");
     @@ reftable/stack_test.c (new)
      +
      +	for (i = 0; i < N; i++) {
      +		struct reftable_ref_record dest = { 0 };
     -+		int err = reftable_stack_read_ref(st, refs[i].ref_name, &dest);
     ++		int err = reftable_stack_read_ref(st, refs[i].refname, &dest);
      +		assert_err(err);
      +		assert(reftable_ref_record_equal(&dest, refs + i, SHA1_SIZE));
      +		reftable_ref_record_clear(&dest);
     @@ reftable/stack_test.c (new)
      +
      +	for (i = 0; i < N; i++) {
      +		struct reftable_log_record dest = { 0 };
     -+		int err = reftable_stack_read_log(st, refs[i].ref_name, &dest);
     ++		int err = reftable_stack_read_log(st, refs[i].refname, &dest);
      +		assert_err(err);
      +		assert(reftable_log_record_equal(&dest, logs + i, SHA1_SIZE));
      +		reftable_log_record_clear(&dest);
     @@ reftable/stack_test.c (new)
      +	uint8_t h1[SHA1_SIZE] = { 0x01 }, h2[SHA1_SIZE] = { 0x02 };
      +
      +	struct reftable_log_record input = {
     -+		.ref_name = "branch",
     ++		.refname = "branch",
      +		.update_index = 1,
      +		.new_hash = h1,
      +		.old_hash = h2,
     @@ reftable/stack_test.c (new)
      +	err = reftable_stack_add(st, &write_test_log, &arg);
      +	assert_err(err);
      +
     -+	err = reftable_stack_read_log(st, input.ref_name, &dest);
     ++	err = reftable_stack_read_log(st, input.refname, &dest);
      +	assert_err(err);
      +	assert(0 == strcmp(dest.message, "one\n"));
      +
     @@ reftable/stack_test.c (new)
      +	arg.update_index = 2;
      +	err = reftable_stack_add(st, &write_test_log, &arg);
      +	assert_err(err);
     -+	err = reftable_stack_read_log(st, input.ref_name, &dest);
     ++	err = reftable_stack_read_log(st, input.refname, &dest);
      +	assert_err(err);
      +	assert(0 == strcmp(dest.message, "two\n"));
      +
     @@ reftable/stack_test.c (new)
      +
      +	for (i = 0; i < N; i++) {
      +		const char *buf = "branch";
     -+		refs[i].ref_name = xstrdup(buf);
     ++		refs[i].refname = xstrdup(buf);
      +		refs[i].update_index = i + 1;
      +		if (i % 2 == 0) {
      +			refs[i].value = reftable_malloc(SHA1_SIZE);
      +			set_test_hash(refs[i].value, i);
      +		}
     -+		logs[i].ref_name = xstrdup(buf);
     ++		logs[i].refname = xstrdup(buf);
      +		/* update_index is part of the key. */
      +		logs[i].update_index = 42;
      +		if (i % 2 == 0) {
     @@ reftable/stack_test.c (new)
      +	int err;
      +
      +	struct reftable_ref_record ref = {
     -+		.ref_name = "master",
     ++		.refname = "master",
      +		.target = "target",
      +		.update_index = 1,
      +	};
     @@ reftable/stack_test.c (new)
      +		char buf[256];
      +		snprintf(buf, sizeof(buf), "branch%02d", i);
      +
     -+		logs[i].ref_name = xstrdup(buf);
     ++		logs[i].refname = xstrdup(buf);
      +		logs[i].update_index = i;
      +		logs[i].time = i;
      +		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
     @@ reftable/stack_test.c (new)
      +	err = reftable_stack_compact_all(st, &expiry);
      +	assert_err(err);
      +
     -+	err = reftable_stack_read_log(st, logs[9].ref_name, &log);
     ++	err = reftable_stack_read_log(st, logs[9].refname, &log);
      +	assert(err == 1);
      +
     -+	err = reftable_stack_read_log(st, logs[11].ref_name, &log);
     ++	err = reftable_stack_read_log(st, logs[11].refname, &log);
      +	assert_err(err);
      +
      +	expiry.min_update_index = 15;
      +	err = reftable_stack_compact_all(st, &expiry);
      +	assert_err(err);
      +
     -+	err = reftable_stack_read_log(st, logs[14].ref_name, &log);
     ++	err = reftable_stack_read_log(st, logs[14].refname, &log);
      +	assert(err == 1);
      +
     -+	err = reftable_stack_read_log(st, logs[16].ref_name, &log);
     ++	err = reftable_stack_read_log(st, logs[16].refname, &log);
      +	assert_err(err);
      +
      +	/* cleanup */
     @@ reftable/stack_test.c (new)
      +	for (i = 0; i < N; i++) {
      +		char name[100];
      +		struct reftable_ref_record ref = {
     -+			.ref_name = name,
     ++			.refname = name,
      +			.update_index = reftable_stack_next_update_index(st),
      +			.target = "master",
      +		};
     @@ reftable/strbuf.h (new)
      +  x = STRBUF_INIT;"
      + */
      +struct strbuf {
     -+	int len;
     -+	int cap;
     ++	size_t len;
     ++	size_t cap;
      +	char *buf;
      +
      +	/* Used to enforce initialization with STRBUF_INIT */
     @@ reftable/writer.c (new)
      +struct obj_index_tree_node {
      +	struct strbuf hash;
      +	uint64_t *offsets;
     -+	int offset_len;
     -+	int offset_cap;
     ++	size_t offset_len;
     ++	size_t offset_cap;
      +};
     ++
      +#define OBJ_INDEX_TREE_NODE_INIT    \
      +	{                           \
      +		.hash = STRBUF_INIT \
     @@ reftable/writer.c (new)
      +	struct reftable_ref_record copy = *ref;
      +	int err = 0;
      +
     -+	if (ref->ref_name == NULL)
     ++	if (ref->refname == NULL)
      +		return REFTABLE_API_ERROR;
      +	if (ref->update_index < w->min_update_index ||
      +	    ref->update_index > w->max_update_index)
     @@ reftable/writer.c (new)
      +	char *input_log_message = log->message;
      +	struct strbuf cleaned_message = STRBUF_INIT;
      +	int err;
     -+	if (log->ref_name == NULL)
     ++	if (log->refname == NULL)
      +		return REFTABLE_API_ERROR;
      +
      +	if (w->block_writer != NULL &&
     @@ reftable/writer.h (new)
      +
      +	/* pending index records for the current section */
      +	struct reftable_index_record *index;
     -+	int index_len;
     -+	int index_cap;
     ++	size_t index_len;
     ++	size_t index_cap;
      +
      +	/*
      +	  tree for use with tsearch; used to populate the 'o' inverse OID
 12:  c92b8d12ec = 13:  d155240b16 Add standalone build infrastructure for reftable
 13:  479fe884e9 ! 14:  073bff7279 Reftable support for git-core
     @@ Commit message
      
          For background, see the previous commit introducing the library.
      
     -    This introduces the refs/reftable-backend.c containing reftable powered ref
     -    storage backend.
     +    This introduces the file refs/reftable-backend.c containing a reftable-powered
     +    ref storage backend.
      
     -    It can be activated by passing --ref-storage=reftable to "git init".
     +    It can be activated by passing --ref-storage=reftable to "git init", or setting
     +    GIT_TEST_REFTABLE in the environment.
      
          Example use: see t/t0031-reftable.sh
      
     @@ Makefile: cocciclean:
      
       ## builtin/clone.c ##
      @@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
     - 		}
       	}
       
     --	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, INIT_DB_QUIET);
     -+	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN,
     + 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, NULL,
     +-		INIT_DB_QUIET);
      +		default_ref_storage(), INIT_DB_QUIET);
       
       	if (real_git_dir)
       		git_dir = real_git_dir;
     +@@ builtin/clone.c: int cmd_clone(int argc, const char **argv, const char *prefix)
     + 		 * Now that we know what algorithm the remote side is using,
     + 		 * let's set ours to the same thing.
     + 		 */
     +-		initialize_repository_version(hash_algo);
     ++		initialize_repository_version(hash_algo, default_ref_storage());
     + 		repo_set_hash_algo(the_repository, hash_algo);
     + 
     + 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
      
       ## builtin/init-db.c ##
      @@ builtin/init-db.c: static int needs_work_tree_config(const char *git_dir, const char *work_tree)
     @@ builtin/init-db.c: static int create_default_files(const char *template_path,
       	 * We need to create a "refs" dir in any case so that older
       	 * versions of git can tell that this is a repository.
      @@ builtin/init-db.c: static int create_default_files(const char *template_path,
     - 	 * Create the default symlink from ".git/HEAD" to the "master"
     - 	 * branch, if it does not exist yet.
     + 	 * Point the HEAD symref to the initial branch with if HEAD does
     + 	 * not yet exist.
       	 */
      -	path = git_path_buf(&buf, "HEAD");
      -	reinit = (!access(path, R_OK)
      -		  || readlink(path, junk, sizeof(junk)-1) != -1);
       	if (!reinit) {
     - 		if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
     - 			exit(1);
     + 		char *ref;
     + 
     +@@ builtin/init-db.c: static int create_default_files(const char *template_path,
     + 		free(ref);
       	}
       
      -	initialize_repository_version(fmt->hash_algo);
     @@ builtin/init-db.c: static int create_default_files(const char *template_path,
       	/* Check filemode trustability */
       	path = git_path_buf(&buf, "config");
      @@ builtin/init-db.c: static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
     - }
       
       int init_db(const char *git_dir, const char *real_git_dir,
     --	    const char *template_dir, int hash, unsigned int flags)
     -+	    const char *template_dir, int hash, const char *ref_storage_format,
     -+	    unsigned int flags)
     + 	    const char *template_dir, int hash, const char *initial_branch,
     +-	    unsigned int flags)
     ++	    const char *ref_storage_format, unsigned int flags)
       {
       	int reinit;
       	int exist_ok = flags & INIT_DB_EXIST_OK;
     @@ builtin/init-db.c: static const char *const init_db_usage[] = {
       	const char *work_tree;
       	const char *template_dir = NULL;
      @@ builtin/init-db.c: int cmd_init_db(int argc, const char **argv, const char *prefix)
     - 	const char *object_format = NULL;
     + 	const char *initial_branch = NULL;
       	int hash_algo = GIT_HASH_UNKNOWN;
       	const struct option init_db_options[] = {
      -		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
     @@ builtin/init-db.c: int cmd_init_db(int argc, const char **argv, const char *pref
      +			   N_("the ref storage format to use")),
       		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
       			   N_("separate git dir from working tree")),
     - 		OPT_STRING(0, "object-format", &object_format, N_("hash"),
     + 		OPT_STRING('b', "initial-branch", &initial_branch, N_("name"),
      @@ builtin/init-db.c: int cmd_init_db(int argc, const char **argv, const char *prefix)
       	}
       
     @@ builtin/init-db.c: int cmd_init_db(int argc, const char **argv, const char *pref
       	UNLEAK(work_tree);
       
       	flags |= INIT_DB_EXIST_OK;
     --	return init_db(git_dir, real_git_dir, template_dir, hash_algo, flags);
     -+	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
     -+		       ref_storage_format, flags);
     + 	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
     +-		       initial_branch, flags);
     ++		       initial_branch, ref_storage_format, flags);
       }
      
     + ## builtin/worktree.c ##
     +@@
     + #include "submodule.h"
     + #include "utf8.h"
     + #include "worktree.h"
     ++#include "../refs/refs-internal.h"
     + 
     + static const char * const worktree_usage[] = {
     + 	N_("git worktree add [<options>] <path> [<commit-ish>]"),
     +@@ builtin/worktree.c: static int add_worktree(const char *path, const char *refname,
     + 	 * worktree.
     + 	 */
     + 	strbuf_reset(&sb);
     +-	strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
     +-	write_file(sb.buf, "%s", oid_to_hex(&null_oid));
     +-	strbuf_reset(&sb);
     ++	if (get_main_ref_store(the_repository)->be == &refs_be_reftable) {
     ++		/* XXX this is cut & paste from reftable_init_db. */
     ++		strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
     ++		write_file(sb.buf, "%s", "ref: refs/heads/.invalid\n");
     ++		strbuf_reset(&sb);
     ++
     ++		strbuf_addf(&sb, "%s/refs", sb_repo.buf);
     ++		safe_create_dir(sb.buf, 1);
     ++		strbuf_reset(&sb);
     ++
     ++		strbuf_addf(&sb, "%s/refs/heads", sb_repo.buf);
     ++		write_file(sb.buf, "this repository uses the reftable format");
     ++		strbuf_reset(&sb);
     ++
     ++		strbuf_addf(&sb, "%s/reftable", sb_repo.buf);
     ++		safe_create_dir(sb.buf, 1);
     ++		strbuf_reset(&sb);
     ++	} else {
     ++		strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
     ++		write_file(sb.buf, "%s", oid_to_hex(&null_oid));
     ++		strbuf_reset(&sb);
     ++	}
     ++
     + 	strbuf_addf(&sb, "%s/commondir", sb_repo.buf);
     + 	write_file(sb.buf, "../..");
     + 
     +
       ## cache.h ##
      @@ cache.h: int path_inside_repo(const char *prefix, const char *path);
     + #define INIT_DB_EXIST_OK 0x0002
       
       int init_db(const char *git_dir, const char *real_git_dir,
     - 	    const char *template_dir, int hash_algo,
     --	    unsigned int flags);
     +-	    const char *template_dir, int hash_algo,
     +-	    const char *initial_branch, unsigned int flags);
      -void initialize_repository_version(int hash_algo);
     ++	    const char *template_dir, int hash_algo, const char *initial_branch,
      +	    const char *ref_storage_format, unsigned int flags);
      +void initialize_repository_version(int hash_algo,
      +				   const char *ref_storage_format);
     @@ cache.h: struct repository_format {
      
       ## refs.c ##
      @@
     - #include "argv-array.h"
       #include "repository.h"
     + #include "sigchain.h"
       
      +const char *default_ref_storage(void)
      +{
     @@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
      -	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
      +	r->refs_private = ref_store_init(r->gitdir,
      +					 r->ref_storage_format ?
     -+						 r->ref_storage_format :
     -+						 default_ref_storage(),
     ++						       r->ref_storage_format :
     ++						       default_ref_storage(),
      +					 REF_STORE_ALL_CAPS);
       	return r->refs_private;
       }
     @@ refs/refs-internal.h: struct ref_storage_be {
       ## refs/reftable-backend.c (new) ##
      @@
      +#include "../cache.h"
     ++#include "../chdir-notify.h"
      +#include "../config.h"
     -+#include "../refs.h"
     -+#include "refs-internal.h"
      +#include "../iterator.h"
      +#include "../lockfile.h"
     -+#include "../chdir-notify.h"
     -+
     ++#include "../refs.h"
      +#include "../reftable/reftable.h"
     ++#include "../worktree.h"
     ++#include "refs-internal.h"
      +
      +extern struct ref_storage_be refs_be_reftable;
      +
     @@ refs/reftable-backend.c (new)
      +
      +	int err;
      +	char *repo_dir;
     ++
      +	char *reftable_dir;
     -+	struct reftable_stack *stack;
     ++	char *worktree_reftable_dir;
     ++
     ++	struct reftable_stack *main_stack;
     ++	struct reftable_stack *worktree_stack;
      +};
      +
     -+static int reftable_read_raw_ref(struct ref_store *ref_store,
     -+				 const char *refname, struct object_id *oid,
     -+				 struct strbuf *referent, unsigned int *type);
     ++static struct reftable_stack *stack_for(struct git_reftable_ref_store *store,
     ++					const char *refname)
     ++{
     ++	if (store->worktree_stack == NULL)
     ++		return store->main_stack;
     ++
     ++	switch (ref_type(refname)) {
     ++	case REF_TYPE_PER_WORKTREE:
     ++	case REF_TYPE_PSEUDOREF:
     ++	case REF_TYPE_OTHER_PSEUDOREF:
     ++		return store->worktree_stack;
     ++	default:
     ++	case REF_TYPE_MAIN_PSEUDOREF:
     ++	case REF_TYPE_NORMAL:
     ++		return store->main_stack;
     ++	}
     ++}
     ++
     ++static int git_reftable_read_raw_ref(struct ref_store *ref_store,
     ++				     const char *refname, struct object_id *oid,
     ++				     struct strbuf *referent,
     ++				     unsigned int *type);
      +
      +static void clear_reftable_log_record(struct reftable_log_record *log)
      +{
      +	log->old_hash = NULL;
      +	log->new_hash = NULL;
      +	log->message = NULL;
     -+	log->ref_name = NULL;
     ++	log->refname = NULL;
      +	reftable_log_record_clear(log);
      +}
      +
     @@ refs/reftable-backend.c (new)
      +	log->tz_offset = sign * atoi(split.tz_begin);
      +}
      +
     ++static int has_suffix(struct strbuf *b, const char *suffix)
     ++{
     ++	size_t len = strlen(suffix);
     ++
     ++	if (len > b->len) {
     ++		return 0;
     ++	}
     ++
     ++	return 0 == strncmp(b->buf + b->len - len, suffix, len);
     ++}
     ++
     ++static int trim_component(struct strbuf *b)
     ++{
     ++	char *last;
     ++	last = strrchr(b->buf, '/');
     ++	if (!last)
     ++		return -1;
     ++	strbuf_setlen(b, last - b->buf);
     ++	return 0;
     ++}
     ++
     ++static int is_worktree(struct strbuf *b)
     ++{
     ++	if (trim_component(b) < 0) {
     ++		return 0;
     ++	}
     ++	if (!has_suffix(b, "/worktrees")) {
     ++		return 0;
     ++	}
     ++	trim_component(b);
     ++	return 1;
     ++}
     ++
      +static struct ref_store *git_reftable_ref_store_create(const char *path,
      +						       unsigned int store_flags)
      +{
     @@ refs/reftable-backend.c (new)
      +		.hash_id = the_hash_algo->format_id,
      +	};
      +	struct strbuf sb = STRBUF_INIT;
     ++	const char *gitdir = path;
     ++	struct strbuf wt_buf = STRBUF_INIT;
     ++	int wt = 0;
     ++
     ++	strbuf_addstr(&wt_buf, path);
     ++
     ++	/* this is clumsy, but the official worktree functions (eg.
     ++	 * get_worktrees()) function will try to initialize a ref storage
     ++	 * backend, leading to infinite recursion.  */
     ++	wt = is_worktree(&wt_buf);
     ++	if (wt) {
     ++		gitdir = wt_buf.buf;
     ++	}
      +
      +	base_ref_store_init(ref_store, &refs_be_reftable);
      +	refs->store_flags = store_flags;
     -+	refs->repo_dir = xstrdup(path);
     -+	strbuf_addf(&sb, "%s/reftable", path);
     ++	refs->repo_dir = xstrdup(gitdir);
     ++	strbuf_addf(&sb, "%s/reftable", gitdir);
      +	refs->reftable_dir = xstrdup(sb.buf);
      +	strbuf_reset(&sb);
      +
     -+	refs->err = reftable_new_stack(&refs->stack, refs->reftable_dir, cfg);
     ++	refs->err =
     ++		reftable_new_stack(&refs->main_stack, refs->reftable_dir, cfg);
      +	assert(refs->err != REFTABLE_API_ERROR);
     ++
     ++	if (refs->err == 0 && wt) {
     ++		strbuf_addf(&sb, "%s/reftable", path);
     ++		refs->worktree_reftable_dir = xstrdup(sb.buf);
     ++
     ++		refs->err = reftable_new_stack(&refs->worktree_stack,
     ++					       refs->worktree_reftable_dir,
     ++					       cfg);
     ++		assert(refs->err != REFTABLE_API_ERROR);
     ++	}
     ++
      +	strbuf_release(&sb);
     ++	strbuf_release(&wt_buf);
      +	return ref_store;
      +}
      +
     -+static int reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
     ++static int git_reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
      +{
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
      +	struct strbuf sb = STRBUF_INIT;
      +
      +	safe_create_dir(refs->reftable_dir, 1);
     ++	assert(refs->worktree_reftable_dir == NULL);
      +
      +	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
      +	write_file(sb.buf, "ref: refs/heads/.invalid");
     @@ refs/reftable-backend.c (new)
      +	struct reftable_ref_record ref;
      +	struct object_id oid;
      +	struct ref_store *ref_store;
     ++
     ++	/* In case we must iterate over 2 stacks, this is non-null. */
     ++	struct reftable_merged_table *merged;
      +	unsigned int flags;
      +	int err;
      +	const char *prefix;
     @@ refs/reftable-backend.c (new)
      +		   a PSEUDOREF, but a PER_WORKTREE, b/c each worktree can have
      +		   its own HEAD.
      +		 */
     -+		ri->base.refname = ri->ref.ref_name;
     ++		ri->base.refname = ri->ref.refname;
      +		if (ri->prefix != NULL &&
     -+		    strncmp(ri->prefix, ri->ref.ref_name, strlen(ri->prefix))) {
     ++		    strncmp(ri->prefix, ri->ref.refname, strlen(ri->prefix))) {
      +			ri->err = 1;
      +			break;
      +		}
     @@ refs/reftable-backend.c (new)
      +		} else if (ri->ref.target != NULL) {
      +			int out_flags = 0;
      +			const char *resolved = refs_resolve_ref_unsafe(
     -+				ri->ref_store, ri->ref.ref_name,
     ++				ri->ref_store, ri->ref.refname,
      +				RESOLVE_REF_READING, &ri->oid, &out_flags);
      +			ri->base.flags = out_flags;
      +			if (resolved == NULL &&
     @@ refs/reftable-backend.c (new)
      +		(struct git_reftable_iterator *)ref_iterator;
      +	reftable_ref_record_clear(&ri->ref);
      +	reftable_iterator_destroy(&ri->iter);
     ++	if (ri->merged) {
     ++		reftable_merged_table_free(ri->merged);
     ++	}
      +	return 0;
      +}
      +
     @@ refs/reftable-backend.c (new)
      +};
      +
      +static struct ref_iterator *
     -+reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
     -+			    unsigned int flags)
     ++git_reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
     ++				unsigned int flags)
      +{
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
      +	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
     -+	struct reftable_merged_table *mt = NULL;
      +
      +	if (refs->err < 0) {
      +		ri->err = refs->err;
     -+	} else {
     -+		mt = reftable_stack_merged_table(refs->stack);
     ++	} else if (refs->worktree_stack == NULL) {
     ++		struct reftable_merged_table *mt =
     ++			reftable_stack_merged_table(refs->main_stack);
      +		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
     ++	} else {
     ++		struct reftable_merged_table *mt1 =
     ++			reftable_stack_merged_table(refs->main_stack);
     ++		struct reftable_merged_table *mt2 =
     ++			reftable_stack_merged_table(refs->worktree_stack);
     ++		struct reftable_table *tabs =
     ++			xcalloc(2, sizeof(struct reftable_table));
     ++		reftable_table_from_merged_table(&tabs[0], mt1);
     ++		reftable_table_from_merged_table(&tabs[1], mt2);
     ++		ri->err = reftable_new_merged_table(&ri->merged, tabs, 2,
     ++						    the_hash_algo->format_id);
     ++		if (ri->err == 0)
     ++			ri->err = reftable_merged_table_seek_ref(
     ++				ri->merged, &ri->iter, prefix);
      +	}
      +
      +	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
     @@ refs/reftable-backend.c (new)
      +		struct ref_update *update = transaction->updates[i];
      +		struct object_id old_oid;
      +
     -+		err = reftable_read_raw_ref(ref_store, update->refname,
     -+					    &old_oid, &referent,
     -+					    /* mutate input, like
     -+					       files-backend.c */
     -+					    &update->type);
     ++		err = git_reftable_read_raw_ref(ref_store, update->refname,
     ++						&old_oid, &referent,
     ++						/* mutate input, like
     ++						   files-backend.c */
     ++						&update->type);
      +		if (err < 0 && errno == ENOENT &&
      +		    is_null_oid(&update->old_oid)) {
      +			err = 0;
     @@ refs/reftable-backend.c (new)
      +	return err;
      +}
      +
     -+static int reftable_transaction_prepare(struct ref_store *ref_store,
     -+					struct ref_transaction *transaction,
     -+					struct strbuf *errbuf)
     ++static int git_reftable_transaction_prepare(struct ref_store *ref_store,
     ++					    struct ref_transaction *transaction,
     ++					    struct strbuf *errbuf)
      +{
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
      +	struct reftable_addition *add = NULL;
     ++	struct reftable_stack *stack =
     ++		transaction->nr ?
     ++			      stack_for(refs, transaction->updates[0]->refname) :
     ++			      refs->main_stack;
      +	int err = refs->err;
      +	if (err < 0) {
      +		goto done;
      +	}
      +
     -+	err = reftable_stack_reload(refs->stack);
     ++	err = reftable_stack_reload(stack);
      +	if (err) {
      +		goto done;
      +	}
      +
     -+	err = reftable_stack_new_addition(&add, refs->stack);
     ++	err = reftable_stack_new_addition(&add, stack);
      +	if (err) {
      +		goto done;
      +	}
     @@ refs/reftable-backend.c (new)
      +	return err;
      +}
      +
     -+static int reftable_transaction_abort(struct ref_store *ref_store,
     -+				      struct ref_transaction *transaction,
     -+				      struct strbuf *err)
     ++static int git_reftable_transaction_abort(struct ref_store *ref_store,
     ++					  struct ref_transaction *transaction,
     ++					  struct strbuf *err)
      +{
      +	struct reftable_addition *add =
      +		(struct reftable_addition *)transaction->backend_data;
     @@ refs/reftable-backend.c (new)
      +	struct ref_transaction *transaction = (struct ref_transaction *)arg;
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)transaction->ref_store;
     -+	uint64_t ts = reftable_stack_next_update_index(refs->stack);
     ++	struct reftable_stack *stack =
     ++		stack_for(refs, transaction->updates[0]->refname);
     ++	uint64_t ts = reftable_stack_next_update_index(stack);
      +	int err = 0;
      +	int i = 0;
      +	struct reftable_log_record *logs =
     @@ refs/reftable-backend.c (new)
      +		struct ref_update *u = sorted[i];
      +		struct reftable_log_record *log = &logs[i];
      +		fill_reftable_log_record(log);
     -+		log->ref_name = (char *)u->refname;
     ++		log->refname = (char *)u->refname;
      +		log->old_hash = u->old_oid.hash;
      +		log->new_hash = u->new_oid.hash;
      +		log->update_index = ts;
     @@ refs/reftable-backend.c (new)
      +			struct object_id peeled;
      +
      +			int peel_error = peel_object(&u->new_oid, &peeled);
     -+			ref.ref_name = (char *)u->refname;
     ++			ref.refname = (char *)u->refname;
      +
      +			if (!is_null_oid(&u->new_oid)) {
      +				ref.value = u->new_oid.hash;
     @@ refs/reftable-backend.c (new)
      +	return err;
      +}
      +
     -+static int reftable_transaction_finish(struct ref_store *ref_store,
     -+				       struct ref_transaction *transaction,
     -+				       struct strbuf *errmsg)
     ++static int git_reftable_transaction_finish(struct ref_store *ref_store,
     ++					   struct ref_transaction *transaction,
     ++					   struct strbuf *errmsg)
      +{
      +	struct reftable_addition *add =
      +		(struct reftable_addition *)transaction->backend_data;
     @@ refs/reftable-backend.c (new)
      +			}
      +		}
      +	}
     -+
     -+	err = reftable_addition_add(add, &write_transaction_table, transaction);
     -+	if (err < 0) {
     -+		goto done;
     ++	if (transaction->nr) {
     ++		err = reftable_addition_add(add, &write_transaction_table,
     ++					    transaction);
     ++		if (err < 0) {
     ++			goto done;
     ++		}
      +	}
      +
      +	err = reftable_addition_commit(add);
     @@ refs/reftable-backend.c (new)
      +}
      +
      +static int
     -+reftable_transaction_initial_commit(struct ref_store *ref_store,
     -+				    struct ref_transaction *transaction,
     -+				    struct strbuf *errmsg)
     ++git_reftable_transaction_initial_commit(struct ref_store *ref_store,
     ++					struct ref_transaction *transaction,
     ++					struct strbuf *errmsg)
      +{
     -+	int err = reftable_transaction_prepare(ref_store, transaction, errmsg);
     ++	int err = git_reftable_transaction_prepare(ref_store, transaction,
     ++						   errmsg);
      +	if (err)
      +		return err;
      +
     -+	return reftable_transaction_finish(ref_store, transaction, errmsg);
     ++	return git_reftable_transaction_finish(ref_store, transaction, errmsg);
      +}
      +
      +struct write_pseudoref_arg {
     @@ refs/reftable-backend.c (new)
      +		}
      +	}
      +
     -+	write_ref.ref_name = (char *)arg->pseudoref;
     ++	write_ref.refname = (char *)arg->pseudoref;
      +	write_ref.update_index = ts;
      +	if (!is_null_oid(arg->new_oid))
      +		write_ref.value = (uint8_t *)arg->new_oid->hash;
     @@ refs/reftable-backend.c (new)
      +	return err;
      +}
      +
     -+static int reftable_write_pseudoref(struct ref_store *ref_store,
     -+				    const char *pseudoref,
     -+				    const struct object_id *oid,
     -+				    const struct object_id *old_oid,
     -+				    struct strbuf *errbuf)
     ++static int git_reftable_write_pseudoref(struct ref_store *ref_store,
     ++					const char *pseudoref,
     ++					const struct object_id *oid,
     ++					const struct object_id *old_oid,
     ++					struct strbuf *errbuf)
      +{
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_stack *stack = stack_for(refs, pseudoref);
      +	struct write_pseudoref_arg arg = {
     -+		.stack = refs->stack,
     ++		.stack = stack,
      +		.pseudoref = pseudoref,
      +		.new_oid = oid,
      +	};
     @@ refs/reftable-backend.c (new)
      +		goto done;
      +	}
      +
     -+	err = reftable_stack_reload(refs->stack);
     ++	err = reftable_stack_reload(stack);
      +	if (err) {
      +		goto done;
      +	}
     -+	err = reftable_stack_new_addition(&add, refs->stack);
     ++	err = reftable_stack_new_addition(&add, stack);
      +	if (err) {
      +		goto done;
      +	}
     @@ refs/reftable-backend.c (new)
      +				     const struct object_id *old_oid)
      +{
      +	struct strbuf errbuf = STRBUF_INIT;
     -+	int ret = reftable_write_pseudoref(ref_store, pseudoref, &null_oid,
     -+					   old_oid, &errbuf);
     ++	int ret = git_reftable_write_pseudoref(ref_store, pseudoref, &null_oid,
     ++					       old_oid, &errbuf);
      +	/* XXX what to do with the error message? */
      +	strbuf_release(&errbuf);
      +	return ret;
     @@ refs/reftable-backend.c (new)
      +	reftable_writer_set_limits(writer, ts, ts);
      +	for (i = 0; i < arg->refnames->nr; i++) {
      +		struct reftable_ref_record ref = {
     -+			.ref_name = (char *)arg->refnames->items[i].string,
     ++			.refname = (char *)arg->refnames->items[i].string,
      +			.update_index = ts,
      +		};
      +		err = reftable_writer_add_ref(writer, &ref);
     @@ refs/reftable-backend.c (new)
      +		log.new_hash = NULL;
      +		log.old_hash = NULL;
      +		log.update_index = ts;
     -+		log.ref_name = (char *)arg->refnames->items[i].string;
     ++		log.refname = (char *)arg->refnames->items[i].string;
      +
     -+		if (reftable_stack_read_ref(arg->stack, log.ref_name,
     ++		if (reftable_stack_read_ref(arg->stack, log.refname,
      +					    &current) == 0) {
      +			log.old_hash = current.value;
      +		}
     @@ refs/reftable-backend.c (new)
      +	return 0;
      +}
      +
     -+static int reftable_delete_refs(struct ref_store *ref_store, const char *msg,
     -+				struct string_list *refnames,
     -+				unsigned int flags)
     ++static int git_reftable_delete_refs(struct ref_store *ref_store,
     ++				    const char *msg,
     ++				    struct string_list *refnames,
     ++				    unsigned int flags)
      +{
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_stack *stack =
     ++		stack_for(refs, refnames->items[0].string);
      +	struct write_delete_refs_arg arg = {
     -+		.stack = refs->stack,
     ++		.stack = stack,
      +		.refnames = refnames,
      +		.logmsg = msg,
      +		.flags = flags,
     @@ refs/reftable-backend.c (new)
      +	}
      +
      +	string_list_sort(refnames);
     -+	err = reftable_stack_reload(refs->stack);
     ++	err = reftable_stack_reload(stack);
      +	if (err) {
      +		goto done;
      +	}
     -+	err = reftable_stack_add(refs->stack, &write_delete_refs_table, &arg);
     ++	err = reftable_stack_add(stack, &write_delete_refs_table, &arg);
      +done:
      +	assert(err != REFTABLE_API_ERROR);
      +	return err;
      +}
      +
     -+static int reftable_pack_refs(struct ref_store *ref_store, unsigned int flags)
     ++static int git_reftable_pack_refs(struct ref_store *ref_store,
     ++				  unsigned int flags)
      +{
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
     -+	if (refs->err < 0) {
     -+		return refs->err;
     ++	int err = refs->err;
     ++	if (err < 0) {
     ++		return err;
      +	}
     -+	return reftable_stack_compact_all(refs->stack, NULL);
     ++	err = reftable_stack_compact_all(refs->main_stack, NULL);
     ++	if (err == 0 && refs->worktree_stack != NULL)
     ++		err = reftable_stack_compact_all(refs->worktree_stack, NULL);
     ++	return err;
      +}
      +
      +struct write_create_symref_arg {
      +	struct git_reftable_ref_store *refs;
     ++	struct reftable_stack *stack;
      +	const char *refname;
      +	const char *target;
      +	const char *logmsg;
     @@ refs/reftable-backend.c (new)
      +{
      +	struct write_create_symref_arg *create =
      +		(struct write_create_symref_arg *)arg;
     -+	uint64_t ts = reftable_stack_next_update_index(create->refs->stack);
     ++	uint64_t ts = reftable_stack_next_update_index(create->stack);
      +	int err = 0;
      +
      +	struct reftable_ref_record ref = {
     -+		.ref_name = (char *)create->refname,
     ++		.refname = (char *)create->refname,
      +		.target = (char *)create->target,
      +		.update_index = ts,
      +	};
     @@ refs/reftable-backend.c (new)
      +		struct object_id old_oid;
      +
      +		fill_reftable_log_record(&log);
     -+		log.ref_name = (char *)create->refname;
     ++		log.refname = (char *)create->refname;
      +		log.message = (char *)create->logmsg;
      +		log.update_index = ts;
      +		if (refs_resolve_ref_unsafe(
     @@ refs/reftable-backend.c (new)
      +		if (log.old_hash != NULL || log.new_hash != NULL) {
      +			err = reftable_writer_add_log(writer, &log);
      +		}
     -+		log.ref_name = NULL;
     ++		log.refname = NULL;
      +		log.message = NULL;
      +		log.old_hash = NULL;
      +		log.new_hash = NULL;
     @@ refs/reftable-backend.c (new)
      +	return err;
      +}
      +
     -+static int reftable_create_symref(struct ref_store *ref_store,
     -+				  const char *refname, const char *target,
     -+				  const char *logmsg)
     ++static int git_reftable_create_symref(struct ref_store *ref_store,
     ++				      const char *refname, const char *target,
     ++				      const char *logmsg)
      +{
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_stack *stack = stack_for(refs, refname);
      +	struct write_create_symref_arg arg = { .refs = refs,
     ++					       .stack = stack,
      +					       .refname = refname,
      +					       .target = target,
      +					       .logmsg = logmsg };
     @@ refs/reftable-backend.c (new)
      +	if (err < 0) {
      +		goto done;
      +	}
     -+	err = reftable_stack_reload(refs->stack);
     ++	err = reftable_stack_reload(stack);
      +	if (err) {
      +		goto done;
      +	}
     -+	err = reftable_stack_add(refs->stack, &write_create_symref_table, &arg);
     ++	err = reftable_stack_add(stack, &write_create_symref_table, &arg);
      +done:
      +	assert(err != REFTABLE_API_ERROR);
      +	return err;
     @@ refs/reftable-backend.c (new)
      +		goto done;
      +	}
      +
     -+	free(ref.ref_name);
     -+	ref.ref_name = strdup(arg->newname);
     ++	free(ref.refname);
     ++	ref.refname = strdup(arg->newname);
      +	reftable_writer_set_limits(writer, ts, ts);
      +	ref.update_index = ts;
      +
      +	{
      +		struct reftable_ref_record todo[2] = { { NULL } };
     -+		todo[0].ref_name = (char *)arg->oldname;
     ++		todo[0].refname = (char *)arg->oldname;
      +		todo[0].update_index = ts;
      +		/* leave todo[0] empty */
      +		todo[1] = ref;
     @@ refs/reftable-backend.c (new)
      +		fill_reftable_log_record(&todo[0]);
      +		fill_reftable_log_record(&todo[1]);
      +
     -+		todo[0].ref_name = (char *)arg->oldname;
     ++		todo[0].refname = (char *)arg->oldname;
      +		todo[0].update_index = ts;
      +		todo[0].message = (char *)arg->logmsg;
      +		todo[0].old_hash = ref.value;
      +		todo[0].new_hash = NULL;
      +
     -+		todo[1].ref_name = (char *)arg->newname;
     ++		todo[1].refname = (char *)arg->newname;
      +		todo[1].update_index = ts;
      +		todo[1].old_hash = NULL;
      +		todo[1].new_hash = ref.value;
     @@ refs/reftable-backend.c (new)
      +	return err;
      +}
      +
     -+static int reftable_rename_ref(struct ref_store *ref_store,
     -+			       const char *oldrefname, const char *newrefname,
     -+			       const char *logmsg)
     ++static int git_reftable_rename_ref(struct ref_store *ref_store,
     ++				   const char *oldrefname,
     ++				   const char *newrefname, const char *logmsg)
      +{
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_stack *stack = stack_for(refs, newrefname);
      +	struct write_rename_arg arg = {
     -+		.stack = refs->stack,
     ++		.stack = stack,
      +		.oldname = oldrefname,
      +		.newname = newrefname,
      +		.logmsg = logmsg,
     @@ refs/reftable-backend.c (new)
      +	if (err < 0) {
      +		goto done;
      +	}
     -+	err = reftable_stack_reload(refs->stack);
     ++	err = reftable_stack_reload(stack);
      +	if (err) {
      +		goto done;
      +	}
      +
     -+	err = reftable_stack_add(refs->stack, &write_rename_table, &arg);
     ++	err = reftable_stack_add(stack, &write_rename_table, &arg);
      +done:
      +	assert(err != REFTABLE_API_ERROR);
      +	return err;
      +}
      +
     -+static int reftable_copy_ref(struct ref_store *ref_store,
     -+			     const char *oldrefname, const char *newrefname,
     -+			     const char *logmsg)
     ++static int git_reftable_copy_ref(struct ref_store *ref_store,
     ++				 const char *oldrefname, const char *newrefname,
     ++				 const char *logmsg)
      +{
      +	BUG("reftable reference store does not support copying references");
      +}
      +
     -+struct reftable_reflog_ref_iterator {
     ++struct git_reftable_reflog_ref_iterator {
      +	struct ref_iterator base;
      +	struct reftable_iterator iter;
      +	struct reftable_log_record log;
      +	struct object_id oid;
     ++
     ++	/* Used when iterating over worktree & main */
     ++	struct reftable_merged_table *merged;
      +	char *last_name;
      +};
      +
      +static int
     -+reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
     ++git_reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
      +{
     -+	struct reftable_reflog_ref_iterator *ri =
     -+		(struct reftable_reflog_ref_iterator *)ref_iterator;
     ++	struct git_reftable_reflog_ref_iterator *ri =
     ++		(struct git_reftable_reflog_ref_iterator *)ref_iterator;
      +
      +	while (1) {
      +		int err = reftable_iterator_next_log(&ri->iter, &ri->log);
     @@ refs/reftable-backend.c (new)
      +			return ITER_ERROR;
      +		}
      +
     -+		ri->base.refname = ri->log.ref_name;
     ++		ri->base.refname = ri->log.refname;
      +		if (ri->last_name != NULL &&
     -+		    !strcmp(ri->log.ref_name, ri->last_name)) {
     ++		    !strcmp(ri->log.refname, ri->last_name)) {
      +			/* we want the refnames that we have reflogs for, so we
      +			 * skip if we've already produced this name. This could
      +			 * be faster by seeking directly to
     @@ refs/reftable-backend.c (new)
      +		}
      +
      +		free(ri->last_name);
     -+		ri->last_name = xstrdup(ri->log.ref_name);
     ++		ri->last_name = xstrdup(ri->log.refname);
      +		hashcpy(ri->oid.hash, ri->log.new_hash);
      +		return ITER_OK;
      +	}
      +}
      +
     -+static int reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
     -+					     struct object_id *peeled)
     ++static int
     ++git_reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
     ++				      struct object_id *peeled)
      +{
      +	BUG("not supported.");
      +	return -1;
      +}
      +
     -+static int reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
     ++static int
     ++git_reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
      +{
     -+	struct reftable_reflog_ref_iterator *ri =
     -+		(struct reftable_reflog_ref_iterator *)ref_iterator;
     ++	struct git_reftable_reflog_ref_iterator *ri =
     ++		(struct git_reftable_reflog_ref_iterator *)ref_iterator;
      +	reftable_log_record_clear(&ri->log);
      +	reftable_iterator_destroy(&ri->iter);
     ++	if (ri->merged)
     ++		reftable_merged_table_free(ri->merged);
      +	return 0;
      +}
      +
     -+static struct ref_iterator_vtable reftable_reflog_ref_iterator_vtable = {
     -+	reftable_reflog_ref_iterator_advance, reftable_reflog_ref_iterator_peel,
     -+	reftable_reflog_ref_iterator_abort
     ++static struct ref_iterator_vtable git_reftable_reflog_ref_iterator_vtable = {
     ++	git_reftable_reflog_ref_iterator_advance,
     ++	git_reftable_reflog_ref_iterator_peel,
     ++	git_reftable_reflog_ref_iterator_abort
      +};
      +
      +static struct ref_iterator *
     -+reftable_reflog_iterator_begin(struct ref_store *ref_store)
     ++git_reftable_reflog_iterator_begin(struct ref_store *ref_store)
      +{
     -+	struct reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
     ++	struct git_reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
      +
     -+	struct reftable_merged_table *mt =
     -+		reftable_stack_merged_table(refs->stack);
     -+	int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
     -+	if (err < 0) {
     -+		free(ri);
     -+		return NULL;
     ++	if (refs->worktree_stack == NULL) {
     ++		struct reftable_stack *stack = refs->main_stack;
     ++		struct reftable_merged_table *mt =
     ++			reftable_stack_merged_table(stack);
     ++		int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
     ++		if (err < 0) {
     ++			free(ri);
     ++			/* XXX is this allowed? */
     ++			return NULL;
     ++		}
     ++	} else {
     ++		struct reftable_merged_table *mt1 =
     ++			reftable_stack_merged_table(refs->main_stack);
     ++		struct reftable_merged_table *mt2 =
     ++			reftable_stack_merged_table(refs->worktree_stack);
     ++		struct reftable_table *tabs =
     ++			xcalloc(2, sizeof(struct reftable_table));
     ++		int err = 0;
     ++		reftable_table_from_merged_table(&tabs[0], mt1);
     ++		reftable_table_from_merged_table(&tabs[1], mt2);
     ++		err = reftable_new_merged_table(&ri->merged, tabs, 2,
     ++						the_hash_algo->format_id);
     ++		if (err < 0) {
     ++			free(tabs);
     ++			/* XXX see above */
     ++			return NULL;
     ++		}
     ++		err = reftable_merged_table_seek_ref(ri->merged, &ri->iter, "");
     ++		if (err < 0) {
     ++			return NULL;
     ++		}
      +	}
     -+
     -+	base_ref_iterator_init(&ri->base, &reftable_reflog_ref_iterator_vtable,
     -+			       1);
     ++	base_ref_iterator_init(&ri->base,
     ++			       &git_reftable_reflog_ref_iterator_vtable, 1);
      +	ri->base.oid = &ri->oid;
      +
      +	return (struct ref_iterator *)ri;
      +}
      +
     -+static int
     -+reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
     -+					  const char *refname,
     -+					  each_reflog_ent_fn fn, void *cb_data)
     ++static int git_reftable_for_each_reflog_ent_newest_first(
     ++	struct ref_store *ref_store, const char *refname, each_reflog_ent_fn fn,
     ++	void *cb_data)
      +{
      +	struct reftable_iterator it = { NULL };
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_stack *stack = stack_for(refs, refname);
      +	struct reftable_merged_table *mt = NULL;
      +	int err = 0;
      +	struct reftable_log_record log = { NULL };
     @@ refs/reftable-backend.c (new)
      +		return refs->err;
      +	}
      +
     -+	mt = reftable_stack_merged_table(refs->stack);
     ++	mt = reftable_stack_merged_table(stack);
      +	err = reftable_merged_table_seek_log(mt, &it, refname);
      +	while (err == 0) {
      +		struct object_id old_oid;
     @@ refs/reftable-backend.c (new)
      +			break;
      +		}
      +
     -+		if (strcmp(log.ref_name, refname)) {
     ++		if (strcmp(log.refname, refname)) {
      +			break;
      +		}
      +
     @@ refs/reftable-backend.c (new)
      +	return err;
      +}
      +
     -+static int
     -+reftable_for_each_reflog_ent_oldest_first(struct ref_store *ref_store,
     -+					  const char *refname,
     -+					  each_reflog_ent_fn fn, void *cb_data)
     ++static int git_reftable_for_each_reflog_ent_oldest_first(
     ++	struct ref_store *ref_store, const char *refname, each_reflog_ent_fn fn,
     ++	void *cb_data)
      +{
      +	struct reftable_iterator it = { NULL };
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_stack *stack = stack_for(refs, refname);
      +	struct reftable_merged_table *mt = NULL;
      +	struct reftable_log_record *logs = NULL;
      +	int cap = 0;
     @@ refs/reftable-backend.c (new)
      +	if (refs->err < 0) {
      +		return refs->err;
      +	}
     -+	mt = reftable_stack_merged_table(refs->stack);
     ++	mt = reftable_stack_merged_table(stack);
      +	err = reftable_merged_table_seek_log(mt, &it, refname);
      +
      +	while (err == 0) {
     @@ refs/reftable-backend.c (new)
      +			break;
      +		}
      +
     -+		if (strcmp(log.ref_name, refname)) {
     ++		if (strcmp(log.refname, refname)) {
      +			break;
      +		}
      +
     @@ refs/reftable-backend.c (new)
      +	return err;
      +}
      +
     -+static int reftable_reflog_exists(struct ref_store *ref_store,
     -+				  const char *refname)
     ++static int git_reftable_reflog_exists(struct ref_store *ref_store,
     ++				      const char *refname)
      +{
      +	/* always exists. */
      +	return 1;
      +}
      +
     -+static int reftable_create_reflog(struct ref_store *ref_store,
     -+				  const char *refname, int force_create,
     -+				  struct strbuf *err)
     ++static int git_reftable_create_reflog(struct ref_store *ref_store,
     ++				      const char *refname, int force_create,
     ++				      struct strbuf *err)
      +{
      +	return 0;
      +}
      +
     -+static int reftable_delete_reflog(struct ref_store *ref_store,
     -+				  const char *refname)
     ++static int git_reftable_delete_reflog(struct ref_store *ref_store,
     ++				      const char *refname)
      +{
      +	return 0;
      +}
      +
      +struct reflog_expiry_arg {
      +	struct git_reftable_ref_store *refs;
     ++	struct reftable_stack *stack;
      +	struct reftable_log_record *tombstones;
      +	int len;
      +	int cap;
     @@ refs/reftable-backend.c (new)
      +			      const char *refname, uint64_t ts)
      +{
      +	struct reftable_log_record tombstone = {
     -+		.ref_name = xstrdup(refname),
     ++		.refname = xstrdup(refname),
      +		.update_index = ts,
      +	};
      +	if (arg->len == arg->cap) {
     @@ refs/reftable-backend.c (new)
      +static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
      +{
      +	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
     -+	uint64_t ts = reftable_stack_next_update_index(arg->refs->stack);
     ++	uint64_t ts = reftable_stack_next_update_index(arg->stack);
      +	int i = 0;
      +	reftable_writer_set_limits(writer, ts, ts);
      +	for (i = 0; i < arg->len; i++) {
     @@ refs/reftable-backend.c (new)
      +	return 0;
      +}
      +
     -+static int reftable_reflog_expire(struct ref_store *ref_store,
     -+				  const char *refname,
     -+				  const struct object_id *oid,
     -+				  unsigned int flags,
     -+				  reflog_expiry_prepare_fn prepare_fn,
     -+				  reflog_expiry_should_prune_fn should_prune_fn,
     -+				  reflog_expiry_cleanup_fn cleanup_fn,
     -+				  void *policy_cb_data)
     ++static int
     ++git_reftable_reflog_expire(struct ref_store *ref_store, const char *refname,
     ++			   const struct object_id *oid, unsigned int flags,
     ++			   reflog_expiry_prepare_fn prepare_fn,
     ++			   reflog_expiry_should_prune_fn should_prune_fn,
     ++			   reflog_expiry_cleanup_fn cleanup_fn,
     ++			   void *policy_cb_data)
      +{
      +	/*
      +	  For log expiry, we write tombstones in place of the expired entries,
     @@ refs/reftable-backend.c (new)
      +	  stack, and expiring entries paradoxically takes extra memory.
      +
      +	  This memory is only reclaimed when some operation issues a
     -+	  reftable_pack_refs(), which will compact the entire stack and get rid
     -+	  of deletion entries.
     ++	  git_reftable_pack_refs(), which will compact the entire stack and get
     ++	  rid of deletion entries.
      +
      +	  It would be better if the refs backend supported an API that sets a
      +	  criterion for all refs, passing the criterion to pack_refs().
      +	*/
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_stack *stack = stack_for(refs, refname);
      +	struct reftable_merged_table *mt = NULL;
      +	struct reflog_expiry_arg arg = {
     ++		.stack = stack,
      +		.refs = refs,
      +	};
      +	struct reftable_log_record log = { NULL };
     @@ refs/reftable-backend.c (new)
      +	if (refs->err < 0) {
      +		return refs->err;
      +	}
     -+	err = reftable_stack_reload(refs->stack);
     ++	err = reftable_stack_reload(stack);
      +	if (err) {
      +		goto done;
      +	}
      +
     -+	mt = reftable_stack_merged_table(refs->stack);
     ++	mt = reftable_stack_merged_table(stack);
      +	err = reftable_merged_table_seek_log(mt, &it, refname);
      +	if (err < 0) {
      +		goto done;
     @@ refs/reftable-backend.c (new)
      +			goto done;
      +		}
      +
     -+		if (err > 0 || strcmp(log.ref_name, refname)) {
     ++		if (err > 0 || strcmp(log.refname, refname)) {
      +			break;
      +		}
      +		hashcpy(ooid.hash, log.old_hash);
     @@ refs/reftable-backend.c (new)
      +			add_log_tombstone(&arg, refname, log.update_index);
      +		}
      +	}
     -+	err = reftable_stack_add(refs->stack, &write_reflog_expiry_table, &arg);
     ++	err = reftable_stack_add(stack, &write_reflog_expiry_table, &arg);
      +
      +done:
      +	assert(err != REFTABLE_API_ERROR);
     @@ refs/reftable-backend.c (new)
      +	return err;
      +}
      +
     -+static int reftable_read_raw_ref(struct ref_store *ref_store,
     -+				 const char *refname, struct object_id *oid,
     -+				 struct strbuf *referent, unsigned int *type)
     ++static int git_reftable_read_raw_ref(struct ref_store *ref_store,
     ++				     const char *refname, struct object_id *oid,
     ++				     struct strbuf *referent,
     ++				     unsigned int *type)
      +{
      +	struct git_reftable_ref_store *refs =
      +		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_stack *stack = stack_for(refs, refname);
     ++
      +	struct reftable_ref_record ref = { NULL };
      +	int err = 0;
      +	if (refs->err < 0) {
     @@ refs/reftable-backend.c (new)
      +	/* This is usually not needed, but Git doesn't signal to ref backend if
      +	   a subprocess updated the ref DB.  So we always check.
      +	*/
     -+	err = reftable_stack_reload(refs->stack);
     ++	err = reftable_stack_reload(stack);
      +	if (err) {
      +		goto done;
      +	}
      +
     -+	err = reftable_stack_read_ref(refs->stack, refname, &ref);
     ++	err = reftable_stack_read_ref(stack, refname, &ref);
      +	if (err > 0) {
      +		errno = ENOENT;
      +		err = -1;
     @@ refs/reftable-backend.c (new)
      +	&refs_be_files,
      +	"reftable",
      +	git_reftable_ref_store_create,
     -+	reftable_init_db,
     -+	reftable_transaction_prepare,
     -+	reftable_transaction_finish,
     -+	reftable_transaction_abort,
     -+	reftable_transaction_initial_commit,
     -+
     -+	reftable_pack_refs,
     -+	reftable_create_symref,
     -+	reftable_delete_refs,
     -+	reftable_rename_ref,
     -+	reftable_copy_ref,
     -+
     -+	reftable_write_pseudoref,
     ++	git_reftable_init_db,
     ++	git_reftable_transaction_prepare,
     ++	git_reftable_transaction_finish,
     ++	git_reftable_transaction_abort,
     ++	git_reftable_transaction_initial_commit,
     ++
     ++	git_reftable_pack_refs,
     ++	git_reftable_create_symref,
     ++	git_reftable_delete_refs,
     ++	git_reftable_rename_ref,
     ++	git_reftable_copy_ref,
     ++
     ++	git_reftable_write_pseudoref,
      +	reftable_delete_pseudoref,
      +
     -+	reftable_ref_iterator_begin,
     -+	reftable_read_raw_ref,
     ++	git_reftable_ref_iterator_begin,
     ++	git_reftable_read_raw_ref,
      +
     -+	reftable_reflog_iterator_begin,
     -+	reftable_for_each_reflog_ent_oldest_first,
     -+	reftable_for_each_reflog_ent_newest_first,
     -+	reftable_reflog_exists,
     -+	reftable_create_reflog,
     -+	reftable_delete_reflog,
     -+	reftable_reflog_expire,
     ++	git_reftable_reflog_iterator_begin,
     ++	git_reftable_for_each_reflog_ent_oldest_first,
     ++	git_reftable_for_each_reflog_ent_newest_first,
     ++	git_reftable_reflog_exists,
     ++	git_reftable_create_reflog,
     ++	git_reftable_delete_reflog,
     ++	git_reftable_reflog_expire,
      +};
      
       ## reftable/update.sh ##
     @@ t/t0031-reftable.sh (new)
      +	test -f file2
      +'
      +
     ++test_expect_success 'worktrees' '
     ++	git init --ref-storage=reftable start &&
     ++	(cd start && test_commit file1 && git checkout -b branch1 &&
     ++	git checkout -b branch2 &&
     ++	git worktree add  ../wt
     ++	) &&
     ++	cd wt &&
     ++	git checkout branch1 &&
     ++	git branch
     ++'
     ++
     ++test_expect_success 'worktrees 2' '
     ++	initialize &&
     ++	test_commit file1 &&
     ++	mkdir existing_empty &&
     ++	git worktree add --detach existing_empty master
     ++'
     ++
      +test_done
      +
 14:  eafd8eeefc = 15:  04c86e7395 Hookup unittests for the reftable library.
 15:  46af142f33 ! 16:  c751265705 Add GIT_DEBUG_REFS debugging mechanism
     @@ Makefile: LIB_OBJS += rebase.o
       LIB_OBJS += refs/reftable-backend.o
       LIB_OBJS += refs/iterator.o
      
     + ## builtin/worktree.c ##
     +@@ builtin/worktree.c: static int add_worktree(const char *path, const char *refname,
     + 	 * worktree.
     + 	 */
     + 	strbuf_reset(&sb);
     +-	if (get_main_ref_store(the_repository)->be == &refs_be_reftable) {
     ++
     ++	/* XXX: check GIT_TEST_REFTABLE because GIT_DEBUG_REFS obscures the
     ++	 * instance type. */
     ++	if (get_main_ref_store(the_repository)->be == &refs_be_reftable ||
     ++	    getenv("GIT_TEST_REFTABLE") != NULL) {
     + 		/* XXX this is cut & paste from reftable_init_db. */
     + 		strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
     + 		write_file(sb.buf, "%s", "ref: refs/heads/.invalid\n");
     +
       ## refs.c ##
      @@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
     - 						 r->ref_storage_format :
     - 						 default_ref_storage(),
     + 						       r->ref_storage_format :
     + 						       default_ref_storage(),
       					 REF_STORE_ALL_CAPS);
      +	if (getenv("GIT_DEBUG_REFS")) {
     -+		r->refs_private = debug_wrap(r->refs_private);
     ++		r->refs_private = debug_wrap(r->gitdir, r->refs_private);
      +	}
       	return r->refs_private;
       }
     @@ refs/debug.c (new)
      +};
      +
      +extern struct ref_storage_be refs_be_debug;
     -+struct ref_store *debug_wrap(struct ref_store *store);
     ++struct ref_store *debug_wrap(const char *gitdir, struct ref_store *store);
      +
     -+struct ref_store *debug_wrap(struct ref_store *store)
     ++struct ref_store *debug_wrap(const char *gitdir, struct ref_store *store)
      +{
      +	struct debug_ref_store *res = malloc(sizeof(struct debug_ref_store));
     ++	fprintf(stderr, "ref_store for %s\n", gitdir);
      +	res->refs = store;
      +	base_ref_store_init((struct ref_store *)res, &refs_be_debug);
      +	return (struct ref_store *)res;
     @@ refs/debug.c (new)
      +	struct debug_ref_iterator *diter =
      +		(struct debug_ref_iterator *)ref_iterator;
      +	int res = diter->iter->vtable->advance(diter->iter);
     -+	fprintf(stderr, "iterator_advance: %s: %d\n", diter->iter->refname,
     -+		res);
     ++	if (res)
     ++		fprintf(stderr, "iterator_advance: %d\n", res);
     ++	else
     ++		fprintf(stderr, "iterator_advance before: %s\n",
     ++			diter->iter->refname);
     ++
      +	diter->base.ordered = diter->iter->ordered;
      +	diter->base.refname = diter->iter->refname;
      +	diter->base.oid = diter->iter->oid;
     @@ refs/refs-internal.h: struct ref_store {
      + * Print out ref operations as they occur. Useful for debugging alternate ref
      + * backends.
      + */
     -+struct ref_store *debug_wrap(struct ref_store *store);
     ++struct ref_store *debug_wrap(const char *gitdir, struct ref_store *store);
      +
       #endif /* REFS_REFS_INTERNAL_H */
      
 16:  5211c64310 = 17:  ef0dd45f07 vcxproj: adjust for the reftable changes
 17:  9724854088 = 18:  d4dc1deae5 git-prompt: prepare for reftable refs backend
 18:  ece1fa1f62 = 19:  5a85abf9c4 Add reftable testing infrastructure
 19:  991abf9e1b = 20:  0ebf7beb95 Add "test-tool dump-reftable" command.

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v19 01/20] lib-t6000.sh: write tag using git-update-ref
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 02/20] t3432: use git-reflog to inspect the reflog for HEAD Han-Wen Nienhuys via GitGitGadget
                                                       ` (20 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/lib-t6000.sh | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/t/lib-t6000.sh b/t/lib-t6000.sh
index b0ed4767e3..fba6778ca3 100644
--- a/t/lib-t6000.sh
+++ b/t/lib-t6000.sh
@@ -1,7 +1,5 @@
 : included from 6002 and others
 
-mkdir -p .git/refs/tags
-
 >sed.script
 
 # Answer the sha1 has associated with the tag. The tag must exist under refs/tags
@@ -26,7 +24,8 @@ save_tag () {
 	_tag=$1
 	test -n "$_tag" || error "usage: save_tag tag commit-args ..."
 	shift 1
-	"$@" >".git/refs/tags/$_tag"
+
+	git update-ref "refs/tags/$_tag" $("$@")
 
 	echo "s/$(tag $_tag)/$_tag/g" >sed.script.tmp
 	cat sed.script >>sed.script.tmp
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 02/20] t3432: use git-reflog to inspect the reflog for HEAD
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 01/20] lib-t6000.sh: write tag using git-update-ref Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-30 15:23                                       ` Denton Liu
  2020-06-29 18:56                                     ` [PATCH v19 03/20] checkout: add '\n' to reflog message Han-Wen Nienhuys via GitGitGadget
                                                       ` (19 subsequent siblings)
  21 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/t3432-rebase-fast-forward.sh | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/t/t3432-rebase-fast-forward.sh b/t/t3432-rebase-fast-forward.sh
index 6f0452c0ea..22afeb8ccd 100755
--- a/t/t3432-rebase-fast-forward.sh
+++ b/t/t3432-rebase-fast-forward.sh
@@ -60,15 +60,16 @@ test_rebase_same_head_ () {
 		fi &&
 		oldhead=\$(git rev-parse HEAD) &&
 		test_when_finished 'git reset --hard \$oldhead' &&
-		cp .git/logs/HEAD expect &&
+		git reflog HEAD > expect &&
 		git rebase$flag $* >stdout &&
+		git reflog HEAD > actual &&
 		if test $what = work
 		then
 			old=\$(wc -l <expect) &&
-			test_line_count '-gt' \$old .git/logs/HEAD
+			test_line_count '-gt' \$old actual
 		elif test $what = noop
 		then
-			test_cmp expect .git/logs/HEAD
+			test_cmp expect actual
 		fi &&
 		newhead=\$(git rev-parse HEAD) &&
 		if test $cmp = same
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 03/20] checkout: add '\n' to reflog message
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 01/20] lib-t6000.sh: write tag using git-update-ref Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 02/20] t3432: use git-reflog to inspect the reflog for HEAD Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 20:07                                       ` Junio C Hamano
  2020-06-29 18:56                                     ` [PATCH v19 04/20] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
                                                       ` (18 subsequent siblings)
  21 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable precisely reproduces the given message. This leads to differences,
because the files backend implicitly adds a trailing '\n' to all messages.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/checkout.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index af849c644f..bb11fcc4e9 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -884,8 +884,9 @@ static void update_refs_for_switch(const struct checkout_opts *opts,
 
 	reflog_msg = getenv("GIT_REFLOG_ACTION");
 	if (!reflog_msg)
-		strbuf_addf(&msg, "checkout: moving from %s to %s",
-			old_desc ? old_desc : "(invalid)", new_branch_info->name);
+		strbuf_addf(&msg, "checkout: moving from %s to %s\n",
+			    old_desc ? old_desc : "(invalid)",
+			    new_branch_info->name);
 	else
 		strbuf_insertstr(&msg, 0, reflog_msg);
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 04/20] Write pseudorefs through ref backends.
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (2 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 03/20] checkout: add '\n' to reflog message Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 05/20] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
                                                       ` (17 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Pseudorefs store transient data in in the repository. Examples are HEAD,
CHERRY_PICK_HEAD, etc.

These refs have always been read through the ref backends, but they were written
in a one-off routine that wrote an object ID or symref directly into
.git/<pseudo_ref_name>.

This causes problems when introducing a new ref storage backend. To remedy this,
extend the ref backend implementation with a write_pseudoref_fn and
update_pseudoref_fn.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c                | 119 ++++++++----------------------------------
 refs.h                |  11 ++++
 refs/files-backend.c  | 114 +++++++++++++++++++++++++++++++++++++++-
 refs/packed-backend.c |  21 +++++++-
 refs/refs-internal.h  |  18 +++++++
 5 files changed, 184 insertions(+), 99 deletions(-)

diff --git a/refs.c b/refs.c
index 639cba93b4..1e9d893191 100644
--- a/refs.c
+++ b/refs.c
@@ -323,6 +323,12 @@ int ref_exists(const char *refname)
 	return refs_ref_exists(get_main_ref_store(the_repository), refname);
 }
 
+int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
+{
+	return refs_delete_pseudoref(get_main_ref_store(the_repository),
+				     pseudoref, old_oid);
+}
+
 static int filter_refs(const char *refname, const struct object_id *oid,
 			   int flags, void *data)
 {
@@ -771,101 +777,6 @@ long get_files_ref_lock_timeout_ms(void)
 	return timeout_ms;
 }
 
-static int write_pseudoref(const char *pseudoref, const struct object_id *oid,
-			   const struct object_id *old_oid, struct strbuf *err)
-{
-	const char *filename;
-	int fd;
-	struct lock_file lock = LOCK_INIT;
-	struct strbuf buf = STRBUF_INIT;
-	int ret = -1;
-
-	if (!oid)
-		return 0;
-
-	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
-
-	filename = git_path("%s", pseudoref);
-	fd = hold_lock_file_for_update_timeout(&lock, filename, 0,
-					       get_files_ref_lock_timeout_ms());
-	if (fd < 0) {
-		strbuf_addf(err, _("could not open '%s' for writing: %s"),
-			    filename, strerror(errno));
-		goto done;
-	}
-
-	if (old_oid) {
-		struct object_id actual_old_oid;
-
-		if (read_ref(pseudoref, &actual_old_oid)) {
-			if (!is_null_oid(old_oid)) {
-				strbuf_addf(err, _("could not read ref '%s'"),
-					    pseudoref);
-				rollback_lock_file(&lock);
-				goto done;
-			}
-		} else if (is_null_oid(old_oid)) {
-			strbuf_addf(err, _("ref '%s' already exists"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		} else if (!oideq(&actual_old_oid, old_oid)) {
-			strbuf_addf(err, _("unexpected object ID when writing '%s'"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		}
-	}
-
-	if (write_in_full(fd, buf.buf, buf.len) < 0) {
-		strbuf_addf(err, _("could not write to '%s'"), filename);
-		rollback_lock_file(&lock);
-		goto done;
-	}
-
-	commit_lock_file(&lock);
-	ret = 0;
-done:
-	strbuf_release(&buf);
-	return ret;
-}
-
-static int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
-{
-	const char *filename;
-
-	filename = git_path("%s", pseudoref);
-
-	if (old_oid && !is_null_oid(old_oid)) {
-		struct lock_file lock = LOCK_INIT;
-		int fd;
-		struct object_id actual_old_oid;
-
-		fd = hold_lock_file_for_update_timeout(
-				&lock, filename, 0,
-				get_files_ref_lock_timeout_ms());
-		if (fd < 0) {
-			error_errno(_("could not open '%s' for writing"),
-				    filename);
-			return -1;
-		}
-		if (read_ref(pseudoref, &actual_old_oid))
-			die(_("could not read ref '%s'"), pseudoref);
-		if (!oideq(&actual_old_oid, old_oid)) {
-			error(_("unexpected object ID when deleting '%s'"),
-			      pseudoref);
-			rollback_lock_file(&lock);
-			return -1;
-		}
-
-		unlink(filename);
-		rollback_lock_file(&lock);
-	} else {
-		unlink(filename);
-	}
-
-	return 0;
-}
 
 int refs_delete_ref(struct ref_store *refs, const char *msg,
 		    const char *refname,
@@ -877,7 +788,7 @@ int refs_delete_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		return delete_pseudoref(refname, old_oid);
+		return refs_delete_pseudoref(refs, refname, old_oid);
 	}
 
 	transaction = ref_store_transaction_begin(refs, &err);
@@ -1204,7 +1115,8 @@ int refs_update_ref(struct ref_store *refs, const char *msg,
 
 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
 		assert(refs == get_main_ref_store(the_repository));
-		ret = write_pseudoref(refname, new_oid, old_oid, &err);
+		ret = refs_write_pseudoref(refs, refname, new_oid, old_oid,
+					   &err);
 	} else {
 		t = ref_store_transaction_begin(refs, &err);
 		if (!t ||
@@ -1473,6 +1385,19 @@ int head_ref(each_ref_fn fn, void *cb_data)
 	return refs_head_ref(get_main_ref_store(the_repository), fn, cb_data);
 }
 
+int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			 const struct object_id *oid,
+			 const struct object_id *old_oid, struct strbuf *err)
+{
+	return refs->be->write_pseudoref(refs, pseudoref, oid, old_oid, err);
+}
+
+int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			  const struct object_id *old_oid)
+{
+	return refs->be->delete_pseudoref(refs, pseudoref, old_oid);
+}
+
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
 		const char *prefix, int trim, int flags)
diff --git a/refs.h b/refs.h
index f212f8945e..9ad1400aab 100644
--- a/refs.h
+++ b/refs.h
@@ -741,6 +741,17 @@ int update_ref(const char *msg, const char *refname,
 	       const struct object_id *new_oid, const struct object_id *old_oid,
 	       unsigned int flags, enum action_on_err onerr);
 
+/* Pseudorefs (eg. HEAD, CHERRY_PICK_HEAD) have a separate routines for updating
+   and deletion as they cannot take part in normal transactional updates.
+   Pseudorefs should only be written for the main repository.
+*/
+int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
+			 const struct object_id *oid,
+			 const struct object_id *old_oid, struct strbuf *err);
+int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
+			  const struct object_id *old_oid);
+int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid);
+
 int parse_hide_refs_config(const char *var, const char *value, const char *);
 
 /*
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6516c7bc8c..df7553f4cc 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -731,6 +731,115 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	return ret;
 }
 
+static int files_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "write_pseudoref");
+	int fd;
+	struct lock_file lock = LOCK_INIT;
+	struct strbuf filename = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	int ret = -1;
+
+	if (!oid)
+		return 0;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+	fd = hold_lock_file_for_update_timeout(&lock, filename.buf, 0,
+					       get_files_ref_lock_timeout_ms());
+	if (fd < 0) {
+		strbuf_addf(err, _("could not open '%s' for writing: %s"),
+			    buf.buf, strerror(errno));
+		goto done;
+	}
+
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(err, _("could not read ref '%s'"),
+					    pseudoref);
+				rollback_lock_file(&lock);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(err, _("ref '%s' already exists"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(err,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+	}
+
+	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
+	if (write_in_full(fd, buf.buf, buf.len) < 0) {
+		strbuf_addf(err, _("could not write to '%s'"), filename.buf);
+		rollback_lock_file(&lock);
+		goto done;
+	}
+
+	commit_lock_file(&lock);
+	ret = 0;
+done:
+	strbuf_release(&buf);
+	strbuf_release(&filename);
+	return ret;
+}
+
+static int files_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "delete_pseudoref");
+	struct strbuf filename = STRBUF_INIT;
+	int ret = -1;
+
+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
+
+	if (old_oid && !is_null_oid(old_oid)) {
+		struct lock_file lock = LOCK_INIT;
+		int fd;
+		struct object_id actual_old_oid;
+
+		fd = hold_lock_file_for_update_timeout(
+			&lock, filename.buf, 0,
+			get_files_ref_lock_timeout_ms());
+		if (fd < 0) {
+			error_errno(_("could not open '%s' for writing"),
+				    filename.buf);
+			goto done;
+		}
+		if (read_ref(pseudoref, &actual_old_oid))
+			die(_("could not read ref '%s'"), pseudoref);
+		if (!oideq(&actual_old_oid, old_oid)) {
+			error(_("unexpected object ID when deleting '%s'"),
+			      pseudoref);
+			rollback_lock_file(&lock);
+			goto done;
+		}
+
+		unlink(filename.buf);
+		rollback_lock_file(&lock);
+	} else {
+		unlink(filename.buf);
+	}
+	ret = 0;
+done:
+	strbuf_release(&filename);
+	return ret;
+}
+
 struct files_ref_iterator {
 	struct ref_iterator base;
 
@@ -3189,6 +3298,9 @@ struct ref_storage_be refs_be_files = {
 	files_rename_ref,
 	files_copy_ref,
 
+	files_write_pseudoref,
+	files_delete_pseudoref,
+
 	files_ref_iterator_begin,
 	files_read_raw_ref,
 
@@ -3198,5 +3310,5 @@ struct ref_storage_be refs_be_files = {
 	files_reflog_exists,
 	files_create_reflog,
 	files_delete_reflog,
-	files_reflog_expire
+	files_reflog_expire,
 };
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 4458a0f69c..08e8253a89 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1590,6 +1590,22 @@ static int packed_copy_ref(struct ref_store *ref_store,
 	BUG("packed reference store does not support copying references");
 }
 
+static int packed_write_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *oid,
+				  const struct object_id *old_oid,
+				  struct strbuf *err)
+{
+	BUG("packed reference store does not support writing pseudo-references");
+}
+
+static int packed_delete_pseudoref(struct ref_store *ref_store,
+				   const char *pseudoref,
+				   const struct object_id *old_oid)
+{
+	BUG("packed reference store does not support deleting pseudo-references");
+}
+
 static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_store)
 {
 	return empty_ref_iterator_begin();
@@ -1656,6 +1672,9 @@ struct ref_storage_be refs_be_packed = {
 	packed_rename_ref,
 	packed_copy_ref,
 
+	packed_write_pseudoref,
+	packed_delete_pseudoref,
+
 	packed_ref_iterator_begin,
 	packed_read_raw_ref,
 
@@ -1665,5 +1684,5 @@ struct ref_storage_be refs_be_packed = {
 	packed_reflog_exists,
 	packed_create_reflog,
 	packed_delete_reflog,
-	packed_reflog_expire
+	packed_reflog_expire,
 };
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 4271362d26..59b053d53a 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -556,6 +556,21 @@ typedef int copy_ref_fn(struct ref_store *ref_store,
 			  const char *oldref, const char *newref,
 			  const char *logmsg);
 
+typedef int write_pseudoref_fn(struct ref_store *ref_store,
+			       const char *pseudoref,
+			       const struct object_id *oid,
+			       const struct object_id *old_oid,
+			       struct strbuf *err);
+
+/*
+ * Deletes a pseudoref. Deletion always succeeds (even if the pseudoref doesn't
+ * exist.), except if old_oid is specified. If it is, it can fail due to lock
+ * failure, failure reading the old OID, or an OID mismatch
+ */
+typedef int delete_pseudoref_fn(struct ref_store *ref_store,
+				const char *pseudoref,
+				const struct object_id *old_oid);
+
 /*
  * Iterate over the references in `ref_store` whose names start with
  * `prefix`. `prefix` is matched as a literal string, without regard
@@ -655,6 +670,9 @@ struct ref_storage_be {
 	rename_ref_fn *rename_ref;
 	copy_ref_fn *copy_ref;
 
+	write_pseudoref_fn *write_pseudoref;
+	delete_pseudoref_fn *delete_pseudoref;
+
 	ref_iterator_begin_fn *iterator_begin;
 	read_raw_ref_fn *read_raw_ref;
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 05/20] Make refs_ref_exists public
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (3 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 04/20] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 06/20] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
                                                       ` (16 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 2 +-
 refs.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 1e9d893191..0363ddbeeb 100644
--- a/refs.c
+++ b/refs.c
@@ -313,7 +313,7 @@ int read_ref(const char *refname, struct object_id *oid)
 	return read_ref_full(refname, RESOLVE_REF_READING, oid, NULL);
 }
 
-static int refs_ref_exists(struct ref_store *refs, const char *refname)
+int refs_ref_exists(struct ref_store *refs, const char *refname)
 {
 	return !!refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING, NULL, NULL);
 }
diff --git a/refs.h b/refs.h
index 9ad1400aab..e9916ca6f0 100644
--- a/refs.h
+++ b/refs.h
@@ -105,6 +105,8 @@ int refs_verify_refname_available(struct ref_store *refs,
 				  const struct string_list *skip,
 				  struct strbuf *err);
 
+int refs_ref_exists(struct ref_store *refs, const char *refname);
+
 int ref_exists(const char *refname);
 
 int should_autocreate_reflog(const char *refname);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 06/20] Treat BISECT_HEAD as a pseudo ref
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (4 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 05/20] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 07/20] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
                                                       ` (15 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Both the git-bisect.sh as bisect--helper inspected the file system directly.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/bisect--helper.c | 3 +--
 git-bisect.sh            | 4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index ec4996282e..73f9324ad7 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -13,7 +13,6 @@ static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
 static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
 static GIT_PATH_FUNC(git_path_bisect_ancestors_ok, "BISECT_ANCESTORS_OK")
 static GIT_PATH_FUNC(git_path_bisect_start, "BISECT_START")
-static GIT_PATH_FUNC(git_path_bisect_head, "BISECT_HEAD")
 static GIT_PATH_FUNC(git_path_bisect_log, "BISECT_LOG")
 static GIT_PATH_FUNC(git_path_head_name, "head-name")
 static GIT_PATH_FUNC(git_path_bisect_names, "BISECT_NAMES")
@@ -164,7 +163,7 @@ static int bisect_reset(const char *commit)
 		strbuf_addstr(&branch, commit);
 	}
 
-	if (!file_exists(git_path_bisect_head())) {
+	if (!ref_exists("BISECT_HEAD")) {
 		struct argv_array argv = ARGV_ARRAY_INIT;
 
 		argv_array_pushl(&argv, "checkout", branch.buf, "--", NULL);
diff --git a/git-bisect.sh b/git-bisect.sh
index 08a6ed57dd..f03fbb18f0 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -41,7 +41,7 @@ TERM_GOOD=good
 
 bisect_head()
 {
-	if test -f "$GIT_DIR/BISECT_HEAD"
+	if git rev-parse --verify -q BISECT_HEAD > /dev/null
 	then
 		echo BISECT_HEAD
 	else
@@ -153,7 +153,7 @@ bisect_next() {
 	git bisect--helper --bisect-next-check $TERM_GOOD $TERM_BAD $TERM_GOOD|| exit
 
 	# Perform all bisection computation, display and checkout
-	git bisect--helper --next-all $(test -f "$GIT_DIR/BISECT_HEAD" && echo --no-checkout)
+	git bisect--helper --next-all $(git rev-parse --verify -q BISECT_HEAD > /dev/null && echo --no-checkout)
 	res=$?
 
 	# Check if we should exit because bisection is finished
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 07/20] Treat CHERRY_PICK_HEAD as a pseudo ref
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (5 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 06/20] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 08/20] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
                                                       ` (14 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Check for existence and delete CHERRY_PICK_HEAD through pseudo ref functions.
This will help cherry-pick work with alternate ref storage backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/commit.c | 34 +++++++++++++++++++---------------
 builtin/merge.c  |  2 +-
 path.c           |  1 -
 path.h           |  7 ++++---
 sequencer.c      | 42 ++++++++++++++++++++++++++----------------
 wt-status.c      |  4 ++--
 6 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index d1b7396052..e27120b982 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -847,21 +847,25 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 			if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
 				!merge_contains_scissors)
 				wt_status_add_cut_line(s->fp);
-			status_printf_ln(s, GIT_COLOR_NORMAL,
-			    whence == FROM_MERGE
-				? _("\n"
-					"It looks like you may be committing a merge.\n"
-					"If this is not correct, please remove the file\n"
-					"	%s\n"
-					"and try again.\n")
-				: _("\n"
-					"It looks like you may be committing a cherry-pick.\n"
-					"If this is not correct, please remove the file\n"
-					"	%s\n"
-					"and try again.\n"),
-				whence == FROM_MERGE ?
-					git_path_merge_head(the_repository) :
-					git_path_cherry_pick_head(the_repository));
+			if (whence == FROM_MERGE)
+				status_printf_ln(
+					s, GIT_COLOR_NORMAL,
+
+					_("\n"
+					  "It looks like you may be committing a merge.\n"
+					  "If this is not correct, please remove the file\n"
+					  "	%s\n"
+					  "and try again.\n"),
+					git_path_merge_head(the_repository));
+			else
+				status_printf_ln(
+					s, GIT_COLOR_NORMAL,
+
+					_("\n"
+					  "It looks like you may be committing a cherry-pick.\n"
+					  "If this is not correct, please run\n"
+					  "	git cherry-pick --abort\n"
+					  "and try again.\n"));
 		}
 
 		fprintf(s->fp, "\n");
diff --git a/builtin/merge.c b/builtin/merge.c
index 7da707bf55..93b0a7b6ed 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1352,7 +1352,7 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
 		else
 			die(_("You have not concluded your merge (MERGE_HEAD exists)."));
 	}
-	if (file_exists(git_path_cherry_pick_head(the_repository))) {
+	if (ref_exists("CHERRY_PICK_HEAD")) {
 		if (advice_resolve_conflict)
 			die(_("You have not concluded your cherry-pick (CHERRY_PICK_HEAD exists).\n"
 			    "Please, commit your changes before you merge."));
diff --git a/path.c b/path.c
index 8b2c753191..783cc2ae81 100644
--- a/path.c
+++ b/path.c
@@ -1528,7 +1528,6 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
-REPO_GIT_PATH_FUNC(cherry_pick_head, "CHERRY_PICK_HEAD")
 REPO_GIT_PATH_FUNC(revert_head, "REVERT_HEAD")
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
diff --git a/path.h b/path.h
index 1f1bf8f87a..8941c018a9 100644
--- a/path.h
+++ b/path.h
@@ -170,7 +170,6 @@ void report_linked_checkout_garbage(void);
 	}
 
 struct path_cache {
-	const char *cherry_pick_head;
 	const char *revert_head;
 	const char *squash_msg;
 	const char *merge_msg;
@@ -182,9 +181,11 @@ struct path_cache {
 	const char *shallow;
 };
 
-#define PATH_CACHE_INIT { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL }
+#define PATH_CACHE_INIT                                              \
+	{                                                            \
+		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
+	}
 
-const char *git_path_cherry_pick_head(struct repository *r);
 const char *git_path_revert_head(struct repository *r);
 const char *git_path_squash_msg(struct repository *r);
 const char *git_path_merge_msg(struct repository *r);
diff --git a/sequencer.c b/sequencer.c
index fd7701c88a..26286ec8d0 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -381,7 +381,8 @@ static void print_advice(struct repository *r, int show_hint,
 		 * (typically rebase --interactive) wants to take care
 		 * of the commit itself so remove CHERRY_PICK_HEAD
 		 */
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		return;
 	}
 
@@ -1455,7 +1456,8 @@ static int do_commit(struct repository *r,
 				    author, opts, flags, &oid);
 		strbuf_release(&sb);
 		if (!res) {
-			unlink(git_path_cherry_pick_head(r));
+			refs_delete_pseudoref(get_main_ref_store(r),
+					      "CHERRY_PICK_HEAD", NULL);
 			unlink(git_path_merge_msg(r));
 			if (!is_rebase_i(opts))
 				print_commit_summary(r, NULL, &oid,
@@ -1966,7 +1968,8 @@ static int do_pick_commit(struct repository *r,
 		flags |= ALLOW_EMPTY;
 	} else if (allow == 2) {
 		drop_commit = 1;
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		unlink(git_path_merge_msg(r));
 		fprintf(stderr,
 			_("dropping %s %s -- patch contents already upstream\n"),
@@ -2305,8 +2308,10 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int need_cleanup = 0;
 
-	if (file_exists(git_path_cherry_pick_head(r))) {
-		if (!unlink(git_path_cherry_pick_head(r)) && verbose)
+	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
+		if (!refs_delete_pseudoref(get_main_ref_store(r),
+					   "CHERRY_PICK_HEAD", NULL) &&
+		    verbose)
 			warning(_("cancelling a cherry picking in progress"));
 		opts.action = REPLAY_PICK;
 		need_cleanup = 1;
@@ -2671,8 +2676,9 @@ static int create_seq_dir(struct repository *r)
 	enum replay_action action;
 	const char *in_progress_error = NULL;
 	const char *in_progress_advice = NULL;
-	unsigned int advise_skip = file_exists(git_path_revert_head(r)) ||
-				file_exists(git_path_cherry_pick_head(r));
+	unsigned int advise_skip =
+		file_exists(git_path_revert_head(r)) ||
+		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
 
 	if (!sequencer_get_last_command(r, &action)) {
 		switch (action) {
@@ -2771,7 +2777,7 @@ static int rollback_single_pick(struct repository *r)
 {
 	struct object_id head_oid;
 
-	if (!file_exists(git_path_cherry_pick_head(r)) &&
+	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !file_exists(git_path_revert_head(r)))
 		return error(_("no cherry-pick or revert in progress"));
 	if (read_ref_full("HEAD", 0, &head_oid, NULL))
@@ -2874,7 +2880,8 @@ int sequencer_skip(struct repository *r, struct replay_opts *opts)
 		}
 		break;
 	case REPLAY_PICK:
-		if (!file_exists(git_path_cherry_pick_head(r))) {
+		if (!refs_ref_exists(get_main_ref_store(r),
+				     "CHERRY_PICK_HEAD")) {
 			if (action != REPLAY_PICK)
 				return error(_("no cherry-pick in progress"));
 			if (!rollback_is_safe())
@@ -3569,7 +3576,8 @@ static int do_merge(struct repository *r,
 					oid_to_hex(&j->item->object.oid));
 
 		strbuf_release(&ref_name);
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
+				      NULL);
 		rollback_lock_file(&lock);
 
 		rollback_lock_file(&lock);
@@ -4201,7 +4209,7 @@ static int continue_single_pick(struct repository *r)
 {
 	const char *argv[] = { "commit", NULL };
 
-	if (!file_exists(git_path_cherry_pick_head(r)) &&
+	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !file_exists(git_path_revert_head(r)))
 		return error(_("no cherry-pick or revert in progress"));
 	return run_command_v_opt(argv, RUN_GIT_CMD);
@@ -4318,9 +4326,10 @@ static int commit_staged_changes(struct repository *r,
 	}
 
 	if (is_clean) {
-		const char *cherry_pick_head = git_path_cherry_pick_head(r);
-
-		if (file_exists(cherry_pick_head) && unlink(cherry_pick_head))
+		if (refs_ref_exists(get_main_ref_store(r),
+				    "CHERRY_PICK_HEAD") &&
+		    refs_delete_pseudoref(get_main_ref_store(r),
+					  "CHERRY_PICK_HEAD", NULL))
 			return error(_("could not remove CHERRY_PICK_HEAD"));
 		if (!final_fixup)
 			return 0;
@@ -4379,7 +4388,8 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 
 	if (!is_rebase_i(opts)) {
 		/* Verify that the conflict has been resolved */
-		if (file_exists(git_path_cherry_pick_head(r)) ||
+		if (refs_ref_exists(get_main_ref_store(r),
+				    "CHERRY_PICK_HEAD") ||
 		    file_exists(git_path_revert_head(r))) {
 			res = continue_single_pick(r);
 			if (res)
@@ -5442,7 +5452,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
 
 int sequencer_determine_whence(struct repository *r, enum commit_whence *whence)
 {
-	if (file_exists(git_path_cherry_pick_head(r))) {
+	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
 		struct object_id cherry_pick_head, rebase_head;
 
 		if (file_exists(git_path_seq_dir()))
diff --git a/wt-status.c b/wt-status.c
index c560cbe860..3cec356ce6 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1673,8 +1673,8 @@ void wt_status_get_state(struct repository *r,
 		state->merge_in_progress = 1;
 	} else if (wt_status_check_rebase(NULL, state)) {
 		;		/* all set */
-	} else if (!stat(git_path_cherry_pick_head(r), &st) &&
-			!get_oid("CHERRY_PICK_HEAD", &oid)) {
+	} else if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
+		   !get_oid("CHERRY_PICK_HEAD", &oid)) {
 		state->cherry_pick_in_progress = 1;
 		oidcpy(&state->cherry_pick_head_oid, &oid);
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 08/20] Treat REVERT_HEAD as a pseudo ref
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (6 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 07/20] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 09/20] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
                                                       ` (13 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 path.c      |  1 -
 path.h      |  8 +++-----
 sequencer.c | 16 +++++++++-------
 wt-status.c |  2 +-
 4 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/path.c b/path.c
index 783cc2ae81..7b385e5eb2 100644
--- a/path.c
+++ b/path.c
@@ -1528,7 +1528,6 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
-REPO_GIT_PATH_FUNC(revert_head, "REVERT_HEAD")
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
 REPO_GIT_PATH_FUNC(merge_rr, "MERGE_RR")
diff --git a/path.h b/path.h
index 8941c018a9..e7e77da6aa 100644
--- a/path.h
+++ b/path.h
@@ -170,7 +170,6 @@ void report_linked_checkout_garbage(void);
 	}
 
 struct path_cache {
-	const char *revert_head;
 	const char *squash_msg;
 	const char *merge_msg;
 	const char *merge_rr;
@@ -181,12 +180,11 @@ struct path_cache {
 	const char *shallow;
 };
 
-#define PATH_CACHE_INIT                                              \
-	{                                                            \
-		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
+#define PATH_CACHE_INIT                                        \
+	{                                                      \
+		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
 	}
 
-const char *git_path_revert_head(struct repository *r);
 const char *git_path_squash_msg(struct repository *r);
 const char *git_path_merge_msg(struct repository *r);
 const char *git_path_merge_rr(struct repository *r);
diff --git a/sequencer.c b/sequencer.c
index 26286ec8d0..25ef9fc773 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2317,8 +2317,10 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
 		need_cleanup = 1;
 	}
 
-	if (file_exists(git_path_revert_head(r))) {
-		if (!unlink(git_path_revert_head(r)) && verbose)
+	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
+		if (!refs_delete_pseudoref(get_main_ref_store(r), "REVERT_HEAD",
+					   NULL) &&
+		    verbose)
 			warning(_("cancelling a revert in progress"));
 		opts.action = REPLAY_REVERT;
 		need_cleanup = 1;
@@ -2677,7 +2679,7 @@ static int create_seq_dir(struct repository *r)
 	const char *in_progress_error = NULL;
 	const char *in_progress_advice = NULL;
 	unsigned int advise_skip =
-		file_exists(git_path_revert_head(r)) ||
+		refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD") ||
 		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
 
 	if (!sequencer_get_last_command(r, &action)) {
@@ -2778,7 +2780,7 @@ static int rollback_single_pick(struct repository *r)
 	struct object_id head_oid;
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
-	    !file_exists(git_path_revert_head(r)))
+	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
 	if (read_ref_full("HEAD", 0, &head_oid, NULL))
 		return error(_("cannot resolve HEAD"));
@@ -2872,7 +2874,7 @@ int sequencer_skip(struct repository *r, struct replay_opts *opts)
 	 */
 	switch (opts->action) {
 	case REPLAY_REVERT:
-		if (!file_exists(git_path_revert_head(r))) {
+		if (!refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
 			if (action != REPLAY_REVERT)
 				return error(_("no revert in progress"));
 			if (!rollback_is_safe())
@@ -4210,7 +4212,7 @@ static int continue_single_pick(struct repository *r)
 	const char *argv[] = { "commit", NULL };
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
-	    !file_exists(git_path_revert_head(r)))
+	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
 	return run_command_v_opt(argv, RUN_GIT_CMD);
 }
@@ -4390,7 +4392,7 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 		/* Verify that the conflict has been resolved */
 		if (refs_ref_exists(get_main_ref_store(r),
 				    "CHERRY_PICK_HEAD") ||
-		    file_exists(git_path_revert_head(r))) {
+		    refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
 			res = continue_single_pick(r);
 			if (res)
 				goto release_todo_list;
diff --git a/wt-status.c b/wt-status.c
index 3cec356ce6..16d5a77de3 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1679,7 +1679,7 @@ void wt_status_get_state(struct repository *r,
 		oidcpy(&state->cherry_pick_head_oid, &oid);
 	}
 	wt_status_check_bisect(NULL, state);
-	if (!stat(git_path_revert_head(r), &st) &&
+	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD") &&
 	    !get_oid("REVERT_HEAD", &oid)) {
 		state->revert_in_progress = 1;
 		oidcpy(&state->revert_head_oid, &oid);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 09/20] Move REF_LOG_ONLY to refs-internal.h
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (7 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 08/20] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 10/20] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
                                                       ` (12 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

REF_LOG_ONLY is used in the transaction preparation: if a symref is involved in
a transaction, the referent of the symref should be updated, and the symref
itself should only be updated in the reflog. Other ref backends will need to
duplicate this logic.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/files-backend.c | 7 -------
 refs/refs-internal.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index df7553f4cc..141b6b0881 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -38,13 +38,6 @@
  */
 #define REF_NEEDS_COMMIT (1 << 6)
 
-/*
- * Used as a flag in ref_update::flags when we want to log a ref
- * update but not actually perform it.  This is used when a symbolic
- * ref update is split up.
- */
-#define REF_LOG_ONLY (1 << 7)
-
 /*
  * Used as a flag in ref_update::flags when the ref_update was via an
  * update to HEAD.
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 59b053d53a..dc9e8d3a92 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -31,6 +31,13 @@ struct ref_transaction;
  */
 #define REF_HAVE_OLD (1 << 3)
 
+/*
+ * Used as a flag in ref_update::flags when we want to log a ref
+ * update but not actually perform it.  This is used when a symbolic
+ * ref update is split up.
+ */
+#define REF_LOG_ONLY (1 << 7)
+
 /*
  * Return the length of time to retry acquiring a loose reference lock
  * before giving up, in milliseconds:
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 10/20] Iterate over the "refs/" namespace in for_each_[raw]ref
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (8 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 09/20] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 11/20] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                                       ` (11 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 0363ddbeeb..36c0e76256 100644
--- a/refs.c
+++ b/refs.c
@@ -1482,7 +1482,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1542,8 +1542,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 11/20] Add .gitattributes for the reftable/ directory
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (9 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 10/20] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 12/20] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                                       ` (10 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 0000000000..f44451a379
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 12/20] Add reftable library
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (10 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 11/20] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 13/20] Add standalone build infrastructure for reftable Han-Wen Nienhuys via GitGitGadget
                                                       ` (9 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. The rationale and format
layout is detailed in commit 35e6c47404 ("reftable: file format documentation",
2020-05-20).

This commits imports a fully functioning library for reading and writing
reftables, along with unittests.

It is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* record.{c,h} - reading and writing records

* block.{c,h} - reading and writing blocks.

* reader.{c,h} - reading a complete reftable file.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE          |   31 +
 reftable/VERSION          |    1 +
 reftable/basics.c         |  215 +++++++
 reftable/basics.h         |   53 ++
 reftable/block.c          |  432 +++++++++++++
 reftable/block.h          |  129 ++++
 reftable/block_test.c     |  157 +++++
 reftable/compat.c         |   98 +++
 reftable/compat.h         |   48 ++
 reftable/constants.h      |   21 +
 reftable/dump.c           |  212 +++++++
 reftable/file.c           |   95 +++
 reftable/iter.c           |  242 ++++++++
 reftable/iter.h           |   72 +++
 reftable/merged.c         |  317 ++++++++++
 reftable/merged.h         |   43 ++
 reftable/merged_test.c    |  286 +++++++++
 reftable/pq.c             |  115 ++++
 reftable/pq.h             |   34 +
 reftable/reader.c         |  744 ++++++++++++++++++++++
 reftable/reader.h         |   65 ++
 reftable/record.c         | 1126 ++++++++++++++++++++++++++++++++++
 reftable/record.h         |  143 +++++
 reftable/record_test.c    |  410 +++++++++++++
 reftable/refname.c        |  209 +++++++
 reftable/refname.h        |   38 ++
 reftable/refname_test.c   |   99 +++
 reftable/reftable-tests.h |   22 +
 reftable/reftable.c       |  154 +++++
 reftable/reftable.h       |  585 ++++++++++++++++++
 reftable/reftable_test.c  |  632 +++++++++++++++++++
 reftable/stack.c          | 1225 +++++++++++++++++++++++++++++++++++++
 reftable/stack.h          |   50 ++
 reftable/stack_test.c     |  787 ++++++++++++++++++++++++
 reftable/strbuf.c         |  206 +++++++
 reftable/strbuf.h         |   88 +++
 reftable/strbuf_test.c    |   39 ++
 reftable/system.h         |   53 ++
 reftable/test_framework.c |   69 +++
 reftable/test_framework.h |   64 ++
 reftable/tree.c           |   63 ++
 reftable/tree.h           |   34 +
 reftable/tree_test.c      |   62 ++
 reftable/update.sh        |   22 +
 reftable/writer.c         |  663 ++++++++++++++++++++
 reftable/writer.h         |   60 ++
 reftable/zlib-compat.c    |   92 +++
 47 files changed, 10405 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/compat.c
 create mode 100644 reftable/compat.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/refname_test.c
 create mode 100644 reftable/reftable-tests.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/strbuf.c
 create mode 100644 reftable/strbuf.h
 create mode 100644 reftable/strbuf_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 0000000000..402e0f9356
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 0000000000..b0d1123483
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1 @@
+994cc2f626ff626ff04be5240413775f832110fb C: formatting tweaks
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 0000000000..35b42b6b26
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,215 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(uint8_t *out, uint32_t i)
+{
+	out[0] = (uint8_t)((i >> 16) & 0xff);
+	out[1] = (uint8_t)((i >> 8) & 0xff);
+	out[2] = (uint8_t)(i & 0xff);
+}
+
+uint32_t get_be24(uint8_t *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)(i & 0xff);
+}
+
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
+{
+	size_t lo = 0;
+	size_t hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		size_t mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	size_t names_cap = 0;
+	size_t names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	static char buf[250];
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return "I/O error";
+	case REFTABLE_FORMAT_ERROR:
+		return "corrupt reftable file";
+	case REFTABLE_NOT_EXIST_ERROR:
+		return "file does not exist";
+	case REFTABLE_LOCK_ERROR:
+		return "data is outdated";
+	case REFTABLE_API_ERROR:
+		return "misuse of the reftable API";
+	case REFTABLE_ZLIB_ERROR:
+		return "zlib failure";
+	case REFTABLE_NAME_CONFLICT:
+		return "file/directory conflict";
+	case REFTABLE_REFNAME_ERROR:
+		return "invalid refname";
+	case -1:
+		return "general error";
+	default:
+		snprintf(buf, sizeof(buf), "unknown error code %d", err);
+		return buf;
+	}
+}
+
+int reftable_error_to_errno(int err)
+{
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return EIO;
+	case REFTABLE_FORMAT_ERROR:
+		return EFAULT;
+	case REFTABLE_NOT_EXIST_ERROR:
+		return ENOENT;
+	case REFTABLE_LOCK_ERROR:
+		return EBUSY;
+	case REFTABLE_API_ERROR:
+		return EINVAL;
+	case REFTABLE_ZLIB_ERROR:
+		return EDOM;
+	default:
+		return ERANGE;
+	}
+}
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 0000000000..e1b4c9fa1a
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+/* Bigendian en/decoding of integers */
+
+void put_be24(uint8_t *out, uint32_t i);
+uint32_t get_be24(uint8_t *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);
+
+/*
+  Frees a NULL terminated array of malloced strings. The array itself is also
+  freed.
+ */
+void free_names(char **a);
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp);
+
+/* compares two NULL-terminated arrays of strings. */
+int names_equal(char **a, char **b);
+
+/* returns the array size of a NULL-terminated array of strings. */
+int names_length(char **names);
+
+/* Allocation routines; they invoke the functions set through
+ * reftable_set_alloc() */
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 0000000000..d6a1a42a2b
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,432 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct strbuf *key);
+
+void block_writer_init(struct block_writer *bw, uint8_t typ, uint8_t *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+uint8_t block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the reftable_record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec)
+{
+	struct strbuf empty = STRBUF_INIT;
+	struct strbuf last =
+		w->entries % w->restart_interval == 0 ? empty : w->last_key;
+	struct string_view out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct string_view start = out;
+
+	bool restart = false;
+	struct strbuf key = STRBUF_INIT;
+	int n = 0;
+
+	reftable_record_key(rec, &key);
+	n = reftable_encode_key(&restart, out, last, key,
+				reftable_record_val_type(rec));
+	if (n < 0)
+		goto done;
+	string_view_consume(&out, n);
+
+	n = reftable_record_encode(rec, out, w->hash_size);
+	if (n < 0)
+		goto done;
+	string_view_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  &key) < 0)
+		goto done;
+
+	strbuf_release(&key);
+	return 0;
+
+done:
+	strbuf_release(&key);
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct strbuf *key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next)
+		return -1;
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+
+	strbuf_reset(&w->last_key);
+	strbuf_addbuf(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		uint8_t *compressed = NULL;
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		size_t dest_cap = src_len;
+
+		compressed = reftable_malloc(dest_cap);
+		while (1) {
+			uLongf out_dest_len = dest_cap;
+
+			zresult = compress2(compressed, &out_dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				dest_cap *= 2;
+				compressed =
+					reftable_realloc(compressed, dest_cap);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				reftable_free(compressed);
+				return REFTABLE_ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed,
+			       out_dest_len);
+			w->next = out_dest_len + block_header_skip;
+			reftable_free(compressed);
+			break;
+		}
+	}
+	return w->next;
+}
+
+uint8_t block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct reftable_block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	uint8_t typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	uint16_t restart_count = 0;
+	uint32_t restart_start = 0;
+	uint8_t *restart_bytes = NULL;
+
+	if (!reftable_is_block_type(typ))
+		return REFTABLE_FORMAT_ERROR;
+
+	if (typ == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip; /* total size of dest
+							    buffer. */
+		uLongf src_len = block->len - block_header_skip;
+		/* Log blocks specify the *uncompressed* size in their header.
+		 */
+		uint8_t *uncompressed = reftable_malloc(sz);
+
+		/* Copy over the block header verbatim. It's not compressed. */
+		memcpy(uncompressed, block->data, block_header_skip);
+
+		/* Uncompress */
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed + block_header_skip, &dst_len,
+				    block->data + block_header_skip,
+				    &src_len)) {
+			reftable_free(uncompressed);
+			return REFTABLE_ZLIB_ERROR;
+		}
+
+		if (dst_len + block_header_skip != sz)
+			return REFTABLE_FORMAT_ERROR;
+
+		/* We're done with the input data. */
+		reftable_block_done(block);
+		block->data = uncompressed;
+		block->len = sz;
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	restart_count = get_be16(block->data + sz - 2);
+	restart_start = sz - 2 - 3 * restart_count;
+	restart_bytes = block->data + restart_start;
+
+	/* transfer ownership. */
+	br->block = *block;
+	block->data = NULL;
+	block->len = 0;
+
+	br->hash_size = hash_size;
+	br->block_len = restart_start;
+	br->full_block_size = full_block_size;
+	br->header_off = header_off;
+	br->restart_count = restart_count;
+	br->restart_bytes = restart_bytes;
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	strbuf_reset(&it->last_key);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct strbuf key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(size_t idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct string_view in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct strbuf rkey = STRBUF_INIT;
+	struct strbuf last_key = STRBUF_INIT;
+	uint8_t unused_extra;
+	int n = reftable_decode_key(&rkey, &unused_extra, last_key, in);
+	int result;
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	result = strbuf_cmp(&a->key, &rkey);
+	strbuf_release(&rkey);
+	return result;
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	strbuf_reset(&dest->last_key);
+	strbuf_addbuf(&dest->last_key, &src->last_key);
+}
+
+int block_iter_next(struct block_iter *it, struct reftable_record *rec)
+{
+	struct string_view in = {
+		.buf = it->br->block.data + it->next_off,
+		.len = it->br->block_len - it->next_off,
+	};
+	struct string_view start = in;
+	struct strbuf key = STRBUF_INIT;
+	uint8_t extra = 0;
+	int n = 0;
+
+	if (it->next_off >= it->br->block_len)
+		return 1;
+
+	n = reftable_decode_key(&key, &extra, it->last_key, in);
+	if (n < 0)
+		return -1;
+
+	string_view_consume(&in, n);
+	n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
+	if (n < 0)
+		return -1;
+	string_view_consume(&in, n);
+
+	strbuf_reset(&it->last_key);
+	strbuf_addbuf(&it->last_key, &key);
+	it->next_off += start.len - in.len;
+	strbuf_release(&key);
+	return 0;
+}
+
+int block_reader_first_key(struct block_reader *br, struct strbuf *key)
+{
+	struct strbuf empty = STRBUF_INIT;
+	int off = br->header_off + 4;
+	struct string_view in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	uint8_t extra = 0;
+	int n = reftable_decode_key(key, &extra, empty, in);
+	if (n < 0)
+		return n;
+
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct strbuf *want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	strbuf_release(&it->last_key);
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct strbuf *want)
+{
+	struct restart_find_args args = {
+		.key = *want,
+		.r = br,
+	};
+	struct reftable_record rec = reftable_new_record(block_reader_type(br));
+	struct strbuf key = STRBUF_INIT;
+	int err = 0;
+	struct block_iter next = {
+		.last_key = STRBUF_INIT,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	/* We're looking for the last entry less/equal than the wanted key, so
+	   we have to go one entry too far and then back up.
+	*/
+	while (true) {
+		block_iter_copy_from(&next, it);
+		err = block_iter_next(&next, &rec);
+		if (err < 0)
+			goto done;
+
+		reftable_record_key(&rec, &key);
+		if (err > 0 || strbuf_cmp(&key, want) >= 0) {
+			err = 0;
+			goto done;
+		}
+
+		block_iter_copy_from(it, &next);
+	}
+
+done:
+	strbuf_release(&key);
+	strbuf_release(&next.last_key);
+	reftable_record_destroy(&rec);
+
+	return err;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	strbuf_release(&bw->last_key);
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 0000000000..14a6f835f4
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,129 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+/*
+  Writes reftable blocks. The block_writer is reused across blocks to minimize
+  allocation overhead.
+*/
+struct block_writer {
+	uint8_t *buf;
+	uint32_t block_size;
+
+	/* Offset ofof the global header. Nonzero in the first block only. */
+	uint32_t header_off;
+
+	/* How often to restart keys. */
+	int restart_interval;
+	int hash_size;
+
+	/* Offset of next uint8_t to write. */
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+
+	struct strbuf last_key;
+	int entries;
+};
+
+/*
+  initializes the blockwriter to write `typ` entries, using `buf` as temporary
+  storage. `buf` is not owned by the block_writer. */
+void block_writer_init(struct block_writer *bw, uint8_t typ, uint8_t *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+
+/*
+  returns the block type (eg. 'r' for ref records.
+*/
+uint8_t block_writer_type(struct block_writer *bw);
+
+/* appends the record, or -1 if it doesn't fit. */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec);
+
+/* appends the key restarts, and compress the block if necessary. */
+int block_writer_finish(struct block_writer *w);
+
+/* clears out internally allocated block_writer members. */
+void block_writer_clear(struct block_writer *bw);
+
+/* Read a block. */
+struct block_reader {
+	/* offset of the block header; nonzero for the first block in a
+	 * reftable. */
+	uint32_t header_off;
+
+	/* the memory block */
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	uint8_t *restart_bytes;
+	uint16_t restart_count;
+
+	/* size of the data in the file. For log blocks, this is the compressed
+	 * size. */
+	uint32_t full_block_size;
+};
+
+/* Iterate over entries in a block */
+struct block_iter {
+	/* offset within the block of the next entry to read. */
+	uint32_t next_off;
+	struct block_reader *br;
+
+	/* key for last entry we read. */
+	struct strbuf last_key;
+};
+
+/* initializes a block reader. */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Position `it` at start of the block */
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+
+/* Position `it` to the `want` key in the block */
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct strbuf *want);
+
+/* Returns the block type (eg. 'r' for refs) */
+uint8_t block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct strbuf *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct reftable_record *rec);
+
+/* Seek to `want` with in the block pointed to by `it` */
+int block_iter_seek(struct block_iter *it, struct strbuf *want);
+
+/* deallocate memory for `it`. The block reader and its block is left intact. */
+void block_iter_close(struct block_iter *it);
+
+/* size of file header, depending on format version */
+int header_size(int version);
+
+/* size of file footer, depending on format version */
+int footer_size(int version);
+
+/* returns a block to its source. */
+void reftable_block_done(struct reftable_block *ret);
+
+#endif
diff --git a/reftable/block_test.c b/reftable/block_test.c
new file mode 100644
index 0000000000..3e071d7851
--- /dev/null
+++ b/reftable/block_test.c
@@ -0,0 +1,157 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+struct binsearch_args {
+	int key;
+	int *arr;
+};
+
+static int binsearch_func(size_t i, void *void_args)
+{
+	struct binsearch_args *args = (struct binsearch_args *)void_args;
+
+	return args->key < args->arr[i];
+}
+
+static void test_binsearch(void)
+{
+	int arr[] = { 2, 4, 6, 8, 10 };
+	size_t sz = ARRAY_SIZE(arr);
+	struct binsearch_args args = {
+		.arr = arr,
+	};
+
+	int i = 0;
+	for (i = 1; i < 11; i++) {
+		int res;
+		args.key = i;
+		res = binsearch(sz, &binsearch_func, &args);
+
+		if (res < sz) {
+			assert(args.key < arr[res]);
+			if (res > 0) {
+				assert(args.key >= arr[res - 1]);
+			}
+		} else {
+			assert(args.key == 10 || args.key == 11);
+		}
+	}
+}
+
+static void test_block_read_write(void)
+{
+	const int header_off = 21; /* random */
+	char *names[30];
+	const int N = ARRAY_SIZE(names);
+	const int block_size = 1024;
+	struct reftable_block block = { 0 };
+	struct block_writer bw = {
+		.last_key = STRBUF_INIT,
+	};
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_record rec = { 0 };
+	int i = 0;
+	int n;
+	struct block_reader br = { 0 };
+	struct block_iter it = { .last_key = STRBUF_INIT };
+	int j = 0;
+	struct strbuf want = STRBUF_INIT;
+
+	block.data = reftable_calloc(block_size);
+	block.len = block_size;
+	block.source = malloc_block_source();
+	block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size,
+			  header_off, hash_size(SHA1_ID));
+	reftable_record_from_ref(&rec, &ref);
+
+	for (i = 0; i < N; i++) {
+		char name[100];
+		uint8_t hash[SHA1_SIZE];
+		snprintf(name, sizeof(name), "branch%02d", i);
+		memset(hash, i, sizeof(hash));
+
+		ref.refname = name;
+		ref.value = hash;
+		names[i] = xstrdup(name);
+		n = block_writer_add(&bw, &rec);
+		ref.refname = NULL;
+		ref.value = NULL;
+		assert(n == 0);
+	}
+
+	n = block_writer_finish(&bw);
+	assert(n > 0);
+
+	block_writer_clear(&bw);
+
+	block_reader_init(&br, &block, header_off, block_size, SHA1_SIZE);
+
+	block_reader_start(&br, &it);
+
+	while (true) {
+		int r = block_iter_next(&it, &rec);
+		assert(r >= 0);
+		if (r > 0) {
+			break;
+		}
+		assert_streq(names[j], ref.refname);
+		j++;
+	}
+
+	reftable_record_clear(&rec);
+	block_iter_close(&it);
+
+	for (i = 0; i < N; i++) {
+		struct block_iter it = { .last_key = STRBUF_INIT };
+		strbuf_reset(&want);
+		strbuf_addstr(&want, names[i]);
+
+		n = block_reader_seek(&br, &it, &want);
+		assert(n == 0);
+
+		n = block_iter_next(&it, &rec);
+		assert(n == 0);
+
+		assert_streq(names[i], ref.refname);
+
+		want.len--;
+		n = block_reader_seek(&br, &it, &want);
+		assert(n == 0);
+
+		n = block_iter_next(&it, &rec);
+		assert(n == 0);
+		assert_streq(names[10 * (i / 10)], ref.refname);
+
+		block_iter_close(&it);
+	}
+
+	reftable_record_clear(&rec);
+	reftable_block_done(&br.block);
+	strbuf_release(&want);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+}
+
+int block_test_main(int argc, const char *argv[])
+{
+	add_test_case("binsearch", &test_binsearch);
+	add_test_case("block_read_write", &test_block_read_write);
+	return test_main(argc, argv);
+}
diff --git a/reftable/compat.c b/reftable/compat.c
new file mode 100644
index 0000000000..be0fd65d2f
--- /dev/null
+++ b/reftable/compat.c
@@ -0,0 +1,98 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+
+*/
+
+/* compat.c - compatibility functions for standalone compilation */
+
+#include "system.h"
+#include "basics.h"
+
+#ifdef REFTABLE_STANDALONE
+
+#include <dirent.h>
+
+void put_be32(void *p, uint32_t i)
+{
+	uint8_t *out = (uint8_t *)p;
+
+	out[0] = (uint8_t)((i >> 24) & 0xff);
+	out[1] = (uint8_t)((i >> 16) & 0xff);
+	out[2] = (uint8_t)((i >> 8) & 0xff);
+	out[3] = (uint8_t)((i)&0xff);
+}
+
+uint32_t get_be32(uint8_t *in)
+{
+	return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
+	       (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
+}
+
+void put_be64(void *p, uint64_t v)
+{
+	uint8_t *out = (uint8_t *)p;
+	int i = sizeof(uint64_t);
+	while (i--) {
+		out[i] = (uint8_t)(v & 0xff);
+		v >>= 8;
+	}
+}
+
+uint64_t get_be64(void *out)
+{
+	uint8_t *bytes = (uint8_t *)out;
+	uint64_t v = 0;
+	int i = 0;
+	for (i = 0; i < sizeof(uint64_t); i++) {
+		v = (v << 8) | (uint8_t)(bytes[i] & 0xff);
+	}
+	return v;
+}
+
+uint16_t get_be16(uint8_t *in)
+{
+	return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
+}
+
+char *xstrdup(const char *s)
+{
+	int l = strlen(s);
+	char *dest = (char *)reftable_malloc(l + 1);
+	strncpy(dest, s, l + 1);
+	return dest;
+}
+
+void sleep_millisec(int millisecs)
+{
+	usleep(millisecs * 1000);
+}
+
+void reftable_clear_dir(const char *dirname)
+{
+	DIR *dir = opendir(dirname);
+	struct dirent *ent = NULL;
+	assert(dir);
+	while ((ent = readdir(dir)) != NULL) {
+		unlinkat(dirfd(dir), ent->d_name, 0);
+	}
+	closedir(dir);
+	rmdir(dirname);
+}
+
+#else
+
+#include "../dir.h"
+
+void reftable_clear_dir(const char *dirname)
+{
+	struct strbuf path = STRBUF_INIT;
+	strbuf_addstr(&path, dirname);
+	remove_dir_recursively(&path, 0);
+	strbuf_release(&path);
+}
+
+#endif
diff --git a/reftable/compat.h b/reftable/compat.h
new file mode 100644
index 0000000000..a765c57e96
--- /dev/null
+++ b/reftable/compat.h
@@ -0,0 +1,48 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef COMPAT_H
+#define COMPAT_H
+
+#include "system.h"
+
+#ifdef REFTABLE_STANDALONE
+
+/* functions that git-core provides, for standalone compilation */
+#include <stdint.h>
+
+uint64_t get_be64(void *in);
+void put_be64(void *out, uint64_t i);
+
+void put_be32(void *out, uint32_t i);
+uint32_t get_be32(uint8_t *in);
+
+uint16_t get_be16(uint8_t *in);
+
+#define ARRAY_SIZE(a) sizeof((a)) / sizeof((a)[0])
+#define FREE_AND_NULL(x)          \
+	do {                      \
+		reftable_free(x); \
+		(x) = NULL;       \
+	} while (0)
+#define QSORT(arr, n, cmp) qsort(arr, n, sizeof(arr[0]), cmp)
+#define SWAP(a, b)                              \
+	{                                       \
+		char tmp[sizeof(a)];            \
+		assert(sizeof(a) == sizeof(b)); \
+		memcpy(&tmp[0], &a, sizeof(a)); \
+		memcpy(&a, &b, sizeof(a));      \
+		memcpy(&b, &tmp[0], sizeof(a)); \
+	}
+
+char *xstrdup(const char *s);
+
+void sleep_millisec(int millisecs);
+
+#endif
+#endif
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 0000000000..5eee72c4c1
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 0000000000..00b444e8c9
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,212 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "reftable.h"
+#include "reftable-tests.h"
+
+static uint32_t hash_id;
+
+static int dump_table(const char *tablename)
+{
+	struct reftable_block_source src = { 0 };
+	int err = reftable_block_source_from_file(&src, tablename);
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+	struct reftable_reader *r = NULL;
+
+	if (err < 0)
+		return err;
+
+	err = reftable_new_reader(&r, &src, tablename);
+	if (err < 0)
+		return err;
+
+	err = reftable_reader_seek_ref(r, &it, "");
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_ref_record_print(&ref, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_reader_seek_log(r, &it, "");
+	if (err < 0) {
+		return err;
+	}
+	while (1) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_log_record_print(&log, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_log_record_clear(&log);
+
+	reftable_reader_free(r);
+	return 0;
+}
+
+static int compact_stack(const char *stackdir)
+{
+	struct reftable_stack *stack = NULL;
+	struct reftable_write_options cfg = {};
+
+	int err = reftable_new_stack(&stack, stackdir, cfg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_stack_compact_all(stack, NULL);
+	if (err < 0)
+		goto done;
+done:
+	if (stack != NULL) {
+		reftable_stack_destroy(stack);
+	}
+	return err;
+}
+
+static int dump_stack(const char *stackdir)
+{
+	struct reftable_stack *stack = NULL;
+	struct reftable_write_options cfg = {};
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+	struct reftable_merged_table *merged = NULL;
+
+	int err = reftable_new_stack(&stack, stackdir, cfg);
+	if (err < 0)
+		return err;
+
+	merged = reftable_stack_merged_table(stack);
+
+	err = reftable_merged_table_seek_ref(merged, &it, "");
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_ref_record_print(&ref, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_merged_table_seek_log(merged, &it, "");
+	if (err < 0) {
+		return err;
+	}
+	while (1) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_log_record_print(&log, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_log_record_clear(&log);
+
+	reftable_stack_destroy(stack);
+	return 0;
+}
+
+static void print_help(void)
+{
+	printf("usage: dump [-cst] arg\n\n"
+	       "options: \n"
+	       "  -c compact\n"
+	       "  -t dump table\n"
+	       "  -s dump stack\n"
+	       "  -h this help\n"
+	       "  -2 use SHA256\n"
+	       "\n");
+}
+
+int reftable_dump_main(int argc, char *const *argv)
+{
+	int err = 0;
+	int opt;
+	int opt_dump_table = 0;
+	int opt_dump_stack = 0;
+	int opt_compact = 0;
+	const char *arg = NULL;
+	while ((opt = getopt(argc, argv, "2chts")) != -1) {
+		switch (opt) {
+		case '2':
+			hash_id = 0x73323536;
+			break;
+		case 't':
+			opt_dump_table = 1;
+			break;
+		case 's':
+			opt_dump_stack = 1;
+			break;
+		case 'c':
+			opt_compact = 1;
+			break;
+		case '?':
+		case 'h':
+			print_help();
+			return 2;
+			break;
+		}
+	}
+
+	if (argv[optind] == NULL) {
+		fprintf(stderr, "need argument\n");
+		print_help();
+		return 2;
+	}
+
+	arg = argv[optind];
+
+	if (opt_dump_table) {
+		err = dump_table(arg);
+	} else if (opt_dump_stack) {
+		err = dump_stack(arg);
+	} else if (opt_compact) {
+		err = compact_stack(arg);
+	}
+
+	if (err < 0) {
+		fprintf(stderr, "%s: %s: %s\n", argv[0], arg,
+			reftable_error_str(err));
+		return 1;
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 0000000000..eacf41f23b
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,95 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size)
+		return -1;
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	struct file_block_source *p = NULL;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return REFTABLE_NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0)
+		return -1;
+
+	p = reftable_calloc(sizeof(struct file_block_source));
+	p->size = st.st_size;
+	p->fd = fd;
+
+	assert(bs->ops == NULL);
+	bs->ops = &file_vtable;
+	bs->arg = p;
+	return 0;
+}
+
+int reftable_fd_write(void *arg, const void *data, size_t sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 0000000000..7f385270b0
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,242 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator *it)
+{
+	return it->ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct reftable_record *rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec)
+{
+	return it->ops->next(it->iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, ref);
+	return iterator_next(it, &rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, log);
+	return iterator_next(it, &rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	strbuf_release(&fri->oid);
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg,
+				       struct reftable_record *rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+	int err = 0;
+	while (true) {
+		err = reftable_iterator_next_ref(&fri->it, ref);
+		if (err != 0) {
+			break;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			err = reftable_table_seek_ref(&fri->tab, &it,
+						      ref->refname);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(&it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				break;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+
+	reftable_ref_record_clear(ref);
+	return err;
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reftable_block_done(&it->block_reader.block);
+	strbuf_release(&it->oid);
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	uint64_t off;
+	int err = 0;
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reftable_block_done(&it->block_reader.block);
+
+	off = it->offsets[it->offset_idx++];
+	err = reader_init_block_reader(it->r, &it->block_reader, off,
+				       BLOCK_TYPE_REF);
+	if (err < 0) {
+		return err;
+	}
+	if (err > 0) {
+		/* indexed block does not exist. */
+		return REFTABLE_FORMAT_ERROR;
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, uint8_t *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter empty = INDEXED_TABLE_REF_ITER_INIT;
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	*itr = empty;
+	itr->r = r;
+	strbuf_add(&itr->oid, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 0000000000..c96aaed583
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,72 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "strbuf.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct reftable_record *rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec);
+
+/* Returns true for a zeroed out iterator, such as the one returned from
+   iterator_destroy. */
+bool iterator_is_null(struct reftable_iterator *it);
+
+/* iterator that produces only ref records that point to `oid` */
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_table tab;
+	struct strbuf oid;
+	struct reftable_iterator it;
+};
+#define FILTERING_REF_ITERATOR_INIT \
+	{                           \
+		.oid = STRBUF_INIT  \
+	}
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+/* iterator that produces only ref records that point to `oid`,
+   but using the object index.
+ */
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct strbuf oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+#define INDEXED_TABLE_REF_ITER_INIT                                     \
+	{                                                               \
+		.cur = { .last_key = STRBUF_INIT }, .oid = STRBUF_INIT, \
+	}
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, uint8_t *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 0000000000..770178183f
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,317 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+#include "record.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct reftable_record rec = reftable_new_record(mi->typ);
+		int err = iterator_next(&mi->stack[i], &rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			reftable_record_destroy(&rec);
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
+					       size_t idx)
+{
+	struct reftable_record rec = reftable_new_record(mi->typ);
+	struct pq_entry e = {
+		.rec = rec,
+		.index = idx,
+	};
+	int err = iterator_next(&mi->stack[idx], &rec);
+	if (err < 0)
+		return err;
+
+	if (err > 0) {
+		reftable_iterator_destroy(&mi->stack[idx]);
+		reftable_record_destroy(&rec);
+		return 0;
+	}
+
+	merged_iter_pqueue_add(&mi->pq, e);
+	return 0;
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
+{
+	if (iterator_is_null(&mi->stack[idx]))
+		return 0;
+	return merged_iter_advance_nonnull_subiter(mi, idx);
+}
+
+static int merged_iter_next_entry(struct merged_iter *mi,
+				  struct reftable_record *rec)
+{
+	struct strbuf entry_key = STRBUF_INIT;
+	struct pq_entry entry = { 0 };
+	int err = 0;
+
+	if (merged_iter_pqueue_is_empty(mi->pq))
+		return 1;
+
+	entry = merged_iter_pqueue_remove(&mi->pq);
+	err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0)
+		return err;
+
+	/*
+	  One can also use reftable as datacenter-local storage, where the ref
+	  database is maintained in globally consistent database (eg.
+	  CockroachDB or Spanner). In this scenario, replication delays together
+	  with compaction may cause newer tables to contain older entries. In
+	  such a deployment, the loop below must be changed to collect all
+	  entries for the same key, and return new the newest one.
+	*/
+	reftable_record_key(&entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct strbuf k = STRBUF_INIT;
+		int err = 0, cmp = 0;
+
+		reftable_record_key(&top.rec, &k);
+
+		cmp = strbuf_cmp(&k, &entry_key);
+		strbuf_release(&k);
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		reftable_record_destroy(&top.rec);
+	}
+
+	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+	reftable_record_destroy(&entry.rec);
+	strbuf_release(&entry_key);
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
+{
+	while (true) {
+		int err = merged_iter_next_entry(mi, rec);
+		if (err == 0 && mi->suppress_deletions &&
+		    reftable_record_is_deletion(rec)) {
+			continue;
+		}
+
+		return err;
+	}
+}
+
+static int merged_iter_next_void(void *p, struct reftable_record *rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq))
+		return 1;
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_table *stack, int n,
+			      uint32_t hash_id)
+{
+	struct reftable_merged_table *m = NULL;
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		uint64_t min = reftable_table_min_update_index(&stack[i]);
+		uint64_t max = reftable_table_max_update_index(&stack[i]);
+
+		if (reftable_table_hash_id(&stack[i]) != hash_id) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i == 0 || min < first_min) {
+			first_min = min;
+		}
+		if (i == 0 || max > last_max) {
+			last_max = max;
+		}
+	}
+
+	m = (struct reftable_merged_table *)reftable_calloc(
+		sizeof(struct reftable_merged_table));
+	m->stack = stack;
+	m->stack_len = n;
+	m->min = first_min;
+	m->max = last_max;
+	m->hash_id = hash_id;
+	*dest = m;
+	return 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = reftable_record_type(rec),
+		.hash_id = mt->hash_id,
+		.suppress_deletions = mt->suppress_deletions,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reftable_table_seek_record(&mt->stack[i], &iters[n],
+						   rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n;
+	err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	} else {
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.refname = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.refname = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
+
+uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *mt)
+{
+	return mt->hash_id;
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 0000000000..a568726da8
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,43 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_table *stack;
+	size_t stack_len;
+	uint32_t hash_id;
+	bool suppress_deletions;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	size_t stack_len;
+	uint8_t typ;
+	bool suppress_deletions;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec);
+
+int reftable_table_seek_record(struct reftable_table *tab,
+			       struct reftable_iterator *it,
+			       struct reftable_record *rec);
+
+#endif
diff --git a/reftable/merged_test.c b/reftable/merged_test.c
new file mode 100644
index 0000000000..2477eed06b
--- /dev/null
+++ b/reftable/merged_test.c
@@ -0,0 +1,286 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "pq.h"
+#include "reader.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_pq(void)
+{
+	char *names[54] = { 0 };
+	int N = ARRAY_SIZE(names) - 1;
+
+	struct merged_iter_pqueue pq = { 0 };
+	const char *last = NULL;
+
+	int i = 0;
+	for (i = 0; i < N; i++) {
+		char name[100];
+		snprintf(name, sizeof(name), "%02d", i);
+		names[i] = xstrdup(name);
+	}
+
+	i = 1;
+	do {
+		struct reftable_record rec =
+			reftable_new_record(BLOCK_TYPE_REF);
+		struct pq_entry e = { 0 };
+
+		reftable_record_as_ref(&rec)->refname = names[i];
+		e.rec = rec;
+		merged_iter_pqueue_add(&pq, e);
+		merged_iter_pqueue_check(pq);
+		i = (i * 7) % N;
+	} while (i != 1);
+
+	while (!merged_iter_pqueue_is_empty(pq)) {
+		struct pq_entry e = merged_iter_pqueue_remove(&pq);
+		struct reftable_ref_record *ref =
+			reftable_record_as_ref(&e.rec);
+
+		merged_iter_pqueue_check(pq);
+
+		if (last != NULL) {
+			assert(strcmp(last, ref->refname) < 0);
+		}
+		last = ref->refname;
+		ref->refname = NULL;
+		reftable_free(ref);
+	}
+
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+
+	merged_iter_pqueue_clear(&pq);
+}
+
+static void write_test_table(struct strbuf *buf,
+			     struct reftable_ref_record refs[], int n)
+{
+	int min = 0xffffffff;
+	int max = 0;
+	int i = 0;
+	int err;
+
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_writer *w = NULL;
+	for (i = 0; i < n; i++) {
+		uint64_t ui = refs[i].update_index;
+		if (ui > max) {
+			max = ui;
+		}
+		if (ui < min) {
+			min = ui;
+		}
+	}
+
+	w = reftable_new_writer(&strbuf_add_void, buf, &opts);
+	reftable_writer_set_limits(w, min, max);
+
+	for (i = 0; i < n; i++) {
+		uint64_t before = refs[i].update_index;
+		int n = reftable_writer_add_ref(w, &refs[i]);
+		assert(n == 0);
+		assert(before == refs[i].update_index);
+	}
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+
+	reftable_writer_free(w);
+}
+
+static struct reftable_merged_table *
+merged_table_from_records(struct reftable_ref_record **refs,
+			  struct reftable_block_source **source,
+			  struct reftable_reader ***readers, int *sizes,
+			  struct strbuf *buf, int n)
+{
+	int i = 0;
+	struct reftable_merged_table *mt = NULL;
+	int err;
+	struct reftable_table *tabs =
+		reftable_calloc(n * sizeof(struct reftable_table));
+	*readers = reftable_calloc(n * sizeof(struct reftable_reader *));
+	*source = reftable_calloc(n * sizeof(**source));
+	for (i = 0; i < n; i++) {
+		write_test_table(&buf[i], refs[i], sizes[i]);
+		block_source_from_strbuf(&(*source)[i], &buf[i]);
+
+		err = reftable_new_reader(&(*readers)[i], &(*source)[i],
+					  "name");
+		assert_err(err);
+		reftable_table_from_reader(&tabs[i], (*readers)[i]);
+	}
+
+	err = reftable_new_merged_table(&mt, tabs, n, SHA1_ID);
+	assert_err(err);
+	return mt;
+}
+
+static void readers_destroy(struct reftable_reader **readers, size_t n)
+{
+	int i = 0;
+	for (; i < n; i++)
+		reftable_reader_free(readers[i]);
+	reftable_free(readers);
+}
+
+static void test_merged_between(void)
+{
+	uint8_t hash1[SHA1_SIZE] = { 1, 2, 3, 0 };
+
+	struct reftable_ref_record r1[] = { {
+		.refname = "b",
+		.update_index = 1,
+		.value = hash1,
+	} };
+	struct reftable_ref_record r2[] = { {
+		.refname = "a",
+		.update_index = 2,
+	} };
+
+	struct reftable_ref_record *refs[] = { r1, r2 };
+	int sizes[] = { 1, 1 };
+	struct strbuf bufs[2] = { STRBUF_INIT, STRBUF_INIT };
+	struct reftable_block_source *bs = NULL;
+	struct reftable_reader **readers = NULL;
+	struct reftable_merged_table *mt =
+		merged_table_from_records(refs, &bs, &readers, sizes, bufs, 2);
+	int i;
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_iterator it = { 0 };
+	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert_err(err);
+	assert(ref.update_index == 2);
+	reftable_ref_record_clear(&ref);
+	reftable_iterator_destroy(&it);
+	readers_destroy(readers, 2);
+	reftable_merged_table_free(mt);
+	for (i = 0; i < ARRAY_SIZE(bufs); i++) {
+		strbuf_release(&bufs[i]);
+	}
+	reftable_free(bs);
+}
+
+static void test_merged(void)
+{
+	uint8_t hash1[SHA1_SIZE] = { 1 };
+	uint8_t hash2[SHA1_SIZE] = { 2 };
+	struct reftable_ref_record r1[] = { {
+						    .refname = "a",
+						    .update_index = 1,
+						    .value = hash1,
+					    },
+					    {
+						    .refname = "b",
+						    .update_index = 1,
+						    .value = hash1,
+					    },
+					    {
+						    .refname = "c",
+						    .update_index = 1,
+						    .value = hash1,
+					    } };
+	struct reftable_ref_record r2[] = { {
+		.refname = "a",
+		.update_index = 2,
+	} };
+	struct reftable_ref_record r3[] = {
+		{
+			.refname = "c",
+			.update_index = 3,
+			.value = hash2,
+		},
+		{
+			.refname = "d",
+			.update_index = 3,
+			.value = hash1,
+		},
+	};
+
+	struct reftable_ref_record want[] = {
+		r2[0],
+		r1[1],
+		r3[0],
+		r3[1],
+	};
+
+	struct reftable_ref_record *refs[] = { r1, r2, r3 };
+	int sizes[3] = { 3, 1, 2 };
+	struct strbuf bufs[3] = { STRBUF_INIT, STRBUF_INIT, STRBUF_INIT };
+	struct reftable_block_source *bs = NULL;
+	struct reftable_reader **readers = NULL;
+	struct reftable_merged_table *mt =
+		merged_table_from_records(refs, &bs, &readers, sizes, bufs, 3);
+
+	struct reftable_iterator it = { 0 };
+	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	struct reftable_ref_record *out = NULL;
+	size_t len = 0;
+	size_t cap = 0;
+	int i = 0;
+
+	assert_err(err);
+	while (len < 100) { /* cap loops/recursion. */
+		struct reftable_ref_record ref = { 0 };
+		int err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			out = reftable_realloc(
+				out, sizeof(struct reftable_ref_record) * cap);
+		}
+		out[len++] = ref;
+	}
+	reftable_iterator_destroy(&it);
+
+	assert(ARRAY_SIZE(want) == len);
+	for (i = 0; i < len; i++) {
+		assert(reftable_ref_record_equal(&want[i], &out[i], SHA1_SIZE));
+	}
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&out[i]);
+	}
+	reftable_free(out);
+
+	for (i = 0; i < 3; i++) {
+		strbuf_release(&bufs[i]);
+	}
+	readers_destroy(readers, 3);
+	reftable_merged_table_free(mt);
+	reftable_free(bs);
+}
+
+/* XXX test refs_for(oid) */
+
+int merged_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_merged_between", &test_merged_between);
+	add_test_case("test_pq", &test_pq);
+	add_test_case("test_merged", &test_merged);
+	return test_main(argc, argv);
+}
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 0000000000..de7ed658a4
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,115 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "reftable.h"
+#include "system.h"
+#include "basics.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct strbuf ak = STRBUF_INIT;
+	struct strbuf bk = STRBUF_INIT;
+	int cmp = 0;
+	reftable_record_key(&a.rec, &ak);
+	reftable_record_key(&b.rec, &bk);
+
+	cmp = strbuf_cmp(&ak, &bk);
+
+	strbuf_release(&ak);
+	strbuf_release(&bk);
+
+	if (cmp == 0)
+		return a.index > b.index;
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		reftable_record_destroy(&pq->heap[i].rec);
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 0000000000..1117f70306
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct reftable_record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	size_t len;
+	size_t cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 0000000000..29e5b70439
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,744 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source *source)
+{
+	return source->ops->size(source->arg);
+}
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source->ops->read_block(source->arg, dest, off, size);
+	dest->source = *source;
+	return result;
+}
+
+void reftable_block_done(struct reftable_block *blockp)
+{
+	struct reftable_block_source source = blockp->source;
+	if (blockp != NULL && source.ops != NULL)
+		source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, uint8_t typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size)
+		return 0;
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(&r->source, dest, off, sz);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, uint8_t *footer,
+			uint8_t *header)
+{
+	uint8_t *f = footer;
+	uint8_t first_block_typ;
+	int err = 0;
+	uint32_t computed_crc;
+	uint32_t file_crc;
+
+	if (memcmp(f, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = REFTABLE_FORMAT_ERROR;
+			goto done;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	computed_crc = crc32(0, footer, f - footer);
+	file_crc = get_be32(f);
+	f += 4;
+	if (computed_crc != file_crc) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	first_block_typ = header[header_size(r->version)];
+	r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+	r->ref_offsets.offset = 0;
+	r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+				  r->log_offsets.offset > 0);
+	r->obj_offsets.present = r->obj_offsets.offset > 0;
+	err = 0;
+done:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = *source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+done:
+	reftable_block_done(&footer);
+	reftable_block_done(&header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	uint8_t typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+#define TABLE_ITER_INIT                          \
+	{                                        \
+		.bi = {.last_key = STRBUF_INIT } \
+	}
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti,
+				    struct reftable_record *rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && reftable_record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec->data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reftable_block_done(&ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(uint8_t *data, uint8_t *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (reftable_is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, uint8_t want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+							 DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	uint8_t block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size)
+		return 1;
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0)
+		return err;
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0)
+		return block_size;
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reftable_block_done(&block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reftable_block_done(&block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0)
+		return err;
+	else {
+		struct block_reader *brp =
+			reftable_malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
+{
+	if (reftable_record_type(rec) != ti->typ)
+		return REFTABLE_API_ERROR;
+
+	while (true) {
+		struct table_iter next = TABLE_ITER_INIT;
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct reftable_record *rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off,
+				uint8_t typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0)
+		return err;
+
+	brp = reftable_malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			uint8_t typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct reftable_record *want)
+{
+	struct reftable_record rec =
+		reftable_new_record(reftable_record_type(want));
+	struct strbuf want_key = STRBUF_INIT;
+	struct strbuf got_key = STRBUF_INIT;
+	struct table_iter next = TABLE_ITER_INIT;
+	int err = -1;
+
+	reftable_record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0)
+			goto done;
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0)
+			goto done;
+
+		if (strbuf_cmp(&got_key, &want_key) > 0) {
+			table_iter_block_done(&next);
+			break;
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, &want_key);
+	if (err < 0)
+		goto done;
+	err = 0;
+
+done:
+	block_iter_close(&next.bi);
+	reftable_record_destroy(&rec);
+	strbuf_release(&want_key);
+	strbuf_release(&got_key);
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it,
+			       struct reftable_record *rec)
+{
+	struct reftable_index_record want_index = { .last_key = STRBUF_INIT };
+	struct reftable_record want_index_rec = { 0 };
+	struct reftable_index_record index_result = { .last_key = STRBUF_INIT };
+	struct reftable_record index_result_rec = { 0 };
+	struct table_iter index_iter = TABLE_ITER_INIT;
+	struct table_iter next = TABLE_ITER_INIT;
+	int err = 0;
+
+	reftable_record_key(rec, &want_index.last_key);
+	reftable_record_from_index(&want_index_rec, &want_index);
+	reftable_record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, reftable_record_type(rec), true);
+	if (err < 0)
+		goto done;
+
+	err = reader_seek_linear(r, &index_iter, &want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, &index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0)
+			goto done;
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0)
+			goto done;
+
+		err = block_iter_seek(&next.bi, &want_index.last_key);
+		if (err < 0)
+			goto done;
+
+		if (next.typ == reftable_record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = REFTABLE_FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter empty = TABLE_ITER_INIT;
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		*malloced = empty;
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+done:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	reftable_record_clear(&want_index_rec);
+	reftable_record_clear(&index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it,
+				struct reftable_record *rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, reftable_record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = TABLE_ITER_INIT;
+	int err = 0;
+	if (idx > 0)
+		return reader_seek_indexed(r, it, rec);
+
+	err = reader_start(r, &ti, reftable_record_type(rec), false);
+	if (err < 0)
+		return err;
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0)
+		return err;
+	else {
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec)
+{
+	uint8_t typ = reftable_record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.refname = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.refname = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source *src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		block_source_close(src);
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    uint8_t *oid)
+{
+	struct reftable_obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct reftable_record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct reftable_obj_record got = { 0 };
+	struct reftable_record got_rec = { 0 };
+	int err = 0;
+	struct indexed_table_ref_iter *itr = NULL;
+
+	/* Look through the reverse index. */
+	reftable_record_from_obj(&want_rec, &want);
+	err = reader_seek(r, &oit, &want_rec);
+	if (err != 0)
+		goto done;
+
+	/* read out the reftable_obj_record */
+	reftable_record_from_obj(&got_rec, &got);
+	err = iterator_next(&oit, &got_rec);
+	if (err < 0)
+		goto done;
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		/* didn't find it; return empty iterator */
+		iterator_set_empty(it);
+		err = 0;
+		goto done;
+	}
+
+	err = new_indexed_table_ref_iter(&itr, r, oid, hash_size(r->hash_id),
+					 got.offsets, got.offset_len);
+	if (err < 0)
+		goto done;
+	got.offsets = NULL;
+	iterator_from_indexed_table_ref_iter(it, itr);
+
+done:
+	reftable_iterator_destroy(&oit);
+	reftable_record_clear(&got_rec);
+	return err;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      uint8_t *oid)
+{
+	struct table_iter ti_empty = TABLE_ITER_INIT;
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	struct filtering_ref_iterator empty = FILTERING_REF_ITERATOR_INIT;
+	int oid_len = hash_size(r->hash_id);
+	int err;
+
+	*ti = ti_empty;
+	err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_malloc(sizeof(struct filtering_ref_iterator));
+	*filter = empty;
+
+	strbuf_add(&filter->oid, oid, oid_len);
+	reftable_table_from_reader(&filter->tab, r);
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid)
+{
+	if (r->obj_offsets.present)
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	return reftable_reader_refs_for_unindexed(r, it, oid);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 0000000000..213e8fe847
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,65 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source *source);
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_close(struct reftable_block_source *source);
+
+/* metadata for a block type */
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+/* The state for reading a reftable file. */
+struct reftable_reader {
+	/* for convience, associate a name with the instance. */
+	char *name;
+	struct reftable_block_source source;
+
+	/* Size of the file, excluding the footer. */
+	uint64_t size;
+
+	/* 'sha1' for SHA1, 's256' for SHA-256 */
+	uint32_t hash_id;
+
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	/* Length of the OID keys in the 'o' section */
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+
+/* initialize a block reader to read from `r` */
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, uint8_t want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 0000000000..08ab921d32
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1126 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+#include "constants.h"
+#include "reftable.h"
+#include "basics.h"
+
+int get_var_int(uint64_t *dest, struct string_view *in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in->len == 0)
+		return -1;
+	val = in->buf[ptr] & 0x7f;
+
+	while (in->buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in->len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in->buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct string_view *dest, uint64_t val)
+{
+	uint8_t buf[10] = { 0 };
+	int i = 9;
+	int n = 0;
+	buf[i] = (uint8_t)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (uint8_t)(val & 0x7f);
+		i--;
+	}
+
+	n = sizeof(buf) - i - 1;
+	if (dest->len < n)
+		return -1;
+	memcpy(dest->buf, &buf[i + 1], n);
+	return n;
+}
+
+int reftable_is_block_type(uint8_t typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct strbuf *dest, struct string_view in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, &in);
+	if (n <= 0)
+		return -1;
+	string_view_consume(&in, n);
+	if (in.len < tsize)
+		return -1;
+
+	strbuf_reset(dest);
+	strbuf_add(dest, in.buf, tsize);
+	string_view_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct string_view s)
+{
+	struct string_view start = s;
+	int l = strlen(str);
+	int n = put_var_int(&s, l);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+	if (s.len < l)
+		return -1;
+	memcpy(s.buf, str, l);
+	string_view_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int reftable_encode_key(bool *restart, struct string_view dest,
+			struct strbuf prev_key, struct strbuf key,
+			uint8_t extra)
+{
+	struct string_view start = dest;
+	int prefix_len = common_prefix_size(&prev_key, &key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(&dest, (uint64_t)prefix_len);
+	if (n < 0)
+		return -1;
+	string_view_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(&dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0)
+		return -1;
+	string_view_consume(&dest, n);
+
+	if (dest.len < suffix_len)
+		return -1;
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	string_view_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int reftable_decode_key(struct strbuf *key, uint8_t *extra,
+			struct strbuf last_key, struct string_view in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, &in);
+	if (n < 0)
+		return -1;
+	string_view_consume(&in, n);
+
+	if (prefix_len > last_key.len)
+		return -1;
+
+	n = get_var_int(&suffix_len, &in);
+	if (n <= 0)
+		return -1;
+	string_view_consume(&in, n);
+
+	*extra = (uint8_t)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len)
+		return -1;
+
+	strbuf_reset(key);
+	strbuf_add(key, last_key.buf, prefix_len);
+	strbuf_add(key, in.buf, suffix_len);
+	string_view_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static void reftable_ref_record_key(const void *r, struct strbuf *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	strbuf_reset(dest);
+	strbuf_addstr(dest, rec->refname);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->refname != NULL) {
+		ref->refname = xstrdup(src->refname);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9)
+		return '0' + c;
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, uint8_t *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+	printf("ref{%s(%" PRIu64 ") ", ref->refname, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size(hash_id));
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size(hash_id));
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->refname);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static uint8_t reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL)
+		return 3;
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct string_view s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct string_view start = s;
+	int n = put_var_int(&s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		string_view_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		string_view_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		string_view_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct strbuf key,
+				      uint8_t val_type, struct string_view in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct string_view start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, &in);
+	if (n < 0)
+		return n;
+	assert(hash_size > 0);
+
+	string_view_consume(&in, n);
+
+	r->refname = reftable_realloc(r->refname, key.len + 1);
+	memcpy(r->refname, key.buf, key.len);
+	r->refname[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		string_view_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		string_view_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct strbuf dest = STRBUF_INIT;
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		string_view_consume(&in, n);
+		seen_target = true;
+		if (r->target != NULL) {
+			reftable_free(r->target);
+		}
+		r->target = dest.buf;
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+static bool reftable_ref_record_is_deletion_void(const void *p)
+{
+	return reftable_ref_record_is_deletion(
+		(const struct reftable_ref_record *)p);
+}
+
+struct reftable_record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = BLOCK_TYPE_REF,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+	.is_deletion = &reftable_ref_record_is_deletion_void,
+};
+
+static void reftable_obj_record_key(const void *r, struct strbuf *dest)
+{
+	const struct reftable_obj_record *rec =
+		(const struct reftable_obj_record *)r;
+	strbuf_reset(dest);
+	strbuf_add(dest, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void reftable_obj_record_clear(void *rec)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	FREE_AND_NULL(obj->hash_prefix);
+	FREE_AND_NULL(obj->offsets);
+	memset(obj, 0, sizeof(struct reftable_obj_record));
+}
+
+static void reftable_obj_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	const struct reftable_obj_record *src =
+		(const struct reftable_obj_record *)src_rec;
+	int olen;
+
+	reftable_obj_record_clear(obj);
+	*obj = *src;
+	obj->hash_prefix = reftable_malloc(obj->hash_prefix_len);
+	memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len);
+
+	olen = obj->offset_len * sizeof(uint64_t);
+	obj->offsets = reftable_malloc(olen);
+	memcpy(obj->offsets, src->offsets, olen);
+}
+
+static uint8_t reftable_obj_record_val_type(const void *rec)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8)
+		return r->offset_len;
+	return 0;
+}
+
+static int reftable_obj_record_encode(const void *rec, struct string_view s,
+				      int hash_size)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	struct string_view start = s;
+	int i = 0;
+	int n = 0;
+	uint64_t last = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(&s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		string_view_consume(&s, n);
+	}
+	if (r->offset_len == 0)
+		return start.len - s.len;
+	n = put_var_int(&s, r->offsets[0]);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	last = r->offsets[0];
+	for (i = 1; i < r->offset_len; i++) {
+		int n = put_var_int(&s, r->offsets[i] - last);
+		if (n < 0) {
+			return -1;
+		}
+		string_view_consume(&s, n);
+		last = r->offsets[i];
+	}
+	return start.len - s.len;
+}
+
+static int reftable_obj_record_decode(void *rec, struct strbuf key,
+				      uint8_t val_type, struct string_view in,
+				      int hash_size)
+{
+	struct string_view start = in;
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	uint64_t last;
+	int j;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, &in);
+		if (n < 0) {
+			return n;
+		}
+
+		string_view_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0)
+		return start.len - in.len;
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], &in);
+	if (n < 0)
+		return n;
+	string_view_consume(&in, n);
+
+	last = r->offsets[0];
+	j = 1;
+	while (j < count) {
+		uint64_t delta = 0;
+		int n = get_var_int(&delta, &in);
+		if (n < 0) {
+			return n;
+		}
+		string_view_consume(&in, n);
+
+		last = r->offsets[j] = (delta + last);
+		j++;
+	}
+	return start.len - in.len;
+}
+
+static bool not_a_deletion(const void *p)
+{
+	return false;
+}
+
+struct reftable_record_vtable reftable_obj_record_vtable = {
+	.key = &reftable_obj_record_key,
+	.type = BLOCK_TYPE_OBJ,
+	.copy_from = &reftable_obj_record_copy_from,
+	.val_type = &reftable_obj_record_val_type,
+	.encode = &reftable_obj_record_encode,
+	.decode = &reftable_obj_record_decode,
+	.clear = &reftable_obj_record_clear,
+	.is_deletion = not_a_deletion,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->refname,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size(hash_id));
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size(hash_id));
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static void reftable_log_record_key(const void *r, struct strbuf *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->refname);
+	uint8_t i64[8];
+	uint64_t ts = 0;
+	strbuf_reset(dest);
+	strbuf_add(dest, (uint8_t *)rec->refname, len + 1);
+
+	ts = (~ts) - rec->update_index;
+	put_be64(&i64[0], ts);
+	strbuf_add(dest, i64, sizeof(i64));
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	reftable_log_record_clear(dst);
+	*dst = *src;
+	if (dst->refname != NULL) {
+		dst->refname = xstrdup(dst->refname);
+	}
+	if (dst->email != NULL) {
+		dst->email = xstrdup(dst->email);
+	}
+	if (dst->name != NULL) {
+		dst->name = xstrdup(dst->name);
+	}
+	if (dst->message != NULL) {
+		dst->message = xstrdup(dst->message);
+	}
+
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->refname);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static uint8_t reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static uint8_t zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct string_view s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct string_view start = s;
+	int n = 0;
+	uint8_t *oldh = r->old_hash;
+	uint8_t *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r))
+		return 0;
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size)
+		return -1;
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	string_view_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	n = put_var_int(&s, r->time);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	if (s.len < 2)
+		return -1;
+
+	put_be16(s.buf, r->tz_offset);
+	string_view_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct strbuf key,
+				      uint8_t val_type, struct string_view in,
+				      int hash_size)
+{
+	struct string_view start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct strbuf dest = STRBUF_INIT;
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0)
+		return REFTABLE_FORMAT_ERROR;
+
+	r->refname = reftable_realloc(r->refname, key.len - 8);
+	memcpy(r->refname, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		FREE_AND_NULL(r->old_hash);
+		FREE_AND_NULL(r->new_hash);
+		FREE_AND_NULL(r->message);
+		FREE_AND_NULL(r->email);
+		FREE_AND_NULL(r->name);
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size)
+		return REFTABLE_FORMAT_ERROR;
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	string_view_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	string_view_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	strbuf_reset(&dest);
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	string_view_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, &in);
+	if (n < 0)
+		goto done;
+	string_view_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2)
+		goto done;
+
+	r->tz_offset = get_be16(in.buf);
+	string_view_consume(&in, 2);
+
+	strbuf_reset(&dest);
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	string_view_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	strbuf_release(&dest);
+	return start.len - in.len;
+
+done:
+	strbuf_release(&dest);
+	return REFTABLE_FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL)
+		a = empty;
+
+	if (b == NULL)
+		b = empty;
+
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(uint8_t *a, uint8_t *b, int sz)
+{
+	if (a == NULL)
+		a = zero;
+
+	if (b == NULL)
+		b = zero;
+
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+static bool reftable_log_record_is_deletion_void(const void *p)
+{
+	return reftable_log_record_is_deletion(
+		(const struct reftable_log_record *)p);
+}
+
+struct reftable_record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = BLOCK_TYPE_LOG,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+	.is_deletion = &reftable_log_record_is_deletion_void,
+};
+
+struct reftable_record reftable_new_record(uint8_t typ)
+{
+	struct reftable_record rec = { NULL };
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		reftable_record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct reftable_obj_record *r =
+			reftable_calloc(sizeof(struct reftable_obj_record));
+		reftable_record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		reftable_record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct reftable_index_record empty = { .last_key =
+							       STRBUF_INIT };
+		struct reftable_index_record *r =
+			reftable_calloc(sizeof(struct reftable_index_record));
+		*r = empty;
+		reftable_record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+void *reftable_record_yield(struct reftable_record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+void reftable_record_destroy(struct reftable_record *rec)
+{
+	reftable_record_clear(rec);
+	reftable_free(reftable_record_yield(rec));
+}
+
+static void reftable_index_record_key(const void *r, struct strbuf *dest)
+{
+	struct reftable_index_record *rec = (struct reftable_index_record *)r;
+	strbuf_reset(dest);
+	strbuf_addbuf(dest, &rec->last_key);
+}
+
+static void reftable_index_record_copy_from(void *rec, const void *src_rec,
+					    int hash_size)
+{
+	struct reftable_index_record *dst = (struct reftable_index_record *)rec;
+	struct reftable_index_record *src =
+		(struct reftable_index_record *)src_rec;
+
+	strbuf_reset(&dst->last_key);
+	strbuf_addbuf(&dst->last_key, &src->last_key);
+	dst->offset = src->offset;
+}
+
+static void reftable_index_record_clear(void *rec)
+{
+	struct reftable_index_record *idx = (struct reftable_index_record *)rec;
+	strbuf_release(&idx->last_key);
+}
+
+static uint8_t reftable_index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int reftable_index_record_encode(const void *rec, struct string_view out,
+					int hash_size)
+{
+	const struct reftable_index_record *r =
+		(const struct reftable_index_record *)rec;
+	struct string_view start = out;
+
+	int n = put_var_int(&out, r->offset);
+	if (n < 0)
+		return n;
+
+	string_view_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int reftable_index_record_decode(void *rec, struct strbuf key,
+					uint8_t val_type, struct string_view in,
+					int hash_size)
+{
+	struct string_view start = in;
+	struct reftable_index_record *r = (struct reftable_index_record *)rec;
+	int n = 0;
+
+	strbuf_reset(&r->last_key);
+	strbuf_addbuf(&r->last_key, &key);
+
+	n = get_var_int(&r->offset, &in);
+	if (n < 0)
+		return n;
+
+	string_view_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct reftable_record_vtable reftable_index_record_vtable = {
+	.key = &reftable_index_record_key,
+	.type = BLOCK_TYPE_INDEX,
+	.copy_from = &reftable_index_record_copy_from,
+	.val_type = &reftable_index_record_val_type,
+	.encode = &reftable_index_record_encode,
+	.decode = &reftable_index_record_decode,
+	.clear = &reftable_index_record_clear,
+	.is_deletion = &not_a_deletion,
+};
+
+void reftable_record_key(struct reftable_record *rec, struct strbuf *dest)
+{
+	rec->ops->key(rec->data, dest);
+}
+
+uint8_t reftable_record_type(struct reftable_record *rec)
+{
+	return rec->ops->type;
+}
+
+int reftable_record_encode(struct reftable_record *rec, struct string_view dest,
+			   int hash_size)
+{
+	return rec->ops->encode(rec->data, dest, hash_size);
+}
+
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size)
+{
+	assert(src->ops->type == rec->ops->type);
+
+	rec->ops->copy_from(rec->data, src->data, hash_size);
+}
+
+uint8_t reftable_record_val_type(struct reftable_record *rec)
+{
+	return rec->ops->val_type(rec->data);
+}
+
+int reftable_record_decode(struct reftable_record *rec, struct strbuf key,
+			   uint8_t extra, struct string_view src, int hash_size)
+{
+	return rec->ops->decode(rec->data, key, extra, src, hash_size);
+}
+
+void reftable_record_clear(struct reftable_record *rec)
+{
+	rec->ops->clear(rec->data);
+}
+
+bool reftable_record_is_deletion(struct reftable_record *rec)
+{
+	return rec->ops->is_deletion(rec->data);
+}
+
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *ref_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *obj_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = obj_rec;
+	rec->ops = &reftable_obj_record_vtable;
+}
+
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *index_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = index_rec;
+	rec->ops = &reftable_index_record_vtable;
+}
+
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *log_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec->data;
+}
+
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_LOG);
+	return (struct reftable_log_record *)rec->data;
+}
+
+static bool hash_equal(uint8_t *a, uint8_t *b, int hash_size)
+{
+	if (a != NULL && b != NULL)
+		return !memcmp(a, b, hash_size);
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL)
+		return 0 == strcmp(a, b);
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->refname, b->refname) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->refname,
+		      ((struct reftable_ref_record *)b)->refname);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->refname, lb->refname);
+	if (cmp)
+		return cmp;
+	if (la->update_index > lb->update_index)
+		return -1;
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
+
+void string_view_consume(struct string_view *s, int n)
+{
+	s->buf += n;
+	s->len -= n;
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 0000000000..339e3888c9
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,143 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "strbuf.h"
+#include "system.h"
+
+/*
+  A substring of existing string data. This structure takes no responsibility
+  for the lifetime of the data it points to.
+*/
+struct string_view {
+	uint8_t *buf;
+	size_t len;
+};
+
+/* Advance `s.buf` by `n`, and decrease length. */
+void string_view_consume(struct string_view *s, int n);
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct string_view *in);
+int put_var_int(struct string_view *dest, uint64_t val);
+
+/* Methods for records. */
+struct reftable_record_vtable {
+	/* encode the key of to a uint8_t strbuf. */
+	void (*key)(const void *rec, struct strbuf *dest);
+
+	/* The record type of ('r' for ref). */
+	uint8_t type;
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	uint8_t (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct string_view dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct strbuf key, uint8_t extra,
+		      struct string_view src, int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+
+	/* is this a tombstone? */
+	bool (*is_deletion)(const void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct reftable_record {
+	void *data;
+	struct reftable_record_vtable *ops;
+};
+
+/* returns true for recognized block types. Block start with the block type. */
+int reftable_is_block_type(uint8_t typ);
+
+/* creates a malloced record of the given type. Dispose with record_destroy */
+struct reftable_record reftable_new_record(uint8_t typ);
+
+extern struct reftable_record_vtable reftable_ref_record_vtable;
+
+/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
+   number of bytes written. */
+int reftable_encode_key(bool *restart, struct string_view dest,
+			struct strbuf prev_key, struct strbuf key,
+			uint8_t extra);
+
+/* Decode into `key` and `extra` from `in` */
+int reftable_decode_key(struct strbuf *key, uint8_t *extra,
+			struct strbuf last_key, struct string_view in);
+
+/* reftable_index_record are used internally to speed up lookups. */
+struct reftable_index_record {
+	uint64_t offset; /* Offset of block */
+	struct strbuf last_key; /* Last key of the block. */
+};
+
+/* reftable_obj_record stores an object ID => ref mapping. */
+struct reftable_obj_record {
+	uint8_t *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void reftable_record_key(struct reftable_record *rec, struct strbuf *dest);
+uint8_t reftable_record_type(struct reftable_record *rec);
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size);
+uint8_t reftable_record_val_type(struct reftable_record *rec);
+int reftable_record_encode(struct reftable_record *rec, struct string_view dest,
+			   int hash_size);
+int reftable_record_decode(struct reftable_record *rec, struct strbuf key,
+			   uint8_t extra, struct string_view src,
+			   int hash_size);
+bool reftable_record_is_deletion(struct reftable_record *rec);
+
+/* zeroes out the embedded record */
+void reftable_record_clear(struct reftable_record *rec);
+
+/* clear out the record, yielding the reftable_record data that was
+ * encapsulated. */
+void *reftable_record_yield(struct reftable_record *rec);
+
+/* clear and deallocate embedded record, and zero `rec`. */
+void reftable_record_destroy(struct reftable_record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *objrec);
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *idxrec);
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *refrec);
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *logrec);
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *ref);
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/record_test.c b/reftable/record_test.c
new file mode 100644
index 0000000000..52b22d7dc7
--- /dev/null
+++ b/reftable/record_test.c
@@ -0,0 +1,410 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include "system.h"
+#include "basics.h"
+#include "constants.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_copy(struct reftable_record *rec)
+{
+	struct reftable_record copy =
+		reftable_new_record(reftable_record_type(rec));
+	reftable_record_copy_from(&copy, rec, SHA1_SIZE);
+	/* do it twice to catch memory leaks */
+	reftable_record_copy_from(&copy, rec, SHA1_SIZE);
+	switch (reftable_record_type(&copy)) {
+	case BLOCK_TYPE_REF:
+		assert(reftable_ref_record_equal(reftable_record_as_ref(&copy),
+						 reftable_record_as_ref(rec),
+						 SHA1_SIZE));
+		break;
+	case BLOCK_TYPE_LOG:
+		assert(reftable_log_record_equal(reftable_record_as_log(&copy),
+						 reftable_record_as_log(rec),
+						 SHA1_SIZE));
+		break;
+	}
+	reftable_record_destroy(&copy);
+}
+
+static void test_varint_roundtrip(void)
+{
+	uint64_t inputs[] = { 0,
+			      1,
+			      27,
+			      127,
+			      128,
+			      257,
+			      4096,
+			      ((uint64_t)1 << 63),
+			      ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1 };
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(inputs); i++) {
+		uint8_t dest[10];
+
+		struct string_view out = {
+			.buf = dest,
+			.len = sizeof(dest),
+		};
+		uint64_t in = inputs[i];
+		int n = put_var_int(&out, in);
+		uint64_t got = 0;
+
+		assert(n > 0);
+		out.len = n;
+		n = get_var_int(&got, &out);
+		assert(n > 0);
+
+		assert(got == in);
+	}
+}
+
+static void test_common_prefix(void)
+{
+	struct {
+		const char *a, *b;
+		int want;
+	} cases[] = {
+		{ "abc", "ab", 2 },
+		{ "", "abc", 0 },
+		{ "abc", "abd", 2 },
+		{ "abc", "pqr", 0 },
+	};
+
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		struct strbuf a = STRBUF_INIT;
+		struct strbuf b = STRBUF_INIT;
+		strbuf_addstr(&a, cases[i].a);
+		strbuf_addstr(&b, cases[i].b);
+		assert(common_prefix_size(&a, &b) == cases[i].want);
+
+		strbuf_release(&a);
+		strbuf_release(&b);
+	}
+}
+
+static void set_hash(uint8_t *h, int j)
+{
+	int i = 0;
+	for (i = 0; i < hash_size(SHA1_ID); i++) {
+		h[i] = (j >> i) & 0xff;
+	}
+}
+
+static void test_reftable_ref_record_roundtrip(void)
+{
+	int i = 0;
+
+	for (i = 0; i <= 3; i++) {
+		struct reftable_ref_record in = { 0 };
+		struct reftable_ref_record out = {
+			.refname = xstrdup("old name"),
+			.value = reftable_calloc(SHA1_SIZE),
+			.target_value = reftable_calloc(SHA1_SIZE),
+			.target = xstrdup("old value"),
+		};
+		struct reftable_record rec_out = { 0 };
+		struct strbuf key = STRBUF_INIT;
+		struct reftable_record rec = { 0 };
+		uint8_t buffer[1024] = { 0 };
+		struct string_view dest = {
+			.buf = buffer,
+			.len = sizeof(buffer),
+		};
+
+		int n, m;
+
+		switch (i) {
+		case 0:
+			break;
+		case 1:
+			in.value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.value, 1);
+			break;
+		case 2:
+			in.value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.value, 1);
+			in.target_value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.target_value, 2);
+			break;
+		case 3:
+			in.target = xstrdup("target");
+			break;
+		}
+		in.refname = xstrdup("refs/heads/master");
+
+		reftable_record_from_ref(&rec, &in);
+		test_copy(&rec);
+
+		assert(reftable_record_val_type(&rec) == i);
+
+		reftable_record_key(&rec, &key);
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n > 0);
+
+		/* decode into a non-zero reftable_record to test for leaks. */
+
+		reftable_record_from_ref(&rec_out, &out);
+		m = reftable_record_decode(&rec_out, key, i, dest, SHA1_SIZE);
+		assert(n == m);
+
+		assert((out.value != NULL) == (in.value != NULL));
+		assert((out.target_value != NULL) == (in.target_value != NULL));
+		assert((out.target != NULL) == (in.target != NULL));
+		reftable_record_clear(&rec_out);
+
+		strbuf_release(&key);
+		reftable_ref_record_clear(&in);
+	}
+}
+
+static void test_reftable_log_record_equal(void)
+{
+	struct reftable_log_record in[2] = {
+		{
+			.refname = xstrdup("refs/heads/master"),
+			.update_index = 42,
+		},
+		{
+			.refname = xstrdup("refs/heads/master"),
+			.update_index = 22,
+		}
+	};
+
+	assert(!reftable_log_record_equal(&in[0], &in[1], SHA1_SIZE));
+	in[1].update_index = in[0].update_index;
+	assert(reftable_log_record_equal(&in[0], &in[1], SHA1_SIZE));
+	reftable_log_record_clear(&in[0]);
+	reftable_log_record_clear(&in[1]);
+}
+
+static void test_reftable_log_record_roundtrip(void)
+{
+	struct reftable_log_record in[2] = {
+		{
+			.refname = xstrdup("refs/heads/master"),
+			.old_hash = reftable_malloc(SHA1_SIZE),
+			.new_hash = reftable_malloc(SHA1_SIZE),
+			.name = xstrdup("han-wen"),
+			.email = xstrdup("hanwen@google.com"),
+			.message = xstrdup("test"),
+			.update_index = 42,
+			.time = 1577123507,
+			.tz_offset = 100,
+		},
+		{
+			.refname = xstrdup("refs/heads/master"),
+			.update_index = 22,
+		}
+	};
+	set_test_hash(in[0].new_hash, 1);
+	set_test_hash(in[0].old_hash, 2);
+	for (int i = 0; i < ARRAY_SIZE(in); i++) {
+		struct reftable_record rec = { 0 };
+		struct strbuf key = STRBUF_INIT;
+		uint8_t buffer[1024] = { 0 };
+		struct string_view dest = {
+			.buf = buffer,
+			.len = sizeof(buffer),
+		};
+		/* populate out, to check for leaks. */
+		struct reftable_log_record out = {
+			.refname = xstrdup("old name"),
+			.new_hash = reftable_calloc(SHA1_SIZE),
+			.old_hash = reftable_calloc(SHA1_SIZE),
+			.name = xstrdup("old name"),
+			.email = xstrdup("old@email"),
+			.message = xstrdup("old message"),
+		};
+		struct reftable_record rec_out = { 0 };
+		int n, m, valtype;
+
+		reftable_record_from_log(&rec, &in[i]);
+
+		test_copy(&rec);
+
+		reftable_record_key(&rec, &key);
+
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n >= 0);
+		reftable_record_from_log(&rec_out, &out);
+		valtype = reftable_record_val_type(&rec);
+		m = reftable_record_decode(&rec_out, key, valtype, dest,
+					   SHA1_SIZE);
+		assert(n == m);
+
+		assert(reftable_log_record_equal(&in[i], &out, SHA1_SIZE));
+		reftable_log_record_clear(&in[i]);
+		strbuf_release(&key);
+		reftable_record_clear(&rec_out);
+	}
+}
+
+static void test_u24_roundtrip(void)
+{
+	uint32_t in = 0x112233;
+	uint8_t dest[3];
+	uint32_t out;
+	put_be24(dest, in);
+	out = get_be24(dest);
+	assert(in == out);
+}
+
+static void test_key_roundtrip(void)
+{
+	uint8_t buffer[1024] = { 0 };
+	struct string_view dest = {
+		.buf = buffer,
+		.len = sizeof(buffer),
+	};
+	struct strbuf last_key = STRBUF_INIT;
+	struct strbuf key = STRBUF_INIT;
+	struct strbuf roundtrip = STRBUF_INIT;
+	bool restart;
+	uint8_t extra;
+	int n, m;
+	uint8_t rt_extra;
+
+	strbuf_addstr(&last_key, "refs/heads/master");
+	strbuf_addstr(&key, "refs/tags/bla");
+	extra = 6;
+	n = reftable_encode_key(&restart, dest, last_key, key, extra);
+	assert(!restart);
+	assert(n > 0);
+
+	m = reftable_decode_key(&roundtrip, &rt_extra, last_key, dest);
+	assert(n == m);
+	assert(0 == strbuf_cmp(&key, &roundtrip));
+	assert(rt_extra == extra);
+
+	strbuf_release(&last_key);
+	strbuf_release(&key);
+	strbuf_release(&roundtrip);
+}
+
+static void test_reftable_obj_record_roundtrip(void)
+{
+	uint8_t testHash1[SHA1_SIZE] = { 1, 2, 3, 4, 0 };
+	uint64_t till9[] = { 1, 2, 3, 4, 500, 600, 700, 800, 9000 };
+	struct reftable_obj_record recs[3] = { {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+						       .offsets = till9,
+						       .offset_len = 3,
+					       },
+					       {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+						       .offsets = till9,
+						       .offset_len = 9,
+					       },
+					       {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+					       } };
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(recs); i++) {
+		struct reftable_obj_record in = recs[i];
+		uint8_t buffer[1024] = { 0 };
+		struct string_view dest = {
+			.buf = buffer,
+			.len = sizeof(buffer),
+		};
+		struct reftable_record rec = { 0 };
+		struct strbuf key = STRBUF_INIT;
+		struct reftable_obj_record out = { 0 };
+		struct reftable_record rec_out = { 0 };
+		int n, m;
+		uint8_t extra;
+
+		reftable_record_from_obj(&rec, &in);
+		test_copy(&rec);
+		reftable_record_key(&rec, &key);
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n > 0);
+		extra = reftable_record_val_type(&rec);
+		reftable_record_from_obj(&rec_out, &out);
+		m = reftable_record_decode(&rec_out, key, extra, dest,
+					   SHA1_SIZE);
+		assert(n == m);
+
+		assert(in.hash_prefix_len == out.hash_prefix_len);
+		assert(in.offset_len == out.offset_len);
+
+		assert(!memcmp(in.hash_prefix, out.hash_prefix,
+			       in.hash_prefix_len));
+		assert(0 == memcmp(in.offsets, out.offsets,
+				   sizeof(uint64_t) * in.offset_len));
+		strbuf_release(&key);
+		reftable_record_clear(&rec_out);
+	}
+}
+
+static void test_reftable_index_record_roundtrip(void)
+{
+	struct reftable_index_record in = {
+		.offset = 42,
+		.last_key = STRBUF_INIT,
+	};
+	uint8_t buffer[1024] = { 0 };
+	struct string_view dest = {
+		.buf = buffer,
+		.len = sizeof(buffer),
+	};
+	struct strbuf key = STRBUF_INIT;
+	struct reftable_record rec = { 0 };
+	struct reftable_index_record out = { .last_key = STRBUF_INIT };
+	struct reftable_record out_rec = { NULL };
+	int n, m;
+	uint8_t extra;
+
+	strbuf_addstr(&in.last_key, "refs/heads/master");
+	reftable_record_from_index(&rec, &in);
+	reftable_record_key(&rec, &key);
+	test_copy(&rec);
+
+	assert(0 == strbuf_cmp(&key, &in.last_key));
+	n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+	assert(n > 0);
+
+	extra = reftable_record_val_type(&rec);
+	reftable_record_from_index(&out_rec, &out);
+	m = reftable_record_decode(&out_rec, key, extra, dest, SHA1_SIZE);
+	assert(m == n);
+
+	assert(in.offset == out.offset);
+
+	reftable_record_clear(&out_rec);
+	strbuf_release(&key);
+	strbuf_release(&in.last_key);
+}
+
+int record_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_reftable_log_record_equal",
+		      &test_reftable_log_record_equal);
+	add_test_case("test_reftable_log_record_roundtrip",
+		      &test_reftable_log_record_roundtrip);
+	add_test_case("test_reftable_ref_record_roundtrip",
+		      &test_reftable_ref_record_roundtrip);
+	add_test_case("test_varint_roundtrip", &test_varint_roundtrip);
+	add_test_case("test_key_roundtrip", &test_key_roundtrip);
+	add_test_case("test_common_prefix", &test_common_prefix);
+	add_test_case("test_reftable_obj_record_roundtrip",
+		      &test_reftable_obj_record_roundtrip);
+	add_test_case("test_reftable_index_record_roundtrip",
+		      &test_reftable_index_record_roundtrip);
+	add_test_case("test_u24_roundtrip", &test_u24_roundtrip);
+	return test_main(argc, argv);
+}
diff --git a/reftable/refname.c b/reftable/refname.c
new file mode 100644
index 0000000000..c43393b943
--- /dev/null
+++ b/reftable/refname.c
@@ -0,0 +1,209 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+#include "refname.h"
+#include "strbuf.h"
+
+struct find_arg {
+	char **names;
+	const char *want;
+};
+
+static int find_name(size_t k, void *arg)
+{
+	struct find_arg *f_arg = (struct find_arg *)arg;
+	return strcmp(f_arg->names[k], f_arg->want) >= 0;
+}
+
+int modification_has_ref(struct modification *mod, const char *name)
+{
+	struct reftable_ref_record ref = { 0 };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = name,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
+			return 0;
+		}
+	}
+
+	if (mod->del_len > 0) {
+		struct find_arg arg = {
+			.names = mod->del,
+			.want = name,
+		};
+		int idx = binsearch(mod->del_len, find_name, &arg);
+		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
+			return 1;
+		}
+	}
+
+	err = reftable_table_read_ref(&mod->tab, name, &ref);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static void modification_clear(struct modification *mod)
+{
+	/* don't delete the strings themselves; they're owned by ref records.
+	 */
+	FREE_AND_NULL(mod->add);
+	FREE_AND_NULL(mod->del);
+	mod->add_len = 0;
+	mod->del_len = 0;
+}
+
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix)
+{
+	struct reftable_iterator it = { NULL };
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = prefix,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len &&
+		    !strncmp(prefix, mod->add[idx], strlen(prefix)))
+			goto done;
+	}
+	err = reftable_table_seek_ref(&mod->tab, &it, prefix);
+	if (err)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err)
+			goto done;
+
+		if (mod->del_len > 0) {
+			struct find_arg arg = {
+				.names = mod->del,
+				.want = ref.refname,
+			};
+			int idx = binsearch(mod->del_len, find_name, &arg);
+			if (idx < mod->del_len &&
+			    !strcmp(ref.refname, mod->del[idx])) {
+				continue;
+			}
+		}
+
+		if (strncmp(ref.refname, prefix, strlen(prefix))) {
+			err = 1;
+			goto done;
+		}
+		err = 0;
+		goto done;
+	}
+
+done:
+	reftable_ref_record_clear(&ref);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int validate_refname(const char *name)
+{
+	while (true) {
+		char *next = strchr(name, '/');
+		if (!*name) {
+			return REFTABLE_REFNAME_ERROR;
+		}
+		if (!next) {
+			return 0;
+		}
+		if (next - name == 0 || (next - name == 1 && *name == '.') ||
+		    (next - name == 2 && name[0] == '.' && name[1] == '.'))
+			return REFTABLE_REFNAME_ERROR;
+		name = next + 1;
+	}
+	return 0;
+}
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz)
+{
+	struct modification mod = {
+		.tab = tab,
+		.add = reftable_calloc(sizeof(char *) * sz),
+		.del = reftable_calloc(sizeof(char *) * sz),
+	};
+	int i = 0;
+	int err = 0;
+	for (; i < sz; i++) {
+		if (reftable_ref_record_is_deletion(&recs[i])) {
+			mod.del[mod.del_len++] = recs[i].refname;
+		} else {
+			mod.add[mod.add_len++] = recs[i].refname;
+		}
+	}
+
+	err = modification_validate(&mod);
+	modification_clear(&mod);
+	return err;
+}
+
+static void strbuf_trim_component(struct strbuf *sl)
+{
+	while (sl->len > 0) {
+		bool is_slash = (sl->buf[sl->len - 1] == '/');
+		strbuf_setlen(sl, sl->len - 1);
+		if (is_slash)
+			break;
+	}
+}
+
+int modification_validate(struct modification *mod)
+{
+	struct strbuf slashed = STRBUF_INIT;
+	int err = 0;
+	int i = 0;
+	for (; i < mod->add_len; i++) {
+		err = validate_refname(mod->add[i]);
+		if (err)
+			goto done;
+		strbuf_reset(&slashed);
+		strbuf_addstr(&slashed, mod->add[i]);
+		strbuf_addstr(&slashed, "/");
+
+		err = modification_has_ref_with_prefix(mod, slashed.buf);
+		if (err == 0) {
+			err = REFTABLE_NAME_CONFLICT;
+			goto done;
+		}
+		if (err < 0)
+			goto done;
+
+		strbuf_reset(&slashed);
+		strbuf_addstr(&slashed, mod->add[i]);
+		while (slashed.len) {
+			strbuf_trim_component(&slashed);
+			err = modification_has_ref(mod, slashed.buf);
+			if (err == 0) {
+				err = REFTABLE_NAME_CONFLICT;
+				goto done;
+			}
+			if (err < 0)
+				goto done;
+		}
+	}
+	err = 0;
+done:
+	strbuf_release(&slashed);
+	return err;
+}
diff --git a/reftable/refname.h b/reftable/refname.h
new file mode 100644
index 0000000000..13bf1e6861
--- /dev/null
+++ b/reftable/refname.h
@@ -0,0 +1,38 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+#ifndef REFNAME_H
+#define REFNAME_H
+
+#include "reftable.h"
+
+struct modification {
+	struct reftable_table tab;
+
+	char **add;
+	size_t add_len;
+
+	char **del;
+	size_t del_len;
+};
+
+// -1 = error, 0 = found, 1 = not found
+int modification_has_ref(struct modification *mod, const char *name);
+
+// -1 = error, 0 = found, 1 = not found.
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix);
+
+// 0 = OK.
+int validate_refname(const char *name);
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz);
+
+int modification_validate(struct modification *mod);
+
+#endif
diff --git a/reftable/refname_test.c b/reftable/refname_test.c
new file mode 100644
index 0000000000..f4ad544e23
--- /dev/null
+++ b/reftable/refname_test.c
@@ -0,0 +1,99 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "record.h"
+#include "refname.h"
+#include "system.h"
+
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+struct testcase {
+	char *add;
+	char *del;
+	int error_code;
+};
+
+static void test_conflict(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+	struct reftable_ref_record rec = {
+		.refname = "a/b",
+		.target = "destination", /* make sure it's not a symref. */
+		.update_index = 1,
+	};
+	int err;
+	int i;
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct testcase cases[] = {
+		{ "a/b/c", NULL, REFTABLE_NAME_CONFLICT },
+		{ "b", NULL, 0 },
+		{ "a", NULL, REFTABLE_NAME_CONFLICT },
+		{ "a", "a/b", 0 },
+
+		{ "p/", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p//q", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p/./q", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p/../q", NULL, REFTABLE_REFNAME_ERROR },
+
+		{ "a/b/c", "a/b", 0 },
+		{ NULL, "a//b", 0 },
+	};
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_add_ref(w, &rec);
+	assert_err(err);
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+
+	block_source_from_strbuf(&source, &buf);
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	reftable_table_from_reader(&tab, rd);
+
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		struct modification mod = {
+			.tab = tab,
+		};
+
+		if (cases[i].add != NULL) {
+			mod.add = &cases[i].add;
+			mod.add_len = 1;
+		}
+		if (cases[i].del != NULL) {
+			mod.del = &cases[i].del;
+			mod.del_len = 1;
+		}
+
+		err = modification_validate(&mod);
+		assert(err == cases[i].error_code);
+	}
+
+	reftable_reader_free(rd);
+	strbuf_release(&buf);
+}
+
+int refname_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_conflict", &test_conflict);
+	return test_main(argc, argv);
+}
diff --git a/reftable/reftable-tests.h b/reftable/reftable-tests.h
new file mode 100644
index 0000000000..e38471888f
--- /dev/null
+++ b/reftable/reftable-tests.h
@@ -0,0 +1,22 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_TESTS_H
+#define REFTABLE_TESTS_H
+
+int block_test_main(int argc, const char **argv);
+int merged_test_main(int argc, const char **argv);
+int record_test_main(int argc, const char **argv);
+int refname_test_main(int argc, const char **argv);
+int reftable_test_main(int argc, const char **argv);
+int strbuf_test_main(int argc, const char **argv);
+int stack_test_main(int argc, const char **argv);
+int tree_test_main(int argc, const char **argv);
+int reftable_dump_main(int argc, char *const *argv);
+
+#endif
diff --git a/reftable/reftable.c b/reftable/reftable.c
new file mode 100644
index 0000000000..676cc201f0
--- /dev/null
+++ b/reftable/reftable.c
@@ -0,0 +1,154 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+#include "record.h"
+#include "reader.h"
+#include "merged.h"
+
+struct reftable_table_vtable {
+	int (*seek_record)(void *tab, struct reftable_iterator *it,
+			   struct reftable_record *);
+	uint32_t (*hash_id)(void *tab);
+	uint64_t (*min_update_index)(void *tab);
+	uint64_t (*max_update_index)(void *tab);
+};
+
+static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
+				     struct reftable_record *rec)
+{
+	return reader_seek((struct reftable_reader *)tab, it, rec);
+}
+
+static uint32_t reftable_reader_hash_id_void(void *tab)
+{
+	return reftable_reader_hash_id((struct reftable_reader *)tab);
+}
+
+static uint64_t reftable_reader_min_update_index_void(void *tab)
+{
+	return reftable_reader_min_update_index((struct reftable_reader *)tab);
+}
+
+static uint64_t reftable_reader_max_update_index_void(void *tab)
+{
+	return reftable_reader_max_update_index((struct reftable_reader *)tab);
+}
+
+static struct reftable_table_vtable reader_vtable = {
+	.seek_record = reftable_reader_seek_void,
+	.hash_id = reftable_reader_hash_id_void,
+	.min_update_index = reftable_reader_min_update_index_void,
+	.max_update_index = reftable_reader_max_update_index_void,
+};
+
+static int reftable_merged_table_seek_void(void *tab,
+					   struct reftable_iterator *it,
+					   struct reftable_record *rec)
+{
+	return merged_table_seek_record((struct reftable_merged_table *)tab, it,
+					rec);
+}
+
+static uint32_t reftable_merged_table_hash_id_void(void *tab)
+{
+	return reftable_merged_table_hash_id(
+		(struct reftable_merged_table *)tab);
+}
+
+static uint64_t reftable_merged_table_min_update_index_void(void *tab)
+{
+	return reftable_merged_table_min_update_index(
+		(struct reftable_merged_table *)tab);
+}
+
+static uint64_t reftable_merged_table_max_update_index_void(void *tab)
+{
+	return reftable_merged_table_max_update_index(
+		(struct reftable_merged_table *)tab);
+}
+
+static struct reftable_table_vtable merged_table_vtable = {
+	.seek_record = reftable_merged_table_seek_void,
+	.hash_id = reftable_merged_table_hash_id_void,
+	.min_update_index = reftable_merged_table_min_update_index_void,
+	.max_update_index = reftable_merged_table_max_update_index_void,
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.refname = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return tab->ops->seek_record(tab->table_arg, it, &rec);
+}
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &reader_vtable;
+	tab->table_arg = reader;
+}
+
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *merged)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &merged_table_vtable;
+	tab->table_arg = merged;
+}
+
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	int err = reftable_table_seek_ref(tab, &it, name);
+	if (err)
+		goto done;
+
+	err = reftable_iterator_next_ref(&it, ref);
+	if (err)
+		goto done;
+
+	if (strcmp(ref->refname, name) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		reftable_ref_record_clear(ref);
+		err = 1;
+		goto done;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int reftable_table_seek_record(struct reftable_table *tab,
+			       struct reftable_iterator *it,
+			       struct reftable_record *rec)
+{
+	return tab->ops->seek_record(tab->table_arg, it, rec);
+}
+
+uint64_t reftable_table_max_update_index(struct reftable_table *tab)
+{
+	return tab->ops->max_update_index(tab->table_arg);
+}
+
+uint64_t reftable_table_min_update_index(struct reftable_table *tab)
+{
+	return tab->ops->min_update_index(tab->table_arg);
+}
+
+uint32_t reftable_table_hash_id(struct reftable_table *tab)
+{
+	return tab->ops->hash_id(tab->table_arg);
+}
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 0000000000..5f8ebf5540
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,585 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *refname; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *refname;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	REFTABLE_IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	REFTABLE_FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	REFTABLE_NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	REFTABLE_LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL refname.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on writing a log record with multiline message with
+	   exact_log_message unset
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	REFTABLE_API_ERROR = -6,
+
+	/* Decompression error */
+	REFTABLE_ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	REFTABLE_EMPTY_TABLE_ERROR = -8,
+
+	/* Dir/file conflict. */
+	REFTABLE_NAME_CONFLICT = -9,
+
+	/* Illegal ref name. */
+	REFTABLE_REFNAME_ERROR = -10,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/*
+ * Convert the numeric error code to an equivalent errno code.
+ */
+int reftable_error_to_errno(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+
+	/* boolean: do not check ref names for validity or dir/file conflicts.
+	 */
+	int skip_name_check;
+
+	/* boolean: copy log messages exactly. If unset, check that the message
+	 *   is a single line, and add '\n' if missing.
+	 */
+	int exact_log_message;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, const void *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, const void *data, size_t size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp. The name is used for creating a stack. Typically, it is the
+ * basename of the file. The block source `src` is owned by the reader, and is
+ * closed on calling reftable_reader_destroy().
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source *src, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, &src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(&it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to `oid`. */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* A generic reftable; see below. */
+struct reftable_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_table *stack, int n,
+			      uint32_t hash_id);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/* return the hash ID of the merged table. */
+uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *m);
+
+/****************************************************************
+ Generic tables
+
+ A unified API for reading tables, either merged tables, or single readers.
+ ****************************************************************/
+
+struct reftable_table {
+	struct reftable_table_vtable *ops;
+	void *table_arg;
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name);
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader);
+
+/* returns the hash ID from a generic reftable_table */
+uint32_t reftable_table_hash_id(struct reftable_table *tab);
+
+/* create a generic table from reftable_merged_table */
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *table);
+
+/* returns the max update_index covered by this table. */
+uint64_t reftable_table_max_update_index(struct reftable_table *tab);
+
+/* returns the min update_index covered by this table. */
+uint64_t reftable_table_min_update_index(struct reftable_table *tab);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction, and deallocate the
+   transaction. Releases the lock if held. */
+void reftable_addition_destroy(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* Reloads the stack if necessary. This is very cheap to run if the stack was up
+ * to date */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	uint64_t entries_written; /* total number of entries written, including
+				     failures. */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/reftable_test.c b/reftable/reftable_test.c
new file mode 100644
index 0000000000..ad21630999
--- /dev/null
+++ b/reftable/reftable_test.c
@@ -0,0 +1,632 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "record.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static const int update_index = 5;
+
+static void test_buffer(void)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_block_source source = { NULL };
+	struct reftable_block out = { 0 };
+	int n;
+	uint8_t in[] = "hello";
+	strbuf_add(&buf, in, sizeof(in));
+	block_source_from_strbuf(&source, &buf);
+	assert(block_source_size(&source) == 6);
+	n = block_source_read_block(&source, &out, 0, sizeof(in));
+	assert(n == sizeof(in));
+	assert(!memcmp(in, out.data, n));
+	reftable_block_done(&out);
+
+	n = block_source_read_block(&source, &out, 1, 2);
+	assert(n == 2);
+	assert(!memcmp(out.data, "el", 2));
+
+	reftable_block_done(&out);
+	block_source_close(&source);
+	strbuf_release(&buf);
+}
+
+static void test_default_write_opts(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+
+	struct reftable_ref_record rec = {
+		.refname = "master",
+		.update_index = 1,
+	};
+	int err;
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader **readers =
+		reftable_calloc(sizeof(*readers) * 1);
+	struct reftable_table *tab = reftable_calloc(sizeof(*tab) * 1);
+	uint32_t hash_id;
+	struct reftable_reader *rd = NULL;
+	struct reftable_merged_table *merged = NULL;
+
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_add_ref(w, &rec);
+	assert_err(err);
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	hash_id = reftable_reader_hash_id(rd);
+	assert(hash_id == SHA1_ID);
+
+	reftable_table_from_reader(&tab[0], rd);
+	err = reftable_new_merged_table(&merged, tab, 1, SHA1_ID);
+	assert_err(err);
+
+	reader_close(rd);
+	reftable_merged_table_free(merged);
+	strbuf_release(&buf);
+}
+
+static void write_table(char ***names, struct strbuf *buf, int N,
+			int block_size, uint32_t hash_id)
+{
+	struct reftable_write_options opts = {
+		.block_size = block_size,
+		.hash_id = hash_id,
+	};
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, buf, &opts);
+	struct reftable_ref_record ref = { 0 };
+	int i = 0, n;
+	struct reftable_log_record log = { 0 };
+	const struct reftable_stats *stats = NULL;
+	*names = reftable_calloc(sizeof(char *) * (N + 1));
+	reftable_writer_set_limits(w, update_index, update_index);
+	for (i = 0; i < N; i++) {
+		uint8_t hash[SHA256_SIZE] = { 0 };
+		char name[100];
+		int n;
+
+		set_test_hash(hash, i);
+
+		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
+
+		ref.refname = name;
+		ref.value = hash;
+		ref.update_index = update_index;
+		(*names)[i] = xstrdup(name);
+
+		n = reftable_writer_add_ref(w, &ref);
+		assert(n == 0);
+	}
+
+	for (i = 0; i < N; i++) {
+		uint8_t hash[SHA256_SIZE] = { 0 };
+		char name[100];
+		int n;
+
+		set_test_hash(hash, i);
+
+		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
+
+		log.refname = name;
+		log.new_hash = hash;
+		log.update_index = update_index;
+		log.message = "message";
+
+		n = reftable_writer_add_log(w, &log);
+		assert(n == 0);
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	stats = writer_stats(w);
+	for (i = 0; i < stats->ref_stats.blocks; i++) {
+		int off = i * opts.block_size;
+		if (off == 0) {
+			off = header_size((hash_id == SHA256_ID) ? 2 : 1);
+		}
+		assert(buf->buf[off] == 'r');
+	}
+
+	assert(stats->log_stats.blocks > 0);
+	reftable_writer_free(w);
+}
+
+static void test_log_buffer_size(void)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_write_options opts = {
+		.block_size = 4096,
+	};
+	int err;
+	struct reftable_log_record log = {
+		.refname = "refs/heads/master",
+		.name = "Han-Wen Nienhuys",
+		.email = "hanwen@google.com",
+		.tz_offset = 100,
+		.time = 0x5e430672,
+		.update_index = 0xa,
+		.message = "commit: 9\n",
+	};
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+
+	/* This tests buffer extension for log compression. Must use a random
+	   hash, to ensure that the compressed part is larger than the original.
+	*/
+	uint8_t hash1[SHA1_SIZE], hash2[SHA1_SIZE];
+	for (int i = 0; i < SHA1_SIZE; i++) {
+		hash1[i] = (uint8_t)(rand() % 256);
+		hash2[i] = (uint8_t)(rand() % 256);
+	}
+	log.old_hash = hash1;
+	log.new_hash = hash2;
+	reftable_writer_set_limits(w, update_index, update_index);
+	err = reftable_writer_add_log(w, &log);
+	assert_err(err);
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+	strbuf_release(&buf);
+}
+
+static void test_log_write_read(void)
+{
+	int N = 2;
+	char **names = reftable_calloc(sizeof(char *) * (N + 1));
+	int err;
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_ref_record ref = { 0 };
+	int i = 0;
+	struct reftable_log_record log = { 0 };
+	int n;
+	struct reftable_iterator it = { 0 };
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+	const struct reftable_stats *stats = NULL;
+	reftable_writer_set_limits(w, 0, N);
+	for (i = 0; i < N; i++) {
+		char name[256];
+		struct reftable_ref_record ref = { 0 };
+		snprintf(name, sizeof(name), "b%02d%0*d", i, 130, 7);
+		names[i] = xstrdup(name);
+		puts(name);
+		ref.refname = name;
+		ref.update_index = i;
+
+		err = reftable_writer_add_ref(w, &ref);
+		assert_err(err);
+	}
+	for (i = 0; i < N; i++) {
+		uint8_t hash1[SHA1_SIZE], hash2[SHA1_SIZE];
+		struct reftable_log_record log = { 0 };
+		set_test_hash(hash1, i);
+		set_test_hash(hash2, i + 1);
+
+		log.refname = names[i];
+		log.update_index = i;
+		log.old_hash = hash1;
+		log.new_hash = hash2;
+
+		err = reftable_writer_add_log(w, &log);
+		assert_err(err);
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	stats = writer_stats(w);
+	assert(stats->log_stats.blocks > 0);
+	reftable_writer_free(w);
+	w = NULL;
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.log");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, names[N - 1]);
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert_err(err);
+
+	/* end of iteration. */
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert(0 < err);
+
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_reader_seek_log(&rd, &it, "");
+	assert_err(err);
+
+	i = 0;
+	while (true) {
+		int err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+
+		assert_err(err);
+		assert_streq(names[i], log.refname);
+		assert(i == log.update_index);
+		i++;
+		reftable_log_record_clear(&log);
+	}
+
+	assert(i == N);
+	reftable_iterator_destroy(&it);
+
+	/* cleanup. */
+	strbuf_release(&buf);
+	free_names(names);
+	reader_close(&rd);
+}
+
+static void test_table_read_write_sequential(void)
+{
+	char **names;
+	struct strbuf buf = STRBUF_INIT;
+	int N = 50;
+	struct reftable_iterator it = { 0 };
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader rd = { 0 };
+	int err = 0;
+	int j = 0;
+
+	write_table(&names, &buf, N, 256, SHA1_ID);
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, "");
+	assert_err(err);
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		int r = reftable_iterator_next_ref(&it, &ref);
+		assert(r >= 0);
+		if (r > 0) {
+			break;
+		}
+		assert(0 == strcmp(names[j], ref.refname));
+		assert(update_index == ref.update_index);
+
+		j++;
+		reftable_ref_record_clear(&ref);
+	}
+	assert(j == N);
+	reftable_iterator_destroy(&it);
+	strbuf_release(&buf);
+	free_names(names);
+
+	reader_close(&rd);
+}
+
+static void test_table_write_small_table(void)
+{
+	char **names;
+	struct strbuf buf = STRBUF_INIT;
+	int N = 1;
+	write_table(&names, &buf, N, 4096, SHA1_ID);
+	assert(buf.len < 200);
+	strbuf_release(&buf);
+	free_names(names);
+}
+
+static void test_table_read_api(void)
+{
+	char **names;
+	struct strbuf buf = STRBUF_INIT;
+	int N = 50;
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	int err;
+	int i;
+	struct reftable_log_record log = { 0 };
+	struct reftable_iterator it = { 0 };
+
+	write_table(&names, &buf, N, 256, SHA1_ID);
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, names[0]);
+	assert_err(err);
+
+	err = reftable_iterator_next_log(&it, &log);
+	assert(err == REFTABLE_API_ERROR);
+
+	strbuf_release(&buf);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_free(names);
+	reader_close(&rd);
+	strbuf_release(&buf);
+}
+
+static void test_table_read_write_seek(bool index, int hash_id)
+{
+	char **names;
+	struct strbuf buf = STRBUF_INIT;
+	int N = 50;
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	int err;
+	int i = 0;
+
+	struct reftable_iterator it = { 0 };
+	struct strbuf pastLast = STRBUF_INIT;
+	struct reftable_ref_record ref = { 0 };
+
+	write_table(&names, &buf, N, 256, hash_id);
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+	assert(hash_id == reftable_reader_hash_id(&rd));
+
+	if (!index) {
+		rd.ref_offsets.index_offset = 0;
+	} else {
+		assert(rd.ref_offsets.index_offset > 0);
+	}
+
+	for (i = 1; i < N; i++) {
+		int err = reftable_reader_seek_ref(&rd, &it, names[i]);
+		assert_err(err);
+		err = reftable_iterator_next_ref(&it, &ref);
+		assert_err(err);
+		assert(0 == strcmp(names[i], ref.refname));
+		assert(i == ref.value[0]);
+
+		reftable_ref_record_clear(&ref);
+		reftable_iterator_destroy(&it);
+	}
+
+	strbuf_addstr(&pastLast, names[N - 1]);
+	strbuf_addstr(&pastLast, "/");
+
+	err = reftable_reader_seek_ref(&rd, &it, pastLast.buf);
+	if (err == 0) {
+		struct reftable_ref_record ref = { 0 };
+		int err = reftable_iterator_next_ref(&it, &ref);
+		assert(err > 0);
+	} else {
+		assert(err > 0);
+	}
+
+	strbuf_release(&pastLast);
+	reftable_iterator_destroy(&it);
+
+	strbuf_release(&buf);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+	reftable_free(names);
+	reader_close(&rd);
+}
+
+static void test_table_read_write_seek_linear(void)
+{
+	test_table_read_write_seek(false, SHA1_ID);
+}
+
+static void test_table_read_write_seek_linear_sha256(void)
+{
+	test_table_read_write_seek(false, SHA256_ID);
+}
+
+static void test_table_read_write_seek_index(void)
+{
+	test_table_read_write_seek(true, SHA1_ID);
+}
+
+static void test_table_refs_for(bool indexed)
+{
+	int N = 50;
+	char **want_names = reftable_calloc(sizeof(char *) * (N + 1));
+	int want_names_len = 0;
+	uint8_t want_hash[SHA1_SIZE];
+
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_ref_record ref = { 0 };
+	int i = 0;
+	int n;
+	int err;
+	struct reftable_reader rd;
+	struct reftable_block_source source = { 0 };
+
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+
+	struct reftable_iterator it = { 0 };
+	int j;
+
+	set_test_hash(want_hash, 4);
+
+	for (i = 0; i < N; i++) {
+		uint8_t hash[SHA1_SIZE];
+		char fill[51] = { 0 };
+		char name[100];
+		uint8_t hash1[SHA1_SIZE];
+		uint8_t hash2[SHA1_SIZE];
+		struct reftable_ref_record ref = { 0 };
+
+		memset(hash, i, sizeof(hash));
+		memset(fill, 'x', 50);
+		/* Put the variable part in the start */
+		snprintf(name, sizeof(name), "br%02d%s", i, fill);
+		name[40] = 0;
+		ref.refname = name;
+
+		set_test_hash(hash1, i / 4);
+		set_test_hash(hash2, 3 + i / 4);
+		ref.value = hash1;
+		ref.target_value = hash2;
+
+		/* 80 bytes / entry, so 3 entries per block. Yields 17
+		 */
+		/* blocks. */
+		n = reftable_writer_add_ref(w, &ref);
+		assert(n == 0);
+
+		if (!memcmp(hash1, want_hash, SHA1_SIZE) ||
+		    !memcmp(hash2, want_hash, SHA1_SIZE)) {
+			want_names[want_names_len++] = xstrdup(name);
+		}
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	reftable_writer_free(w);
+	w = NULL;
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+	if (!indexed) {
+		rd.obj_offsets.present = 0;
+	}
+
+	err = reftable_reader_seek_ref(&rd, &it, "");
+	assert_err(err);
+	reftable_iterator_destroy(&it);
+
+	err = reftable_reader_refs_for(&rd, &it, want_hash);
+	assert_err(err);
+
+	j = 0;
+	while (true) {
+		int err = reftable_iterator_next_ref(&it, &ref);
+		assert(err >= 0);
+		if (err > 0) {
+			break;
+		}
+
+		assert(j < want_names_len);
+		assert(0 == strcmp(ref.refname, want_names[j]));
+		j++;
+		reftable_ref_record_clear(&ref);
+	}
+	assert(j == want_names_len);
+
+	strbuf_release(&buf);
+	free_names(want_names);
+	reftable_iterator_destroy(&it);
+	reader_close(&rd);
+}
+
+static void test_table_refs_for_no_index(void)
+{
+	test_table_refs_for(false);
+}
+
+static void test_table_refs_for_obj_index(void)
+{
+	test_table_refs_for(true);
+}
+
+static void test_table_empty(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_ref_record rec = { 0 };
+	struct reftable_iterator it = { 0 };
+	int err;
+
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_close(w);
+	assert(err == REFTABLE_EMPTY_TABLE_ERROR);
+	reftable_writer_free(w);
+
+	assert(buf.len == header_size(1) + footer_size(1));
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &rec);
+	assert(err > 0);
+
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	strbuf_release(&buf);
+}
+
+int reftable_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_default_write_opts", test_default_write_opts);
+	add_test_case("test_log_write_read", test_log_write_read);
+	add_test_case("test_table_read_write_seek_linear_sha256",
+		      &test_table_read_write_seek_linear_sha256);
+	add_test_case("test_log_buffer_size", test_log_buffer_size);
+	add_test_case("test_table_write_small_table",
+		      &test_table_write_small_table);
+	add_test_case("test_buffer", &test_buffer);
+	add_test_case("test_table_read_api", &test_table_read_api);
+	add_test_case("test_table_read_write_sequential",
+		      &test_table_read_write_sequential);
+	add_test_case("test_table_read_write_seek_linear",
+		      &test_table_read_write_seek_linear);
+	add_test_case("test_table_read_write_seek_index",
+		      &test_table_read_write_seek_index);
+	add_test_case("test_table_read_write_refs_for_no_index",
+		      &test_table_refs_for_no_index);
+	add_test_case("test_table_read_write_refs_for_obj_index",
+		      &test_table_refs_for_obj_index);
+	add_test_case("test_table_empty", &test_table_empty);
+	return test_main(argc, argv);
+}
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 0000000000..94b2cb265c
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1225 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "refname.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct strbuf list_file_name = STRBUF_INIT;
+	int err = 0;
+
+	if (config.hash_id == 0) {
+		config.hash_id = SHA1_ID;
+	}
+
+	*dest = NULL;
+
+	strbuf_reset(&list_file_name);
+	strbuf_addstr(&list_file_name, dir);
+	strbuf_addstr(&list_file_name, "/tables.list");
+
+	p->list_file = strbuf_detach(&list_file_name, NULL);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload_maybe_reuse(p, true);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fd_read_lines(int fd, char ***namesp)
+{
+	off_t size = lseek(fd, 0, SEEK_END);
+	char *buf = NULL;
+	int err = 0;
+	if (size < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+	err = lseek(fd, 0, SEEK_SET);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (read(fd, buf, size) != size) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+
+done:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	int fd = open(filename, O_RDONLY, 0644);
+	int err = 0;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return REFTABLE_IO_ERROR;
+	}
+	err = fd_read_lines(fd, namesp);
+	close(fd);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged != NULL) {
+		reftable_merged_table_free(st->merged);
+		st->merged = NULL;
+	}
+
+	if (st->readers != NULL) {
+		int i = 0;
+		for (i = 0; i < st->readers_len; i++) {
+			reftable_reader_free(st->readers[i]);
+		}
+		st->readers_len = 0;
+		FREE_AND_NULL(st->readers);
+	}
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->readers[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_readers =
+		reftable_calloc(sizeof(struct reftable_reader *) * names_len);
+	struct reftable_table *new_tables =
+		reftable_calloc(sizeof(struct reftable_table) * names_len);
+	int new_readers_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+	int i;
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			struct strbuf table_path = STRBUF_INIT;
+			strbuf_addstr(&table_path, st->reftable_dir);
+			strbuf_addstr(&table_path, "/");
+			strbuf_addstr(&table_path, name);
+
+			err = reftable_block_source_from_file(&src,
+							      table_path.buf);
+			strbuf_release(&table_path);
+
+			if (err < 0)
+				goto done;
+
+			err = reftable_new_reader(&rd, &src, name);
+			if (err < 0)
+				goto done;
+		}
+
+		new_readers[new_readers_len] = rd;
+		reftable_table_from_reader(&new_tables[new_readers_len], rd);
+		new_readers_len++;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables,
+					new_readers_len, st->config.hash_id);
+	if (err < 0)
+		goto done;
+
+	new_tables = NULL;
+	st->readers_len = new_readers_len;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	if (st->readers != NULL) {
+		reftable_free(st->readers);
+	}
+	st->readers = new_readers;
+	new_readers = NULL;
+	new_readers_len = 0;
+
+	new_merged->suppress_deletions = true;
+	st->merged = new_merged;
+	for (i = 0; i < cur_len; i++) {
+		if (cur[i] != NULL) {
+			reader_close(cur[i]);
+			reftable_reader_free(cur[i]);
+		}
+	}
+
+done:
+	for (i = 0; i < new_readers_len; i++) {
+		reader_close(new_readers[i]);
+		reftable_reader_free(new_readers[i]);
+	}
+	reftable_free(new_readers);
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0)
+		return diff;
+
+	return udiff;
+}
+
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0)
+		return err;
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		int err2 = 0;
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != REFTABLE_NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
+		   writer. Check if there was one by checking if the name list
+		   changed.
+		*/
+		err2 = read_lines(st->list_file, &names_after);
+		if (err2 < 0) {
+			free_names(names);
+			return err2;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return err;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 1;
+		sleep_millisec(delay);
+	}
+
+	return 0;
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0)
+		return err;
+
+	for (i = 0; i < st->readers_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto done;
+		}
+
+		if (strcmp(st->readers[i]->name, names[i])) {
+			err = 1;
+			goto done;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto done;
+	}
+
+done:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	int err = stack_uptodate(st);
+	if (err > 0)
+		return reftable_stack_reload_maybe_reuse(st, true);
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == REFTABLE_LOCK_ERROR) {
+			/* Ignore error return, we want to propagate
+			   REFTABLE_LOCK_ERROR.
+			*/
+			reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	if (!st->disable_auto_compact)
+		return reftable_stack_auto_compact(st);
+
+	return 0;
+}
+
+static void format_name(struct strbuf *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
+	strbuf_reset(dest);
+	strbuf_addstr(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct strbuf lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+#define REFTABLE_ADDITION_INIT                \
+	{                                     \
+		.lock_file_name = STRBUF_INIT \
+	}
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	strbuf_reset(&add->lock_file_name);
+	strbuf_addstr(&add->lock_file_name, st->list_file);
+	strbuf_addstr(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(add->lock_file_name.buf,
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = REFTABLE_LOCK_ERROR;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	err = stack_uptodate(st);
+	if (err < 0)
+		goto done;
+
+	if (err > 1) {
+		err = REFTABLE_LOCK_ERROR;
+		goto done;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+done:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct strbuf nm = STRBUF_INIT;
+	for (i = 0; i < add->new_tables_len; i++) {
+		strbuf_reset(&nm);
+		strbuf_addstr(&nm, add->stack->list_file);
+		strbuf_addstr(&nm, "/");
+		strbuf_addstr(&nm, add->new_tables[i]);
+		unlink(nm.buf);
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(add->lock_file_name.buf);
+		strbuf_release(&add->lock_file_name);
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+	strbuf_release(&nm);
+}
+
+void reftable_addition_destroy(struct reftable_addition *add)
+{
+	if (add == NULL) {
+		return;
+	}
+	reftable_addition_close(add);
+	reftable_free(add);
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct strbuf table_list = STRBUF_INIT;
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0)
+		goto done;
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		strbuf_addstr(&table_list, add->stack->readers[i]->name);
+		strbuf_addstr(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		strbuf_addstr(&table_list, add->new_tables[i]);
+		strbuf_addstr(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	strbuf_release(&table_list);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = rename(add->lock_file_name.buf, add->stack->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+done:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
+	*dest = reftable_calloc(sizeof(**dest));
+	**dest = empty;
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = REFTABLE_ADDITION_INIT;
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0)
+		goto done;
+	if (err > 0) {
+		err = REFTABLE_LOCK_ERROR;
+		goto done;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_addition_commit(&add);
+done:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct strbuf temp_tab_file_name = STRBUF_INIT;
+	struct strbuf tab_file_name = STRBUF_INIT;
+	struct strbuf next_name = STRBUF_INIT;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+
+	strbuf_reset(&next_name);
+	format_name(&next_name, add->next_update_index, add->next_update_index);
+
+	strbuf_addstr(&temp_tab_file_name, add->stack->reftable_dir);
+	strbuf_addstr(&temp_tab_file_name, "/");
+	strbuf_addbuf(&temp_tab_file_name, &next_name);
+	strbuf_addstr(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp(temp_tab_file_name.buf);
+	if (tab_fd < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_writer_close(wr);
+	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto done;
+	}
+	if (err < 0)
+		goto done;
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = stack_check_addition(add->stack, temp_tab_file_name.buf);
+	if (err < 0)
+		goto done;
+
+	if (wr->min_update_index < add->next_update_index) {
+		err = REFTABLE_API_ERROR;
+		goto done;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	strbuf_addstr(&next_name, ".ref");
+
+	strbuf_addstr(&tab_file_name, add->stack->reftable_dir);
+	strbuf_addstr(&tab_file_name, "/");
+	strbuf_addbuf(&tab_file_name, &next_name);
+
+	/* TODO: should check destination out of paranoia */
+	err = rename(temp_tab_file_name.buf, tab_file_name.buf);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = strbuf_detach(&next_name, NULL);
+	add->new_tables_len++;
+done:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(temp_tab_file_name.buf);
+	}
+
+	strbuf_release(&temp_tab_file_name);
+	strbuf_release(&tab_file_name);
+	strbuf_release(&next_name);
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0)
+		return reftable_reader_max_update_index(st->readers[sz - 1]) +
+		       1;
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct strbuf *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct strbuf next_name = STRBUF_INIT;
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->readers[first]),
+		    reftable_reader_max_update_index(st->readers[first]));
+
+	strbuf_reset(temp_tab);
+	strbuf_addstr(temp_tab, st->reftable_dir);
+	strbuf_addstr(temp_tab, "/");
+	strbuf_addbuf(temp_tab, &next_name);
+	strbuf_addstr(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp(temp_tab->buf);
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0)
+		goto done;
+	err = reftable_writer_close(wr);
+	if (err < 0)
+		goto done;
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+done:
+	reftable_writer_free(wr);
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(temp_tab->buf);
+		strbuf_release(temp_tab);
+	}
+	strbuf_release(&next_name);
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_table *subtabs = reftable_calloc(
+		sizeof(struct reftable_table) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	uint64_t entries = 0;
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->readers[i];
+		reftable_table_from_reader(&subtabs[j++], t);
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr, st->readers[first]->min_update_index,
+				   st->readers[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto done;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+	reftable_iterator_destroy(&it);
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+	reftable_log_record_clear(&log);
+	st->stats.entries_written += entries;
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct strbuf temp_tab_file_name = STRBUF_INIT;
+	struct strbuf new_table_name = STRBUF_INIT;
+	struct strbuf lock_file_name = STRBUF_INIT;
+	struct strbuf ref_list_contents = STRBUF_INIT;
+	struct strbuf new_table_path = STRBUF_INIT;
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **listp = NULL;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto done;
+	}
+
+	st->stats.attempts++;
+
+	strbuf_reset(&lock_file_name);
+	strbuf_addstr(&lock_file_name, st->list_file);
+	strbuf_addstr(&lock_file_name, ".lock");
+
+	lock_file_fd =
+		open(lock_file_name.buf, O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	/* Don't want to write to the lock for now.  */
+	close(lock_file_fd);
+	lock_file_fd = 0;
+
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0)
+		goto done;
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct strbuf subtab_file_name = STRBUF_INIT;
+		struct strbuf subtab_lock = STRBUF_INIT;
+		int sublock_file_fd = -1;
+
+		strbuf_addstr(&subtab_file_name, st->reftable_dir);
+		strbuf_addstr(&subtab_file_name, "/");
+		strbuf_addstr(&subtab_file_name, reader_name(st->readers[i]));
+
+		strbuf_reset(&subtab_lock);
+		strbuf_addbuf(&subtab_lock, &subtab_file_name);
+		strbuf_addstr(&subtab_lock, ".lock");
+
+		sublock_file_fd = open(subtab_lock.buf,
+				       O_EXCL | O_CREAT | O_WRONLY, 0644);
+		if (sublock_file_fd > 0) {
+			close(sublock_file_fd);
+		} else if (sublock_file_fd < 0) {
+			if (errno == EEXIST) {
+				err = 1;
+			} else {
+				err = REFTABLE_IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = subtab_lock.buf;
+		delete_on_success[j] = subtab_file_name.buf;
+		j++;
+
+		if (err != 0)
+			goto done;
+	}
+
+	err = unlink(lock_file_name.buf);
+	if (err < 0)
+		goto done;
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0)
+		goto done;
+
+	lock_file_fd =
+		open(lock_file_name.buf, O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->readers[first]->min_update_index,
+		    st->readers[last]->max_update_index);
+	strbuf_addstr(&new_table_name, ".ref");
+
+	strbuf_reset(&new_table_path);
+	strbuf_addstr(&new_table_path, st->reftable_dir);
+	strbuf_addstr(&new_table_path, "/");
+	strbuf_addbuf(&new_table_path, &new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(temp_tab_file_name.buf, new_table_path.buf);
+		if (err < 0) {
+			err = REFTABLE_IO_ERROR;
+			goto done;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		strbuf_addstr(&ref_list_contents, st->readers[i]->name);
+		strbuf_addstr(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		strbuf_addbuf(&ref_list_contents, &new_table_name);
+		strbuf_addstr(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		strbuf_addstr(&ref_list_contents, st->readers[i]->name);
+		strbuf_addstr(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(new_table_path.buf);
+		goto done;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(new_table_path.buf);
+		goto done;
+	}
+
+	err = rename(lock_file_name.buf, st->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(new_table_path.buf);
+		goto done;
+	}
+	have_lock = false;
+
+	/* Reload the stack before deleting. On windows, we can only delete the
+	   files after we closed them.
+	*/
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+
+	listp = delete_on_success;
+	while (*listp) {
+		if (strcmp(*listp, new_table_path.buf)) {
+			unlink(*listp);
+		}
+		listp++;
+	}
+
+done:
+	free_names(delete_on_success);
+
+	listp = subtable_locks;
+	while (*listp) {
+		unlink(*listp);
+		listp++;
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(lock_file_name.buf);
+	}
+	strbuf_release(&new_table_name);
+	strbuf_release(&new_table_path);
+	strbuf_release(&ref_list_contents);
+	strbuf_release(&temp_tab_file_name);
+	strbuf_release(&lock_file_name);
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0)
+		return 0;
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+
+	if (n == 0) {
+		*seglen = 0;
+		return segs;
+	}
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
+	int overhead = header_size(version) - 1;
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		sizes[i] = st->readers[i]->size - overhead;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0)
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_table tab = { NULL };
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+	return reftable_table_read_ref(&tab, refname, ref);
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err)
+		goto done;
+
+	err = reftable_iterator_next_log(&it, log);
+	if (err)
+		goto done;
+
+	if (strcmp(log->refname, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto done;
+	}
+
+done:
+	if (err) {
+		reftable_log_record_clear(log);
+	}
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
+{
+	int err = 0;
+	struct reftable_block_source src = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct reftable_ref_record *refs = NULL;
+	struct reftable_iterator it = { NULL };
+	int cap = 0;
+	int len = 0;
+	int i = 0;
+
+	if (st->config.skip_name_check)
+		return 0;
+
+	err = reftable_block_source_from_file(&src, new_tab_name);
+	if (err < 0)
+		goto done;
+
+	err = reftable_new_reader(&rd, &src, new_tab_name);
+	if (err < 0)
+		goto done;
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	if (err > 0) {
+		err = 0;
+		goto done;
+	}
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0)
+			goto done;
+
+		if (len >= cap) {
+			cap = 2 * cap + 1;
+			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
+		}
+
+		refs[len++] = ref;
+	}
+
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+
+	err = validate_ref_record_addition(tab, refs, len);
+
+done:
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&refs[i]);
+	}
+
+	free(refs);
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 0000000000..e1baa827dd
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,50 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+#include "system.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+	bool disable_auto_compact;
+
+	struct reftable_write_options config;
+
+	struct reftable_reader **readers;
+	size_t readers_len;
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
+void reftable_addition_close(struct reftable_addition *add);
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
new file mode 100644
index 0000000000..f6c74bc6a7
--- /dev/null
+++ b/reftable/stack_test.c
@@ -0,0 +1,787 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+
+#include "merged.h"
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+#include <sys/types.h>
+#include <dirent.h>
+
+static void test_read_file(void)
+{
+	char fn[256] = "/tmp/stack.test_read_file.XXXXXX";
+	int fd = mkstemp(fn);
+	char out[1024] = "line1\n\nline2\nline3";
+	int n, err;
+	char **names = NULL;
+	char *want[] = { "line1", "line2", "line3" };
+	int i = 0;
+
+	assert(fd > 0);
+	n = write(fd, out, strlen(out));
+	assert(n == strlen(out));
+	err = close(fd);
+	assert(err >= 0);
+
+	err = read_lines(fn, &names);
+	assert_err(err);
+
+	for (i = 0; names[i] != NULL; i++) {
+		assert(0 == strcmp(want[i], names[i]));
+	}
+	free_names(names);
+	remove(fn);
+}
+
+static void test_parse_names(void)
+{
+	char buf[] = "line\n";
+	char **names = NULL;
+	parse_names(buf, strlen(buf), &names);
+
+	assert(NULL != names[0]);
+	assert(0 == strcmp(names[0], "line"));
+	assert(NULL == names[1]);
+	free_names(names);
+}
+
+static void test_names_equal(void)
+{
+	char *a[] = { "a", "b", "c", NULL };
+	char *b[] = { "a", "b", "d", NULL };
+	char *c[] = { "a", "b", NULL };
+
+	assert(names_equal(a, a));
+	assert(!names_equal(a, b));
+	assert(!names_equal(a, c));
+}
+
+static int write_test_ref(struct reftable_writer *wr, void *arg)
+{
+	struct reftable_ref_record *ref = arg;
+	reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
+	return reftable_writer_add_ref(wr, ref);
+}
+
+struct write_log_arg {
+	struct reftable_log_record *log;
+	uint64_t update_index;
+};
+
+static int write_test_log(struct reftable_writer *wr, void *arg)
+{
+	struct write_log_arg *wla = arg;
+
+	reftable_writer_set_limits(wr, wla->update_index, wla->update_index);
+	return reftable_writer_add_log(wr, wla->log);
+}
+
+static void test_reftable_stack_add_one(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record ref = {
+		.refname = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st, ref.refname, &dest);
+	assert_err(err);
+	assert(0 == strcmp("master", dest.target));
+
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_uptodate(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st1 = NULL;
+	struct reftable_stack *st2 = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int err;
+	struct reftable_ref_record ref1 = {
+		.refname = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record ref2 = {
+		.refname = "branch2",
+		.update_index = 2,
+		.target = "master",
+	};
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st1, dir, cfg);
+	assert_err(err);
+
+	err = reftable_new_stack(&st2, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st1, &write_test_ref, &ref1);
+	assert_err(err);
+
+	err = reftable_stack_add(st2, &write_test_ref, &ref2);
+	assert(err == REFTABLE_LOCK_ERROR);
+
+	err = reftable_stack_reload(st2);
+	assert_err(err);
+
+	err = reftable_stack_add(st2, &write_test_ref, &ref2);
+	assert_err(err);
+	reftable_stack_destroy(st1);
+	reftable_stack_destroy(st2);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_transaction_api(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_addition *add = NULL;
+
+	struct reftable_ref_record ref = {
+		.refname = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	reftable_addition_destroy(add);
+
+	err = reftable_stack_new_addition(&add, st);
+	assert_err(err);
+
+	err = reftable_addition_add(add, &write_test_ref, &ref);
+	assert_err(err);
+
+	err = reftable_addition_commit(add);
+	assert_err(err);
+
+	reftable_addition_destroy(add);
+
+	err = reftable_stack_read_ref(st, ref.refname, &dest);
+	assert_err(err);
+	assert(0 == strcmp("master", dest.target));
+
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_validate_refname(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int i;
+	struct reftable_ref_record ref = {
+		.refname = "a/b",
+		.update_index = 1,
+		.target = "master",
+	};
+	char *additions[] = { "a", "a/b/c" };
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	for (i = 0; i < ARRAY_SIZE(additions); i++) {
+		struct reftable_ref_record ref = {
+			.refname = additions[i],
+			.update_index = 1,
+			.target = "master",
+		};
+
+		err = reftable_stack_add(st, &write_test_ref, &ref);
+		assert(err == REFTABLE_NAME_CONFLICT);
+	}
+
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static int write_error(struct reftable_writer *wr, void *arg)
+{
+	return *((int *)arg);
+}
+
+static void test_reftable_stack_update_index_check(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record ref1 = {
+		.refname = "name1",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record ref2 = {
+		.refname = "name2",
+		.update_index = 1,
+		.target = "master",
+	};
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref1);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref2);
+	assert(err == REFTABLE_API_ERROR);
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_lock_failure(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err, i;
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+	for (i = -1; i != REFTABLE_EMPTY_TABLE_ERROR; i--) {
+		err = reftable_stack_add(st, &write_error, &i);
+		assert(err == i);
+	}
+
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_add(void)
+{
+	int i = 0;
+	int err = 0;
+	struct reftable_write_options cfg = {
+		.exact_log_message = true,
+	};
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_ref_record refs[2] = { { 0 } };
+	struct reftable_log_record logs[2] = { { 0 } };
+	int N = ARRAY_SIZE(refs);
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+	st->disable_auto_compact = true;
+
+	for (i = 0; i < N; i++) {
+		char buf[256];
+		snprintf(buf, sizeof(buf), "branch%02d", i);
+		refs[i].refname = xstrdup(buf);
+		refs[i].value = reftable_malloc(SHA1_SIZE);
+		refs[i].update_index = i + 1;
+		set_test_hash(refs[i].value, i);
+
+		logs[i].refname = xstrdup(buf);
+		logs[i].update_index = N + i + 1;
+		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+		logs[i].email = xstrdup("identity@invalid");
+		set_test_hash(logs[i].new_hash, i);
+	}
+
+	for (i = 0; i < N; i++) {
+		int err = reftable_stack_add(st, &write_test_ref, &refs[i]);
+		assert_err(err);
+	}
+
+	for (i = 0; i < N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		struct reftable_ref_record dest = { 0 };
+		int err = reftable_stack_read_ref(st, refs[i].refname, &dest);
+		assert_err(err);
+		assert(reftable_ref_record_equal(&dest, refs + i, SHA1_SIZE));
+		reftable_ref_record_clear(&dest);
+	}
+
+	for (i = 0; i < N; i++) {
+		struct reftable_log_record dest = { 0 };
+		int err = reftable_stack_read_log(st, refs[i].refname, &dest);
+		assert_err(err);
+		assert(reftable_log_record_equal(&dest, logs + i, SHA1_SIZE));
+		reftable_log_record_clear(&dest);
+	}
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i < N; i++) {
+		reftable_ref_record_clear(&refs[i]);
+		reftable_log_record_clear(&logs[i]);
+	}
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_log_normalize(void)
+{
+	int err = 0;
+	struct reftable_write_options cfg = {
+		0,
+	};
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+
+	uint8_t h1[SHA1_SIZE] = { 0x01 }, h2[SHA1_SIZE] = { 0x02 };
+
+	struct reftable_log_record input = {
+		.refname = "branch",
+		.update_index = 1,
+		.new_hash = h1,
+		.old_hash = h2,
+	};
+	struct reftable_log_record dest = {
+		.update_index = 0,
+	};
+	struct write_log_arg arg = {
+		.log = &input,
+		.update_index = 1,
+	};
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	input.message = "one\ntwo";
+	err = reftable_stack_add(st, &write_test_log, &arg);
+	assert(err == REFTABLE_API_ERROR);
+
+	input.message = "one";
+	err = reftable_stack_add(st, &write_test_log, &arg);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, input.refname, &dest);
+	assert_err(err);
+	assert(0 == strcmp(dest.message, "one\n"));
+
+	input.message = "two\n";
+	arg.update_index = 2;
+	err = reftable_stack_add(st, &write_test_log, &arg);
+	assert_err(err);
+	err = reftable_stack_read_log(st, input.refname, &dest);
+	assert_err(err);
+	assert(0 == strcmp(dest.message, "two\n"));
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	reftable_log_record_clear(&dest);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_tombstone(void)
+{
+	int i = 0;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record refs[2] = { { 0 } };
+	struct reftable_log_record logs[2] = { { 0 } };
+	int N = ARRAY_SIZE(refs);
+	struct reftable_ref_record dest = { 0 };
+	struct reftable_log_record log_dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		const char *buf = "branch";
+		refs[i].refname = xstrdup(buf);
+		refs[i].update_index = i + 1;
+		if (i % 2 == 0) {
+			refs[i].value = reftable_malloc(SHA1_SIZE);
+			set_test_hash(refs[i].value, i);
+		}
+		logs[i].refname = xstrdup(buf);
+		/* update_index is part of the key. */
+		logs[i].update_index = 42;
+		if (i % 2 == 0) {
+			logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+			set_test_hash(logs[i].new_hash, i);
+			logs[i].email = xstrdup("identity@invalid");
+		}
+	}
+	for (i = 0; i < N; i++) {
+		int err = reftable_stack_add(st, &write_test_ref, &refs[i]);
+		assert_err(err);
+	}
+	for (i = 0; i < N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_read_ref(st, "branch", &dest);
+	assert(err == 1);
+	reftable_ref_record_clear(&dest);
+
+	err = reftable_stack_read_log(st, "branch", &log_dest);
+	assert(err == 1);
+	reftable_log_record_clear(&log_dest);
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st, "branch", &dest);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, "branch", &log_dest);
+	assert(err == 1);
+	reftable_ref_record_clear(&dest);
+	reftable_log_record_clear(&log_dest);
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i < N; i++) {
+		reftable_ref_record_clear(&refs[i]);
+		reftable_log_record_clear(&logs[i]);
+	}
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_hash_id(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+
+	struct reftable_ref_record ref = {
+		.refname = "master",
+		.target = "target",
+		.update_index = 1,
+	};
+	struct reftable_write_options cfg32 = { .hash_id = SHA256_ID };
+	struct reftable_stack *st32 = NULL;
+	struct reftable_write_options cfg_default = { 0 };
+	struct reftable_stack *st_default = NULL;
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	/* can't read it with the wrong hash ID. */
+	err = reftable_new_stack(&st32, dir, cfg32);
+	assert(err == REFTABLE_FORMAT_ERROR);
+
+	/* check that we can read it back with default config too. */
+	err = reftable_new_stack(&st_default, dir, cfg_default);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st_default, "master", &dest);
+	assert_err(err);
+
+	assert(!strcmp(dest.target, ref.target));
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_stack_destroy(st_default);
+	reftable_clear_dir(dir);
+}
+
+static void test_log2(void)
+{
+	assert(1 == fastlog2(3));
+	assert(2 == fastlog2(4));
+	assert(2 == fastlog2(5));
+}
+
+static void test_sizes_to_segments(void)
+{
+	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
+	/* .................0  1  2  3  4  5 */
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(segs[2].log == 3);
+	assert(segs[2].start == 5);
+	assert(segs[2].end == 6);
+
+	assert(segs[1].log == 2);
+	assert(segs[1].start == 2);
+	assert(segs[1].end == 5);
+	reftable_free(segs);
+}
+
+static void test_sizes_to_segments_empty(void)
+{
+	uint64_t sizes[0];
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(seglen == 0);
+	reftable_free(segs);
+}
+
+static void test_sizes_to_segments_all_equal(void)
+{
+	uint64_t sizes[] = { 5, 5 };
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(seglen == 1);
+	assert(segs[0].start == 0);
+	assert(segs[0].end == 2);
+	reftable_free(segs);
+}
+
+static void test_suggest_compaction_segment(void)
+{
+	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
+	/* .................0    1    2  3   4  5  6 */
+	struct segment min =
+		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
+	assert(min.start == 2);
+	assert(min.end == 7);
+}
+
+static void test_suggest_compaction_segment_nothing(void)
+{
+	uint64_t sizes[] = { 64, 32, 16, 8, 4, 2 };
+	struct segment result =
+		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
+	assert(result.start == result.end);
+}
+
+static void test_reflog_expire(void)
+{
+	char dir[256] = "/tmp/stack.test_reflog_expire.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	struct reftable_log_record logs[20] = { { 0 } };
+	int N = ARRAY_SIZE(logs) - 1;
+	int i = 0;
+	int err;
+	struct reftable_log_expiry_config expiry = {
+		.time = 10,
+	};
+	struct reftable_log_record log = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 1; i <= N; i++) {
+		char buf[256];
+		snprintf(buf, sizeof(buf), "branch%02d", i);
+
+		logs[i].refname = xstrdup(buf);
+		logs[i].update_index = i;
+		logs[i].time = i;
+		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+		logs[i].email = xstrdup("identity@invalid");
+		set_test_hash(logs[i].new_hash, i);
+	}
+
+	for (i = 1; i <= N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	err = reftable_stack_compact_all(st, &expiry);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, logs[9].refname, &log);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, logs[11].refname, &log);
+	assert_err(err);
+
+	expiry.min_update_index = 15;
+	err = reftable_stack_compact_all(st, &expiry);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, logs[14].refname, &log);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, logs[16].refname, &log);
+	assert_err(err);
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i <= N; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	reftable_clear_dir(dir);
+	reftable_log_record_clear(&log);
+}
+
+static int write_nothing(struct reftable_writer *wr, void *arg)
+{
+	reftable_writer_set_limits(wr, 1, 1);
+	return 0;
+}
+
+static void test_empty_add(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_stack *st2 = NULL;
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_nothing, NULL);
+	assert_err(err);
+
+	err = reftable_new_stack(&st2, dir, cfg);
+	assert_err(err);
+	reftable_clear_dir(dir);
+	reftable_stack_destroy(st);
+	reftable_stack_destroy(st2);
+}
+
+static void test_reftable_stack_auto_compaction(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int err, i;
+	int N = 100;
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		char name[100];
+		struct reftable_ref_record ref = {
+			.refname = name,
+			.update_index = reftable_stack_next_update_index(st),
+			.target = "master",
+		};
+		snprintf(name, sizeof(name), "branch%04d", i);
+
+		err = reftable_stack_add(st, &write_test_ref, &ref);
+		assert_err(err);
+
+		assert(i < 3 || st->merged->stack_len < 2 * fastlog2(i));
+	}
+
+	assert(reftable_stack_compaction_stats(st)->entries_written <
+	       (uint64_t)(N * fastlog2(N)));
+
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+int stack_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_reftable_stack_uptodate",
+		      &test_reftable_stack_uptodate);
+	add_test_case("test_reftable_stack_transaction_api",
+		      &test_reftable_stack_transaction_api);
+	add_test_case("test_reftable_stack_hash_id",
+		      &test_reftable_stack_hash_id);
+	add_test_case("test_sizes_to_segments_all_equal",
+		      &test_sizes_to_segments_all_equal);
+	add_test_case("test_reftable_stack_auto_compaction",
+		      &test_reftable_stack_auto_compaction);
+	add_test_case("test_reftable_stack_validate_refname",
+		      &test_reftable_stack_validate_refname);
+	add_test_case("test_reftable_stack_update_index_check",
+		      &test_reftable_stack_update_index_check);
+	add_test_case("test_reftable_stack_lock_failure",
+		      &test_reftable_stack_lock_failure);
+	add_test_case("test_reftable_stack_log_normalize",
+		      &test_reftable_stack_log_normalize);
+	add_test_case("test_reftable_stack_tombstone",
+		      &test_reftable_stack_tombstone);
+	add_test_case("test_reftable_stack_add_one",
+		      &test_reftable_stack_add_one);
+	add_test_case("test_empty_add", test_empty_add);
+	add_test_case("test_reflog_expire", test_reflog_expire);
+	add_test_case("test_suggest_compaction_segment",
+		      &test_suggest_compaction_segment);
+	add_test_case("test_suggest_compaction_segment_nothing",
+		      &test_suggest_compaction_segment_nothing);
+	add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
+	add_test_case("test_sizes_to_segments_empty",
+		      &test_sizes_to_segments_empty);
+	add_test_case("test_log2", &test_log2);
+	add_test_case("test_parse_names", &test_parse_names);
+	add_test_case("test_read_file", &test_read_file);
+	add_test_case("test_names_equal", &test_names_equal);
+	add_test_case("test_reftable_stack_add", &test_reftable_stack_add);
+	return test_main(argc, argv);
+}
diff --git a/reftable/strbuf.c b/reftable/strbuf.c
new file mode 100644
index 0000000000..eec446fb95
--- /dev/null
+++ b/reftable/strbuf.c
@@ -0,0 +1,206 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "strbuf.h"
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+
+#ifdef REFTABLE_STANDALONE
+
+void strbuf_init(struct strbuf *s, size_t alloc)
+{
+	struct strbuf empty = STRBUF_INIT;
+	*s = empty;
+}
+
+void strbuf_grow(struct strbuf *s, size_t extra)
+{
+	size_t newcap = s->len + extra + 1;
+	if (newcap > s->cap) {
+		s->buf = reftable_realloc(s->buf, newcap);
+		s->cap = newcap;
+	}
+}
+
+static void strbuf_resize(struct strbuf *s, int l)
+{
+	int zl = l + 1; /* one uint8_t for 0 termination. */
+	assert(s->canary == STRBUF_CANARY);
+	if (s->cap < zl) {
+		int c = s->cap * 2;
+		if (c < zl) {
+			c = zl;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+	s->buf[l] = 0;
+}
+
+void strbuf_setlen(struct strbuf *s, size_t l)
+{
+	assert(s->cap >= l + 1);
+	s->len = l;
+	s->buf[l] = 0;
+}
+
+void strbuf_reset(struct strbuf *s)
+{
+	strbuf_resize(s, 0);
+}
+
+void strbuf_addstr(struct strbuf *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+	assert(d->canary == STRBUF_CANARY);
+
+	strbuf_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void strbuf_addbuf(struct strbuf *s, struct strbuf *a)
+{
+	int end = s->len;
+	assert(s->canary == STRBUF_CANARY);
+	strbuf_resize(s, s->len + a->len);
+	memcpy(s->buf + end, a->buf, a->len);
+}
+
+char *strbuf_detach(struct strbuf *s, size_t *sz)
+{
+	char *p = NULL;
+	p = (char *)s->buf;
+	if (sz)
+		*sz = s->len;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void strbuf_release(struct strbuf *s)
+{
+	assert(s->canary == STRBUF_CANARY);
+	s->cap = 0;
+	s->len = 0;
+	reftable_free(s->buf);
+	s->buf = NULL;
+}
+
+int strbuf_cmp(const struct strbuf *a, const struct strbuf *b)
+{
+	int min = a->len < b->len ? a->len : b->len;
+	int res = memcmp(a->buf, b->buf, min);
+	assert(a->canary == STRBUF_CANARY);
+	assert(b->canary == STRBUF_CANARY);
+	if (res != 0)
+		return res;
+	if (a->len < b->len)
+		return -1;
+	else if (a->len > b->len)
+		return 1;
+	else
+		return 0;
+}
+
+int strbuf_add(struct strbuf *b, const void *data, size_t sz)
+{
+	assert(b->canary == STRBUF_CANARY);
+	strbuf_grow(b, sz);
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	b->buf[b->len] = 0;
+	return sz;
+}
+
+#endif
+
+static uint64_t strbuf_size(void *b)
+{
+	return ((struct strbuf *)b)->len;
+}
+
+int strbuf_add_void(void *b, const void *data, size_t sz)
+{
+	strbuf_add((struct strbuf *)b, data, sz);
+	return sz;
+}
+
+static void strbuf_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void strbuf_close(void *b)
+{
+}
+
+static int strbuf_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			     uint32_t size)
+{
+	struct strbuf *b = (struct strbuf *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable strbuf_vtable = {
+	.size = &strbuf_size,
+	.read_block = &strbuf_read_block,
+	.return_block = &strbuf_return_block,
+	.close = &strbuf_close,
+};
+
+void block_source_from_strbuf(struct reftable_block_source *bs,
+			      struct strbuf *buf)
+{
+	assert(bs->ops == NULL);
+	bs->ops = &strbuf_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct strbuf *a, struct strbuf *b)
+{
+	int p = 0;
+	while (p < a->len && p < b->len) {
+		if (a->buf[p] != b->buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
+
+struct strbuf reftable_empty_strbuf = STRBUF_INIT;
diff --git a/reftable/strbuf.h b/reftable/strbuf.h
new file mode 100644
index 0000000000..5d5aadf26a
--- /dev/null
+++ b/reftable/strbuf.h
@@ -0,0 +1,88 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#ifdef REFTABLE_STANDALONE
+
+#include "basics.h"
+#include "reftable.h"
+
+/*
+  Provides a bounds-checked, growable byte ranges. To use, initialize as "strbuf
+  x = STRBUF_INIT;"
+ */
+struct strbuf {
+	size_t len;
+	size_t cap;
+	char *buf;
+
+	/* Used to enforce initialization with STRBUF_INIT */
+	uint8_t canary;
+};
+#define STRBUF_CANARY 0x42
+#define STRBUF_INIT                       \
+	{                                 \
+		0, 0, NULL, STRBUF_CANARY \
+	}
+
+void strbuf_addstr(struct strbuf *dest, const char *src);
+
+/* Deallocate and clear strbuf */
+void strbuf_release(struct strbuf *strbuf);
+
+/* Set strbuf to 0 length, but retain buffer. */
+void strbuf_reset(struct strbuf *strbuf);
+
+/* Initializes a strbuf. Accepts a strbuf with random garbage. */
+void strbuf_init(struct strbuf *strbuf, size_t alloc);
+
+/* Return `buf`, clearing out `s`. Optionally return len (not cap) in `sz`.  */
+char *strbuf_detach(struct strbuf *s, size_t *sz);
+
+/* Set length of the slace to `l`, but don't reallocated. */
+void strbuf_setlen(struct strbuf *s, size_t l);
+
+/* Ensure `l` bytes beyond current length are available */
+void strbuf_grow(struct strbuf *s, size_t l);
+
+/* Signed comparison */
+int strbuf_cmp(const struct strbuf *a, const struct strbuf *b);
+
+/* Append `data` to the `dest` strbuf.  */
+int strbuf_add(struct strbuf *dest, const void *data, size_t sz);
+
+/* Append `add` to `dest. */
+void strbuf_addbuf(struct strbuf *dest, struct strbuf *add);
+
+#else
+
+#include "../git-compat-util.h"
+#include "../strbuf.h"
+
+#endif
+
+extern struct strbuf reftable_empty_strbuf;
+
+/* Like strbuf_add, but suitable for passing to reftable_new_writer
+ */
+int strbuf_add_void(void *b, const void *data, size_t sz);
+
+/* Find the longest shared prefix size of `a` and `b` */
+int common_prefix_size(struct strbuf *a, struct strbuf *b);
+
+struct reftable_block_source;
+
+/* Create an in-memory block source for reading reftables */
+void block_source_from_strbuf(struct reftable_block_source *bs,
+			      struct strbuf *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/strbuf_test.c b/reftable/strbuf_test.c
new file mode 100644
index 0000000000..862ed86791
--- /dev/null
+++ b/reftable/strbuf_test.c
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "strbuf.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_strbuf(void)
+{
+	struct strbuf s = STRBUF_INIT;
+	struct strbuf t = STRBUF_INIT;
+
+	strbuf_addstr(&s, "abc");
+	assert(0 == strcmp("abc", s.buf));
+
+	strbuf_addstr(&t, "pqr");
+	strbuf_addbuf(&s, &t);
+	assert(0 == strcmp("abcpqr", s.buf));
+
+	strbuf_release(&s);
+	strbuf_release(&t);
+}
+
+int strbuf_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_strbuf", &test_strbuf);
+	return test_main(argc, argv);
+}
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 0000000000..6629b1104c
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#ifndef REFTABLE_STANDALONE
+
+#include "git-compat-util.h"
+#include "cache.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_STANDALONE */
+
+void reftable_clear_dir(const char *dirname);
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/test_framework.c b/reftable/test_framework.c
new file mode 100644
index 0000000000..f97c2d6df8
--- /dev/null
+++ b/reftable/test_framework.c
@@ -0,0 +1,69 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "test_framework.h"
+
+#include "system.h"
+#include "basics.h"
+#include "constants.h"
+
+struct test_case **test_cases;
+int test_case_len;
+int test_case_cap;
+
+struct test_case *new_test_case(const char *name, void (*testfunc)(void))
+{
+	struct test_case *tc = reftable_malloc(sizeof(struct test_case));
+	tc->name = name;
+	tc->testfunc = testfunc;
+	return tc;
+}
+
+struct test_case *add_test_case(const char *name, void (*testfunc)(void))
+{
+	struct test_case *tc = new_test_case(name, testfunc);
+	if (test_case_len == test_case_cap) {
+		test_case_cap = 2 * test_case_cap + 1;
+		test_cases = reftable_realloc(
+			test_cases, sizeof(struct test_case) * test_case_cap);
+	}
+
+	test_cases[test_case_len++] = tc;
+	return tc;
+}
+
+int test_main(int argc, const char *argv[])
+{
+	const char *filter = NULL;
+	int i = 0;
+	if (argc > 1) {
+		filter = argv[1];
+	}
+
+	for (i = 0; i < test_case_len; i++) {
+		const char *name = test_cases[i]->name;
+		if (filter == NULL || strstr(name, filter) != NULL) {
+			printf("case %s\n", name);
+			test_cases[i]->testfunc();
+		} else {
+			printf("skip %s\n", name);
+		}
+
+		reftable_free(test_cases[i]);
+	}
+	reftable_free(test_cases);
+	test_cases = 0;
+	test_case_len = 0;
+	test_case_cap = 0;
+	return 0;
+}
+
+void set_test_hash(uint8_t *p, int i)
+{
+	memset(p, (uint8_t)i, hash_size(SHA1_ID));
+}
diff --git a/reftable/test_framework.h b/reftable/test_framework.h
new file mode 100644
index 0000000000..6a0009c1e5
--- /dev/null
+++ b/reftable/test_framework.h
@@ -0,0 +1,64 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TEST_FRAMEWORK_H
+#define TEST_FRAMEWORK_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#ifdef NDEBUG
+#undef NDEBUG
+#endif
+
+#include "system.h"
+
+#ifdef assert
+#undef assert
+#endif
+
+#define assert_err(c)                                                  \
+	if (c != 0) {                                                  \
+		fflush(stderr);                                        \
+		fflush(stdout);                                        \
+		fprintf(stderr, "%s: %d: error == %d (%s), want 0\n",  \
+			__FILE__, __LINE__, c, reftable_error_str(c)); \
+		abort();                                               \
+	}
+
+#define assert_streq(a, b)                                               \
+	if (strcmp(a, b)) {                                              \
+		fflush(stderr);                                          \
+		fflush(stdout);                                          \
+		fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, \
+			__LINE__, #a, a, #b, b);                         \
+		abort();                                                 \
+	}
+
+#define assert(c)                                                          \
+	if (!(c)) {                                                        \
+		fflush(stderr);                                            \
+		fflush(stdout);                                            \
+		fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, \
+			__LINE__, #c);                                     \
+		abort();                                                   \
+	}
+
+struct test_case {
+	const char *name;
+	void (*testfunc)(void);
+};
+
+struct test_case *new_test_case(const char *name, void (*testfunc)(void));
+struct test_case *add_test_case(const char *name, void (*testfunc)(void));
+int test_main(int argc, const char *argv[]);
+
+void set_test_hash(uint8_t *p, int i);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 0000000000..0061d14e30
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,63 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	int res;
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	res = compare(key, (*rootp)->key);
+	if (res < 0)
+		return tree_search(key, &(*rootp)->left, compare, insert);
+	else if (res > 0)
+		return tree_search(key, &(*rootp)->right, compare, insert);
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 0000000000..954512e9a3
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+/* tree_node is a generic binary search tree. */
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+/* looks for `key` in `rootp` using `compare` as comparison function. If insert
+   is set, insert the key if it's not found. Else, return NULL.
+*/
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+
+/* performs an infix walk of the tree. */
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+
+/*
+  deallocates the tree nodes recursively. Keys should be deallocated separately
+  by walking over the tree. */
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/tree_test.c b/reftable/tree_test.c
new file mode 100644
index 0000000000..c6d448cbe8
--- /dev/null
+++ b/reftable/tree_test.c
@@ -0,0 +1,62 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static int test_compare(const void *a, const void *b)
+{
+	return (char *)a - (char *)b;
+}
+
+struct curry {
+	void *last;
+};
+
+static void check_increasing(void *arg, void *key)
+{
+	struct curry *c = (struct curry *)arg;
+	if (c->last != NULL) {
+		assert(test_compare(c->last, key) < 0);
+	}
+	c->last = key;
+}
+
+static void test_tree(void)
+{
+	struct tree_node *root = NULL;
+
+	void *values[11] = { 0 };
+	struct tree_node *nodes[11] = { 0 };
+	int i = 1;
+	struct curry c = { 0 };
+	do {
+		nodes[i] = tree_search(values + i, &root, &test_compare, 1);
+		i = (i * 7) % 11;
+	} while (i != 1);
+
+	for (i = 1; i < ARRAY_SIZE(nodes); i++) {
+		assert(values + i == nodes[i]->key);
+		assert(nodes[i] ==
+		       tree_search(values + i, &root, &test_compare, 0));
+	}
+
+	infix_walk(root, check_increasing, &c);
+	tree_free(root);
+}
+
+int tree_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_tree", &test_tree);
+	return test_main(argc, argv);
+}
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100755
index 0000000000..4dadd875f4
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,22 @@
+#!/bin/sh
+
+set -eu
+
+# Override this to import from somewhere else, say "../reftable".
+SRC=${SRC:-origin}
+BRANCH=${BRANCH:-master}
+
+((git --git-dir reftable-repo/.git fetch -f ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
+   git clone https://github.com/google/reftable reftable-repo)
+
+cp reftable-repo/c/*.[ch] reftable/
+cp reftable-repo/c/include/*.[ch] reftable/
+cp reftable-repo/LICENSE reftable/
+
+git --git-dir reftable-repo/.git show --no-patch --format=oneline HEAD \
+  > reftable/VERSION
+
+mv reftable/system.h reftable/system.h~
+sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
+
+git add reftable/*.[ch] reftable/LICENSE reftable/VERSION
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 0000000000..96d71463e5
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,663 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, uint8_t typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, uint8_t *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		uint8_t *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0)
+			return n;
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0)
+		return n;
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, uint8_t *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, uint8_t typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	strbuf_release(&w->last_key);
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, const void *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	strbuf_init(&wp->block_writer_data.last_key, 0);
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->last_key = reftable_empty_strbuf;
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct strbuf hash;
+	uint64_t *offsets;
+	size_t offset_len;
+	size_t offset_cap;
+};
+
+#define OBJ_INDEX_TREE_NODE_INIT    \
+	{                           \
+		.hash = STRBUF_INIT \
+	}
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return strbuf_cmp(&((const struct obj_index_tree_node *)a)->hash,
+			  &((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct strbuf *hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = *hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		struct obj_index_tree_node empty = OBJ_INDEX_TREE_NODE_INIT;
+		key = reftable_malloc(sizeof(struct obj_index_tree_node));
+		*key = empty;
+
+		strbuf_reset(&key->hash);
+		strbuf_addbuf(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w,
+			     struct reftable_record *rec)
+{
+	int result = -1;
+	struct strbuf key = STRBUF_INIT;
+	int err = 0;
+	reftable_record_key(rec, &key);
+	if (strbuf_cmp(&w->last_key, &key) >= 0)
+		goto done;
+
+	strbuf_reset(&w->last_key);
+	strbuf_addbuf(&w->last_key, &key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, reftable_record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == reftable_record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto done;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto done;
+	}
+
+	writer_reinit_block_writer(w, reftable_record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto done;
+	}
+
+	result = 0;
+done:
+	strbuf_release(&key);
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->refname == NULL)
+		return REFTABLE_API_ERROR;
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index)
+		return REFTABLE_API_ERROR;
+
+	reftable_record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, &rec);
+	if (err < 0)
+		return err;
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct strbuf h = STRBUF_INIT;
+		strbuf_add(&h, (char *)ref->value, hash_size(w->opts.hash_id));
+		writer_index_hash(w, &h);
+		strbuf_release(&h);
+	}
+
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct strbuf h = STRBUF_INIT;
+		strbuf_add(&h, ref->target_value, hash_size(w->opts.hash_id));
+		writer_index_hash(w, &h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	char *input_log_message = log->message;
+	struct strbuf cleaned_message = STRBUF_INIT;
+	int err;
+	if (log->refname == NULL)
+		return REFTABLE_API_ERROR;
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0)
+			return err;
+	}
+
+	if (!w->opts.exact_log_message && log->message != NULL) {
+		strbuf_addstr(&cleaned_message, log->message);
+		while (cleaned_message.len &&
+		       cleaned_message.buf[cleaned_message.len - 1] == '\n')
+			strbuf_setlen(&cleaned_message,
+				      cleaned_message.len - 1);
+		if (strchr(cleaned_message.buf, '\n')) {
+			// multiple lines not allowed.
+			err = REFTABLE_API_ERROR;
+			goto done;
+		}
+		strbuf_addstr(&cleaned_message, "\n");
+		log->message = cleaned_message.buf;
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	reftable_record_from_log(&rec, log);
+	err = writer_add_record(w, &rec);
+
+done:
+	log->message = input_log_message;
+	strbuf_release(&cleaned_message);
+	return err;
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	uint8_t typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	struct reftable_block_stats *bstats = NULL;
+	if (err < 0)
+		return err;
+
+	while (w->index_len > threshold) {
+		struct reftable_index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct reftable_record rec = { 0 };
+			reftable_record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, &rec) == 0) {
+				continue;
+			}
+
+			err = writer_flush_block(w);
+			if (err < 0)
+				return err;
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, &rec);
+			if (err != 0) {
+				/* write into fresh block should always succeed
+				 */
+				abort();
+			}
+		}
+		for (i = 0; i < idx_len; i++) {
+			strbuf_release(&idx[i].last_key);
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0)
+		return err;
+
+	bstats = writer_reftable_block_stats(w, typ);
+	bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
+	bstats->index_offset = index_start;
+	bstats->max_index_level = max_level;
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct strbuf *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(&entry->hash, arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct reftable_obj_record obj_rec = {
+		.hash_prefix = (uint8_t *)entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct reftable_record rec = { 0 };
+	if (arg->err < 0)
+		goto done;
+
+	reftable_record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0)
+		goto done;
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0)
+		goto done;
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0)
+		goto done;
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+done:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	strbuf_release(&entry->hash);
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0)
+		return closure.err;
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	uint8_t typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL)
+		return 0;
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0)
+		return err;
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0)
+			return err;
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	uint8_t footer[72];
+	uint8_t *p = footer;
+	int err = writer_finish_public_section(w);
+	int empty_table = w->next == 0;
+	if (err != 0)
+		goto done;
+	w->pending_padding = 0;
+	if (empty_table) {
+		/* Empty tables need a header anyway. */
+		uint8_t header[28];
+		int n = writer_write_header(w, header);
+		err = padded_write(w, header, n, 0);
+		if (err < 0)
+			goto done;
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0)
+		goto done;
+
+	if (empty_table) {
+		err = REFTABLE_EMPTY_TABLE_ERROR;
+		goto done;
+	}
+
+done:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	strbuf_release(&w->last_key);
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		strbuf_release(&w->index[i].last_key);
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	uint8_t typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	struct reftable_index_record ir = { .last_key = STRBUF_INIT };
+	if (raw_bytes < 0)
+		return raw_bytes;
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0)
+		return err;
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index,
+			sizeof(struct reftable_index_record) * w->index_cap);
+	}
+
+	ir.offset = w->next;
+	strbuf_reset(&ir.last_key);
+	strbuf_addbuf(&ir.last_key, &w->block_writer->last_key);
+	w->index[w->index_len] = ir;
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL)
+		return 0;
+	if (w->block_writer->entries == 0)
+		return 0;
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 0000000000..c69ecffea0
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "strbuf.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, const void *, size_t);
+	void *write_arg;
+	int pending_padding;
+	struct strbuf last_key;
+
+	/* offset of next block to write. */
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	/* memory buffer for writing */
+	uint8_t *block;
+
+	/* writer for the current section. NULL or points to
+	 * block_writer_data */
+	struct block_writer *block_writer;
+
+	struct block_writer block_writer_data;
+
+	/* pending index records for the current section */
+	struct reftable_index_record *index;
+	size_t index_len;
+	size_t index_cap;
+
+	/*
+	  tree for use with tsearch; used to populate the 'o' inverse OID
+	  map */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+/* finishes a block, and writes it to storage */
+int writer_flush_block(struct reftable_writer *w);
+
+/* deallocates memory related to the index */
+void writer_clear_index(struct reftable_writer *w);
+
+/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 0000000000..3e0b0f24f1
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 13/20] Add standalone build infrastructure for reftable
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (11 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 12/20] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 14/20] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (8 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The reftable library is integrates with Git, so it become a first class
supported storage mechanism, but is sufficiently complex that other
projects (e.g. libgit2) may want to consume the same source code.

To make this possible, the library also compiles completely standalone, which is
demonstrated with the Bazel BUILD/WORKSPACE files that this commit adds.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/BUILD      | 203 ++++++++++++++++++++++++++++++++++++++++++++
 reftable/README.md  |  33 +++++++
 reftable/WORKSPACE  |  14 +++
 reftable/zlib.BUILD |  36 ++++++++
 4 files changed, 286 insertions(+)
 create mode 100644 reftable/BUILD
 create mode 100644 reftable/README.md
 create mode 100644 reftable/WORKSPACE
 create mode 100644 reftable/zlib.BUILD

diff --git a/reftable/BUILD b/reftable/BUILD
new file mode 100644
index 0000000000..d3946a717c
--- /dev/null
+++ b/reftable/BUILD
@@ -0,0 +1,203 @@
+# Mirror core-git COPTS so reftable compiles without warning in CGit.
+GIT_COPTS = [
+    "-DREFTABLE_STANDALONE",
+    "-Wall",
+    "-Werror",
+    "-Wdeclaration-after-statement",
+    "-Wstrict-prototypes",
+    "-Wformat-security",
+    "-Wno-format-zero-length",
+    "-Wold-style-definition",
+    "-Woverflow",
+    "-Wpointer-arith",
+    "-Wstrict-prototypes",
+    "-Wunused",
+    "-Wvla",
+    "-Wextra",
+    "-Wmissing-prototypes",
+    "-Wno-empty-body",
+    "-Wno-missing-field-initializers",
+    "-Wno-sign-compare",
+    "-Werror=strict-aliasing",
+    "-Wno-unused-parameter",
+]
+
+cc_library(
+    name = "reftable",
+    srcs = [
+        "basics.c",
+        "block.c",
+        "compat.c",
+        "file.c",
+        "iter.c",
+        "merged.c",
+        "pq.c",
+        "reader.c",
+        "record.c",
+        "refname.c",
+        "reftable.c",
+        "strbuf.c",
+        "stack.c",
+        "tree.c",
+        "writer.c",
+        "zlib-compat.c",
+        "basics.h",
+        "block.h",
+        "compat.h",
+        "constants.h",
+        "iter.h",
+        "merged.h",
+        "pq.h",
+        "reader.h",
+        "refname.h",
+        "record.h",
+        "strbuf.h",
+        "stack.h",
+        "system.h",
+        "tree.h",
+        "writer.h",
+    ],
+    hdrs = [
+        "reftable.h",
+    ],
+    includes = [
+        "include",
+    ],
+    copts = [
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+    deps = ["@zlib"],
+    visibility = ["//visibility:public"]
+)
+
+cc_library(
+    name = "testlib",
+    srcs = [
+        "test_framework.c",
+        "dump.c",
+	"test_framework.h",
+    ],
+    hdrs = [
+            "reftable-tests.h",
+    ],
+    copts = GIT_COPTS,
+    deps = [":reftable"],
+    visibility = ["//visibility:public"]
+)
+
+cc_test(
+    name = "record_test",
+    srcs = ["record_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Drecord_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "reftable_test",
+    srcs = ["reftable_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dreftable_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "strbuf_test",
+    srcs = ["strbuf_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ] ,
+    copts = [
+        "-Dstrbuf_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "stack_test",
+    srcs = ["stack_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dstack_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "tree_test",
+    srcs = ["tree_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dtree_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "block_test",
+    srcs = ["block_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dblock_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "refname_test",
+    srcs = ["refname_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Drefname_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "merged_test",
+    srcs = ["merged_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dmerged_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+[sh_test(
+    name = "%s_valgrind_test" % t,
+    srcs = [ "valgrind_test.sh" ],
+    args = [ t ],
+    data = [ t ])
+ for t in ["record_test",
+           "merged_test",
+           "refname_test",
+           "tree_test",
+           "block_test",
+           "strbuf_test",
+           "stack_test"]]
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 0000000000..0345015c57
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,33 @@
+This directory contains an implementation of the reftable library.
+
+The library is integrated into the git-core project, but can also be built
+standalone, by compiling with -DREFTABLE_STANDALONE. The standalone build is
+exercised by the accompanying BUILD/WORKSPACE files, and can be run as
+
+  bazel test :all
+
+It includes a fragment of the zlib library to provide uncompress2(), which is a
+recent addition to the API. zlib is licensed as follows:
+
+```
+ (C) 1995-2017 Jean-loup Gailly and Mark Adler
+
+  This software is provided 'as-is', without any express or implied
+  warranty.  In no event will the authors be held liable for any damages
+  arising from the use of this software.
+
+  Permission is granted to anyone to use this software for any purpose,
+  including commercial applications, and to alter it and redistribute it
+  freely, subject to the following restrictions:
+
+  1. The origin of this software must not be misrepresented; you must not
+     claim that you wrote the original software. If you use this software
+     in a product, an acknowledgment in the product documentation would be
+     appreciated but is not required.
+  2. Altered source versions must be plainly marked as such, and must not be
+     misrepresented as being the original software.
+  3. This notice may not be removed or altered from any source distribution.
+
+  Jean-loup Gailly        Mark Adler
+  jloup@gzip.org          madler@alumni.caltech.edu
+```
diff --git a/reftable/WORKSPACE b/reftable/WORKSPACE
new file mode 100644
index 0000000000..9f91d16208
--- /dev/null
+++ b/reftable/WORKSPACE
@@ -0,0 +1,14 @@
+workspace(name = "reftable")
+
+load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
+
+http_archive(
+    name = "zlib",
+    build_file = "@//:zlib.BUILD",
+    sha256 = "c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1",
+    strip_prefix = "zlib-1.2.11",
+    urls = [
+        "https://mirror.bazel.build/zlib.net/zlib-1.2.11.tar.gz",
+        "https://zlib.net/zlib-1.2.11.tar.gz",
+    ],
+)
diff --git a/reftable/zlib.BUILD b/reftable/zlib.BUILD
new file mode 100644
index 0000000000..edb77fdf8e
--- /dev/null
+++ b/reftable/zlib.BUILD
@@ -0,0 +1,36 @@
+package(default_visibility = ["//visibility:public"])
+
+licenses(["notice"])  # BSD/MIT-like license (for zlib)
+
+cc_library(
+    name = "zlib",
+    srcs = [
+        "adler32.c",
+        "compress.c",
+        "crc32.c",
+        "crc32.h",
+        "deflate.c",
+        "deflate.h",
+        "gzclose.c",
+        "gzguts.h",
+        "gzlib.c",
+        "gzread.c",
+        "gzwrite.c",
+        "infback.c",
+        "inffast.c",
+        "inffast.h",
+        "inffixed.h",
+        "inflate.c",
+        "inflate.h",
+        "inftrees.c",
+        "inftrees.h",
+        "trees.c",
+        "trees.h",
+        "uncompr.c",
+        "zconf.h",
+        "zutil.c",
+        "zutil.h",
+    ],
+    hdrs = ["zlib.h"],
+    includes = ["."],
+)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 14/20] Reftable support for git-core
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (12 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 13/20] Add standalone build infrastructure for reftable Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 15/20] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
                                                       ` (7 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

This introduces the file refs/reftable-backend.c containing a reftable-powered
ref storage backend.

It can be activated by passing --ref-storage=reftable to "git init", or setting
GIT_TEST_REFTABLE in the environment.

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   28 +-
 builtin/clone.c                               |    4 +-
 builtin/init-db.c                             |   54 +-
 builtin/worktree.c                            |   27 +-
 cache.h                                       |    8 +-
 refs.c                                        |   27 +-
 refs.h                                        |    3 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1497 +++++++++++++++++
 reftable/update.sh                            |    3 -
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   12 +-
 t/t0031-reftable.sh                           |  173 ++
 15 files changed, 1811 insertions(+), 38 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ff..7257623583 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 372139f1f2..6571ed1246 100644
--- a/Makefile
+++ b/Makefile
@@ -807,6 +807,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -959,6 +960,7 @@ LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1162,7 +1164,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2352,11 +2354,30 @@ VCSSVN_OBJS += vcs-svn/sliding_window.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/compat.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/refname.o
+REFTABLE_OBJS += reftable/reftable.o
+REFTABLE_OBJS += reftable/strbuf.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2497,6 +2518,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -3112,7 +3136,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index bef70745c0..8921168f86 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1112,7 +1112,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, NULL,
-		INIT_DB_QUIET);
+		default_ref_storage(), INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
@@ -1227,7 +1227,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		 * Now that we know what algorithm the remote side is using,
 		 * let's set ours to the same thing.
 		 */
-		initialize_repository_version(hash_algo);
+		initialize_repository_version(hash_algo, default_ref_storage());
 		repo_set_hash_algo(the_repository, hash_algo);
 
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
diff --git a/builtin/init-db.c b/builtin/init-db.c
index cee64823cb..0f2873dff8 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +189,8 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 ||
+	    !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -239,6 +241,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -248,6 +251,24 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
+	/*
+	 * refs/heads is a file when using reftable. We can't reinitialize with
+	 * a reftable because it will overwrite HEAD
+	 */
+	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
+			      is_directory(git_path_buf(&buf, "refs/heads"))) {
+		die("cannot switch ref storage format.");
+	}
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -262,9 +283,6 @@ static int create_default_files(const char *template_path,
 	 * Point the HEAD symref to the initial branch with if HEAD does
 	 * not yet exist.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		char *ref;
 
@@ -281,7 +299,7 @@ static int create_default_files(const char *template_path,
 		free(ref);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
@@ -396,7 +414,7 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash, const char *initial_branch,
-	    unsigned int flags)
+	    const char *ref_storage_format, unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -435,6 +453,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+	repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -467,6 +486,8 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -540,6 +561,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = default_ref_storage();
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -548,15 +570,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *initial_branch = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING('b', "initial-branch", &initial_branch, N_("name"),
@@ -668,10 +693,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
 	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
-		       initial_branch, flags);
+		       initial_branch, ref_storage_format, flags);
 }
diff --git a/builtin/worktree.c b/builtin/worktree.c
index f0cbdef718..1935967072 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -12,6 +12,7 @@
 #include "submodule.h"
 #include "utf8.h"
 #include "worktree.h"
+#include "../refs/refs-internal.h"
 
 static const char * const worktree_usage[] = {
 	N_("git worktree add [<options>] <path> [<commit-ish>]"),
@@ -402,9 +403,29 @@ static int add_worktree(const char *path, const char *refname,
 	 * worktree.
 	 */
 	strbuf_reset(&sb);
-	strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
-	write_file(sb.buf, "%s", oid_to_hex(&null_oid));
-	strbuf_reset(&sb);
+	if (get_main_ref_store(the_repository)->be == &refs_be_reftable) {
+		/* XXX this is cut & paste from reftable_init_db. */
+		strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
+		write_file(sb.buf, "%s", "ref: refs/heads/.invalid\n");
+		strbuf_reset(&sb);
+
+		strbuf_addf(&sb, "%s/refs", sb_repo.buf);
+		safe_create_dir(sb.buf, 1);
+		strbuf_reset(&sb);
+
+		strbuf_addf(&sb, "%s/refs/heads", sb_repo.buf);
+		write_file(sb.buf, "this repository uses the reftable format");
+		strbuf_reset(&sb);
+
+		strbuf_addf(&sb, "%s/reftable", sb_repo.buf);
+		safe_create_dir(sb.buf, 1);
+		strbuf_reset(&sb);
+	} else {
+		strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
+		write_file(sb.buf, "%s", oid_to_hex(&null_oid));
+		strbuf_reset(&sb);
+	}
+
 	strbuf_addf(&sb, "%s/commondir", sb_repo.buf);
 	write_file(sb.buf, "../..");
 
diff --git a/cache.h b/cache.h
index 126ec56c7f..b4a4f42412 100644
--- a/cache.h
+++ b/cache.h
@@ -627,9 +627,10 @@ int path_inside_repo(const char *prefix, const char *path);
 #define INIT_DB_EXIST_OK 0x0002
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash_algo,
-	    const char *initial_branch, unsigned int flags);
-void initialize_repository_version(int hash_algo);
+	    const char *template_dir, int hash_algo, const char *initial_branch,
+	    const char *ref_storage_format, unsigned int flags);
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1044,6 +1045,7 @@ struct repository_format {
 	int hash_algo;
 	int has_extensions;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 };
 
diff --git a/refs.c b/refs.c
index 36c0e76256..577adef4ae 100644
--- a/refs.c
+++ b/refs.c
@@ -19,10 +19,16 @@
 #include "repository.h"
 #include "sigchain.h"
 
+const char *default_ref_storage(void)
+{
+	const char *test = getenv("GIT_TEST_REFTABLE");
+	return test ? "reftable" : "files";
+}
+
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1749,13 +1755,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1771,7 +1777,11 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(r->gitdir,
+					 r->ref_storage_format ?
+						       r->ref_storage_format :
+						       default_ref_storage(),
+					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
 
@@ -1826,7 +1836,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1840,6 +1850,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = default_ref_storage();
 	struct ref_store *refs;
 	const char *id;
 
@@ -1853,9 +1864,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index e9916ca6f0..7a877ce3c3 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* Returns the ref storage backend to use by default. */
+const char *default_ref_storage(void);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index dc9e8d3a92..7afe4c2831 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -693,6 +693,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 0000000000..7b7d03e11e
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1497 @@
+#include "../cache.h"
+#include "../chdir-notify.h"
+#include "../config.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../refs.h"
+#include "../reftable/reftable.h"
+#include "../worktree.h"
+#include "refs-internal.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *repo_dir;
+
+	char *reftable_dir;
+	char *worktree_reftable_dir;
+
+	struct reftable_stack *main_stack;
+	struct reftable_stack *worktree_stack;
+};
+
+static struct reftable_stack *stack_for(struct git_reftable_ref_store *store,
+					const char *refname)
+{
+	if (store->worktree_stack == NULL)
+		return store->main_stack;
+
+	switch (ref_type(refname)) {
+	case REF_TYPE_PER_WORKTREE:
+	case REF_TYPE_PSEUDOREF:
+	case REF_TYPE_OTHER_PSEUDOREF:
+		return store->worktree_stack;
+	default:
+	case REF_TYPE_MAIN_PSEUDOREF:
+	case REF_TYPE_NORMAL:
+		return store->main_stack;
+	}
+}
+
+static int git_reftable_read_raw_ref(struct ref_store *ref_store,
+				     const char *refname, struct object_id *oid,
+				     struct strbuf *referent,
+				     unsigned int *type);
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->refname = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = { NULL };
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static int has_suffix(struct strbuf *b, const char *suffix)
+{
+	size_t len = strlen(suffix);
+
+	if (len > b->len) {
+		return 0;
+	}
+
+	return 0 == strncmp(b->buf + b->len - len, suffix, len);
+}
+
+static int trim_component(struct strbuf *b)
+{
+	char *last;
+	last = strrchr(b->buf, '/');
+	if (!last)
+		return -1;
+	strbuf_setlen(b, last - b->buf);
+	return 0;
+}
+
+static int is_worktree(struct strbuf *b)
+{
+	if (trim_component(b) < 0) {
+		return 0;
+	}
+	if (!has_suffix(b, "/worktrees")) {
+		return 0;
+	}
+	trim_component(b);
+	return 1;
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						       unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+	const char *gitdir = path;
+	struct strbuf wt_buf = STRBUF_INIT;
+	int wt = 0;
+
+	strbuf_addstr(&wt_buf, path);
+
+	/* this is clumsy, but the official worktree functions (eg.
+	 * get_worktrees()) function will try to initialize a ref storage
+	 * backend, leading to infinite recursion.  */
+	wt = is_worktree(&wt_buf);
+	if (wt) {
+		gitdir = wt_buf.buf;
+	}
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+	refs->repo_dir = xstrdup(gitdir);
+	strbuf_addf(&sb, "%s/reftable", gitdir);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	refs->err =
+		reftable_new_stack(&refs->main_stack, refs->reftable_dir, cfg);
+	assert(refs->err != REFTABLE_API_ERROR);
+
+	if (refs->err == 0 && wt) {
+		strbuf_addf(&sb, "%s/reftable", path);
+		refs->worktree_reftable_dir = xstrdup(sb.buf);
+
+		refs->err = reftable_new_stack(&refs->worktree_stack,
+					       refs->worktree_reftable_dir,
+					       cfg);
+		assert(refs->err != REFTABLE_API_ERROR);
+	}
+
+	strbuf_release(&sb);
+	strbuf_release(&wt_buf);
+	return ref_store;
+}
+
+static int git_reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct strbuf sb = STRBUF_INIT;
+
+	safe_create_dir(refs->reftable_dir, 1);
+	assert(refs->worktree_reftable_dir == NULL);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
+	write_file(sb.buf, "ref: refs/heads/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+
+	/* In case we must iterate over 2 stacks, this is non-null. */
+	struct reftable_merged_table *merged;
+	unsigned int flags;
+	int err;
+	const char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(&ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		/*
+		   We could filter pseudo refs here explicitly, but HEAD is not
+		   a PSEUDOREF, but a PER_WORKTREE, b/c each worktree can have
+		   its own HEAD.
+		 */
+		ri->base.refname = ri->ref.refname;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.refname, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.refname,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	if (ri->merged) {
+		reftable_merged_table_free(ri->merged);
+	}
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+git_reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else if (refs->worktree_stack == NULL) {
+		struct reftable_merged_table *mt =
+			reftable_stack_merged_table(refs->main_stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	} else {
+		struct reftable_merged_table *mt1 =
+			reftable_stack_merged_table(refs->main_stack);
+		struct reftable_merged_table *mt2 =
+			reftable_stack_merged_table(refs->worktree_stack);
+		struct reftable_table *tabs =
+			xcalloc(2, sizeof(struct reftable_table));
+		reftable_table_from_merged_table(&tabs[0], mt1);
+		reftable_table_from_merged_table(&tabs[1], mt2);
+		ri->err = reftable_new_merged_table(&ri->merged, tabs, 2,
+						    the_hash_algo->format_id);
+		if (ri->err == 0)
+			ri->err = reftable_merged_table_seek_ref(
+				ri->merged, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->prefix = prefix;
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int fixup_symrefs(struct ref_store *ref_store,
+			 struct ref_transaction *transaction)
+{
+	struct strbuf referent = STRBUF_INIT;
+	int i = 0;
+	int err = 0;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+		struct object_id old_oid;
+
+		err = git_reftable_read_raw_ref(ref_store, update->refname,
+						&old_oid, &referent,
+						/* mutate input, like
+						   files-backend.c */
+						&update->type);
+		if (err < 0 && errno == ENOENT &&
+		    is_null_oid(&update->old_oid)) {
+			err = 0;
+		}
+		if (err < 0)
+			goto done;
+
+		if (!(update->type & REF_ISSYMREF))
+			continue;
+
+		if (update->flags & REF_NO_DEREF) {
+			/* what should happen here? See files-backend.c
+			 * lock_ref_for_update. */
+		} else {
+			/*
+			  If we are updating a symref (eg. HEAD), we should also
+			  update the branch that the symref points to.
+
+			  This is generic functionality, and would be better
+			  done in refs.c, but the current implementation is
+			  intertwined with the locking in files-backend.c.
+			*/
+			int new_flags = update->flags;
+			struct ref_update *new_update = NULL;
+
+			/* if this is an update for HEAD, should also record a
+			   log entry for HEAD? See files-backend.c,
+			   split_head_update()
+			*/
+			new_update = ref_transaction_add_update(
+				transaction, referent.buf, new_flags,
+				&update->new_oid, &update->old_oid,
+				update->msg);
+			new_update->parent_update = update;
+
+			/* files-backend sets REF_LOG_ONLY here. */
+			update->flags |= REF_NO_DEREF | REF_LOG_ONLY;
+			update->flags &= ~REF_HAVE_OLD;
+		}
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	strbuf_release(&referent);
+	return err;
+}
+
+static int git_reftable_transaction_prepare(struct ref_store *ref_store,
+					    struct ref_transaction *transaction,
+					    struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_addition *add = NULL;
+	struct reftable_stack *stack =
+		transaction->nr ?
+			      stack_for(refs, transaction->updates[0]->refname) :
+			      refs->main_stack;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_new_addition(&add, stack);
+	if (err) {
+		goto done;
+	}
+
+	err = fixup_symrefs(ref_store, transaction);
+	if (err) {
+		goto done;
+	}
+
+	transaction->backend_data = add;
+	transaction->state = REF_TRANSACTION_PREPARED;
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	if (err < 0) {
+		transaction->state = REF_TRANSACTION_CLOSED;
+		strbuf_addf(errbuf, "reftable: transaction prepare: %s",
+			    reftable_error_str(err));
+	}
+
+	return err;
+}
+
+static int git_reftable_transaction_abort(struct ref_store *ref_store,
+					  struct ref_transaction *transaction,
+					  struct strbuf *err)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	reftable_addition_destroy(add);
+	transaction->backend_data = NULL;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid;
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp((*(struct ref_update **)a)->refname,
+		      (*(struct ref_update **)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	struct reftable_stack *stack =
+		stack_for(refs, transaction->updates[0]->refname);
+	uint64_t ts = reftable_stack_next_update_index(stack);
+	int err = 0;
+	int i = 0;
+	struct reftable_log_record *logs =
+		calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->refname = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_LOG_ONLY) {
+			continue;
+		}
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct reftable_ref_record ref = { NULL };
+			struct object_id peeled;
+
+			int peel_error = peel_object(&u->new_oid, &peeled);
+			ref.refname = (char *)u->refname;
+
+			if (!is_null_oid(&u->new_oid)) {
+				ref.value = u->new_oid.hash;
+			}
+			ref.update_index = ts;
+			if (!peel_error) {
+				ref.target_value = peeled.hash;
+			}
+
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int git_reftable_transaction_finish(struct ref_store *ref_store,
+					   struct ref_transaction *transaction,
+					   struct strbuf *errmsg)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	int err = 0;
+	int i;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+	if (transaction->nr) {
+		err = reftable_addition_add(add, &write_transaction_table,
+					    transaction);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+	err = reftable_addition_commit(add);
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_addition_destroy(add);
+	transaction->state = REF_TRANSACTION_CLOSED;
+	transaction->backend_data = NULL;
+	if (err) {
+		strbuf_addf(errmsg, "reftable: transaction failure: %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+	return err;
+}
+
+static int
+git_reftable_transaction_initial_commit(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *errmsg)
+{
+	int err = git_reftable_transaction_prepare(ref_store, transaction,
+						   errmsg);
+	if (err)
+		return err;
+
+	return git_reftable_transaction_finish(ref_store, transaction, errmsg);
+}
+
+struct write_pseudoref_arg {
+	struct reftable_stack *stack;
+	const char *pseudoref;
+	const struct object_id *new_oid;
+	const struct object_id *old_oid;
+};
+
+static int write_pseudoref_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_pseudoref_arg *arg = (struct write_pseudoref_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	struct reftable_ref_record read_ref = { NULL };
+	struct reftable_ref_record write_ref = { NULL };
+
+	reftable_writer_set_limits(writer, ts, ts);
+	if (arg->old_oid) {
+		struct object_id read_oid;
+		err = reftable_stack_read_ref(arg->stack, arg->pseudoref,
+					      &read_ref);
+		if (err < 0)
+			goto done;
+
+		if ((err > 0) != is_null_oid(arg->old_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		/* XXX If old_oid is set, and we have a symref? */
+
+		if (err == 0 && read_ref.value == NULL) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+
+		hashcpy(read_oid.hash, read_ref.value);
+		if (!oideq(arg->old_oid, &read_oid)) {
+			err = REFTABLE_LOCK_ERROR;
+			goto done;
+		}
+	}
+
+	write_ref.refname = (char *)arg->pseudoref;
+	write_ref.update_index = ts;
+	if (!is_null_oid(arg->new_oid))
+		write_ref.value = (uint8_t *)arg->new_oid->hash;
+
+	err = reftable_writer_add_ref(writer, &write_ref);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&read_ref);
+	return err;
+}
+
+static int git_reftable_write_pseudoref(struct ref_store *ref_store,
+					const char *pseudoref,
+					const struct object_id *oid,
+					const struct object_id *old_oid,
+					struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, pseudoref);
+	struct write_pseudoref_arg arg = {
+		.stack = stack,
+		.pseudoref = pseudoref,
+		.new_oid = oid,
+	};
+	struct reftable_addition *add = NULL;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_new_addition(&add, stack);
+	if (err) {
+		goto done;
+	}
+	if (old_oid) {
+		struct object_id actual_old_oid;
+
+		/* XXX this is cut & paste from files-backend - should factor
+		 * out? */
+		if (read_ref(pseudoref, &actual_old_oid)) {
+			if (!is_null_oid(old_oid)) {
+				strbuf_addf(errbuf,
+					    _("could not read ref '%s'"),
+					    pseudoref);
+				goto done;
+			}
+		} else if (is_null_oid(old_oid)) {
+			strbuf_addf(errbuf, _("ref '%s' already exists"),
+				    pseudoref);
+			goto done;
+		} else if (!oideq(&actual_old_oid, old_oid)) {
+			strbuf_addf(errbuf,
+				    _("unexpected object ID when writing '%s'"),
+				    pseudoref);
+			goto done;
+		}
+	}
+
+	err = reftable_addition_add(add, &write_pseudoref_table, &arg);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref update failure: %s",
+			    reftable_error_str(err));
+	}
+
+	err = reftable_addition_commit(add);
+	if (err < 0) {
+		strbuf_addf(errbuf, "reftable: pseudoref commit failure: %s",
+			    reftable_error_str(err));
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_addition_destroy(add);
+	return err;
+}
+
+static int reftable_delete_pseudoref(struct ref_store *ref_store,
+				     const char *pseudoref,
+				     const struct object_id *old_oid)
+{
+	struct strbuf errbuf = STRBUF_INIT;
+	int ret = git_reftable_write_pseudoref(ref_store, pseudoref, &null_oid,
+					       old_oid, &errbuf);
+	/* XXX what to do with the error message? */
+	strbuf_release(&errbuf);
+	return ret;
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	int i = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.refname = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = {
+			.update_index = ts,
+		};
+		struct reftable_ref_record current = { NULL };
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.refname = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.refname,
+					    &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int git_reftable_delete_refs(struct ref_store *ref_store,
+				    const char *msg,
+				    struct string_list *refnames,
+				    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack =
+		stack_for(refs, refnames->items[0].string);
+	struct write_delete_refs_arg arg = {
+		.stack = stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	string_list_sort(refnames);
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(stack, &write_delete_refs_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+static int git_reftable_pack_refs(struct ref_store *ref_store,
+				  unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	int err = refs->err;
+	if (err < 0) {
+		return err;
+	}
+	err = reftable_stack_compact_all(refs->main_stack, NULL);
+	if (err == 0 && refs->worktree_stack != NULL)
+		err = reftable_stack_compact_all(refs->worktree_stack, NULL);
+	return err;
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.refname = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err == 0) {
+		struct reftable_log_record log = { NULL };
+		struct object_id new_oid;
+		struct object_id old_oid;
+
+		fill_reftable_log_record(&log);
+		log.refname = (char *)create->refname;
+		log.message = (char *)create->logmsg;
+		log.update_index = ts;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			err = reftable_writer_add_log(writer, &log);
+		}
+		log.refname = NULL;
+		log.message = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return err;
+}
+
+static int git_reftable_create_symref(struct ref_store *ref_store,
+				      const char *refname, const char *target,
+				      const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, refname);
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .stack = stack,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(stack, &write_create_symref_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = { NULL };
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto done;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto done;
+	}
+
+	free(ref.refname);
+	ref.refname = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = { { NULL } };
+		todo[0].refname = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = { { NULL } };
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].refname = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].refname = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto done;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int git_reftable_rename_ref(struct ref_store *ref_store,
+				   const char *oldrefname,
+				   const char *newrefname, const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, newrefname);
+	struct write_rename_arg arg = {
+		.stack = stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_add(stack, &write_rename_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+static int git_reftable_copy_ref(struct ref_store *ref_store,
+				 const char *oldrefname, const char *newrefname,
+				 const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct git_reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+
+	/* Used when iterating over worktree & main */
+	struct reftable_merged_table *merged;
+	char *last_name;
+};
+
+static int
+git_reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_reflog_ref_iterator *ri =
+		(struct git_reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(&ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.refname;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.refname, ri->last_name)) {
+			/* we want the refnames that we have reflogs for, so we
+			 * skip if we've already produced this name. This could
+			 * be faster by seeking directly to
+			 * reflog@update_index==0.
+			 */
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.refname);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int
+git_reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int
+git_reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_reflog_ref_iterator *ri =
+		(struct git_reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	if (ri->merged)
+		reftable_merged_table_free(ri->merged);
+	return 0;
+}
+
+static struct ref_iterator_vtable git_reftable_reflog_ref_iterator_vtable = {
+	git_reftable_reflog_ref_iterator_advance,
+	git_reftable_reflog_ref_iterator_peel,
+	git_reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+git_reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct git_reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	if (refs->worktree_stack == NULL) {
+		struct reftable_stack *stack = refs->main_stack;
+		struct reftable_merged_table *mt =
+			reftable_stack_merged_table(stack);
+		int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+		if (err < 0) {
+			free(ri);
+			/* XXX is this allowed? */
+			return NULL;
+		}
+	} else {
+		struct reftable_merged_table *mt1 =
+			reftable_stack_merged_table(refs->main_stack);
+		struct reftable_merged_table *mt2 =
+			reftable_stack_merged_table(refs->worktree_stack);
+		struct reftable_table *tabs =
+			xcalloc(2, sizeof(struct reftable_table));
+		int err = 0;
+		reftable_table_from_merged_table(&tabs[0], mt1);
+		reftable_table_from_merged_table(&tabs[1], mt2);
+		err = reftable_new_merged_table(&ri->merged, tabs, 2,
+						the_hash_algo->format_id);
+		if (err < 0) {
+			free(tabs);
+			/* XXX see above */
+			return NULL;
+		}
+		err = reftable_merged_table_seek_ref(ri->merged, &ri->iter, "");
+		if (err < 0) {
+			return NULL;
+		}
+	}
+	base_ref_iterator_init(&ri->base,
+			       &git_reftable_reflog_ref_iterator_vtable, 1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int git_reftable_for_each_reflog_ent_newest_first(
+	struct ref_store *ref_store, const char *refname, each_reflog_ent_fn fn,
+	void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, refname);
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = { NULL };
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+
+		if (strcmp(log.refname, refname)) {
+			break;
+		}
+
+		hashcpy(old_oid.hash, log.old_hash);
+		hashcpy(new_oid.hash, log.new_hash);
+
+		full_committer = fmt_ident(log.name, log.email,
+					   WANT_COMMITTER_IDENT,
+					   /*date*/ NULL, IDENT_NO_DATE);
+		err = fn(&old_oid, &new_oid, full_committer, log.time,
+			 log.tz_offset, log.message, cb_data);
+		if (err)
+			break;
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+static int git_reftable_for_each_reflog_ent_oldest_first(
+	struct ref_store *ref_store, const char *refname, each_reflog_ent_fn fn,
+	void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, refname);
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+	int i = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = { NULL };
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+
+		if (strcmp(log.refname, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		err = fn(&old_oid, &new_oid, full_committer, log->time,
+			 log->tz_offset, log->message, cb_data);
+		if (err) {
+			break;
+		}
+	}
+
+	for (i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+static int git_reftable_reflog_exists(struct ref_store *ref_store,
+				      const char *refname)
+{
+	/* always exists. */
+	return 1;
+}
+
+static int git_reftable_create_reflog(struct ref_store *ref_store,
+				      const char *refname, int force_create,
+				      struct strbuf *err)
+{
+	return 0;
+}
+
+static int git_reftable_delete_reflog(struct ref_store *ref_store,
+				      const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.refname = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int
+git_reftable_reflog_expire(struct ref_store *ref_store, const char *refname,
+			   const struct object_id *oid, unsigned int flags,
+			   reflog_expiry_prepare_fn prepare_fn,
+			   reflog_expiry_should_prune_fn should_prune_fn,
+			   reflog_expiry_cleanup_fn cleanup_fn,
+			   void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  git_reftable_pack_refs(), which will compact the entire stack and get
+	  rid of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, refname);
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.stack = stack,
+		.refs = refs,
+	};
+	struct reftable_log_record log = { NULL };
+	struct reftable_iterator it = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+
+	mt = reftable_stack_merged_table(stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		goto done;
+	}
+
+	while (1) {
+		struct object_id ooid;
+		struct object_id noid;
+
+		int err = reftable_iterator_next_log(&it, &log);
+		if (err < 0) {
+			goto done;
+		}
+
+		if (err > 0 || strcmp(log.refname, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	err = reftable_stack_add(stack, &write_reflog_expiry_table, &arg);
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int git_reftable_read_raw_ref(struct ref_store *ref_store,
+				     const char *refname, struct object_id *oid,
+				     struct strbuf *referent,
+				     unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, refname);
+
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	/* This is usually not needed, but Git doesn't signal to ref backend if
+	   a subprocess updated the ref DB.  So we always check.
+	*/
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_read_ref(stack, refname, &ref);
+	if (err > 0) {
+		errno = ENOENT;
+		err = -1;
+		goto done;
+	}
+	if (err < 0) {
+		errno = reftable_error_to_errno(err);
+		err = -1;
+		goto done;
+	}
+	if (ref.target != NULL) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else if (ref.value != NULL) {
+		hashcpy(oid->hash, ref.value);
+	} else {
+		*type |= REF_ISBROKEN;
+		errno = EINVAL;
+		err = -1;
+	}
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	git_reftable_init_db,
+	git_reftable_transaction_prepare,
+	git_reftable_transaction_finish,
+	git_reftable_transaction_abort,
+	git_reftable_transaction_initial_commit,
+
+	git_reftable_pack_refs,
+	git_reftable_create_symref,
+	git_reftable_delete_refs,
+	git_reftable_rename_ref,
+	git_reftable_copy_ref,
+
+	git_reftable_write_pseudoref,
+	reftable_delete_pseudoref,
+
+	git_reftable_ref_iterator_begin,
+	git_reftable_read_raw_ref,
+
+	git_reftable_reflog_iterator_begin,
+	git_reftable_for_each_reflog_ent_oldest_first,
+	git_reftable_for_each_reflog_ent_newest_first,
+	git_reftable_reflog_exists,
+	git_reftable_create_reflog,
+	git_reftable_delete_reflog,
+	git_reftable_reflog_expire,
+};
diff --git a/reftable/update.sh b/reftable/update.sh
index 4dadd875f4..d816005db6 100755
--- a/reftable/update.sh
+++ b/reftable/update.sh
@@ -16,7 +16,4 @@ cp reftable-repo/LICENSE reftable/
 git --git-dir reftable-repo/.git show --no-patch --format=oneline HEAD \
   > reftable/VERSION
 
-mv reftable/system.h reftable/system.h~
-sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
-
 git add reftable/*.[ch] reftable/LICENSE reftable/VERSION
diff --git a/repository.c b/repository.c
index 6f7f6f002b..087760bc18 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 3c1f7d54bd..293b346304 100644
--- a/repository.h
+++ b/repository.h
@@ -74,6 +74,9 @@ struct repository {
 	 */
 	struct ref_store *refs_private;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index dbac2eabe8..9273a9447a 100644
--- a/setup.c
+++ b/setup.c
@@ -469,9 +469,11 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
 			if (!value)
 				return config_error_nonbool(var);
 			data->partial_clone = xstrdup(value);
-		} else if (!strcmp(ext, "worktreeconfig"))
+		} else if (!strcmp(ext, "worktreeconfig")) {
 			data->worktree_config = git_config_bool(var, value);
-		else
+		} else if (!strcmp(ext, "refstorage")) {
+			data->ref_storage = xstrdup(value);
+		} else
 			string_list_append(&data->unknown_extensions, ext);
 	}
 
@@ -594,6 +596,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->unknown_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1239,8 +1242,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 0000000000..a6634bc882
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,173 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+INVALID_SHA1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+initialize ()  {
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	mv .git/hooks .git/hooks-disabled
+}
+
+test_expect_success 'delete ref' '
+	initialize &&
+	test_commit file &&
+	SHA=$(git show-ref -s --verify HEAD) &&
+	test_write_lines "$SHA refs/heads/master" "$SHA refs/tags/file" >expect &&
+	git show-ref > actual &&
+	! git update-ref -d refs/tags/file $INVALID_SHA1 &&
+	test_cmp expect actual &&
+	git update-ref -d refs/tags/file $SHA  &&
+	test_write_lines "$SHA refs/heads/master" >expect &&
+	git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone calls transaction_initial_commit' '
+	test_commit message1 file1 &&
+	git clone . cloned &&
+	(test  -f cloned/file1 || echo "Fixme.")
+'
+
+test_expect_success 'basic operation of reftable storage: commit, show-ref' '
+	initialize &&
+	test_commit file &&
+	test_write_lines refs/heads/master refs/tags/file >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect
+'
+
+test_expect_success 'reflog, repack' '
+	initialize &&
+	for count in $(test_seq 1 10)
+	do
+		test_commit "number $count" file.t $count number-$count ||
+		return 1
+	done &&
+	git pack-refs &&
+	ls -1 .git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 10 output &&
+	grep "commit (initial): number 1" output &&
+	grep "commit: number 10" output &&
+	git gc &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 0 output
+'
+
+test_expect_success 'branch switch in reflog output' '
+	initialize &&
+	test_commit file1 &&
+	git checkout -b branch1 &&
+	test_commit file2 &&
+	git checkout -b branch2 &&
+	git switch - &&
+	git rev-parse --symbolic-full-name HEAD > actual &&
+	echo refs/heads/branch1 > expect &&
+	test_cmp actual expect
+'
+
+
+# This matches show-ref's output
+print_ref() {
+	echo "$(git rev-parse "$1") $1"
+}
+
+test_expect_success 'peeled tags are stored' '
+	initialize &&
+	test_commit file &&
+	git tag -m "annotated tag" test_tag HEAD &&
+	{
+		print_ref "refs/heads/master" &&
+		print_ref "refs/tags/file" &&
+		print_ref "refs/tags/test_tag" &&
+		print_ref "refs/tags/test_tag^{}"
+	} >expect &&
+	git show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'show-ref works on fresh repo' '
+	initialize &&
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	>expect &&
+	! git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'checkout unborn branch' '
+	initialize &&
+	git checkout -b master
+'
+
+
+test_expect_success 'dir/file conflict' '
+	initialize &&
+	test_commit file &&
+	! git branch master/forbidden
+'
+
+
+test_expect_success 'do not clobber existing repo' '
+	rm -rf .git &&
+	git init --ref-storage=files &&
+	cat .git/HEAD > expect &&
+	test_commit file &&
+	(git init --ref-storage=reftable || true) &&
+	cat .git/HEAD > actual &&
+	test_cmp expect actual
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'pseudo refs' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+	git cherry-pick source &&
+	test -f file2
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'rebase' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+	git rebase source &&
+	test -f file2
+'
+
+test_expect_success 'worktrees' '
+	git init --ref-storage=reftable start &&
+	(cd start && test_commit file1 && git checkout -b branch1 &&
+	git checkout -b branch2 &&
+	git worktree add  ../wt
+	) &&
+	cd wt &&
+	git checkout branch1 &&
+	git branch
+'
+
+test_expect_success 'worktrees 2' '
+	initialize &&
+	test_commit file1 &&
+	mkdir existing_empty &&
+	git worktree add --detach existing_empty master
+'
+
+test_done
+
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 15/20] Hookup unittests for the reftable library.
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (13 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 14/20] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 16/20] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
                                                       ` (6 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The unittests are under reftable/*_test.c, so all of the reftable code stays in
one directory. They are called from t/helpers/test-reftable.c in t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile                 | 17 ++++++++++++++++-
 t/helper/test-reftable.c | 15 +++++++++++++++
 t/helper/test-tool.c     |  1 +
 t/helper/test-tool.h     |  1 +
 t/t0031-reftable.sh      |  5 +++++
 5 files changed, 38 insertions(+), 1 deletion(-)
 create mode 100644 t/helper/test-reftable.c

diff --git a/Makefile b/Makefile
index 6571ed1246..8f33ad899a 100644
--- a/Makefile
+++ b/Makefile
@@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-read-cache.o
 TEST_BUILTINS_OBJS += test-read-graph.o
 TEST_BUILTINS_OBJS += test-read-midx.o
 TEST_BUILTINS_OBJS += test-ref-store.o
+TEST_BUILTINS_OBJS += test-reftable.o
 TEST_BUILTINS_OBJS += test-regex.o
 TEST_BUILTINS_OBJS += test-repository.o
 TEST_BUILTINS_OBJS += test-revision-walking.o
@@ -808,6 +809,7 @@ LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
 REFTABLE_LIB = reftable/libreftable.a
+REFTABLE_TEST_LIB = reftable/libreftable_test.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -2371,6 +2373,15 @@ REFTABLE_OBJS += reftable/tree.o
 REFTABLE_OBJS += reftable/writer.o
 REFTABLE_OBJS += reftable/zlib-compat.o
 
+REFTABLE_TEST_OBJS += reftable/block_test.o
+REFTABLE_TEST_OBJS += reftable/merged_test.o
+REFTABLE_TEST_OBJS += reftable/record_test.o
+REFTABLE_TEST_OBJS += reftable/refname_test.o
+REFTABLE_TEST_OBJS += reftable/reftable_test.o
+REFTABLE_TEST_OBJS += reftable/strbuf_test.o
+REFTABLE_TEST_OBJS += reftable/stack_test.o
+REFTABLE_TEST_OBJS += reftable/tree_test.o
+REFTABLE_TEST_OBJS += reftable/test_framework.o
 
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
@@ -2378,6 +2389,7 @@ OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
 	$(REFTABLE_OBJS) \
+	$(REFTABLE_TEST_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2521,6 +2533,9 @@ $(VCSSVN_LIB): $(VCSSVN_OBJS)
 $(REFTABLE_LIB): $(REFTABLE_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_TEST_LIB): $(REFTABLE_TEST_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -2803,7 +2818,7 @@ t/helper/test-svn-fe$X: $(VCSSVN_LIB)
 
 t/helper/test-tool$X: $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 
-t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS)
+t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS) $(REFTABLE_TEST_LIB)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS)
 
 check-sha1:: t/helper/test-tool$X
diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c
new file mode 100644
index 0000000000..def8883439
--- /dev/null
+++ b/t/helper/test-reftable.c
@@ -0,0 +1,15 @@
+#include "reftable/reftable-tests.h"
+#include "test-tool.h"
+
+int cmd__reftable(int argc, const char **argv)
+{
+	block_test_main(argc, argv);
+	merged_test_main(argc, argv);
+	record_test_main(argc, argv);
+	refname_test_main(argc, argv);
+	reftable_test_main(argc, argv);
+	strbuf_test_main(argc, argv);
+	stack_test_main(argc, argv);
+	tree_test_main(argc, argv);
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 590b2efca7..10366b7b76 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -52,6 +52,7 @@ static struct test_cmd cmds[] = {
 	{ "read-graph", cmd__read_graph },
 	{ "read-midx", cmd__read_midx },
 	{ "ref-store", cmd__ref_store },
+	{ "reftable", cmd__reftable },
 	{ "regex", cmd__regex },
 	{ "repository", cmd__repository },
 	{ "revision-walking", cmd__revision_walking },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ddc8e990e9..d52ba2f5e5 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -41,6 +41,7 @@ int cmd__read_cache(int argc, const char **argv);
 int cmd__read_graph(int argc, const char **argv);
 int cmd__read_midx(int argc, const char **argv);
 int cmd__ref_store(int argc, const char **argv);
+int cmd__reftable(int argc, const char **argv);
 int cmd__regex(int argc, const char **argv);
 int cmd__repository(int argc, const char **argv);
 int cmd__revision_walking(int argc, const char **argv);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
index a6634bc882..8496371f38 100755
--- a/t/t0031-reftable.sh
+++ b/t/t0031-reftable.sh
@@ -15,6 +15,11 @@ initialize ()  {
 	mv .git/hooks .git/hooks-disabled
 }
 
+test_expect_success 'unittests' '
+	test-tool reftable
+'
+
+
 test_expect_success 'delete ref' '
 	initialize &&
 	test_commit file &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 16/20] Add GIT_DEBUG_REFS debugging mechanism
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (14 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 15/20] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 17/20] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
                                                       ` (5 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

When set in the environment, GIT_DEBUG_REFS makes git print operations and
results as they flow through the ref storage backend. This helps debug
discrepancies between different ref backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile              |   1 +
 builtin/worktree.c    |   6 +-
 refs.c                |   3 +
 refs/debug.c          | 416 ++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h  |   6 +
 t/t0033-debug-refs.sh |  18 ++
 6 files changed, 449 insertions(+), 1 deletion(-)
 create mode 100644 refs/debug.c
 create mode 100755 t/t0033-debug-refs.sh

diff --git a/Makefile b/Makefile
index 8f33ad899a..700ba10b01 100644
--- a/Makefile
+++ b/Makefile
@@ -961,6 +961,7 @@ LIB_OBJS += rebase.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
+LIB_OBJS += refs/debug.o
 LIB_OBJS += refs/files-backend.o
 LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 1935967072..bdc792ee3f 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -403,7 +403,11 @@ static int add_worktree(const char *path, const char *refname,
 	 * worktree.
 	 */
 	strbuf_reset(&sb);
-	if (get_main_ref_store(the_repository)->be == &refs_be_reftable) {
+
+	/* XXX: check GIT_TEST_REFTABLE because GIT_DEBUG_REFS obscures the
+	 * instance type. */
+	if (get_main_ref_store(the_repository)->be == &refs_be_reftable ||
+	    getenv("GIT_TEST_REFTABLE") != NULL) {
 		/* XXX this is cut & paste from reftable_init_db. */
 		strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
 		write_file(sb.buf, "%s", "ref: refs/heads/.invalid\n");
diff --git a/refs.c b/refs.c
index 577adef4ae..63769a8b81 100644
--- a/refs.c
+++ b/refs.c
@@ -1782,6 +1782,9 @@ struct ref_store *get_main_ref_store(struct repository *r)
 						       r->ref_storage_format :
 						       default_ref_storage(),
 					 REF_STORE_ALL_CAPS);
+	if (getenv("GIT_DEBUG_REFS")) {
+		r->refs_private = debug_wrap(r->gitdir, r->refs_private);
+	}
 	return r->refs_private;
 }
 
diff --git a/refs/debug.c b/refs/debug.c
new file mode 100644
index 0000000000..a2317f1a65
--- /dev/null
+++ b/refs/debug.c
@@ -0,0 +1,416 @@
+
+#include "refs-internal.h"
+
+struct debug_ref_store {
+	struct ref_store base;
+	struct ref_store *refs;
+};
+
+extern struct ref_storage_be refs_be_debug;
+struct ref_store *debug_wrap(const char *gitdir, struct ref_store *store);
+
+struct ref_store *debug_wrap(const char *gitdir, struct ref_store *store)
+{
+	struct debug_ref_store *res = malloc(sizeof(struct debug_ref_store));
+	fprintf(stderr, "ref_store for %s\n", gitdir);
+	res->refs = store;
+	base_ref_store_init((struct ref_store *)res, &refs_be_debug);
+	return (struct ref_store *)res;
+}
+
+static int debug_init_db(struct ref_store *refs, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res = drefs->refs->be->init_db(drefs->refs, err);
+	fprintf(stderr, "init_db: %d\n", res);
+	return res;
+}
+
+static int debug_transaction_prepare(struct ref_store *refs,
+				     struct ref_transaction *transaction,
+				     struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_prepare(drefs->refs, transaction,
+						   err);
+	fprintf(stderr, "transaction_prepare: %d\n", res);
+	return res;
+}
+
+static void print_update(int i, const char *refname,
+			 const struct object_id *old_oid,
+			 const struct object_id *new_oid, unsigned int flags,
+			 unsigned int type, const char *msg)
+{
+	char o[200] = "null";
+	char n[200] = "null";
+	if (old_oid)
+		oid_to_hex_r(o, old_oid);
+	if (new_oid)
+		oid_to_hex_r(n, new_oid);
+
+	type &= 0xf; /* see refs.h REF_* */
+	flags &= REF_HAVE_NEW | REF_HAVE_OLD | REF_NO_DEREF |
+		 REF_FORCE_CREATE_REFLOG | REF_LOG_ONLY;
+	fprintf(stderr, "%d: %s %s -> %s (F=0x%x, T=0x%x) \"%s\"\n", i, refname,
+		o, n, flags, type, msg);
+}
+
+static void print_transaction(struct ref_transaction *transaction)
+{
+	fprintf(stderr, "transaction {\n");
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		print_update(i, u->refname, &u->old_oid, &u->new_oid, u->flags,
+			     u->type, u->msg);
+	}
+	fprintf(stderr, "}\n");
+}
+
+static int debug_transaction_finish(struct ref_store *refs,
+				    struct ref_transaction *transaction,
+				    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_finish(drefs->refs, transaction,
+						  err);
+	print_transaction(transaction);
+	fprintf(stderr, "finish: %d\n", res);
+	return res;
+}
+
+static int debug_transaction_abort(struct ref_store *refs,
+				   struct ref_transaction *transaction,
+				   struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_abort(drefs->refs, transaction, err);
+	return res;
+}
+
+static int debug_initial_transaction_commit(struct ref_store *refs,
+					    struct ref_transaction *transaction,
+					    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->initial_transaction_commit(drefs->refs,
+							  transaction, err);
+	return res;
+}
+
+static int debug_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->pack_refs(drefs->refs, flags);
+	fprintf(stderr, "pack_refs: %d\n", res);
+	return res;
+}
+
+static int debug_create_symref(struct ref_store *ref_store,
+			       const char *ref_name, const char *target,
+			       const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_symref(drefs->refs, ref_name, target,
+						 logmsg);
+	fprintf(stderr, "create_symref: %s -> %s \"%s\": %d\n", ref_name,
+		target, logmsg, res);
+	return res;
+}
+
+static int debug_delete_refs(struct ref_store *ref_store, const char *msg,
+			     struct string_list *refnames, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->delete_refs(drefs->refs, msg, refnames, flags);
+	return res;
+}
+
+static int debug_rename_ref(struct ref_store *ref_store, const char *oldref,
+			    const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->rename_ref(drefs->refs, oldref, newref,
+					      logmsg);
+	fprintf(stderr, "rename_ref: %s -> %s \"%s\": %d\n", oldref, newref,
+		logmsg, res);
+	return res;
+}
+
+static int debug_copy_ref(struct ref_store *ref_store, const char *oldref,
+			  const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->copy_ref(drefs->refs, oldref, newref, logmsg);
+	fprintf(stderr, "copy_ref: %s -> %s \"%s\": %d\n", oldref, newref,
+		logmsg, res);
+	return res;
+}
+
+static int debug_write_pseudoref(struct ref_store *ref_store,
+				 const char *pseudoref,
+				 const struct object_id *oid,
+				 const struct object_id *old_oid,
+				 struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->write_pseudoref(drefs->refs, pseudoref, oid,
+						   old_oid, err);
+	char o[100] = "null";
+	char n[100] = "null";
+	if (oid)
+		oid_to_hex_r(o, oid);
+	if (old_oid)
+		oid_to_hex_r(n, old_oid);
+	fprintf(stderr, "write_pseudoref: %s, %s => %s, err %s: %d\n",
+		pseudoref, o, n, err->buf, res);
+	return res;
+}
+
+static int debug_delete_pseudoref(struct ref_store *ref_store,
+				  const char *pseudoref,
+				  const struct object_id *old_oid)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_pseudoref(drefs->refs, pseudoref,
+						    old_oid);
+	char hex[100] = "null";
+	if (old_oid)
+		oid_to_hex_r(hex, old_oid);
+	fprintf(stderr, "delete_pseudoref: %s (%s): %d\n", pseudoref, hex, res);
+	return res;
+}
+
+struct debug_ref_iterator {
+	struct ref_iterator base;
+	struct ref_iterator *iter;
+};
+
+static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->advance(diter->iter);
+	if (res)
+		fprintf(stderr, "iterator_advance: %d\n", res);
+	else
+		fprintf(stderr, "iterator_advance before: %s\n",
+			diter->iter->refname);
+
+	diter->base.ordered = diter->iter->ordered;
+	diter->base.refname = diter->iter->refname;
+	diter->base.oid = diter->iter->oid;
+	diter->base.flags = diter->iter->flags;
+	return res;
+}
+static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				   struct object_id *peeled)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->peel(diter->iter, peeled);
+	fprintf(stderr, "iterator_peel: %s: %d\n", diter->iter->refname, res);
+	return res;
+}
+
+static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->abort(diter->iter);
+	fprintf(stderr, "iterator_abort: %d\n", res);
+	return res;
+}
+
+static struct ref_iterator_vtable debug_ref_iterator_vtable = {
+	debug_ref_iterator_advance, debug_ref_iterator_peel,
+	debug_ref_iterator_abort
+};
+
+static struct ref_iterator *
+debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			 unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
+	struct debug_ref_iterator *diter = xcalloc(1, sizeof(*diter));
+	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable, 1);
+	diter->iter = res;
+	fprintf(stderr, "ref_iterator_begin: %s (0x%x)\n", prefix, flags);
+	return &diter->base;
+}
+
+static int debug_read_raw_ref(struct ref_store *ref_store, const char *refname,
+			      struct object_id *oid, struct strbuf *referent,
+			      unsigned int *type)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = 0;
+
+	oidcpy(oid, &null_oid);
+	res = drefs->refs->be->read_raw_ref(drefs->refs, refname, oid, referent,
+					    type);
+
+	if (res == 0) {
+		fprintf(stderr, "read_raw_ref: %s: %s (=> %s) type %x: %d\n",
+			refname, oid_to_hex(oid), referent->buf, *type, res);
+	} else {
+		fprintf(stderr, "read_raw_ref: %s err %d\n", refname, res);
+	}
+	return res;
+}
+
+static struct ref_iterator *
+debug_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->reflog_iterator_begin(drefs->refs);
+	fprintf(stderr, "for_each_reflog_iterator_begin\n");
+	return res;
+}
+
+struct debug_reflog {
+	const char *refname;
+	each_reflog_ent_fn *fn;
+	void *cb_data;
+};
+
+static int debug_print_reflog_ent(struct object_id *old_oid,
+				  struct object_id *new_oid,
+				  const char *committer, timestamp_t timestamp,
+				  int tz, const char *msg, void *cb_data)
+{
+	struct debug_reflog *dbg = (struct debug_reflog *)cb_data;
+	int ret;
+	char o[100] = "null";
+	char n[100] = "null";
+	if (old_oid)
+		oid_to_hex_r(o, old_oid);
+	if (new_oid)
+		oid_to_hex_r(n, new_oid);
+
+	ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg,
+		      dbg->cb_data);
+	fprintf(stderr, "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
+		dbg->refname, ret, o, n, committer, (long int)timestamp, msg);
+	return ret;
+}
+
+static int debug_for_each_reflog_ent(struct ref_store *ref_store,
+				     const char *refname, each_reflog_ent_fn fn,
+				     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct debug_reflog dbg = {
+		.refname = refname,
+		.fn = fn,
+		.cb_data = cb_data,
+	};
+
+	int res = drefs->refs->be->for_each_reflog_ent(
+		drefs->refs, refname, &debug_print_reflog_ent, &dbg);
+	fprintf(stderr, "for_each_reflog: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_for_each_reflog_ent_reverse(struct ref_store *ref_store,
+					     const char *refname,
+					     each_reflog_ent_fn fn,
+					     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct debug_reflog dbg = {
+		.refname = refname,
+		.fn = fn,
+		.cb_data = cb_data,
+	};
+	int res = drefs->refs->be->for_each_reflog_ent_reverse(
+		drefs->refs, refname, &debug_print_reflog_ent, &dbg);
+	fprintf(stderr, "for_each_reflog_reverse: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_reflog_exists(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_exists(drefs->refs, refname);
+	fprintf(stderr, "reflog_exists: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_create_reflog(struct ref_store *ref_store, const char *refname,
+			       int force_create, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_reflog(drefs->refs, refname,
+						 force_create, err);
+	fprintf(stderr, "create_reflog: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_delete_reflog(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_reflog(drefs->refs, refname);
+	fprintf(stderr, "delete_reflog: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
+			       const struct object_id *oid, unsigned int flags,
+			       reflog_expiry_prepare_fn prepare_fn,
+			       reflog_expiry_should_prune_fn should_prune_fn,
+			       reflog_expiry_cleanup_fn cleanup_fn,
+			       void *policy_cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_expire(drefs->refs, refname, oid,
+						 flags, prepare_fn,
+						 should_prune_fn, cleanup_fn,
+						 policy_cb_data);
+	fprintf(stderr, "reflog_expire: %s: %d\n", refname, res);
+	return res;
+}
+
+struct ref_storage_be refs_be_debug = {
+	NULL,
+	"debug",
+	NULL,
+	debug_init_db,
+	debug_transaction_prepare,
+	debug_transaction_finish,
+	debug_transaction_abort,
+	debug_initial_transaction_commit,
+
+	debug_pack_refs,
+	debug_create_symref,
+	debug_delete_refs,
+	debug_rename_ref,
+	debug_copy_ref,
+
+	debug_write_pseudoref,
+	debug_delete_pseudoref,
+
+	debug_ref_iterator_begin,
+	debug_read_raw_ref,
+
+	debug_reflog_iterator_begin,
+	debug_for_each_reflog_ent,
+	debug_for_each_reflog_ent_reverse,
+	debug_reflog_exists,
+	debug_create_reflog,
+	debug_delete_reflog,
+	debug_reflog_expire,
+};
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 7afe4c2831..b5c9f57af7 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -713,4 +713,10 @@ struct ref_store {
 void base_ref_store_init(struct ref_store *refs,
 			 const struct ref_storage_be *be);
 
+/*
+ * Print out ref operations as they occur. Useful for debugging alternate ref
+ * backends.
+ */
+struct ref_store *debug_wrap(const char *gitdir, struct ref_store *store);
+
 #endif /* REFS_REFS_INTERNAL_H */
diff --git a/t/t0033-debug-refs.sh b/t/t0033-debug-refs.sh
new file mode 100755
index 0000000000..ed76c2c84a
--- /dev/null
+++ b/t/t0033-debug-refs.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='cross-check reftable with files, using GIT_DEBUG_REFS output'
+
+. ./test-lib.sh
+
+test_expect_success 'GIT_DEBUG_REFS' '
+	git init --ref-storage=files files &&
+	git init --ref-storage=reftable reftable &&
+	(cd files && GIT_DEBUG_REFS=1 test_commit message file) > files.txt &&
+	(cd reftable && GIT_DEBUG_REFS=1 test_commit message file) > reftable.txt &&
+	test_cmp files.txt reftable.txt
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 17/20] vcxproj: adjust for the reftable changes
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (15 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 16/20] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Johannes Schindelin via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 18/20] git-prompt: prepare for reftable refs backend SZEDER Gábor via GitGitGadget
                                                       ` (4 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This allows Git to be compiled via Visual Studio again after integrating
the `hn/reftable` branch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.mak.uname                           |  2 +-
 contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index c7eba69e54..ae4e25a1a4 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -709,7 +709,7 @@ vcxproj:
 	# Make .vcxproj files and add them
 	unset QUIET_GEN QUIET_BUILT_IN; \
 	perl contrib/buildsystems/generate -g Vcxproj
-	git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
+	git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
 
 	# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
 	(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
index 5c666f9ac0..33a08d3165 100644
--- a/contrib/buildsystems/Generators/Vcxproj.pm
+++ b/contrib/buildsystems/Generators/Vcxproj.pm
@@ -77,7 +77,7 @@ sub createProject {
     my $libs_release = "\n    ";
     my $libs_debug = "\n    ";
     if (!$static_library) {
-      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
+      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
       $libs_debug = $libs_release;
       $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
       $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
@@ -231,6 +231,7 @@ sub createProject {
 EOM
     if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
       my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
+      my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
       my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
 
       print F << "EOM";
@@ -240,6 +241,14 @@ sub createProject {
       <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
     </ProjectReference>
 EOM
+      if (!($name =~ /xdiff|libreftable/)) {
+        print F << "EOM";
+    <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
+      <Project>$uuid_libreftable</Project>
+      <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
+    </ProjectReference>
+EOM
+      }
       if (!($name =~ 'xdiff')) {
         print F << "EOM";
     <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 18/20] git-prompt: prepare for reftable refs backend
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (16 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 17/20] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
@ 2020-06-29 18:56                                     ` SZEDER Gábor via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 19/20] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
                                                       ` (3 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, SZEDER Gábor

From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= <szeder.dev@gmail.com>

In our git-prompt script we strive to use Bash builtins wherever
possible, because fork()-ing subshells for command substitutions and
fork()+exec()-ing Git commands are expensive on some platforms.  We
even read and parse '.git/HEAD' using Bash builtins to get the name of
the current branch [1].  However, the upcoming reftable refs backend
won't use '.git/HEAD' at all, but will write an invalid refname as
placeholder for backwards compatibility instead, which will break our
git-prompt script.

Update the git-prompt script to recognize the placeholder '.git/HEAD'
written by the reftable backend (its content is specified in the
reftable specs), and then fall back to use 'git symbolic-ref' to get
the name of the current branch.

[1] 3a43c4b5bd (bash prompt: use bash builtins to find out current
    branch, 2011-03-31)

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 contrib/completion/git-prompt.sh | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/contrib/completion/git-prompt.sh b/contrib/completion/git-prompt.sh
index e6cd5464e5..e8fdb678d4 100644
--- a/contrib/completion/git-prompt.sh
+++ b/contrib/completion/git-prompt.sh
@@ -476,10 +476,15 @@ __git_ps1 ()
 			if ! __git_eread "$g/HEAD" head; then
 				return $exit
 			fi
-			# is it a symbolic ref?
 			b="${head#ref: }"
 			if [ "$head" = "$b" ]; then
 				detached=yes
+			elif [ "$b" = "refs/heads/.invalid" ]; then
+				# Reftable
+				b="$(git symbolic-ref HEAD 2>/dev/null)" ||
+				detached=yes
+			fi
+			if [ "$detached" = yes ]; then
 				b="$(
 				case "${GIT_PS1_DESCRIBE_STYLE-}" in
 				(contains)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 19/20] Add reftable testing infrastructure
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (17 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 18/20] git-prompt: prepare for reftable refs backend SZEDER Gábor via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 18:56                                     ` [PATCH v19 20/20] Add "test-tool dump-reftable" command Han-Wen Nienhuys via GitGitGadget
                                                       ` (2 subsequent siblings)
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* Add GIT_TEST_REFTABLE environment var to control default ref storage

* Add test_prerequisite REFTABLE.

* Skip some tests that are incompatible:

  * t3210-pack-refs.sh - does not apply
  * t1450-fsck.sh - manipulates .git/ directly to create invalid state

Major test failures:

 * t1400-update-ref.sh - Reads from .git/{refs,logs} directly
 * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
 * t1405 - inspecs .git/ directly.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/t1409-avoid-packing-refs.sh | 6 ++++++
 t/t1450-fsck.sh               | 6 ++++++
 t/t3210-pack-refs.sh          | 6 ++++++
 t/test-lib.sh                 | 5 +++++
 4 files changed, 23 insertions(+)

diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
index be12fb6350..c6f7832556 100755
--- a/t/t1409-avoid-packing-refs.sh
+++ b/t/t1409-avoid-packing-refs.sh
@@ -4,6 +4,12 @@ test_description='avoid rewriting packed-refs unnecessarily'
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 # Add an identifying mark to the packed-refs file header line. This
 # shouldn't upset readers, and it should be omitted if the file is
 # ever rewritten.
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 344a2aad82..0966920324 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -8,6 +8,12 @@ test_description='git fsck random collection of tests
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success setup '
 	test_oid_init &&
 	git config gc.auto 0 &&
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index f41b2afb99..edaef2c175 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -11,6 +11,12 @@ semantic is still the same.
 '
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success 'enable reflogs' '
 	git config core.logallrefupdates true
 '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 618a7c8d5b..9b22c7def3 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1505,6 +1505,11 @@ parisc* | hppa*)
 	;;
 esac
 
+if test -n "$GIT_TEST_REFTABLE"
+then
+  test_set_prereq REFTABLE
+fi
+
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v19 20/20] Add "test-tool dump-reftable" command.
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (18 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 19/20] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 18:56                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-06-29 22:54                                     ` [PATCH v19 00/20] Reftable support git-core Junio C Hamano
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
  21 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-06-29 18:56 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This command dumps individual tables or a stack of of tables.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile                 | 1 +
 t/helper/test-reftable.c | 5 +++++
 t/helper/test-tool.c     | 1 +
 t/helper/test-tool.h     | 1 +
 4 files changed, 8 insertions(+)

diff --git a/Makefile b/Makefile
index 700ba10b01..d949b79720 100644
--- a/Makefile
+++ b/Makefile
@@ -2375,6 +2375,7 @@ REFTABLE_OBJS += reftable/writer.o
 REFTABLE_OBJS += reftable/zlib-compat.o
 
 REFTABLE_TEST_OBJS += reftable/block_test.o
+REFTABLE_TEST_OBJS += reftable/dump.o
 REFTABLE_TEST_OBJS += reftable/merged_test.o
 REFTABLE_TEST_OBJS += reftable/record_test.o
 REFTABLE_TEST_OBJS += reftable/refname_test.o
diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c
index def8883439..aff4fbccda 100644
--- a/t/helper/test-reftable.c
+++ b/t/helper/test-reftable.c
@@ -13,3 +13,8 @@ int cmd__reftable(int argc, const char **argv)
 	tree_test_main(argc, argv);
 	return 0;
 }
+
+int cmd__dump_reftable(int argc, const char **argv)
+{
+	return reftable_dump_main(argc, (char *const *)argv);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 10366b7b76..9e689f9d2b 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -53,6 +53,7 @@ static struct test_cmd cmds[] = {
 	{ "read-midx", cmd__read_midx },
 	{ "ref-store", cmd__ref_store },
 	{ "reftable", cmd__reftable },
+	{ "dump-reftable", cmd__dump_reftable },
 	{ "regex", cmd__regex },
 	{ "repository", cmd__repository },
 	{ "revision-walking", cmd__revision_walking },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index d52ba2f5e5..bf833e01d4 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -17,6 +17,7 @@ int cmd__dump_cache_tree(int argc, const char **argv);
 int cmd__dump_fsmonitor(int argc, const char **argv);
 int cmd__dump_split_index(int argc, const char **argv);
 int cmd__dump_untracked_cache(int argc, const char **argv);
+int cmd__dump_reftable(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__genzeros(int argc, const char **argv);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v19 03/20] checkout: add '\n' to reflog message
  2020-06-29 18:56                                     ` [PATCH v19 03/20] checkout: add '\n' to reflog message Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 20:07                                       ` Junio C Hamano
  2020-06-30  8:30                                         ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-06-29 20:07 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Han-Wen Nienhuys <hanwen@google.com>
>
> Reftable precisely reproduces the given message. This leads to differences,
> because the files backend implicitly adds a trailing '\n' to all messages.

What does this mean?  With the files backend we'll now see a
redundant two LFs in a row?  I think you are careful enough not to
introduce unnecessary compatibility breakage like that, so perhaps
"implicitly adds" is a wrong way to characterize what happens in the
files backend, and it only adds LF when the message does not end
with one, but does not add an extra one if not necessary?  

If so, then the change in the patch does not break compatibility,
but the above description does not give readers confidence that it
is indeed the case.  IOW it is unclear how this change manages to
avoid breaking existing code.

Sorry, but I am left puzzled.

>
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  builtin/checkout.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index af849c644f..bb11fcc4e9 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -884,8 +884,9 @@ static void update_refs_for_switch(const struct checkout_opts *opts,
>  
>  	reflog_msg = getenv("GIT_REFLOG_ACTION");
>  	if (!reflog_msg)
> -		strbuf_addf(&msg, "checkout: moving from %s to %s",
> -			old_desc ? old_desc : "(invalid)", new_branch_info->name);
> +		strbuf_addf(&msg, "checkout: moving from %s to %s\n",
> +			    old_desc ? old_desc : "(invalid)",
> +			    new_branch_info->name);
>  	else
>  		strbuf_insertstr(&msg, 0, reflog_msg);

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v19 00/20] Reftable support git-core
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (19 preceding siblings ...)
  2020-06-29 18:56                                     ` [PATCH v19 20/20] Add "test-tool dump-reftable" command Han-Wen Nienhuys via GitGitGadget
@ 2020-06-29 22:54                                     ` Junio C Hamano
  2020-06-30  9:28                                       ` Han-Wen Nienhuys
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
  21 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-06-29 22:54 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> base-commit: b9a2d1a0207fb9ded3fa524f54db3bc322a12cc4

This is based on 'next', which usually is a sure way for a topic to
stay forever out of 'next', but we have impactful dependence only on
two topics, I think, and a good news is that both of them are in
pretty good shape.  I think Brian's part 2 of SHA-256 work should be
on the 'master' branch soon, and Dscho's "customizable default
branch" is also ready---it just would, like all other topics, want
to spend at least one week on 'next' to be safe.  And after that,
this topic can be directly on 'master' (there is another trivial
conflict around bisect--helper, but I am not worried about it),
which looks quite good.

Tentatively I've prepared a merge of bc/sha-256-part-2 and
js/default-branch-name on top of today's 'master' and these patches
applied cleanly on the result.  As you mentioned, we may want to
flip the orders of patches a bit to split uncontroversial ones out
to a separate preparatory topic and get them merged to 'next' and
then to 'master' earlier than the remainder.

Thanks.

-- >8 --
fixup! Reftable support for git-core

Noticed by the "make && make distclean && git clean -n -x" test.

diff --git a/Makefile b/Makefile
index d949b79720..c22fd41662 100644
--- a/Makefile
+++ b/Makefile
@@ -3153,7 +3153,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB) $(REFTABLE_TEST_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
-- 
2.27.0-221-ga08a83db2b







^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v19 03/20] checkout: add '\n' to reflog message
  2020-06-29 20:07                                       ` Junio C Hamano
@ 2020-06-30  8:30                                         ` Han-Wen Nienhuys
  2020-06-30 23:58                                           ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-06-30  8:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Mon, Jun 29, 2020 at 10:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Han-Wen Nienhuys <hanwen@google.com>
> >
> > Reftable precisely reproduces the given message. This leads to differences,
> > because the files backend implicitly adds a trailing '\n' to all messages.
>
> What does this mean?  With the files backend we'll now see a
> redundant two LFs in a row?  I think you are careful enough not to
> introduce unnecessary compatibility breakage like that, so perhaps
> "implicitly adds" is a wrong way to characterize what happens in the
> files backend, and it only adds LF when the message does not end
> with one, but does not add an extra one if not necessary?
>
> If so, then the change in the patch does not break compatibility,
> but the above description does not give readers confidence that it
> is indeed the case.  IOW it is unclear how this change manages to
> avoid breaking existing code.
>
> Sorry, but I am left puzzled.

Most places that write a reflog message use a message that ends with '\n'.

Some places (the one mentioned here) do not append a '\n'. When it is
read back, the message does have a '\n', leading to discrepancies with
reftable (which faithfully reproduced the message without '\n')

I initially thought we could or should fix this across git-core, but I
think it will be a lot of work to find all occurrences, so I
implemented

  https://github.com/google/reftable/commit/785f745daf567aeabe47cb01024ea879b9e15c2f

which does extra validation on reflog messages, and normalizes the
'\n' at the end. Maybe this is also worth discussing in the reftable
spec: in reftable, you can use multi-line reflog messages, but that
seems fundamentally impossible in the traditional file storage.

I can drop this commit from the series, but I think it would be a
useful cleanup anyways.

--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v19 00/20] Reftable support git-core
  2020-06-29 22:54                                     ` [PATCH v19 00/20] Reftable support git-core Junio C Hamano
@ 2020-06-30  9:28                                       ` Han-Wen Nienhuys
  2020-07-01  0:03                                         ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-06-30  9:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

On Tue, Jun 30, 2020 at 12:54 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > base-commit: b9a2d1a0207fb9ded3fa524f54db3bc322a12cc4
>
> This is based on 'next', which usually is a sure way for a topic to
> stay forever out of 'next', but we have impactful dependence only on
> two topics, I think, and a good news is that both of them are in
> pretty good shape.  I think Brian's part 2 of SHA-256 work should be
> on the 'master' branch soon, and Dscho's "customizable default
> branch" is also ready---it just would, like all other topics, want
> to spend at least one week on 'next' to be safe.  And after that,
> this topic can be directly on 'master' (there is another trivial
> conflict around bisect--helper, but I am not worried about it),
> which looks quite good.

ok. For the next time, I should keep basing myself on master, even if
I know there are conflicts?

Do you have an opinion on
https://public-inbox.org/git/pull.665.git.1592580071.gitgitgadget@gmail.com/
?

There is some overlap with in sequencer.c, and Phillip's approach is
likely more principled, so I'd like to base reftable on that.

> fixup! Reftable support for git-core

thanks, folded in!

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v19 02/20] t3432: use git-reflog to inspect the reflog for HEAD
  2020-06-29 18:56                                     ` [PATCH v19 02/20] t3432: use git-reflog to inspect the reflog for HEAD Han-Wen Nienhuys via GitGitGadget
@ 2020-06-30 15:23                                       ` Denton Liu
  0 siblings, 0 replies; 409+ messages in thread
From: Denton Liu @ 2020-06-30 15:23 UTC (permalink / raw)
  To: Han-Wen Nienhuys via GitGitGadget; +Cc: git, Han-Wen Nienhuys, Han-Wen Nienhuys

Hi Han-Wen,

On Mon, Jun 29, 2020 at 06:56:40PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> From: Han-Wen Nienhuys <hanwen@google.com>
> 
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
>  t/t3432-rebase-fast-forward.sh | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/t/t3432-rebase-fast-forward.sh b/t/t3432-rebase-fast-forward.sh
> index 6f0452c0ea..22afeb8ccd 100755
> --- a/t/t3432-rebase-fast-forward.sh
> +++ b/t/t3432-rebase-fast-forward.sh
> @@ -60,15 +60,16 @@ test_rebase_same_head_ () {
>  		fi &&
>  		oldhead=\$(git rev-parse HEAD) &&
>  		test_when_finished 'git reset --hard \$oldhead' &&
> -		cp .git/logs/HEAD expect &&
> +		git reflog HEAD > expect &&

Tiny nit: there should be no space present after the redirect operator,
so it should be written like this

	git reflog HEAD >expect

>  		git rebase$flag $* >stdout &&
> +		git reflog HEAD > actual &&

Same here.

>  		if test $what = work
>  		then
>  			old=\$(wc -l <expect) &&
> -			test_line_count '-gt' \$old .git/logs/HEAD
> +			test_line_count '-gt' \$old actual
>  		elif test $what = noop
>  		then
> -			test_cmp expect .git/logs/HEAD
> +			test_cmp expect actual
>  		fi &&
>  		newhead=\$(git rev-parse HEAD) &&
>  		if test $cmp = same
> -- 
> gitgitgadget
> 

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v19 03/20] checkout: add '\n' to reflog message
  2020-06-30  8:30                                         ` Han-Wen Nienhuys
@ 2020-06-30 23:58                                           ` Junio C Hamano
  2020-07-01 16:56                                             ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-06-30 23:58 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

Han-Wen Nienhuys <hanwen@google.com> writes:

> Most places that write a reflog message use a message that ends with '\n'.
>
> Some places (the one mentioned here) do not append a '\n'.

Just to make sure I am following, I see

	if (update_ref("initial pull", "HEAD", merge_head, ...

in bultin/pull.c; this is supposed to be written with "\n" under the
new world order?

As reflog messages are supposed to be a single liner (due to the
nature of the original storage format), I have a feeling that it is
cumbersome to the programmers to requre "\n" at the end.  It would
not add much value to the readability of the resulting code, either.

If the places that have hardcoded messages like the above have to be
written like so:

	if (update_ref("initial pull\n", "HEAD", merge_head, ...

it makes the code worse.

Let's see if I can come up with a coherent "rule" that we can follow
to improve the current situation that is an undisciplined mixture as
you found out.

    ... goes and looks the current code ...

So, what happens is that update->msg in each ref transaction (in
refs.c::ref_transaction_add_update()) gets verbatim copy of what the
caller gives the refs API, but when it is written out as an reflog
entry, the message is cleansed by calling refs.c::copy_reflog_msg()
that squashes a run of whitespaces (not just SP but including LF and
HT) into a single SP and then trims whitespaces at the end.  

Then to the result, a LF is unconditionally added.

I would consider the massaging of the application-supplied message
done in copy_reflog_msg() as normalization, which turns possibly
invalid data (like a message that has LF anywhere in it) into an
acceptable form.  Which defines the kosher form of a reflog message
as not having any LF (or excess SP) in it, and the terminating LF is
not part of the message.  From that point of view, if we were to do
something, I would rather standardize on not having the final (and
only) LF.

But some callers that prepare a message in a string buffer may find
it easier not having to strip the trailing LF, or even depend on the
fact that they can feed a string split into multiple lines and the
result becomes a single line.  So I do not see a point in "fixing"
the existing callers that gives an extra LF at the end or other
not-so-kosher form of data---these will be normalized by calling
copy_reflog_msg() anyway.  

If they are easy to fix, like sending a constant string with LF at
the end, we would want to fix them to improve the readability of the
resulting code, though.  If the update_ref() call in builtin/pull.c
were sending "initial pull\n" as its first parameter, it would be
worth fixing by removing the LF, as the result would be easier to
read and it is not too much work.  On the other hand, if the message
that eventually becomes a reflog entry comes from outside and if the
existing code relies on the sanitizing done by copy_reflog_msg(), I
do not see a point in carefully removing the LF at the end on the
caller's side, either.

> I initially thought we could or should fix this across git-core, but I
> think it will be a lot of work to find all occurrences.

I think we are on the same page, even if the definition of which is
correct might be different between us.  As long as all the refs
backend does the same sanitizing (which is probably easy---just call
copy_reflog_msg() from them, or perhaps we may want to move the call
to where update->msg are queued for each step in a transaction,
which would relieve individual backends from having to worry about
this issue at all), they would all behave the same for the same
reflog data.

Thanks.


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v19 00/20] Reftable support git-core
  2020-06-30  9:28                                       ` Han-Wen Nienhuys
@ 2020-07-01  0:03                                         ` Junio C Hamano
  2020-07-01 10:16                                           ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-07-01  0:03 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys

Han-Wen Nienhuys <hanwen@google.com> writes:

> On Tue, Jun 30, 2020 at 12:54 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>> > base-commit: b9a2d1a0207fb9ded3fa524f54db3bc322a12cc4
>>
>> This is based on 'next', which usually is a sure way for a topic to
>> stay forever out of 'next', but we have impactful dependence only on
>> two topics, I think, and a good news is that both of them are in
>> pretty good shape.  I think Brian's part 2 of SHA-256 work should be
>> on the 'master' branch soon, and Dscho's "customizable default
>> branch" is also ready---it just would, like all other topics, want
>> to spend at least one week on 'next' to be safe.  And after that,
>> this topic can be directly on 'master' (there is another trivial
>> conflict around bisect--helper, but I am not worried about it),
>> which looks quite good.
>
> ok. For the next time, I should keep basing myself on master, even if
> I know there are conflicts?

Right now we know there won't be and that is why I said the above.
If your next round would change the code drastically, or somebody
sends changes that deliberately conflicts with what you are doing
to sabotage you, the situation would become different.

> Do you have an opinion on
> https://public-inbox.org/git/pull.665.git.1592580071.gitgitgadget@gmail.com/
> ?
>
> There is some overlap with in sequencer.c, and Phillip's approach is
> likely more principled, so I'd like to base reftable on that.

I assumed that these were offered to you as possible improvements to
be folded into your series, so I didn't read them very carefully and
I didn't queue them myself.  I expected that I would see them,
possibly modified to fit the context better, as part of your series
sent from you, perhaps to become a part of early clean-up portion of
your topic.

Thanks.


^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v19 00/20] Reftable support git-core
  2020-07-01  0:03                                         ` Junio C Hamano
@ 2020-07-01 10:16                                           ` Han-Wen Nienhuys
  2020-07-01 20:56                                             ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-07-01 10:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys, Phillip Wood

On Wed, Jul 1, 2020 at 2:03 AM Junio C Hamano <gitster@pobox.com> wrote:
> > Do you have an opinion on
> > https://public-inbox.org/git/pull.665.git.1592580071.gitgitgadget@gmail.com/
> > ?
> >
> > There is some overlap with in sequencer.c, and Phillip's approach is
> > likely more principled, so I'd like to base reftable on that.
>
> I assumed that these were offered to you as possible improvements to
> be folded into your series, so I didn't read them very carefully and
> I didn't queue them myself.  I expected that I would see them,
> possibly modified to fit the context better, as part of your series
> sent from you, perhaps to become a part of early clean-up portion of
> your topic.

They are changing the signature of widely used functions, which is
useful for my series but not completely necessary. I would rather that
someone else decides on how to go forward with the series.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: [PATCH v19 03/20] checkout: add '\n' to reflog message
  2020-06-30 23:58                                           ` Junio C Hamano
@ 2020-07-01 16:56                                             ` Han-Wen Nienhuys
  2020-07-01 20:22                                               ` Re* " Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-07-01 16:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys via GitGitGadget, git

On Wed, Jul 1, 2020 at 1:58 AM Junio C Hamano <gitster@pobox.com> wrote:
> > I initially thought we could or should fix this across git-core, but I
> > think it will be a lot of work to find all occurrences.
>
> I think we are on the same page, even if the definition of which is
> correct might be different between us.  As long as all the refs

Right. The practical upshot of this is that I should drop this patch, though.

-- 
Han-Wen Nienhuys - hanwenn@gmail.com - http://www.xs4all.nl/~hanwen

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re* [PATCH v19 03/20] checkout: add '\n' to reflog message
  2020-07-01 16:56                                             ` Han-Wen Nienhuys
@ 2020-07-01 20:22                                               ` Junio C Hamano
  2020-07-06 15:56                                                 ` Han-Wen Nienhuys
  0 siblings, 1 reply; 409+ messages in thread
From: Junio C Hamano @ 2020-07-01 20:22 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys via GitGitGadget, git

Han-Wen Nienhuys <hanwenn@gmail.com> writes:

> On Wed, Jul 1, 2020 at 1:58 AM Junio C Hamano <gitster@pobox.com> wrote:
>> > I initially thought we could or should fix this across git-core, but I
>> > think it will be a lot of work to find all occurrences.
>>
>> I think we are on the same page, even if the definition of which is
>> correct might be different between us.  As long as all the refs
>
> Right. The practical upshot of this is that I should drop this patch, though.

I guess we are on the same page on that one, too, even if our
definition of which side is up and which side is down may be
different ;-)

I think this can be dropped, as it does not even try to enforce "all
reflog message arguments must end with LF".

At the same time, we would need to make sure that reftable backend
cleanses the reflog message the callers of refs API supplies the
same way as the files backend.  In short, if a series of Git
operations that accumulate records in a reflog are performed with
either the files or the reftable backend, "git log -g" and other
forms to retrieve these messages should produce identical results
from both backends.

What I observed during this review was:

 - We expect that a reflog message consists of a single line.  The
   file format used by the files backend may add a LF after the
   message as a delimiter, and output by commands like "git log -g"
   may complete such an incomplete line by adding a LF at the end,
   but philosophically, the terminating LF is not a part of the
   message.

 - We however allow callers of refs API to supply a random sequence
   of NUL terminated bytes.  We cleanse caller-supplied message by
   squasing a run of whitespaces into a SP, and by trimming trailing
   whitespace, before storing the message.  This is how we tolerate,
   instead of erring out, a message with LF in it (be it at the end,
   in the middle, or both).

Currently, the cleansing of the reflog message is done by the files
backend, before the log is written out.  This is sufficient with the
current code, as that is the only backend that writes reflogs.  But
new backends can be added that write reflogs, and we'd want the
resulting log message we would read out of "log -g" the same no
matter what backend is used.

For that, we could force each new ref backend implementation to call
the same cleansing helper, but instead why not do the cleansing at
the backend agnostic way in refs.c layer?  That would be less error
prone.

An added benefit is that the "cleansing" function could be updated
later, independent from individual backends, to e.g. allow
multi-line log messages if we wanted to, and when that happens, it
would help a lot to ensure we covered all bases if the cleansing
function (which would be updated) is called from the generic layer.

	Side note: I am not interested in supporting multi-line
	reflog messages right at the moment (nobody is asking for
	it), but I envision that instead of the "squash a run of
	whitespaces into a SP and rtrim" cleansing, we can
	%urlencode problematic bytes in the message *AND* append a
	SP at the end, when a new version of Git that supports
	multi-line and/or verbatim reflog messages writes a reflog
	record.  The reading side can detect the presense of SP at
	the end (which should have been rtrimmed out if it were
	written by existing versions of Git) as a signal that
	decoding %urlencode recovers the original reflog message.

In any case, a patch that moves the existing "squash SPs and rtrim"
cleansing from the files backend to the generic layer may look like
the attached patch.  We can add reftable backend on top of a change
like this one and then we do not have to worry about each backend
cleansing the incoming reflog messages the same way.  Nice?

Thanks.

 refs.c               | 50 +++++++++++++++++++++++++++++++++++++++++---------
 refs/files-backend.c |  2 +-
 refs/refs-internal.h |  6 ------
 3 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/refs.c b/refs.c
index 224ff66c7b..75a0cda68a 100644
--- a/refs.c
+++ b/refs.c
@@ -870,7 +870,7 @@ int delete_ref(const char *msg, const char *refname,
 			       old_oid, flags);
 }
 
-void copy_reflog_msg(struct strbuf *sb, const char *msg)
+static void copy_reflog_msg(struct strbuf *sb, const char *msg)
 {
 	char c;
 	int wasspace = 1;
@@ -887,6 +887,15 @@ void copy_reflog_msg(struct strbuf *sb, const char *msg)
 	strbuf_rtrim(sb);
 }
 
+static char *normalize_reflog_message(const char *msg)
+{
+	struct strbuf sb = STRBUF_INIT;
+
+	if (msg && *msg)
+		copy_reflog_msg(&sb, msg);
+	return strbuf_detach(&sb, NULL);
+}
+
 int should_autocreate_reflog(const char *refname)
 {
 	switch (log_all_ref_updates) {
@@ -1092,7 +1101,7 @@ struct ref_update *ref_transaction_add_update(
 		oidcpy(&update->new_oid, new_oid);
 	if (flags & REF_HAVE_OLD)
 		oidcpy(&update->old_oid, old_oid);
-	update->msg = xstrdup_or_null(msg);
+	update->msg = normalize_reflog_message(msg);
 	return update;
 }
 
@@ -1951,9 +1960,14 @@ int refs_create_symref(struct ref_store *refs,
 		       const char *refs_heads_master,
 		       const char *logmsg)
 {
-	return refs->be->create_symref(refs, ref_target,
-				       refs_heads_master,
-				       logmsg);
+	char *msg;
+	int retval;
+
+	msg = normalize_reflog_message(logmsg);
+	retval = refs->be->create_symref(refs, ref_target,
+					 refs_heads_master, msg);
+	free(msg);
+	return retval;
 }
 
 int create_symref(const char *ref_target, const char *refs_heads_master,
@@ -2268,10 +2282,16 @@ int initial_ref_transaction_commit(struct ref_transaction *transaction,
 	return refs->be->initial_transaction_commit(refs, transaction, err);
 }
 
-int refs_delete_refs(struct ref_store *refs, const char *msg,
+int refs_delete_refs(struct ref_store *refs, const char *logmsg,
 		     struct string_list *refnames, unsigned int flags)
 {
-	return refs->be->delete_refs(refs, msg, refnames, flags);
+	char *msg;
+	int retval;
+
+	msg = normalize_reflog_message(logmsg);
+	retval = refs->be->delete_refs(refs, msg, refnames, flags);
+	free(msg);
+	return retval;
 }
 
 int delete_refs(const char *msg, struct string_list *refnames,
@@ -2283,7 +2303,13 @@ int delete_refs(const char *msg, struct string_list *refnames,
 int refs_rename_ref(struct ref_store *refs, const char *oldref,
 		    const char *newref, const char *logmsg)
 {
-	return refs->be->rename_ref(refs, oldref, newref, logmsg);
+	char *msg;
+	int retval;
+
+	msg = normalize_reflog_message(logmsg);
+	retval = refs->be->rename_ref(refs, oldref, newref, msg);
+	free(msg);
+	return retval;
 }
 
 int rename_ref(const char *oldref, const char *newref, const char *logmsg)
@@ -2294,7 +2320,13 @@ int rename_ref(const char *oldref, const char *newref, const char *logmsg)
 int refs_copy_existing_ref(struct ref_store *refs, const char *oldref,
 		    const char *newref, const char *logmsg)
 {
-	return refs->be->copy_ref(refs, oldref, newref, logmsg);
+	char *msg;
+	int retval;
+
+	msg = normalize_reflog_message(logmsg);
+	retval = refs->be->copy_ref(refs, oldref, newref, msg);
+	free(msg);
+	return retval;
 }
 
 int copy_existing_ref(const char *oldref, const char *newref, const char *logmsg)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6516c7bc8c..e0aba23eb2 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1629,7 +1629,7 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
 
 	strbuf_addf(&sb, "%s %s %s", oid_to_hex(old_oid), oid_to_hex(new_oid), committer);
 	if (msg && *msg)
-		copy_reflog_msg(&sb, msg);
+		strbuf_addstr(&sb, msg);
 	strbuf_addch(&sb, '\n');
 	if (write_in_full(fd, sb.buf, sb.len) < 0)
 		ret = -1;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 4271362d26..357359a0be 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -96,12 +96,6 @@ enum peel_status {
  */
 enum peel_status peel_object(const struct object_id *name, struct object_id *oid);
 
-/*
- * Copy the reflog message msg to sb while cleaning up the whitespaces.
- * Especially, convert LF to space, because reflog file is one line per entry.
- */
-void copy_reflog_msg(struct strbuf *sb, const char *msg);
-
 /**
  * Information needed for a single ref update. Set new_oid to the new
  * value or to null_oid to delete the ref. To check the old value



^ permalink raw reply related	[flat|nested] 409+ messages in thread

* Re: [PATCH v19 00/20] Reftable support git-core
  2020-07-01 10:16                                           ` Han-Wen Nienhuys
@ 2020-07-01 20:56                                             ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-07-01 20:56 UTC (permalink / raw)
  To: Han-Wen Nienhuys
  Cc: Han-Wen Nienhuys via GitGitGadget, git, Han-Wen Nienhuys,
	Phillip Wood, Phillip Wood

Han-Wen Nienhuys <hanwen@google.com> writes:

> On Wed, Jul 1, 2020 at 2:03 AM Junio C Hamano <gitster@pobox.com> wrote:
>> > Do you have an opinion on
>> > https://public-inbox.org/git/pull.665.git.1592580071.gitgitgadget@gmail.com/
>> > ?
>> >
>> > There is some overlap with in sequencer.c, and Phillip's approach is
>> > likely more principled, so I'd like to base reftable on that.
>>
>> I assumed that these were offered to you as possible improvements to
>> be folded into your series, so I didn't read them very carefully and
>> I didn't queue them myself.  I expected that I would see them,
>> possibly modified to fit the context better, as part of your series
>> sent from you, perhaps to become a part of early clean-up portion of
>> your topic.
>
> They are changing the signature of widely used functions, which is
> useful for my series but not completely necessary. I would rather that
> someone else decides on how to go forward with the series.

I was about to say that having written one ref backend, you are
likely to have better context to judge how much better Phillip's
patches would make the world be than I do ;-), but having worked on
the "cleanse caller-supplied reflog message at generic layer"
illustration patch, I think I am familiar enough with these three
functions Phillip's patch touched to comment on them.

I am not sure why delete_ref() needs to be modified to take a
caller-supplied repository when refs_delete_ref() is already even a
better way to do so (it allows the caller to give a ref_store, which
is a part of a repository).  The same between update_ref() and
refs_update_ref().

Introducing a new repo_delete_ref() to extend the API into a
three-tuple,

	delete_ref(...) {
		return repo_delete_ref(the_repository, ...);
	}
	repo_delete_ref(repo, ...) {
		return refs_delete_ref(get_main_ref_store(repo), ...);
	}
	refs_delete_ref(...) {
		/* as we already have it */
	}

_might_ make sense, but then we should do so for not just delete and
update, but for all the operations consistently.  I do not know if
that is worth it offhand.  The issue Phillip's patch addresses looks
like a mere "lack of convenience", not a fundamental "without this
we cannot write correct code easily", at least to me.

What disturbed me more while I was looking at refs.c was that some
operations are not done (perhaps cannot be done) as a part of a
transaction.  refs_delete_refs() and refs_rename_ref() directly call
into the backend layer, for example, and that does not smell right.

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: Re* [PATCH v19 03/20] checkout: add '\n' to reflog message
  2020-07-01 20:22                                               ` Re* " Junio C Hamano
@ 2020-07-06 15:56                                                 ` Han-Wen Nienhuys
  2020-07-06 18:53                                                   ` Junio C Hamano
  0 siblings, 1 reply; 409+ messages in thread
From: Han-Wen Nienhuys @ 2020-07-06 15:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys via GitGitGadget, git

On Wed, Jul 1, 2020 at 10:22 PM Junio C Hamano <gitster@pobox.com> wrote:
> In any case, a patch that moves the existing "squash SPs and rtrim"
> cleansing from the files backend to the generic layer may look like
> the attached patch.  We can add reftable backend on top of a change
> like this one and then we do not have to worry about each backend
> cleansing the incoming reflog messages the same way.  Nice?

Yes, very nice!  Will you merge this, or should I make this part of
the reftable series?
The reftable code already has normalization for reflog messages, so it
doesn't really make a difference; either way is fine.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado

^ permalink raw reply	[flat|nested] 409+ messages in thread

* Re: Re* [PATCH v19 03/20] checkout: add '\n' to reflog message
  2020-07-06 15:56                                                 ` Han-Wen Nienhuys
@ 2020-07-06 18:53                                                   ` Junio C Hamano
  0 siblings, 0 replies; 409+ messages in thread
From: Junio C Hamano @ 2020-07-06 18:53 UTC (permalink / raw)
  To: Han-Wen Nienhuys; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys via GitGitGadget, git

Han-Wen Nienhuys <hanwen@google.com> writes:

> On Wed, Jul 1, 2020 at 10:22 PM Junio C Hamano <gitster@pobox.com> wrote:
>> In any case, a patch that moves the existing "squash SPs and rtrim"
>> cleansing from the files backend to the generic layer may look like
>> the attached patch.  We can add reftable backend on top of a change
>> like this one and then we do not have to worry about each backend
>> cleansing the incoming reflog messages the same way.  Nice?
>
> Yes, very nice!  Will you merge this, or should I make this part of
> the reftable series?
> The reftable code already has normalization for reflog messages, so it
> doesn't really make a difference; either way is fine.

It probably fits well in the "to prepare the existing code to
support any new backend" series you have split out of the reftable
series and sent separately earlier, not even "part of the reftable
series", I think.  With something like that, you may even be able
to drop the custom reflog message munging from reftable proper,
just like the whole point of the patch you are responding was to
drop the custom munging from the files backend.

Thanks.


^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v20 00/21] Reftable support git-core
  2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
                                                       ` (20 preceding siblings ...)
  2020-06-29 22:54                                     ` [PATCH v19 00/20] Reftable support git-core Junio C Hamano
@ 2020-07-31 15:26                                     ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:26                                       ` [PATCH v20 01/21] refs: add \t to reflog in the files backend Han-Wen Nienhuys via GitGitGadget
                                                         ` (20 more replies)
  21 siblings, 21 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:26 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys

This adds the reftable library, and hooks it up as a ref backend. 

This is still based on next, as there are conflicts basing on master.

Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1

Summary 455 tests fail

v24

 * rebase
 * FETCH_HEAD special casing

Han-Wen Nienhuys (19):
  refs: add \t to reflog in the files backend
  Split off reading loose ref data in separate function
  t1400: use git rev-parse for testing PSEUDOREF existence
  Modify pseudo refs through ref backend storage
  Make HEAD a PSEUDOREF rather than PER_WORKTREE.
  Make refs_ref_exists public
  Treat CHERRY_PICK_HEAD as a pseudo ref
  Treat REVERT_HEAD as a pseudo ref
  Move REF_LOG_ONLY to refs-internal.h
  Iteration  over entire ref namespace is iterating over "refs/"
  Add .gitattributes for the reftable/ directory
  Add reftable library
  Add standalone build infrastructure for reftable
  Reftable support for git-core
  Read FETCH_HEAD as loose ref
  Hookup unittests for the reftable library.
  Add GIT_DEBUG_REFS debugging mechanism
  Add reftable testing infrastructure
  Add "test-tool dump-reftable" command.

Johannes Schindelin (1):
  vcxproj: adjust for the reftable changes

SZEDER Gábor (1):
  git-prompt: prepare for reftable refs backend

 Documentation/git-update-ref.txt              |   13 +-
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   47 +-
 builtin/clone.c                               |    4 +-
 builtin/commit.c                              |   34 +-
 builtin/init-db.c                             |   55 +-
 builtin/merge.c                               |    2 +-
 builtin/worktree.c                            |   29 +-
 cache.h                                       |    8 +-
 config.mak.uname                              |    2 +-
 contrib/buildsystems/Generators/Vcxproj.pm    |   11 +-
 contrib/completion/git-prompt.sh              |    7 +-
 path.c                                        |    2 -
 path.h                                        |    9 +-
 ref-filter.c                                  |    7 +-
 refs.c                                        |  171 +-
 refs.h                                        |    6 +
 refs/debug.c                                  |  387 +++++
 refs/files-backend.c                          |   45 +-
 refs/refs-internal.h                          |   20 +
 refs/reftable-backend.c                       | 1415 +++++++++++++++++
 reftable/.gitattributes                       |    1 +
 reftable/BUILD                                |  203 +++
 reftable/LICENSE                              |   31 +
 reftable/README.md                            |   33 +
 reftable/VERSION                              |    1 +
 reftable/WORKSPACE                            |   14 +
 reftable/basics.c                             |  215 +++
 reftable/basics.h                             |   53 +
 reftable/block.c                              |  432 +++++
 reftable/block.h                              |  129 ++
 reftable/block_test.c                         |  157 ++
 reftable/compat.c                             |   98 ++
 reftable/compat.h                             |   48 +
 reftable/constants.h                          |   21 +
 reftable/dump.c                               |  212 +++
 reftable/file.c                               |   95 ++
 reftable/iter.c                               |  242 +++
 reftable/iter.h                               |   72 +
 reftable/merged.c                             |  317 ++++
 reftable/merged.h                             |   43 +
 reftable/merged_test.c                        |  286 ++++
 reftable/pq.c                                 |  115 ++
 reftable/pq.h                                 |   34 +
 reftable/reader.c                             |  744 +++++++++
 reftable/reader.h                             |   65 +
 reftable/record.c                             | 1126 +++++++++++++
 reftable/record.h                             |  143 ++
 reftable/record_test.c                        |  410 +++++
 reftable/refname.c                            |  209 +++
 reftable/refname.h                            |   38 +
 reftable/refname_test.c                       |   99 ++
 reftable/reftable-tests.h                     |   22 +
 reftable/reftable.c                           |  154 ++
 reftable/reftable.h                           |  585 +++++++
 reftable/reftable_test.c                      |  632 ++++++++
 reftable/stack.c                              | 1225 ++++++++++++++
 reftable/stack.h                              |   50 +
 reftable/stack_test.c                         |  787 +++++++++
 reftable/strbuf.c                             |  206 +++
 reftable/strbuf.h                             |   88 +
 reftable/strbuf_test.c                        |   39 +
 reftable/system.h                             |   53 +
 reftable/test_framework.c                     |   69 +
 reftable/test_framework.h                     |   64 +
 reftable/tree.c                               |   63 +
 reftable/tree.h                               |   34 +
 reftable/tree_test.c                          |   62 +
 reftable/update.sh                            |   19 +
 reftable/writer.c                             |  663 ++++++++
 reftable/writer.h                             |   60 +
 reftable/zlib-compat.c                        |   92 ++
 reftable/zlib.BUILD                           |   36 +
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 sequencer.c                                   |   56 +-
 setup.c                                       |   10 +-
 t/helper/test-reftable.c                      |   20 +
 t/helper/test-tool.c                          |    2 +
 t/helper/test-tool.h                          |    2 +
 t/t0031-reftable.sh                           |  185 +++
 t/t0033-debug-refs.sh                         |   18 +
 t/t1400-update-ref.sh                         |   30 +-
 t/t1405-main-ref-store.sh                     |    5 +-
 t/t1409-avoid-packing-refs.sh                 |    6 +
 t/t1450-fsck.sh                               |    6 +
 t/t3210-pack-refs.sh                          |    6 +
 t/test-lib.sh                                 |    5 +
 wt-status.c                                   |    6 +-
 89 files changed, 13076 insertions(+), 256 deletions(-)
 create mode 100644 refs/debug.c
 create mode 100644 refs/reftable-backend.c
 create mode 100644 reftable/.gitattributes
 create mode 100644 reftable/BUILD
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/README.md
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/WORKSPACE
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/compat.c
 create mode 100644 reftable/compat.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/refname_test.c
 create mode 100644 reftable/reftable-tests.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/strbuf.c
 create mode 100644 reftable/strbuf.h
 create mode 100644 reftable/strbuf_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c
 create mode 100644 reftable/zlib.BUILD
 create mode 100644 t/helper/test-reftable.c
 create mode 100755 t/t0031-reftable.sh
 create mode 100755 t/t0033-debug-refs.sh


base-commit: e8ab941b671da6890181aea5b5755d1d9eea24ec
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v20
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v20
Pull-Request: https://github.com/gitgitgadget/git/pull/539

Range-diff vs v19:

  1:  596c316da4 <  -:  ---------- lib-t6000.sh: write tag using git-update-ref
  2:  277da0cf7e <  -:  ---------- t3432: use git-reflog to inspect the reflog for HEAD
  3:  125695ce92 <  -:  ---------- checkout: add '\n' to reflog message
  -:  ---------- >  1:  05ced51683 refs: add \t to reflog in the files backend
  -:  ---------- >  2:  d0f0680c0e Split off reading loose ref data in separate function
  -:  ---------- >  3:  716cb30c1b t1400: use git rev-parse for testing PSEUDOREF existence
  4:  5715681b3f !  4:  400ffe531f Write pseudorefs through ref backends.
     @@ Metadata
      Author: Han-Wen Nienhuys <hanwen@google.com>
      
       ## Commit message ##
     -    Write pseudorefs through ref backends.
     +    Modify pseudo refs through ref backend storage
      
     -    Pseudorefs store transient data in in the repository. Examples are HEAD,
     -    CHERRY_PICK_HEAD, etc.
     +    The previous behavior was introduced in commit 74ec19d4be
     +    ("pseudorefs: create and use pseudoref update and delete functions",
     +    Jul 31, 2015), with the justification "alternate ref backends still
     +    need to store pseudorefs in GIT_DIR".
      
     -    These refs have always been read through the ref backends, but they were written
     -    in a one-off routine that wrote an object ID or symref directly into
     -    .git/<pseudo_ref_name>.
     +    Refs such as REBASE_HEAD are read through the ref backend. This can
     +    only work consistently if they are written through the ref backend as
     +    well. Tooling that works directly on files under .git should be
     +    updated to use git commands to read refs instead.
      
     -    This causes problems when introducing a new ref storage backend. To remedy this,
     -    extend the ref backend implementation with a write_pseudoref_fn and
     -    update_pseudoref_fn.
     +    The following behaviors change:
     +
     +    * Updates to pseudorefs (eg. ORIG_HEAD) with
     +      core.logAllRefUpdates=always will create reflogs for the pseudoref.
     +
     +    * non-HEAD pseudoref symrefs are also dereferenced on deletion. Update
     +      t1405 accordingly.
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
     - ## refs.c ##
     -@@ refs.c: int ref_exists(const char *refname)
     - 	return refs_ref_exists(get_main_ref_store(the_repository), refname);
     - }
     + ## Documentation/git-update-ref.txt ##
     +@@ Documentation/git-update-ref.txt: still see a subset of the modifications.
     + 
     + LOGGING UPDATES
     + ---------------
     +-If config parameter "core.logAllRefUpdates" is true and the ref is one under
     +-"refs/heads/", "refs/remotes/", "refs/notes/", or the symbolic ref HEAD; or
     +-the file "$GIT_DIR/logs/<ref>" exists then `git update-ref` will append
     +-a line to the log file "$GIT_DIR/logs/<ref>" (dereferencing all
     +-symbolic refs before creating the log name) describing the change
     +-in ref value.  Log lines are formatted as:
     ++If config parameter "core.logAllRefUpdates" is true and the ref is one
     ++under "refs/heads/", "refs/remotes/", "refs/notes/", or a pseudoref
     ++like HEAD or ORIG_HEAD; or the file "$GIT_DIR/logs/<ref>" exists then
     ++`git update-ref` will append a line to the log file
     ++"$GIT_DIR/logs/<ref>" (dereferencing all symbolic refs before creating
     ++the log name) describing the change in ref value.  Log lines are
     ++formatted as:
     + 
     +     oldsha1 SP newsha1 SP committer LF
       
     -+int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
     -+{
     -+	return refs_delete_pseudoref(get_main_ref_store(the_repository),
     -+				     pseudoref, old_oid);
     -+}
     -+
     - static int filter_refs(const char *refname, const struct object_id *oid,
     - 			   int flags, void *data)
     - {
     +
     + ## refs.c ##
      @@ refs.c: long get_files_ref_lock_timeout_ms(void)
       	return timeout_ms;
       }
     @@ refs.c: long get_files_ref_lock_timeout_ms(void)
      -
      -	return 0;
      -}
     - 
     +-
       int refs_delete_ref(struct ref_store *refs, const char *msg,
       		    const char *refname,
     + 		    const struct object_id *old_oid,
      @@ refs.c: int refs_delete_ref(struct ref_store *refs, const char *msg,
     + 	struct ref_transaction *transaction;
     + 	struct strbuf err = STRBUF_INIT;
       
     - 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
     - 		assert(refs == get_main_ref_store(the_repository));
     +-	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
     +-		assert(refs == get_main_ref_store(the_repository));
      -		return delete_pseudoref(refname, old_oid);
     -+		return refs_delete_pseudoref(refs, refname, old_oid);
     - 	}
     - 
     +-	}
     +-
       	transaction = ref_store_transaction_begin(refs, &err);
     + 	if (!transaction ||
     + 	    ref_transaction_delete(transaction, refname, old_oid,
      @@ refs.c: int refs_update_ref(struct ref_store *refs, const char *msg,
     + 	struct strbuf err = STRBUF_INIT;
     + 	int ret = 0;
       
     - 	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
     - 		assert(refs == get_main_ref_store(the_repository));
     +-	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
     +-		assert(refs == get_main_ref_store(the_repository));
      -		ret = write_pseudoref(refname, new_oid, old_oid, &err);
     -+		ret = refs_write_pseudoref(refs, refname, new_oid, old_oid,
     -+					   &err);
     - 	} else {
     - 		t = ref_store_transaction_begin(refs, &err);
     - 		if (!t ||
     -@@ refs.c: int head_ref(each_ref_fn fn, void *cb_data)
     - 	return refs_head_ref(get_main_ref_store(the_repository), fn, cb_data);
     - }
     - 
     -+int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
     -+			 const struct object_id *oid,
     -+			 const struct object_id *old_oid, struct strbuf *err)
     -+{
     -+	return refs->be->write_pseudoref(refs, pseudoref, oid, old_oid, err);
     -+}
     -+
     -+int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
     -+			  const struct object_id *old_oid)
     -+{
     -+	return refs->be->delete_pseudoref(refs, pseudoref, old_oid);
     -+}
     -+
     - struct ref_iterator *refs_ref_iterator_begin(
     - 		struct ref_store *refs,
     - 		const char *prefix, int trim, int flags)
     -
     - ## refs.h ##
     -@@ refs.h: int update_ref(const char *msg, const char *refname,
     - 	       const struct object_id *new_oid, const struct object_id *old_oid,
     - 	       unsigned int flags, enum action_on_err onerr);
     - 
     -+/* Pseudorefs (eg. HEAD, CHERRY_PICK_HEAD) have a separate routines for updating
     -+   and deletion as they cannot take part in normal transactional updates.
     -+   Pseudorefs should only be written for the main repository.
     -+*/
     -+int refs_write_pseudoref(struct ref_store *refs, const char *pseudoref,
     -+			 const struct object_id *oid,
     -+			 const struct object_id *old_oid, struct strbuf *err);
     -+int refs_delete_pseudoref(struct ref_store *refs, const char *pseudoref,
     -+			  const struct object_id *old_oid);
     -+int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid);
     -+
     - int parse_hide_refs_config(const char *var, const char *value, const char *);
     - 
     - /*
     -
     - ## refs/files-backend.c ##
     -@@ refs/files-backend.c: static int lock_raw_ref(struct files_ref_store *refs,
     - 	return ret;
     - }
     - 
     -+static int files_write_pseudoref(struct ref_store *ref_store,
     -+				 const char *pseudoref,
     -+				 const struct object_id *oid,
     -+				 const struct object_id *old_oid,
     -+				 struct strbuf *err)
     -+{
     -+	struct files_ref_store *refs =
     -+		files_downcast(ref_store, REF_STORE_READ, "write_pseudoref");
     -+	int fd;
     -+	struct lock_file lock = LOCK_INIT;
     -+	struct strbuf filename = STRBUF_INIT;
     -+	struct strbuf buf = STRBUF_INIT;
     -+	int ret = -1;
     -+
     -+	if (!oid)
     -+		return 0;
     -+
     -+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
     -+	fd = hold_lock_file_for_update_timeout(&lock, filename.buf, 0,
     -+					       get_files_ref_lock_timeout_ms());
     -+	if (fd < 0) {
     -+		strbuf_addf(err, _("could not open '%s' for writing: %s"),
     -+			    buf.buf, strerror(errno));
     -+		goto done;
     -+	}
     -+
     -+	if (old_oid) {
     -+		struct object_id actual_old_oid;
     -+
     -+		if (read_ref(pseudoref, &actual_old_oid)) {
     -+			if (!is_null_oid(old_oid)) {
     -+				strbuf_addf(err, _("could not read ref '%s'"),
     -+					    pseudoref);
     -+				rollback_lock_file(&lock);
     -+				goto done;
     -+			}
     -+		} else if (is_null_oid(old_oid)) {
     -+			strbuf_addf(err, _("ref '%s' already exists"),
     -+				    pseudoref);
     -+			rollback_lock_file(&lock);
     -+			goto done;
     -+		} else if (!oideq(&actual_old_oid, old_oid)) {
     -+			strbuf_addf(err,
     -+				    _("unexpected object ID when writing '%s'"),
     -+				    pseudoref);
     -+			rollback_lock_file(&lock);
     -+			goto done;
     -+		}
     -+	}
     -+
     -+	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
     -+	if (write_in_full(fd, buf.buf, buf.len) < 0) {
     -+		strbuf_addf(err, _("could not write to '%s'"), filename.buf);
     -+		rollback_lock_file(&lock);
     -+		goto done;
     -+	}
     -+
     -+	commit_lock_file(&lock);
     -+	ret = 0;
     -+done:
     -+	strbuf_release(&buf);
     -+	strbuf_release(&filename);
     -+	return ret;
     -+}
     -+
     -+static int files_delete_pseudoref(struct ref_store *ref_store,
     -+				  const char *pseudoref,
     -+				  const struct object_id *old_oid)
     -+{
     -+	struct files_ref_store *refs =
     -+		files_downcast(ref_store, REF_STORE_READ, "delete_pseudoref");
     -+	struct strbuf filename = STRBUF_INIT;
     -+	int ret = -1;
     -+
     -+	strbuf_addf(&filename, "%s/%s", refs->gitdir, pseudoref);
     -+
     -+	if (old_oid && !is_null_oid(old_oid)) {
     -+		struct lock_file lock = LOCK_INIT;
     -+		int fd;
     -+		struct object_id actual_old_oid;
     -+
     -+		fd = hold_lock_file_for_update_timeout(
     -+			&lock, filename.buf, 0,
     -+			get_files_ref_lock_timeout_ms());
     -+		if (fd < 0) {
     -+			error_errno(_("could not open '%s' for writing"),
     -+				    filename.buf);
     -+			goto done;
     -+		}
     -+		if (read_ref(pseudoref, &actual_old_oid))
     -+			die(_("could not read ref '%s'"), pseudoref);
     -+		if (!oideq(&actual_old_oid, old_oid)) {
     -+			error(_("unexpected object ID when deleting '%s'"),
     -+			      pseudoref);
     -+			rollback_lock_file(&lock);
     -+			goto done;
     -+		}
     -+
     -+		unlink(filename.buf);
     -+		rollback_lock_file(&lock);
     -+	} else {
     -+		unlink(filename.buf);
     -+	}
     -+	ret = 0;
     -+done:
     -+	strbuf_release(&filename);
     -+	return ret;
     -+}
     -+
     - struct files_ref_iterator {
     - 	struct ref_iterator base;
     - 
     -@@ refs/files-backend.c: struct ref_storage_be refs_be_files = {
     - 	files_rename_ref,
     - 	files_copy_ref,
     - 
     -+	files_write_pseudoref,
     -+	files_delete_pseudoref,
     -+
     - 	files_ref_iterator_begin,
     - 	files_read_raw_ref,
     - 
     -@@ refs/files-backend.c: struct ref_storage_be refs_be_files = {
     - 	files_reflog_exists,
     - 	files_create_reflog,
     - 	files_delete_reflog,
     --	files_reflog_expire
     -+	files_reflog_expire,
     - };
     +-	} else {
     +-		t = ref_store_transaction_begin(refs, &err);
     +-		if (!t ||
     +-		    ref_transaction_update(t, refname, new_oid, old_oid,
     +-					   flags, msg, &err) ||
     +-		    ref_transaction_commit(t, &err)) {
     +-			ret = 1;
     +-			ref_transaction_free(t);
     +-		}
     ++	t = ref_store_transaction_begin(refs, &err);
     ++	if (!t ||
     ++	    ref_transaction_update(t, refname, new_oid, old_oid, flags, msg,
     ++				   &err) ||
     ++	    ref_transaction_commit(t, &err)) {
     ++		ret = 1;
     ++		ref_transaction_free(t);
     + 	}
     + 	if (ret) {
     + 		const char *str = _("update_ref failed for ref '%s': %s");
      
     - ## refs/packed-backend.c ##
     -@@ refs/packed-backend.c: static int packed_copy_ref(struct ref_store *ref_store,
     - 	BUG("packed reference store does not support copying references");
     - }
     - 
     -+static int packed_write_pseudoref(struct ref_store *ref_store,
     -+				  const char *pseudoref,
     -+				  const struct object_id *oid,
     -+				  const struct object_id *old_oid,
     -+				  struct strbuf *err)
     -+{
     -+	BUG("packed reference store does not support writing pseudo-references");
     -+}
     -+
     -+static int packed_delete_pseudoref(struct ref_store *ref_store,
     -+				   const char *pseudoref,
     -+				   const struct object_id *old_oid)
     -+{
     -+	BUG("packed reference store does not support deleting pseudo-references");
     -+}
     -+
     - static struct ref_iterator *packed_reflog_iterator_begin(struct ref_store *ref_store)
     - {
     - 	return empty_ref_iterator_begin();
     -@@ refs/packed-backend.c: struct ref_storage_be refs_be_packed = {
     - 	packed_rename_ref,
     - 	packed_copy_ref,
     - 
     -+	packed_write_pseudoref,
     -+	packed_delete_pseudoref,
     -+
     - 	packed_ref_iterator_begin,
     - 	packed_read_raw_ref,
     - 
     -@@ refs/packed-backend.c: struct ref_storage_be refs_be_packed = {
     - 	packed_reflog_exists,
     - 	packed_create_reflog,
     - 	packed_delete_reflog,
     --	packed_reflog_expire
     -+	packed_reflog_expire,
     - };
     + ## t/t1400-update-ref.sh ##
     +@@ t/t1400-update-ref.sh: test_expect_success 'core.logAllRefUpdates=always creates reflog by default' '
     + 	git reflog exists $outside
     + '
     + 
     +-test_expect_success 'core.logAllRefUpdates=always creates no reflog for ORIG_HEAD' '
     ++test_expect_success 'core.logAllRefUpdates=always creates reflog for ORIG_HEAD' '
     + 	test_config core.logAllRefUpdates always &&
     + 	git update-ref ORIG_HEAD $A &&
     +-	test_must_fail git reflog exists ORIG_HEAD
     ++	git reflog exists ORIG_HEAD
     + '
     + 
     + test_expect_success '--no-create-reflog overrides core.logAllRefUpdates=always' '
     +@@ t/t1400-update-ref.sh: test_expect_success 'git cat-file blob master@{2005-05-26 23:42}:F (expect OTHER
     + test_expect_success 'given old value for missing pseudoref, do not create' '
     + 	test_must_fail git update-ref PSEUDOREF $A $B 2>err &&
     + 	test_must_fail git rev-parse PSEUDOREF &&
     +-	test_i18ngrep "could not read ref" err
     ++	test_i18ngrep "unable to resolve reference" err
     + '
     + 
     + test_expect_success 'create pseudoref' '
     +@@ t/t1400-update-ref.sh: test_expect_success 'overwrite pseudoref with correct old value' '
     + test_expect_success 'do not overwrite pseudoref with wrong old value' '
     + 	test_must_fail git update-ref PSEUDOREF $D $E 2>err &&
     + 	test $C = $(git rev-parse PSEUDOREF) &&
     +-	test_i18ngrep "unexpected object ID" err
     ++	test_i18ngrep "cannot lock ref.*expected" err
     + '
     + 
     + test_expect_success 'delete pseudoref' '
     +@@ t/t1400-update-ref.sh: test_expect_success 'do not delete pseudoref with wrong old value' '
     + 	git update-ref PSEUDOREF $A &&
     + 	test_must_fail git update-ref -d PSEUDOREF $B 2>err &&
     + 	test $A = $(git rev-parse PSEUDOREF) &&
     +-	test_i18ngrep "unexpected object ID" err
     ++	test_i18ngrep "cannot lock ref.*expected" err
     + '
     + 
     + test_expect_success 'delete pseudoref with correct old value' '
      
     - ## refs/refs-internal.h ##
     -@@ refs/refs-internal.h: typedef int copy_ref_fn(struct ref_store *ref_store,
     - 			  const char *oldref, const char *newref,
     - 			  const char *logmsg);
     - 
     -+typedef int write_pseudoref_fn(struct ref_store *ref_store,
     -+			       const char *pseudoref,
     -+			       const struct object_id *oid,
     -+			       const struct object_id *old_oid,
     -+			       struct strbuf *err);
     -+
     -+/*
     -+ * Deletes a pseudoref. Deletion always succeeds (even if the pseudoref doesn't
     -+ * exist.), except if old_oid is specified. If it is, it can fail due to lock
     -+ * failure, failure reading the old OID, or an OID mismatch
     -+ */
     -+typedef int delete_pseudoref_fn(struct ref_store *ref_store,
     -+				const char *pseudoref,
     -+				const struct object_id *old_oid);
     -+
     - /*
     -  * Iterate over the references in `ref_store` whose names start with
     -  * `prefix`. `prefix` is matched as a literal string, without regard
     -@@ refs/refs-internal.h: struct ref_storage_be {
     - 	rename_ref_fn *rename_ref;
     - 	copy_ref_fn *copy_ref;
     - 
     -+	write_pseudoref_fn *write_pseudoref;
     -+	delete_pseudoref_fn *delete_pseudoref;
     -+
     - 	ref_iterator_begin_fn *iterator_begin;
     - 	read_raw_ref_fn *read_raw_ref;
     - 
     + ## t/t1405-main-ref-store.sh ##
     +@@ t/t1405-main-ref-store.sh: test_expect_success 'create_symref(FOO, refs/heads/master)' '
     + test_expect_success 'delete_refs(FOO, refs/tags/new-tag)' '
     + 	git rev-parse FOO -- &&
     + 	git rev-parse refs/tags/new-tag -- &&
     +-	$RUN delete-refs 0 nothing FOO refs/tags/new-tag &&
     ++	m=$(git rev-parse master) &&
     ++	REF_NO_DEREF=1 &&
     ++	$RUN delete-refs $REF_NO_DEREF nothing FOO refs/tags/new-tag &&
     ++	test_must_fail git rev-parse --symbolic-full-name FOO &&
     + 	test_must_fail git rev-parse FOO -- &&
     + 	test_must_fail git rev-parse refs/tags/new-tag --
     + '
  -:  ---------- >  5:  e9df96925b Make HEAD a PSEUDOREF rather than PER_WORKTREE.
  5:  c8cf2d2ce1 =  6:  d33ea20fba Make refs_ref_exists public
  6:  e36b29de79 <  -:  ---------- Treat BISECT_HEAD as a pseudo ref
  7:  1676df9851 !  7:  d12f6e905c Treat CHERRY_PICK_HEAD as a pseudo ref
     @@ Metadata
       ## Commit message ##
          Treat CHERRY_PICK_HEAD as a pseudo ref
      
     -    Check for existence and delete CHERRY_PICK_HEAD through pseudo ref functions.
     +    Check for existence and delete CHERRY_PICK_HEAD through ref functions.
          This will help cherry-pick work with alternate ref storage backends.
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
     @@ sequencer.c: static void print_advice(struct repository *r, int show_hint,
       		 * of the commit itself so remove CHERRY_PICK_HEAD
       		 */
      -		unlink(git_path_cherry_pick_head(r));
     -+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
     -+				      NULL);
     ++		refs_delete_ref(get_main_ref_store(r), "", "CHERRY_PICK_HEAD",
     ++				NULL, 0);
       		return;
       	}
       
     @@ sequencer.c: static int do_commit(struct repository *r,
       		strbuf_release(&sb);
       		if (!res) {
      -			unlink(git_path_cherry_pick_head(r));
     -+			refs_delete_pseudoref(get_main_ref_store(r),
     -+					      "CHERRY_PICK_HEAD", NULL);
     ++			refs_delete_ref(get_main_ref_store(r), "",
     ++					"CHERRY_PICK_HEAD", NULL, 0);
       			unlink(git_path_merge_msg(r));
       			if (!is_rebase_i(opts))
       				print_commit_summary(r, NULL, &oid,
     @@ sequencer.c: static int do_pick_commit(struct repository *r,
       	} else if (allow == 2) {
       		drop_commit = 1;
      -		unlink(git_path_cherry_pick_head(r));
     -+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
     -+				      NULL);
     ++		refs_delete_ref(get_main_ref_store(r), "", "CHERRY_PICK_HEAD",
     ++				NULL, 0);
       		unlink(git_path_merge_msg(r));
       		fprintf(stderr,
       			_("dropping %s %s -- patch contents already upstream\n"),
     @@ sequencer.c: void sequencer_post_commit_cleanup(struct repository *r, int verbos
      -	if (file_exists(git_path_cherry_pick_head(r))) {
      -		if (!unlink(git_path_cherry_pick_head(r)) && verbose)
      +	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
     -+		if (!refs_delete_pseudoref(get_main_ref_store(r),
     -+					   "CHERRY_PICK_HEAD", NULL) &&
     ++		if (!refs_delete_ref(get_main_ref_store(r), "",
     ++				     "CHERRY_PICK_HEAD", NULL, 0) &&
      +		    verbose)
       			warning(_("cancelling a cherry picking in progress"));
       		opts.action = REPLAY_PICK;
     @@ sequencer.c: static int do_merge(struct repository *r,
       
       		strbuf_release(&ref_name);
      -		unlink(git_path_cherry_pick_head(r));
     -+		refs_delete_pseudoref(get_main_ref_store(r), "CHERRY_PICK_HEAD",
     -+				      NULL);
     ++		refs_delete_ref(get_main_ref_store(r), "", "CHERRY_PICK_HEAD",
     ++				NULL, 0);
       		rollback_lock_file(&lock);
       
       		rollback_lock_file(&lock);
     @@ sequencer.c: static int commit_staged_changes(struct repository *r,
      -		if (file_exists(cherry_pick_head) && unlink(cherry_pick_head))
      +		if (refs_ref_exists(get_main_ref_store(r),
      +				    "CHERRY_PICK_HEAD") &&
     -+		    refs_delete_pseudoref(get_main_ref_store(r),
     -+					  "CHERRY_PICK_HEAD", NULL))
     ++		    refs_delete_ref(get_main_ref_store(r), "",
     ++				    "CHERRY_PICK_HEAD", NULL, 0))
       			return error(_("could not remove CHERRY_PICK_HEAD"));
       		if (!final_fixup)
       			return 0;
  8:  0a5f97ea34 !  8:  224f7bf224 Treat REVERT_HEAD as a pseudo ref
     @@ sequencer.c: void sequencer_post_commit_cleanup(struct repository *r, int verbos
      -	if (file_exists(git_path_revert_head(r))) {
      -		if (!unlink(git_path_revert_head(r)) && verbose)
      +	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
     -+		if (!refs_delete_pseudoref(get_main_ref_store(r), "REVERT_HEAD",
     -+					   NULL) &&
     ++		if (!refs_delete_ref(get_main_ref_store(r), "", "REVERT_HEAD",
     ++				     NULL, 0) &&
      +		    verbose)
       			warning(_("cancelling a revert in progress"));
       		opts.action = REPLAY_REVERT;
  9:  9463ed9093 =  9:  4b601b545c Move REF_LOG_ONLY to refs-internal.h
 10:  a116aebe11 ! 10:  ea8a78374f Iterate over the "refs/" namespace in for_each_[raw]ref
     @@ Metadata
      Author: Han-Wen Nienhuys <hanwen@google.com>
      
       ## Commit message ##
     -    Iterate over the "refs/" namespace in for_each_[raw]ref
     +    Iteration  over entire ref namespace is iterating over "refs/"
     +
     +    This changes behavior for in for_each_[raw]ref and
     +    for_each_fullref_in_pattern.
      
          This happens implicitly in the files/packed ref backend; making it
          explicit simplifies adding alternate ref storage backends, such as
     @@ Commit message
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
     + ## ref-filter.c ##
     +@@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
     + {
     + 	struct string_list prefixes = STRING_LIST_INIT_DUP;
     + 	struct string_list_item *prefix;
     ++	const char *all = "refs/";
     + 	int ret;
     + 
     + 	if (!filter->match_as_path) {
     +@@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
     + 		 * prefixes like "refs/heads/" etc. are stripped off,
     + 		 * so we have to look at everything:
     + 		 */
     +-		return for_each_fullref_in("", cb, cb_data, broken);
     ++		return for_each_fullref_in(all, cb, cb_data, broken);
     + 	}
     + 
     + 	if (filter->ignore_case) {
     +@@ ref-filter.c: static int for_each_fullref_in_pattern(struct ref_filter *filter,
     + 		 * so just return everything and let the caller
     + 		 * sort it out.
     + 		 */
     +-		return for_each_fullref_in("", cb, cb_data, broken);
     ++		return for_each_fullref_in(all, cb, cb_data, broken);
     + 	}
     + 
     + 	if (!filter->name_patterns[0]) {
     + 		/* no patterns; we have to look at everything */
     +-		return for_each_fullref_in("", cb, cb_data, broken);
     ++		return for_each_fullref_in(all, cb, cb_data, broken);
     + 	}
     + 
     + 	find_longest_prefixes(&prefixes, filter->name_patterns);
     +
       ## refs.c ##
      @@ refs.c: static int do_for_each_ref(struct ref_store *refs, const char *prefix,
       
 11:  e4545658ed = 11:  0606b2a1c8 Add .gitattributes for the reftable/ directory
 12:  169f6c7f54 = 12:  12d98125c2 Add reftable library
 13:  d155240b16 = 13:  70ce4ce49d Add standalone build infrastructure for reftable
 14:  073bff7279 ! 14:  7af3b2340e Reftable support for git-core
     @@ builtin/init-db.c: int init_db(const char *git_dir, const char *real_git_dir,
       		git_config_set("receive.denyNonFastforwards", "true");
       	}
       
     -+	git_config_set("extensions.refStorage", ref_storage_format);
     ++	if (!strcmp(ref_storage_format, "reftable"))
     ++		git_config_set("extensions.refStorage", ref_storage_format);
      +
       	if (!(flags & INIT_DB_QUIET)) {
       		int len = strlen(git_dir);
     @@ cache.h: int path_inside_repo(const char *prefix, const char *path);
       void sanitize_stdfds(void);
       int daemonize(void);
      @@ cache.h: struct repository_format {
     + 	int is_bare;
       	int hash_algo;
     - 	int has_extensions;
       	char *work_tree;
      +	char *ref_storage;
       	struct string_list unknown_extensions;
     + 	struct string_list v1_only_extensions;
       };
     - 
      
       ## refs.c ##
      @@
     @@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
      -	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
      +	r->refs_private = ref_store_init(r->gitdir,
      +					 r->ref_storage_format ?
     -+						       r->ref_storage_format :
     -+						       default_ref_storage(),
     ++						 r->ref_storage_format :
     ++						 default_ref_storage(),
      +					 REF_STORE_ALL_CAPS);
       	return r->refs_private;
       }
     @@ refs/reftable-backend.c (new)
      +	struct reftable_addition *add = NULL;
      +	struct reftable_stack *stack =
      +		transaction->nr ?
     -+			      stack_for(refs, transaction->updates[0]->refname) :
     -+			      refs->main_stack;
     ++			stack_for(refs, transaction->updates[0]->refname) :
     ++			refs->main_stack;
      +	int err = refs->err;
      +	if (err < 0) {
      +		goto done;
     @@ refs/reftable-backend.c (new)
      +	return git_reftable_transaction_finish(ref_store, transaction, errmsg);
      +}
      +
     -+struct write_pseudoref_arg {
     -+	struct reftable_stack *stack;
     -+	const char *pseudoref;
     -+	const struct object_id *new_oid;
     -+	const struct object_id *old_oid;
     -+};
     -+
     -+static int write_pseudoref_table(struct reftable_writer *writer, void *argv)
     -+{
     -+	struct write_pseudoref_arg *arg = (struct write_pseudoref_arg *)argv;
     -+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
     -+	int err = 0;
     -+	struct reftable_ref_record read_ref = { NULL };
     -+	struct reftable_ref_record write_ref = { NULL };
     -+
     -+	reftable_writer_set_limits(writer, ts, ts);
     -+	if (arg->old_oid) {
     -+		struct object_id read_oid;
     -+		err = reftable_stack_read_ref(arg->stack, arg->pseudoref,
     -+					      &read_ref);
     -+		if (err < 0)
     -+			goto done;
     -+
     -+		if ((err > 0) != is_null_oid(arg->old_oid)) {
     -+			err = REFTABLE_LOCK_ERROR;
     -+			goto done;
     -+		}
     -+
     -+		/* XXX If old_oid is set, and we have a symref? */
     -+
     -+		if (err == 0 && read_ref.value == NULL) {
     -+			err = REFTABLE_LOCK_ERROR;
     -+			goto done;
     -+		}
     -+
     -+		hashcpy(read_oid.hash, read_ref.value);
     -+		if (!oideq(arg->old_oid, &read_oid)) {
     -+			err = REFTABLE_LOCK_ERROR;
     -+			goto done;
     -+		}
     -+	}
     -+
     -+	write_ref.refname = (char *)arg->pseudoref;
     -+	write_ref.update_index = ts;
     -+	if (!is_null_oid(arg->new_oid))
     -+		write_ref.value = (uint8_t *)arg->new_oid->hash;
     -+
     -+	err = reftable_writer_add_ref(writer, &write_ref);
     -+done:
     -+	assert(err != REFTABLE_API_ERROR);
     -+	reftable_ref_record_clear(&read_ref);
     -+	return err;
     -+}
     -+
     -+static int git_reftable_write_pseudoref(struct ref_store *ref_store,
     -+					const char *pseudoref,
     -+					const struct object_id *oid,
     -+					const struct object_id *old_oid,
     -+					struct strbuf *errbuf)
     -+{
     -+	struct git_reftable_ref_store *refs =
     -+		(struct git_reftable_ref_store *)ref_store;
     -+	struct reftable_stack *stack = stack_for(refs, pseudoref);
     -+	struct write_pseudoref_arg arg = {
     -+		.stack = stack,
     -+		.pseudoref = pseudoref,
     -+		.new_oid = oid,
     -+	};
     -+	struct reftable_addition *add = NULL;
     -+	int err = refs->err;
     -+	if (err < 0) {
     -+		goto done;
     -+	}
     -+
     -+	err = reftable_stack_reload(stack);
     -+	if (err) {
     -+		goto done;
     -+	}
     -+	err = reftable_stack_new_addition(&add, stack);
     -+	if (err) {
     -+		goto done;
     -+	}
     -+	if (old_oid) {
     -+		struct object_id actual_old_oid;
     -+
     -+		/* XXX this is cut & paste from files-backend - should factor
     -+		 * out? */
     -+		if (read_ref(pseudoref, &actual_old_oid)) {
     -+			if (!is_null_oid(old_oid)) {
     -+				strbuf_addf(errbuf,
     -+					    _("could not read ref '%s'"),
     -+					    pseudoref);
     -+				goto done;
     -+			}
     -+		} else if (is_null_oid(old_oid)) {
     -+			strbuf_addf(errbuf, _("ref '%s' already exists"),
     -+				    pseudoref);
     -+			goto done;
     -+		} else if (!oideq(&actual_old_oid, old_oid)) {
     -+			strbuf_addf(errbuf,
     -+				    _("unexpected object ID when writing '%s'"),
     -+				    pseudoref);
     -+			goto done;
     -+		}
     -+	}
     -+
     -+	err = reftable_addition_add(add, &write_pseudoref_table, &arg);
     -+	if (err < 0) {
     -+		strbuf_addf(errbuf, "reftable: pseudoref update failure: %s",
     -+			    reftable_error_str(err));
     -+	}
     -+
     -+	err = reftable_addition_commit(add);
     -+	if (err < 0) {
     -+		strbuf_addf(errbuf, "reftable: pseudoref commit failure: %s",
     -+			    reftable_error_str(err));
     -+	}
     -+
     -+done:
     -+	assert(err != REFTABLE_API_ERROR);
     -+	reftable_addition_destroy(add);
     -+	return err;
     -+}
     -+
     -+static int reftable_delete_pseudoref(struct ref_store *ref_store,
     -+				     const char *pseudoref,
     -+				     const struct object_id *old_oid)
     -+{
     -+	struct strbuf errbuf = STRBUF_INIT;
     -+	int ret = git_reftable_write_pseudoref(ref_store, pseudoref, &null_oid,
     -+					       old_oid, &errbuf);
     -+	/* XXX what to do with the error message? */
     -+	strbuf_release(&errbuf);
     -+	return ret;
     -+}
     -+
      +struct write_delete_refs_arg {
      +	struct reftable_stack *stack;
      +	struct string_list *refnames;
     @@ refs/reftable-backend.c (new)
      +static int git_reftable_reflog_exists(struct ref_store *ref_store,
      +				      const char *refname)
      +{
     -+	/* always exists. */
     -+	return 1;
     ++	struct reftable_iterator it = { NULL };
     ++	struct git_reftable_ref_store *refs =
     ++		(struct git_reftable_ref_store *)ref_store;
     ++	struct reftable_stack *stack = stack_for(refs, refname);
     ++	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
     ++	struct reftable_log_record log = { NULL };
     ++	int err = refs->err;
     ++
     ++	if (err < 0) {
     ++		goto done;
     ++	}
     ++	err = reftable_merged_table_seek_log(mt, &it, refname);
     ++	if (err) {
     ++		goto done;
     ++	}
     ++	err = reftable_iterator_next_log(&it, &log);
     ++	if (err) {
     ++		goto done;
     ++	}
     ++
     ++	if (strcmp(log.refname, refname)) {
     ++		err = 1;
     ++	}
     ++
     ++done:
     ++	reftable_iterator_destroy(&it);
     ++	reftable_log_record_clear(&log);
     ++	return !err;
      +}
      +
      +static int git_reftable_create_reflog(struct ref_store *ref_store,
     @@ refs/reftable-backend.c (new)
      +	git_reftable_rename_ref,
      +	git_reftable_copy_ref,
      +
     -+	git_reftable_write_pseudoref,
     -+	reftable_delete_pseudoref,
     -+
      +	git_reftable_ref_iterator_begin,
      +	git_reftable_read_raw_ref,
      +
     @@ repository.h: struct repository {
       	 */
      
       ## setup.c ##
     -@@ setup.c: static int check_repo_format(const char *var, const char *value, void *vdata)
     - 			if (!value)
     - 				return config_error_nonbool(var);
     - 			data->partial_clone = xstrdup(value);
     --		} else if (!strcmp(ext, "worktreeconfig"))
     -+		} else if (!strcmp(ext, "worktreeconfig")) {
     - 			data->worktree_config = git_config_bool(var, value);
     --		else
     -+		} else if (!strcmp(ext, "refstorage")) {
     -+			data->ref_storage = xstrdup(value);
     -+		} else
     - 			string_list_append(&data->unknown_extensions, ext);
     +@@ setup.c: static enum extension_result handle_extension(const char *var,
     + {
     + 	if (!strcmp(ext, "noop-v1")) {
     + 		return EXTENSION_OK;
     ++	} else if (!strcmp(ext, "refstorage")) {
     ++		data->ref_storage = xstrdup(value);
     ++		return EXTENSION_OK;
       	}
     +-
     + 	return EXTENSION_UNKNOWN;
     + }
       
      @@ setup.c: void clear_repository_format(struct repository_format *format)
     - 	string_list_clear(&format->unknown_extensions, 0);
     + 	string_list_clear(&format->v1_only_extensions, 0);
       	free(format->work_tree);
       	free(format->partial_clone);
      +	free(format->ref_storage);
  -:  ---------- > 15:  a6449d4346 Read FETCH_HEAD as loose ref
 15:  04c86e7395 ! 16:  f13e4798c4 Hookup unittests for the reftable library.
     @@ Makefile: t/helper/test-svn-fe$X: $(VCSSVN_LIB)
       	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS)
       
       check-sha1:: t/helper/test-tool$X
     +@@ Makefile: cocciclean:
     + clean: profile-clean coverage-clean cocciclean
     + 	$(RM) *.res
     + 	$(RM) $(OBJECTS)
     +-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
     ++	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB) $(REFTABLE_TEST_LIB)
     + 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
     + 	$(RM) $(TEST_PROGRAMS)
     + 	$(RM) $(FUZZ_PROGRAMS)
      
       ## t/helper/test-reftable.c (new) ##
      @@
 16:  c751265705 ! 17:  679dc37871 Add GIT_DEBUG_REFS debugging mechanism
     @@ builtin/worktree.c: static int add_worktree(const char *path, const char *refnam
       	strbuf_reset(&sb);
      -	if (get_main_ref_store(the_repository)->be == &refs_be_reftable) {
      +
     -+	/* XXX: check GIT_TEST_REFTABLE because GIT_DEBUG_REFS obscures the
     -+	 * instance type. */
     -+	if (get_main_ref_store(the_repository)->be == &refs_be_reftable ||
     -+	    getenv("GIT_TEST_REFTABLE") != NULL) {
     ++	if (!strcmp(ref_store_backend_name(get_main_ref_store(the_repository)),
     ++		    "reftable")) {
       		/* XXX this is cut & paste from reftable_init_db. */
       		strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
       		write_file(sb.buf, "%s", "ref: refs/heads/.invalid\n");
      
       ## refs.c ##
     +@@ refs.c: int ref_storage_backend_exists(const char *name)
     + 	return find_ref_storage_backend(name) != NULL;
     + }
     + 
     ++const char *ref_store_backend_name(struct ref_store *refs)
     ++{
     ++	return refs->be->name;
     ++}
     ++
     + /*
     +  * How to handle various characters in refnames:
     +  * 0: An acceptable character for refs
      @@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
     - 						       r->ref_storage_format :
     - 						       default_ref_storage(),
     + 						 r->ref_storage_format :
     + 						 default_ref_storage(),
       					 REF_STORE_ALL_CAPS);
      +	if (getenv("GIT_DEBUG_REFS")) {
      +		r->refs_private = debug_wrap(r->gitdir, r->refs_private);
     @@ refs.c: struct ref_store *get_main_ref_store(struct repository *r)
       }
       
      
     + ## refs.h ##
     +@@ refs.h: int reflog_expire(const char *refname, const struct object_id *oid,
     + int ref_storage_backend_exists(const char *name);
     + 
     + struct ref_store *get_main_ref_store(struct repository *r);
     ++const char *ref_store_backend_name(struct ref_store *refs);
     + 
     + /**
     +  * Submodules
     +
       ## refs/debug.c (new) ##
      @@
      +
     @@ refs/debug.c (new)
      +struct ref_store *debug_wrap(const char *gitdir, struct ref_store *store)
      +{
      +	struct debug_ref_store *res = malloc(sizeof(struct debug_ref_store));
     ++	struct ref_storage_be *be_copy = malloc(sizeof(*be_copy));
     ++	*be_copy = refs_be_debug;
     ++	be_copy->name = ref_store_backend_name(store);
      +	fprintf(stderr, "ref_store for %s\n", gitdir);
      +	res->refs = store;
     -+	base_ref_store_init((struct ref_store *)res, &refs_be_debug);
     ++	base_ref_store_init((struct ref_store *)res, be_copy);
      +	return (struct ref_store *)res;
      +}
      +
     @@ refs/debug.c (new)
      +	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
      +	int res =
      +		drefs->refs->be->delete_refs(drefs->refs, msg, refnames, flags);
     ++	int i = 0;
     ++	fprintf(stderr, "delete_refs {\n");
     ++	for (i = 0; i < refnames->nr; i++)
     ++		fprintf(stderr, "%s\n", refnames->items[i].string);
     ++	fprintf(stderr, "}: %d\n", res);
      +	return res;
      +}
      +
     @@ refs/debug.c (new)
      +	return res;
      +}
      +
     -+static int debug_write_pseudoref(struct ref_store *ref_store,
     -+				 const char *pseudoref,
     -+				 const struct object_id *oid,
     -+				 const struct object_id *old_oid,
     -+				 struct strbuf *err)
     -+{
     -+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
     -+	int res = drefs->refs->be->write_pseudoref(drefs->refs, pseudoref, oid,
     -+						   old_oid, err);
     -+	char o[100] = "null";
     -+	char n[100] = "null";
     -+	if (oid)
     -+		oid_to_hex_r(o, oid);
     -+	if (old_oid)
     -+		oid_to_hex_r(n, old_oid);
     -+	fprintf(stderr, "write_pseudoref: %s, %s => %s, err %s: %d\n",
     -+		pseudoref, o, n, err->buf, res);
     -+	return res;
     -+}
     -+
     -+static int debug_delete_pseudoref(struct ref_store *ref_store,
     -+				  const char *pseudoref,
     -+				  const struct object_id *old_oid)
     -+{
     -+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
     -+	int res = drefs->refs->be->delete_pseudoref(drefs->refs, pseudoref,
     -+						    old_oid);
     -+	char hex[100] = "null";
     -+	if (old_oid)
     -+		oid_to_hex_r(hex, old_oid);
     -+	fprintf(stderr, "delete_pseudoref: %s (%s): %d\n", pseudoref, hex, res);
     -+	return res;
     -+}
     -+
      +struct debug_ref_iterator {
      +	struct ref_iterator base;
      +	struct ref_iterator *iter;
     @@ refs/debug.c (new)
      +	debug_rename_ref,
      +	debug_copy_ref,
      +
     -+	debug_write_pseudoref,
     -+	debug_delete_pseudoref,
     -+
      +	debug_ref_iterator_begin,
      +	debug_read_raw_ref,
      +
     @@ refs/debug.c (new)
      +};
      
       ## refs/refs-internal.h ##
     -@@ refs/refs-internal.h: struct ref_store {
     +@@ refs/refs-internal.h: int parse_loose_ref_contents(const char *buf, struct object_id *oid,
       void base_ref_store_init(struct ref_store *refs,
       			 const struct ref_storage_be *be);
       
 17:  ef0dd45f07 = 18:  9e6cdbe204 vcxproj: adjust for the reftable changes
 18:  d4dc1deae5 = 19:  9cd73da381 git-prompt: prepare for reftable refs backend
 19:  5a85abf9c4 = 20:  a2e5b3082e Add reftable testing infrastructure
 20:  0ebf7beb95 = 21:  539fa0935f Add "test-tool dump-reftable" command.

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 409+ messages in thread

* [PATCH v20 01/21] refs: add \t to reflog in the files backend
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:26                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:26                                       ` [PATCH v20 02/21] Split off reading loose ref data in separate function Han-Wen Nienhuys via GitGitGadget
                                                         ` (19 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:26 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

commit 523fa69c3 (Jul 10, 2020) "reflog: cleanse messages in the
refs.c layer" centralized reflog normalizaton.  However, the
normalizaton added a leading "\t" to the message. This is an artifact
of the reflog storage format in the files backend, so it should be
added there.

Routines that parse back the reflog (such as grab_nth_branch_switch)
expect the "\t" to not be in the message, so without this fix, git
with reftable cannot process the "@{-1}" syntax.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c               | 1 -
 refs/files-backend.c | 4 +++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index 89814c7be4..2dd851fe81 100644
--- a/refs.c
+++ b/refs.c
@@ -907,7 +907,6 @@ static void copy_reflog_msg(struct strbuf *sb, const char *msg)
 	char c;
 	int wasspace = 1;
 
-	strbuf_addch(sb, '\t');
 	while ((c = *msg++)) {
 		if (wasspace && isspace(c))
 			continue;
diff --git a/refs/files-backend.c b/refs/files-backend.c
index e0aba23eb2..985631f33e 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1628,8 +1628,10 @@ static int log_ref_write_fd(int fd, const struct object_id *old_oid,
 	int ret = 0;
 
 	strbuf_addf(&sb, "%s %s %s", oid_to_hex(old_oid), oid_to_hex(new_oid), committer);
-	if (msg && *msg)
+	if (msg && *msg) {
+		strbuf_addch(&sb, '\t');
 		strbuf_addstr(&sb, msg);
+	}
 	strbuf_addch(&sb, '\n');
 	if (write_in_full(fd, sb.buf, sb.len) < 0)
 		ret = -1;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 02/21] Split off reading loose ref data in separate function
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:26                                       ` [PATCH v20 01/21] refs: add \t to reflog in the files backend Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:26                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:26                                       ` [PATCH v20 03/21] t1400: use git rev-parse for testing PSEUDOREF existence Han-Wen Nienhuys via GitGitGadget
                                                         ` (18 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:26 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This prepares for handling FETCH_HEAD (which is not a regular ref)
separately in the reftable backend.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/files-backend.c | 34 +++++++++++++++++++---------------
 refs/refs-internal.h |  6 ++++++
 2 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 985631f33e..3a3573986f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -360,7 +360,6 @@ static int files_read_raw_ref(struct ref_store *ref_store,
 	struct strbuf sb_path = STRBUF_INIT;
 	const char *path;
 	const char *buf;
-	const char *p;
 	struct stat st;
 	int fd;
 	int ret = -1;
@@ -465,6 +464,21 @@ static int files_read_raw_ref(struct ref_store *ref_store,
 	close(fd);
 	strbuf_rtrim(&sb_contents);
 	buf = sb_contents.buf;
+
+	ret = parse_loose_ref_contents(buf, oid, referent, type);
+
+out:
+	save_errno = errno;
+	strbuf_release(&sb_path);
+	strbuf_release(&sb_contents);
+	errno = save_errno;
+	return ret;
+}
+
+int parse_loose_ref_contents(const char *buf, struct object_id *oid,
+			     struct strbuf *referent, unsigned int *type)
+{
+	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
 		while (isspace(*buf))
 			buf++;
@@ -472,29 +486,19 @@ static int files_read_raw_ref(struct ref_store *ref_store,
 		strbuf_reset(referent);
 		strbuf_addstr(referent, buf);
 		*type |= REF_ISSYMREF;
-		ret = 0;
-		goto out;
+		return 0;
 	}
 
 	/*
-	 * Please note that FETCH_HEAD has additional
-	 * data after the sha.
+	 * FETCH_HEAD has additional data after the sha.
 	 */
 	if (parse_oid_hex(buf, oid, &p) ||
 	    (*p != '\0' && !isspace(*p))) {
 		*type |= REF_ISBROKEN;
 		errno = EINVAL;
-		goto out;
+		return -1;
 	}
-
-	ret = 0;
-
-out:
-	save_errno = errno;
-	strbuf_release(&sb_path);
-	strbuf_release(&sb_contents);
-	errno = save_errno;
-	return ret;
+	return 0;
 }
 
 static void unlock_ref(struct ref_lock *lock)
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 357359a0be..24d79fb5c1 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -674,6 +674,12 @@ struct ref_store {
 	const struct ref_storage_be *be;
 };
 
+/*
+ * Parse contents of a loose ref file.
+ */
+int parse_loose_ref_contents(const char *buf, struct object_id *oid,
+			     struct strbuf *referent, unsigned int *type);
+
 /*
  * Fill in the generic part of refs and add it to our collection of
  * reference stores.
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 03/21] t1400: use git rev-parse for testing PSEUDOREF existence
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:26                                       ` [PATCH v20 01/21] refs: add \t to reflog in the files backend Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:26                                       ` [PATCH v20 02/21] Split off reading loose ref data in separate function Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:26                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 04/21] Modify pseudo refs through ref backend storage Han-Wen Nienhuys via GitGitGadget
                                                         ` (17 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:26 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This will allow these tests to run with alternative ref backends

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/t1400-update-ref.sh | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index 27171f8261..7414b898f8 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -475,57 +475,57 @@ test_expect_success 'git cat-file blob master@{2005-05-26 23:42}:F (expect OTHER
 
 test_expect_success 'given old value for missing pseudoref, do not create' '
 	test_must_fail git update-ref PSEUDOREF $A $B 2>err &&
-	test_path_is_missing .git/PSEUDOREF &&
+	test_must_fail git rev-parse PSEUDOREF &&
 	test_i18ngrep "could not read ref" err
 '
 
 test_expect_success 'create pseudoref' '
 	git update-ref PSEUDOREF $A &&
-	test $A = $(cat .git/PSEUDOREF)
+	test $A = $(git rev-parse PSEUDOREF)
 '
 
 test_expect_success 'overwrite pseudoref with no old value given' '
 	git update-ref PSEUDOREF $B &&
-	test $B = $(cat .git/PSEUDOREF)
+	test $B = $(git rev-parse PSEUDOREF)
 '
 
 test_expect_success 'overwrite pseudoref with correct old value' '
 	git update-ref PSEUDOREF $C $B &&
-	test $C = $(cat .git/PSEUDOREF)
+	test $C = $(git rev-parse PSEUDOREF)
 '
 
 test_expect_success 'do not overwrite pseudoref with wrong old value' '
 	test_must_fail git update-ref PSEUDOREF $D $E 2>err &&
-	test $C = $(cat .git/PSEUDOREF) &&
+	test $C = $(git rev-parse PSEUDOREF) &&
 	test_i18ngrep "unexpected object ID" err
 '
 
 test_expect_success 'delete pseudoref' '
 	git update-ref -d PSEUDOREF &&
-	test_path_is_missing .git/PSEUDOREF
+	test_must_fail git rev-parse PSEUDOREF
 '
 
 test_expect_success 'do not delete pseudoref with wrong old value' '
 	git update-ref PSEUDOREF $A &&
 	test_must_fail git update-ref -d PSEUDOREF $B 2>err &&
-	test $A = $(cat .git/PSEUDOREF) &&
+	test $A = $(git rev-parse PSEUDOREF) &&
 	test_i18ngrep "unexpected object ID" err
 '
 
 test_expect_success 'delete pseudoref with correct old value' '
 	git update-ref -d PSEUDOREF $A &&
-	test_path_is_missing .git/PSEUDOREF
+	test_must_fail git rev-parse PSEUDOREF
 '
 
 test_expect_success 'create pseudoref with old OID zero' '
 	git update-ref PSEUDOREF $A $Z &&
-	test $A = $(cat .git/PSEUDOREF)
+	test $A = $(git rev-parse PSEUDOREF)
 '
 
 test_expect_success 'do not overwrite pseudoref with old OID zero' '
 	test_when_finished git update-ref -d PSEUDOREF &&
 	test_must_fail git update-ref PSEUDOREF $B $Z 2>err &&
-	test $A = $(cat .git/PSEUDOREF) &&
+	test $A = $(git rev-parse PSEUDOREF) &&
 	test_i18ngrep "already exists" err
 '
 
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 04/21] Modify pseudo refs through ref backend storage
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (2 preceding siblings ...)
  2020-07-31 15:26                                       ` [PATCH v20 03/21] t1400: use git rev-parse for testing PSEUDOREF existence Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 05/21] Make HEAD a PSEUDOREF rather than PER_WORKTREE Han-Wen Nienhuys via GitGitGadget
                                                         ` (16 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The previous behavior was introduced in commit 74ec19d4be
("pseudorefs: create and use pseudoref update and delete functions",
Jul 31, 2015), with the justification "alternate ref backends still
need to store pseudorefs in GIT_DIR".

Refs such as REBASE_HEAD are read through the ref backend. This can
only work consistently if they are written through the ref backend as
well. Tooling that works directly on files under .git should be
updated to use git commands to read refs instead.

The following behaviors change:

* Updates to pseudorefs (eg. ORIG_HEAD) with
  core.logAllRefUpdates=always will create reflogs for the pseudoref.

* non-HEAD pseudoref symrefs are also dereferenced on deletion. Update
  t1405 accordingly.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Documentation/git-update-ref.txt |  13 ++--
 refs.c                           | 120 ++-----------------------------
 t/t1400-update-ref.sh            |  10 +--
 t/t1405-main-ref-store.sh        |   5 +-
 4 files changed, 23 insertions(+), 125 deletions(-)

diff --git a/Documentation/git-update-ref.txt b/Documentation/git-update-ref.txt
index 3e737c2360..d401234b03 100644
--- a/Documentation/git-update-ref.txt
+++ b/Documentation/git-update-ref.txt
@@ -148,12 +148,13 @@ still see a subset of the modifications.
 
 LOGGING UPDATES
 ---------------
-If config parameter "core.logAllRefUpdates" is true and the ref is one under
-"refs/heads/", "refs/remotes/", "refs/notes/", or the symbolic ref HEAD; or
-the file "$GIT_DIR/logs/<ref>" exists then `git update-ref` will append
-a line to the log file "$GIT_DIR/logs/<ref>" (dereferencing all
-symbolic refs before creating the log name) describing the change
-in ref value.  Log lines are formatted as:
+If config parameter "core.logAllRefUpdates" is true and the ref is one
+under "refs/heads/", "refs/remotes/", "refs/notes/", or a pseudoref
+like HEAD or ORIG_HEAD; or the file "$GIT_DIR/logs/<ref>" exists then
+`git update-ref` will append a line to the log file
+"$GIT_DIR/logs/<ref>" (dereferencing all symbolic refs before creating
+the log name) describing the change in ref value.  Log lines are
+formatted as:
 
     oldsha1 SP newsha1 SP committer LF
 
diff --git a/refs.c b/refs.c
index 2dd851fe81..5c25821ad5 100644
--- a/refs.c
+++ b/refs.c
@@ -771,102 +771,6 @@ long get_files_ref_lock_timeout_ms(void)
 	return timeout_ms;
 }
 
-static int write_pseudoref(const char *pseudoref, const struct object_id *oid,
-			   const struct object_id *old_oid, struct strbuf *err)
-{
-	const char *filename;
-	int fd;
-	struct lock_file lock = LOCK_INIT;
-	struct strbuf buf = STRBUF_INIT;
-	int ret = -1;
-
-	if (!oid)
-		return 0;
-
-	strbuf_addf(&buf, "%s\n", oid_to_hex(oid));
-
-	filename = git_path("%s", pseudoref);
-	fd = hold_lock_file_for_update_timeout(&lock, filename, 0,
-					       get_files_ref_lock_timeout_ms());
-	if (fd < 0) {
-		strbuf_addf(err, _("could not open '%s' for writing: %s"),
-			    filename, strerror(errno));
-		goto done;
-	}
-
-	if (old_oid) {
-		struct object_id actual_old_oid;
-
-		if (read_ref(pseudoref, &actual_old_oid)) {
-			if (!is_null_oid(old_oid)) {
-				strbuf_addf(err, _("could not read ref '%s'"),
-					    pseudoref);
-				rollback_lock_file(&lock);
-				goto done;
-			}
-		} else if (is_null_oid(old_oid)) {
-			strbuf_addf(err, _("ref '%s' already exists"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		} else if (!oideq(&actual_old_oid, old_oid)) {
-			strbuf_addf(err, _("unexpected object ID when writing '%s'"),
-				    pseudoref);
-			rollback_lock_file(&lock);
-			goto done;
-		}
-	}
-
-	if (write_in_full(fd, buf.buf, buf.len) < 0) {
-		strbuf_addf(err, _("could not write to '%s'"), filename);
-		rollback_lock_file(&lock);
-		goto done;
-	}
-
-	commit_lock_file(&lock);
-	ret = 0;
-done:
-	strbuf_release(&buf);
-	return ret;
-}
-
-static int delete_pseudoref(const char *pseudoref, const struct object_id *old_oid)
-{
-	const char *filename;
-
-	filename = git_path("%s", pseudoref);
-
-	if (old_oid && !is_null_oid(old_oid)) {
-		struct lock_file lock = LOCK_INIT;
-		int fd;
-		struct object_id actual_old_oid;
-
-		fd = hold_lock_file_for_update_timeout(
-				&lock, filename, 0,
-				get_files_ref_lock_timeout_ms());
-		if (fd < 0) {
-			error_errno(_("could not open '%s' for writing"),
-				    filename);
-			return -1;
-		}
-		if (read_ref(pseudoref, &actual_old_oid))
-			die(_("could not read ref '%s'"), pseudoref);
-		if (!oideq(&actual_old_oid, old_oid)) {
-			error(_("unexpected object ID when deleting '%s'"),
-			      pseudoref);
-			rollback_lock_file(&lock);
-			return -1;
-		}
-
-		unlink(filename);
-		rollback_lock_file(&lock);
-	} else {
-		unlink(filename);
-	}
-
-	return 0;
-}
-
 int refs_delete_ref(struct ref_store *refs, const char *msg,
 		    const char *refname,
 		    const struct object_id *old_oid,
@@ -875,11 +779,6 @@ int refs_delete_ref(struct ref_store *refs, const char *msg,
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
 
-	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
-		assert(refs == get_main_ref_store(the_repository));
-		return delete_pseudoref(refname, old_oid);
-	}
-
 	transaction = ref_store_transaction_begin(refs, &err);
 	if (!transaction ||
 	    ref_transaction_delete(transaction, refname, old_oid,
@@ -1210,18 +1109,13 @@ int refs_update_ref(struct ref_store *refs, const char *msg,
 	struct strbuf err = STRBUF_INIT;
 	int ret = 0;
 
-	if (ref_type(refname) == REF_TYPE_PSEUDOREF) {
-		assert(refs == get_main_ref_store(the_repository));
-		ret = write_pseudoref(refname, new_oid, old_oid, &err);
-	} else {
-		t = ref_store_transaction_begin(refs, &err);
-		if (!t ||
-		    ref_transaction_update(t, refname, new_oid, old_oid,
-					   flags, msg, &err) ||
-		    ref_transaction_commit(t, &err)) {
-			ret = 1;
-			ref_transaction_free(t);
-		}
+	t = ref_store_transaction_begin(refs, &err);
+	if (!t ||
+	    ref_transaction_update(t, refname, new_oid, old_oid, flags, msg,
+				   &err) ||
+	    ref_transaction_commit(t, &err)) {
+		ret = 1;
+		ref_transaction_free(t);
 	}
 	if (ret) {
 		const char *str = _("update_ref failed for ref '%s': %s");
diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index 7414b898f8..65d349fb33 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -160,10 +160,10 @@ test_expect_success 'core.logAllRefUpdates=always creates reflog by default' '
 	git reflog exists $outside
 '
 
-test_expect_success 'core.logAllRefUpdates=always creates no reflog for ORIG_HEAD' '
+test_expect_success 'core.logAllRefUpdates=always creates reflog for ORIG_HEAD' '
 	test_config core.logAllRefUpdates always &&
 	git update-ref ORIG_HEAD $A &&
-	test_must_fail git reflog exists ORIG_HEAD
+	git reflog exists ORIG_HEAD
 '
 
 test_expect_success '--no-create-reflog overrides core.logAllRefUpdates=always' '
@@ -476,7 +476,7 @@ test_expect_success 'git cat-file blob master@{2005-05-26 23:42}:F (expect OTHER
 test_expect_success 'given old value for missing pseudoref, do not create' '
 	test_must_fail git update-ref PSEUDOREF $A $B 2>err &&
 	test_must_fail git rev-parse PSEUDOREF &&
-	test_i18ngrep "could not read ref" err
+	test_i18ngrep "unable to resolve reference" err
 '
 
 test_expect_success 'create pseudoref' '
@@ -497,7 +497,7 @@ test_expect_success 'overwrite pseudoref with correct old value' '
 test_expect_success 'do not overwrite pseudoref with wrong old value' '
 	test_must_fail git update-ref PSEUDOREF $D $E 2>err &&
 	test $C = $(git rev-parse PSEUDOREF) &&
-	test_i18ngrep "unexpected object ID" err
+	test_i18ngrep "cannot lock ref.*expected" err
 '
 
 test_expect_success 'delete pseudoref' '
@@ -509,7 +509,7 @@ test_expect_success 'do not delete pseudoref with wrong old value' '
 	git update-ref PSEUDOREF $A &&
 	test_must_fail git update-ref -d PSEUDOREF $B 2>err &&
 	test $A = $(git rev-parse PSEUDOREF) &&
-	test_i18ngrep "unexpected object ID" err
+	test_i18ngrep "cannot lock ref.*expected" err
 '
 
 test_expect_success 'delete pseudoref with correct old value' '
diff --git a/t/t1405-main-ref-store.sh b/t/t1405-main-ref-store.sh
index 331899ddc4..74af927fba 100755
--- a/t/t1405-main-ref-store.sh
+++ b/t/t1405-main-ref-store.sh
@@ -31,7 +31,10 @@ test_expect_success 'create_symref(FOO, refs/heads/master)' '
 test_expect_success 'delete_refs(FOO, refs/tags/new-tag)' '
 	git rev-parse FOO -- &&
 	git rev-parse refs/tags/new-tag -- &&
-	$RUN delete-refs 0 nothing FOO refs/tags/new-tag &&
+	m=$(git rev-parse master) &&
+	REF_NO_DEREF=1 &&
+	$RUN delete-refs $REF_NO_DEREF nothing FOO refs/tags/new-tag &&
+	test_must_fail git rev-parse --symbolic-full-name FOO &&
 	test_must_fail git rev-parse FOO -- &&
 	test_must_fail git rev-parse refs/tags/new-tag --
 '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 05/21] Make HEAD a PSEUDOREF rather than PER_WORKTREE.
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (3 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 04/21] Modify pseudo refs through ref backend storage Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 06/21] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
                                                         ` (15 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This is consistent with the definition of REF_TYPE_PSEUDOREF
(uppercase in the root ref namespace).

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/refs.c b/refs.c
index 5c25821ad5..947b51f781 100644
--- a/refs.c
+++ b/refs.c
@@ -708,10 +708,9 @@ int dwim_log(const char *str, int len, struct object_id *oid, char **log)
 
 static int is_per_worktree_ref(const char *refname)
 {
-	return !strcmp(refname, "HEAD") ||
-		starts_with(refname, "refs/worktree/") ||
-		starts_with(refname, "refs/bisect/") ||
-		starts_with(refname, "refs/rewritten/");
+	return starts_with(refname, "refs/worktree/") ||
+	       starts_with(refname, "refs/bisect/") ||
+	       starts_with(refname, "refs/rewritten/");
 }
 
 static int is_pseudoref_syntax(const char *refname)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 06/21] Make refs_ref_exists public
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (4 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 05/21] Make HEAD a PSEUDOREF rather than PER_WORKTREE Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 07/21] Treat CHERRY_PICK_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
                                                         ` (14 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs.c | 2 +-
 refs.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 947b51f781..5e65e79f24 100644
--- a/refs.c
+++ b/refs.c
@@ -313,7 +313,7 @@ int read_ref(const char *refname, struct object_id *oid)
 	return read_ref_full(refname, RESOLVE_REF_READING, oid, NULL);
 }
 
-static int refs_ref_exists(struct ref_store *refs, const char *refname)
+int refs_ref_exists(struct ref_store *refs, const char *refname)
 {
 	return !!refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING, NULL, NULL);
 }
diff --git a/refs.h b/refs.h
index f212f8945e..f8835f0aef 100644
--- a/refs.h
+++ b/refs.h
@@ -105,6 +105,8 @@ int refs_verify_refname_available(struct ref_store *refs,
 				  const struct string_list *skip,
 				  struct strbuf *err);
 
+int refs_ref_exists(struct ref_store *refs, const char *refname);
+
 int ref_exists(const char *refname);
 
 int should_autocreate_reflog(const char *refname);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 07/21] Treat CHERRY_PICK_HEAD as a pseudo ref
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (5 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 06/21] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 08/21] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
                                                         ` (13 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Check for existence and delete CHERRY_PICK_HEAD through ref functions.
This will help cherry-pick work with alternate ref storage backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 builtin/commit.c | 34 +++++++++++++++++++---------------
 builtin/merge.c  |  2 +-
 path.c           |  1 -
 path.h           |  7 ++++---
 sequencer.c      | 42 ++++++++++++++++++++++++++----------------
 wt-status.c      |  4 ++--
 6 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index d1b7396052..e27120b982 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -847,21 +847,25 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 			if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS &&
 				!merge_contains_scissors)
 				wt_status_add_cut_line(s->fp);
-			status_printf_ln(s, GIT_COLOR_NORMAL,
-			    whence == FROM_MERGE
-				? _("\n"
-					"It looks like you may be committing a merge.\n"
-					"If this is not correct, please remove the file\n"
-					"	%s\n"
-					"and try again.\n")
-				: _("\n"
-					"It looks like you may be committing a cherry-pick.\n"
-					"If this is not correct, please remove the file\n"
-					"	%s\n"
-					"and try again.\n"),
-				whence == FROM_MERGE ?
-					git_path_merge_head(the_repository) :
-					git_path_cherry_pick_head(the_repository));
+			if (whence == FROM_MERGE)
+				status_printf_ln(
+					s, GIT_COLOR_NORMAL,
+
+					_("\n"
+					  "It looks like you may be committing a merge.\n"
+					  "If this is not correct, please remove the file\n"
+					  "	%s\n"
+					  "and try again.\n"),
+					git_path_merge_head(the_repository));
+			else
+				status_printf_ln(
+					s, GIT_COLOR_NORMAL,
+
+					_("\n"
+					  "It looks like you may be committing a cherry-pick.\n"
+					  "If this is not correct, please run\n"
+					  "	git cherry-pick --abort\n"
+					  "and try again.\n"));
 		}
 
 		fprintf(s->fp, "\n");
diff --git a/builtin/merge.c b/builtin/merge.c
index 7da707bf55..93b0a7b6ed 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1352,7 +1352,7 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
 		else
 			die(_("You have not concluded your merge (MERGE_HEAD exists)."));
 	}
-	if (file_exists(git_path_cherry_pick_head(the_repository))) {
+	if (ref_exists("CHERRY_PICK_HEAD")) {
 		if (advice_resolve_conflict)
 			die(_("You have not concluded your cherry-pick (CHERRY_PICK_HEAD exists).\n"
 			    "Please, commit your changes before you merge."));
diff --git a/path.c b/path.c
index 8b2c753191..783cc2ae81 100644
--- a/path.c
+++ b/path.c
@@ -1528,7 +1528,6 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
-REPO_GIT_PATH_FUNC(cherry_pick_head, "CHERRY_PICK_HEAD")
 REPO_GIT_PATH_FUNC(revert_head, "REVERT_HEAD")
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
diff --git a/path.h b/path.h
index 1f1bf8f87a..8941c018a9 100644
--- a/path.h
+++ b/path.h
@@ -170,7 +170,6 @@ void report_linked_checkout_garbage(void);
 	}
 
 struct path_cache {
-	const char *cherry_pick_head;
 	const char *revert_head;
 	const char *squash_msg;
 	const char *merge_msg;
@@ -182,9 +181,11 @@ struct path_cache {
 	const char *shallow;
 };
 
-#define PATH_CACHE_INIT { NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL }
+#define PATH_CACHE_INIT                                              \
+	{                                                            \
+		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
+	}
 
-const char *git_path_cherry_pick_head(struct repository *r);
 const char *git_path_revert_head(struct repository *r);
 const char *git_path_squash_msg(struct repository *r);
 const char *git_path_merge_msg(struct repository *r);
diff --git a/sequencer.c b/sequencer.c
index fd7701c88a..601b1cffcd 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -381,7 +381,8 @@ static void print_advice(struct repository *r, int show_hint,
 		 * (typically rebase --interactive) wants to take care
 		 * of the commit itself so remove CHERRY_PICK_HEAD
 		 */
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_ref(get_main_ref_store(r), "", "CHERRY_PICK_HEAD",
+				NULL, 0);
 		return;
 	}
 
@@ -1455,7 +1456,8 @@ static int do_commit(struct repository *r,
 				    author, opts, flags, &oid);
 		strbuf_release(&sb);
 		if (!res) {
-			unlink(git_path_cherry_pick_head(r));
+			refs_delete_ref(get_main_ref_store(r), "",
+					"CHERRY_PICK_HEAD", NULL, 0);
 			unlink(git_path_merge_msg(r));
 			if (!is_rebase_i(opts))
 				print_commit_summary(r, NULL, &oid,
@@ -1966,7 +1968,8 @@ static int do_pick_commit(struct repository *r,
 		flags |= ALLOW_EMPTY;
 	} else if (allow == 2) {
 		drop_commit = 1;
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_ref(get_main_ref_store(r), "", "CHERRY_PICK_HEAD",
+				NULL, 0);
 		unlink(git_path_merge_msg(r));
 		fprintf(stderr,
 			_("dropping %s %s -- patch contents already upstream\n"),
@@ -2305,8 +2308,10 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int need_cleanup = 0;
 
-	if (file_exists(git_path_cherry_pick_head(r))) {
-		if (!unlink(git_path_cherry_pick_head(r)) && verbose)
+	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
+		if (!refs_delete_ref(get_main_ref_store(r), "",
+				     "CHERRY_PICK_HEAD", NULL, 0) &&
+		    verbose)
 			warning(_("cancelling a cherry picking in progress"));
 		opts.action = REPLAY_PICK;
 		need_cleanup = 1;
@@ -2671,8 +2676,9 @@ static int create_seq_dir(struct repository *r)
 	enum replay_action action;
 	const char *in_progress_error = NULL;
 	const char *in_progress_advice = NULL;
-	unsigned int advise_skip = file_exists(git_path_revert_head(r)) ||
-				file_exists(git_path_cherry_pick_head(r));
+	unsigned int advise_skip =
+		file_exists(git_path_revert_head(r)) ||
+		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
 
 	if (!sequencer_get_last_command(r, &action)) {
 		switch (action) {
@@ -2771,7 +2777,7 @@ static int rollback_single_pick(struct repository *r)
 {
 	struct object_id head_oid;
 
-	if (!file_exists(git_path_cherry_pick_head(r)) &&
+	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !file_exists(git_path_revert_head(r)))
 		return error(_("no cherry-pick or revert in progress"));
 	if (read_ref_full("HEAD", 0, &head_oid, NULL))
@@ -2874,7 +2880,8 @@ int sequencer_skip(struct repository *r, struct replay_opts *opts)
 		}
 		break;
 	case REPLAY_PICK:
-		if (!file_exists(git_path_cherry_pick_head(r))) {
+		if (!refs_ref_exists(get_main_ref_store(r),
+				     "CHERRY_PICK_HEAD")) {
 			if (action != REPLAY_PICK)
 				return error(_("no cherry-pick in progress"));
 			if (!rollback_is_safe())
@@ -3569,7 +3576,8 @@ static int do_merge(struct repository *r,
 					oid_to_hex(&j->item->object.oid));
 
 		strbuf_release(&ref_name);
-		unlink(git_path_cherry_pick_head(r));
+		refs_delete_ref(get_main_ref_store(r), "", "CHERRY_PICK_HEAD",
+				NULL, 0);
 		rollback_lock_file(&lock);
 
 		rollback_lock_file(&lock);
@@ -4201,7 +4209,7 @@ static int continue_single_pick(struct repository *r)
 {
 	const char *argv[] = { "commit", NULL };
 
-	if (!file_exists(git_path_cherry_pick_head(r)) &&
+	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !file_exists(git_path_revert_head(r)))
 		return error(_("no cherry-pick or revert in progress"));
 	return run_command_v_opt(argv, RUN_GIT_CMD);
@@ -4318,9 +4326,10 @@ static int commit_staged_changes(struct repository *r,
 	}
 
 	if (is_clean) {
-		const char *cherry_pick_head = git_path_cherry_pick_head(r);
-
-		if (file_exists(cherry_pick_head) && unlink(cherry_pick_head))
+		if (refs_ref_exists(get_main_ref_store(r),
+				    "CHERRY_PICK_HEAD") &&
+		    refs_delete_ref(get_main_ref_store(r), "",
+				    "CHERRY_PICK_HEAD", NULL, 0))
 			return error(_("could not remove CHERRY_PICK_HEAD"));
 		if (!final_fixup)
 			return 0;
@@ -4379,7 +4388,8 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 
 	if (!is_rebase_i(opts)) {
 		/* Verify that the conflict has been resolved */
-		if (file_exists(git_path_cherry_pick_head(r)) ||
+		if (refs_ref_exists(get_main_ref_store(r),
+				    "CHERRY_PICK_HEAD") ||
 		    file_exists(git_path_revert_head(r))) {
 			res = continue_single_pick(r);
 			if (res)
@@ -5442,7 +5452,7 @@ int todo_list_rearrange_squash(struct todo_list *todo_list)
 
 int sequencer_determine_whence(struct repository *r, enum commit_whence *whence)
 {
-	if (file_exists(git_path_cherry_pick_head(r))) {
+	if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD")) {
 		struct object_id cherry_pick_head, rebase_head;
 
 		if (file_exists(git_path_seq_dir()))
diff --git a/wt-status.c b/wt-status.c
index 20f2075868..8fddf65a4b 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1673,8 +1673,8 @@ void wt_status_get_state(struct repository *r,
 		state->merge_in_progress = 1;
 	} else if (wt_status_check_rebase(NULL, state)) {
 		;		/* all set */
-	} else if (!stat(git_path_cherry_pick_head(r), &st) &&
-			!get_oid("CHERRY_PICK_HEAD", &oid)) {
+	} else if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
+		   !get_oid("CHERRY_PICK_HEAD", &oid)) {
 		state->cherry_pick_in_progress = 1;
 		oidcpy(&state->cherry_pick_head_oid, &oid);
 	}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 08/21] Treat REVERT_HEAD as a pseudo ref
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (6 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 07/21] Treat CHERRY_PICK_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 09/21] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
                                                         ` (12 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 path.c      |  1 -
 path.h      |  8 +++-----
 sequencer.c | 16 +++++++++-------
 wt-status.c |  2 +-
 4 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/path.c b/path.c
index 783cc2ae81..7b385e5eb2 100644
--- a/path.c
+++ b/path.c
@@ -1528,7 +1528,6 @@ char *xdg_cache_home(const char *filename)
 	return NULL;
 }
 
-REPO_GIT_PATH_FUNC(revert_head, "REVERT_HEAD")
 REPO_GIT_PATH_FUNC(squash_msg, "SQUASH_MSG")
 REPO_GIT_PATH_FUNC(merge_msg, "MERGE_MSG")
 REPO_GIT_PATH_FUNC(merge_rr, "MERGE_RR")
diff --git a/path.h b/path.h
index 8941c018a9..e7e77da6aa 100644
--- a/path.h
+++ b/path.h
@@ -170,7 +170,6 @@ void report_linked_checkout_garbage(void);
 	}
 
 struct path_cache {
-	const char *revert_head;
 	const char *squash_msg;
 	const char *merge_msg;
 	const char *merge_rr;
@@ -181,12 +180,11 @@ struct path_cache {
 	const char *shallow;
 };
 
-#define PATH_CACHE_INIT                                              \
-	{                                                            \
-		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
+#define PATH_CACHE_INIT                                        \
+	{                                                      \
+		NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL \
 	}
 
-const char *git_path_revert_head(struct repository *r);
 const char *git_path_squash_msg(struct repository *r);
 const char *git_path_merge_msg(struct repository *r);
 const char *git_path_merge_rr(struct repository *r);
diff --git a/sequencer.c b/sequencer.c
index 601b1cffcd..5e7b1634f3 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2317,8 +2317,10 @@ void sequencer_post_commit_cleanup(struct repository *r, int verbose)
 		need_cleanup = 1;
 	}
 
-	if (file_exists(git_path_revert_head(r))) {
-		if (!unlink(git_path_revert_head(r)) && verbose)
+	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
+		if (!refs_delete_ref(get_main_ref_store(r), "", "REVERT_HEAD",
+				     NULL, 0) &&
+		    verbose)
 			warning(_("cancelling a revert in progress"));
 		opts.action = REPLAY_REVERT;
 		need_cleanup = 1;
@@ -2677,7 +2679,7 @@ static int create_seq_dir(struct repository *r)
 	const char *in_progress_error = NULL;
 	const char *in_progress_advice = NULL;
 	unsigned int advise_skip =
-		file_exists(git_path_revert_head(r)) ||
+		refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD") ||
 		refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD");
 
 	if (!sequencer_get_last_command(r, &action)) {
@@ -2778,7 +2780,7 @@ static int rollback_single_pick(struct repository *r)
 	struct object_id head_oid;
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
-	    !file_exists(git_path_revert_head(r)))
+	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
 	if (read_ref_full("HEAD", 0, &head_oid, NULL))
 		return error(_("cannot resolve HEAD"));
@@ -2872,7 +2874,7 @@ int sequencer_skip(struct repository *r, struct replay_opts *opts)
 	 */
 	switch (opts->action) {
 	case REPLAY_REVERT:
-		if (!file_exists(git_path_revert_head(r))) {
+		if (!refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
 			if (action != REPLAY_REVERT)
 				return error(_("no revert in progress"));
 			if (!rollback_is_safe())
@@ -4210,7 +4212,7 @@ static int continue_single_pick(struct repository *r)
 	const char *argv[] = { "commit", NULL };
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
-	    !file_exists(git_path_revert_head(r)))
+	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
 	return run_command_v_opt(argv, RUN_GIT_CMD);
 }
@@ -4390,7 +4392,7 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts)
 		/* Verify that the conflict has been resolved */
 		if (refs_ref_exists(get_main_ref_store(r),
 				    "CHERRY_PICK_HEAD") ||
-		    file_exists(git_path_revert_head(r))) {
+		    refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
 			res = continue_single_pick(r);
 			if (res)
 				goto release_todo_list;
diff --git a/wt-status.c b/wt-status.c
index 8fddf65a4b..df81b8d7a7 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1679,7 +1679,7 @@ void wt_status_get_state(struct repository *r,
 		oidcpy(&state->cherry_pick_head_oid, &oid);
 	}
 	wt_status_check_bisect(NULL, state);
-	if (!stat(git_path_revert_head(r), &st) &&
+	if (refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD") &&
 	    !get_oid("REVERT_HEAD", &oid)) {
 		state->revert_in_progress = 1;
 		oidcpy(&state->revert_head_oid, &oid);
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 09/21] Move REF_LOG_ONLY to refs-internal.h
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (7 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 08/21] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 10/21] Iteration over entire ref namespace is iterating over "refs/" Han-Wen Nienhuys via GitGitGadget
                                                         ` (11 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

REF_LOG_ONLY is used in the transaction preparation: if a symref is involved in
a transaction, the referent of the symref should be updated, and the symref
itself should only be updated in the reflog. Other ref backends will need to
duplicate this logic.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/files-backend.c | 7 -------
 refs/refs-internal.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 3a3573986f..5829d5f8f7 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -38,13 +38,6 @@
  */
 #define REF_NEEDS_COMMIT (1 << 6)
 
-/*
- * Used as a flag in ref_update::flags when we want to log a ref
- * update but not actually perform it.  This is used when a symbolic
- * ref update is split up.
- */
-#define REF_LOG_ONLY (1 << 7)
-
 /*
  * Used as a flag in ref_update::flags when the ref_update was via an
  * update to HEAD.
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 24d79fb5c1..61fd2e661f 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -31,6 +31,13 @@ struct ref_transaction;
  */
 #define REF_HAVE_OLD (1 << 3)
 
+/*
+ * Used as a flag in ref_update::flags when we want to log a ref
+ * update but not actually perform it.  This is used when a symbolic
+ * ref update is split up.
+ */
+#define REF_LOG_ONLY (1 << 7)
+
 /*
  * Return the length of time to retry acquiring a loose reference lock
  * before giving up, in milliseconds:
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 10/21] Iteration  over entire ref namespace is iterating over "refs/"
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (8 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 09/21] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 11/21] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
                                                         ` (10 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This changes behavior for in for_each_[raw]ref and
for_each_fullref_in_pattern.

This happens implicitly in the files/packed ref backend; making it
explicit simplifies adding alternate ref storage backends, such as
reftable.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 ref-filter.c | 7 ++++---
 refs.c       | 6 +++---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index f2b078db11..88c82b2cca 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1943,6 +1943,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 {
 	struct string_list prefixes = STRING_LIST_INIT_DUP;
 	struct string_list_item *prefix;
+	const char *all = "refs/";
 	int ret;
 
 	if (!filter->match_as_path) {
@@ -1951,7 +1952,7 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * prefixes like "refs/heads/" etc. are stripped off,
 		 * so we have to look at everything:
 		 */
-		return for_each_fullref_in("", cb, cb_data, broken);
+		return for_each_fullref_in(all, cb, cb_data, broken);
 	}
 
 	if (filter->ignore_case) {
@@ -1960,12 +1961,12 @@ static int for_each_fullref_in_pattern(struct ref_filter *filter,
 		 * so just return everything and let the caller
 		 * sort it out.
 		 */
-		return for_each_fullref_in("", cb, cb_data, broken);
+		return for_each_fullref_in(all, cb, cb_data, broken);
 	}
 
 	if (!filter->name_patterns[0]) {
 		/* no patterns; we have to look at everything */
-		return for_each_fullref_in("", cb, cb_data, broken);
+		return for_each_fullref_in(all, cb, cb_data, broken);
 	}
 
 	find_longest_prefixes(&prefixes, filter->name_patterns);
diff --git a/refs.c b/refs.c
index 5e65e79f24..d4b5397c8f 100644
--- a/refs.c
+++ b/refs.c
@@ -1458,7 +1458,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 
 int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, 0, cb_data);
 }
 
 int for_each_ref(each_ref_fn fn, void *cb_data)
@@ -1518,8 +1518,8 @@ int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
 
 int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
 {
-	return do_for_each_ref(refs, "", fn, 0,
-			       DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
+	return do_for_each_ref(refs, "refs/", fn, 0, DO_FOR_EACH_INCLUDE_BROKEN,
+			       cb_data);
 }
 
 int for_each_rawref(each_ref_fn fn, void *cb_data)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 11/21] Add .gitattributes for the reftable/ directory
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (9 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 10/21] Iteration over entire ref namespace is iterating over "refs/" Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 12/21] Add reftable library Han-Wen Nienhuys via GitGitGadget
                                                         ` (9 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/.gitattributes | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 reftable/.gitattributes

diff --git a/reftable/.gitattributes b/reftable/.gitattributes
new file mode 100644
index 0000000000..f44451a379
--- /dev/null
+++ b/reftable/.gitattributes
@@ -0,0 +1 @@
+/zlib-compat.c	whitespace=-indent-with-non-tab,-trailing-space
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 12/21] Add reftable library
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (10 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 11/21] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 13/21] Add standalone build infrastructure for reftable Han-Wen Nienhuys via GitGitGadget
                                                         ` (8 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

Reftable is a format for storing the ref database. The rationale and format
layout is detailed in commit 35e6c47404 ("reftable: file format documentation",
2020-05-20).

This commits imports a fully functioning library for reading and writing
reftables, along with unittests.

It is suggested to read the code in the following order:

* The specification under Documentation/technical/reftable.txt

* reftable.h - the public API

* record.{c,h} - reading and writing records

* block.{c,h} - reading and writing blocks.

* reader.{c,h} - reading a complete reftable file.

* writer.{c,h} - writing a complete reftable file.

* merged.{c,h} and pq.{c,h} - reading a stack of reftables

* stack.{c,h} - writing and compacting stacks of reftable on the
filesystem.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/LICENSE          |   31 +
 reftable/VERSION          |    1 +
 reftable/basics.c         |  215 +++++++
 reftable/basics.h         |   53 ++
 reftable/block.c          |  432 +++++++++++++
 reftable/block.h          |  129 ++++
 reftable/block_test.c     |  157 +++++
 reftable/compat.c         |   98 +++
 reftable/compat.h         |   48 ++
 reftable/constants.h      |   21 +
 reftable/dump.c           |  212 +++++++
 reftable/file.c           |   95 +++
 reftable/iter.c           |  242 ++++++++
 reftable/iter.h           |   72 +++
 reftable/merged.c         |  317 ++++++++++
 reftable/merged.h         |   43 ++
 reftable/merged_test.c    |  286 +++++++++
 reftable/pq.c             |  115 ++++
 reftable/pq.h             |   34 +
 reftable/reader.c         |  744 ++++++++++++++++++++++
 reftable/reader.h         |   65 ++
 reftable/record.c         | 1126 ++++++++++++++++++++++++++++++++++
 reftable/record.h         |  143 +++++
 reftable/record_test.c    |  410 +++++++++++++
 reftable/refname.c        |  209 +++++++
 reftable/refname.h        |   38 ++
 reftable/refname_test.c   |   99 +++
 reftable/reftable-tests.h |   22 +
 reftable/reftable.c       |  154 +++++
 reftable/reftable.h       |  585 ++++++++++++++++++
 reftable/reftable_test.c  |  632 +++++++++++++++++++
 reftable/stack.c          | 1225 +++++++++++++++++++++++++++++++++++++
 reftable/stack.h          |   50 ++
 reftable/stack_test.c     |  787 ++++++++++++++++++++++++
 reftable/strbuf.c         |  206 +++++++
 reftable/strbuf.h         |   88 +++
 reftable/strbuf_test.c    |   39 ++
 reftable/system.h         |   53 ++
 reftable/test_framework.c |   69 +++
 reftable/test_framework.h |   64 ++
 reftable/tree.c           |   63 ++
 reftable/tree.h           |   34 +
 reftable/tree_test.c      |   62 ++
 reftable/update.sh        |   22 +
 reftable/writer.c         |  663 ++++++++++++++++++++
 reftable/writer.h         |   60 ++
 reftable/zlib-compat.c    |   92 +++
 47 files changed, 10405 insertions(+)
 create mode 100644 reftable/LICENSE
 create mode 100644 reftable/VERSION
 create mode 100644 reftable/basics.c
 create mode 100644 reftable/basics.h
 create mode 100644 reftable/block.c
 create mode 100644 reftable/block.h
 create mode 100644 reftable/block_test.c
 create mode 100644 reftable/compat.c
 create mode 100644 reftable/compat.h
 create mode 100644 reftable/constants.h
 create mode 100644 reftable/dump.c
 create mode 100644 reftable/file.c
 create mode 100644 reftable/iter.c
 create mode 100644 reftable/iter.h
 create mode 100644 reftable/merged.c
 create mode 100644 reftable/merged.h
 create mode 100644 reftable/merged_test.c
 create mode 100644 reftable/pq.c
 create mode 100644 reftable/pq.h
 create mode 100644 reftable/reader.c
 create mode 100644 reftable/reader.h
 create mode 100644 reftable/record.c
 create mode 100644 reftable/record.h
 create mode 100644 reftable/record_test.c
 create mode 100644 reftable/refname.c
 create mode 100644 reftable/refname.h
 create mode 100644 reftable/refname_test.c
 create mode 100644 reftable/reftable-tests.h
 create mode 100644 reftable/reftable.c
 create mode 100644 reftable/reftable.h
 create mode 100644 reftable/reftable_test.c
 create mode 100644 reftable/stack.c
 create mode 100644 reftable/stack.h
 create mode 100644 reftable/stack_test.c
 create mode 100644 reftable/strbuf.c
 create mode 100644 reftable/strbuf.h
 create mode 100644 reftable/strbuf_test.c
 create mode 100644 reftable/system.h
 create mode 100644 reftable/test_framework.c
 create mode 100644 reftable/test_framework.h
 create mode 100644 reftable/tree.c
 create mode 100644 reftable/tree.h
 create mode 100644 reftable/tree_test.c
 create mode 100755 reftable/update.sh
 create mode 100644 reftable/writer.c
 create mode 100644 reftable/writer.h
 create mode 100644 reftable/zlib-compat.c

diff --git a/reftable/LICENSE b/reftable/LICENSE
new file mode 100644
index 0000000000..402e0f9356
--- /dev/null
+++ b/reftable/LICENSE
@@ -0,0 +1,31 @@
+BSD License
+
+Copyright (c) 2020, Google LLC
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+* Redistributions of source code must retain the above copyright notice,
+this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above copyright
+notice, this list of conditions and the following disclaimer in the
+documentation and/or other materials provided with the distribution.
+
+* Neither the name of Google LLC nor the names of its contributors may
+be used to endorse or promote products derived from this software
+without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/reftable/VERSION b/reftable/VERSION
new file mode 100644
index 0000000000..b0d1123483
--- /dev/null
+++ b/reftable/VERSION
@@ -0,0 +1 @@
+994cc2f626ff626ff04be5240413775f832110fb C: formatting tweaks
diff --git a/reftable/basics.c b/reftable/basics.c
new file mode 100644
index 0000000000..35b42b6b26
--- /dev/null
+++ b/reftable/basics.c
@@ -0,0 +1,215 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "basics.h"
+
+#include "system.h"
+
+void put_be24(uint8_t *out, uint32_t i)
+{
+	out[0] = (uint8_t)((i >> 16) & 0xff);
+	out[1] = (uint8_t)((i >> 8) & 0xff);
+	out[2] = (uint8_t)(i & 0xff);
+}
+
+uint32_t get_be24(uint8_t *in)
+{
+	return (uint32_t)(in[0]) << 16 | (uint32_t)(in[1]) << 8 |
+	       (uint32_t)(in[2]);
+}
+
+void put_be16(uint8_t *out, uint16_t i)
+{
+	out[0] = (uint8_t)((i >> 8) & 0xff);
+	out[1] = (uint8_t)(i & 0xff);
+}
+
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args)
+{
+	size_t lo = 0;
+	size_t hi = sz;
+
+	/* invariant: (hi == sz) || f(hi) == true
+	   (lo == 0 && f(0) == true) || fi(lo) == false
+	 */
+	while (hi - lo > 1) {
+		size_t mid = lo + (hi - lo) / 2;
+
+		int val = f(mid, args);
+		if (val) {
+			hi = mid;
+		} else {
+			lo = mid;
+		}
+	}
+
+	if (lo == 0) {
+		if (f(0, args)) {
+			return 0;
+		} else {
+			return 1;
+		}
+	}
+
+	return hi;
+}
+
+void free_names(char **a)
+{
+	char **p = a;
+	if (p == NULL) {
+		return;
+	}
+	while (*p) {
+		reftable_free(*p);
+		p++;
+	}
+	reftable_free(a);
+}
+
+int names_length(char **names)
+{
+	int len = 0;
+	char **p = names;
+	while (*p) {
+		p++;
+		len++;
+	}
+	return len;
+}
+
+void parse_names(char *buf, int size, char ***namesp)
+{
+	char **names = NULL;
+	size_t names_cap = 0;
+	size_t names_len = 0;
+
+	char *p = buf;
+	char *end = buf + size;
+	while (p < end) {
+		char *next = strchr(p, '\n');
+		if (next != NULL) {
+			*next = 0;
+		} else {
+			next = end;
+		}
+		if (p < next) {
+			if (names_len == names_cap) {
+				names_cap = 2 * names_cap + 1;
+				names = reftable_realloc(
+					names, names_cap * sizeof(char *));
+			}
+			names[names_len++] = xstrdup(p);
+		}
+		p = next + 1;
+	}
+
+	if (names_len == names_cap) {
+		names_cap = 2 * names_cap + 1;
+		names = reftable_realloc(names, names_cap * sizeof(char *));
+	}
+
+	names[names_len] = NULL;
+	*namesp = names;
+}
+
+int names_equal(char **a, char **b)
+{
+	while (*a && *b) {
+		if (strcmp(*a, *b)) {
+			return 0;
+		}
+
+		a++;
+		b++;
+	}
+
+	return *a == *b;
+}
+
+const char *reftable_error_str(int err)
+{
+	static char buf[250];
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return "I/O error";
+	case REFTABLE_FORMAT_ERROR:
+		return "corrupt reftable file";
+	case REFTABLE_NOT_EXIST_ERROR:
+		return "file does not exist";
+	case REFTABLE_LOCK_ERROR:
+		return "data is outdated";
+	case REFTABLE_API_ERROR:
+		return "misuse of the reftable API";
+	case REFTABLE_ZLIB_ERROR:
+		return "zlib failure";
+	case REFTABLE_NAME_CONFLICT:
+		return "file/directory conflict";
+	case REFTABLE_REFNAME_ERROR:
+		return "invalid refname";
+	case -1:
+		return "general error";
+	default:
+		snprintf(buf, sizeof(buf), "unknown error code %d", err);
+		return buf;
+	}
+}
+
+int reftable_error_to_errno(int err)
+{
+	switch (err) {
+	case REFTABLE_IO_ERROR:
+		return EIO;
+	case REFTABLE_FORMAT_ERROR:
+		return EFAULT;
+	case REFTABLE_NOT_EXIST_ERROR:
+		return ENOENT;
+	case REFTABLE_LOCK_ERROR:
+		return EBUSY;
+	case REFTABLE_API_ERROR:
+		return EINVAL;
+	case REFTABLE_ZLIB_ERROR:
+		return EDOM;
+	default:
+		return ERANGE;
+	}
+}
+
+void *(*reftable_malloc_ptr)(size_t sz) = &malloc;
+void *(*reftable_realloc_ptr)(void *, size_t) = &realloc;
+void (*reftable_free_ptr)(void *) = &free;
+
+void *reftable_malloc(size_t sz)
+{
+	return (*reftable_malloc_ptr)(sz);
+}
+
+void *reftable_realloc(void *p, size_t sz)
+{
+	return (*reftable_realloc_ptr)(p, sz);
+}
+
+void reftable_free(void *p)
+{
+	reftable_free_ptr(p);
+}
+
+void *reftable_calloc(size_t sz)
+{
+	void *p = reftable_malloc(sz);
+	memset(p, 0, sz);
+	return p;
+}
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *))
+{
+	reftable_malloc_ptr = malloc;
+	reftable_realloc_ptr = realloc;
+	reftable_free_ptr = free;
+}
diff --git a/reftable/basics.h b/reftable/basics.h
new file mode 100644
index 0000000000..e1b4c9fa1a
--- /dev/null
+++ b/reftable/basics.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BASICS_H
+#define BASICS_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#define true 1
+#define false 0
+
+/* Bigendian en/decoding of integers */
+
+void put_be24(uint8_t *out, uint32_t i);
+uint32_t get_be24(uint8_t *in);
+void put_be16(uint8_t *out, uint16_t i);
+
+/*
+  find smallest index i in [0, sz) at which f(i) is true, assuming
+  that f is ascending. Return sz if f(i) is false for all indices.
+*/
+int binsearch(size_t sz, int (*f)(size_t k, void *args), void *args);
+
+/*
+  Frees a NULL terminated array of malloced strings. The array itself is also
+  freed.
+ */
+void free_names(char **a);
+
+/* parse a newline separated list of names. Empty names are discarded. */
+void parse_names(char *buf, int size, char ***namesp);
+
+/* compares two NULL-terminated arrays of strings. */
+int names_equal(char **a, char **b);
+
+/* returns the array size of a NULL-terminated array of strings. */
+int names_length(char **names);
+
+/* Allocation routines; they invoke the functions set through
+ * reftable_set_alloc() */
+void *reftable_malloc(size_t sz);
+void *reftable_realloc(void *p, size_t sz);
+void reftable_free(void *p);
+void *reftable_calloc(size_t sz);
+
+#endif
diff --git a/reftable/block.c b/reftable/block.c
new file mode 100644
index 0000000000..d6a1a42a2b
--- /dev/null
+++ b/reftable/block.c
@@ -0,0 +1,432 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "zlib.h"
+
+int header_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 24;
+	case 2:
+		return 28;
+	}
+	abort();
+}
+
+int footer_size(int version)
+{
+	switch (version) {
+	case 1:
+		return 68;
+	case 2:
+		return 72;
+	}
+	abort();
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct strbuf *key);
+
+void block_writer_init(struct block_writer *bw, uint8_t typ, uint8_t *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size)
+{
+	bw->buf = buf;
+	bw->hash_size = hash_size;
+	bw->block_size = block_size;
+	bw->header_off = header_off;
+	bw->buf[header_off] = typ;
+	bw->next = header_off + 4;
+	bw->restart_interval = 16;
+	bw->entries = 0;
+	bw->restart_len = 0;
+	bw->last_key.len = 0;
+}
+
+uint8_t block_writer_type(struct block_writer *bw)
+{
+	return bw->buf[bw->header_off];
+}
+
+/* adds the reftable_record to the block. Returns -1 if it does not fit, 0 on
+   success */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec)
+{
+	struct strbuf empty = STRBUF_INIT;
+	struct strbuf last =
+		w->entries % w->restart_interval == 0 ? empty : w->last_key;
+	struct string_view out = {
+		.buf = w->buf + w->next,
+		.len = w->block_size - w->next,
+	};
+
+	struct string_view start = out;
+
+	bool restart = false;
+	struct strbuf key = STRBUF_INIT;
+	int n = 0;
+
+	reftable_record_key(rec, &key);
+	n = reftable_encode_key(&restart, out, last, key,
+				reftable_record_val_type(rec));
+	if (n < 0)
+		goto done;
+	string_view_consume(&out, n);
+
+	n = reftable_record_encode(rec, out, w->hash_size);
+	if (n < 0)
+		goto done;
+	string_view_consume(&out, n);
+
+	if (block_writer_register_restart(w, start.len - out.len, restart,
+					  &key) < 0)
+		goto done;
+
+	strbuf_release(&key);
+	return 0;
+
+done:
+	strbuf_release(&key);
+	return -1;
+}
+
+int block_writer_register_restart(struct block_writer *w, int n, bool restart,
+				  struct strbuf *key)
+{
+	int rlen = w->restart_len;
+	if (rlen >= MAX_RESTARTS) {
+		restart = false;
+	}
+
+	if (restart) {
+		rlen++;
+	}
+	if (2 + 3 * rlen + n > w->block_size - w->next)
+		return -1;
+	if (restart) {
+		if (w->restart_len == w->restart_cap) {
+			w->restart_cap = w->restart_cap * 2 + 1;
+			w->restarts = reftable_realloc(
+				w->restarts, sizeof(uint32_t) * w->restart_cap);
+		}
+
+		w->restarts[w->restart_len++] = w->next;
+	}
+
+	w->next += n;
+
+	strbuf_reset(&w->last_key);
+	strbuf_addbuf(&w->last_key, key);
+	w->entries++;
+	return 0;
+}
+
+int block_writer_finish(struct block_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->restart_len; i++) {
+		put_be24(w->buf + w->next, w->restarts[i]);
+		w->next += 3;
+	}
+
+	put_be16(w->buf + w->next, w->restart_len);
+	w->next += 2;
+	put_be24(w->buf + 1 + w->header_off, w->next);
+
+	if (block_writer_type(w) == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + w->header_off;
+		uint8_t *compressed = NULL;
+		int zresult = 0;
+		uLongf src_len = w->next - block_header_skip;
+		size_t dest_cap = src_len;
+
+		compressed = reftable_malloc(dest_cap);
+		while (1) {
+			uLongf out_dest_len = dest_cap;
+
+			zresult = compress2(compressed, &out_dest_len,
+					    w->buf + block_header_skip, src_len,
+					    9);
+			if (zresult == Z_BUF_ERROR) {
+				dest_cap *= 2;
+				compressed =
+					reftable_realloc(compressed, dest_cap);
+				continue;
+			}
+
+			if (Z_OK != zresult) {
+				reftable_free(compressed);
+				return REFTABLE_ZLIB_ERROR;
+			}
+
+			memcpy(w->buf + block_header_skip, compressed,
+			       out_dest_len);
+			w->next = out_dest_len + block_header_skip;
+			reftable_free(compressed);
+			break;
+		}
+	}
+	return w->next;
+}
+
+uint8_t block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_init(struct block_reader *br, struct reftable_block *block,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size)
+{
+	uint32_t full_block_size = table_block_size;
+	uint8_t typ = block->data[header_off];
+	uint32_t sz = get_be24(block->data + header_off + 1);
+
+	uint16_t restart_count = 0;
+	uint32_t restart_start = 0;
+	uint8_t *restart_bytes = NULL;
+
+	if (!reftable_is_block_type(typ))
+		return REFTABLE_FORMAT_ERROR;
+
+	if (typ == BLOCK_TYPE_LOG) {
+		int block_header_skip = 4 + header_off;
+		uLongf dst_len = sz - block_header_skip; /* total size of dest
+							    buffer. */
+		uLongf src_len = block->len - block_header_skip;
+		/* Log blocks specify the *uncompressed* size in their header.
+		 */
+		uint8_t *uncompressed = reftable_malloc(sz);
+
+		/* Copy over the block header verbatim. It's not compressed. */
+		memcpy(uncompressed, block->data, block_header_skip);
+
+		/* Uncompress */
+		if (Z_OK != uncompress_return_consumed(
+				    uncompressed + block_header_skip, &dst_len,
+				    block->data + block_header_skip,
+				    &src_len)) {
+			reftable_free(uncompressed);
+			return REFTABLE_ZLIB_ERROR;
+		}
+
+		if (dst_len + block_header_skip != sz)
+			return REFTABLE_FORMAT_ERROR;
+
+		/* We're done with the input data. */
+		reftable_block_done(block);
+		block->data = uncompressed;
+		block->len = sz;
+		block->source = malloc_block_source();
+		full_block_size = src_len + block_header_skip;
+	} else if (full_block_size == 0) {
+		full_block_size = sz;
+	} else if (sz < full_block_size && sz < block->len &&
+		   block->data[sz] != 0) {
+		/* If the block is smaller than the full block size, it is
+		   padded (data followed by '\0') or the next block is
+		   unaligned. */
+		full_block_size = sz;
+	}
+
+	restart_count = get_be16(block->data + sz - 2);
+	restart_start = sz - 2 - 3 * restart_count;
+	restart_bytes = block->data + restart_start;
+
+	/* transfer ownership. */
+	br->block = *block;
+	block->data = NULL;
+	block->len = 0;
+
+	br->hash_size = hash_size;
+	br->block_len = restart_start;
+	br->full_block_size = full_block_size;
+	br->header_off = header_off;
+	br->restart_count = restart_count;
+	br->restart_bytes = restart_bytes;
+
+	return 0;
+}
+
+static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+{
+	return get_be24(br->restart_bytes + 3 * i);
+}
+
+void block_reader_start(struct block_reader *br, struct block_iter *it)
+{
+	it->br = br;
+	strbuf_reset(&it->last_key);
+	it->next_off = br->header_off + 4;
+}
+
+struct restart_find_args {
+	int error;
+	struct strbuf key;
+	struct block_reader *r;
+};
+
+static int restart_key_less(size_t idx, void *args)
+{
+	struct restart_find_args *a = (struct restart_find_args *)args;
+	uint32_t off = block_reader_restart_offset(a->r, idx);
+	struct string_view in = {
+		.buf = a->r->block.data + off,
+		.len = a->r->block_len - off,
+	};
+
+	/* the restart key is verbatim in the block, so this could avoid the
+	   alloc for decoding the key */
+	struct strbuf rkey = STRBUF_INIT;
+	struct strbuf last_key = STRBUF_INIT;
+	uint8_t unused_extra;
+	int n = reftable_decode_key(&rkey, &unused_extra, last_key, in);
+	int result;
+	if (n < 0) {
+		a->error = 1;
+		return -1;
+	}
+
+	result = strbuf_cmp(&a->key, &rkey);
+	strbuf_release(&rkey);
+	return result;
+}
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+{
+	dest->br = src->br;
+	dest->next_off = src->next_off;
+	strbuf_reset(&dest->last_key);
+	strbuf_addbuf(&dest->last_key, &src->last_key);
+}
+
+int block_iter_next(struct block_iter *it, struct reftable_record *rec)
+{
+	struct string_view in = {
+		.buf = it->br->block.data + it->next_off,
+		.len = it->br->block_len - it->next_off,
+	};
+	struct string_view start = in;
+	struct strbuf key = STRBUF_INIT;
+	uint8_t extra = 0;
+	int n = 0;
+
+	if (it->next_off >= it->br->block_len)
+		return 1;
+
+	n = reftable_decode_key(&key, &extra, it->last_key, in);
+	if (n < 0)
+		return -1;
+
+	string_view_consume(&in, n);
+	n = reftable_record_decode(rec, key, extra, in, it->br->hash_size);
+	if (n < 0)
+		return -1;
+	string_view_consume(&in, n);
+
+	strbuf_reset(&it->last_key);
+	strbuf_addbuf(&it->last_key, &key);
+	it->next_off += start.len - in.len;
+	strbuf_release(&key);
+	return 0;
+}
+
+int block_reader_first_key(struct block_reader *br, struct strbuf *key)
+{
+	struct strbuf empty = STRBUF_INIT;
+	int off = br->header_off + 4;
+	struct string_view in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+
+	uint8_t extra = 0;
+	int n = reftable_decode_key(key, &extra, empty, in);
+	if (n < 0)
+		return n;
+
+	return 0;
+}
+
+int block_iter_seek(struct block_iter *it, struct strbuf *want)
+{
+	return block_reader_seek(it->br, it, want);
+}
+
+void block_iter_close(struct block_iter *it)
+{
+	strbuf_release(&it->last_key);
+}
+
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct strbuf *want)
+{
+	struct restart_find_args args = {
+		.key = *want,
+		.r = br,
+	};
+	struct reftable_record rec = reftable_new_record(block_reader_type(br));
+	struct strbuf key = STRBUF_INIT;
+	int err = 0;
+	struct block_iter next = {
+		.last_key = STRBUF_INIT,
+	};
+
+	int i = binsearch(br->restart_count, &restart_key_less, &args);
+	if (args.error) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	it->br = br;
+	if (i > 0) {
+		i--;
+		it->next_off = block_reader_restart_offset(br, i);
+	} else {
+		it->next_off = br->header_off + 4;
+	}
+
+	/* We're looking for the last entry less/equal than the wanted key, so
+	   we have to go one entry too far and then back up.
+	*/
+	while (true) {
+		block_iter_copy_from(&next, it);
+		err = block_iter_next(&next, &rec);
+		if (err < 0)
+			goto done;
+
+		reftable_record_key(&rec, &key);
+		if (err > 0 || strbuf_cmp(&key, want) >= 0) {
+			err = 0;
+			goto done;
+		}
+
+		block_iter_copy_from(it, &next);
+	}
+
+done:
+	strbuf_release(&key);
+	strbuf_release(&next.last_key);
+	reftable_record_destroy(&rec);
+
+	return err;
+}
+
+void block_writer_clear(struct block_writer *bw)
+{
+	FREE_AND_NULL(bw->restarts);
+	strbuf_release(&bw->last_key);
+	/* the block is not owned. */
+}
diff --git a/reftable/block.h b/reftable/block.h
new file mode 100644
index 0000000000..14a6f835f4
--- /dev/null
+++ b/reftable/block.h
@@ -0,0 +1,129 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef BLOCK_H
+#define BLOCK_H
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+
+/*
+  Writes reftable blocks. The block_writer is reused across blocks to minimize
+  allocation overhead.
+*/
+struct block_writer {
+	uint8_t *buf;
+	uint32_t block_size;
+
+	/* Offset ofof the global header. Nonzero in the first block only. */
+	uint32_t header_off;
+
+	/* How often to restart keys. */
+	int restart_interval;
+	int hash_size;
+
+	/* Offset of next uint8_t to write. */
+	uint32_t next;
+	uint32_t *restarts;
+	uint32_t restart_len;
+	uint32_t restart_cap;
+
+	struct strbuf last_key;
+	int entries;
+};
+
+/*
+  initializes the blockwriter to write `typ` entries, using `buf` as temporary
+  storage. `buf` is not owned by the block_writer. */
+void block_writer_init(struct block_writer *bw, uint8_t typ, uint8_t *buf,
+		       uint32_t block_size, uint32_t header_off, int hash_size);
+
+/*
+  returns the block type (eg. 'r' for ref records.
+*/
+uint8_t block_writer_type(struct block_writer *bw);
+
+/* appends the record, or -1 if it doesn't fit. */
+int block_writer_add(struct block_writer *w, struct reftable_record *rec);
+
+/* appends the key restarts, and compress the block if necessary. */
+int block_writer_finish(struct block_writer *w);
+
+/* clears out internally allocated block_writer members. */
+void block_writer_clear(struct block_writer *bw);
+
+/* Read a block. */
+struct block_reader {
+	/* offset of the block header; nonzero for the first block in a
+	 * reftable. */
+	uint32_t header_off;
+
+	/* the memory block */
+	struct reftable_block block;
+	int hash_size;
+
+	/* size of the data, excluding restart data. */
+	uint32_t block_len;
+	uint8_t *restart_bytes;
+	uint16_t restart_count;
+
+	/* size of the data in the file. For log blocks, this is the compressed
+	 * size. */
+	uint32_t full_block_size;
+};
+
+/* Iterate over entries in a block */
+struct block_iter {
+	/* offset within the block of the next entry to read. */
+	uint32_t next_off;
+	struct block_reader *br;
+
+	/* key for last entry we read. */
+	struct strbuf last_key;
+};
+
+/* initializes a block reader. */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Position `it` at start of the block */
+void block_reader_start(struct block_reader *br, struct block_iter *it);
+
+/* Position `it` to the `want` key in the block */
+int block_reader_seek(struct block_reader *br, struct block_iter *it,
+		      struct strbuf *want);
+
+/* Returns the block type (eg. 'r' for refs) */
+uint8_t block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct strbuf *key);
+
+void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+
+/* return < 0 for error, 0 for OK, > 0 for EOF. */
+int block_iter_next(struct block_iter *it, struct reftable_record *rec);
+
+/* Seek to `want` with in the block pointed to by `it` */
+int block_iter_seek(struct block_iter *it, struct strbuf *want);
+
+/* deallocate memory for `it`. The block reader and its block is left intact. */
+void block_iter_close(struct block_iter *it);
+
+/* size of file header, depending on format version */
+int header_size(int version);
+
+/* size of file footer, depending on format version */
+int footer_size(int version);
+
+/* returns a block to its source. */
+void reftable_block_done(struct reftable_block *ret);
+
+#endif
diff --git a/reftable/block_test.c b/reftable/block_test.c
new file mode 100644
index 0000000000..3e071d7851
--- /dev/null
+++ b/reftable/block_test.c
@@ -0,0 +1,157 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "block.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+struct binsearch_args {
+	int key;
+	int *arr;
+};
+
+static int binsearch_func(size_t i, void *void_args)
+{
+	struct binsearch_args *args = (struct binsearch_args *)void_args;
+
+	return args->key < args->arr[i];
+}
+
+static void test_binsearch(void)
+{
+	int arr[] = { 2, 4, 6, 8, 10 };
+	size_t sz = ARRAY_SIZE(arr);
+	struct binsearch_args args = {
+		.arr = arr,
+	};
+
+	int i = 0;
+	for (i = 1; i < 11; i++) {
+		int res;
+		args.key = i;
+		res = binsearch(sz, &binsearch_func, &args);
+
+		if (res < sz) {
+			assert(args.key < arr[res]);
+			if (res > 0) {
+				assert(args.key >= arr[res - 1]);
+			}
+		} else {
+			assert(args.key == 10 || args.key == 11);
+		}
+	}
+}
+
+static void test_block_read_write(void)
+{
+	const int header_off = 21; /* random */
+	char *names[30];
+	const int N = ARRAY_SIZE(names);
+	const int block_size = 1024;
+	struct reftable_block block = { 0 };
+	struct block_writer bw = {
+		.last_key = STRBUF_INIT,
+	};
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_record rec = { 0 };
+	int i = 0;
+	int n;
+	struct block_reader br = { 0 };
+	struct block_iter it = { .last_key = STRBUF_INIT };
+	int j = 0;
+	struct strbuf want = STRBUF_INIT;
+
+	block.data = reftable_calloc(block_size);
+	block.len = block_size;
+	block.source = malloc_block_source();
+	block_writer_init(&bw, BLOCK_TYPE_REF, block.data, block_size,
+			  header_off, hash_size(SHA1_ID));
+	reftable_record_from_ref(&rec, &ref);
+
+	for (i = 0; i < N; i++) {
+		char name[100];
+		uint8_t hash[SHA1_SIZE];
+		snprintf(name, sizeof(name), "branch%02d", i);
+		memset(hash, i, sizeof(hash));
+
+		ref.refname = name;
+		ref.value = hash;
+		names[i] = xstrdup(name);
+		n = block_writer_add(&bw, &rec);
+		ref.refname = NULL;
+		ref.value = NULL;
+		assert(n == 0);
+	}
+
+	n = block_writer_finish(&bw);
+	assert(n > 0);
+
+	block_writer_clear(&bw);
+
+	block_reader_init(&br, &block, header_off, block_size, SHA1_SIZE);
+
+	block_reader_start(&br, &it);
+
+	while (true) {
+		int r = block_iter_next(&it, &rec);
+		assert(r >= 0);
+		if (r > 0) {
+			break;
+		}
+		assert_streq(names[j], ref.refname);
+		j++;
+	}
+
+	reftable_record_clear(&rec);
+	block_iter_close(&it);
+
+	for (i = 0; i < N; i++) {
+		struct block_iter it = { .last_key = STRBUF_INIT };
+		strbuf_reset(&want);
+		strbuf_addstr(&want, names[i]);
+
+		n = block_reader_seek(&br, &it, &want);
+		assert(n == 0);
+
+		n = block_iter_next(&it, &rec);
+		assert(n == 0);
+
+		assert_streq(names[i], ref.refname);
+
+		want.len--;
+		n = block_reader_seek(&br, &it, &want);
+		assert(n == 0);
+
+		n = block_iter_next(&it, &rec);
+		assert(n == 0);
+		assert_streq(names[10 * (i / 10)], ref.refname);
+
+		block_iter_close(&it);
+	}
+
+	reftable_record_clear(&rec);
+	reftable_block_done(&br.block);
+	strbuf_release(&want);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+}
+
+int block_test_main(int argc, const char *argv[])
+{
+	add_test_case("binsearch", &test_binsearch);
+	add_test_case("block_read_write", &test_block_read_write);
+	return test_main(argc, argv);
+}
diff --git a/reftable/compat.c b/reftable/compat.c
new file mode 100644
index 0000000000..be0fd65d2f
--- /dev/null
+++ b/reftable/compat.c
@@ -0,0 +1,98 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+
+*/
+
+/* compat.c - compatibility functions for standalone compilation */
+
+#include "system.h"
+#include "basics.h"
+
+#ifdef REFTABLE_STANDALONE
+
+#include <dirent.h>
+
+void put_be32(void *p, uint32_t i)
+{
+	uint8_t *out = (uint8_t *)p;
+
+	out[0] = (uint8_t)((i >> 24) & 0xff);
+	out[1] = (uint8_t)((i >> 16) & 0xff);
+	out[2] = (uint8_t)((i >> 8) & 0xff);
+	out[3] = (uint8_t)((i)&0xff);
+}
+
+uint32_t get_be32(uint8_t *in)
+{
+	return (uint32_t)(in[0]) << 24 | (uint32_t)(in[1]) << 16 |
+	       (uint32_t)(in[2]) << 8 | (uint32_t)(in[3]);
+}
+
+void put_be64(void *p, uint64_t v)
+{
+	uint8_t *out = (uint8_t *)p;
+	int i = sizeof(uint64_t);
+	while (i--) {
+		out[i] = (uint8_t)(v & 0xff);
+		v >>= 8;
+	}
+}
+
+uint64_t get_be64(void *out)
+{
+	uint8_t *bytes = (uint8_t *)out;
+	uint64_t v = 0;
+	int i = 0;
+	for (i = 0; i < sizeof(uint64_t); i++) {
+		v = (v << 8) | (uint8_t)(bytes[i] & 0xff);
+	}
+	return v;
+}
+
+uint16_t get_be16(uint8_t *in)
+{
+	return (uint32_t)(in[0]) << 8 | (uint32_t)(in[1]);
+}
+
+char *xstrdup(const char *s)
+{
+	int l = strlen(s);
+	char *dest = (char *)reftable_malloc(l + 1);
+	strncpy(dest, s, l + 1);
+	return dest;
+}
+
+void sleep_millisec(int millisecs)
+{
+	usleep(millisecs * 1000);
+}
+
+void reftable_clear_dir(const char *dirname)
+{
+	DIR *dir = opendir(dirname);
+	struct dirent *ent = NULL;
+	assert(dir);
+	while ((ent = readdir(dir)) != NULL) {
+		unlinkat(dirfd(dir), ent->d_name, 0);
+	}
+	closedir(dir);
+	rmdir(dirname);
+}
+
+#else
+
+#include "../dir.h"
+
+void reftable_clear_dir(const char *dirname)
+{
+	struct strbuf path = STRBUF_INIT;
+	strbuf_addstr(&path, dirname);
+	remove_dir_recursively(&path, 0);
+	strbuf_release(&path);
+}
+
+#endif
diff --git a/reftable/compat.h b/reftable/compat.h
new file mode 100644
index 0000000000..a765c57e96
--- /dev/null
+++ b/reftable/compat.h
@@ -0,0 +1,48 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef COMPAT_H
+#define COMPAT_H
+
+#include "system.h"
+
+#ifdef REFTABLE_STANDALONE
+
+/* functions that git-core provides, for standalone compilation */
+#include <stdint.h>
+
+uint64_t get_be64(void *in);
+void put_be64(void *out, uint64_t i);
+
+void put_be32(void *out, uint32_t i);
+uint32_t get_be32(uint8_t *in);
+
+uint16_t get_be16(uint8_t *in);
+
+#define ARRAY_SIZE(a) sizeof((a)) / sizeof((a)[0])
+#define FREE_AND_NULL(x)          \
+	do {                      \
+		reftable_free(x); \
+		(x) = NULL;       \
+	} while (0)
+#define QSORT(arr, n, cmp) qsort(arr, n, sizeof(arr[0]), cmp)
+#define SWAP(a, b)                              \
+	{                                       \
+		char tmp[sizeof(a)];            \
+		assert(sizeof(a) == sizeof(b)); \
+		memcpy(&tmp[0], &a, sizeof(a)); \
+		memcpy(&a, &b, sizeof(a));      \
+		memcpy(&b, &tmp[0], sizeof(a)); \
+	}
+
+char *xstrdup(const char *s);
+
+void sleep_millisec(int millisecs);
+
+#endif
+#endif
diff --git a/reftable/constants.h b/reftable/constants.h
new file mode 100644
index 0000000000..5eee72c4c1
--- /dev/null
+++ b/reftable/constants.h
@@ -0,0 +1,21 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef CONSTANTS_H
+#define CONSTANTS_H
+
+#define BLOCK_TYPE_LOG 'g'
+#define BLOCK_TYPE_INDEX 'i'
+#define BLOCK_TYPE_REF 'r'
+#define BLOCK_TYPE_OBJ 'o'
+#define BLOCK_TYPE_ANY 0
+
+#define MAX_RESTARTS ((1 << 16) - 1)
+#define DEFAULT_BLOCK_SIZE 4096
+
+#endif
diff --git a/reftable/dump.c b/reftable/dump.c
new file mode 100644
index 0000000000..00b444e8c9
--- /dev/null
+++ b/reftable/dump.c
@@ -0,0 +1,212 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+
+#include "reftable.h"
+#include "reftable-tests.h"
+
+static uint32_t hash_id;
+
+static int dump_table(const char *tablename)
+{
+	struct reftable_block_source src = { 0 };
+	int err = reftable_block_source_from_file(&src, tablename);
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+	struct reftable_reader *r = NULL;
+
+	if (err < 0)
+		return err;
+
+	err = reftable_new_reader(&r, &src, tablename);
+	if (err < 0)
+		return err;
+
+	err = reftable_reader_seek_ref(r, &it, "");
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_ref_record_print(&ref, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_reader_seek_log(r, &it, "");
+	if (err < 0) {
+		return err;
+	}
+	while (1) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_log_record_print(&log, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_log_record_clear(&log);
+
+	reftable_reader_free(r);
+	return 0;
+}
+
+static int compact_stack(const char *stackdir)
+{
+	struct reftable_stack *stack = NULL;
+	struct reftable_write_options cfg = {};
+
+	int err = reftable_new_stack(&stack, stackdir, cfg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_stack_compact_all(stack, NULL);
+	if (err < 0)
+		goto done;
+done:
+	if (stack != NULL) {
+		reftable_stack_destroy(stack);
+	}
+	return err;
+}
+
+static int dump_stack(const char *stackdir)
+{
+	struct reftable_stack *stack = NULL;
+	struct reftable_write_options cfg = {};
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+	struct reftable_merged_table *merged = NULL;
+
+	int err = reftable_new_stack(&stack, stackdir, cfg);
+	if (err < 0)
+		return err;
+
+	merged = reftable_stack_merged_table(stack);
+
+	err = reftable_merged_table_seek_ref(merged, &it, "");
+	if (err < 0) {
+		return err;
+	}
+
+	while (1) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_ref_record_print(&ref, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_merged_table_seek_log(merged, &it, "");
+	if (err < 0) {
+		return err;
+	}
+	while (1) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0) {
+			return err;
+		}
+		reftable_log_record_print(&log, hash_id);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_log_record_clear(&log);
+
+	reftable_stack_destroy(stack);
+	return 0;
+}
+
+static void print_help(void)
+{
+	printf("usage: dump [-cst] arg\n\n"
+	       "options: \n"
+	       "  -c compact\n"
+	       "  -t dump table\n"
+	       "  -s dump stack\n"
+	       "  -h this help\n"
+	       "  -2 use SHA256\n"
+	       "\n");
+}
+
+int reftable_dump_main(int argc, char *const *argv)
+{
+	int err = 0;
+	int opt;
+	int opt_dump_table = 0;
+	int opt_dump_stack = 0;
+	int opt_compact = 0;
+	const char *arg = NULL;
+	while ((opt = getopt(argc, argv, "2chts")) != -1) {
+		switch (opt) {
+		case '2':
+			hash_id = 0x73323536;
+			break;
+		case 't':
+			opt_dump_table = 1;
+			break;
+		case 's':
+			opt_dump_stack = 1;
+			break;
+		case 'c':
+			opt_compact = 1;
+			break;
+		case '?':
+		case 'h':
+			print_help();
+			return 2;
+			break;
+		}
+	}
+
+	if (argv[optind] == NULL) {
+		fprintf(stderr, "need argument\n");
+		print_help();
+		return 2;
+	}
+
+	arg = argv[optind];
+
+	if (opt_dump_table) {
+		err = dump_table(arg);
+	} else if (opt_dump_stack) {
+		err = dump_stack(arg);
+	} else if (opt_compact) {
+		err = compact_stack(arg);
+	}
+
+	if (err < 0) {
+		fprintf(stderr, "%s: %s: %s\n", argv[0], arg,
+			reftable_error_str(err));
+		return 1;
+	}
+	return 0;
+}
diff --git a/reftable/file.c b/reftable/file.c
new file mode 100644
index 0000000000..eacf41f23b
--- /dev/null
+++ b/reftable/file.c
@@ -0,0 +1,95 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+
+#include "block.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+struct file_block_source {
+	int fd;
+	uint64_t size;
+};
+
+static uint64_t file_size(void *b)
+{
+	return ((struct file_block_source *)b)->size;
+}
+
+static void file_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void file_close(void *b)
+{
+	int fd = ((struct file_block_source *)b)->fd;
+	if (fd > 0) {
+		close(fd);
+		((struct file_block_source *)b)->fd = 0;
+	}
+
+	reftable_free(b);
+}
+
+static int file_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			   uint32_t size)
+{
+	struct file_block_source *b = (struct file_block_source *)v;
+	assert(off + size <= b->size);
+	dest->data = reftable_malloc(size);
+	if (pread(b->fd, dest->data, size, off) != size)
+		return -1;
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable file_vtable = {
+	.size = &file_size,
+	.read_block = &file_read_block,
+	.return_block = &file_return_block,
+	.close = &file_close,
+};
+
+int reftable_block_source_from_file(struct reftable_block_source *bs,
+				    const char *name)
+{
+	struct stat st = { 0 };
+	int err = 0;
+	int fd = open(name, O_RDONLY);
+	struct file_block_source *p = NULL;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			return REFTABLE_NOT_EXIST_ERROR;
+		}
+		return -1;
+	}
+
+	err = fstat(fd, &st);
+	if (err < 0)
+		return -1;
+
+	p = reftable_calloc(sizeof(struct file_block_source));
+	p->size = st.st_size;
+	p->fd = fd;
+
+	assert(bs->ops == NULL);
+	bs->ops = &file_vtable;
+	bs->arg = p;
+	return 0;
+}
+
+int reftable_fd_write(void *arg, const void *data, size_t sz)
+{
+	int *fdp = (int *)arg;
+	return write(*fdp, data, sz);
+}
diff --git a/reftable/iter.c b/reftable/iter.c
new file mode 100644
index 0000000000..7f385270b0
--- /dev/null
+++ b/reftable/iter.c
@@ -0,0 +1,242 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "iter.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "reftable.h"
+
+bool iterator_is_null(struct reftable_iterator *it)
+{
+	return it->ops == NULL;
+}
+
+static int empty_iterator_next(void *arg, struct reftable_record *rec)
+{
+	return 1;
+}
+
+static void empty_iterator_close(void *arg)
+{
+}
+
+struct reftable_iterator_vtable empty_vtable = {
+	.next = &empty_iterator_next,
+	.close = &empty_iterator_close,
+};
+
+void iterator_set_empty(struct reftable_iterator *it)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = NULL;
+	it->ops = &empty_vtable;
+}
+
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec)
+{
+	return it->ops->next(it->iter_arg, rec);
+}
+
+void reftable_iterator_destroy(struct reftable_iterator *it)
+{
+	if (it->ops == NULL) {
+		return;
+	}
+	it->ops->close(it->iter_arg);
+	it->ops = NULL;
+	FREE_AND_NULL(it->iter_arg);
+}
+
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, ref);
+	return iterator_next(it, &rec);
+}
+
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, log);
+	return iterator_next(it, &rec);
+}
+
+static void filtering_ref_iterator_close(void *iter_arg)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	strbuf_release(&fri->oid);
+	reftable_iterator_destroy(&fri->it);
+}
+
+static int filtering_ref_iterator_next(void *iter_arg,
+				       struct reftable_record *rec)
+{
+	struct filtering_ref_iterator *fri =
+		(struct filtering_ref_iterator *)iter_arg;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+	int err = 0;
+	while (true) {
+		err = reftable_iterator_next_ref(&fri->it, ref);
+		if (err != 0) {
+			break;
+		}
+
+		if (fri->double_check) {
+			struct reftable_iterator it = { 0 };
+
+			err = reftable_table_seek_ref(&fri->tab, &it,
+						      ref->refname);
+			if (err == 0) {
+				err = reftable_iterator_next_ref(&it, ref);
+			}
+
+			reftable_iterator_destroy(&it);
+
+			if (err < 0) {
+				break;
+			}
+
+			if (err > 0) {
+				continue;
+			}
+		}
+
+		if ((ref->target_value != NULL &&
+		     !memcmp(fri->oid.buf, ref->target_value, fri->oid.len)) ||
+		    (ref->value != NULL &&
+		     !memcmp(fri->oid.buf, ref->value, fri->oid.len))) {
+			return 0;
+		}
+	}
+
+	reftable_ref_record_clear(ref);
+	return err;
+}
+
+struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.next = &filtering_ref_iterator_next,
+	.close = &filtering_ref_iterator_close,
+};
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *it,
+					  struct filtering_ref_iterator *fri)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = fri;
+	it->ops = &filtering_ref_iterator_vtable;
+}
+
+static void indexed_table_ref_iter_close(void *p)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	block_iter_close(&it->cur);
+	reftable_block_done(&it->block_reader.block);
+	strbuf_release(&it->oid);
+}
+
+static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
+{
+	uint64_t off;
+	int err = 0;
+	if (it->offset_idx == it->offset_len) {
+		it->finished = true;
+		return 1;
+	}
+
+	reftable_block_done(&it->block_reader.block);
+
+	off = it->offsets[it->offset_idx++];
+	err = reader_init_block_reader(it->r, &it->block_reader, off,
+				       BLOCK_TYPE_REF);
+	if (err < 0) {
+		return err;
+	}
+	if (err > 0) {
+		/* indexed block does not exist. */
+		return REFTABLE_FORMAT_ERROR;
+	}
+	block_reader_start(&it->block_reader, &it->cur);
+	return 0;
+}
+
+static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec)
+{
+	struct indexed_table_ref_iter *it = (struct indexed_table_ref_iter *)p;
+	struct reftable_ref_record *ref =
+		(struct reftable_ref_record *)rec->data;
+
+	while (true) {
+		int err = block_iter_next(&it->cur, rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			err = indexed_table_ref_iter_next_block(it);
+			if (err < 0) {
+				return err;
+			}
+
+			if (it->finished) {
+				return 1;
+			}
+			continue;
+		}
+
+		if (!memcmp(it->oid.buf, ref->target_value, it->oid.len) ||
+		    !memcmp(it->oid.buf, ref->value, it->oid.len)) {
+			return 0;
+		}
+	}
+}
+
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, uint8_t *oid,
+			       int oid_len, uint64_t *offsets, int offset_len)
+{
+	struct indexed_table_ref_iter empty = INDEXED_TABLE_REF_ITER_INIT;
+	struct indexed_table_ref_iter *itr =
+		reftable_calloc(sizeof(struct indexed_table_ref_iter));
+	int err = 0;
+
+	*itr = empty;
+	itr->r = r;
+	strbuf_add(&itr->oid, oid, oid_len);
+
+	itr->offsets = offsets;
+	itr->offset_len = offset_len;
+
+	err = indexed_table_ref_iter_next_block(itr);
+	if (err < 0) {
+		reftable_free(itr);
+	} else {
+		*dest = itr;
+	}
+	return err;
+}
+
+struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.next = &indexed_table_ref_iter_next,
+	.close = &indexed_table_ref_iter_close,
+};
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = itr;
+	it->ops = &indexed_table_ref_iter_vtable;
+}
diff --git a/reftable/iter.h b/reftable/iter.h
new file mode 100644
index 0000000000..c96aaed583
--- /dev/null
+++ b/reftable/iter.h
@@ -0,0 +1,72 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef ITER_H
+#define ITER_H
+
+#include "block.h"
+#include "record.h"
+#include "strbuf.h"
+
+struct reftable_iterator_vtable {
+	int (*next)(void *iter_arg, struct reftable_record *rec);
+	void (*close)(void *iter_arg);
+};
+
+void iterator_set_empty(struct reftable_iterator *it);
+int iterator_next(struct reftable_iterator *it, struct reftable_record *rec);
+
+/* Returns true for a zeroed out iterator, such as the one returned from
+   iterator_destroy. */
+bool iterator_is_null(struct reftable_iterator *it);
+
+/* iterator that produces only ref records that point to `oid` */
+struct filtering_ref_iterator {
+	bool double_check;
+	struct reftable_table tab;
+	struct strbuf oid;
+	struct reftable_iterator it;
+};
+#define FILTERING_REF_ITERATOR_INIT \
+	{                           \
+		.oid = STRBUF_INIT  \
+	}
+
+void iterator_from_filtering_ref_iterator(struct reftable_iterator *,
+					  struct filtering_ref_iterator *);
+
+/* iterator that produces only ref records that point to `oid`,
+   but using the object index.
+ */
+struct indexed_table_ref_iter {
+	struct reftable_reader *r;
+	struct strbuf oid;
+
+	/* mutable */
+	uint64_t *offsets;
+
+	/* Points to the next offset to read. */
+	int offset_idx;
+	int offset_len;
+	struct block_reader block_reader;
+	struct block_iter cur;
+	bool finished;
+};
+
+#define INDEXED_TABLE_REF_ITER_INIT                                     \
+	{                                                               \
+		.cur = { .last_key = STRBUF_INIT }, .oid = STRBUF_INIT, \
+	}
+
+void iterator_from_indexed_table_ref_iter(struct reftable_iterator *it,
+					  struct indexed_table_ref_iter *itr);
+int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
+			       struct reftable_reader *r, uint8_t *oid,
+			       int oid_len, uint64_t *offsets, int offset_len);
+
+#endif
diff --git a/reftable/merged.c b/reftable/merged.c
new file mode 100644
index 0000000000..770178183f
--- /dev/null
+++ b/reftable/merged.c
@@ -0,0 +1,317 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "constants.h"
+#include "iter.h"
+#include "pq.h"
+#include "reader.h"
+#include "record.h"
+
+static int merged_iter_init(struct merged_iter *mi)
+{
+	int i = 0;
+	for (i = 0; i < mi->stack_len; i++) {
+		struct reftable_record rec = reftable_new_record(mi->typ);
+		int err = iterator_next(&mi->stack[i], &rec);
+		if (err < 0) {
+			return err;
+		}
+
+		if (err > 0) {
+			reftable_iterator_destroy(&mi->stack[i]);
+			reftable_record_destroy(&rec);
+		} else {
+			struct pq_entry e = {
+				.rec = rec,
+				.index = i,
+			};
+			merged_iter_pqueue_add(&mi->pq, e);
+		}
+	}
+
+	return 0;
+}
+
+static void merged_iter_close(void *p)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	int i = 0;
+	merged_iter_pqueue_clear(&mi->pq);
+	for (i = 0; i < mi->stack_len; i++) {
+		reftable_iterator_destroy(&mi->stack[i]);
+	}
+	reftable_free(mi->stack);
+}
+
+static int merged_iter_advance_nonnull_subiter(struct merged_iter *mi,
+					       size_t idx)
+{
+	struct reftable_record rec = reftable_new_record(mi->typ);
+	struct pq_entry e = {
+		.rec = rec,
+		.index = idx,
+	};
+	int err = iterator_next(&mi->stack[idx], &rec);
+	if (err < 0)
+		return err;
+
+	if (err > 0) {
+		reftable_iterator_destroy(&mi->stack[idx]);
+		reftable_record_destroy(&rec);
+		return 0;
+	}
+
+	merged_iter_pqueue_add(&mi->pq, e);
+	return 0;
+}
+
+static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
+{
+	if (iterator_is_null(&mi->stack[idx]))
+		return 0;
+	return merged_iter_advance_nonnull_subiter(mi, idx);
+}
+
+static int merged_iter_next_entry(struct merged_iter *mi,
+				  struct reftable_record *rec)
+{
+	struct strbuf entry_key = STRBUF_INIT;
+	struct pq_entry entry = { 0 };
+	int err = 0;
+
+	if (merged_iter_pqueue_is_empty(mi->pq))
+		return 1;
+
+	entry = merged_iter_pqueue_remove(&mi->pq);
+	err = merged_iter_advance_subiter(mi, entry.index);
+	if (err < 0)
+		return err;
+
+	/*
+	  One can also use reftable as datacenter-local storage, where the ref
+	  database is maintained in globally consistent database (eg.
+	  CockroachDB or Spanner). In this scenario, replication delays together
+	  with compaction may cause newer tables to contain older entries. In
+	  such a deployment, the loop below must be changed to collect all
+	  entries for the same key, and return new the newest one.
+	*/
+	reftable_record_key(&entry.rec, &entry_key);
+	while (!merged_iter_pqueue_is_empty(mi->pq)) {
+		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
+		struct strbuf k = STRBUF_INIT;
+		int err = 0, cmp = 0;
+
+		reftable_record_key(&top.rec, &k);
+
+		cmp = strbuf_cmp(&k, &entry_key);
+		strbuf_release(&k);
+
+		if (cmp > 0) {
+			break;
+		}
+
+		merged_iter_pqueue_remove(&mi->pq);
+		err = merged_iter_advance_subiter(mi, top.index);
+		if (err < 0) {
+			return err;
+		}
+		reftable_record_destroy(&top.rec);
+	}
+
+	reftable_record_copy_from(rec, &entry.rec, hash_size(mi->hash_id));
+	reftable_record_destroy(&entry.rec);
+	strbuf_release(&entry_key);
+	return 0;
+}
+
+static int merged_iter_next(struct merged_iter *mi, struct reftable_record *rec)
+{
+	while (true) {
+		int err = merged_iter_next_entry(mi, rec);
+		if (err == 0 && mi->suppress_deletions &&
+		    reftable_record_is_deletion(rec)) {
+			continue;
+		}
+
+		return err;
+	}
+}
+
+static int merged_iter_next_void(void *p, struct reftable_record *rec)
+{
+	struct merged_iter *mi = (struct merged_iter *)p;
+	if (merged_iter_pqueue_is_empty(mi->pq))
+		return 1;
+
+	return merged_iter_next(mi, rec);
+}
+
+struct reftable_iterator_vtable merged_iter_vtable = {
+	.next = &merged_iter_next_void,
+	.close = &merged_iter_close,
+};
+
+static void iterator_from_merged_iter(struct reftable_iterator *it,
+				      struct merged_iter *mi)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = mi;
+	it->ops = &merged_iter_vtable;
+}
+
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_table *stack, int n,
+			      uint32_t hash_id)
+{
+	struct reftable_merged_table *m = NULL;
+	uint64_t last_max = 0;
+	uint64_t first_min = 0;
+	int i = 0;
+	for (i = 0; i < n; i++) {
+		uint64_t min = reftable_table_min_update_index(&stack[i]);
+		uint64_t max = reftable_table_max_update_index(&stack[i]);
+
+		if (reftable_table_hash_id(&stack[i]) != hash_id) {
+			return REFTABLE_FORMAT_ERROR;
+		}
+		if (i == 0 || min < first_min) {
+			first_min = min;
+		}
+		if (i == 0 || max > last_max) {
+			last_max = max;
+		}
+	}
+
+	m = (struct reftable_merged_table *)reftable_calloc(
+		sizeof(struct reftable_merged_table));
+	m->stack = stack;
+	m->stack_len = n;
+	m->min = first_min;
+	m->max = last_max;
+	m->hash_id = hash_id;
+	*dest = m;
+	return 0;
+}
+
+/* clears the list of subtable, without affecting the readers themselves. */
+void merged_table_clear(struct reftable_merged_table *mt)
+{
+	FREE_AND_NULL(mt->stack);
+	mt->stack_len = 0;
+}
+
+void reftable_merged_table_free(struct reftable_merged_table *mt)
+{
+	if (mt == NULL) {
+		return;
+	}
+	merged_table_clear(mt);
+	reftable_free(mt);
+}
+
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt)
+{
+	return mt->max;
+}
+
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
+{
+	return mt->min;
+}
+
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec)
+{
+	struct reftable_iterator *iters = reftable_calloc(
+		sizeof(struct reftable_iterator) * mt->stack_len);
+	struct merged_iter merged = {
+		.stack = iters,
+		.typ = reftable_record_type(rec),
+		.hash_id = mt->hash_id,
+		.suppress_deletions = mt->suppress_deletions,
+	};
+	int n = 0;
+	int err = 0;
+	int i = 0;
+	for (i = 0; i < mt->stack_len && err == 0; i++) {
+		int e = reftable_table_seek_record(&mt->stack[i], &iters[n],
+						   rec);
+		if (e < 0) {
+			err = e;
+		}
+		if (e == 0) {
+			n++;
+		}
+	}
+	if (err < 0) {
+		int i = 0;
+		for (i = 0; i < n; i++) {
+			reftable_iterator_destroy(&iters[i]);
+		}
+		reftable_free(iters);
+		return err;
+	}
+
+	merged.stack_len = n;
+	err = merged_iter_init(&merged);
+	if (err < 0) {
+		merged_iter_close(&merged);
+		return err;
+	} else {
+		struct merged_iter *p =
+			reftable_malloc(sizeof(struct merged_iter));
+		*p = merged;
+		iterator_from_merged_iter(it, p);
+	}
+	return 0;
+}
+
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	struct reftable_ref_record ref = {
+		.refname = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.refname = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return merged_table_seek_record(mt, it, &rec);
+}
+
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_merged_table_seek_log_at(mt, it, name, max);
+}
+
+uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *mt)
+{
+	return mt->hash_id;
+}
diff --git a/reftable/merged.h b/reftable/merged.h
new file mode 100644
index 0000000000..a568726da8
--- /dev/null
+++ b/reftable/merged.h
@@ -0,0 +1,43 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef MERGED_H
+#define MERGED_H
+
+#include "pq.h"
+#include "reftable.h"
+
+struct reftable_merged_table {
+	struct reftable_table *stack;
+	size_t stack_len;
+	uint32_t hash_id;
+	bool suppress_deletions;
+
+	uint64_t min;
+	uint64_t max;
+};
+
+struct merged_iter {
+	struct reftable_iterator *stack;
+	uint32_t hash_id;
+	size_t stack_len;
+	uint8_t typ;
+	bool suppress_deletions;
+	struct merged_iter_pqueue pq;
+};
+
+void merged_table_clear(struct reftable_merged_table *mt);
+int merged_table_seek_record(struct reftable_merged_table *mt,
+			     struct reftable_iterator *it,
+			     struct reftable_record *rec);
+
+int reftable_table_seek_record(struct reftable_table *tab,
+			       struct reftable_iterator *it,
+			       struct reftable_record *rec);
+
+#endif
diff --git a/reftable/merged_test.c b/reftable/merged_test.c
new file mode 100644
index 0000000000..2477eed06b
--- /dev/null
+++ b/reftable/merged_test.c
@@ -0,0 +1,286 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "merged.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "pq.h"
+#include "reader.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_pq(void)
+{
+	char *names[54] = { 0 };
+	int N = ARRAY_SIZE(names) - 1;
+
+	struct merged_iter_pqueue pq = { 0 };
+	const char *last = NULL;
+
+	int i = 0;
+	for (i = 0; i < N; i++) {
+		char name[100];
+		snprintf(name, sizeof(name), "%02d", i);
+		names[i] = xstrdup(name);
+	}
+
+	i = 1;
+	do {
+		struct reftable_record rec =
+			reftable_new_record(BLOCK_TYPE_REF);
+		struct pq_entry e = { 0 };
+
+		reftable_record_as_ref(&rec)->refname = names[i];
+		e.rec = rec;
+		merged_iter_pqueue_add(&pq, e);
+		merged_iter_pqueue_check(pq);
+		i = (i * 7) % N;
+	} while (i != 1);
+
+	while (!merged_iter_pqueue_is_empty(pq)) {
+		struct pq_entry e = merged_iter_pqueue_remove(&pq);
+		struct reftable_ref_record *ref =
+			reftable_record_as_ref(&e.rec);
+
+		merged_iter_pqueue_check(pq);
+
+		if (last != NULL) {
+			assert(strcmp(last, ref->refname) < 0);
+		}
+		last = ref->refname;
+		ref->refname = NULL;
+		reftable_free(ref);
+	}
+
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+
+	merged_iter_pqueue_clear(&pq);
+}
+
+static void write_test_table(struct strbuf *buf,
+			     struct reftable_ref_record refs[], int n)
+{
+	int min = 0xffffffff;
+	int max = 0;
+	int i = 0;
+	int err;
+
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_writer *w = NULL;
+	for (i = 0; i < n; i++) {
+		uint64_t ui = refs[i].update_index;
+		if (ui > max) {
+			max = ui;
+		}
+		if (ui < min) {
+			min = ui;
+		}
+	}
+
+	w = reftable_new_writer(&strbuf_add_void, buf, &opts);
+	reftable_writer_set_limits(w, min, max);
+
+	for (i = 0; i < n; i++) {
+		uint64_t before = refs[i].update_index;
+		int n = reftable_writer_add_ref(w, &refs[i]);
+		assert(n == 0);
+		assert(before == refs[i].update_index);
+	}
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+
+	reftable_writer_free(w);
+}
+
+static struct reftable_merged_table *
+merged_table_from_records(struct reftable_ref_record **refs,
+			  struct reftable_block_source **source,
+			  struct reftable_reader ***readers, int *sizes,
+			  struct strbuf *buf, int n)
+{
+	int i = 0;
+	struct reftable_merged_table *mt = NULL;
+	int err;
+	struct reftable_table *tabs =
+		reftable_calloc(n * sizeof(struct reftable_table));
+	*readers = reftable_calloc(n * sizeof(struct reftable_reader *));
+	*source = reftable_calloc(n * sizeof(**source));
+	for (i = 0; i < n; i++) {
+		write_test_table(&buf[i], refs[i], sizes[i]);
+		block_source_from_strbuf(&(*source)[i], &buf[i]);
+
+		err = reftable_new_reader(&(*readers)[i], &(*source)[i],
+					  "name");
+		assert_err(err);
+		reftable_table_from_reader(&tabs[i], (*readers)[i]);
+	}
+
+	err = reftable_new_merged_table(&mt, tabs, n, SHA1_ID);
+	assert_err(err);
+	return mt;
+}
+
+static void readers_destroy(struct reftable_reader **readers, size_t n)
+{
+	int i = 0;
+	for (; i < n; i++)
+		reftable_reader_free(readers[i]);
+	reftable_free(readers);
+}
+
+static void test_merged_between(void)
+{
+	uint8_t hash1[SHA1_SIZE] = { 1, 2, 3, 0 };
+
+	struct reftable_ref_record r1[] = { {
+		.refname = "b",
+		.update_index = 1,
+		.value = hash1,
+	} };
+	struct reftable_ref_record r2[] = { {
+		.refname = "a",
+		.update_index = 2,
+	} };
+
+	struct reftable_ref_record *refs[] = { r1, r2 };
+	int sizes[] = { 1, 1 };
+	struct strbuf bufs[2] = { STRBUF_INIT, STRBUF_INIT };
+	struct reftable_block_source *bs = NULL;
+	struct reftable_reader **readers = NULL;
+	struct reftable_merged_table *mt =
+		merged_table_from_records(refs, &bs, &readers, sizes, bufs, 2);
+	int i;
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_iterator it = { 0 };
+	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert_err(err);
+	assert(ref.update_index == 2);
+	reftable_ref_record_clear(&ref);
+	reftable_iterator_destroy(&it);
+	readers_destroy(readers, 2);
+	reftable_merged_table_free(mt);
+	for (i = 0; i < ARRAY_SIZE(bufs); i++) {
+		strbuf_release(&bufs[i]);
+	}
+	reftable_free(bs);
+}
+
+static void test_merged(void)
+{
+	uint8_t hash1[SHA1_SIZE] = { 1 };
+	uint8_t hash2[SHA1_SIZE] = { 2 };
+	struct reftable_ref_record r1[] = { {
+						    .refname = "a",
+						    .update_index = 1,
+						    .value = hash1,
+					    },
+					    {
+						    .refname = "b",
+						    .update_index = 1,
+						    .value = hash1,
+					    },
+					    {
+						    .refname = "c",
+						    .update_index = 1,
+						    .value = hash1,
+					    } };
+	struct reftable_ref_record r2[] = { {
+		.refname = "a",
+		.update_index = 2,
+	} };
+	struct reftable_ref_record r3[] = {
+		{
+			.refname = "c",
+			.update_index = 3,
+			.value = hash2,
+		},
+		{
+			.refname = "d",
+			.update_index = 3,
+			.value = hash1,
+		},
+	};
+
+	struct reftable_ref_record want[] = {
+		r2[0],
+		r1[1],
+		r3[0],
+		r3[1],
+	};
+
+	struct reftable_ref_record *refs[] = { r1, r2, r3 };
+	int sizes[3] = { 3, 1, 2 };
+	struct strbuf bufs[3] = { STRBUF_INIT, STRBUF_INIT, STRBUF_INIT };
+	struct reftable_block_source *bs = NULL;
+	struct reftable_reader **readers = NULL;
+	struct reftable_merged_table *mt =
+		merged_table_from_records(refs, &bs, &readers, sizes, bufs, 3);
+
+	struct reftable_iterator it = { 0 };
+	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	struct reftable_ref_record *out = NULL;
+	size_t len = 0;
+	size_t cap = 0;
+	int i = 0;
+
+	assert_err(err);
+	while (len < 100) { /* cap loops/recursion. */
+		struct reftable_ref_record ref = { 0 };
+		int err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			out = reftable_realloc(
+				out, sizeof(struct reftable_ref_record) * cap);
+		}
+		out[len++] = ref;
+	}
+	reftable_iterator_destroy(&it);
+
+	assert(ARRAY_SIZE(want) == len);
+	for (i = 0; i < len; i++) {
+		assert(reftable_ref_record_equal(&want[i], &out[i], SHA1_SIZE));
+	}
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&out[i]);
+	}
+	reftable_free(out);
+
+	for (i = 0; i < 3; i++) {
+		strbuf_release(&bufs[i]);
+	}
+	readers_destroy(readers, 3);
+	reftable_merged_table_free(mt);
+	reftable_free(bs);
+}
+
+/* XXX test refs_for(oid) */
+
+int merged_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_merged_between", &test_merged_between);
+	add_test_case("test_pq", &test_pq);
+	add_test_case("test_merged", &test_merged);
+	return test_main(argc, argv);
+}
diff --git a/reftable/pq.c b/reftable/pq.c
new file mode 100644
index 0000000000..de7ed658a4
--- /dev/null
+++ b/reftable/pq.c
@@ -0,0 +1,115 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "pq.h"
+
+#include "reftable.h"
+#include "system.h"
+#include "basics.h"
+
+int pq_less(struct pq_entry a, struct pq_entry b)
+{
+	struct strbuf ak = STRBUF_INIT;
+	struct strbuf bk = STRBUF_INIT;
+	int cmp = 0;
+	reftable_record_key(&a.rec, &ak);
+	reftable_record_key(&b.rec, &bk);
+
+	cmp = strbuf_cmp(&ak, &bk);
+
+	strbuf_release(&ak);
+	strbuf_release(&bk);
+
+	if (cmp == 0)
+		return a.index > b.index;
+
+	return cmp < 0;
+}
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq)
+{
+	return pq.heap[0];
+}
+
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq)
+{
+	return pq.len == 0;
+}
+
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq)
+{
+	int i = 0;
+	for (i = 1; i < pq.len; i++) {
+		int parent = (i - 1) / 2;
+
+		assert(pq_less(pq.heap[parent], pq.heap[i]));
+	}
+}
+
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	struct pq_entry e = pq->heap[0];
+	pq->heap[0] = pq->heap[pq->len - 1];
+	pq->len--;
+
+	i = 0;
+	while (i < pq->len) {
+		int min = i;
+		int j = 2 * i + 1;
+		int k = 2 * i + 2;
+		if (j < pq->len && pq_less(pq->heap[j], pq->heap[i])) {
+			min = j;
+		}
+		if (k < pq->len && pq_less(pq->heap[k], pq->heap[min])) {
+			min = k;
+		}
+
+		if (min == i) {
+			break;
+		}
+
+		SWAP(pq->heap[i], pq->heap[min]);
+		i = min;
+	}
+
+	return e;
+}
+
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e)
+{
+	int i = 0;
+	if (pq->len == pq->cap) {
+		pq->cap = 2 * pq->cap + 1;
+		pq->heap = reftable_realloc(pq->heap,
+					    pq->cap * sizeof(struct pq_entry));
+	}
+
+	pq->heap[pq->len++] = e;
+	i = pq->len - 1;
+	while (i > 0) {
+		int j = (i - 1) / 2;
+		if (pq_less(pq->heap[j], pq->heap[i])) {
+			break;
+		}
+
+		SWAP(pq->heap[j], pq->heap[i]);
+
+		i = j;
+	}
+}
+
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq)
+{
+	int i = 0;
+	for (i = 0; i < pq->len; i++) {
+		reftable_record_destroy(&pq->heap[i].rec);
+	}
+	FREE_AND_NULL(pq->heap);
+	pq->len = pq->cap = 0;
+}
diff --git a/reftable/pq.h b/reftable/pq.h
new file mode 100644
index 0000000000..1117f70306
--- /dev/null
+++ b/reftable/pq.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef PQ_H
+#define PQ_H
+
+#include "record.h"
+
+struct pq_entry {
+	int index;
+	struct reftable_record rec;
+};
+
+int pq_less(struct pq_entry a, struct pq_entry b);
+
+struct merged_iter_pqueue {
+	struct pq_entry *heap;
+	size_t len;
+	size_t cap;
+};
+
+struct pq_entry merged_iter_pqueue_top(struct merged_iter_pqueue pq);
+bool merged_iter_pqueue_is_empty(struct merged_iter_pqueue pq);
+void merged_iter_pqueue_check(struct merged_iter_pqueue pq);
+struct pq_entry merged_iter_pqueue_remove(struct merged_iter_pqueue *pq);
+void merged_iter_pqueue_add(struct merged_iter_pqueue *pq, struct pq_entry e);
+void merged_iter_pqueue_clear(struct merged_iter_pqueue *pq);
+
+#endif
diff --git a/reftable/reader.c b/reftable/reader.c
new file mode 100644
index 0000000000..29e5b70439
--- /dev/null
+++ b/reftable/reader.c
@@ -0,0 +1,744 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reader.h"
+
+#include "system.h"
+#include "block.h"
+#include "constants.h"
+#include "iter.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+uint64_t block_source_size(struct reftable_block_source *source)
+{
+	return source->ops->size(source->arg);
+}
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size)
+{
+	int result = source->ops->read_block(source->arg, dest, off, size);
+	dest->source = *source;
+	return result;
+}
+
+void reftable_block_done(struct reftable_block *blockp)
+{
+	struct reftable_block_source source = blockp->source;
+	if (blockp != NULL && source.ops != NULL)
+		source.ops->return_block(source.arg, blockp);
+	blockp->data = NULL;
+	blockp->len = 0;
+	blockp->source.ops = NULL;
+	blockp->source.arg = NULL;
+}
+
+void block_source_close(struct reftable_block_source *source)
+{
+	if (source->ops == NULL) {
+		return;
+	}
+
+	source->ops->close(source->arg);
+	source->ops = NULL;
+}
+
+static struct reftable_reader_offsets *
+reader_offsets_for(struct reftable_reader *r, uint8_t typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+		return &r->ref_offsets;
+	case BLOCK_TYPE_LOG:
+		return &r->log_offsets;
+	case BLOCK_TYPE_OBJ:
+		return &r->obj_offsets;
+	}
+	abort();
+}
+
+static int reader_get_block(struct reftable_reader *r,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t sz)
+{
+	if (off >= r->size)
+		return 0;
+
+	if (off + sz > r->size) {
+		sz = r->size - off;
+	}
+
+	return block_source_read_block(&r->source, dest, off, sz);
+}
+
+uint32_t reftable_reader_hash_id(struct reftable_reader *r)
+{
+	return r->hash_id;
+}
+
+const char *reader_name(struct reftable_reader *r)
+{
+	return r->name;
+}
+
+static int parse_footer(struct reftable_reader *r, uint8_t *footer,
+			uint8_t *header)
+{
+	uint8_t *f = footer;
+	uint8_t first_block_typ;
+	int err = 0;
+	uint32_t computed_crc;
+	uint32_t file_crc;
+
+	if (memcmp(f, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+	f += 4;
+
+	if (memcmp(footer, header, header_size(r->version))) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	f++;
+	r->block_size = get_be24(f);
+
+	f += 3;
+	r->min_update_index = get_be64(f);
+	f += 8;
+	r->max_update_index = get_be64(f);
+	f += 8;
+
+	if (r->version == 1) {
+		r->hash_id = SHA1_ID;
+	} else {
+		r->hash_id = get_be32(f);
+		switch (r->hash_id) {
+		case SHA1_ID:
+			break;
+		case SHA256_ID:
+			break;
+		default:
+			err = REFTABLE_FORMAT_ERROR;
+			goto done;
+		}
+		f += 4;
+	}
+
+	r->ref_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	r->obj_offsets.offset = get_be64(f);
+	f += 8;
+
+	r->object_id_len = r->obj_offsets.offset & ((1 << 5) - 1);
+	r->obj_offsets.offset >>= 5;
+
+	r->obj_offsets.index_offset = get_be64(f);
+	f += 8;
+	r->log_offsets.offset = get_be64(f);
+	f += 8;
+	r->log_offsets.index_offset = get_be64(f);
+	f += 8;
+
+	computed_crc = crc32(0, footer, f - footer);
+	file_crc = get_be32(f);
+	f += 4;
+	if (computed_crc != file_crc) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	first_block_typ = header[header_size(r->version)];
+	r->ref_offsets.present = (first_block_typ == BLOCK_TYPE_REF);
+	r->ref_offsets.offset = 0;
+	r->log_offsets.present = (first_block_typ == BLOCK_TYPE_LOG ||
+				  r->log_offsets.offset > 0);
+	r->obj_offsets.present = r->obj_offsets.offset > 0;
+	err = 0;
+done:
+	return err;
+}
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name)
+{
+	struct reftable_block footer = { 0 };
+	struct reftable_block header = { 0 };
+	int err = 0;
+
+	memset(r, 0, sizeof(struct reftable_reader));
+
+	/* Need +1 to read type of first block. */
+	err = block_source_read_block(source, &header, 0, header_size(2) + 1);
+	if (err != header_size(2) + 1) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	if (memcmp(header.data, "REFT", 4)) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+	r->version = header.data[4];
+	if (r->version != 1 && r->version != 2) {
+		err = REFTABLE_FORMAT_ERROR;
+		goto done;
+	}
+
+	r->size = block_source_size(source) - footer_size(r->version);
+	r->source = *source;
+	r->name = xstrdup(name);
+	r->hash_id = 0;
+
+	err = block_source_read_block(source, &footer, r->size,
+				      footer_size(r->version));
+	if (err != footer_size(r->version)) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = parse_footer(r, footer.data, header.data);
+done:
+	reftable_block_done(&footer);
+	reftable_block_done(&header);
+	return err;
+}
+
+struct table_iter {
+	struct reftable_reader *r;
+	uint8_t typ;
+	uint64_t block_off;
+	struct block_iter bi;
+	bool finished;
+};
+#define TABLE_ITER_INIT                          \
+	{                                        \
+		.bi = {.last_key = STRBUF_INIT } \
+	}
+
+static void table_iter_copy_from(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = src->block_off;
+	dest->finished = src->finished;
+	block_iter_copy_from(&dest->bi, &src->bi);
+}
+
+static int table_iter_next_in_block(struct table_iter *ti,
+				    struct reftable_record *rec)
+{
+	int res = block_iter_next(&ti->bi, rec);
+	if (res == 0 && reftable_record_type(rec) == BLOCK_TYPE_REF) {
+		((struct reftable_ref_record *)rec->data)->update_index +=
+			ti->r->min_update_index;
+	}
+
+	return res;
+}
+
+static void table_iter_block_done(struct table_iter *ti)
+{
+	if (ti->bi.br == NULL) {
+		return;
+	}
+	reftable_block_done(&ti->bi.br->block);
+	FREE_AND_NULL(ti->bi.br);
+
+	ti->bi.last_key.len = 0;
+	ti->bi.next_off = 0;
+}
+
+static int32_t extract_block_size(uint8_t *data, uint8_t *typ, uint64_t off,
+				  int version)
+{
+	int32_t result = 0;
+
+	if (off == 0) {
+		data += header_size(version);
+	}
+
+	*typ = data[0];
+	if (reftable_is_block_type(*typ)) {
+		result = get_be24(data + 1);
+	}
+	return result;
+}
+
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, uint8_t want_typ)
+{
+	int32_t guess_block_size = r->block_size ? r->block_size :
+							 DEFAULT_BLOCK_SIZE;
+	struct reftable_block block = { 0 };
+	uint8_t block_typ = 0;
+	int err = 0;
+	uint32_t header_off = next_off ? 0 : header_size(r->version);
+	int32_t block_size = 0;
+
+	if (next_off >= r->size)
+		return 1;
+
+	err = reader_get_block(r, &block, next_off, guess_block_size);
+	if (err < 0)
+		return err;
+
+	block_size = extract_block_size(block.data, &block_typ, next_off,
+					r->version);
+	if (block_size < 0)
+		return block_size;
+
+	if (want_typ != BLOCK_TYPE_ANY && block_typ != want_typ) {
+		reftable_block_done(&block);
+		return 1;
+	}
+
+	if (block_size > guess_block_size) {
+		reftable_block_done(&block);
+		err = reader_get_block(r, &block, next_off, block_size);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	return block_reader_init(br, &block, header_off, r->block_size,
+				 hash_size(r->hash_id));
+}
+
+static int table_iter_next_block(struct table_iter *dest,
+				 struct table_iter *src)
+{
+	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
+	struct block_reader br = { 0 };
+	int err = 0;
+
+	dest->r = src->r;
+	dest->typ = src->typ;
+	dest->block_off = next_block_off;
+
+	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
+	if (err > 0) {
+		dest->finished = true;
+		return 1;
+	}
+	if (err != 0)
+		return err;
+	else {
+		struct block_reader *brp =
+			reftable_malloc(sizeof(struct block_reader));
+		*brp = br;
+
+		dest->finished = false;
+		block_reader_start(brp, &dest->bi);
+	}
+	return 0;
+}
+
+static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
+{
+	if (reftable_record_type(rec) != ti->typ)
+		return REFTABLE_API_ERROR;
+
+	while (true) {
+		struct table_iter next = TABLE_ITER_INIT;
+		int err = 0;
+		if (ti->finished) {
+			return 1;
+		}
+
+		err = table_iter_next_in_block(ti, rec);
+		if (err <= 0) {
+			return err;
+		}
+
+		err = table_iter_next_block(&next, ti);
+		if (err != 0) {
+			ti->finished = true;
+		}
+		table_iter_block_done(ti);
+		if (err != 0) {
+			return err;
+		}
+		table_iter_copy_from(ti, &next);
+		block_iter_close(&next.bi);
+	}
+}
+
+static int table_iter_next_void(void *ti, struct reftable_record *rec)
+{
+	return table_iter_next((struct table_iter *)ti, rec);
+}
+
+static void table_iter_close(void *p)
+{
+	struct table_iter *ti = (struct table_iter *)p;
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
+struct reftable_iterator_vtable table_iter_vtable = {
+	.next = &table_iter_next_void,
+	.close = &table_iter_close,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	assert(it->ops == NULL);
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static int reader_table_iter_at(struct reftable_reader *r,
+				struct table_iter *ti, uint64_t off,
+				uint8_t typ)
+{
+	struct block_reader br = { 0 };
+	struct block_reader *brp = NULL;
+
+	int err = reader_init_block_reader(r, &br, off, typ);
+	if (err != 0)
+		return err;
+
+	brp = reftable_malloc(sizeof(struct block_reader));
+	*brp = br;
+	ti->r = r;
+	ti->typ = block_reader_type(brp);
+	ti->block_off = off;
+	block_reader_start(brp, &ti->bi);
+	return 0;
+}
+
+static int reader_start(struct reftable_reader *r, struct table_iter *ti,
+			uint8_t typ, bool index)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	uint64_t off = offs->offset;
+	if (index) {
+		off = offs->index_offset;
+		if (off == 0) {
+			return 1;
+		}
+		typ = BLOCK_TYPE_INDEX;
+	}
+
+	return reader_table_iter_at(r, ti, off, typ);
+}
+
+static int reader_seek_linear(struct reftable_reader *r, struct table_iter *ti,
+			      struct reftable_record *want)
+{
+	struct reftable_record rec =
+		reftable_new_record(reftable_record_type(want));
+	struct strbuf want_key = STRBUF_INIT;
+	struct strbuf got_key = STRBUF_INIT;
+	struct table_iter next = TABLE_ITER_INIT;
+	int err = -1;
+
+	reftable_record_key(want, &want_key);
+
+	while (true) {
+		err = table_iter_next_block(&next, ti);
+		if (err < 0)
+			goto done;
+
+		if (err > 0) {
+			break;
+		}
+
+		err = block_reader_first_key(next.bi.br, &got_key);
+		if (err < 0)
+			goto done;
+
+		if (strbuf_cmp(&got_key, &want_key) > 0) {
+			table_iter_block_done(&next);
+			break;
+		}
+
+		table_iter_block_done(ti);
+		table_iter_copy_from(ti, &next);
+	}
+
+	err = block_iter_seek(&ti->bi, &want_key);
+	if (err < 0)
+		goto done;
+	err = 0;
+
+done:
+	block_iter_close(&next.bi);
+	reftable_record_destroy(&rec);
+	strbuf_release(&want_key);
+	strbuf_release(&got_key);
+	return err;
+}
+
+static int reader_seek_indexed(struct reftable_reader *r,
+			       struct reftable_iterator *it,
+			       struct reftable_record *rec)
+{
+	struct reftable_index_record want_index = { .last_key = STRBUF_INIT };
+	struct reftable_record want_index_rec = { 0 };
+	struct reftable_index_record index_result = { .last_key = STRBUF_INIT };
+	struct reftable_record index_result_rec = { 0 };
+	struct table_iter index_iter = TABLE_ITER_INIT;
+	struct table_iter next = TABLE_ITER_INIT;
+	int err = 0;
+
+	reftable_record_key(rec, &want_index.last_key);
+	reftable_record_from_index(&want_index_rec, &want_index);
+	reftable_record_from_index(&index_result_rec, &index_result);
+
+	err = reader_start(r, &index_iter, reftable_record_type(rec), true);
+	if (err < 0)
+		goto done;
+
+	err = reader_seek_linear(r, &index_iter, &want_index_rec);
+	while (true) {
+		err = table_iter_next(&index_iter, &index_result_rec);
+		table_iter_block_done(&index_iter);
+		if (err != 0)
+			goto done;
+
+		err = reader_table_iter_at(r, &next, index_result.offset, 0);
+		if (err != 0)
+			goto done;
+
+		err = block_iter_seek(&next.bi, &want_index.last_key);
+		if (err < 0)
+			goto done;
+
+		if (next.typ == reftable_record_type(rec)) {
+			err = 0;
+			break;
+		}
+
+		if (next.typ != BLOCK_TYPE_INDEX) {
+			err = REFTABLE_FORMAT_ERROR;
+			break;
+		}
+
+		table_iter_copy_from(&index_iter, &next);
+	}
+
+	if (err == 0) {
+		struct table_iter empty = TABLE_ITER_INIT;
+		struct table_iter *malloced =
+			reftable_calloc(sizeof(struct table_iter));
+		*malloced = empty;
+		table_iter_copy_from(malloced, &next);
+		iterator_from_table_iter(it, malloced);
+	}
+done:
+	block_iter_close(&next.bi);
+	table_iter_close(&index_iter);
+	reftable_record_clear(&want_index_rec);
+	reftable_record_clear(&index_result_rec);
+	return err;
+}
+
+static int reader_seek_internal(struct reftable_reader *r,
+				struct reftable_iterator *it,
+				struct reftable_record *rec)
+{
+	struct reftable_reader_offsets *offs =
+		reader_offsets_for(r, reftable_record_type(rec));
+	uint64_t idx = offs->index_offset;
+	struct table_iter ti = TABLE_ITER_INIT;
+	int err = 0;
+	if (idx > 0)
+		return reader_seek_indexed(r, it, rec);
+
+	err = reader_start(r, &ti, reftable_record_type(rec), false);
+	if (err < 0)
+		return err;
+	err = reader_seek_linear(r, &ti, rec);
+	if (err < 0)
+		return err;
+	else {
+		struct table_iter *p =
+			reftable_malloc(sizeof(struct table_iter));
+		*p = ti;
+		iterator_from_table_iter(it, p);
+	}
+
+	return 0;
+}
+
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec)
+{
+	uint8_t typ = reftable_record_type(rec);
+
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	if (!offs->present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	return reader_seek_internal(r, it, rec);
+}
+
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.refname = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index)
+{
+	struct reftable_log_record log = {
+		.refname = (char *)name,
+		.update_index = update_index,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_log(&rec, &log);
+	return reader_seek(r, it, &rec);
+}
+
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name)
+{
+	uint64_t max = ~((uint64_t)0);
+	return reftable_reader_seek_log_at(r, it, name, max);
+}
+
+void reader_close(struct reftable_reader *r)
+{
+	block_source_close(&r->source);
+	FREE_AND_NULL(r->name);
+}
+
+int reftable_new_reader(struct reftable_reader **p,
+			struct reftable_block_source *src, char const *name)
+{
+	struct reftable_reader *rd =
+		reftable_calloc(sizeof(struct reftable_reader));
+	int err = init_reader(rd, src, name);
+	if (err == 0) {
+		*p = rd;
+	} else {
+		block_source_close(src);
+		reftable_free(rd);
+	}
+	return err;
+}
+
+void reftable_reader_free(struct reftable_reader *r)
+{
+	reader_close(r);
+	reftable_free(r);
+}
+
+static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
+					    struct reftable_iterator *it,
+					    uint8_t *oid)
+{
+	struct reftable_obj_record want = {
+		.hash_prefix = oid,
+		.hash_prefix_len = r->object_id_len,
+	};
+	struct reftable_record want_rec = { 0 };
+	struct reftable_iterator oit = { 0 };
+	struct reftable_obj_record got = { 0 };
+	struct reftable_record got_rec = { 0 };
+	int err = 0;
+	struct indexed_table_ref_iter *itr = NULL;
+
+	/* Look through the reverse index. */
+	reftable_record_from_obj(&want_rec, &want);
+	err = reader_seek(r, &oit, &want_rec);
+	if (err != 0)
+		goto done;
+
+	/* read out the reftable_obj_record */
+	reftable_record_from_obj(&got_rec, &got);
+	err = iterator_next(&oit, &got_rec);
+	if (err < 0)
+		goto done;
+
+	if (err > 0 ||
+	    memcmp(want.hash_prefix, got.hash_prefix, r->object_id_len)) {
+		/* didn't find it; return empty iterator */
+		iterator_set_empty(it);
+		err = 0;
+		goto done;
+	}
+
+	err = new_indexed_table_ref_iter(&itr, r, oid, hash_size(r->hash_id),
+					 got.offsets, got.offset_len);
+	if (err < 0)
+		goto done;
+	got.offsets = NULL;
+	iterator_from_indexed_table_ref_iter(it, itr);
+
+done:
+	reftable_iterator_destroy(&oit);
+	reftable_record_clear(&got_rec);
+	return err;
+}
+
+static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
+					      struct reftable_iterator *it,
+					      uint8_t *oid)
+{
+	struct table_iter ti_empty = TABLE_ITER_INIT;
+	struct table_iter *ti = reftable_calloc(sizeof(struct table_iter));
+	struct filtering_ref_iterator *filter = NULL;
+	struct filtering_ref_iterator empty = FILTERING_REF_ITERATOR_INIT;
+	int oid_len = hash_size(r->hash_id);
+	int err;
+
+	*ti = ti_empty;
+	err = reader_start(r, ti, BLOCK_TYPE_REF, false);
+	if (err < 0) {
+		reftable_free(ti);
+		return err;
+	}
+
+	filter = reftable_malloc(sizeof(struct filtering_ref_iterator));
+	*filter = empty;
+
+	strbuf_add(&filter->oid, oid, oid_len);
+	reftable_table_from_reader(&filter->tab, r);
+	filter->double_check = false;
+	iterator_from_table_iter(&filter->it, ti);
+
+	iterator_from_filtering_ref_iterator(it, filter);
+	return 0;
+}
+
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid)
+{
+	if (r->obj_offsets.present)
+		return reftable_reader_refs_for_indexed(r, it, oid);
+	return reftable_reader_refs_for_unindexed(r, it, oid);
+}
+
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r)
+{
+	return r->max_update_index;
+}
+
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
+{
+	return r->min_update_index;
+}
diff --git a/reftable/reader.h b/reftable/reader.h
new file mode 100644
index 0000000000..213e8fe847
--- /dev/null
+++ b/reftable/reader.h
@@ -0,0 +1,65 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef READER_H
+#define READER_H
+
+#include "block.h"
+#include "record.h"
+#include "reftable.h"
+
+uint64_t block_source_size(struct reftable_block_source *source);
+
+int block_source_read_block(struct reftable_block_source *source,
+			    struct reftable_block *dest, uint64_t off,
+			    uint32_t size);
+void block_source_close(struct reftable_block_source *source);
+
+/* metadata for a block type */
+struct reftable_reader_offsets {
+	bool present;
+	uint64_t offset;
+	uint64_t index_offset;
+};
+
+/* The state for reading a reftable file. */
+struct reftable_reader {
+	/* for convience, associate a name with the instance. */
+	char *name;
+	struct reftable_block_source source;
+
+	/* Size of the file, excluding the footer. */
+	uint64_t size;
+
+	/* 'sha1' for SHA1, 's256' for SHA-256 */
+	uint32_t hash_id;
+
+	uint32_t block_size;
+	uint64_t min_update_index;
+	uint64_t max_update_index;
+	/* Length of the OID keys in the 'o' section */
+	int object_id_len;
+	int version;
+
+	struct reftable_reader_offsets ref_offsets;
+	struct reftable_reader_offsets obj_offsets;
+	struct reftable_reader_offsets log_offsets;
+};
+
+int init_reader(struct reftable_reader *r, struct reftable_block_source *source,
+		const char *name);
+int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		struct reftable_record *rec);
+void reader_close(struct reftable_reader *r);
+const char *reader_name(struct reftable_reader *r);
+
+/* initialize a block reader to read from `r` */
+int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
+			     uint64_t next_off, uint8_t want_typ);
+
+#endif
diff --git a/reftable/record.c b/reftable/record.c
new file mode 100644
index 0000000000..08ab921d32
--- /dev/null
+++ b/reftable/record.c
@@ -0,0 +1,1126 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+/* record.c - methods for different types of records. */
+
+#include "record.h"
+
+#include "system.h"
+#include "constants.h"
+#include "reftable.h"
+#include "basics.h"
+
+int get_var_int(uint64_t *dest, struct string_view *in)
+{
+	int ptr = 0;
+	uint64_t val;
+
+	if (in->len == 0)
+		return -1;
+	val = in->buf[ptr] & 0x7f;
+
+	while (in->buf[ptr] & 0x80) {
+		ptr++;
+		if (ptr > in->len) {
+			return -1;
+		}
+		val = (val + 1) << 7 | (uint64_t)(in->buf[ptr] & 0x7f);
+	}
+
+	*dest = val;
+	return ptr + 1;
+}
+
+int put_var_int(struct string_view *dest, uint64_t val)
+{
+	uint8_t buf[10] = { 0 };
+	int i = 9;
+	int n = 0;
+	buf[i] = (uint8_t)(val & 0x7f);
+	i--;
+	while (true) {
+		val >>= 7;
+		if (!val) {
+			break;
+		}
+		val--;
+		buf[i] = 0x80 | (uint8_t)(val & 0x7f);
+		i--;
+	}
+
+	n = sizeof(buf) - i - 1;
+	if (dest->len < n)
+		return -1;
+	memcpy(dest->buf, &buf[i + 1], n);
+	return n;
+}
+
+int reftable_is_block_type(uint8_t typ)
+{
+	switch (typ) {
+	case BLOCK_TYPE_REF:
+	case BLOCK_TYPE_LOG:
+	case BLOCK_TYPE_OBJ:
+	case BLOCK_TYPE_INDEX:
+		return true;
+	}
+	return false;
+}
+
+static int decode_string(struct strbuf *dest, struct string_view in)
+{
+	int start_len = in.len;
+	uint64_t tsize = 0;
+	int n = get_var_int(&tsize, &in);
+	if (n <= 0)
+		return -1;
+	string_view_consume(&in, n);
+	if (in.len < tsize)
+		return -1;
+
+	strbuf_reset(dest);
+	strbuf_add(dest, in.buf, tsize);
+	string_view_consume(&in, tsize);
+
+	return start_len - in.len;
+}
+
+static int encode_string(char *str, struct string_view s)
+{
+	struct string_view start = s;
+	int l = strlen(str);
+	int n = put_var_int(&s, l);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+	if (s.len < l)
+		return -1;
+	memcpy(s.buf, str, l);
+	string_view_consume(&s, l);
+
+	return start.len - s.len;
+}
+
+int reftable_encode_key(bool *restart, struct string_view dest,
+			struct strbuf prev_key, struct strbuf key,
+			uint8_t extra)
+{
+	struct string_view start = dest;
+	int prefix_len = common_prefix_size(&prev_key, &key);
+	uint64_t suffix_len = key.len - prefix_len;
+	int n = put_var_int(&dest, (uint64_t)prefix_len);
+	if (n < 0)
+		return -1;
+	string_view_consume(&dest, n);
+
+	*restart = (prefix_len == 0);
+
+	n = put_var_int(&dest, suffix_len << 3 | (uint64_t)extra);
+	if (n < 0)
+		return -1;
+	string_view_consume(&dest, n);
+
+	if (dest.len < suffix_len)
+		return -1;
+	memcpy(dest.buf, key.buf + prefix_len, suffix_len);
+	string_view_consume(&dest, suffix_len);
+
+	return start.len - dest.len;
+}
+
+int reftable_decode_key(struct strbuf *key, uint8_t *extra,
+			struct strbuf last_key, struct string_view in)
+{
+	int start_len = in.len;
+	uint64_t prefix_len = 0;
+	uint64_t suffix_len = 0;
+	int n = get_var_int(&prefix_len, &in);
+	if (n < 0)
+		return -1;
+	string_view_consume(&in, n);
+
+	if (prefix_len > last_key.len)
+		return -1;
+
+	n = get_var_int(&suffix_len, &in);
+	if (n <= 0)
+		return -1;
+	string_view_consume(&in, n);
+
+	*extra = (uint8_t)(suffix_len & 0x7);
+	suffix_len >>= 3;
+
+	if (in.len < suffix_len)
+		return -1;
+
+	strbuf_reset(key);
+	strbuf_add(key, last_key.buf, prefix_len);
+	strbuf_add(key, in.buf, suffix_len);
+	string_view_consume(&in, suffix_len);
+
+	return start_len - in.len;
+}
+
+static void reftable_ref_record_key(const void *r, struct strbuf *dest)
+{
+	const struct reftable_ref_record *rec =
+		(const struct reftable_ref_record *)r;
+	strbuf_reset(dest);
+	strbuf_addstr(dest, rec->refname);
+}
+
+static void reftable_ref_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_ref_record *ref = (struct reftable_ref_record *)rec;
+	struct reftable_ref_record *src = (struct reftable_ref_record *)src_rec;
+	assert(hash_size > 0);
+
+	/* This is simple and correct, but we could probably reuse the hash
+	   fields. */
+	reftable_ref_record_clear(ref);
+	if (src->refname != NULL) {
+		ref->refname = xstrdup(src->refname);
+	}
+
+	if (src->target != NULL) {
+		ref->target = xstrdup(src->target);
+	}
+
+	if (src->target_value != NULL) {
+		ref->target_value = reftable_malloc(hash_size);
+		memcpy(ref->target_value, src->target_value, hash_size);
+	}
+
+	if (src->value != NULL) {
+		ref->value = reftable_malloc(hash_size);
+		memcpy(ref->value, src->value, hash_size);
+	}
+	ref->update_index = src->update_index;
+}
+
+static char hexdigit(int c)
+{
+	if (c <= 9)
+		return '0' + c;
+	return 'a' + (c - 10);
+}
+
+static void hex_format(char *dest, uint8_t *src, int hash_size)
+{
+	assert(hash_size > 0);
+	if (src != NULL) {
+		int i = 0;
+		for (i = 0; i < hash_size; i++) {
+			dest[2 * i] = hexdigit(src[i] >> 4);
+			dest[2 * i + 1] = hexdigit(src[i] & 0xf);
+		}
+		dest[2 * hash_size] = 0;
+	}
+}
+
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+	printf("ref{%s(%" PRIu64 ") ", ref->refname, ref->update_index);
+	if (ref->value != NULL) {
+		hex_format(hex, ref->value, hash_size(hash_id));
+		printf("%s", hex);
+	}
+	if (ref->target_value != NULL) {
+		hex_format(hex, ref->target_value, hash_size(hash_id));
+		printf(" (T %s)", hex);
+	}
+	if (ref->target != NULL) {
+		printf("=> %s", ref->target);
+	}
+	printf("}\n");
+}
+
+static void reftable_ref_record_clear_void(void *rec)
+{
+	reftable_ref_record_clear((struct reftable_ref_record *)rec);
+}
+
+void reftable_ref_record_clear(struct reftable_ref_record *ref)
+{
+	reftable_free(ref->refname);
+	reftable_free(ref->target);
+	reftable_free(ref->target_value);
+	reftable_free(ref->value);
+	memset(ref, 0, sizeof(struct reftable_ref_record));
+}
+
+static uint8_t reftable_ref_record_val_type(const void *rec)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	if (r->value != NULL) {
+		if (r->target_value != NULL) {
+			return 2;
+		} else {
+			return 1;
+		}
+	} else if (r->target != NULL)
+		return 3;
+	return 0;
+}
+
+static int reftable_ref_record_encode(const void *rec, struct string_view s,
+				      int hash_size)
+{
+	const struct reftable_ref_record *r =
+		(const struct reftable_ref_record *)rec;
+	struct string_view start = s;
+	int n = put_var_int(&s, r->update_index);
+	assert(hash_size > 0);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	if (r->value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->value, hash_size);
+		string_view_consume(&s, hash_size);
+	}
+
+	if (r->target_value != NULL) {
+		if (s.len < hash_size) {
+			return -1;
+		}
+		memcpy(s.buf, r->target_value, hash_size);
+		string_view_consume(&s, hash_size);
+	}
+
+	if (r->target != NULL) {
+		int n = encode_string(r->target, s);
+		if (n < 0) {
+			return -1;
+		}
+		string_view_consume(&s, n);
+	}
+
+	return start.len - s.len;
+}
+
+static int reftable_ref_record_decode(void *rec, struct strbuf key,
+				      uint8_t val_type, struct string_view in,
+				      int hash_size)
+{
+	struct reftable_ref_record *r = (struct reftable_ref_record *)rec;
+	struct string_view start = in;
+	bool seen_value = false;
+	bool seen_target_value = false;
+	bool seen_target = false;
+
+	int n = get_var_int(&r->update_index, &in);
+	if (n < 0)
+		return n;
+	assert(hash_size > 0);
+
+	string_view_consume(&in, n);
+
+	r->refname = reftable_realloc(r->refname, key.len + 1);
+	memcpy(r->refname, key.buf, key.len);
+	r->refname[key.len] = 0;
+
+	switch (val_type) {
+	case 1:
+	case 2:
+		if (in.len < hash_size) {
+			return -1;
+		}
+
+		if (r->value == NULL) {
+			r->value = reftable_malloc(hash_size);
+		}
+		seen_value = true;
+		memcpy(r->value, in.buf, hash_size);
+		string_view_consume(&in, hash_size);
+		if (val_type == 1) {
+			break;
+		}
+		if (r->target_value == NULL) {
+			r->target_value = reftable_malloc(hash_size);
+		}
+		seen_target_value = true;
+		memcpy(r->target_value, in.buf, hash_size);
+		string_view_consume(&in, hash_size);
+		break;
+	case 3: {
+		struct strbuf dest = STRBUF_INIT;
+		int n = decode_string(&dest, in);
+		if (n < 0) {
+			return -1;
+		}
+		string_view_consume(&in, n);
+		seen_target = true;
+		if (r->target != NULL) {
+			reftable_free(r->target);
+		}
+		r->target = dest.buf;
+	} break;
+
+	case 0:
+		break;
+	default:
+		abort();
+		break;
+	}
+
+	if (!seen_target && r->target != NULL) {
+		FREE_AND_NULL(r->target);
+	}
+	if (!seen_target_value && r->target_value != NULL) {
+		FREE_AND_NULL(r->target_value);
+	}
+	if (!seen_value && r->value != NULL) {
+		FREE_AND_NULL(r->value);
+	}
+
+	return start.len - in.len;
+}
+
+static bool reftable_ref_record_is_deletion_void(const void *p)
+{
+	return reftable_ref_record_is_deletion(
+		(const struct reftable_ref_record *)p);
+}
+
+struct reftable_record_vtable reftable_ref_record_vtable = {
+	.key = &reftable_ref_record_key,
+	.type = BLOCK_TYPE_REF,
+	.copy_from = &reftable_ref_record_copy_from,
+	.val_type = &reftable_ref_record_val_type,
+	.encode = &reftable_ref_record_encode,
+	.decode = &reftable_ref_record_decode,
+	.clear = &reftable_ref_record_clear_void,
+	.is_deletion = &reftable_ref_record_is_deletion_void,
+};
+
+static void reftable_obj_record_key(const void *r, struct strbuf *dest)
+{
+	const struct reftable_obj_record *rec =
+		(const struct reftable_obj_record *)r;
+	strbuf_reset(dest);
+	strbuf_add(dest, rec->hash_prefix, rec->hash_prefix_len);
+}
+
+static void reftable_obj_record_clear(void *rec)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	FREE_AND_NULL(obj->hash_prefix);
+	FREE_AND_NULL(obj->offsets);
+	memset(obj, 0, sizeof(struct reftable_obj_record));
+}
+
+static void reftable_obj_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_obj_record *obj = (struct reftable_obj_record *)rec;
+	const struct reftable_obj_record *src =
+		(const struct reftable_obj_record *)src_rec;
+	int olen;
+
+	reftable_obj_record_clear(obj);
+	*obj = *src;
+	obj->hash_prefix = reftable_malloc(obj->hash_prefix_len);
+	memcpy(obj->hash_prefix, src->hash_prefix, obj->hash_prefix_len);
+
+	olen = obj->offset_len * sizeof(uint64_t);
+	obj->offsets = reftable_malloc(olen);
+	memcpy(obj->offsets, src->offsets, olen);
+}
+
+static uint8_t reftable_obj_record_val_type(const void *rec)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	if (r->offset_len > 0 && r->offset_len < 8)
+		return r->offset_len;
+	return 0;
+}
+
+static int reftable_obj_record_encode(const void *rec, struct string_view s,
+				      int hash_size)
+{
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	struct string_view start = s;
+	int i = 0;
+	int n = 0;
+	uint64_t last = 0;
+	if (r->offset_len == 0 || r->offset_len >= 8) {
+		n = put_var_int(&s, r->offset_len);
+		if (n < 0) {
+			return -1;
+		}
+		string_view_consume(&s, n);
+	}
+	if (r->offset_len == 0)
+		return start.len - s.len;
+	n = put_var_int(&s, r->offsets[0]);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	last = r->offsets[0];
+	for (i = 1; i < r->offset_len; i++) {
+		int n = put_var_int(&s, r->offsets[i] - last);
+		if (n < 0) {
+			return -1;
+		}
+		string_view_consume(&s, n);
+		last = r->offsets[i];
+	}
+	return start.len - s.len;
+}
+
+static int reftable_obj_record_decode(void *rec, struct strbuf key,
+				      uint8_t val_type, struct string_view in,
+				      int hash_size)
+{
+	struct string_view start = in;
+	struct reftable_obj_record *r = (struct reftable_obj_record *)rec;
+	uint64_t count = val_type;
+	int n = 0;
+	uint64_t last;
+	int j;
+	r->hash_prefix = reftable_malloc(key.len);
+	memcpy(r->hash_prefix, key.buf, key.len);
+	r->hash_prefix_len = key.len;
+
+	if (val_type == 0) {
+		n = get_var_int(&count, &in);
+		if (n < 0) {
+			return n;
+		}
+
+		string_view_consume(&in, n);
+	}
+
+	r->offsets = NULL;
+	r->offset_len = 0;
+	if (count == 0)
+		return start.len - in.len;
+
+	r->offsets = reftable_malloc(count * sizeof(uint64_t));
+	r->offset_len = count;
+
+	n = get_var_int(&r->offsets[0], &in);
+	if (n < 0)
+		return n;
+	string_view_consume(&in, n);
+
+	last = r->offsets[0];
+	j = 1;
+	while (j < count) {
+		uint64_t delta = 0;
+		int n = get_var_int(&delta, &in);
+		if (n < 0) {
+			return n;
+		}
+		string_view_consume(&in, n);
+
+		last = r->offsets[j] = (delta + last);
+		j++;
+	}
+	return start.len - in.len;
+}
+
+static bool not_a_deletion(const void *p)
+{
+	return false;
+}
+
+struct reftable_record_vtable reftable_obj_record_vtable = {
+	.key = &reftable_obj_record_key,
+	.type = BLOCK_TYPE_OBJ,
+	.copy_from = &reftable_obj_record_copy_from,
+	.val_type = &reftable_obj_record_val_type,
+	.encode = &reftable_obj_record_encode,
+	.decode = &reftable_obj_record_decode,
+	.clear = &reftable_obj_record_clear,
+	.is_deletion = not_a_deletion,
+};
+
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id)
+{
+	char hex[SHA256_SIZE + 1] = { 0 };
+
+	printf("log{%s(%" PRIu64 ") %s <%s> %" PRIu64 " %04d\n", log->refname,
+	       log->update_index, log->name, log->email, log->time,
+	       log->tz_offset);
+	hex_format(hex, log->old_hash, hash_size(hash_id));
+	printf("%s => ", hex);
+	hex_format(hex, log->new_hash, hash_size(hash_id));
+	printf("%s\n\n%s\n}\n", hex, log->message);
+}
+
+static void reftable_log_record_key(const void *r, struct strbuf *dest)
+{
+	const struct reftable_log_record *rec =
+		(const struct reftable_log_record *)r;
+	int len = strlen(rec->refname);
+	uint8_t i64[8];
+	uint64_t ts = 0;
+	strbuf_reset(dest);
+	strbuf_add(dest, (uint8_t *)rec->refname, len + 1);
+
+	ts = (~ts) - rec->update_index;
+	put_be64(&i64[0], ts);
+	strbuf_add(dest, i64, sizeof(i64));
+}
+
+static void reftable_log_record_copy_from(void *rec, const void *src_rec,
+					  int hash_size)
+{
+	struct reftable_log_record *dst = (struct reftable_log_record *)rec;
+	const struct reftable_log_record *src =
+		(const struct reftable_log_record *)src_rec;
+
+	reftable_log_record_clear(dst);
+	*dst = *src;
+	if (dst->refname != NULL) {
+		dst->refname = xstrdup(dst->refname);
+	}
+	if (dst->email != NULL) {
+		dst->email = xstrdup(dst->email);
+	}
+	if (dst->name != NULL) {
+		dst->name = xstrdup(dst->name);
+	}
+	if (dst->message != NULL) {
+		dst->message = xstrdup(dst->message);
+	}
+
+	if (dst->new_hash != NULL) {
+		dst->new_hash = reftable_malloc(hash_size);
+		memcpy(dst->new_hash, src->new_hash, hash_size);
+	}
+	if (dst->old_hash != NULL) {
+		dst->old_hash = reftable_malloc(hash_size);
+		memcpy(dst->old_hash, src->old_hash, hash_size);
+	}
+}
+
+static void reftable_log_record_clear_void(void *rec)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	reftable_log_record_clear(r);
+}
+
+void reftable_log_record_clear(struct reftable_log_record *r)
+{
+	reftable_free(r->refname);
+	reftable_free(r->new_hash);
+	reftable_free(r->old_hash);
+	reftable_free(r->name);
+	reftable_free(r->email);
+	reftable_free(r->message);
+	memset(r, 0, sizeof(struct reftable_log_record));
+}
+
+static uint8_t reftable_log_record_val_type(const void *rec)
+{
+	const struct reftable_log_record *log =
+		(const struct reftable_log_record *)rec;
+
+	return reftable_log_record_is_deletion(log) ? 0 : 1;
+}
+
+static uint8_t zero[SHA256_SIZE] = { 0 };
+
+static int reftable_log_record_encode(const void *rec, struct string_view s,
+				      int hash_size)
+{
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	struct string_view start = s;
+	int n = 0;
+	uint8_t *oldh = r->old_hash;
+	uint8_t *newh = r->new_hash;
+	if (reftable_log_record_is_deletion(r))
+		return 0;
+
+	if (oldh == NULL) {
+		oldh = zero;
+	}
+	if (newh == NULL) {
+		newh = zero;
+	}
+
+	if (s.len < 2 * hash_size)
+		return -1;
+
+	memcpy(s.buf, oldh, hash_size);
+	memcpy(s.buf + hash_size, newh, hash_size);
+	string_view_consume(&s, 2 * hash_size);
+
+	n = encode_string(r->name ? r->name : "", s);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	n = encode_string(r->email ? r->email : "", s);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	n = put_var_int(&s, r->time);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	if (s.len < 2)
+		return -1;
+
+	put_be16(s.buf, r->tz_offset);
+	string_view_consume(&s, 2);
+
+	n = encode_string(r->message ? r->message : "", s);
+	if (n < 0)
+		return -1;
+	string_view_consume(&s, n);
+
+	return start.len - s.len;
+}
+
+static int reftable_log_record_decode(void *rec, struct strbuf key,
+				      uint8_t val_type, struct string_view in,
+				      int hash_size)
+{
+	struct string_view start = in;
+	struct reftable_log_record *r = (struct reftable_log_record *)rec;
+	uint64_t max = 0;
+	uint64_t ts = 0;
+	struct strbuf dest = STRBUF_INIT;
+	int n;
+
+	if (key.len <= 9 || key.buf[key.len - 9] != 0)
+		return REFTABLE_FORMAT_ERROR;
+
+	r->refname = reftable_realloc(r->refname, key.len - 8);
+	memcpy(r->refname, key.buf, key.len - 8);
+	ts = get_be64(key.buf + key.len - 8);
+
+	r->update_index = (~max) - ts;
+
+	if (val_type == 0) {
+		FREE_AND_NULL(r->old_hash);
+		FREE_AND_NULL(r->new_hash);
+		FREE_AND_NULL(r->message);
+		FREE_AND_NULL(r->email);
+		FREE_AND_NULL(r->name);
+		return 0;
+	}
+
+	if (in.len < 2 * hash_size)
+		return REFTABLE_FORMAT_ERROR;
+
+	r->old_hash = reftable_realloc(r->old_hash, hash_size);
+	r->new_hash = reftable_realloc(r->new_hash, hash_size);
+
+	memcpy(r->old_hash, in.buf, hash_size);
+	memcpy(r->new_hash, in.buf + hash_size, hash_size);
+
+	string_view_consume(&in, 2 * hash_size);
+
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	string_view_consume(&in, n);
+
+	r->name = reftable_realloc(r->name, dest.len + 1);
+	memcpy(r->name, dest.buf, dest.len);
+	r->name[dest.len] = 0;
+
+	strbuf_reset(&dest);
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	string_view_consume(&in, n);
+
+	r->email = reftable_realloc(r->email, dest.len + 1);
+	memcpy(r->email, dest.buf, dest.len);
+	r->email[dest.len] = 0;
+
+	ts = 0;
+	n = get_var_int(&ts, &in);
+	if (n < 0)
+		goto done;
+	string_view_consume(&in, n);
+	r->time = ts;
+	if (in.len < 2)
+		goto done;
+
+	r->tz_offset = get_be16(in.buf);
+	string_view_consume(&in, 2);
+
+	strbuf_reset(&dest);
+	n = decode_string(&dest, in);
+	if (n < 0)
+		goto done;
+	string_view_consume(&in, n);
+
+	r->message = reftable_realloc(r->message, dest.len + 1);
+	memcpy(r->message, dest.buf, dest.len);
+	r->message[dest.len] = 0;
+
+	strbuf_release(&dest);
+	return start.len - in.len;
+
+done:
+	strbuf_release(&dest);
+	return REFTABLE_FORMAT_ERROR;
+}
+
+static bool null_streq(char *a, char *b)
+{
+	char *empty = "";
+	if (a == NULL)
+		a = empty;
+
+	if (b == NULL)
+		b = empty;
+
+	return 0 == strcmp(a, b);
+}
+
+static bool zero_hash_eq(uint8_t *a, uint8_t *b, int sz)
+{
+	if (a == NULL)
+		a = zero;
+
+	if (b == NULL)
+		b = zero;
+
+	return !memcmp(a, b, sz);
+}
+
+bool reftable_log_record_equal(struct reftable_log_record *a,
+			       struct reftable_log_record *b, int hash_size)
+{
+	return null_streq(a->name, b->name) && null_streq(a->email, b->email) &&
+	       null_streq(a->message, b->message) &&
+	       zero_hash_eq(a->old_hash, b->old_hash, hash_size) &&
+	       zero_hash_eq(a->new_hash, b->new_hash, hash_size) &&
+	       a->time == b->time && a->tz_offset == b->tz_offset &&
+	       a->update_index == b->update_index;
+}
+
+static bool reftable_log_record_is_deletion_void(const void *p)
+{
+	return reftable_log_record_is_deletion(
+		(const struct reftable_log_record *)p);
+}
+
+struct reftable_record_vtable reftable_log_record_vtable = {
+	.key = &reftable_log_record_key,
+	.type = BLOCK_TYPE_LOG,
+	.copy_from = &reftable_log_record_copy_from,
+	.val_type = &reftable_log_record_val_type,
+	.encode = &reftable_log_record_encode,
+	.decode = &reftable_log_record_decode,
+	.clear = &reftable_log_record_clear_void,
+	.is_deletion = &reftable_log_record_is_deletion_void,
+};
+
+struct reftable_record reftable_new_record(uint8_t typ)
+{
+	struct reftable_record rec = { NULL };
+	switch (typ) {
+	case BLOCK_TYPE_REF: {
+		struct reftable_ref_record *r =
+			reftable_calloc(sizeof(struct reftable_ref_record));
+		reftable_record_from_ref(&rec, r);
+		return rec;
+	}
+
+	case BLOCK_TYPE_OBJ: {
+		struct reftable_obj_record *r =
+			reftable_calloc(sizeof(struct reftable_obj_record));
+		reftable_record_from_obj(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_LOG: {
+		struct reftable_log_record *r =
+			reftable_calloc(sizeof(struct reftable_log_record));
+		reftable_record_from_log(&rec, r);
+		return rec;
+	}
+	case BLOCK_TYPE_INDEX: {
+		struct reftable_index_record empty = { .last_key =
+							       STRBUF_INIT };
+		struct reftable_index_record *r =
+			reftable_calloc(sizeof(struct reftable_index_record));
+		*r = empty;
+		reftable_record_from_index(&rec, r);
+		return rec;
+	}
+	}
+	abort();
+	return rec;
+}
+
+void *reftable_record_yield(struct reftable_record *rec)
+{
+	void *p = rec->data;
+	rec->data = NULL;
+	return p;
+}
+
+void reftable_record_destroy(struct reftable_record *rec)
+{
+	reftable_record_clear(rec);
+	reftable_free(reftable_record_yield(rec));
+}
+
+static void reftable_index_record_key(const void *r, struct strbuf *dest)
+{
+	struct reftable_index_record *rec = (struct reftable_index_record *)r;
+	strbuf_reset(dest);
+	strbuf_addbuf(dest, &rec->last_key);
+}
+
+static void reftable_index_record_copy_from(void *rec, const void *src_rec,
+					    int hash_size)
+{
+	struct reftable_index_record *dst = (struct reftable_index_record *)rec;
+	struct reftable_index_record *src =
+		(struct reftable_index_record *)src_rec;
+
+	strbuf_reset(&dst->last_key);
+	strbuf_addbuf(&dst->last_key, &src->last_key);
+	dst->offset = src->offset;
+}
+
+static void reftable_index_record_clear(void *rec)
+{
+	struct reftable_index_record *idx = (struct reftable_index_record *)rec;
+	strbuf_release(&idx->last_key);
+}
+
+static uint8_t reftable_index_record_val_type(const void *rec)
+{
+	return 0;
+}
+
+static int reftable_index_record_encode(const void *rec, struct string_view out,
+					int hash_size)
+{
+	const struct reftable_index_record *r =
+		(const struct reftable_index_record *)rec;
+	struct string_view start = out;
+
+	int n = put_var_int(&out, r->offset);
+	if (n < 0)
+		return n;
+
+	string_view_consume(&out, n);
+
+	return start.len - out.len;
+}
+
+static int reftable_index_record_decode(void *rec, struct strbuf key,
+					uint8_t val_type, struct string_view in,
+					int hash_size)
+{
+	struct string_view start = in;
+	struct reftable_index_record *r = (struct reftable_index_record *)rec;
+	int n = 0;
+
+	strbuf_reset(&r->last_key);
+	strbuf_addbuf(&r->last_key, &key);
+
+	n = get_var_int(&r->offset, &in);
+	if (n < 0)
+		return n;
+
+	string_view_consume(&in, n);
+	return start.len - in.len;
+}
+
+struct reftable_record_vtable reftable_index_record_vtable = {
+	.key = &reftable_index_record_key,
+	.type = BLOCK_TYPE_INDEX,
+	.copy_from = &reftable_index_record_copy_from,
+	.val_type = &reftable_index_record_val_type,
+	.encode = &reftable_index_record_encode,
+	.decode = &reftable_index_record_decode,
+	.clear = &reftable_index_record_clear,
+	.is_deletion = &not_a_deletion,
+};
+
+void reftable_record_key(struct reftable_record *rec, struct strbuf *dest)
+{
+	rec->ops->key(rec->data, dest);
+}
+
+uint8_t reftable_record_type(struct reftable_record *rec)
+{
+	return rec->ops->type;
+}
+
+int reftable_record_encode(struct reftable_record *rec, struct string_view dest,
+			   int hash_size)
+{
+	return rec->ops->encode(rec->data, dest, hash_size);
+}
+
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size)
+{
+	assert(src->ops->type == rec->ops->type);
+
+	rec->ops->copy_from(rec->data, src->data, hash_size);
+}
+
+uint8_t reftable_record_val_type(struct reftable_record *rec)
+{
+	return rec->ops->val_type(rec->data);
+}
+
+int reftable_record_decode(struct reftable_record *rec, struct strbuf key,
+			   uint8_t extra, struct string_view src, int hash_size)
+{
+	return rec->ops->decode(rec->data, key, extra, src, hash_size);
+}
+
+void reftable_record_clear(struct reftable_record *rec)
+{
+	rec->ops->clear(rec->data);
+}
+
+bool reftable_record_is_deletion(struct reftable_record *rec)
+{
+	return rec->ops->is_deletion(rec->data);
+}
+
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *ref_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = ref_rec;
+	rec->ops = &reftable_ref_record_vtable;
+}
+
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *obj_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = obj_rec;
+	rec->ops = &reftable_obj_record_vtable;
+}
+
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *index_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = index_rec;
+	rec->ops = &reftable_index_record_vtable;
+}
+
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *log_rec)
+{
+	assert(rec->ops == NULL);
+	rec->data = log_rec;
+	rec->ops = &reftable_log_record_vtable;
+}
+
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_REF);
+	return (struct reftable_ref_record *)rec->data;
+}
+
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *rec)
+{
+	assert(reftable_record_type(rec) == BLOCK_TYPE_LOG);
+	return (struct reftable_log_record *)rec->data;
+}
+
+static bool hash_equal(uint8_t *a, uint8_t *b, int hash_size)
+{
+	if (a != NULL && b != NULL)
+		return !memcmp(a, b, hash_size);
+
+	return a == b;
+}
+
+static bool str_equal(char *a, char *b)
+{
+	if (a != NULL && b != NULL)
+		return 0 == strcmp(a, b);
+
+	return a == b;
+}
+
+bool reftable_ref_record_equal(struct reftable_ref_record *a,
+			       struct reftable_ref_record *b, int hash_size)
+{
+	assert(hash_size > 0);
+	return 0 == strcmp(a->refname, b->refname) &&
+	       a->update_index == b->update_index &&
+	       hash_equal(a->value, b->value, hash_size) &&
+	       hash_equal(a->target_value, b->target_value, hash_size) &&
+	       str_equal(a->target, b->target);
+}
+
+int reftable_ref_record_compare_name(const void *a, const void *b)
+{
+	return strcmp(((struct reftable_ref_record *)a)->refname,
+		      ((struct reftable_ref_record *)b)->refname);
+}
+
+bool reftable_ref_record_is_deletion(const struct reftable_ref_record *ref)
+{
+	return ref->value == NULL && ref->target == NULL &&
+	       ref->target_value == NULL;
+}
+
+int reftable_log_record_compare_key(const void *a, const void *b)
+{
+	struct reftable_log_record *la = (struct reftable_log_record *)a;
+	struct reftable_log_record *lb = (struct reftable_log_record *)b;
+
+	int cmp = strcmp(la->refname, lb->refname);
+	if (cmp)
+		return cmp;
+	if (la->update_index > lb->update_index)
+		return -1;
+	return (la->update_index < lb->update_index) ? 1 : 0;
+}
+
+bool reftable_log_record_is_deletion(const struct reftable_log_record *log)
+{
+	return (log->new_hash == NULL && log->old_hash == NULL &&
+		log->name == NULL && log->email == NULL &&
+		log->message == NULL && log->time == 0 && log->tz_offset == 0 &&
+		log->message == NULL);
+}
+
+int hash_size(uint32_t id)
+{
+	switch (id) {
+	case 0:
+	case SHA1_ID:
+		return SHA1_SIZE;
+	case SHA256_ID:
+		return SHA256_SIZE;
+	}
+	abort();
+}
+
+void string_view_consume(struct string_view *s, int n)
+{
+	s->buf += n;
+	s->len -= n;
+}
diff --git a/reftable/record.h b/reftable/record.h
new file mode 100644
index 0000000000..339e3888c9
--- /dev/null
+++ b/reftable/record.h
@@ -0,0 +1,143 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef RECORD_H
+#define RECORD_H
+
+#include "reftable.h"
+#include "strbuf.h"
+#include "system.h"
+
+/*
+  A substring of existing string data. This structure takes no responsibility
+  for the lifetime of the data it points to.
+*/
+struct string_view {
+	uint8_t *buf;
+	size_t len;
+};
+
+/* Advance `s.buf` by `n`, and decrease length. */
+void string_view_consume(struct string_view *s, int n);
+
+/* utilities for de/encoding varints */
+
+int get_var_int(uint64_t *dest, struct string_view *in);
+int put_var_int(struct string_view *dest, uint64_t val);
+
+/* Methods for records. */
+struct reftable_record_vtable {
+	/* encode the key of to a uint8_t strbuf. */
+	void (*key)(const void *rec, struct strbuf *dest);
+
+	/* The record type of ('r' for ref). */
+	uint8_t type;
+
+	void (*copy_from)(void *dest, const void *src, int hash_size);
+
+	/* a value of [0..7], indicating record subvariants (eg. ref vs. symref
+	 * vs ref deletion) */
+	uint8_t (*val_type)(const void *rec);
+
+	/* encodes rec into dest, returning how much space was used. */
+	int (*encode)(const void *rec, struct string_view dest, int hash_size);
+
+	/* decode data from `src` into the record. */
+	int (*decode)(void *rec, struct strbuf key, uint8_t extra,
+		      struct string_view src, int hash_size);
+
+	/* deallocate and null the record. */
+	void (*clear)(void *rec);
+
+	/* is this a tombstone? */
+	bool (*is_deletion)(const void *rec);
+};
+
+/* record is a generic wrapper for different types of records. */
+struct reftable_record {
+	void *data;
+	struct reftable_record_vtable *ops;
+};
+
+/* returns true for recognized block types. Block start with the block type. */
+int reftable_is_block_type(uint8_t typ);
+
+/* creates a malloced record of the given type. Dispose with record_destroy */
+struct reftable_record reftable_new_record(uint8_t typ);
+
+extern struct reftable_record_vtable reftable_ref_record_vtable;
+
+/* Encode `key` into `dest`. Sets `restart` to indicate a restart. Returns
+   number of bytes written. */
+int reftable_encode_key(bool *restart, struct string_view dest,
+			struct strbuf prev_key, struct strbuf key,
+			uint8_t extra);
+
+/* Decode into `key` and `extra` from `in` */
+int reftable_decode_key(struct strbuf *key, uint8_t *extra,
+			struct strbuf last_key, struct string_view in);
+
+/* reftable_index_record are used internally to speed up lookups. */
+struct reftable_index_record {
+	uint64_t offset; /* Offset of block */
+	struct strbuf last_key; /* Last key of the block. */
+};
+
+/* reftable_obj_record stores an object ID => ref mapping. */
+struct reftable_obj_record {
+	uint8_t *hash_prefix; /* leading bytes of the object ID */
+	int hash_prefix_len; /* number of leading bytes. Constant
+			      * across a single table. */
+	uint64_t *offsets; /* a vector of file offsets. */
+	int offset_len;
+};
+
+/* see struct record_vtable */
+
+void reftable_record_key(struct reftable_record *rec, struct strbuf *dest);
+uint8_t reftable_record_type(struct reftable_record *rec);
+void reftable_record_copy_from(struct reftable_record *rec,
+			       struct reftable_record *src, int hash_size);
+uint8_t reftable_record_val_type(struct reftable_record *rec);
+int reftable_record_encode(struct reftable_record *rec, struct string_view dest,
+			   int hash_size);
+int reftable_record_decode(struct reftable_record *rec, struct strbuf key,
+			   uint8_t extra, struct string_view src,
+			   int hash_size);
+bool reftable_record_is_deletion(struct reftable_record *rec);
+
+/* zeroes out the embedded record */
+void reftable_record_clear(struct reftable_record *rec);
+
+/* clear out the record, yielding the reftable_record data that was
+ * encapsulated. */
+void *reftable_record_yield(struct reftable_record *rec);
+
+/* clear and deallocate embedded record, and zero `rec`. */
+void reftable_record_destroy(struct reftable_record *rec);
+
+/* initialize generic records from concrete records. The generic record should
+ * be zeroed out. */
+void reftable_record_from_obj(struct reftable_record *rec,
+			      struct reftable_obj_record *objrec);
+void reftable_record_from_index(struct reftable_record *rec,
+				struct reftable_index_record *idxrec);
+void reftable_record_from_ref(struct reftable_record *rec,
+			      struct reftable_ref_record *refrec);
+void reftable_record_from_log(struct reftable_record *rec,
+			      struct reftable_log_record *logrec);
+struct reftable_ref_record *reftable_record_as_ref(struct reftable_record *ref);
+struct reftable_log_record *reftable_record_as_log(struct reftable_record *ref);
+
+/* for qsort. */
+int reftable_ref_record_compare_name(const void *a, const void *b);
+
+/* for qsort. */
+int reftable_log_record_compare_key(const void *a, const void *b);
+
+#endif
diff --git a/reftable/record_test.c b/reftable/record_test.c
new file mode 100644
index 0000000000..52b22d7dc7
--- /dev/null
+++ b/reftable/record_test.c
@@ -0,0 +1,410 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "record.h"
+
+#include "system.h"
+#include "basics.h"
+#include "constants.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_copy(struct reftable_record *rec)
+{
+	struct reftable_record copy =
+		reftable_new_record(reftable_record_type(rec));
+	reftable_record_copy_from(&copy, rec, SHA1_SIZE);
+	/* do it twice to catch memory leaks */
+	reftable_record_copy_from(&copy, rec, SHA1_SIZE);
+	switch (reftable_record_type(&copy)) {
+	case BLOCK_TYPE_REF:
+		assert(reftable_ref_record_equal(reftable_record_as_ref(&copy),
+						 reftable_record_as_ref(rec),
+						 SHA1_SIZE));
+		break;
+	case BLOCK_TYPE_LOG:
+		assert(reftable_log_record_equal(reftable_record_as_log(&copy),
+						 reftable_record_as_log(rec),
+						 SHA1_SIZE));
+		break;
+	}
+	reftable_record_destroy(&copy);
+}
+
+static void test_varint_roundtrip(void)
+{
+	uint64_t inputs[] = { 0,
+			      1,
+			      27,
+			      127,
+			      128,
+			      257,
+			      4096,
+			      ((uint64_t)1 << 63),
+			      ((uint64_t)1 << 63) + ((uint64_t)1 << 63) - 1 };
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(inputs); i++) {
+		uint8_t dest[10];
+
+		struct string_view out = {
+			.buf = dest,
+			.len = sizeof(dest),
+		};
+		uint64_t in = inputs[i];
+		int n = put_var_int(&out, in);
+		uint64_t got = 0;
+
+		assert(n > 0);
+		out.len = n;
+		n = get_var_int(&got, &out);
+		assert(n > 0);
+
+		assert(got == in);
+	}
+}
+
+static void test_common_prefix(void)
+{
+	struct {
+		const char *a, *b;
+		int want;
+	} cases[] = {
+		{ "abc", "ab", 2 },
+		{ "", "abc", 0 },
+		{ "abc", "abd", 2 },
+		{ "abc", "pqr", 0 },
+	};
+
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		struct strbuf a = STRBUF_INIT;
+		struct strbuf b = STRBUF_INIT;
+		strbuf_addstr(&a, cases[i].a);
+		strbuf_addstr(&b, cases[i].b);
+		assert(common_prefix_size(&a, &b) == cases[i].want);
+
+		strbuf_release(&a);
+		strbuf_release(&b);
+	}
+}
+
+static void set_hash(uint8_t *h, int j)
+{
+	int i = 0;
+	for (i = 0; i < hash_size(SHA1_ID); i++) {
+		h[i] = (j >> i) & 0xff;
+	}
+}
+
+static void test_reftable_ref_record_roundtrip(void)
+{
+	int i = 0;
+
+	for (i = 0; i <= 3; i++) {
+		struct reftable_ref_record in = { 0 };
+		struct reftable_ref_record out = {
+			.refname = xstrdup("old name"),
+			.value = reftable_calloc(SHA1_SIZE),
+			.target_value = reftable_calloc(SHA1_SIZE),
+			.target = xstrdup("old value"),
+		};
+		struct reftable_record rec_out = { 0 };
+		struct strbuf key = STRBUF_INIT;
+		struct reftable_record rec = { 0 };
+		uint8_t buffer[1024] = { 0 };
+		struct string_view dest = {
+			.buf = buffer,
+			.len = sizeof(buffer),
+		};
+
+		int n, m;
+
+		switch (i) {
+		case 0:
+			break;
+		case 1:
+			in.value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.value, 1);
+			break;
+		case 2:
+			in.value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.value, 1);
+			in.target_value = reftable_malloc(SHA1_SIZE);
+			set_hash(in.target_value, 2);
+			break;
+		case 3:
+			in.target = xstrdup("target");
+			break;
+		}
+		in.refname = xstrdup("refs/heads/master");
+
+		reftable_record_from_ref(&rec, &in);
+		test_copy(&rec);
+
+		assert(reftable_record_val_type(&rec) == i);
+
+		reftable_record_key(&rec, &key);
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n > 0);
+
+		/* decode into a non-zero reftable_record to test for leaks. */
+
+		reftable_record_from_ref(&rec_out, &out);
+		m = reftable_record_decode(&rec_out, key, i, dest, SHA1_SIZE);
+		assert(n == m);
+
+		assert((out.value != NULL) == (in.value != NULL));
+		assert((out.target_value != NULL) == (in.target_value != NULL));
+		assert((out.target != NULL) == (in.target != NULL));
+		reftable_record_clear(&rec_out);
+
+		strbuf_release(&key);
+		reftable_ref_record_clear(&in);
+	}
+}
+
+static void test_reftable_log_record_equal(void)
+{
+	struct reftable_log_record in[2] = {
+		{
+			.refname = xstrdup("refs/heads/master"),
+			.update_index = 42,
+		},
+		{
+			.refname = xstrdup("refs/heads/master"),
+			.update_index = 22,
+		}
+	};
+
+	assert(!reftable_log_record_equal(&in[0], &in[1], SHA1_SIZE));
+	in[1].update_index = in[0].update_index;
+	assert(reftable_log_record_equal(&in[0], &in[1], SHA1_SIZE));
+	reftable_log_record_clear(&in[0]);
+	reftable_log_record_clear(&in[1]);
+}
+
+static void test_reftable_log_record_roundtrip(void)
+{
+	struct reftable_log_record in[2] = {
+		{
+			.refname = xstrdup("refs/heads/master"),
+			.old_hash = reftable_malloc(SHA1_SIZE),
+			.new_hash = reftable_malloc(SHA1_SIZE),
+			.name = xstrdup("han-wen"),
+			.email = xstrdup("hanwen@google.com"),
+			.message = xstrdup("test"),
+			.update_index = 42,
+			.time = 1577123507,
+			.tz_offset = 100,
+		},
+		{
+			.refname = xstrdup("refs/heads/master"),
+			.update_index = 22,
+		}
+	};
+	set_test_hash(in[0].new_hash, 1);
+	set_test_hash(in[0].old_hash, 2);
+	for (int i = 0; i < ARRAY_SIZE(in); i++) {
+		struct reftable_record rec = { 0 };
+		struct strbuf key = STRBUF_INIT;
+		uint8_t buffer[1024] = { 0 };
+		struct string_view dest = {
+			.buf = buffer,
+			.len = sizeof(buffer),
+		};
+		/* populate out, to check for leaks. */
+		struct reftable_log_record out = {
+			.refname = xstrdup("old name"),
+			.new_hash = reftable_calloc(SHA1_SIZE),
+			.old_hash = reftable_calloc(SHA1_SIZE),
+			.name = xstrdup("old name"),
+			.email = xstrdup("old@email"),
+			.message = xstrdup("old message"),
+		};
+		struct reftable_record rec_out = { 0 };
+		int n, m, valtype;
+
+		reftable_record_from_log(&rec, &in[i]);
+
+		test_copy(&rec);
+
+		reftable_record_key(&rec, &key);
+
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n >= 0);
+		reftable_record_from_log(&rec_out, &out);
+		valtype = reftable_record_val_type(&rec);
+		m = reftable_record_decode(&rec_out, key, valtype, dest,
+					   SHA1_SIZE);
+		assert(n == m);
+
+		assert(reftable_log_record_equal(&in[i], &out, SHA1_SIZE));
+		reftable_log_record_clear(&in[i]);
+		strbuf_release(&key);
+		reftable_record_clear(&rec_out);
+	}
+}
+
+static void test_u24_roundtrip(void)
+{
+	uint32_t in = 0x112233;
+	uint8_t dest[3];
+	uint32_t out;
+	put_be24(dest, in);
+	out = get_be24(dest);
+	assert(in == out);
+}
+
+static void test_key_roundtrip(void)
+{
+	uint8_t buffer[1024] = { 0 };
+	struct string_view dest = {
+		.buf = buffer,
+		.len = sizeof(buffer),
+	};
+	struct strbuf last_key = STRBUF_INIT;
+	struct strbuf key = STRBUF_INIT;
+	struct strbuf roundtrip = STRBUF_INIT;
+	bool restart;
+	uint8_t extra;
+	int n, m;
+	uint8_t rt_extra;
+
+	strbuf_addstr(&last_key, "refs/heads/master");
+	strbuf_addstr(&key, "refs/tags/bla");
+	extra = 6;
+	n = reftable_encode_key(&restart, dest, last_key, key, extra);
+	assert(!restart);
+	assert(n > 0);
+
+	m = reftable_decode_key(&roundtrip, &rt_extra, last_key, dest);
+	assert(n == m);
+	assert(0 == strbuf_cmp(&key, &roundtrip));
+	assert(rt_extra == extra);
+
+	strbuf_release(&last_key);
+	strbuf_release(&key);
+	strbuf_release(&roundtrip);
+}
+
+static void test_reftable_obj_record_roundtrip(void)
+{
+	uint8_t testHash1[SHA1_SIZE] = { 1, 2, 3, 4, 0 };
+	uint64_t till9[] = { 1, 2, 3, 4, 500, 600, 700, 800, 9000 };
+	struct reftable_obj_record recs[3] = { {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+						       .offsets = till9,
+						       .offset_len = 3,
+					       },
+					       {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+						       .offsets = till9,
+						       .offset_len = 9,
+					       },
+					       {
+						       .hash_prefix = testHash1,
+						       .hash_prefix_len = 5,
+					       } };
+	int i = 0;
+	for (i = 0; i < ARRAY_SIZE(recs); i++) {
+		struct reftable_obj_record in = recs[i];
+		uint8_t buffer[1024] = { 0 };
+		struct string_view dest = {
+			.buf = buffer,
+			.len = sizeof(buffer),
+		};
+		struct reftable_record rec = { 0 };
+		struct strbuf key = STRBUF_INIT;
+		struct reftable_obj_record out = { 0 };
+		struct reftable_record rec_out = { 0 };
+		int n, m;
+		uint8_t extra;
+
+		reftable_record_from_obj(&rec, &in);
+		test_copy(&rec);
+		reftable_record_key(&rec, &key);
+		n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+		assert(n > 0);
+		extra = reftable_record_val_type(&rec);
+		reftable_record_from_obj(&rec_out, &out);
+		m = reftable_record_decode(&rec_out, key, extra, dest,
+					   SHA1_SIZE);
+		assert(n == m);
+
+		assert(in.hash_prefix_len == out.hash_prefix_len);
+		assert(in.offset_len == out.offset_len);
+
+		assert(!memcmp(in.hash_prefix, out.hash_prefix,
+			       in.hash_prefix_len));
+		assert(0 == memcmp(in.offsets, out.offsets,
+				   sizeof(uint64_t) * in.offset_len));
+		strbuf_release(&key);
+		reftable_record_clear(&rec_out);
+	}
+}
+
+static void test_reftable_index_record_roundtrip(void)
+{
+	struct reftable_index_record in = {
+		.offset = 42,
+		.last_key = STRBUF_INIT,
+	};
+	uint8_t buffer[1024] = { 0 };
+	struct string_view dest = {
+		.buf = buffer,
+		.len = sizeof(buffer),
+	};
+	struct strbuf key = STRBUF_INIT;
+	struct reftable_record rec = { 0 };
+	struct reftable_index_record out = { .last_key = STRBUF_INIT };
+	struct reftable_record out_rec = { NULL };
+	int n, m;
+	uint8_t extra;
+
+	strbuf_addstr(&in.last_key, "refs/heads/master");
+	reftable_record_from_index(&rec, &in);
+	reftable_record_key(&rec, &key);
+	test_copy(&rec);
+
+	assert(0 == strbuf_cmp(&key, &in.last_key));
+	n = reftable_record_encode(&rec, dest, SHA1_SIZE);
+	assert(n > 0);
+
+	extra = reftable_record_val_type(&rec);
+	reftable_record_from_index(&out_rec, &out);
+	m = reftable_record_decode(&out_rec, key, extra, dest, SHA1_SIZE);
+	assert(m == n);
+
+	assert(in.offset == out.offset);
+
+	reftable_record_clear(&out_rec);
+	strbuf_release(&key);
+	strbuf_release(&in.last_key);
+}
+
+int record_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_reftable_log_record_equal",
+		      &test_reftable_log_record_equal);
+	add_test_case("test_reftable_log_record_roundtrip",
+		      &test_reftable_log_record_roundtrip);
+	add_test_case("test_reftable_ref_record_roundtrip",
+		      &test_reftable_ref_record_roundtrip);
+	add_test_case("test_varint_roundtrip", &test_varint_roundtrip);
+	add_test_case("test_key_roundtrip", &test_key_roundtrip);
+	add_test_case("test_common_prefix", &test_common_prefix);
+	add_test_case("test_reftable_obj_record_roundtrip",
+		      &test_reftable_obj_record_roundtrip);
+	add_test_case("test_reftable_index_record_roundtrip",
+		      &test_reftable_index_record_roundtrip);
+	add_test_case("test_u24_roundtrip", &test_u24_roundtrip);
+	return test_main(argc, argv);
+}
diff --git a/reftable/refname.c b/reftable/refname.c
new file mode 100644
index 0000000000..c43393b943
--- /dev/null
+++ b/reftable/refname.c
@@ -0,0 +1,209 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+#include "refname.h"
+#include "strbuf.h"
+
+struct find_arg {
+	char **names;
+	const char *want;
+};
+
+static int find_name(size_t k, void *arg)
+{
+	struct find_arg *f_arg = (struct find_arg *)arg;
+	return strcmp(f_arg->names[k], f_arg->want) >= 0;
+}
+
+int modification_has_ref(struct modification *mod, const char *name)
+{
+	struct reftable_ref_record ref = { 0 };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = name,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len && !strcmp(mod->add[idx], name)) {
+			return 0;
+		}
+	}
+
+	if (mod->del_len > 0) {
+		struct find_arg arg = {
+			.names = mod->del,
+			.want = name,
+		};
+		int idx = binsearch(mod->del_len, find_name, &arg);
+		if (idx < mod->del_len && !strcmp(mod->del[idx], name)) {
+			return 1;
+		}
+	}
+
+	err = reftable_table_read_ref(&mod->tab, name, &ref);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static void modification_clear(struct modification *mod)
+{
+	/* don't delete the strings themselves; they're owned by ref records.
+	 */
+	FREE_AND_NULL(mod->add);
+	FREE_AND_NULL(mod->del);
+	mod->add_len = 0;
+	mod->del_len = 0;
+}
+
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix)
+{
+	struct reftable_iterator it = { NULL };
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+
+	if (mod->add_len > 0) {
+		struct find_arg arg = {
+			.names = mod->add,
+			.want = prefix,
+		};
+		int idx = binsearch(mod->add_len, find_name, &arg);
+		if (idx < mod->add_len &&
+		    !strncmp(prefix, mod->add[idx], strlen(prefix)))
+			goto done;
+	}
+	err = reftable_table_seek_ref(&mod->tab, &it, prefix);
+	if (err)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err)
+			goto done;
+
+		if (mod->del_len > 0) {
+			struct find_arg arg = {
+				.names = mod->del,
+				.want = ref.refname,
+			};
+			int idx = binsearch(mod->del_len, find_name, &arg);
+			if (idx < mod->del_len &&
+			    !strcmp(ref.refname, mod->del[idx])) {
+				continue;
+			}
+		}
+
+		if (strncmp(ref.refname, prefix, strlen(prefix))) {
+			err = 1;
+			goto done;
+		}
+		err = 0;
+		goto done;
+	}
+
+done:
+	reftable_ref_record_clear(&ref);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int validate_refname(const char *name)
+{
+	while (true) {
+		char *next = strchr(name, '/');
+		if (!*name) {
+			return REFTABLE_REFNAME_ERROR;
+		}
+		if (!next) {
+			return 0;
+		}
+		if (next - name == 0 || (next - name == 1 && *name == '.') ||
+		    (next - name == 2 && name[0] == '.' && name[1] == '.'))
+			return REFTABLE_REFNAME_ERROR;
+		name = next + 1;
+	}
+	return 0;
+}
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz)
+{
+	struct modification mod = {
+		.tab = tab,
+		.add = reftable_calloc(sizeof(char *) * sz),
+		.del = reftable_calloc(sizeof(char *) * sz),
+	};
+	int i = 0;
+	int err = 0;
+	for (; i < sz; i++) {
+		if (reftable_ref_record_is_deletion(&recs[i])) {
+			mod.del[mod.del_len++] = recs[i].refname;
+		} else {
+			mod.add[mod.add_len++] = recs[i].refname;
+		}
+	}
+
+	err = modification_validate(&mod);
+	modification_clear(&mod);
+	return err;
+}
+
+static void strbuf_trim_component(struct strbuf *sl)
+{
+	while (sl->len > 0) {
+		bool is_slash = (sl->buf[sl->len - 1] == '/');
+		strbuf_setlen(sl, sl->len - 1);
+		if (is_slash)
+			break;
+	}
+}
+
+int modification_validate(struct modification *mod)
+{
+	struct strbuf slashed = STRBUF_INIT;
+	int err = 0;
+	int i = 0;
+	for (; i < mod->add_len; i++) {
+		err = validate_refname(mod->add[i]);
+		if (err)
+			goto done;
+		strbuf_reset(&slashed);
+		strbuf_addstr(&slashed, mod->add[i]);
+		strbuf_addstr(&slashed, "/");
+
+		err = modification_has_ref_with_prefix(mod, slashed.buf);
+		if (err == 0) {
+			err = REFTABLE_NAME_CONFLICT;
+			goto done;
+		}
+		if (err < 0)
+			goto done;
+
+		strbuf_reset(&slashed);
+		strbuf_addstr(&slashed, mod->add[i]);
+		while (slashed.len) {
+			strbuf_trim_component(&slashed);
+			err = modification_has_ref(mod, slashed.buf);
+			if (err == 0) {
+				err = REFTABLE_NAME_CONFLICT;
+				goto done;
+			}
+			if (err < 0)
+				goto done;
+		}
+	}
+	err = 0;
+done:
+	strbuf_release(&slashed);
+	return err;
+}
diff --git a/reftable/refname.h b/reftable/refname.h
new file mode 100644
index 0000000000..13bf1e6861
--- /dev/null
+++ b/reftable/refname.h
@@ -0,0 +1,38 @@
+/*
+  Copyright 2020 Google LLC
+
+  Use of this source code is governed by a BSD-style
+  license that can be found in the LICENSE file or at
+  https://developers.google.com/open-source/licenses/bsd
+*/
+#ifndef REFNAME_H
+#define REFNAME_H
+
+#include "reftable.h"
+
+struct modification {
+	struct reftable_table tab;
+
+	char **add;
+	size_t add_len;
+
+	char **del;
+	size_t del_len;
+};
+
+// -1 = error, 0 = found, 1 = not found
+int modification_has_ref(struct modification *mod, const char *name);
+
+// -1 = error, 0 = found, 1 = not found.
+int modification_has_ref_with_prefix(struct modification *mod,
+				     const char *prefix);
+
+// 0 = OK.
+int validate_refname(const char *name);
+
+int validate_ref_record_addition(struct reftable_table tab,
+				 struct reftable_ref_record *recs, size_t sz);
+
+int modification_validate(struct modification *mod);
+
+#endif
diff --git a/reftable/refname_test.c b/reftable/refname_test.c
new file mode 100644
index 0000000000..f4ad544e23
--- /dev/null
+++ b/reftable/refname_test.c
@@ -0,0 +1,99 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "record.h"
+#include "refname.h"
+#include "system.h"
+
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+struct testcase {
+	char *add;
+	char *del;
+	int error_code;
+};
+
+static void test_conflict(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+	struct reftable_ref_record rec = {
+		.refname = "a/b",
+		.target = "destination", /* make sure it's not a symref. */
+		.update_index = 1,
+	};
+	int err;
+	int i;
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct testcase cases[] = {
+		{ "a/b/c", NULL, REFTABLE_NAME_CONFLICT },
+		{ "b", NULL, 0 },
+		{ "a", NULL, REFTABLE_NAME_CONFLICT },
+		{ "a", "a/b", 0 },
+
+		{ "p/", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p//q", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p/./q", NULL, REFTABLE_REFNAME_ERROR },
+		{ "p/../q", NULL, REFTABLE_REFNAME_ERROR },
+
+		{ "a/b/c", "a/b", 0 },
+		{ NULL, "a//b", 0 },
+	};
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_add_ref(w, &rec);
+	assert_err(err);
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+
+	block_source_from_strbuf(&source, &buf);
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	reftable_table_from_reader(&tab, rd);
+
+	for (i = 0; i < ARRAY_SIZE(cases); i++) {
+		struct modification mod = {
+			.tab = tab,
+		};
+
+		if (cases[i].add != NULL) {
+			mod.add = &cases[i].add;
+			mod.add_len = 1;
+		}
+		if (cases[i].del != NULL) {
+			mod.del = &cases[i].del;
+			mod.del_len = 1;
+		}
+
+		err = modification_validate(&mod);
+		assert(err == cases[i].error_code);
+	}
+
+	reftable_reader_free(rd);
+	strbuf_release(&buf);
+}
+
+int refname_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_conflict", &test_conflict);
+	return test_main(argc, argv);
+}
diff --git a/reftable/reftable-tests.h b/reftable/reftable-tests.h
new file mode 100644
index 0000000000..e38471888f
--- /dev/null
+++ b/reftable/reftable-tests.h
@@ -0,0 +1,22 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_TESTS_H
+#define REFTABLE_TESTS_H
+
+int block_test_main(int argc, const char **argv);
+int merged_test_main(int argc, const char **argv);
+int record_test_main(int argc, const char **argv);
+int refname_test_main(int argc, const char **argv);
+int reftable_test_main(int argc, const char **argv);
+int strbuf_test_main(int argc, const char **argv);
+int stack_test_main(int argc, const char **argv);
+int tree_test_main(int argc, const char **argv);
+int reftable_dump_main(int argc, char *const *argv);
+
+#endif
diff --git a/reftable/reftable.c b/reftable/reftable.c
new file mode 100644
index 0000000000..676cc201f0
--- /dev/null
+++ b/reftable/reftable.c
@@ -0,0 +1,154 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+#include "record.h"
+#include "reader.h"
+#include "merged.h"
+
+struct reftable_table_vtable {
+	int (*seek_record)(void *tab, struct reftable_iterator *it,
+			   struct reftable_record *);
+	uint32_t (*hash_id)(void *tab);
+	uint64_t (*min_update_index)(void *tab);
+	uint64_t (*max_update_index)(void *tab);
+};
+
+static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
+				     struct reftable_record *rec)
+{
+	return reader_seek((struct reftable_reader *)tab, it, rec);
+}
+
+static uint32_t reftable_reader_hash_id_void(void *tab)
+{
+	return reftable_reader_hash_id((struct reftable_reader *)tab);
+}
+
+static uint64_t reftable_reader_min_update_index_void(void *tab)
+{
+	return reftable_reader_min_update_index((struct reftable_reader *)tab);
+}
+
+static uint64_t reftable_reader_max_update_index_void(void *tab)
+{
+	return reftable_reader_max_update_index((struct reftable_reader *)tab);
+}
+
+static struct reftable_table_vtable reader_vtable = {
+	.seek_record = reftable_reader_seek_void,
+	.hash_id = reftable_reader_hash_id_void,
+	.min_update_index = reftable_reader_min_update_index_void,
+	.max_update_index = reftable_reader_max_update_index_void,
+};
+
+static int reftable_merged_table_seek_void(void *tab,
+					   struct reftable_iterator *it,
+					   struct reftable_record *rec)
+{
+	return merged_table_seek_record((struct reftable_merged_table *)tab, it,
+					rec);
+}
+
+static uint32_t reftable_merged_table_hash_id_void(void *tab)
+{
+	return reftable_merged_table_hash_id(
+		(struct reftable_merged_table *)tab);
+}
+
+static uint64_t reftable_merged_table_min_update_index_void(void *tab)
+{
+	return reftable_merged_table_min_update_index(
+		(struct reftable_merged_table *)tab);
+}
+
+static uint64_t reftable_merged_table_max_update_index_void(void *tab)
+{
+	return reftable_merged_table_max_update_index(
+		(struct reftable_merged_table *)tab);
+}
+
+static struct reftable_table_vtable merged_table_vtable = {
+	.seek_record = reftable_merged_table_seek_void,
+	.hash_id = reftable_merged_table_hash_id_void,
+	.min_update_index = reftable_merged_table_min_update_index_void,
+	.max_update_index = reftable_merged_table_max_update_index_void,
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name)
+{
+	struct reftable_ref_record ref = {
+		.refname = (char *)name,
+	};
+	struct reftable_record rec = { 0 };
+	reftable_record_from_ref(&rec, &ref);
+	return tab->ops->seek_record(tab->table_arg, it, &rec);
+}
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &reader_vtable;
+	tab->table_arg = reader;
+}
+
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *merged)
+{
+	assert(tab->ops == NULL);
+	tab->ops = &merged_table_vtable;
+	tab->table_arg = merged;
+}
+
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_iterator it = { 0 };
+	int err = reftable_table_seek_ref(tab, &it, name);
+	if (err)
+		goto done;
+
+	err = reftable_iterator_next_ref(&it, ref);
+	if (err)
+		goto done;
+
+	if (strcmp(ref->refname, name) ||
+	    reftable_ref_record_is_deletion(ref)) {
+		reftable_ref_record_clear(ref);
+		err = 1;
+		goto done;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int reftable_table_seek_record(struct reftable_table *tab,
+			       struct reftable_iterator *it,
+			       struct reftable_record *rec)
+{
+	return tab->ops->seek_record(tab->table_arg, it, rec);
+}
+
+uint64_t reftable_table_max_update_index(struct reftable_table *tab)
+{
+	return tab->ops->max_update_index(tab->table_arg);
+}
+
+uint64_t reftable_table_min_update_index(struct reftable_table *tab)
+{
+	return tab->ops->min_update_index(tab->table_arg);
+}
+
+uint32_t reftable_table_hash_id(struct reftable_table *tab)
+{
+	return tab->ops->hash_id(tab->table_arg);
+}
diff --git a/reftable/reftable.h b/reftable/reftable.h
new file mode 100644
index 0000000000..5f8ebf5540
--- /dev/null
+++ b/reftable/reftable.h
@@ -0,0 +1,585 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef REFTABLE_H
+#define REFTABLE_H
+
+#include <stdint.h>
+#include <stddef.h>
+
+void reftable_set_alloc(void *(*malloc)(size_t),
+			void *(*realloc)(void *, size_t), void (*free)(void *));
+
+/****************************************************************
+ Basic data types
+
+ Reftables store the state of each ref in struct reftable_ref_record, and they
+ store a sequence of reflog updates in struct reftable_log_record.
+ ****************************************************************/
+
+/* reftable_ref_record holds a ref database entry target_value */
+struct reftable_ref_record {
+	char *refname; /* Name of the ref, malloced. */
+	uint64_t update_index; /* Logical timestamp at which this value is
+				  written */
+	uint8_t *value; /* SHA1, or NULL. malloced. */
+	uint8_t *target_value; /* peeled annotated tag, or NULL. malloced. */
+	char *target; /* symref, or NULL. malloced. */
+};
+
+/* returns whether 'ref' represents a deletion */
+int reftable_ref_record_is_deletion(const struct reftable_ref_record *ref);
+
+/* prints a reftable_ref_record onto stdout */
+void reftable_ref_record_print(struct reftable_ref_record *ref,
+			       uint32_t hash_id);
+
+/* frees and nulls all pointer values. */
+void reftable_ref_record_clear(struct reftable_ref_record *ref);
+
+/* returns whether two reftable_ref_records are the same */
+int reftable_ref_record_equal(struct reftable_ref_record *a,
+			      struct reftable_ref_record *b, int hash_size);
+
+/* reftable_log_record holds a reflog entry */
+struct reftable_log_record {
+	char *refname;
+	uint64_t update_index; /* logical timestamp of a transactional update.
+				*/
+	uint8_t *new_hash;
+	uint8_t *old_hash;
+	char *name;
+	char *email;
+	uint64_t time;
+	int16_t tz_offset;
+	char *message;
+};
+
+/* returns whether 'ref' represents the deletion of a log record. */
+int reftable_log_record_is_deletion(const struct reftable_log_record *log);
+
+/* frees and nulls all pointer values. */
+void reftable_log_record_clear(struct reftable_log_record *log);
+
+/* returns whether two records are equal. */
+int reftable_log_record_equal(struct reftable_log_record *a,
+			      struct reftable_log_record *b, int hash_size);
+
+/* dumps a reftable_log_record on stdout, for debugging/testing. */
+void reftable_log_record_print(struct reftable_log_record *log,
+			       uint32_t hash_id);
+
+/****************************************************************
+ Error handling
+
+ Error are signaled with negative integer return values. 0 means success.
+ ****************************************************************/
+
+/* different types of errors */
+enum reftable_error {
+	/* Unexpected file system behavior */
+	REFTABLE_IO_ERROR = -2,
+
+	/* Format inconsistency on reading data
+	 */
+	REFTABLE_FORMAT_ERROR = -3,
+
+	/* File does not exist. Returned from block_source_from_file(),  because
+	   it needs special handling in stack.
+	*/
+	REFTABLE_NOT_EXIST_ERROR = -4,
+
+	/* Trying to write out-of-date data. */
+	REFTABLE_LOCK_ERROR = -5,
+
+	/* Misuse of the API:
+	   - on writing a record with NULL refname.
+	   - on writing a reftable_ref_record outside the table limits
+	   - on writing a ref or log record before the stack's next_update_index
+	   - on writing a log record with multiline message with
+	   exact_log_message unset
+	   - on reading a reftable_ref_record from log iterator, or vice versa.
+	*/
+	REFTABLE_API_ERROR = -6,
+
+	/* Decompression error */
+	REFTABLE_ZLIB_ERROR = -7,
+
+	/* Wrote a table without blocks. */
+	REFTABLE_EMPTY_TABLE_ERROR = -8,
+
+	/* Dir/file conflict. */
+	REFTABLE_NAME_CONFLICT = -9,
+
+	/* Illegal ref name. */
+	REFTABLE_REFNAME_ERROR = -10,
+};
+
+/* convert the numeric error code to a string. The string should not be
+ * deallocated. */
+const char *reftable_error_str(int err);
+
+/*
+ * Convert the numeric error code to an equivalent errno code.
+ */
+int reftable_error_to_errno(int err);
+
+/****************************************************************
+ Writing
+
+ Writing single reftables
+ ****************************************************************/
+
+/* reftable_write_options sets options for writing a single reftable. */
+struct reftable_write_options {
+	/* boolean: do not pad out blocks to block size. */
+	int unpadded;
+
+	/* the blocksize. Should be less than 2^24. */
+	uint32_t block_size;
+
+	/* boolean: do not generate a SHA1 => ref index. */
+	int skip_index_objects;
+
+	/* how often to write complete keys in each block. */
+	int restart_interval;
+
+	/* 4-byte identifier ("sha1", "s256") of the hash.
+	 * Defaults to SHA1 if unset
+	 */
+	uint32_t hash_id;
+
+	/* boolean: do not check ref names for validity or dir/file conflicts.
+	 */
+	int skip_name_check;
+
+	/* boolean: copy log messages exactly. If unset, check that the message
+	 *   is a single line, and add '\n' if missing.
+	 */
+	int exact_log_message;
+};
+
+/* reftable_block_stats holds statistics for a single block type */
+struct reftable_block_stats {
+	/* total number of entries written */
+	int entries;
+	/* total number of key restarts */
+	int restarts;
+	/* total number of blocks */
+	int blocks;
+	/* total number of index blocks */
+	int index_blocks;
+	/* depth of the index */
+	int max_index_level;
+
+	/* offset of the first block for this type */
+	uint64_t offset;
+	/* offset of the top level index block for this type, or 0 if not
+	 * present */
+	uint64_t index_offset;
+};
+
+/* stats holds overall statistics for a single reftable */
+struct reftable_stats {
+	/* total number of blocks written. */
+	int blocks;
+	/* stats for ref data */
+	struct reftable_block_stats ref_stats;
+	/* stats for the SHA1 to ref map. */
+	struct reftable_block_stats obj_stats;
+	/* stats for index blocks */
+	struct reftable_block_stats idx_stats;
+	/* stats for log blocks */
+	struct reftable_block_stats log_stats;
+
+	/* disambiguation length of shortened object IDs. */
+	int object_id_len;
+};
+
+/* reftable_new_writer creates a new writer */
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, const void *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts);
+
+/* write to a file descriptor. fdp should be an int* pointing to the fd. */
+int reftable_fd_write(void *fdp, const void *data, size_t size);
+
+/* Set the range of update indices for the records we will add.  When
+   writing a table into a stack, the min should be at least
+   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+
+   For transactional updates, typically min==max. When converting an existing
+   ref database into a single reftable, this would be a range of update-index
+   timestamps.
+ */
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max);
+
+/* adds a reftable_ref_record. Must be called in ascending
+   order. The update_index must be within the limits set by
+   reftable_writer_set_limits(), or REFTABLE_API_ERROR is returned.
+
+   It is an error to write a ref record after a log record.
+ */
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref);
+
+/* Convenience function to add multiple refs. Will sort the refs by
+   name before adding. */
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n);
+
+/* adds a reftable_log_record. Must be called in ascending order (with more
+   recent log entries first.)
+ */
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log);
+
+/* Convenience function to add multiple logs. Will sort the records by
+   key before adding. */
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n);
+
+/* reftable_writer_close finalizes the reftable. The writer is retained so
+ * statistics can be inspected. */
+int reftable_writer_close(struct reftable_writer *w);
+
+/* writer_stats returns the statistics on the reftable being written.
+
+   This struct becomes invalid when the writer is freed.
+ */
+const struct reftable_stats *writer_stats(struct reftable_writer *w);
+
+/* reftable_writer_free deallocates memory for the writer */
+void reftable_writer_free(struct reftable_writer *w);
+
+/****************************************************************
+ * ITERATING
+ ****************************************************************/
+
+/* iterator is the generic interface for walking over data stored in a
+   reftable. It is generally passed around by value.
+*/
+struct reftable_iterator {
+	struct reftable_iterator_vtable *ops;
+	void *iter_arg;
+};
+
+/* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_ref(struct reftable_iterator *it,
+			       struct reftable_ref_record *ref);
+
+/* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
+   end of iteration.
+*/
+int reftable_iterator_next_log(struct reftable_iterator *it,
+			       struct reftable_log_record *log);
+
+/* releases resources associated with an iterator. */
+void reftable_iterator_destroy(struct reftable_iterator *it);
+
+/****************************************************************
+ Reading single tables
+
+ The follow routines are for reading single files. For an application-level
+ interface, skip ahead to struct reftable_merged_table and struct
+ reftable_stack.
+ ****************************************************************/
+
+/* block_source is a generic wrapper for a seekable readable file.
+   It is generally passed around by value.
+ */
+struct reftable_block_source {
+	struct reftable_block_source_vtable *ops;
+	void *arg;
+};
+
+/* a contiguous segment of bytes. It keeps track of its generating block_source
+   so it can return itself into the pool.
+*/
+struct reftable_block {
+	uint8_t *data;
+	int len;
+	struct reftable_block_source source;
+};
+
+/* block_source_vtable are the operations that make up block_source */
+struct reftable_block_source_vtable {
+	/* returns the size of a block source */
+	uint64_t (*size)(void *source);
+
+	/* reads a segment from the block source. It is an error to read
+	   beyond the end of the block */
+	int (*read_block)(void *source, struct reftable_block *dest,
+			  uint64_t off, uint32_t size);
+	/* mark the block as read; may return the data back to malloc */
+	void (*return_block)(void *source, struct reftable_block *blockp);
+
+	/* release all resources associated with the block source */
+	void (*close)(void *source);
+};
+
+/* opens a file on the file system as a block_source */
+int reftable_block_source_from_file(struct reftable_block_source *block_src,
+				    const char *name);
+
+/* The reader struct is a handle to an open reftable file. */
+struct reftable_reader;
+
+/* reftable_new_reader opens a reftable for reading. If successful, returns 0
+ * code and sets pp. The name is used for creating a stack. Typically, it is the
+ * basename of the file. The block source `src` is owned by the reader, and is
+ * closed on calling reftable_reader_destroy().
+ */
+int reftable_new_reader(struct reftable_reader **pp,
+			struct reftable_block_source *src, const char *name);
+
+/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
+   in the table.  To seek to the start of the table, use name = "".
+
+   example:
+
+   struct reftable_reader *r = NULL;
+   int err = reftable_new_reader(&r, &src, "filename");
+   if (err < 0) { ... }
+   struct reftable_iterator it  = {0};
+   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
+   if (err < 0) { ... }
+   struct reftable_ref_record ref  = {0};
+   while (1) {
+     err = reftable_iterator_next_ref(&it, &ref);
+     if (err > 0) {
+       break;
+     }
+     if (err < 0) {
+       ..error handling..
+     }
+     ..found..
+   }
+   reftable_iterator_destroy(&it);
+   reftable_ref_record_clear(&ref);
+ */
+int reftable_reader_seek_ref(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* returns the hash ID used in this table. */
+uint32_t reftable_reader_hash_id(struct reftable_reader *r);
+
+/* seek to logs for the given name, older than update_index. To seek to the
+   start of the table, use name = "".
+ */
+int reftable_reader_seek_log_at(struct reftable_reader *r,
+				struct reftable_iterator *it, const char *name,
+				uint64_t update_index);
+
+/* seek to newest log entry for given name. */
+int reftable_reader_seek_log(struct reftable_reader *r,
+			     struct reftable_iterator *it, const char *name);
+
+/* closes and deallocates a reader. */
+void reftable_reader_free(struct reftable_reader *);
+
+/* return an iterator for the refs pointing to `oid`. */
+int reftable_reader_refs_for(struct reftable_reader *r,
+			     struct reftable_iterator *it, uint8_t *oid);
+
+/* return the max_update_index for a table */
+uint64_t reftable_reader_max_update_index(struct reftable_reader *r);
+
+/* return the min_update_index for a table */
+uint64_t reftable_reader_min_update_index(struct reftable_reader *r);
+
+/****************************************************************
+ Merged tables
+
+ A ref database kept in a sequence of table files. The merged_table presents a
+ unified view to reading (seeking, iterating) a sequence of immutable tables.
+ ****************************************************************/
+
+/* A merged table is implements seeking/iterating over a stack of tables. */
+struct reftable_merged_table;
+
+/* A generic reftable; see below. */
+struct reftable_table;
+
+/* reftable_new_merged_table creates a new merged table. It takes ownership of
+   the stack array.
+*/
+int reftable_new_merged_table(struct reftable_merged_table **dest,
+			      struct reftable_table *stack, int n,
+			      uint32_t hash_id);
+
+/* returns an iterator positioned just before 'name' */
+int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns an iterator for log entry, at given update_index */
+int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
+				      struct reftable_iterator *it,
+				      const char *name, uint64_t update_index);
+
+/* like reftable_merged_table_seek_log_at but look for the newest entry. */
+int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   const char *name);
+
+/* returns the max update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
+
+/* returns the min update_index covered by this merged table. */
+uint64_t
+reftable_merged_table_min_update_index(struct reftable_merged_table *mt);
+
+/* releases memory for the merged_table */
+void reftable_merged_table_free(struct reftable_merged_table *m);
+
+/* return the hash ID of the merged table. */
+uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *m);
+
+/****************************************************************
+ Generic tables
+
+ A unified API for reading tables, either merged tables, or single readers.
+ ****************************************************************/
+
+struct reftable_table {
+	struct reftable_table_vtable *ops;
+	void *table_arg;
+};
+
+int reftable_table_seek_ref(struct reftable_table *tab,
+			    struct reftable_iterator *it, const char *name);
+
+void reftable_table_from_reader(struct reftable_table *tab,
+				struct reftable_reader *reader);
+
+/* returns the hash ID from a generic reftable_table */
+uint32_t reftable_table_hash_id(struct reftable_table *tab);
+
+/* create a generic table from reftable_merged_table */
+void reftable_table_from_merged_table(struct reftable_table *tab,
+				      struct reftable_merged_table *table);
+
+/* returns the max update_index covered by this table. */
+uint64_t reftable_table_max_update_index(struct reftable_table *tab);
+
+/* returns the min update_index covered by this table. */
+uint64_t reftable_table_min_update_index(struct reftable_table *tab);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_table_read_ref(struct reftable_table *tab, const char *name,
+			    struct reftable_ref_record *ref);
+
+/****************************************************************
+ Mutable ref database
+
+ The stack presents an interface to a mutable sequence of reftables.
+ ****************************************************************/
+
+/* a stack is a stack of reftables, which can be mutated by pushing a table to
+ * the top of the stack */
+struct reftable_stack;
+
+/* open a new reftable stack. The tables along with the table list will be
+   stored in 'dir'. Typically, this should be .git/reftables.
+*/
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config);
+
+/* returns the update_index at which a next table should be written. */
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st);
+
+/* holds a transaction to add tables at the top of a stack. */
+struct reftable_addition;
+
+/*
+  returns a new transaction to add reftables to the given stack. As a side
+  effect, the ref database is locked.
+*/
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st);
+
+/* Adds a reftable to transaction. */
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg);
+
+/* Commits the transaction, releasing the lock. */
+int reftable_addition_commit(struct reftable_addition *add);
+
+/* Release all non-committed data from the transaction, and deallocate the
+   transaction. Releases the lock if held. */
+void reftable_addition_destroy(struct reftable_addition *add);
+
+/* add a new table to the stack. The write_table function must call
+   reftable_writer_set_limits, add refs and return an error value. */
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write_table)(struct reftable_writer *wr,
+					  void *write_arg),
+		       void *write_arg);
+
+/* returns the merged_table for seeking. This table is valid until the
+   next write or reload, and should not be closed or deleted.
+*/
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st);
+
+/* frees all resources associated with the stack. */
+void reftable_stack_destroy(struct reftable_stack *st);
+
+/* Reloads the stack if necessary. This is very cheap to run if the stack was up
+ * to date */
+int reftable_stack_reload(struct reftable_stack *st);
+
+/* Policy for expiring reflog entries. */
+struct reftable_log_expiry_config {
+	/* Drop entries older than this timestamp */
+	uint64_t time;
+
+	/* Drop older entries */
+	uint64_t min_update_index;
+};
+
+/* compacts all reftables into a giant table. Expire reflog entries if config is
+ * non-NULL */
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config);
+
+/* heuristically compact unbalanced table stack. */
+int reftable_stack_auto_compact(struct reftable_stack *st);
+
+/* convenience function to read a single ref. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref);
+
+/* convenience function to read a single log. Returns < 0 for error, 0
+   for success, and 1 if ref not found. */
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log);
+
+/* statistics on past compactions. */
+struct reftable_compaction_stats {
+	uint64_t bytes; /* total number of bytes written */
+	uint64_t entries_written; /* total number of entries written, including
+				     failures. */
+	int attempts; /* how often we tried to compact */
+	int failures; /* failures happen on concurrent updates */
+};
+
+/* return statistics for compaction up till now. */
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st);
+
+#endif
diff --git a/reftable/reftable_test.c b/reftable/reftable_test.c
new file mode 100644
index 0000000000..ad21630999
--- /dev/null
+++ b/reftable/reftable_test.c
@@ -0,0 +1,632 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "reftable.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "block.h"
+#include "constants.h"
+#include "reader.h"
+#include "record.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static const int update_index = 5;
+
+static void test_buffer(void)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_block_source source = { NULL };
+	struct reftable_block out = { 0 };
+	int n;
+	uint8_t in[] = "hello";
+	strbuf_add(&buf, in, sizeof(in));
+	block_source_from_strbuf(&source, &buf);
+	assert(block_source_size(&source) == 6);
+	n = block_source_read_block(&source, &out, 0, sizeof(in));
+	assert(n == sizeof(in));
+	assert(!memcmp(in, out.data, n));
+	reftable_block_done(&out);
+
+	n = block_source_read_block(&source, &out, 1, 2);
+	assert(n == 2);
+	assert(!memcmp(out.data, "el", 2));
+
+	reftable_block_done(&out);
+	block_source_close(&source);
+	strbuf_release(&buf);
+}
+
+static void test_default_write_opts(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+
+	struct reftable_ref_record rec = {
+		.refname = "master",
+		.update_index = 1,
+	};
+	int err;
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader **readers =
+		reftable_calloc(sizeof(*readers) * 1);
+	struct reftable_table *tab = reftable_calloc(sizeof(*tab) * 1);
+	uint32_t hash_id;
+	struct reftable_reader *rd = NULL;
+	struct reftable_merged_table *merged = NULL;
+
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_add_ref(w, &rec);
+	assert_err(err);
+
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	hash_id = reftable_reader_hash_id(rd);
+	assert(hash_id == SHA1_ID);
+
+	reftable_table_from_reader(&tab[0], rd);
+	err = reftable_new_merged_table(&merged, tab, 1, SHA1_ID);
+	assert_err(err);
+
+	reader_close(rd);
+	reftable_merged_table_free(merged);
+	strbuf_release(&buf);
+}
+
+static void write_table(char ***names, struct strbuf *buf, int N,
+			int block_size, uint32_t hash_id)
+{
+	struct reftable_write_options opts = {
+		.block_size = block_size,
+		.hash_id = hash_id,
+	};
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, buf, &opts);
+	struct reftable_ref_record ref = { 0 };
+	int i = 0, n;
+	struct reftable_log_record log = { 0 };
+	const struct reftable_stats *stats = NULL;
+	*names = reftable_calloc(sizeof(char *) * (N + 1));
+	reftable_writer_set_limits(w, update_index, update_index);
+	for (i = 0; i < N; i++) {
+		uint8_t hash[SHA256_SIZE] = { 0 };
+		char name[100];
+		int n;
+
+		set_test_hash(hash, i);
+
+		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
+
+		ref.refname = name;
+		ref.value = hash;
+		ref.update_index = update_index;
+		(*names)[i] = xstrdup(name);
+
+		n = reftable_writer_add_ref(w, &ref);
+		assert(n == 0);
+	}
+
+	for (i = 0; i < N; i++) {
+		uint8_t hash[SHA256_SIZE] = { 0 };
+		char name[100];
+		int n;
+
+		set_test_hash(hash, i);
+
+		snprintf(name, sizeof(name), "refs/heads/branch%02d", i);
+
+		log.refname = name;
+		log.new_hash = hash;
+		log.update_index = update_index;
+		log.message = "message";
+
+		n = reftable_writer_add_log(w, &log);
+		assert(n == 0);
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	stats = writer_stats(w);
+	for (i = 0; i < stats->ref_stats.blocks; i++) {
+		int off = i * opts.block_size;
+		if (off == 0) {
+			off = header_size((hash_id == SHA256_ID) ? 2 : 1);
+		}
+		assert(buf->buf[off] == 'r');
+	}
+
+	assert(stats->log_stats.blocks > 0);
+	reftable_writer_free(w);
+}
+
+static void test_log_buffer_size(void)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_write_options opts = {
+		.block_size = 4096,
+	};
+	int err;
+	struct reftable_log_record log = {
+		.refname = "refs/heads/master",
+		.name = "Han-Wen Nienhuys",
+		.email = "hanwen@google.com",
+		.tz_offset = 100,
+		.time = 0x5e430672,
+		.update_index = 0xa,
+		.message = "commit: 9\n",
+	};
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+
+	/* This tests buffer extension for log compression. Must use a random
+	   hash, to ensure that the compressed part is larger than the original.
+	*/
+	uint8_t hash1[SHA1_SIZE], hash2[SHA1_SIZE];
+	for (int i = 0; i < SHA1_SIZE; i++) {
+		hash1[i] = (uint8_t)(rand() % 256);
+		hash2[i] = (uint8_t)(rand() % 256);
+	}
+	log.old_hash = hash1;
+	log.new_hash = hash2;
+	reftable_writer_set_limits(w, update_index, update_index);
+	err = reftable_writer_add_log(w, &log);
+	assert_err(err);
+	err = reftable_writer_close(w);
+	assert_err(err);
+	reftable_writer_free(w);
+	strbuf_release(&buf);
+}
+
+static void test_log_write_read(void)
+{
+	int N = 2;
+	char **names = reftable_calloc(sizeof(char *) * (N + 1));
+	int err;
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_ref_record ref = { 0 };
+	int i = 0;
+	struct reftable_log_record log = { 0 };
+	int n;
+	struct reftable_iterator it = { 0 };
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+	const struct reftable_stats *stats = NULL;
+	reftable_writer_set_limits(w, 0, N);
+	for (i = 0; i < N; i++) {
+		char name[256];
+		struct reftable_ref_record ref = { 0 };
+		snprintf(name, sizeof(name), "b%02d%0*d", i, 130, 7);
+		names[i] = xstrdup(name);
+		puts(name);
+		ref.refname = name;
+		ref.update_index = i;
+
+		err = reftable_writer_add_ref(w, &ref);
+		assert_err(err);
+	}
+	for (i = 0; i < N; i++) {
+		uint8_t hash1[SHA1_SIZE], hash2[SHA1_SIZE];
+		struct reftable_log_record log = { 0 };
+		set_test_hash(hash1, i);
+		set_test_hash(hash2, i + 1);
+
+		log.refname = names[i];
+		log.update_index = i;
+		log.old_hash = hash1;
+		log.new_hash = hash2;
+
+		err = reftable_writer_add_log(w, &log);
+		assert_err(err);
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	stats = writer_stats(w);
+	assert(stats->log_stats.blocks > 0);
+	reftable_writer_free(w);
+	w = NULL;
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.log");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, names[N - 1]);
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert_err(err);
+
+	/* end of iteration. */
+	err = reftable_iterator_next_ref(&it, &ref);
+	assert(0 < err);
+
+	reftable_iterator_destroy(&it);
+	reftable_ref_record_clear(&ref);
+
+	err = reftable_reader_seek_log(&rd, &it, "");
+	assert_err(err);
+
+	i = 0;
+	while (true) {
+		int err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			break;
+		}
+
+		assert_err(err);
+		assert_streq(names[i], log.refname);
+		assert(i == log.update_index);
+		i++;
+		reftable_log_record_clear(&log);
+	}
+
+	assert(i == N);
+	reftable_iterator_destroy(&it);
+
+	/* cleanup. */
+	strbuf_release(&buf);
+	free_names(names);
+	reader_close(&rd);
+}
+
+static void test_table_read_write_sequential(void)
+{
+	char **names;
+	struct strbuf buf = STRBUF_INIT;
+	int N = 50;
+	struct reftable_iterator it = { 0 };
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader rd = { 0 };
+	int err = 0;
+	int j = 0;
+
+	write_table(&names, &buf, N, 256, SHA1_ID);
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, "");
+	assert_err(err);
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		int r = reftable_iterator_next_ref(&it, &ref);
+		assert(r >= 0);
+		if (r > 0) {
+			break;
+		}
+		assert(0 == strcmp(names[j], ref.refname));
+		assert(update_index == ref.update_index);
+
+		j++;
+		reftable_ref_record_clear(&ref);
+	}
+	assert(j == N);
+	reftable_iterator_destroy(&it);
+	strbuf_release(&buf);
+	free_names(names);
+
+	reader_close(&rd);
+}
+
+static void test_table_write_small_table(void)
+{
+	char **names;
+	struct strbuf buf = STRBUF_INIT;
+	int N = 1;
+	write_table(&names, &buf, N, 4096, SHA1_ID);
+	assert(buf.len < 200);
+	strbuf_release(&buf);
+	free_names(names);
+}
+
+static void test_table_read_api(void)
+{
+	char **names;
+	struct strbuf buf = STRBUF_INIT;
+	int N = 50;
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	int err;
+	int i;
+	struct reftable_log_record log = { 0 };
+	struct reftable_iterator it = { 0 };
+
+	write_table(&names, &buf, N, 256, SHA1_ID);
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(&rd, &it, names[0]);
+	assert_err(err);
+
+	err = reftable_iterator_next_log(&it, &log);
+	assert(err == REFTABLE_API_ERROR);
+
+	strbuf_release(&buf);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+	reftable_iterator_destroy(&it);
+	reftable_free(names);
+	reader_close(&rd);
+	strbuf_release(&buf);
+}
+
+static void test_table_read_write_seek(bool index, int hash_id)
+{
+	char **names;
+	struct strbuf buf = STRBUF_INIT;
+	int N = 50;
+	struct reftable_reader rd = { 0 };
+	struct reftable_block_source source = { 0 };
+	int err;
+	int i = 0;
+
+	struct reftable_iterator it = { 0 };
+	struct strbuf pastLast = STRBUF_INIT;
+	struct reftable_ref_record ref = { 0 };
+
+	write_table(&names, &buf, N, 256, hash_id);
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+	assert(hash_id == reftable_reader_hash_id(&rd));
+
+	if (!index) {
+		rd.ref_offsets.index_offset = 0;
+	} else {
+		assert(rd.ref_offsets.index_offset > 0);
+	}
+
+	for (i = 1; i < N; i++) {
+		int err = reftable_reader_seek_ref(&rd, &it, names[i]);
+		assert_err(err);
+		err = reftable_iterator_next_ref(&it, &ref);
+		assert_err(err);
+		assert(0 == strcmp(names[i], ref.refname));
+		assert(i == ref.value[0]);
+
+		reftable_ref_record_clear(&ref);
+		reftable_iterator_destroy(&it);
+	}
+
+	strbuf_addstr(&pastLast, names[N - 1]);
+	strbuf_addstr(&pastLast, "/");
+
+	err = reftable_reader_seek_ref(&rd, &it, pastLast.buf);
+	if (err == 0) {
+		struct reftable_ref_record ref = { 0 };
+		int err = reftable_iterator_next_ref(&it, &ref);
+		assert(err > 0);
+	} else {
+		assert(err > 0);
+	}
+
+	strbuf_release(&pastLast);
+	reftable_iterator_destroy(&it);
+
+	strbuf_release(&buf);
+	for (i = 0; i < N; i++) {
+		reftable_free(names[i]);
+	}
+	reftable_free(names);
+	reader_close(&rd);
+}
+
+static void test_table_read_write_seek_linear(void)
+{
+	test_table_read_write_seek(false, SHA1_ID);
+}
+
+static void test_table_read_write_seek_linear_sha256(void)
+{
+	test_table_read_write_seek(false, SHA256_ID);
+}
+
+static void test_table_read_write_seek_index(void)
+{
+	test_table_read_write_seek(true, SHA1_ID);
+}
+
+static void test_table_refs_for(bool indexed)
+{
+	int N = 50;
+	char **want_names = reftable_calloc(sizeof(char *) * (N + 1));
+	int want_names_len = 0;
+	uint8_t want_hash[SHA1_SIZE];
+
+	struct reftable_write_options opts = {
+		.block_size = 256,
+	};
+	struct reftable_ref_record ref = { 0 };
+	int i = 0;
+	int n;
+	int err;
+	struct reftable_reader rd;
+	struct reftable_block_source source = { 0 };
+
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+
+	struct reftable_iterator it = { 0 };
+	int j;
+
+	set_test_hash(want_hash, 4);
+
+	for (i = 0; i < N; i++) {
+		uint8_t hash[SHA1_SIZE];
+		char fill[51] = { 0 };
+		char name[100];
+		uint8_t hash1[SHA1_SIZE];
+		uint8_t hash2[SHA1_SIZE];
+		struct reftable_ref_record ref = { 0 };
+
+		memset(hash, i, sizeof(hash));
+		memset(fill, 'x', 50);
+		/* Put the variable part in the start */
+		snprintf(name, sizeof(name), "br%02d%s", i, fill);
+		name[40] = 0;
+		ref.refname = name;
+
+		set_test_hash(hash1, i / 4);
+		set_test_hash(hash2, 3 + i / 4);
+		ref.value = hash1;
+		ref.target_value = hash2;
+
+		/* 80 bytes / entry, so 3 entries per block. Yields 17
+		 */
+		/* blocks. */
+		n = reftable_writer_add_ref(w, &ref);
+		assert(n == 0);
+
+		if (!memcmp(hash1, want_hash, SHA1_SIZE) ||
+		    !memcmp(hash2, want_hash, SHA1_SIZE)) {
+			want_names[want_names_len++] = xstrdup(name);
+		}
+	}
+
+	n = reftable_writer_close(w);
+	assert(n == 0);
+
+	reftable_writer_free(w);
+	w = NULL;
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = init_reader(&rd, &source, "file.ref");
+	assert_err(err);
+	if (!indexed) {
+		rd.obj_offsets.present = 0;
+	}
+
+	err = reftable_reader_seek_ref(&rd, &it, "");
+	assert_err(err);
+	reftable_iterator_destroy(&it);
+
+	err = reftable_reader_refs_for(&rd, &it, want_hash);
+	assert_err(err);
+
+	j = 0;
+	while (true) {
+		int err = reftable_iterator_next_ref(&it, &ref);
+		assert(err >= 0);
+		if (err > 0) {
+			break;
+		}
+
+		assert(j < want_names_len);
+		assert(0 == strcmp(ref.refname, want_names[j]));
+		j++;
+		reftable_ref_record_clear(&ref);
+	}
+	assert(j == want_names_len);
+
+	strbuf_release(&buf);
+	free_names(want_names);
+	reftable_iterator_destroy(&it);
+	reader_close(&rd);
+}
+
+static void test_table_refs_for_no_index(void)
+{
+	test_table_refs_for(false);
+}
+
+static void test_table_refs_for_obj_index(void)
+{
+	test_table_refs_for(true);
+}
+
+static void test_table_empty(void)
+{
+	struct reftable_write_options opts = { 0 };
+	struct strbuf buf = STRBUF_INIT;
+	struct reftable_writer *w =
+		reftable_new_writer(&strbuf_add_void, &buf, &opts);
+	struct reftable_block_source source = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_ref_record rec = { 0 };
+	struct reftable_iterator it = { 0 };
+	int err;
+
+	reftable_writer_set_limits(w, 1, 1);
+
+	err = reftable_writer_close(w);
+	assert(err == REFTABLE_EMPTY_TABLE_ERROR);
+	reftable_writer_free(w);
+
+	assert(buf.len == header_size(1) + footer_size(1));
+
+	block_source_from_strbuf(&source, &buf);
+
+	err = reftable_new_reader(&rd, &source, "filename");
+	assert_err(err);
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	assert_err(err);
+
+	err = reftable_iterator_next_ref(&it, &rec);
+	assert(err > 0);
+
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	strbuf_release(&buf);
+}
+
+int reftable_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_default_write_opts", test_default_write_opts);
+	add_test_case("test_log_write_read", test_log_write_read);
+	add_test_case("test_table_read_write_seek_linear_sha256",
+		      &test_table_read_write_seek_linear_sha256);
+	add_test_case("test_log_buffer_size", test_log_buffer_size);
+	add_test_case("test_table_write_small_table",
+		      &test_table_write_small_table);
+	add_test_case("test_buffer", &test_buffer);
+	add_test_case("test_table_read_api", &test_table_read_api);
+	add_test_case("test_table_read_write_sequential",
+		      &test_table_read_write_sequential);
+	add_test_case("test_table_read_write_seek_linear",
+		      &test_table_read_write_seek_linear);
+	add_test_case("test_table_read_write_seek_index",
+		      &test_table_read_write_seek_index);
+	add_test_case("test_table_read_write_refs_for_no_index",
+		      &test_table_refs_for_no_index);
+	add_test_case("test_table_read_write_refs_for_obj_index",
+		      &test_table_refs_for_obj_index);
+	add_test_case("test_table_empty", &test_table_empty);
+	return test_main(argc, argv);
+}
diff --git a/reftable/stack.c b/reftable/stack.c
new file mode 100644
index 0000000000..94b2cb265c
--- /dev/null
+++ b/reftable/stack.c
@@ -0,0 +1,1225 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+#include "merged.h"
+#include "reader.h"
+#include "refname.h"
+#include "reftable.h"
+#include "writer.h"
+
+int reftable_new_stack(struct reftable_stack **dest, const char *dir,
+		       struct reftable_write_options config)
+{
+	struct reftable_stack *p =
+		reftable_calloc(sizeof(struct reftable_stack));
+	struct strbuf list_file_name = STRBUF_INIT;
+	int err = 0;
+
+	if (config.hash_id == 0) {
+		config.hash_id = SHA1_ID;
+	}
+
+	*dest = NULL;
+
+	strbuf_reset(&list_file_name);
+	strbuf_addstr(&list_file_name, dir);
+	strbuf_addstr(&list_file_name, "/tables.list");
+
+	p->list_file = strbuf_detach(&list_file_name, NULL);
+	p->reftable_dir = xstrdup(dir);
+	p->config = config;
+
+	err = reftable_stack_reload_maybe_reuse(p, true);
+	if (err < 0) {
+		reftable_stack_destroy(p);
+	} else {
+		*dest = p;
+	}
+	return err;
+}
+
+static int fd_read_lines(int fd, char ***namesp)
+{
+	off_t size = lseek(fd, 0, SEEK_END);
+	char *buf = NULL;
+	int err = 0;
+	if (size < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+	err = lseek(fd, 0, SEEK_SET);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	buf = reftable_malloc(size + 1);
+	if (read(fd, buf, size) != size) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+	buf[size] = 0;
+
+	parse_names(buf, size, namesp);
+
+done:
+	reftable_free(buf);
+	return err;
+}
+
+int read_lines(const char *filename, char ***namesp)
+{
+	int fd = open(filename, O_RDONLY, 0644);
+	int err = 0;
+	if (fd < 0) {
+		if (errno == ENOENT) {
+			*namesp = reftable_calloc(sizeof(char *));
+			return 0;
+		}
+
+		return REFTABLE_IO_ERROR;
+	}
+	err = fd_read_lines(fd, namesp);
+	close(fd);
+	return err;
+}
+
+struct reftable_merged_table *
+reftable_stack_merged_table(struct reftable_stack *st)
+{
+	return st->merged;
+}
+
+/* Close and free the stack */
+void reftable_stack_destroy(struct reftable_stack *st)
+{
+	if (st->merged != NULL) {
+		reftable_merged_table_free(st->merged);
+		st->merged = NULL;
+	}
+
+	if (st->readers != NULL) {
+		int i = 0;
+		for (i = 0; i < st->readers_len; i++) {
+			reftable_reader_free(st->readers[i]);
+		}
+		st->readers_len = 0;
+		FREE_AND_NULL(st->readers);
+	}
+	FREE_AND_NULL(st->list_file);
+	FREE_AND_NULL(st->reftable_dir);
+	reftable_free(st);
+}
+
+static struct reftable_reader **stack_copy_readers(struct reftable_stack *st,
+						   int cur_len)
+{
+	struct reftable_reader **cur =
+		reftable_calloc(sizeof(struct reftable_reader *) * cur_len);
+	int i = 0;
+	for (i = 0; i < cur_len; i++) {
+		cur[i] = st->readers[i];
+	}
+	return cur;
+}
+
+static int reftable_stack_reload_once(struct reftable_stack *st, char **names,
+				      bool reuse_open)
+{
+	int cur_len = st->merged == NULL ? 0 : st->merged->stack_len;
+	struct reftable_reader **cur = stack_copy_readers(st, cur_len);
+	int err = 0;
+	int names_len = names_length(names);
+	struct reftable_reader **new_readers =
+		reftable_calloc(sizeof(struct reftable_reader *) * names_len);
+	struct reftable_table *new_tables =
+		reftable_calloc(sizeof(struct reftable_table) * names_len);
+	int new_readers_len = 0;
+	struct reftable_merged_table *new_merged = NULL;
+	int i;
+
+	while (*names) {
+		struct reftable_reader *rd = NULL;
+		char *name = *names++;
+
+		/* this is linear; we assume compaction keeps the number of
+		   tables under control so this is not quadratic. */
+		int j = 0;
+		for (j = 0; reuse_open && j < cur_len; j++) {
+			if (cur[j] != NULL && 0 == strcmp(cur[j]->name, name)) {
+				rd = cur[j];
+				cur[j] = NULL;
+				break;
+			}
+		}
+
+		if (rd == NULL) {
+			struct reftable_block_source src = { 0 };
+			struct strbuf table_path = STRBUF_INIT;
+			strbuf_addstr(&table_path, st->reftable_dir);
+			strbuf_addstr(&table_path, "/");
+			strbuf_addstr(&table_path, name);
+
+			err = reftable_block_source_from_file(&src,
+							      table_path.buf);
+			strbuf_release(&table_path);
+
+			if (err < 0)
+				goto done;
+
+			err = reftable_new_reader(&rd, &src, name);
+			if (err < 0)
+				goto done;
+		}
+
+		new_readers[new_readers_len] = rd;
+		reftable_table_from_reader(&new_tables[new_readers_len], rd);
+		new_readers_len++;
+	}
+
+	/* success! */
+	err = reftable_new_merged_table(&new_merged, new_tables,
+					new_readers_len, st->config.hash_id);
+	if (err < 0)
+		goto done;
+
+	new_tables = NULL;
+	st->readers_len = new_readers_len;
+	if (st->merged != NULL) {
+		merged_table_clear(st->merged);
+		reftable_merged_table_free(st->merged);
+	}
+	if (st->readers != NULL) {
+		reftable_free(st->readers);
+	}
+	st->readers = new_readers;
+	new_readers = NULL;
+	new_readers_len = 0;
+
+	new_merged->suppress_deletions = true;
+	st->merged = new_merged;
+	for (i = 0; i < cur_len; i++) {
+		if (cur[i] != NULL) {
+			reader_close(cur[i]);
+			reftable_reader_free(cur[i]);
+		}
+	}
+
+done:
+	for (i = 0; i < new_readers_len; i++) {
+		reader_close(new_readers[i]);
+		reftable_reader_free(new_readers[i]);
+	}
+	reftable_free(new_readers);
+	reftable_free(new_tables);
+	reftable_free(cur);
+	return err;
+}
+
+/* return negative if a before b. */
+static int tv_cmp(struct timeval *a, struct timeval *b)
+{
+	time_t diff = a->tv_sec - b->tv_sec;
+	int udiff = a->tv_usec - b->tv_usec;
+
+	if (diff != 0)
+		return diff;
+
+	return udiff;
+}
+
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open)
+{
+	struct timeval deadline = { 0 };
+	int err = gettimeofday(&deadline, NULL);
+	int64_t delay = 0;
+	int tries = 0;
+	if (err < 0)
+		return err;
+
+	deadline.tv_sec += 3;
+	while (true) {
+		char **names = NULL;
+		char **names_after = NULL;
+		struct timeval now = { 0 };
+		int err = gettimeofday(&now, NULL);
+		int err2 = 0;
+		if (err < 0) {
+			return err;
+		}
+
+		/* Only look at deadlines after the first few times. This
+		   simplifies debugging in GDB */
+		tries++;
+		if (tries > 3 && tv_cmp(&now, &deadline) >= 0) {
+			break;
+		}
+
+		err = read_lines(st->list_file, &names);
+		if (err < 0) {
+			free_names(names);
+			return err;
+		}
+		err = reftable_stack_reload_once(st, names, reuse_open);
+		if (err == 0) {
+			free_names(names);
+			break;
+		}
+		if (err != REFTABLE_NOT_EXIST_ERROR) {
+			free_names(names);
+			return err;
+		}
+
+		/* err == REFTABLE_NOT_EXIST_ERROR can be caused by a concurrent
+		   writer. Check if there was one by checking if the name list
+		   changed.
+		*/
+		err2 = read_lines(st->list_file, &names_after);
+		if (err2 < 0) {
+			free_names(names);
+			return err2;
+		}
+
+		if (names_equal(names_after, names)) {
+			free_names(names);
+			free_names(names_after);
+			return err;
+		}
+		free_names(names);
+		free_names(names_after);
+
+		delay = delay + (delay * rand()) / RAND_MAX + 1;
+		sleep_millisec(delay);
+	}
+
+	return 0;
+}
+
+/* -1 = error
+ 0 = up to date
+ 1 = changed. */
+static int stack_uptodate(struct reftable_stack *st)
+{
+	char **names = NULL;
+	int err = read_lines(st->list_file, &names);
+	int i = 0;
+	if (err < 0)
+		return err;
+
+	for (i = 0; i < st->readers_len; i++) {
+		if (names[i] == NULL) {
+			err = 1;
+			goto done;
+		}
+
+		if (strcmp(st->readers[i]->name, names[i])) {
+			err = 1;
+			goto done;
+		}
+	}
+
+	if (names[st->merged->stack_len] != NULL) {
+		err = 1;
+		goto done;
+	}
+
+done:
+	free_names(names);
+	return err;
+}
+
+int reftable_stack_reload(struct reftable_stack *st)
+{
+	int err = stack_uptodate(st);
+	if (err > 0)
+		return reftable_stack_reload_maybe_reuse(st, true);
+	return err;
+}
+
+int reftable_stack_add(struct reftable_stack *st,
+		       int (*write)(struct reftable_writer *wr, void *arg),
+		       void *arg)
+{
+	int err = stack_try_add(st, write, arg);
+	if (err < 0) {
+		if (err == REFTABLE_LOCK_ERROR) {
+			/* Ignore error return, we want to propagate
+			   REFTABLE_LOCK_ERROR.
+			*/
+			reftable_stack_reload(st);
+		}
+		return err;
+	}
+
+	if (!st->disable_auto_compact)
+		return reftable_stack_auto_compact(st);
+
+	return 0;
+}
+
+static void format_name(struct strbuf *dest, uint64_t min, uint64_t max)
+{
+	char buf[100];
+	snprintf(buf, sizeof(buf), "0x%012" PRIx64 "-0x%012" PRIx64, min, max);
+	strbuf_reset(dest);
+	strbuf_addstr(dest, buf);
+}
+
+struct reftable_addition {
+	int lock_file_fd;
+	struct strbuf lock_file_name;
+	struct reftable_stack *stack;
+	char **names;
+	char **new_tables;
+	int new_tables_len;
+	uint64_t next_update_index;
+};
+
+#define REFTABLE_ADDITION_INIT                \
+	{                                     \
+		.lock_file_name = STRBUF_INIT \
+	}
+
+static int reftable_stack_init_addition(struct reftable_addition *add,
+					struct reftable_stack *st)
+{
+	int err = 0;
+	add->stack = st;
+
+	strbuf_reset(&add->lock_file_name);
+	strbuf_addstr(&add->lock_file_name, st->list_file);
+	strbuf_addstr(&add->lock_file_name, ".lock");
+
+	add->lock_file_fd = open(add->lock_file_name.buf,
+				 O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (add->lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = REFTABLE_LOCK_ERROR;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	err = stack_uptodate(st);
+	if (err < 0)
+		goto done;
+
+	if (err > 1) {
+		err = REFTABLE_LOCK_ERROR;
+		goto done;
+	}
+
+	add->next_update_index = reftable_stack_next_update_index(st);
+done:
+	if (err) {
+		reftable_addition_close(add);
+	}
+	return err;
+}
+
+void reftable_addition_close(struct reftable_addition *add)
+{
+	int i = 0;
+	struct strbuf nm = STRBUF_INIT;
+	for (i = 0; i < add->new_tables_len; i++) {
+		strbuf_reset(&nm);
+		strbuf_addstr(&nm, add->stack->list_file);
+		strbuf_addstr(&nm, "/");
+		strbuf_addstr(&nm, add->new_tables[i]);
+		unlink(nm.buf);
+		reftable_free(add->new_tables[i]);
+		add->new_tables[i] = NULL;
+	}
+	reftable_free(add->new_tables);
+	add->new_tables = NULL;
+	add->new_tables_len = 0;
+
+	if (add->lock_file_fd > 0) {
+		close(add->lock_file_fd);
+		add->lock_file_fd = 0;
+	}
+	if (add->lock_file_name.len > 0) {
+		unlink(add->lock_file_name.buf);
+		strbuf_release(&add->lock_file_name);
+	}
+
+	free_names(add->names);
+	add->names = NULL;
+	strbuf_release(&nm);
+}
+
+void reftable_addition_destroy(struct reftable_addition *add)
+{
+	if (add == NULL) {
+		return;
+	}
+	reftable_addition_close(add);
+	reftable_free(add);
+}
+
+int reftable_addition_commit(struct reftable_addition *add)
+{
+	struct strbuf table_list = STRBUF_INIT;
+	int i = 0;
+	int err = 0;
+	if (add->new_tables_len == 0)
+		goto done;
+
+	for (i = 0; i < add->stack->merged->stack_len; i++) {
+		strbuf_addstr(&table_list, add->stack->readers[i]->name);
+		strbuf_addstr(&table_list, "\n");
+	}
+	for (i = 0; i < add->new_tables_len; i++) {
+		strbuf_addstr(&table_list, add->new_tables[i]);
+		strbuf_addstr(&table_list, "\n");
+	}
+
+	err = write(add->lock_file_fd, table_list.buf, table_list.len);
+	strbuf_release(&table_list);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = close(add->lock_file_fd);
+	add->lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = rename(add->lock_file_name.buf, add->stack->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = reftable_stack_reload(add->stack);
+
+done:
+	reftable_addition_close(add);
+	return err;
+}
+
+int reftable_stack_new_addition(struct reftable_addition **dest,
+				struct reftable_stack *st)
+{
+	int err = 0;
+	struct reftable_addition empty = REFTABLE_ADDITION_INIT;
+	*dest = reftable_calloc(sizeof(**dest));
+	**dest = empty;
+	err = reftable_stack_init_addition(*dest, st);
+	if (err) {
+		reftable_free(*dest);
+		*dest = NULL;
+	}
+	return err;
+}
+
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg)
+{
+	struct reftable_addition add = REFTABLE_ADDITION_INIT;
+	int err = reftable_stack_init_addition(&add, st);
+	if (err < 0)
+		goto done;
+	if (err > 0) {
+		err = REFTABLE_LOCK_ERROR;
+		goto done;
+	}
+
+	err = reftable_addition_add(&add, write_table, arg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_addition_commit(&add);
+done:
+	reftable_addition_close(&add);
+	return err;
+}
+
+int reftable_addition_add(struct reftable_addition *add,
+			  int (*write_table)(struct reftable_writer *wr,
+					     void *arg),
+			  void *arg)
+{
+	struct strbuf temp_tab_file_name = STRBUF_INIT;
+	struct strbuf tab_file_name = STRBUF_INIT;
+	struct strbuf next_name = STRBUF_INIT;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+	int tab_fd = 0;
+
+	strbuf_reset(&next_name);
+	format_name(&next_name, add->next_update_index, add->next_update_index);
+
+	strbuf_addstr(&temp_tab_file_name, add->stack->reftable_dir);
+	strbuf_addstr(&temp_tab_file_name, "/");
+	strbuf_addbuf(&temp_tab_file_name, &next_name);
+	strbuf_addstr(&temp_tab_file_name, ".temp.XXXXXX");
+
+	tab_fd = mkstemp(temp_tab_file_name.buf);
+	if (tab_fd < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd,
+				 &add->stack->config);
+	err = write_table(wr, arg);
+	if (err < 0)
+		goto done;
+
+	err = reftable_writer_close(wr);
+	if (err == REFTABLE_EMPTY_TABLE_ERROR) {
+		err = 0;
+		goto done;
+	}
+	if (err < 0)
+		goto done;
+
+	err = close(tab_fd);
+	tab_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	err = stack_check_addition(add->stack, temp_tab_file_name.buf);
+	if (err < 0)
+		goto done;
+
+	if (wr->min_update_index < add->next_update_index) {
+		err = REFTABLE_API_ERROR;
+		goto done;
+	}
+
+	format_name(&next_name, wr->min_update_index, wr->max_update_index);
+	strbuf_addstr(&next_name, ".ref");
+
+	strbuf_addstr(&tab_file_name, add->stack->reftable_dir);
+	strbuf_addstr(&tab_file_name, "/");
+	strbuf_addbuf(&tab_file_name, &next_name);
+
+	/* TODO: should check destination out of paranoia */
+	err = rename(temp_tab_file_name.buf, tab_file_name.buf);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		goto done;
+	}
+
+	add->new_tables = reftable_realloc(add->new_tables,
+					   sizeof(*add->new_tables) *
+						   (add->new_tables_len + 1));
+	add->new_tables[add->new_tables_len] = strbuf_detach(&next_name, NULL);
+	add->new_tables_len++;
+done:
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (temp_tab_file_name.len > 0) {
+		unlink(temp_tab_file_name.buf);
+	}
+
+	strbuf_release(&temp_tab_file_name);
+	strbuf_release(&tab_file_name);
+	strbuf_release(&next_name);
+	reftable_writer_free(wr);
+	return err;
+}
+
+uint64_t reftable_stack_next_update_index(struct reftable_stack *st)
+{
+	int sz = st->merged->stack_len;
+	if (sz > 0)
+		return reftable_reader_max_update_index(st->readers[sz - 1]) +
+		       1;
+	return 1;
+}
+
+static int stack_compact_locked(struct reftable_stack *st, int first, int last,
+				struct strbuf *temp_tab,
+				struct reftable_log_expiry_config *config)
+{
+	struct strbuf next_name = STRBUF_INIT;
+	int tab_fd = -1;
+	struct reftable_writer *wr = NULL;
+	int err = 0;
+
+	format_name(&next_name,
+		    reftable_reader_min_update_index(st->readers[first]),
+		    reftable_reader_max_update_index(st->readers[first]));
+
+	strbuf_reset(temp_tab);
+	strbuf_addstr(temp_tab, st->reftable_dir);
+	strbuf_addstr(temp_tab, "/");
+	strbuf_addbuf(temp_tab, &next_name);
+	strbuf_addstr(temp_tab, ".temp.XXXXXX");
+
+	tab_fd = mkstemp(temp_tab->buf);
+	wr = reftable_new_writer(reftable_fd_write, &tab_fd, &st->config);
+
+	err = stack_write_compact(st, wr, first, last, config);
+	if (err < 0)
+		goto done;
+	err = reftable_writer_close(wr);
+	if (err < 0)
+		goto done;
+
+	err = close(tab_fd);
+	tab_fd = 0;
+
+done:
+	reftable_writer_free(wr);
+	if (tab_fd > 0) {
+		close(tab_fd);
+		tab_fd = 0;
+	}
+	if (err != 0 && temp_tab->len > 0) {
+		unlink(temp_tab->buf);
+		strbuf_release(temp_tab);
+	}
+	strbuf_release(&next_name);
+	return err;
+}
+
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config)
+{
+	int subtabs_len = last - first + 1;
+	struct reftable_table *subtabs = reftable_calloc(
+		sizeof(struct reftable_table) * (last - first + 1));
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_iterator it = { 0 };
+	struct reftable_ref_record ref = { 0 };
+	struct reftable_log_record log = { 0 };
+
+	uint64_t entries = 0;
+
+	int i = 0, j = 0;
+	for (i = first, j = 0; i <= last; i++) {
+		struct reftable_reader *t = st->readers[i];
+		reftable_table_from_reader(&subtabs[j++], t);
+		st->stats.bytes += t->size;
+	}
+	reftable_writer_set_limits(wr, st->readers[first]->min_update_index,
+				   st->readers[last]->max_update_index);
+
+	err = reftable_new_merged_table(&mt, subtabs, subtabs_len,
+					st->config.hash_id);
+	if (err < 0) {
+		reftable_free(subtabs);
+		goto done;
+	}
+
+	err = reftable_merged_table_seek_ref(mt, &it, "");
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_ref_record_is_deletion(&ref)) {
+			continue;
+		}
+
+		err = reftable_writer_add_ref(wr, &ref);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+	reftable_iterator_destroy(&it);
+
+	err = reftable_merged_table_seek_log(mt, &it, "");
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+		if (first == 0 && reftable_log_record_is_deletion(&log)) {
+			continue;
+		}
+
+		if (config != NULL && config->time > 0 &&
+		    log.time < config->time) {
+			continue;
+		}
+
+		if (config != NULL && config->min_update_index > 0 &&
+		    log.update_index < config->min_update_index) {
+			continue;
+		}
+
+		err = reftable_writer_add_log(wr, &log);
+		if (err < 0) {
+			break;
+		}
+		entries++;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	if (mt != NULL) {
+		merged_table_clear(mt);
+		reftable_merged_table_free(mt);
+	}
+	reftable_ref_record_clear(&ref);
+	reftable_log_record_clear(&log);
+	st->stats.entries_written += entries;
+	return err;
+}
+
+/* <  0: error. 0 == OK, > 0 attempt failed; could retry. */
+static int stack_compact_range(struct reftable_stack *st, int first, int last,
+			       struct reftable_log_expiry_config *expiry)
+{
+	struct strbuf temp_tab_file_name = STRBUF_INIT;
+	struct strbuf new_table_name = STRBUF_INIT;
+	struct strbuf lock_file_name = STRBUF_INIT;
+	struct strbuf ref_list_contents = STRBUF_INIT;
+	struct strbuf new_table_path = STRBUF_INIT;
+	int err = 0;
+	bool have_lock = false;
+	int lock_file_fd = 0;
+	int compact_count = last - first + 1;
+	char **listp = NULL;
+	char **delete_on_success =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	char **subtable_locks =
+		reftable_calloc(sizeof(char *) * (compact_count + 1));
+	int i = 0;
+	int j = 0;
+	bool is_empty_table = false;
+
+	if (first > last || (expiry == NULL && first == last)) {
+		err = 0;
+		goto done;
+	}
+
+	st->stats.attempts++;
+
+	strbuf_reset(&lock_file_name);
+	strbuf_addstr(&lock_file_name, st->list_file);
+	strbuf_addstr(&lock_file_name, ".lock");
+
+	lock_file_fd =
+		open(lock_file_name.buf, O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	/* Don't want to write to the lock for now.  */
+	close(lock_file_fd);
+	lock_file_fd = 0;
+
+	have_lock = true;
+	err = stack_uptodate(st);
+	if (err != 0)
+		goto done;
+
+	for (i = first, j = 0; i <= last; i++) {
+		struct strbuf subtab_file_name = STRBUF_INIT;
+		struct strbuf subtab_lock = STRBUF_INIT;
+		int sublock_file_fd = -1;
+
+		strbuf_addstr(&subtab_file_name, st->reftable_dir);
+		strbuf_addstr(&subtab_file_name, "/");
+		strbuf_addstr(&subtab_file_name, reader_name(st->readers[i]));
+
+		strbuf_reset(&subtab_lock);
+		strbuf_addbuf(&subtab_lock, &subtab_file_name);
+		strbuf_addstr(&subtab_lock, ".lock");
+
+		sublock_file_fd = open(subtab_lock.buf,
+				       O_EXCL | O_CREAT | O_WRONLY, 0644);
+		if (sublock_file_fd > 0) {
+			close(sublock_file_fd);
+		} else if (sublock_file_fd < 0) {
+			if (errno == EEXIST) {
+				err = 1;
+			} else {
+				err = REFTABLE_IO_ERROR;
+			}
+		}
+
+		subtable_locks[j] = subtab_lock.buf;
+		delete_on_success[j] = subtab_file_name.buf;
+		j++;
+
+		if (err != 0)
+			goto done;
+	}
+
+	err = unlink(lock_file_name.buf);
+	if (err < 0)
+		goto done;
+	have_lock = false;
+
+	err = stack_compact_locked(st, first, last, &temp_tab_file_name,
+				   expiry);
+	/* Compaction + tombstones can create an empty table out of non-empty
+	 * tables. */
+	is_empty_table = (err == REFTABLE_EMPTY_TABLE_ERROR);
+	if (is_empty_table) {
+		err = 0;
+	}
+	if (err < 0)
+		goto done;
+
+	lock_file_fd =
+		open(lock_file_name.buf, O_EXCL | O_CREAT | O_WRONLY, 0644);
+	if (lock_file_fd < 0) {
+		if (errno == EEXIST) {
+			err = 1;
+		} else {
+			err = REFTABLE_IO_ERROR;
+		}
+		goto done;
+	}
+	have_lock = true;
+
+	format_name(&new_table_name, st->readers[first]->min_update_index,
+		    st->readers[last]->max_update_index);
+	strbuf_addstr(&new_table_name, ".ref");
+
+	strbuf_reset(&new_table_path);
+	strbuf_addstr(&new_table_path, st->reftable_dir);
+	strbuf_addstr(&new_table_path, "/");
+	strbuf_addbuf(&new_table_path, &new_table_name);
+
+	if (!is_empty_table) {
+		err = rename(temp_tab_file_name.buf, new_table_path.buf);
+		if (err < 0) {
+			err = REFTABLE_IO_ERROR;
+			goto done;
+		}
+	}
+
+	for (i = 0; i < first; i++) {
+		strbuf_addstr(&ref_list_contents, st->readers[i]->name);
+		strbuf_addstr(&ref_list_contents, "\n");
+	}
+	if (!is_empty_table) {
+		strbuf_addbuf(&ref_list_contents, &new_table_name);
+		strbuf_addstr(&ref_list_contents, "\n");
+	}
+	for (i = last + 1; i < st->merged->stack_len; i++) {
+		strbuf_addstr(&ref_list_contents, st->readers[i]->name);
+		strbuf_addstr(&ref_list_contents, "\n");
+	}
+
+	err = write(lock_file_fd, ref_list_contents.buf, ref_list_contents.len);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(new_table_path.buf);
+		goto done;
+	}
+	err = close(lock_file_fd);
+	lock_file_fd = 0;
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(new_table_path.buf);
+		goto done;
+	}
+
+	err = rename(lock_file_name.buf, st->list_file);
+	if (err < 0) {
+		err = REFTABLE_IO_ERROR;
+		unlink(new_table_path.buf);
+		goto done;
+	}
+	have_lock = false;
+
+	/* Reload the stack before deleting. On windows, we can only delete the
+	   files after we closed them.
+	*/
+	err = reftable_stack_reload_maybe_reuse(st, first < last);
+
+	listp = delete_on_success;
+	while (*listp) {
+		if (strcmp(*listp, new_table_path.buf)) {
+			unlink(*listp);
+		}
+		listp++;
+	}
+
+done:
+	free_names(delete_on_success);
+
+	listp = subtable_locks;
+	while (*listp) {
+		unlink(*listp);
+		listp++;
+	}
+	free_names(subtable_locks);
+	if (lock_file_fd > 0) {
+		close(lock_file_fd);
+		lock_file_fd = 0;
+	}
+	if (have_lock) {
+		unlink(lock_file_name.buf);
+	}
+	strbuf_release(&new_table_name);
+	strbuf_release(&new_table_path);
+	strbuf_release(&ref_list_contents);
+	strbuf_release(&temp_tab_file_name);
+	strbuf_release(&lock_file_name);
+	return err;
+}
+
+int reftable_stack_compact_all(struct reftable_stack *st,
+			       struct reftable_log_expiry_config *config)
+{
+	return stack_compact_range(st, 0, st->merged->stack_len - 1, config);
+}
+
+static int stack_compact_range_stats(struct reftable_stack *st, int first,
+				     int last,
+				     struct reftable_log_expiry_config *config)
+{
+	int err = stack_compact_range(st, first, last, config);
+	if (err > 0) {
+		st->stats.failures++;
+	}
+	return err;
+}
+
+static int segment_size(struct segment *s)
+{
+	return s->end - s->start;
+}
+
+int fastlog2(uint64_t sz)
+{
+	int l = 0;
+	if (sz == 0)
+		return 0;
+	for (; sz; sz /= 2) {
+		l++;
+	}
+	return l - 1;
+}
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n)
+{
+	struct segment *segs = reftable_calloc(sizeof(struct segment) * n);
+	int next = 0;
+	struct segment cur = { 0 };
+	int i = 0;
+
+	if (n == 0) {
+		*seglen = 0;
+		return segs;
+	}
+	for (i = 0; i < n; i++) {
+		int log = fastlog2(sizes[i]);
+		if (cur.log != log && cur.bytes > 0) {
+			struct segment fresh = {
+				.start = i,
+			};
+
+			segs[next++] = cur;
+			cur = fresh;
+		}
+
+		cur.log = log;
+		cur.end = i + 1;
+		cur.bytes += sizes[i];
+	}
+	segs[next++] = cur;
+	*seglen = next;
+	return segs;
+}
+
+struct segment suggest_compaction_segment(uint64_t *sizes, int n)
+{
+	int seglen = 0;
+	struct segment *segs = sizes_to_segments(&seglen, sizes, n);
+	struct segment min_seg = {
+		.log = 64,
+	};
+	int i = 0;
+	for (i = 0; i < seglen; i++) {
+		if (segment_size(&segs[i]) == 1) {
+			continue;
+		}
+
+		if (segs[i].log < min_seg.log) {
+			min_seg = segs[i];
+		}
+	}
+
+	while (min_seg.start > 0) {
+		int prev = min_seg.start - 1;
+		if (fastlog2(min_seg.bytes) < fastlog2(sizes[prev])) {
+			break;
+		}
+
+		min_seg.start = prev;
+		min_seg.bytes += sizes[prev];
+	}
+
+	reftable_free(segs);
+	return min_seg;
+}
+
+static uint64_t *stack_table_sizes_for_compaction(struct reftable_stack *st)
+{
+	uint64_t *sizes =
+		reftable_calloc(sizeof(uint64_t) * st->merged->stack_len);
+	int version = (st->config.hash_id == SHA1_ID) ? 1 : 2;
+	int overhead = header_size(version) - 1;
+	int i = 0;
+	for (i = 0; i < st->merged->stack_len; i++) {
+		sizes[i] = st->readers[i]->size - overhead;
+	}
+	return sizes;
+}
+
+int reftable_stack_auto_compact(struct reftable_stack *st)
+{
+	uint64_t *sizes = stack_table_sizes_for_compaction(st);
+	struct segment seg =
+		suggest_compaction_segment(sizes, st->merged->stack_len);
+	reftable_free(sizes);
+	if (segment_size(&seg) > 0)
+		return stack_compact_range_stats(st, seg.start, seg.end - 1,
+						 NULL);
+
+	return 0;
+}
+
+struct reftable_compaction_stats *
+reftable_stack_compaction_stats(struct reftable_stack *st)
+{
+	return &st->stats;
+}
+
+int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_table tab = { NULL };
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+	return reftable_table_read_ref(&tab, refname, ref);
+}
+
+int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
+			    struct reftable_log_record *log)
+{
+	struct reftable_iterator it = { 0 };
+	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
+	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err)
+		goto done;
+
+	err = reftable_iterator_next_log(&it, log);
+	if (err)
+		goto done;
+
+	if (strcmp(log->refname, refname) ||
+	    reftable_log_record_is_deletion(log)) {
+		err = 1;
+		goto done;
+	}
+
+done:
+	if (err) {
+		reftable_log_record_clear(log);
+	}
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name)
+{
+	int err = 0;
+	struct reftable_block_source src = { 0 };
+	struct reftable_reader *rd = NULL;
+	struct reftable_table tab = { NULL };
+	struct reftable_ref_record *refs = NULL;
+	struct reftable_iterator it = { NULL };
+	int cap = 0;
+	int len = 0;
+	int i = 0;
+
+	if (st->config.skip_name_check)
+		return 0;
+
+	err = reftable_block_source_from_file(&src, new_tab_name);
+	if (err < 0)
+		goto done;
+
+	err = reftable_new_reader(&rd, &src, new_tab_name);
+	if (err < 0)
+		goto done;
+
+	err = reftable_reader_seek_ref(rd, &it, "");
+	if (err > 0) {
+		err = 0;
+		goto done;
+	}
+	if (err < 0)
+		goto done;
+
+	while (true) {
+		struct reftable_ref_record ref = { 0 };
+		err = reftable_iterator_next_ref(&it, &ref);
+		if (err > 0) {
+			break;
+		}
+		if (err < 0)
+			goto done;
+
+		if (len >= cap) {
+			cap = 2 * cap + 1;
+			refs = reftable_realloc(refs, cap * sizeof(refs[0]));
+		}
+
+		refs[len++] = ref;
+	}
+
+	reftable_table_from_merged_table(&tab, reftable_stack_merged_table(st));
+
+	err = validate_ref_record_addition(tab, refs, len);
+
+done:
+	for (i = 0; i < len; i++) {
+		reftable_ref_record_clear(&refs[i]);
+	}
+
+	free(refs);
+	reftable_iterator_destroy(&it);
+	reftable_reader_free(rd);
+	return err;
+}
diff --git a/reftable/stack.h b/reftable/stack.h
new file mode 100644
index 0000000000..e1baa827dd
--- /dev/null
+++ b/reftable/stack.h
@@ -0,0 +1,50 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef STACK_H
+#define STACK_H
+
+#include "reftable.h"
+#include "system.h"
+
+struct reftable_stack {
+	char *list_file;
+	char *reftable_dir;
+	bool disable_auto_compact;
+
+	struct reftable_write_options config;
+
+	struct reftable_reader **readers;
+	size_t readers_len;
+	struct reftable_merged_table *merged;
+	struct reftable_compaction_stats stats;
+};
+
+int read_lines(const char *filename, char ***lines);
+int stack_try_add(struct reftable_stack *st,
+		  int (*write_table)(struct reftable_writer *wr, void *arg),
+		  void *arg);
+int stack_write_compact(struct reftable_stack *st, struct reftable_writer *wr,
+			int first, int last,
+			struct reftable_log_expiry_config *config);
+int fastlog2(uint64_t sz);
+int stack_check_addition(struct reftable_stack *st, const char *new_tab_name);
+void reftable_addition_close(struct reftable_addition *add);
+int reftable_stack_reload_maybe_reuse(struct reftable_stack *st,
+				      bool reuse_open);
+
+struct segment {
+	int start, end;
+	int log;
+	uint64_t bytes;
+};
+
+struct segment *sizes_to_segments(int *seglen, uint64_t *sizes, int n);
+struct segment suggest_compaction_segment(uint64_t *sizes, int n);
+
+#endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
new file mode 100644
index 0000000000..f6c74bc6a7
--- /dev/null
+++ b/reftable/stack_test.c
@@ -0,0 +1,787 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "stack.h"
+
+#include "system.h"
+
+#include "merged.h"
+#include "basics.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+#include <sys/types.h>
+#include <dirent.h>
+
+static void test_read_file(void)
+{
+	char fn[256] = "/tmp/stack.test_read_file.XXXXXX";
+	int fd = mkstemp(fn);
+	char out[1024] = "line1\n\nline2\nline3";
+	int n, err;
+	char **names = NULL;
+	char *want[] = { "line1", "line2", "line3" };
+	int i = 0;
+
+	assert(fd > 0);
+	n = write(fd, out, strlen(out));
+	assert(n == strlen(out));
+	err = close(fd);
+	assert(err >= 0);
+
+	err = read_lines(fn, &names);
+	assert_err(err);
+
+	for (i = 0; names[i] != NULL; i++) {
+		assert(0 == strcmp(want[i], names[i]));
+	}
+	free_names(names);
+	remove(fn);
+}
+
+static void test_parse_names(void)
+{
+	char buf[] = "line\n";
+	char **names = NULL;
+	parse_names(buf, strlen(buf), &names);
+
+	assert(NULL != names[0]);
+	assert(0 == strcmp(names[0], "line"));
+	assert(NULL == names[1]);
+	free_names(names);
+}
+
+static void test_names_equal(void)
+{
+	char *a[] = { "a", "b", "c", NULL };
+	char *b[] = { "a", "b", "d", NULL };
+	char *c[] = { "a", "b", NULL };
+
+	assert(names_equal(a, a));
+	assert(!names_equal(a, b));
+	assert(!names_equal(a, c));
+}
+
+static int write_test_ref(struct reftable_writer *wr, void *arg)
+{
+	struct reftable_ref_record *ref = arg;
+	reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
+	return reftable_writer_add_ref(wr, ref);
+}
+
+struct write_log_arg {
+	struct reftable_log_record *log;
+	uint64_t update_index;
+};
+
+static int write_test_log(struct reftable_writer *wr, void *arg)
+{
+	struct write_log_arg *wla = arg;
+
+	reftable_writer_set_limits(wr, wla->update_index, wla->update_index);
+	return reftable_writer_add_log(wr, wla->log);
+}
+
+static void test_reftable_stack_add_one(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record ref = {
+		.refname = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st, ref.refname, &dest);
+	assert_err(err);
+	assert(0 == strcmp("master", dest.target));
+
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_uptodate(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st1 = NULL;
+	struct reftable_stack *st2 = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int err;
+	struct reftable_ref_record ref1 = {
+		.refname = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record ref2 = {
+		.refname = "branch2",
+		.update_index = 2,
+		.target = "master",
+	};
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st1, dir, cfg);
+	assert_err(err);
+
+	err = reftable_new_stack(&st2, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st1, &write_test_ref, &ref1);
+	assert_err(err);
+
+	err = reftable_stack_add(st2, &write_test_ref, &ref2);
+	assert(err == REFTABLE_LOCK_ERROR);
+
+	err = reftable_stack_reload(st2);
+	assert_err(err);
+
+	err = reftable_stack_add(st2, &write_test_ref, &ref2);
+	assert_err(err);
+	reftable_stack_destroy(st1);
+	reftable_stack_destroy(st2);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_transaction_api(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_addition *add = NULL;
+
+	struct reftable_ref_record ref = {
+		.refname = "HEAD",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	reftable_addition_destroy(add);
+
+	err = reftable_stack_new_addition(&add, st);
+	assert_err(err);
+
+	err = reftable_addition_add(add, &write_test_ref, &ref);
+	assert_err(err);
+
+	err = reftable_addition_commit(add);
+	assert_err(err);
+
+	reftable_addition_destroy(add);
+
+	err = reftable_stack_read_ref(st, ref.refname, &dest);
+	assert_err(err);
+	assert(0 == strcmp("master", dest.target));
+
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_validate_refname(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int i;
+	struct reftable_ref_record ref = {
+		.refname = "a/b",
+		.update_index = 1,
+		.target = "master",
+	};
+	char *additions[] = { "a", "a/b/c" };
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	for (i = 0; i < ARRAY_SIZE(additions); i++) {
+		struct reftable_ref_record ref = {
+			.refname = additions[i],
+			.update_index = 1,
+			.target = "master",
+		};
+
+		err = reftable_stack_add(st, &write_test_ref, &ref);
+		assert(err == REFTABLE_NAME_CONFLICT);
+	}
+
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static int write_error(struct reftable_writer *wr, void *arg)
+{
+	return *((int *)arg);
+}
+
+static void test_reftable_stack_update_index_check(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record ref1 = {
+		.refname = "name1",
+		.update_index = 1,
+		.target = "master",
+	};
+	struct reftable_ref_record ref2 = {
+		.refname = "name2",
+		.update_index = 1,
+		.target = "master",
+	};
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref1);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref2);
+	assert(err == REFTABLE_API_ERROR);
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_lock_failure(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err, i;
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+	for (i = -1; i != REFTABLE_EMPTY_TABLE_ERROR; i--) {
+		err = reftable_stack_add(st, &write_error, &i);
+		assert(err == i);
+	}
+
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_add(void)
+{
+	int i = 0;
+	int err = 0;
+	struct reftable_write_options cfg = {
+		.exact_log_message = true,
+	};
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_ref_record refs[2] = { { 0 } };
+	struct reftable_log_record logs[2] = { { 0 } };
+	int N = ARRAY_SIZE(refs);
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+	st->disable_auto_compact = true;
+
+	for (i = 0; i < N; i++) {
+		char buf[256];
+		snprintf(buf, sizeof(buf), "branch%02d", i);
+		refs[i].refname = xstrdup(buf);
+		refs[i].value = reftable_malloc(SHA1_SIZE);
+		refs[i].update_index = i + 1;
+		set_test_hash(refs[i].value, i);
+
+		logs[i].refname = xstrdup(buf);
+		logs[i].update_index = N + i + 1;
+		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+		logs[i].email = xstrdup("identity@invalid");
+		set_test_hash(logs[i].new_hash, i);
+	}
+
+	for (i = 0; i < N; i++) {
+		int err = reftable_stack_add(st, &write_test_ref, &refs[i]);
+		assert_err(err);
+	}
+
+	for (i = 0; i < N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		struct reftable_ref_record dest = { 0 };
+		int err = reftable_stack_read_ref(st, refs[i].refname, &dest);
+		assert_err(err);
+		assert(reftable_ref_record_equal(&dest, refs + i, SHA1_SIZE));
+		reftable_ref_record_clear(&dest);
+	}
+
+	for (i = 0; i < N; i++) {
+		struct reftable_log_record dest = { 0 };
+		int err = reftable_stack_read_log(st, refs[i].refname, &dest);
+		assert_err(err);
+		assert(reftable_log_record_equal(&dest, logs + i, SHA1_SIZE));
+		reftable_log_record_clear(&dest);
+	}
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i < N; i++) {
+		reftable_ref_record_clear(&refs[i]);
+		reftable_log_record_clear(&logs[i]);
+	}
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_log_normalize(void)
+{
+	int err = 0;
+	struct reftable_write_options cfg = {
+		0,
+	};
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+
+	uint8_t h1[SHA1_SIZE] = { 0x01 }, h2[SHA1_SIZE] = { 0x02 };
+
+	struct reftable_log_record input = {
+		.refname = "branch",
+		.update_index = 1,
+		.new_hash = h1,
+		.old_hash = h2,
+	};
+	struct reftable_log_record dest = {
+		.update_index = 0,
+	};
+	struct write_log_arg arg = {
+		.log = &input,
+		.update_index = 1,
+	};
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	input.message = "one\ntwo";
+	err = reftable_stack_add(st, &write_test_log, &arg);
+	assert(err == REFTABLE_API_ERROR);
+
+	input.message = "one";
+	err = reftable_stack_add(st, &write_test_log, &arg);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, input.refname, &dest);
+	assert_err(err);
+	assert(0 == strcmp(dest.message, "one\n"));
+
+	input.message = "two\n";
+	arg.update_index = 2;
+	err = reftable_stack_add(st, &write_test_log, &arg);
+	assert_err(err);
+	err = reftable_stack_read_log(st, input.refname, &dest);
+	assert_err(err);
+	assert(0 == strcmp(dest.message, "two\n"));
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	reftable_log_record_clear(&dest);
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_tombstone(void)
+{
+	int i = 0;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	struct reftable_ref_record refs[2] = { { 0 } };
+	struct reftable_log_record logs[2] = { { 0 } };
+	int N = ARRAY_SIZE(refs);
+	struct reftable_ref_record dest = { 0 };
+	struct reftable_log_record log_dest = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		const char *buf = "branch";
+		refs[i].refname = xstrdup(buf);
+		refs[i].update_index = i + 1;
+		if (i % 2 == 0) {
+			refs[i].value = reftable_malloc(SHA1_SIZE);
+			set_test_hash(refs[i].value, i);
+		}
+		logs[i].refname = xstrdup(buf);
+		/* update_index is part of the key. */
+		logs[i].update_index = 42;
+		if (i % 2 == 0) {
+			logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+			set_test_hash(logs[i].new_hash, i);
+			logs[i].email = xstrdup("identity@invalid");
+		}
+	}
+	for (i = 0; i < N; i++) {
+		int err = reftable_stack_add(st, &write_test_ref, &refs[i]);
+		assert_err(err);
+	}
+	for (i = 0; i < N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_read_ref(st, "branch", &dest);
+	assert(err == 1);
+	reftable_ref_record_clear(&dest);
+
+	err = reftable_stack_read_log(st, "branch", &log_dest);
+	assert(err == 1);
+	reftable_log_record_clear(&log_dest);
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st, "branch", &dest);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, "branch", &log_dest);
+	assert(err == 1);
+	reftable_ref_record_clear(&dest);
+	reftable_log_record_clear(&log_dest);
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i < N; i++) {
+		reftable_ref_record_clear(&refs[i]);
+		reftable_log_record_clear(&logs[i]);
+	}
+	reftable_clear_dir(dir);
+}
+
+static void test_reftable_stack_hash_id(void)
+{
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+
+	struct reftable_ref_record ref = {
+		.refname = "master",
+		.target = "target",
+		.update_index = 1,
+	};
+	struct reftable_write_options cfg32 = { .hash_id = SHA256_ID };
+	struct reftable_stack *st32 = NULL;
+	struct reftable_write_options cfg_default = { 0 };
+	struct reftable_stack *st_default = NULL;
+	struct reftable_ref_record dest = { 0 };
+
+	assert(mkdtemp(dir));
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_test_ref, &ref);
+	assert_err(err);
+
+	/* can't read it with the wrong hash ID. */
+	err = reftable_new_stack(&st32, dir, cfg32);
+	assert(err == REFTABLE_FORMAT_ERROR);
+
+	/* check that we can read it back with default config too. */
+	err = reftable_new_stack(&st_default, dir, cfg_default);
+	assert_err(err);
+
+	err = reftable_stack_read_ref(st_default, "master", &dest);
+	assert_err(err);
+
+	assert(!strcmp(dest.target, ref.target));
+	reftable_ref_record_clear(&dest);
+	reftable_stack_destroy(st);
+	reftable_stack_destroy(st_default);
+	reftable_clear_dir(dir);
+}
+
+static void test_log2(void)
+{
+	assert(1 == fastlog2(3));
+	assert(2 == fastlog2(4));
+	assert(2 == fastlog2(5));
+}
+
+static void test_sizes_to_segments(void)
+{
+	uint64_t sizes[] = { 2, 3, 4, 5, 7, 9 };
+	/* .................0  1  2  3  4  5 */
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(segs[2].log == 3);
+	assert(segs[2].start == 5);
+	assert(segs[2].end == 6);
+
+	assert(segs[1].log == 2);
+	assert(segs[1].start == 2);
+	assert(segs[1].end == 5);
+	reftable_free(segs);
+}
+
+static void test_sizes_to_segments_empty(void)
+{
+	uint64_t sizes[0];
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(seglen == 0);
+	reftable_free(segs);
+}
+
+static void test_sizes_to_segments_all_equal(void)
+{
+	uint64_t sizes[] = { 5, 5 };
+
+	int seglen = 0;
+	struct segment *segs =
+		sizes_to_segments(&seglen, sizes, ARRAY_SIZE(sizes));
+	assert(seglen == 1);
+	assert(segs[0].start == 0);
+	assert(segs[0].end == 2);
+	reftable_free(segs);
+}
+
+static void test_suggest_compaction_segment(void)
+{
+	uint64_t sizes[] = { 128, 64, 17, 16, 9, 9, 9, 16, 16 };
+	/* .................0    1    2  3   4  5  6 */
+	struct segment min =
+		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
+	assert(min.start == 2);
+	assert(min.end == 7);
+}
+
+static void test_suggest_compaction_segment_nothing(void)
+{
+	uint64_t sizes[] = { 64, 32, 16, 8, 4, 2 };
+	struct segment result =
+		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
+	assert(result.start == result.end);
+}
+
+static void test_reflog_expire(void)
+{
+	char dir[256] = "/tmp/stack.test_reflog_expire.XXXXXX";
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	struct reftable_log_record logs[20] = { { 0 } };
+	int N = ARRAY_SIZE(logs) - 1;
+	int i = 0;
+	int err;
+	struct reftable_log_expiry_config expiry = {
+		.time = 10,
+	};
+	struct reftable_log_record log = { 0 };
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 1; i <= N; i++) {
+		char buf[256];
+		snprintf(buf, sizeof(buf), "branch%02d", i);
+
+		logs[i].refname = xstrdup(buf);
+		logs[i].update_index = i;
+		logs[i].time = i;
+		logs[i].new_hash = reftable_malloc(SHA1_SIZE);
+		logs[i].email = xstrdup("identity@invalid");
+		set_test_hash(logs[i].new_hash, i);
+	}
+
+	for (i = 1; i <= N; i++) {
+		struct write_log_arg arg = {
+			.log = &logs[i],
+			.update_index = reftable_stack_next_update_index(st),
+		};
+		int err = reftable_stack_add(st, &write_test_log, &arg);
+		assert_err(err);
+	}
+
+	err = reftable_stack_compact_all(st, NULL);
+	assert_err(err);
+
+	err = reftable_stack_compact_all(st, &expiry);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, logs[9].refname, &log);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, logs[11].refname, &log);
+	assert_err(err);
+
+	expiry.min_update_index = 15;
+	err = reftable_stack_compact_all(st, &expiry);
+	assert_err(err);
+
+	err = reftable_stack_read_log(st, logs[14].refname, &log);
+	assert(err == 1);
+
+	err = reftable_stack_read_log(st, logs[16].refname, &log);
+	assert_err(err);
+
+	/* cleanup */
+	reftable_stack_destroy(st);
+	for (i = 0; i <= N; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	reftable_clear_dir(dir);
+	reftable_log_record_clear(&log);
+}
+
+static int write_nothing(struct reftable_writer *wr, void *arg)
+{
+	reftable_writer_set_limits(wr, 1, 1);
+	return 0;
+}
+
+static void test_empty_add(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	int err;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	struct reftable_stack *st2 = NULL;
+
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	err = reftable_stack_add(st, &write_nothing, NULL);
+	assert_err(err);
+
+	err = reftable_new_stack(&st2, dir, cfg);
+	assert_err(err);
+	reftable_clear_dir(dir);
+	reftable_stack_destroy(st);
+	reftable_stack_destroy(st2);
+}
+
+static void test_reftable_stack_auto_compaction(void)
+{
+	struct reftable_write_options cfg = { 0 };
+	struct reftable_stack *st = NULL;
+	char dir[256] = "/tmp/stack_test.XXXXXX";
+	int err, i;
+	int N = 100;
+	assert(mkdtemp(dir));
+
+	err = reftable_new_stack(&st, dir, cfg);
+	assert_err(err);
+
+	for (i = 0; i < N; i++) {
+		char name[100];
+		struct reftable_ref_record ref = {
+			.refname = name,
+			.update_index = reftable_stack_next_update_index(st),
+			.target = "master",
+		};
+		snprintf(name, sizeof(name), "branch%04d", i);
+
+		err = reftable_stack_add(st, &write_test_ref, &ref);
+		assert_err(err);
+
+		assert(i < 3 || st->merged->stack_len < 2 * fastlog2(i));
+	}
+
+	assert(reftable_stack_compaction_stats(st)->entries_written <
+	       (uint64_t)(N * fastlog2(N)));
+
+	reftable_stack_destroy(st);
+	reftable_clear_dir(dir);
+}
+
+int stack_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_reftable_stack_uptodate",
+		      &test_reftable_stack_uptodate);
+	add_test_case("test_reftable_stack_transaction_api",
+		      &test_reftable_stack_transaction_api);
+	add_test_case("test_reftable_stack_hash_id",
+		      &test_reftable_stack_hash_id);
+	add_test_case("test_sizes_to_segments_all_equal",
+		      &test_sizes_to_segments_all_equal);
+	add_test_case("test_reftable_stack_auto_compaction",
+		      &test_reftable_stack_auto_compaction);
+	add_test_case("test_reftable_stack_validate_refname",
+		      &test_reftable_stack_validate_refname);
+	add_test_case("test_reftable_stack_update_index_check",
+		      &test_reftable_stack_update_index_check);
+	add_test_case("test_reftable_stack_lock_failure",
+		      &test_reftable_stack_lock_failure);
+	add_test_case("test_reftable_stack_log_normalize",
+		      &test_reftable_stack_log_normalize);
+	add_test_case("test_reftable_stack_tombstone",
+		      &test_reftable_stack_tombstone);
+	add_test_case("test_reftable_stack_add_one",
+		      &test_reftable_stack_add_one);
+	add_test_case("test_empty_add", test_empty_add);
+	add_test_case("test_reflog_expire", test_reflog_expire);
+	add_test_case("test_suggest_compaction_segment",
+		      &test_suggest_compaction_segment);
+	add_test_case("test_suggest_compaction_segment_nothing",
+		      &test_suggest_compaction_segment_nothing);
+	add_test_case("test_sizes_to_segments", &test_sizes_to_segments);
+	add_test_case("test_sizes_to_segments_empty",
+		      &test_sizes_to_segments_empty);
+	add_test_case("test_log2", &test_log2);
+	add_test_case("test_parse_names", &test_parse_names);
+	add_test_case("test_read_file", &test_read_file);
+	add_test_case("test_names_equal", &test_names_equal);
+	add_test_case("test_reftable_stack_add", &test_reftable_stack_add);
+	return test_main(argc, argv);
+}
diff --git a/reftable/strbuf.c b/reftable/strbuf.c
new file mode 100644
index 0000000000..eec446fb95
--- /dev/null
+++ b/reftable/strbuf.c
@@ -0,0 +1,206 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "strbuf.h"
+
+#include "system.h"
+#include "reftable.h"
+#include "basics.h"
+
+#ifdef REFTABLE_STANDALONE
+
+void strbuf_init(struct strbuf *s, size_t alloc)
+{
+	struct strbuf empty = STRBUF_INIT;
+	*s = empty;
+}
+
+void strbuf_grow(struct strbuf *s, size_t extra)
+{
+	size_t newcap = s->len + extra + 1;
+	if (newcap > s->cap) {
+		s->buf = reftable_realloc(s->buf, newcap);
+		s->cap = newcap;
+	}
+}
+
+static void strbuf_resize(struct strbuf *s, int l)
+{
+	int zl = l + 1; /* one uint8_t for 0 termination. */
+	assert(s->canary == STRBUF_CANARY);
+	if (s->cap < zl) {
+		int c = s->cap * 2;
+		if (c < zl) {
+			c = zl;
+		}
+		s->cap = c;
+		s->buf = reftable_realloc(s->buf, s->cap);
+	}
+	s->len = l;
+	s->buf[l] = 0;
+}
+
+void strbuf_setlen(struct strbuf *s, size_t l)
+{
+	assert(s->cap >= l + 1);
+	s->len = l;
+	s->buf[l] = 0;
+}
+
+void strbuf_reset(struct strbuf *s)
+{
+	strbuf_resize(s, 0);
+}
+
+void strbuf_addstr(struct strbuf *d, const char *s)
+{
+	int l1 = d->len;
+	int l2 = strlen(s);
+	assert(d->canary == STRBUF_CANARY);
+
+	strbuf_resize(d, l2 + l1);
+	memcpy(d->buf + l1, s, l2);
+}
+
+void strbuf_addbuf(struct strbuf *s, struct strbuf *a)
+{
+	int end = s->len;
+	assert(s->canary == STRBUF_CANARY);
+	strbuf_resize(s, s->len + a->len);
+	memcpy(s->buf + end, a->buf, a->len);
+}
+
+char *strbuf_detach(struct strbuf *s, size_t *sz)
+{
+	char *p = NULL;
+	p = (char *)s->buf;
+	if (sz)
+		*sz = s->len;
+	s->buf = NULL;
+	s->cap = 0;
+	s->len = 0;
+	return p;
+}
+
+void strbuf_release(struct strbuf *s)
+{
+	assert(s->canary == STRBUF_CANARY);
+	s->cap = 0;
+	s->len = 0;
+	reftable_free(s->buf);
+	s->buf = NULL;
+}
+
+int strbuf_cmp(const struct strbuf *a, const struct strbuf *b)
+{
+	int min = a->len < b->len ? a->len : b->len;
+	int res = memcmp(a->buf, b->buf, min);
+	assert(a->canary == STRBUF_CANARY);
+	assert(b->canary == STRBUF_CANARY);
+	if (res != 0)
+		return res;
+	if (a->len < b->len)
+		return -1;
+	else if (a->len > b->len)
+		return 1;
+	else
+		return 0;
+}
+
+int strbuf_add(struct strbuf *b, const void *data, size_t sz)
+{
+	assert(b->canary == STRBUF_CANARY);
+	strbuf_grow(b, sz);
+	memcpy(b->buf + b->len, data, sz);
+	b->len += sz;
+	b->buf[b->len] = 0;
+	return sz;
+}
+
+#endif
+
+static uint64_t strbuf_size(void *b)
+{
+	return ((struct strbuf *)b)->len;
+}
+
+int strbuf_add_void(void *b, const void *data, size_t sz)
+{
+	strbuf_add((struct strbuf *)b, data, sz);
+	return sz;
+}
+
+static void strbuf_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+static void strbuf_close(void *b)
+{
+}
+
+static int strbuf_read_block(void *v, struct reftable_block *dest, uint64_t off,
+			     uint32_t size)
+{
+	struct strbuf *b = (struct strbuf *)v;
+	assert(off + size <= b->len);
+	dest->data = reftable_calloc(size);
+	memcpy(dest->data, b->buf + off, size);
+	dest->len = size;
+	return size;
+}
+
+struct reftable_block_source_vtable strbuf_vtable = {
+	.size = &strbuf_size,
+	.read_block = &strbuf_read_block,
+	.return_block = &strbuf_return_block,
+	.close = &strbuf_close,
+};
+
+void block_source_from_strbuf(struct reftable_block_source *bs,
+			      struct strbuf *buf)
+{
+	assert(bs->ops == NULL);
+	bs->ops = &strbuf_vtable;
+	bs->arg = buf;
+}
+
+static void malloc_return_block(void *b, struct reftable_block *dest)
+{
+	memset(dest->data, 0xff, dest->len);
+	reftable_free(dest->data);
+}
+
+struct reftable_block_source_vtable malloc_vtable = {
+	.return_block = &malloc_return_block,
+};
+
+struct reftable_block_source malloc_block_source_instance = {
+	.ops = &malloc_vtable,
+};
+
+struct reftable_block_source malloc_block_source(void)
+{
+	return malloc_block_source_instance;
+}
+
+int common_prefix_size(struct strbuf *a, struct strbuf *b)
+{
+	int p = 0;
+	while (p < a->len && p < b->len) {
+		if (a->buf[p] != b->buf[p]) {
+			break;
+		}
+		p++;
+	}
+
+	return p;
+}
+
+struct strbuf reftable_empty_strbuf = STRBUF_INIT;
diff --git a/reftable/strbuf.h b/reftable/strbuf.h
new file mode 100644
index 0000000000..5d5aadf26a
--- /dev/null
+++ b/reftable/strbuf.h
@@ -0,0 +1,88 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SLICE_H
+#define SLICE_H
+
+#ifdef REFTABLE_STANDALONE
+
+#include "basics.h"
+#include "reftable.h"
+
+/*
+  Provides a bounds-checked, growable byte ranges. To use, initialize as "strbuf
+  x = STRBUF_INIT;"
+ */
+struct strbuf {
+	size_t len;
+	size_t cap;
+	char *buf;
+
+	/* Used to enforce initialization with STRBUF_INIT */
+	uint8_t canary;
+};
+#define STRBUF_CANARY 0x42
+#define STRBUF_INIT                       \
+	{                                 \
+		0, 0, NULL, STRBUF_CANARY \
+	}
+
+void strbuf_addstr(struct strbuf *dest, const char *src);
+
+/* Deallocate and clear strbuf */
+void strbuf_release(struct strbuf *strbuf);
+
+/* Set strbuf to 0 length, but retain buffer. */
+void strbuf_reset(struct strbuf *strbuf);
+
+/* Initializes a strbuf. Accepts a strbuf with random garbage. */
+void strbuf_init(struct strbuf *strbuf, size_t alloc);
+
+/* Return `buf`, clearing out `s`. Optionally return len (not cap) in `sz`.  */
+char *strbuf_detach(struct strbuf *s, size_t *sz);
+
+/* Set length of the slace to `l`, but don't reallocated. */
+void strbuf_setlen(struct strbuf *s, size_t l);
+
+/* Ensure `l` bytes beyond current length are available */
+void strbuf_grow(struct strbuf *s, size_t l);
+
+/* Signed comparison */
+int strbuf_cmp(const struct strbuf *a, const struct strbuf *b);
+
+/* Append `data` to the `dest` strbuf.  */
+int strbuf_add(struct strbuf *dest, const void *data, size_t sz);
+
+/* Append `add` to `dest. */
+void strbuf_addbuf(struct strbuf *dest, struct strbuf *add);
+
+#else
+
+#include "../git-compat-util.h"
+#include "../strbuf.h"
+
+#endif
+
+extern struct strbuf reftable_empty_strbuf;
+
+/* Like strbuf_add, but suitable for passing to reftable_new_writer
+ */
+int strbuf_add_void(void *b, const void *data, size_t sz);
+
+/* Find the longest shared prefix size of `a` and `b` */
+int common_prefix_size(struct strbuf *a, struct strbuf *b);
+
+struct reftable_block_source;
+
+/* Create an in-memory block source for reading reftables */
+void block_source_from_strbuf(struct reftable_block_source *bs,
+			      struct strbuf *buf);
+
+struct reftable_block_source malloc_block_source(void);
+
+#endif
diff --git a/reftable/strbuf_test.c b/reftable/strbuf_test.c
new file mode 100644
index 0000000000..862ed86791
--- /dev/null
+++ b/reftable/strbuf_test.c
@@ -0,0 +1,39 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "strbuf.h"
+
+#include "system.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static void test_strbuf(void)
+{
+	struct strbuf s = STRBUF_INIT;
+	struct strbuf t = STRBUF_INIT;
+
+	strbuf_addstr(&s, "abc");
+	assert(0 == strcmp("abc", s.buf));
+
+	strbuf_addstr(&t, "pqr");
+	strbuf_addbuf(&s, &t);
+	assert(0 == strcmp("abcpqr", s.buf));
+
+	strbuf_release(&s);
+	strbuf_release(&t);
+}
+
+int strbuf_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_strbuf", &test_strbuf);
+	return test_main(argc, argv);
+}
diff --git a/reftable/system.h b/reftable/system.h
new file mode 100644
index 0000000000..6629b1104c
--- /dev/null
+++ b/reftable/system.h
@@ -0,0 +1,53 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef SYSTEM_H
+#define SYSTEM_H
+
+#ifndef REFTABLE_STANDALONE
+
+#include "git-compat-util.h"
+#include "cache.h"
+#include <zlib.h>
+
+#else
+
+#include <assert.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <zlib.h>
+
+#include "compat.h"
+
+#endif /* REFTABLE_STANDALONE */
+
+void reftable_clear_dir(const char *dirname);
+
+#define SHA1_ID 0x73686131
+#define SHA256_ID 0x73323536
+#define SHA1_SIZE 20
+#define SHA256_SIZE 32
+
+typedef int bool;
+
+/* This is uncompress2, which is only available in zlib as of 2017.
+ */
+int uncompress_return_consumed(Bytef *dest, uLongf *destLen,
+			       const Bytef *source, uLong *sourceLen);
+int hash_size(uint32_t id);
+
+#endif
diff --git a/reftable/test_framework.c b/reftable/test_framework.c
new file mode 100644
index 0000000000..f97c2d6df8
--- /dev/null
+++ b/reftable/test_framework.c
@@ -0,0 +1,69 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "test_framework.h"
+
+#include "system.h"
+#include "basics.h"
+#include "constants.h"
+
+struct test_case **test_cases;
+int test_case_len;
+int test_case_cap;
+
+struct test_case *new_test_case(const char *name, void (*testfunc)(void))
+{
+	struct test_case *tc = reftable_malloc(sizeof(struct test_case));
+	tc->name = name;
+	tc->testfunc = testfunc;
+	return tc;
+}
+
+struct test_case *add_test_case(const char *name, void (*testfunc)(void))
+{
+	struct test_case *tc = new_test_case(name, testfunc);
+	if (test_case_len == test_case_cap) {
+		test_case_cap = 2 * test_case_cap + 1;
+		test_cases = reftable_realloc(
+			test_cases, sizeof(struct test_case) * test_case_cap);
+	}
+
+	test_cases[test_case_len++] = tc;
+	return tc;
+}
+
+int test_main(int argc, const char *argv[])
+{
+	const char *filter = NULL;
+	int i = 0;
+	if (argc > 1) {
+		filter = argv[1];
+	}
+
+	for (i = 0; i < test_case_len; i++) {
+		const char *name = test_cases[i]->name;
+		if (filter == NULL || strstr(name, filter) != NULL) {
+			printf("case %s\n", name);
+			test_cases[i]->testfunc();
+		} else {
+			printf("skip %s\n", name);
+		}
+
+		reftable_free(test_cases[i]);
+	}
+	reftable_free(test_cases);
+	test_cases = 0;
+	test_case_len = 0;
+	test_case_cap = 0;
+	return 0;
+}
+
+void set_test_hash(uint8_t *p, int i)
+{
+	memset(p, (uint8_t)i, hash_size(SHA1_ID));
+}
diff --git a/reftable/test_framework.h b/reftable/test_framework.h
new file mode 100644
index 0000000000..6a0009c1e5
--- /dev/null
+++ b/reftable/test_framework.h
@@ -0,0 +1,64 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TEST_FRAMEWORK_H
+#define TEST_FRAMEWORK_H
+
+#include "system.h"
+
+#include "reftable.h"
+
+#ifdef NDEBUG
+#undef NDEBUG
+#endif
+
+#include "system.h"
+
+#ifdef assert
+#undef assert
+#endif
+
+#define assert_err(c)                                                  \
+	if (c != 0) {                                                  \
+		fflush(stderr);                                        \
+		fflush(stdout);                                        \
+		fprintf(stderr, "%s: %d: error == %d (%s), want 0\n",  \
+			__FILE__, __LINE__, c, reftable_error_str(c)); \
+		abort();                                               \
+	}
+
+#define assert_streq(a, b)                                               \
+	if (strcmp(a, b)) {                                              \
+		fflush(stderr);                                          \
+		fflush(stdout);                                          \
+		fprintf(stderr, "%s:%d: %s (%s) != %s (%s)\n", __FILE__, \
+			__LINE__, #a, a, #b, b);                         \
+		abort();                                                 \
+	}
+
+#define assert(c)                                                          \
+	if (!(c)) {                                                        \
+		fflush(stderr);                                            \
+		fflush(stdout);                                            \
+		fprintf(stderr, "%s: %d: failed assertion %s\n", __FILE__, \
+			__LINE__, #c);                                     \
+		abort();                                                   \
+	}
+
+struct test_case {
+	const char *name;
+	void (*testfunc)(void);
+};
+
+struct test_case *new_test_case(const char *name, void (*testfunc)(void));
+struct test_case *add_test_case(const char *name, void (*testfunc)(void));
+int test_main(int argc, const char *argv[]);
+
+void set_test_hash(uint8_t *p, int i);
+
+#endif
diff --git a/reftable/tree.c b/reftable/tree.c
new file mode 100644
index 0000000000..0061d14e30
--- /dev/null
+++ b/reftable/tree.c
@@ -0,0 +1,63 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "system.h"
+
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert)
+{
+	int res;
+	if (*rootp == NULL) {
+		if (!insert) {
+			return NULL;
+		} else {
+			struct tree_node *n =
+				reftable_calloc(sizeof(struct tree_node));
+			n->key = key;
+			*rootp = n;
+			return *rootp;
+		}
+	}
+
+	res = compare(key, (*rootp)->key);
+	if (res < 0)
+		return tree_search(key, &(*rootp)->left, compare, insert);
+	else if (res > 0)
+		return tree_search(key, &(*rootp)->right, compare, insert);
+	return *rootp;
+}
+
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg)
+{
+	if (t->left != NULL) {
+		infix_walk(t->left, action, arg);
+	}
+	action(arg, t->key);
+	if (t->right != NULL) {
+		infix_walk(t->right, action, arg);
+	}
+}
+
+void tree_free(struct tree_node *t)
+{
+	if (t == NULL) {
+		return;
+	}
+	if (t->left != NULL) {
+		tree_free(t->left);
+	}
+	if (t->right != NULL) {
+		tree_free(t->right);
+	}
+	reftable_free(t);
+}
diff --git a/reftable/tree.h b/reftable/tree.h
new file mode 100644
index 0000000000..954512e9a3
--- /dev/null
+++ b/reftable/tree.h
@@ -0,0 +1,34 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef TREE_H
+#define TREE_H
+
+/* tree_node is a generic binary search tree. */
+struct tree_node {
+	void *key;
+	struct tree_node *left, *right;
+};
+
+/* looks for `key` in `rootp` using `compare` as comparison function. If insert
+   is set, insert the key if it's not found. Else, return NULL.
+*/
+struct tree_node *tree_search(void *key, struct tree_node **rootp,
+			      int (*compare)(const void *, const void *),
+			      int insert);
+
+/* performs an infix walk of the tree. */
+void infix_walk(struct tree_node *t, void (*action)(void *arg, void *key),
+		void *arg);
+
+/*
+  deallocates the tree nodes recursively. Keys should be deallocated separately
+  by walking over the tree. */
+void tree_free(struct tree_node *t);
+
+#endif
diff --git a/reftable/tree_test.c b/reftable/tree_test.c
new file mode 100644
index 0000000000..c6d448cbe8
--- /dev/null
+++ b/reftable/tree_test.c
@@ -0,0 +1,62 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "tree.h"
+
+#include "basics.h"
+#include "record.h"
+#include "reftable.h"
+#include "test_framework.h"
+#include "reftable-tests.h"
+
+static int test_compare(const void *a, const void *b)
+{
+	return (char *)a - (char *)b;
+}
+
+struct curry {
+	void *last;
+};
+
+static void check_increasing(void *arg, void *key)
+{
+	struct curry *c = (struct curry *)arg;
+	if (c->last != NULL) {
+		assert(test_compare(c->last, key) < 0);
+	}
+	c->last = key;
+}
+
+static void test_tree(void)
+{
+	struct tree_node *root = NULL;
+
+	void *values[11] = { 0 };
+	struct tree_node *nodes[11] = { 0 };
+	int i = 1;
+	struct curry c = { 0 };
+	do {
+		nodes[i] = tree_search(values + i, &root, &test_compare, 1);
+		i = (i * 7) % 11;
+	} while (i != 1);
+
+	for (i = 1; i < ARRAY_SIZE(nodes); i++) {
+		assert(values + i == nodes[i]->key);
+		assert(nodes[i] ==
+		       tree_search(values + i, &root, &test_compare, 0));
+	}
+
+	infix_walk(root, check_increasing, &c);
+	tree_free(root);
+}
+
+int tree_test_main(int argc, const char *argv[])
+{
+	add_test_case("test_tree", &test_tree);
+	return test_main(argc, argv);
+}
diff --git a/reftable/update.sh b/reftable/update.sh
new file mode 100755
index 0000000000..4dadd875f4
--- /dev/null
+++ b/reftable/update.sh
@@ -0,0 +1,22 @@
+#!/bin/sh
+
+set -eu
+
+# Override this to import from somewhere else, say "../reftable".
+SRC=${SRC:-origin}
+BRANCH=${BRANCH:-master}
+
+((git --git-dir reftable-repo/.git fetch -f ${SRC} ${BRANCH}:import && cd reftable-repo && git checkout -f $(git rev-parse import) ) ||
+   git clone https://github.com/google/reftable reftable-repo)
+
+cp reftable-repo/c/*.[ch] reftable/
+cp reftable-repo/c/include/*.[ch] reftable/
+cp reftable-repo/LICENSE reftable/
+
+git --git-dir reftable-repo/.git show --no-patch --format=oneline HEAD \
+  > reftable/VERSION
+
+mv reftable/system.h reftable/system.h~
+sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
+
+git add reftable/*.[ch] reftable/LICENSE reftable/VERSION
diff --git a/reftable/writer.c b/reftable/writer.c
new file mode 100644
index 0000000000..96d71463e5
--- /dev/null
+++ b/reftable/writer.c
@@ -0,0 +1,663 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#include "writer.h"
+
+#include "system.h"
+
+#include "block.h"
+#include "constants.h"
+#include "record.h"
+#include "reftable.h"
+#include "tree.h"
+
+static struct reftable_block_stats *
+writer_reftable_block_stats(struct reftable_writer *w, uint8_t typ)
+{
+	switch (typ) {
+	case 'r':
+		return &w->stats.ref_stats;
+	case 'o':
+		return &w->stats.obj_stats;
+	case 'i':
+		return &w->stats.idx_stats;
+	case 'g':
+		return &w->stats.log_stats;
+	}
+	assert(false);
+	return NULL;
+}
+
+/* write data, queuing the padding for the next write. Returns negative for
+ * error. */
+static int padded_write(struct reftable_writer *w, uint8_t *data, size_t len,
+			int padding)
+{
+	int n = 0;
+	if (w->pending_padding > 0) {
+		uint8_t *zeroed = reftable_calloc(w->pending_padding);
+		int n = w->write(w->write_arg, zeroed, w->pending_padding);
+		if (n < 0)
+			return n;
+
+		w->pending_padding = 0;
+		reftable_free(zeroed);
+	}
+
+	w->pending_padding = padding;
+	n = w->write(w->write_arg, data, len);
+	if (n < 0)
+		return n;
+	n += padding;
+	return 0;
+}
+
+static void options_set_defaults(struct reftable_write_options *opts)
+{
+	if (opts->restart_interval == 0) {
+		opts->restart_interval = 16;
+	}
+
+	if (opts->hash_id == 0) {
+		opts->hash_id = SHA1_ID;
+	}
+	if (opts->block_size == 0) {
+		opts->block_size = DEFAULT_BLOCK_SIZE;
+	}
+}
+
+static int writer_version(struct reftable_writer *w)
+{
+	return (w->opts.hash_id == 0 || w->opts.hash_id == SHA1_ID) ? 1 : 2;
+}
+
+static int writer_write_header(struct reftable_writer *w, uint8_t *dest)
+{
+	memcpy((char *)dest, "REFT", 4);
+
+	dest[4] = writer_version(w);
+
+	put_be24(dest + 5, w->opts.block_size);
+	put_be64(dest + 8, w->min_update_index);
+	put_be64(dest + 16, w->max_update_index);
+	if (writer_version(w) == 2) {
+		put_be32(dest + 24, w->opts.hash_id);
+	}
+	return header_size(writer_version(w));
+}
+
+static void writer_reinit_block_writer(struct reftable_writer *w, uint8_t typ)
+{
+	int block_start = 0;
+	if (w->next == 0) {
+		block_start = header_size(writer_version(w));
+	}
+
+	strbuf_release(&w->last_key);
+	block_writer_init(&w->block_writer_data, typ, w->block,
+			  w->opts.block_size, block_start,
+			  hash_size(w->opts.hash_id));
+	w->block_writer = &w->block_writer_data;
+	w->block_writer->restart_interval = w->opts.restart_interval;
+}
+
+struct reftable_writer *
+reftable_new_writer(int (*writer_func)(void *, const void *, size_t),
+		    void *writer_arg, struct reftable_write_options *opts)
+{
+	struct reftable_writer *wp =
+		reftable_calloc(sizeof(struct reftable_writer));
+	strbuf_init(&wp->block_writer_data.last_key, 0);
+	options_set_defaults(opts);
+	if (opts->block_size >= (1 << 24)) {
+		/* TODO - error return? */
+		abort();
+	}
+	wp->last_key = reftable_empty_strbuf;
+	wp->block = reftable_calloc(opts->block_size);
+	wp->write = writer_func;
+	wp->write_arg = writer_arg;
+	wp->opts = *opts;
+	writer_reinit_block_writer(wp, BLOCK_TYPE_REF);
+
+	return wp;
+}
+
+void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+				uint64_t max)
+{
+	w->min_update_index = min;
+	w->max_update_index = max;
+}
+
+void reftable_writer_free(struct reftable_writer *w)
+{
+	reftable_free(w->block);
+	reftable_free(w);
+}
+
+struct obj_index_tree_node {
+	struct strbuf hash;
+	uint64_t *offsets;
+	size_t offset_len;
+	size_t offset_cap;
+};
+
+#define OBJ_INDEX_TREE_NODE_INIT    \
+	{                           \
+		.hash = STRBUF_INIT \
+	}
+
+static int obj_index_tree_node_compare(const void *a, const void *b)
+{
+	return strbuf_cmp(&((const struct obj_index_tree_node *)a)->hash,
+			  &((const struct obj_index_tree_node *)b)->hash);
+}
+
+static void writer_index_hash(struct reftable_writer *w, struct strbuf *hash)
+{
+	uint64_t off = w->next;
+
+	struct obj_index_tree_node want = { .hash = *hash };
+
+	struct tree_node *node = tree_search(&want, &w->obj_index_tree,
+					     &obj_index_tree_node_compare, 0);
+	struct obj_index_tree_node *key = NULL;
+	if (node == NULL) {
+		struct obj_index_tree_node empty = OBJ_INDEX_TREE_NODE_INIT;
+		key = reftable_malloc(sizeof(struct obj_index_tree_node));
+		*key = empty;
+
+		strbuf_reset(&key->hash);
+		strbuf_addbuf(&key->hash, hash);
+		tree_search((void *)key, &w->obj_index_tree,
+			    &obj_index_tree_node_compare, 1);
+	} else {
+		key = node->key;
+	}
+
+	if (key->offset_len > 0 && key->offsets[key->offset_len - 1] == off) {
+		return;
+	}
+
+	if (key->offset_len == key->offset_cap) {
+		key->offset_cap = 2 * key->offset_cap + 1;
+		key->offsets = reftable_realloc(
+			key->offsets, sizeof(uint64_t) * key->offset_cap);
+	}
+
+	key->offsets[key->offset_len++] = off;
+}
+
+static int writer_add_record(struct reftable_writer *w,
+			     struct reftable_record *rec)
+{
+	int result = -1;
+	struct strbuf key = STRBUF_INIT;
+	int err = 0;
+	reftable_record_key(rec, &key);
+	if (strbuf_cmp(&w->last_key, &key) >= 0)
+		goto done;
+
+	strbuf_reset(&w->last_key);
+	strbuf_addbuf(&w->last_key, &key);
+	if (w->block_writer == NULL) {
+		writer_reinit_block_writer(w, reftable_record_type(rec));
+	}
+
+	assert(block_writer_type(w->block_writer) == reftable_record_type(rec));
+
+	if (block_writer_add(w->block_writer, rec) == 0) {
+		result = 0;
+		goto done;
+	}
+
+	err = writer_flush_block(w);
+	if (err < 0) {
+		result = err;
+		goto done;
+	}
+
+	writer_reinit_block_writer(w, reftable_record_type(rec));
+	err = block_writer_add(w->block_writer, rec);
+	if (err < 0) {
+		result = err;
+		goto done;
+	}
+
+	result = 0;
+done:
+	strbuf_release(&key);
+	return result;
+}
+
+int reftable_writer_add_ref(struct reftable_writer *w,
+			    struct reftable_ref_record *ref)
+{
+	struct reftable_record rec = { 0 };
+	struct reftable_ref_record copy = *ref;
+	int err = 0;
+
+	if (ref->refname == NULL)
+		return REFTABLE_API_ERROR;
+	if (ref->update_index < w->min_update_index ||
+	    ref->update_index > w->max_update_index)
+		return REFTABLE_API_ERROR;
+
+	reftable_record_from_ref(&rec, &copy);
+	copy.update_index -= w->min_update_index;
+	err = writer_add_record(w, &rec);
+	if (err < 0)
+		return err;
+
+	if (!w->opts.skip_index_objects && ref->value != NULL) {
+		struct strbuf h = STRBUF_INIT;
+		strbuf_add(&h, (char *)ref->value, hash_size(w->opts.hash_id));
+		writer_index_hash(w, &h);
+		strbuf_release(&h);
+	}
+
+	if (!w->opts.skip_index_objects && ref->target_value != NULL) {
+		struct strbuf h = STRBUF_INIT;
+		strbuf_add(&h, ref->target_value, hash_size(w->opts.hash_id));
+		writer_index_hash(w, &h);
+	}
+	return 0;
+}
+
+int reftable_writer_add_refs(struct reftable_writer *w,
+			     struct reftable_ref_record *refs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(refs, n, reftable_ref_record_compare_name);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_ref(w, &refs[i]);
+	}
+	return err;
+}
+
+int reftable_writer_add_log(struct reftable_writer *w,
+			    struct reftable_log_record *log)
+{
+	struct reftable_record rec = { 0 };
+	char *input_log_message = log->message;
+	struct strbuf cleaned_message = STRBUF_INIT;
+	int err;
+	if (log->refname == NULL)
+		return REFTABLE_API_ERROR;
+
+	if (w->block_writer != NULL &&
+	    block_writer_type(w->block_writer) == BLOCK_TYPE_REF) {
+		int err = writer_finish_public_section(w);
+		if (err < 0)
+			return err;
+	}
+
+	if (!w->opts.exact_log_message && log->message != NULL) {
+		strbuf_addstr(&cleaned_message, log->message);
+		while (cleaned_message.len &&
+		       cleaned_message.buf[cleaned_message.len - 1] == '\n')
+			strbuf_setlen(&cleaned_message,
+				      cleaned_message.len - 1);
+		if (strchr(cleaned_message.buf, '\n')) {
+			// multiple lines not allowed.
+			err = REFTABLE_API_ERROR;
+			goto done;
+		}
+		strbuf_addstr(&cleaned_message, "\n");
+		log->message = cleaned_message.buf;
+	}
+
+	w->next -= w->pending_padding;
+	w->pending_padding = 0;
+
+	reftable_record_from_log(&rec, log);
+	err = writer_add_record(w, &rec);
+
+done:
+	log->message = input_log_message;
+	strbuf_release(&cleaned_message);
+	return err;
+}
+
+int reftable_writer_add_logs(struct reftable_writer *w,
+			     struct reftable_log_record *logs, int n)
+{
+	int err = 0;
+	int i = 0;
+	QSORT(logs, n, reftable_log_record_compare_key);
+	for (i = 0; err == 0 && i < n; i++) {
+		err = reftable_writer_add_log(w, &logs[i]);
+	}
+	return err;
+}
+
+static int writer_finish_section(struct reftable_writer *w)
+{
+	uint8_t typ = block_writer_type(w->block_writer);
+	uint64_t index_start = 0;
+	int max_level = 0;
+	int threshold = w->opts.unpadded ? 1 : 3;
+	int before_blocks = w->stats.idx_stats.blocks;
+	int err = writer_flush_block(w);
+	int i = 0;
+	struct reftable_block_stats *bstats = NULL;
+	if (err < 0)
+		return err;
+
+	while (w->index_len > threshold) {
+		struct reftable_index_record *idx = NULL;
+		int idx_len = 0;
+
+		max_level++;
+		index_start = w->next;
+		writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+		idx = w->index;
+		idx_len = w->index_len;
+
+		w->index = NULL;
+		w->index_len = 0;
+		w->index_cap = 0;
+		for (i = 0; i < idx_len; i++) {
+			struct reftable_record rec = { 0 };
+			reftable_record_from_index(&rec, idx + i);
+			if (block_writer_add(w->block_writer, &rec) == 0) {
+				continue;
+			}
+
+			err = writer_flush_block(w);
+			if (err < 0)
+				return err;
+
+			writer_reinit_block_writer(w, BLOCK_TYPE_INDEX);
+
+			err = block_writer_add(w->block_writer, &rec);
+			if (err != 0) {
+				/* write into fresh block should always succeed
+				 */
+				abort();
+			}
+		}
+		for (i = 0; i < idx_len; i++) {
+			strbuf_release(&idx[i].last_key);
+		}
+		reftable_free(idx);
+	}
+
+	writer_clear_index(w);
+
+	err = writer_flush_block(w);
+	if (err < 0)
+		return err;
+
+	bstats = writer_reftable_block_stats(w, typ);
+	bstats->index_blocks = w->stats.idx_stats.blocks - before_blocks;
+	bstats->index_offset = index_start;
+	bstats->max_index_level = max_level;
+
+	/* Reinit lastKey, as the next section can start with any key. */
+	w->last_key.len = 0;
+
+	return 0;
+}
+
+struct common_prefix_arg {
+	struct strbuf *last;
+	int max;
+};
+
+static void update_common(void *void_arg, void *key)
+{
+	struct common_prefix_arg *arg = (struct common_prefix_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	if (arg->last != NULL) {
+		int n = common_prefix_size(&entry->hash, arg->last);
+		if (n > arg->max) {
+			arg->max = n;
+		}
+	}
+	arg->last = &entry->hash;
+}
+
+struct write_record_arg {
+	struct reftable_writer *w;
+	int err;
+};
+
+static void write_object_record(void *void_arg, void *key)
+{
+	struct write_record_arg *arg = (struct write_record_arg *)void_arg;
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+	struct reftable_obj_record obj_rec = {
+		.hash_prefix = (uint8_t *)entry->hash.buf,
+		.hash_prefix_len = arg->w->stats.object_id_len,
+		.offsets = entry->offsets,
+		.offset_len = entry->offset_len,
+	};
+	struct reftable_record rec = { 0 };
+	if (arg->err < 0)
+		goto done;
+
+	reftable_record_from_obj(&rec, &obj_rec);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0)
+		goto done;
+
+	arg->err = writer_flush_block(arg->w);
+	if (arg->err < 0)
+		goto done;
+
+	writer_reinit_block_writer(arg->w, BLOCK_TYPE_OBJ);
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+	if (arg->err == 0)
+		goto done;
+	obj_rec.offset_len = 0;
+	arg->err = block_writer_add(arg->w->block_writer, &rec);
+
+	/* Should be able to write into a fresh block. */
+	assert(arg->err == 0);
+
+done:;
+}
+
+static void object_record_free(void *void_arg, void *key)
+{
+	struct obj_index_tree_node *entry = (struct obj_index_tree_node *)key;
+
+	FREE_AND_NULL(entry->offsets);
+	strbuf_release(&entry->hash);
+	reftable_free(entry);
+}
+
+static int writer_dump_object_index(struct reftable_writer *w)
+{
+	struct write_record_arg closure = { .w = w };
+	struct common_prefix_arg common = { 0 };
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &update_common, &common);
+	}
+	w->stats.object_id_len = common.max + 1;
+
+	writer_reinit_block_writer(w, BLOCK_TYPE_OBJ);
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &write_object_record, &closure);
+	}
+
+	if (closure.err < 0)
+		return closure.err;
+	return writer_finish_section(w);
+}
+
+int writer_finish_public_section(struct reftable_writer *w)
+{
+	uint8_t typ = 0;
+	int err = 0;
+
+	if (w->block_writer == NULL)
+		return 0;
+
+	typ = block_writer_type(w->block_writer);
+	err = writer_finish_section(w);
+	if (err < 0)
+		return err;
+	if (typ == BLOCK_TYPE_REF && !w->opts.skip_index_objects &&
+	    w->stats.ref_stats.index_blocks > 0) {
+		err = writer_dump_object_index(w);
+		if (err < 0)
+			return err;
+	}
+
+	if (w->obj_index_tree != NULL) {
+		infix_walk(w->obj_index_tree, &object_record_free, NULL);
+		tree_free(w->obj_index_tree);
+		w->obj_index_tree = NULL;
+	}
+
+	w->block_writer = NULL;
+	return 0;
+}
+
+int reftable_writer_close(struct reftable_writer *w)
+{
+	uint8_t footer[72];
+	uint8_t *p = footer;
+	int err = writer_finish_public_section(w);
+	int empty_table = w->next == 0;
+	if (err != 0)
+		goto done;
+	w->pending_padding = 0;
+	if (empty_table) {
+		/* Empty tables need a header anyway. */
+		uint8_t header[28];
+		int n = writer_write_header(w, header);
+		err = padded_write(w, header, n, 0);
+		if (err < 0)
+			goto done;
+	}
+
+	p += writer_write_header(w, footer);
+	put_be64(p, w->stats.ref_stats.index_offset);
+	p += 8;
+	put_be64(p, (w->stats.obj_stats.offset) << 5 | w->stats.object_id_len);
+	p += 8;
+	put_be64(p, w->stats.obj_stats.index_offset);
+	p += 8;
+
+	put_be64(p, w->stats.log_stats.offset);
+	p += 8;
+	put_be64(p, w->stats.log_stats.index_offset);
+	p += 8;
+
+	put_be32(p, crc32(0, footer, p - footer));
+	p += 4;
+
+	err = padded_write(w, footer, footer_size(writer_version(w)), 0);
+	if (err < 0)
+		goto done;
+
+	if (empty_table) {
+		err = REFTABLE_EMPTY_TABLE_ERROR;
+		goto done;
+	}
+
+done:
+	/* free up memory. */
+	block_writer_clear(&w->block_writer_data);
+	writer_clear_index(w);
+	strbuf_release(&w->last_key);
+	return err;
+}
+
+void writer_clear_index(struct reftable_writer *w)
+{
+	int i = 0;
+	for (i = 0; i < w->index_len; i++) {
+		strbuf_release(&w->index[i].last_key);
+	}
+
+	FREE_AND_NULL(w->index);
+	w->index_len = 0;
+	w->index_cap = 0;
+}
+
+const int debug = 0;
+
+static int writer_flush_nonempty_block(struct reftable_writer *w)
+{
+	uint8_t typ = block_writer_type(w->block_writer);
+	struct reftable_block_stats *bstats =
+		writer_reftable_block_stats(w, typ);
+	uint64_t block_typ_off = (bstats->blocks == 0) ? w->next : 0;
+	int raw_bytes = block_writer_finish(w->block_writer);
+	int padding = 0;
+	int err = 0;
+	struct reftable_index_record ir = { .last_key = STRBUF_INIT };
+	if (raw_bytes < 0)
+		return raw_bytes;
+
+	if (!w->opts.unpadded && typ != BLOCK_TYPE_LOG) {
+		padding = w->opts.block_size - raw_bytes;
+	}
+
+	if (block_typ_off > 0) {
+		bstats->offset = block_typ_off;
+	}
+
+	bstats->entries += w->block_writer->entries;
+	bstats->restarts += w->block_writer->restart_len;
+	bstats->blocks++;
+	w->stats.blocks++;
+
+	if (debug) {
+		fprintf(stderr, "block %c off %" PRIu64 " sz %d (%d)\n", typ,
+			w->next, raw_bytes,
+			get_be24(w->block + w->block_writer->header_off + 1));
+	}
+
+	if (w->next == 0) {
+		writer_write_header(w, w->block);
+	}
+
+	err = padded_write(w, w->block, raw_bytes, padding);
+	if (err < 0)
+		return err;
+
+	if (w->index_cap == w->index_len) {
+		w->index_cap = 2 * w->index_cap + 1;
+		w->index = reftable_realloc(
+			w->index,
+			sizeof(struct reftable_index_record) * w->index_cap);
+	}
+
+	ir.offset = w->next;
+	strbuf_reset(&ir.last_key);
+	strbuf_addbuf(&ir.last_key, &w->block_writer->last_key);
+	w->index[w->index_len] = ir;
+
+	w->index_len++;
+	w->next += padding + raw_bytes;
+	w->block_writer = NULL;
+	return 0;
+}
+
+int writer_flush_block(struct reftable_writer *w)
+{
+	if (w->block_writer == NULL)
+		return 0;
+	if (w->block_writer->entries == 0)
+		return 0;
+	return writer_flush_nonempty_block(w);
+}
+
+const struct reftable_stats *writer_stats(struct reftable_writer *w)
+{
+	return &w->stats;
+}
diff --git a/reftable/writer.h b/reftable/writer.h
new file mode 100644
index 0000000000..c69ecffea0
--- /dev/null
+++ b/reftable/writer.h
@@ -0,0 +1,60 @@
+/*
+Copyright 2020 Google LLC
+
+Use of this source code is governed by a BSD-style
+license that can be found in the LICENSE file or at
+https://developers.google.com/open-source/licenses/bsd
+*/
+
+#ifndef WRITER_H
+#define WRITER_H
+
+#include "basics.h"
+#include "block.h"
+#include "reftable.h"
+#include "strbuf.h"
+#include "tree.h"
+
+struct reftable_writer {
+	int (*write)(void *, const void *, size_t);
+	void *write_arg;
+	int pending_padding;
+	struct strbuf last_key;
+
+	/* offset of next block to write. */
+	uint64_t next;
+	uint64_t min_update_index, max_update_index;
+	struct reftable_write_options opts;
+
+	/* memory buffer for writing */
+	uint8_t *block;
+
+	/* writer for the current section. NULL or points to
+	 * block_writer_data */
+	struct block_writer *block_writer;
+
+	struct block_writer block_writer_data;
+
+	/* pending index records for the current section */
+	struct reftable_index_record *index;
+	size_t index_len;
+	size_t index_cap;
+
+	/*
+	  tree for use with tsearch; used to populate the 'o' inverse OID
+	  map */
+	struct tree_node *obj_index_tree;
+
+	struct reftable_stats stats;
+};
+
+/* finishes a block, and writes it to storage */
+int writer_flush_block(struct reftable_writer *w);
+
+/* deallocates memory related to the index */
+void writer_clear_index(struct reftable_writer *w);
+
+/* finishes writing a 'r' (refs) or 'g' (reflogs) section */
+int writer_finish_public_section(struct reftable_writer *w);
+
+#endif
diff --git a/reftable/zlib-compat.c b/reftable/zlib-compat.c
new file mode 100644
index 0000000000..3e0b0f24f1
--- /dev/null
+++ b/reftable/zlib-compat.c
@@ -0,0 +1,92 @@
+/* taken from zlib's uncompr.c
+
+   commit cacf7f1d4e3d44d871b605da3b647f07d718623f
+   Author: Mark Adler <madler@alumni.caltech.edu>
+   Date:   Sun Jan 15 09:18:46 2017 -0800
+
+       zlib 1.2.11
+
+*/
+
+/*
+ * Copyright (C) 1995-2003, 2010, 2014, 2016 Jean-loup Gailly, Mark Adler
+ * For conditions of distribution and use, see copyright notice in zlib.h
+ */
+
+#include "system.h"
+
+/* clang-format off */
+
+/* ===========================================================================
+     Decompresses the source buffer into the destination buffer.  *sourceLen is
+   the byte length of the source buffer. Upon entry, *destLen is the total size
+   of the destination buffer, which must be large enough to hold the entire
+   uncompressed data. (The size of the uncompressed data must have been saved
+   previously by the compressor and transmitted to the decompressor by some
+   mechanism outside the scope of this compression library.) Upon exit,
+   *destLen is the size of the decompressed data and *sourceLen is the number
+   of source bytes consumed. Upon return, source + *sourceLen points to the
+   first unused input byte.
+
+     uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough
+   memory, Z_BUF_ERROR if there was not enough room in the output buffer, or
+   Z_DATA_ERROR if the input data was corrupted, including if the input data is
+   an incomplete zlib stream.
+*/
+int ZEXPORT uncompress_return_consumed (
+    Bytef *dest,
+    uLongf *destLen,
+    const Bytef *source,
+    uLong *sourceLen) {
+    z_stream stream;
+    int err;
+    const uInt max = (uInt)-1;
+    uLong len, left;
+    Byte buf[1];    /* for detection of incomplete stream when *destLen == 0 */
+
+    len = *sourceLen;
+    if (*destLen) {
+        left = *destLen;
+        *destLen = 0;
+    }
+    else {
+        left = 1;
+        dest = buf;
+    }
+
+    stream.next_in = (z_const Bytef *)source;
+    stream.avail_in = 0;
+    stream.zalloc = (alloc_func)0;
+    stream.zfree = (free_func)0;
+    stream.opaque = (voidpf)0;
+
+    err = inflateInit(&stream);
+    if (err != Z_OK) return err;
+
+    stream.next_out = dest;
+    stream.avail_out = 0;
+
+    do {
+        if (stream.avail_out == 0) {
+            stream.avail_out = left > (uLong)max ? max : (uInt)left;
+            left -= stream.avail_out;
+        }
+        if (stream.avail_in == 0) {
+            stream.avail_in = len > (uLong)max ? max : (uInt)len;
+            len -= stream.avail_in;
+        }
+        err = inflate(&stream, Z_NO_FLUSH);
+    } while (err == Z_OK);
+
+    *sourceLen -= len + stream.avail_in;
+    if (dest != buf)
+        *destLen = stream.total_out;
+    else if (stream.total_out && err == Z_BUF_ERROR)
+        left = 1;
+
+    inflateEnd(&stream);
+    return err == Z_STREAM_END ? Z_OK :
+           err == Z_NEED_DICT ? Z_DATA_ERROR  :
+           err == Z_BUF_ERROR && left + stream.avail_out ? Z_DATA_ERROR :
+           err;
+}
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 13/21] Add standalone build infrastructure for reftable
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (11 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 12/21] Add reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 14/21] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
                                                         ` (7 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The reftable library is integrates with Git, so it become a first class
supported storage mechanism, but is sufficiently complex that other
projects (e.g. libgit2) may want to consume the same source code.

To make this possible, the library also compiles completely standalone, which is
demonstrated with the Bazel BUILD/WORKSPACE files that this commit adds.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 reftable/BUILD      | 203 ++++++++++++++++++++++++++++++++++++++++++++
 reftable/README.md  |  33 +++++++
 reftable/WORKSPACE  |  14 +++
 reftable/zlib.BUILD |  36 ++++++++
 4 files changed, 286 insertions(+)
 create mode 100644 reftable/BUILD
 create mode 100644 reftable/README.md
 create mode 100644 reftable/WORKSPACE
 create mode 100644 reftable/zlib.BUILD

diff --git a/reftable/BUILD b/reftable/BUILD
new file mode 100644
index 0000000000..d3946a717c
--- /dev/null
+++ b/reftable/BUILD
@@ -0,0 +1,203 @@
+# Mirror core-git COPTS so reftable compiles without warning in CGit.
+GIT_COPTS = [
+    "-DREFTABLE_STANDALONE",
+    "-Wall",
+    "-Werror",
+    "-Wdeclaration-after-statement",
+    "-Wstrict-prototypes",
+    "-Wformat-security",
+    "-Wno-format-zero-length",
+    "-Wold-style-definition",
+    "-Woverflow",
+    "-Wpointer-arith",
+    "-Wstrict-prototypes",
+    "-Wunused",
+    "-Wvla",
+    "-Wextra",
+    "-Wmissing-prototypes",
+    "-Wno-empty-body",
+    "-Wno-missing-field-initializers",
+    "-Wno-sign-compare",
+    "-Werror=strict-aliasing",
+    "-Wno-unused-parameter",
+]
+
+cc_library(
+    name = "reftable",
+    srcs = [
+        "basics.c",
+        "block.c",
+        "compat.c",
+        "file.c",
+        "iter.c",
+        "merged.c",
+        "pq.c",
+        "reader.c",
+        "record.c",
+        "refname.c",
+        "reftable.c",
+        "strbuf.c",
+        "stack.c",
+        "tree.c",
+        "writer.c",
+        "zlib-compat.c",
+        "basics.h",
+        "block.h",
+        "compat.h",
+        "constants.h",
+        "iter.h",
+        "merged.h",
+        "pq.h",
+        "reader.h",
+        "refname.h",
+        "record.h",
+        "strbuf.h",
+        "stack.h",
+        "system.h",
+        "tree.h",
+        "writer.h",
+    ],
+    hdrs = [
+        "reftable.h",
+    ],
+    includes = [
+        "include",
+    ],
+    copts = [
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+    deps = ["@zlib"],
+    visibility = ["//visibility:public"]
+)
+
+cc_library(
+    name = "testlib",
+    srcs = [
+        "test_framework.c",
+        "dump.c",
+	"test_framework.h",
+    ],
+    hdrs = [
+            "reftable-tests.h",
+    ],
+    copts = GIT_COPTS,
+    deps = [":reftable"],
+    visibility = ["//visibility:public"]
+)
+
+cc_test(
+    name = "record_test",
+    srcs = ["record_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Drecord_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "reftable_test",
+    srcs = ["reftable_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dreftable_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "strbuf_test",
+    srcs = ["strbuf_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ] ,
+    copts = [
+        "-Dstrbuf_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "stack_test",
+    srcs = ["stack_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dstack_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "tree_test",
+    srcs = ["tree_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dtree_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "block_test",
+    srcs = ["block_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dblock_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "refname_test",
+    srcs = ["refname_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Drefname_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+cc_test(
+    name = "merged_test",
+    srcs = ["merged_test.c"],
+    deps = [
+        ":reftable",
+        ":testlib",
+    ],
+    copts = [
+        "-Dmerged_test_main=main",
+        "-fvisibility=protected",
+    ] + GIT_COPTS,
+)
+
+[sh_test(
+    name = "%s_valgrind_test" % t,
+    srcs = [ "valgrind_test.sh" ],
+    args = [ t ],
+    data = [ t ])
+ for t in ["record_test",
+           "merged_test",
+           "refname_test",
+           "tree_test",
+           "block_test",
+           "strbuf_test",
+           "stack_test"]]
diff --git a/reftable/README.md b/reftable/README.md
new file mode 100644
index 0000000000..0345015c57
--- /dev/null
+++ b/reftable/README.md
@@ -0,0 +1,33 @@
+This directory contains an implementation of the reftable library.
+
+The library is integrated into the git-core project, but can also be built
+standalone, by compiling with -DREFTABLE_STANDALONE. The standalone build is
+exercised by the accompanying BUILD/WORKSPACE files, and can be run as
+
+  bazel test :all
+
+It includes a fragment of the zlib library to provide uncompress2(), which is a
+recent addition to the API. zlib is licensed as follows:
+
+```
+ (C) 1995-2017 Jean-loup Gailly and Mark Adler
+
+  This software is provided 'as-is', without any express or implied
+  warranty.  In no event will the authors be held liable for any damages
+  arising from the use of this software.
+
+  Permission is granted to anyone to use this software for any purpose,
+  including commercial applications, and to alter it and redistribute it
+  freely, subject to the following restrictions:
+
+  1. The origin of this software must not be misrepresented; you must not
+     claim that you wrote the original software. If you use this software
+     in a product, an acknowledgment in the product documentation would be
+     appreciated but is not required.
+  2. Altered source versions must be plainly marked as such, and must not be
+     misrepresented as being the original software.
+  3. This notice may not be removed or altered from any source distribution.
+
+  Jean-loup Gailly        Mark Adler
+  jloup@gzip.org          madler@alumni.caltech.edu
+```
diff --git a/reftable/WORKSPACE b/reftable/WORKSPACE
new file mode 100644
index 0000000000..9f91d16208
--- /dev/null
+++ b/reftable/WORKSPACE
@@ -0,0 +1,14 @@
+workspace(name = "reftable")
+
+load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
+
+http_archive(
+    name = "zlib",
+    build_file = "@//:zlib.BUILD",
+    sha256 = "c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1",
+    strip_prefix = "zlib-1.2.11",
+    urls = [
+        "https://mirror.bazel.build/zlib.net/zlib-1.2.11.tar.gz",
+        "https://zlib.net/zlib-1.2.11.tar.gz",
+    ],
+)
diff --git a/reftable/zlib.BUILD b/reftable/zlib.BUILD
new file mode 100644
index 0000000000..edb77fdf8e
--- /dev/null
+++ b/reftable/zlib.BUILD
@@ -0,0 +1,36 @@
+package(default_visibility = ["//visibility:public"])
+
+licenses(["notice"])  # BSD/MIT-like license (for zlib)
+
+cc_library(
+    name = "zlib",
+    srcs = [
+        "adler32.c",
+        "compress.c",
+        "crc32.c",
+        "crc32.h",
+        "deflate.c",
+        "deflate.h",
+        "gzclose.c",
+        "gzguts.h",
+        "gzlib.c",
+        "gzread.c",
+        "gzwrite.c",
+        "infback.c",
+        "inffast.c",
+        "inffast.h",
+        "inffixed.h",
+        "inflate.c",
+        "inflate.h",
+        "inftrees.c",
+        "inftrees.h",
+        "trees.c",
+        "trees.h",
+        "uncompr.c",
+        "zconf.h",
+        "zutil.c",
+        "zutil.h",
+    ],
+    hdrs = ["zlib.h"],
+    includes = ["."],
+)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 14/21] Reftable support for git-core
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (12 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 13/21] Add standalone build infrastructure for reftable Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 15/21] Read FETCH_HEAD as loose ref Han-Wen Nienhuys via GitGitGadget
                                                         ` (6 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library.

This introduces the file refs/reftable-backend.c containing a reftable-powered
ref storage backend.

It can be activated by passing --ref-storage=reftable to "git init", or setting
GIT_TEST_REFTABLE in the environment.

Example use: see t/t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Co-authored-by: Jeff King <peff@peff.net>
---
 .../technical/repository-version.txt          |    7 +
 Makefile                                      |   28 +-
 builtin/clone.c                               |    4 +-
 builtin/init-db.c                             |   55 +-
 builtin/worktree.c                            |   27 +-
 cache.h                                       |    8 +-
 refs.c                                        |   27 +-
 refs.h                                        |    3 +
 refs/refs-internal.h                          |    1 +
 refs/reftable-backend.c                       | 1384 +++++++++++++++++
 reftable/update.sh                            |    3 -
 repository.c                                  |    2 +
 repository.h                                  |    3 +
 setup.c                                       |   10 +-
 t/t0031-reftable.sh                           |  173 +++
 15 files changed, 1698 insertions(+), 37 deletions(-)
 create mode 100644 refs/reftable-backend.c
 create mode 100755 t/t0031-reftable.sh

diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
index 7844ef30ff..7257623583 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -100,3 +100,10 @@ If set, by default "git config" reads from both "config" and
 multiple working directory mode, "config" file is shared while
 "config.worktree" is per-working directory (i.e., it's in
 GIT_COMMON_DIR/worktrees/<id>/config.worktree)
+
+==== `refStorage`
+
+Specifies the file format for the ref database. Values are `files`
+(for the traditional packed + loose ref format) and `reftable` for the
+binary reftable format. See https://github.com/google/reftable for
+more information.
diff --git a/Makefile b/Makefile
index 372139f1f2..6571ed1246 100644
--- a/Makefile
+++ b/Makefile
@@ -807,6 +807,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
+REFTABLE_LIB = reftable/libreftable.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -959,6 +960,7 @@ LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
 LIB_OBJS += refs/packed-backend.o
 LIB_OBJS += refs/ref-cache.o
@@ -1162,7 +1164,7 @@ THIRD_PARTY_SOURCES += compat/regex/%
 THIRD_PARTY_SOURCES += sha1collisiondetection/%
 THIRD_PARTY_SOURCES += sha1dc/%
 
-GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB)
+GITLIBS = common-main.o $(LIB_FILE) $(XDIFF_LIB) $(REFTABLE_LIB)
 EXTLIBS =
 
 GIT_USER_AGENT = git/$(GIT_VERSION)
@@ -2352,11 +2354,30 @@ VCSSVN_OBJS += vcs-svn/sliding_window.o
 VCSSVN_OBJS += vcs-svn/svndiff.o
 VCSSVN_OBJS += vcs-svn/svndump.o
 
+REFTABLE_OBJS += reftable/basics.o
+REFTABLE_OBJS += reftable/block.o
+REFTABLE_OBJS += reftable/compat.o
+REFTABLE_OBJS += reftable/file.o
+REFTABLE_OBJS += reftable/iter.o
+REFTABLE_OBJS += reftable/merged.o
+REFTABLE_OBJS += reftable/pq.o
+REFTABLE_OBJS += reftable/reader.o
+REFTABLE_OBJS += reftable/record.o
+REFTABLE_OBJS += reftable/refname.o
+REFTABLE_OBJS += reftable/reftable.o
+REFTABLE_OBJS += reftable/strbuf.o
+REFTABLE_OBJS += reftable/stack.o
+REFTABLE_OBJS += reftable/tree.o
+REFTABLE_OBJS += reftable/writer.o
+REFTABLE_OBJS += reftable/zlib-compat.o
+
+
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(XDIFF_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
+	$(REFTABLE_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2497,6 +2518,9 @@ $(XDIFF_LIB): $(XDIFF_OBJS)
 $(VCSSVN_LIB): $(VCSSVN_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_LIB): $(REFTABLE_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -3112,7 +3136,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/builtin/clone.c b/builtin/clone.c
index a9f3312a54..4f78ec4a2b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1120,7 +1120,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	init_db(git_dir, real_git_dir, option_template, GIT_HASH_UNKNOWN, NULL,
-		INIT_DB_QUIET);
+		default_ref_storage(), INIT_DB_QUIET);
 
 	if (real_git_dir)
 		git_dir = real_git_dir;
@@ -1235,7 +1235,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		 * Now that we know what algorithm the remote side is using,
 		 * let's set ours to the same thing.
 		 */
-		initialize_repository_version(hash_algo);
+		initialize_repository_version(hash_algo, default_ref_storage());
 		repo_set_hash_algo(the_repository, hash_algo);
 
 		mapped_refs = wanted_peer_refs(refs, &remote->fetch);
diff --git a/builtin/init-db.c b/builtin/init-db.c
index cee64823cb..f9589717c3 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -178,7 +178,8 @@ static int needs_work_tree_config(const char *git_dir, const char *work_tree)
 	return 1;
 }
 
-void initialize_repository_version(int hash_algo)
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format)
 {
 	char repo_version_string[10];
 	int repo_version = GIT_REPO_VERSION;
@@ -188,7 +189,8 @@ void initialize_repository_version(int hash_algo)
 		die(_("The hash algorithm %s is not supported in this build."), hash_algos[hash_algo].name);
 #endif
 
-	if (hash_algo != GIT_HASH_SHA1)
+	if (hash_algo != GIT_HASH_SHA1 ||
+	    !strcmp(ref_storage_format, "reftable"))
 		repo_version = GIT_REPO_VERSION_READ;
 
 	/* This forces creation of new config file */
@@ -239,6 +241,7 @@ static int create_default_files(const char *template_path,
 	is_bare_repository_cfg = init_is_bare_repository;
 	if (init_shared_repository != -1)
 		set_shared_repository(init_shared_repository);
+	the_repository->ref_storage_format = xstrdup(fmt->ref_storage);
 
 	/*
 	 * We would have created the above under user's umask -- under
@@ -248,6 +251,24 @@ static int create_default_files(const char *template_path,
 		adjust_shared_perm(get_git_dir());
 	}
 
+	/*
+	 * Check to see if .git/HEAD exists; this must happen before
+	 * initializing the ref db, because we want to see if there is an
+	 * existing HEAD.
+	 */
+	path = git_path_buf(&buf, "HEAD");
+	reinit = (!access(path, R_OK) ||
+		  readlink(path, junk, sizeof(junk) - 1) != -1);
+
+	/*
+	 * refs/heads is a file when using reftable. We can't reinitialize with
+	 * a reftable because it will overwrite HEAD
+	 */
+	if (reinit && (!strcmp(fmt->ref_storage, "reftable")) ==
+			      is_directory(git_path_buf(&buf, "refs/heads"))) {
+		die("cannot switch ref storage format.");
+	}
+
 	/*
 	 * We need to create a "refs" dir in any case so that older
 	 * versions of git can tell that this is a repository.
@@ -262,9 +283,6 @@ static int create_default_files(const char *template_path,
 	 * Point the HEAD symref to the initial branch with if HEAD does
 	 * not yet exist.
 	 */
-	path = git_path_buf(&buf, "HEAD");
-	reinit = (!access(path, R_OK)
-		  || readlink(path, junk, sizeof(junk)-1) != -1);
 	if (!reinit) {
 		char *ref;
 
@@ -281,7 +299,7 @@ static int create_default_files(const char *template_path,
 		free(ref);
 	}
 
-	initialize_repository_version(fmt->hash_algo);
+	initialize_repository_version(fmt->hash_algo, fmt->ref_storage);
 
 	/* Check filemode trustability */
 	path = git_path_buf(&buf, "config");
@@ -396,7 +414,7 @@ static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash
 
 int init_db(const char *git_dir, const char *real_git_dir,
 	    const char *template_dir, int hash, const char *initial_branch,
-	    unsigned int flags)
+	    const char *ref_storage_format, unsigned int flags)
 {
 	int reinit;
 	int exist_ok = flags & INIT_DB_EXIST_OK;
@@ -435,6 +453,7 @@ int init_db(const char *git_dir, const char *real_git_dir,
 	 * is an attempt to reinitialize new repository with an old tool.
 	 */
 	check_repository_format(&repo_fmt);
+	repo_fmt.ref_storage = xstrdup(ref_storage_format);
 
 	validate_hash_algorithm(&repo_fmt, hash);
 
@@ -467,6 +486,9 @@ int init_db(const char *git_dir, const char *real_git_dir,
 		git_config_set("receive.denyNonFastforwards", "true");
 	}
 
+	if (!strcmp(ref_storage_format, "reftable"))
+		git_config_set("extensions.refStorage", ref_storage_format);
+
 	if (!(flags & INIT_DB_QUIET)) {
 		int len = strlen(git_dir);
 
@@ -540,6 +562,7 @@ static const char *const init_db_usage[] = {
 int cmd_init_db(int argc, const char **argv, const char *prefix)
 {
 	const char *git_dir;
+	const char *ref_storage_format = default_ref_storage();
 	const char *real_git_dir = NULL;
 	const char *work_tree;
 	const char *template_dir = NULL;
@@ -548,15 +571,18 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	const char *initial_branch = NULL;
 	int hash_algo = GIT_HASH_UNKNOWN;
 	const struct option init_db_options[] = {
-		OPT_STRING(0, "template", &template_dir, N_("template-directory"),
-				N_("directory from which templates will be used")),
+		OPT_STRING(0, "template", &template_dir,
+			   N_("template-directory"),
+			   N_("directory from which templates will be used")),
 		OPT_SET_INT(0, "bare", &is_bare_repository_cfg,
-				N_("create a bare repository"), 1),
+			    N_("create a bare repository"), 1),
 		{ OPTION_CALLBACK, 0, "shared", &init_shared_repository,
-			N_("permissions"),
-			N_("specify that the git repository is to be shared amongst several users"),
-			PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0},
+		  N_("permissions"),
+		  N_("specify that the git repository is to be shared amongst several users"),
+		  PARSE_OPT_OPTARG | PARSE_OPT_NONEG, shared_callback, 0 },
 		OPT_BIT('q', "quiet", &flags, N_("be quiet"), INIT_DB_QUIET),
+		OPT_STRING(0, "ref-storage", &ref_storage_format, N_("backend"),
+			   N_("the ref storage format to use")),
 		OPT_STRING(0, "separate-git-dir", &real_git_dir, N_("gitdir"),
 			   N_("separate git dir from working tree")),
 		OPT_STRING('b', "initial-branch", &initial_branch, N_("name"),
@@ -668,10 +694,11 @@ int cmd_init_db(int argc, const char **argv, const char *prefix)
 	}
 
 	UNLEAK(real_git_dir);
+	UNLEAK(ref_storage_format);
 	UNLEAK(git_dir);
 	UNLEAK(work_tree);
 
 	flags |= INIT_DB_EXIST_OK;
 	return init_db(git_dir, real_git_dir, template_dir, hash_algo,
-		       initial_branch, flags);
+		       initial_branch, ref_storage_format, flags);
 }
diff --git a/builtin/worktree.c b/builtin/worktree.c
index f0cbdef718..1935967072 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -12,6 +12,7 @@
 #include "submodule.h"
 #include "utf8.h"
 #include "worktree.h"
+#include "../refs/refs-internal.h"
 
 static const char * const worktree_usage[] = {
 	N_("git worktree add [<options>] <path> [<commit-ish>]"),
@@ -402,9 +403,29 @@ static int add_worktree(const char *path, const char *refname,
 	 * worktree.
 	 */
 	strbuf_reset(&sb);
-	strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
-	write_file(sb.buf, "%s", oid_to_hex(&null_oid));
-	strbuf_reset(&sb);
+	if (get_main_ref_store(the_repository)->be == &refs_be_reftable) {
+		/* XXX this is cut & paste from reftable_init_db. */
+		strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
+		write_file(sb.buf, "%s", "ref: refs/heads/.invalid\n");
+		strbuf_reset(&sb);
+
+		strbuf_addf(&sb, "%s/refs", sb_repo.buf);
+		safe_create_dir(sb.buf, 1);
+		strbuf_reset(&sb);
+
+		strbuf_addf(&sb, "%s/refs/heads", sb_repo.buf);
+		write_file(sb.buf, "this repository uses the reftable format");
+		strbuf_reset(&sb);
+
+		strbuf_addf(&sb, "%s/reftable", sb_repo.buf);
+		safe_create_dir(sb.buf, 1);
+		strbuf_reset(&sb);
+	} else {
+		strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
+		write_file(sb.buf, "%s", oid_to_hex(&null_oid));
+		strbuf_reset(&sb);
+	}
+
 	strbuf_addf(&sb, "%s/commondir", sb_repo.buf);
 	write_file(sb.buf, "../..");
 
diff --git a/cache.h b/cache.h
index 0290849c19..6090da4967 100644
--- a/cache.h
+++ b/cache.h
@@ -627,9 +627,10 @@ int path_inside_repo(const char *prefix, const char *path);
 #define INIT_DB_EXIST_OK 0x0002
 
 int init_db(const char *git_dir, const char *real_git_dir,
-	    const char *template_dir, int hash_algo,
-	    const char *initial_branch, unsigned int flags);
-void initialize_repository_version(int hash_algo);
+	    const char *template_dir, int hash_algo, const char *initial_branch,
+	    const char *ref_storage_format, unsigned int flags);
+void initialize_repository_version(int hash_algo,
+				   const char *ref_storage_format);
 
 void sanitize_stdfds(void);
 int daemonize(void);
@@ -1043,6 +1044,7 @@ struct repository_format {
 	int is_bare;
 	int hash_algo;
 	char *work_tree;
+	char *ref_storage;
 	struct string_list unknown_extensions;
 	struct string_list v1_only_extensions;
 };
diff --git a/refs.c b/refs.c
index d4b5397c8f..3789bc1071 100644
--- a/refs.c
+++ b/refs.c
@@ -19,10 +19,16 @@
 #include "repository.h"
 #include "sigchain.h"
 
+const char *default_ref_storage(void)
+{
+	const char *test = getenv("GIT_TEST_REFTABLE");
+	return test ? "reftable" : "files";
+}
+
 /*
  * List of all available backends
  */
-static struct ref_storage_be *refs_backends = &refs_be_files;
+static struct ref_storage_be *refs_backends = &refs_be_reftable;
 
 static struct ref_storage_be *find_ref_storage_backend(const char *name)
 {
@@ -1725,13 +1731,13 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
  * Create, record, and return a ref_store instance for the specified
  * gitdir.
  */
-static struct ref_store *ref_store_init(const char *gitdir,
+static struct ref_store *ref_store_init(const char *gitdir, const char *be_name,
 					unsigned int flags)
 {
-	const char *be_name = "files";
-	struct ref_storage_be *be = find_ref_storage_backend(be_name);
+	struct ref_storage_be *be;
 	struct ref_store *refs;
 
+	be = find_ref_storage_backend(be_name);
 	if (!be)
 		BUG("reference backend %s is unknown", be_name);
 
@@ -1747,7 +1753,11 @@ struct ref_store *get_main_ref_store(struct repository *r)
 	if (!r->gitdir)
 		BUG("attempting to get main_ref_store outside of repository");
 
-	r->refs_private = ref_store_init(r->gitdir, REF_STORE_ALL_CAPS);
+	r->refs_private = ref_store_init(r->gitdir,
+					 r->ref_storage_format ?
+						 r->ref_storage_format :
+						 default_ref_storage(),
+					 REF_STORE_ALL_CAPS);
 	return r->refs_private;
 }
 
@@ -1802,7 +1812,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 		goto done;
 
 	/* assume that add_submodule_odb() has been called */
-	refs = ref_store_init(submodule_sb.buf,
+	refs = ref_store_init(submodule_sb.buf, default_ref_storage(),
 			      REF_STORE_READ | REF_STORE_ODB);
 	register_ref_store_map(&submodule_ref_stores, "submodule",
 			       refs, submodule);
@@ -1816,6 +1826,7 @@ struct ref_store *get_submodule_ref_store(const char *submodule)
 
 struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 {
+	const char *format = default_ref_storage();
 	struct ref_store *refs;
 	const char *id;
 
@@ -1829,9 +1840,9 @@ struct ref_store *get_worktree_ref_store(const struct worktree *wt)
 
 	if (wt->id)
 		refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
-				      REF_STORE_ALL_CAPS);
+				      format, REF_STORE_ALL_CAPS);
 	else
-		refs = ref_store_init(get_git_common_dir(),
+		refs = ref_store_init(get_git_common_dir(), format,
 				      REF_STORE_ALL_CAPS);
 
 	if (refs)
diff --git a/refs.h b/refs.h
index f8835f0aef..581ec270dc 100644
--- a/refs.h
+++ b/refs.h
@@ -9,6 +9,9 @@ struct string_list;
 struct string_list_item;
 struct worktree;
 
+/* Returns the ref storage backend to use by default. */
+const char *default_ref_storage(void);
+
 /*
  * Resolve a reference, recursively following symbolic refererences.
  *
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 61fd2e661f..b9b5c2e87c 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -669,6 +669,7 @@ struct ref_storage_be {
 };
 
 extern struct ref_storage_be refs_be_files;
+extern struct ref_storage_be refs_be_reftable;
 extern struct ref_storage_be refs_be_packed;
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
new file mode 100644
index 0000000000..93f3d337b6
--- /dev/null
+++ b/refs/reftable-backend.c
@@ -0,0 +1,1384 @@
+#include "../cache.h"
+#include "../chdir-notify.h"
+#include "../config.h"
+#include "../iterator.h"
+#include "../lockfile.h"
+#include "../refs.h"
+#include "../reftable/reftable.h"
+#include "../worktree.h"
+#include "refs-internal.h"
+
+extern struct ref_storage_be refs_be_reftable;
+
+struct git_reftable_ref_store {
+	struct ref_store base;
+	unsigned int store_flags;
+
+	int err;
+	char *repo_dir;
+
+	char *reftable_dir;
+	char *worktree_reftable_dir;
+
+	struct reftable_stack *main_stack;
+	struct reftable_stack *worktree_stack;
+};
+
+static struct reftable_stack *stack_for(struct git_reftable_ref_store *store,
+					const char *refname)
+{
+	if (store->worktree_stack == NULL)
+		return store->main_stack;
+
+	switch (ref_type(refname)) {
+	case REF_TYPE_PER_WORKTREE:
+	case REF_TYPE_PSEUDOREF:
+	case REF_TYPE_OTHER_PSEUDOREF:
+		return store->worktree_stack;
+	default:
+	case REF_TYPE_MAIN_PSEUDOREF:
+	case REF_TYPE_NORMAL:
+		return store->main_stack;
+	}
+}
+
+static int git_reftable_read_raw_ref(struct ref_store *ref_store,
+				     const char *refname, struct object_id *oid,
+				     struct strbuf *referent,
+				     unsigned int *type);
+
+static void clear_reftable_log_record(struct reftable_log_record *log)
+{
+	log->old_hash = NULL;
+	log->new_hash = NULL;
+	log->message = NULL;
+	log->refname = NULL;
+	reftable_log_record_clear(log);
+}
+
+static void fill_reftable_log_record(struct reftable_log_record *log)
+{
+	const char *info = git_committer_info(0);
+	struct ident_split split = { NULL };
+	int result = split_ident_line(&split, info, strlen(info));
+	int sign = 1;
+	assert(0 == result);
+
+	reftable_log_record_clear(log);
+	log->name =
+		xstrndup(split.name_begin, split.name_end - split.name_begin);
+	log->email =
+		xstrndup(split.mail_begin, split.mail_end - split.mail_begin);
+	log->time = atol(split.date_begin);
+	if (*split.tz_begin == '-') {
+		sign = -1;
+		split.tz_begin++;
+	}
+	if (*split.tz_begin == '+') {
+		sign = 1;
+		split.tz_begin++;
+	}
+
+	log->tz_offset = sign * atoi(split.tz_begin);
+}
+
+static int has_suffix(struct strbuf *b, const char *suffix)
+{
+	size_t len = strlen(suffix);
+
+	if (len > b->len) {
+		return 0;
+	}
+
+	return 0 == strncmp(b->buf + b->len - len, suffix, len);
+}
+
+static int trim_component(struct strbuf *b)
+{
+	char *last;
+	last = strrchr(b->buf, '/');
+	if (!last)
+		return -1;
+	strbuf_setlen(b, last - b->buf);
+	return 0;
+}
+
+static int is_worktree(struct strbuf *b)
+{
+	if (trim_component(b) < 0) {
+		return 0;
+	}
+	if (!has_suffix(b, "/worktrees")) {
+		return 0;
+	}
+	trim_component(b);
+	return 1;
+}
+
+static struct ref_store *git_reftable_ref_store_create(const char *path,
+						       unsigned int store_flags)
+{
+	struct git_reftable_ref_store *refs = xcalloc(1, sizeof(*refs));
+	struct ref_store *ref_store = (struct ref_store *)refs;
+	struct reftable_write_options cfg = {
+		.block_size = 4096,
+		.hash_id = the_hash_algo->format_id,
+	};
+	struct strbuf sb = STRBUF_INIT;
+	const char *gitdir = path;
+	struct strbuf wt_buf = STRBUF_INIT;
+	int wt = 0;
+
+	strbuf_addstr(&wt_buf, path);
+
+	/* this is clumsy, but the official worktree functions (eg.
+	 * get_worktrees()) function will try to initialize a ref storage
+	 * backend, leading to infinite recursion.  */
+	wt = is_worktree(&wt_buf);
+	if (wt) {
+		gitdir = wt_buf.buf;
+	}
+
+	base_ref_store_init(ref_store, &refs_be_reftable);
+	refs->store_flags = store_flags;
+	refs->repo_dir = xstrdup(gitdir);
+	strbuf_addf(&sb, "%s/reftable", gitdir);
+	refs->reftable_dir = xstrdup(sb.buf);
+	strbuf_reset(&sb);
+
+	refs->err =
+		reftable_new_stack(&refs->main_stack, refs->reftable_dir, cfg);
+	assert(refs->err != REFTABLE_API_ERROR);
+
+	if (refs->err == 0 && wt) {
+		strbuf_addf(&sb, "%s/reftable", path);
+		refs->worktree_reftable_dir = xstrdup(sb.buf);
+
+		refs->err = reftable_new_stack(&refs->worktree_stack,
+					       refs->worktree_reftable_dir,
+					       cfg);
+		assert(refs->err != REFTABLE_API_ERROR);
+	}
+
+	strbuf_release(&sb);
+	strbuf_release(&wt_buf);
+	return ref_store;
+}
+
+static int git_reftable_init_db(struct ref_store *ref_store, struct strbuf *err)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct strbuf sb = STRBUF_INIT;
+
+	safe_create_dir(refs->reftable_dir, 1);
+	assert(refs->worktree_reftable_dir == NULL);
+
+	strbuf_addf(&sb, "%s/HEAD", refs->repo_dir);
+	write_file(sb.buf, "ref: refs/heads/.invalid");
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs", refs->repo_dir);
+	safe_create_dir(sb.buf, 1);
+	strbuf_reset(&sb);
+
+	strbuf_addf(&sb, "%s/refs/heads", refs->repo_dir);
+	write_file(sb.buf, "this repository uses the reftable format");
+
+	return 0;
+}
+
+struct git_reftable_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_ref_record ref;
+	struct object_id oid;
+	struct ref_store *ref_store;
+
+	/* In case we must iterate over 2 stacks, this is non-null. */
+	struct reftable_merged_table *merged;
+	unsigned int flags;
+	int err;
+	const char *prefix;
+};
+
+static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	while (ri->err == 0) {
+		ri->err = reftable_iterator_next_ref(&ri->iter, &ri->ref);
+		if (ri->err) {
+			break;
+		}
+
+		/*
+		   We could filter pseudo refs here explicitly, but HEAD is not
+		   a PSEUDOREF, but a PER_WORKTREE, b/c each worktree can have
+		   its own HEAD.
+		 */
+		ri->base.refname = ri->ref.refname;
+		if (ri->prefix != NULL &&
+		    strncmp(ri->prefix, ri->ref.refname, strlen(ri->prefix))) {
+			ri->err = 1;
+			break;
+		}
+		if (ri->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
+		    ref_type(ri->base.refname) != REF_TYPE_PER_WORKTREE)
+			continue;
+
+		ri->base.flags = 0;
+		if (ri->ref.value != NULL) {
+			hashcpy(ri->oid.hash, ri->ref.value);
+		} else if (ri->ref.target != NULL) {
+			int out_flags = 0;
+			const char *resolved = refs_resolve_ref_unsafe(
+				ri->ref_store, ri->ref.refname,
+				RESOLVE_REF_READING, &ri->oid, &out_flags);
+			ri->base.flags = out_flags;
+			if (resolved == NULL &&
+			    !(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+			    (ri->base.flags & REF_ISBROKEN)) {
+				continue;
+			}
+		}
+
+		ri->base.oid = &ri->oid;
+		if (!(ri->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(ri->base.refname, ri->base.oid,
+					    ri->base.flags)) {
+			continue;
+		}
+
+		break;
+	}
+
+	if (ri->err > 0) {
+		return ITER_DONE;
+	}
+	if (ri->err < 0) {
+		return ITER_ERROR;
+	}
+
+	return ITER_OK;
+}
+
+static int reftable_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	if (ri->ref.target_value != NULL) {
+		hashcpy(peeled->hash, ri->ref.target_value);
+		return 0;
+	}
+
+	return -1;
+}
+
+static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_iterator *ri =
+		(struct git_reftable_iterator *)ref_iterator;
+	reftable_ref_record_clear(&ri->ref);
+	reftable_iterator_destroy(&ri->iter);
+	if (ri->merged) {
+		reftable_merged_table_free(ri->merged);
+	}
+	return 0;
+}
+
+static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
+	reftable_ref_iterator_advance, reftable_ref_iterator_peel,
+	reftable_ref_iterator_abort
+};
+
+static struct ref_iterator *
+git_reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+				unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct git_reftable_iterator *ri = xcalloc(1, sizeof(*ri));
+
+	if (refs->err < 0) {
+		ri->err = refs->err;
+	} else if (refs->worktree_stack == NULL) {
+		struct reftable_merged_table *mt =
+			reftable_stack_merged_table(refs->main_stack);
+		ri->err = reftable_merged_table_seek_ref(mt, &ri->iter, prefix);
+	} else {
+		struct reftable_merged_table *mt1 =
+			reftable_stack_merged_table(refs->main_stack);
+		struct reftable_merged_table *mt2 =
+			reftable_stack_merged_table(refs->worktree_stack);
+		struct reftable_table *tabs =
+			xcalloc(2, sizeof(struct reftable_table));
+		reftable_table_from_merged_table(&tabs[0], mt1);
+		reftable_table_from_merged_table(&tabs[1], mt2);
+		ri->err = reftable_new_merged_table(&ri->merged, tabs, 2,
+						    the_hash_algo->format_id);
+		if (ri->err == 0)
+			ri->err = reftable_merged_table_seek_ref(
+				ri->merged, &ri->iter, prefix);
+	}
+
+	base_ref_iterator_init(&ri->base, &reftable_ref_iterator_vtable, 1);
+	ri->prefix = prefix;
+	ri->base.oid = &ri->oid;
+	ri->flags = flags;
+	ri->ref_store = ref_store;
+	return &ri->base;
+}
+
+static int fixup_symrefs(struct ref_store *ref_store,
+			 struct ref_transaction *transaction)
+{
+	struct strbuf referent = STRBUF_INIT;
+	int i = 0;
+	int err = 0;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *update = transaction->updates[i];
+		struct object_id old_oid;
+
+		err = git_reftable_read_raw_ref(ref_store, update->refname,
+						&old_oid, &referent,
+						/* mutate input, like
+						   files-backend.c */
+						&update->type);
+		if (err < 0 && errno == ENOENT &&
+		    is_null_oid(&update->old_oid)) {
+			err = 0;
+		}
+		if (err < 0)
+			goto done;
+
+		if (!(update->type & REF_ISSYMREF))
+			continue;
+
+		if (update->flags & REF_NO_DEREF) {
+			/* what should happen here? See files-backend.c
+			 * lock_ref_for_update. */
+		} else {
+			/*
+			  If we are updating a symref (eg. HEAD), we should also
+			  update the branch that the symref points to.
+
+			  This is generic functionality, and would be better
+			  done in refs.c, but the current implementation is
+			  intertwined with the locking in files-backend.c.
+			*/
+			int new_flags = update->flags;
+			struct ref_update *new_update = NULL;
+
+			/* if this is an update for HEAD, should also record a
+			   log entry for HEAD? See files-backend.c,
+			   split_head_update()
+			*/
+			new_update = ref_transaction_add_update(
+				transaction, referent.buf, new_flags,
+				&update->new_oid, &update->old_oid,
+				update->msg);
+			new_update->parent_update = update;
+
+			/* files-backend sets REF_LOG_ONLY here. */
+			update->flags |= REF_NO_DEREF | REF_LOG_ONLY;
+			update->flags &= ~REF_HAVE_OLD;
+		}
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	strbuf_release(&referent);
+	return err;
+}
+
+static int git_reftable_transaction_prepare(struct ref_store *ref_store,
+					    struct ref_transaction *transaction,
+					    struct strbuf *errbuf)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_addition *add = NULL;
+	struct reftable_stack *stack =
+		transaction->nr ?
+			stack_for(refs, transaction->updates[0]->refname) :
+			refs->main_stack;
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_new_addition(&add, stack);
+	if (err) {
+		goto done;
+	}
+
+	err = fixup_symrefs(ref_store, transaction);
+	if (err) {
+		goto done;
+	}
+
+	transaction->backend_data = add;
+	transaction->state = REF_TRANSACTION_PREPARED;
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	if (err < 0) {
+		transaction->state = REF_TRANSACTION_CLOSED;
+		strbuf_addf(errbuf, "reftable: transaction prepare: %s",
+			    reftable_error_str(err));
+	}
+
+	return err;
+}
+
+static int git_reftable_transaction_abort(struct ref_store *ref_store,
+					  struct ref_transaction *transaction,
+					  struct strbuf *err)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	reftable_addition_destroy(add);
+	transaction->backend_data = NULL;
+	return 0;
+}
+
+static int reftable_check_old_oid(struct ref_store *refs, const char *refname,
+				  struct object_id *want_oid)
+{
+	struct object_id out_oid;
+	int out_flags = 0;
+	const char *resolved = refs_resolve_ref_unsafe(
+		refs, refname, RESOLVE_REF_READING, &out_oid, &out_flags);
+	if (is_null_oid(want_oid) != (resolved == NULL)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	if (resolved != NULL && !oideq(&out_oid, want_oid)) {
+		return REFTABLE_LOCK_ERROR;
+	}
+
+	return 0;
+}
+
+static int ref_update_cmp(const void *a, const void *b)
+{
+	return strcmp((*(struct ref_update **)a)->refname,
+		      (*(struct ref_update **)b)->refname);
+}
+
+static int write_transaction_table(struct reftable_writer *writer, void *arg)
+{
+	struct ref_transaction *transaction = (struct ref_transaction *)arg;
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)transaction->ref_store;
+	struct reftable_stack *stack =
+		stack_for(refs, transaction->updates[0]->refname);
+	uint64_t ts = reftable_stack_next_update_index(stack);
+	int err = 0;
+	int i = 0;
+	struct reftable_log_record *logs =
+		calloc(transaction->nr, sizeof(*logs));
+	struct ref_update **sorted =
+		malloc(transaction->nr * sizeof(struct ref_update *));
+	COPY_ARRAY(sorted, transaction->updates, transaction->nr);
+	QSORT(sorted, transaction->nr, ref_update_cmp);
+	reftable_writer_set_limits(writer, ts, ts);
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = sorted[i];
+		struct reftable_log_record *log = &logs[i];
+		fill_reftable_log_record(log);
+		log->refname = (char *)u->refname;
+		log->old_hash = u->old_oid.hash;
+		log->new_hash = u->new_oid.hash;
+		log->update_index = ts;
+		log->message = u->msg;
+
+		if (u->flags & REF_LOG_ONLY) {
+			continue;
+		}
+
+		if (u->flags & REF_HAVE_NEW) {
+			struct reftable_ref_record ref = { NULL };
+			struct object_id peeled;
+
+			int peel_error = peel_object(&u->new_oid, &peeled);
+			ref.refname = (char *)u->refname;
+
+			if (!is_null_oid(&u->new_oid)) {
+				ref.value = u->new_oid.hash;
+			}
+			ref.update_index = ts;
+			if (!peel_error) {
+				ref.target_value = peeled.hash;
+			}
+
+			err = reftable_writer_add_ref(writer, &ref);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+
+	for (i = 0; i < transaction->nr; i++) {
+		err = reftable_writer_add_log(writer, &logs[i]);
+		clear_reftable_log_record(&logs[i]);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	free(logs);
+	free(sorted);
+	return err;
+}
+
+static int git_reftable_transaction_finish(struct ref_store *ref_store,
+					   struct ref_transaction *transaction,
+					   struct strbuf *errmsg)
+{
+	struct reftable_addition *add =
+		(struct reftable_addition *)transaction->backend_data;
+	int err = 0;
+	int i;
+
+	for (i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		if (u->flags & REF_HAVE_OLD) {
+			err = reftable_check_old_oid(transaction->ref_store,
+						     u->refname, &u->old_oid);
+			if (err < 0) {
+				goto done;
+			}
+		}
+	}
+	if (transaction->nr) {
+		err = reftable_addition_add(add, &write_transaction_table,
+					    transaction);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+	err = reftable_addition_commit(add);
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_addition_destroy(add);
+	transaction->state = REF_TRANSACTION_CLOSED;
+	transaction->backend_data = NULL;
+	if (err) {
+		strbuf_addf(errmsg, "reftable: transaction failure: %s",
+			    reftable_error_str(err));
+		return -1;
+	}
+	return err;
+}
+
+static int
+git_reftable_transaction_initial_commit(struct ref_store *ref_store,
+					struct ref_transaction *transaction,
+					struct strbuf *errmsg)
+{
+	int err = git_reftable_transaction_prepare(ref_store, transaction,
+						   errmsg);
+	if (err)
+		return err;
+
+	return git_reftable_transaction_finish(ref_store, transaction, errmsg);
+}
+
+struct write_delete_refs_arg {
+	struct reftable_stack *stack;
+	struct string_list *refnames;
+	const char *logmsg;
+	unsigned int flags;
+};
+
+static int write_delete_refs_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_delete_refs_arg *arg =
+		(struct write_delete_refs_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int err = 0;
+	int i = 0;
+
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_ref_record ref = {
+			.refname = (char *)arg->refnames->items[i].string,
+			.update_index = ts,
+		};
+		err = reftable_writer_add_ref(writer, &ref);
+		if (err < 0) {
+			return err;
+		}
+	}
+
+	for (i = 0; i < arg->refnames->nr; i++) {
+		struct reftable_log_record log = {
+			.update_index = ts,
+		};
+		struct reftable_ref_record current = { NULL };
+		fill_reftable_log_record(&log);
+		log.message = xstrdup(arg->logmsg);
+		log.new_hash = NULL;
+		log.old_hash = NULL;
+		log.update_index = ts;
+		log.refname = (char *)arg->refnames->items[i].string;
+
+		if (reftable_stack_read_ref(arg->stack, log.refname,
+					    &current) == 0) {
+			log.old_hash = current.value;
+		}
+		err = reftable_writer_add_log(writer, &log);
+		log.old_hash = NULL;
+		reftable_ref_record_clear(&current);
+
+		clear_reftable_log_record(&log);
+		if (err < 0) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int git_reftable_delete_refs(struct ref_store *ref_store,
+				    const char *msg,
+				    struct string_list *refnames,
+				    unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack =
+		stack_for(refs, refnames->items[0].string);
+	struct write_delete_refs_arg arg = {
+		.stack = stack,
+		.refnames = refnames,
+		.logmsg = msg,
+		.flags = flags,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+
+	string_list_sort(refnames);
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(stack, &write_delete_refs_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+static int git_reftable_pack_refs(struct ref_store *ref_store,
+				  unsigned int flags)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	int err = refs->err;
+	if (err < 0) {
+		return err;
+	}
+	err = reftable_stack_compact_all(refs->main_stack, NULL);
+	if (err == 0 && refs->worktree_stack != NULL)
+		err = reftable_stack_compact_all(refs->worktree_stack, NULL);
+	return err;
+}
+
+struct write_create_symref_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	const char *refname;
+	const char *target;
+	const char *logmsg;
+};
+
+static int write_create_symref_table(struct reftable_writer *writer, void *arg)
+{
+	struct write_create_symref_arg *create =
+		(struct write_create_symref_arg *)arg;
+	uint64_t ts = reftable_stack_next_update_index(create->stack);
+	int err = 0;
+
+	struct reftable_ref_record ref = {
+		.refname = (char *)create->refname,
+		.target = (char *)create->target,
+		.update_index = ts,
+	};
+	reftable_writer_set_limits(writer, ts, ts);
+	err = reftable_writer_add_ref(writer, &ref);
+	if (err == 0) {
+		struct reftable_log_record log = { NULL };
+		struct object_id new_oid;
+		struct object_id old_oid;
+
+		fill_reftable_log_record(&log);
+		log.refname = (char *)create->refname;
+		log.message = (char *)create->logmsg;
+		log.update_index = ts;
+		if (refs_resolve_ref_unsafe(
+			    (struct ref_store *)create->refs, create->refname,
+			    RESOLVE_REF_READING, &old_oid, NULL) != NULL) {
+			log.old_hash = old_oid.hash;
+		}
+
+		if (refs_resolve_ref_unsafe((struct ref_store *)create->refs,
+					    create->target, RESOLVE_REF_READING,
+					    &new_oid, NULL) != NULL) {
+			log.new_hash = new_oid.hash;
+		}
+
+		if (log.old_hash != NULL || log.new_hash != NULL) {
+			err = reftable_writer_add_log(writer, &log);
+		}
+		log.refname = NULL;
+		log.message = NULL;
+		log.old_hash = NULL;
+		log.new_hash = NULL;
+		clear_reftable_log_record(&log);
+	}
+	return err;
+}
+
+static int git_reftable_create_symref(struct ref_store *ref_store,
+				      const char *refname, const char *target,
+				      const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, refname);
+	struct write_create_symref_arg arg = { .refs = refs,
+					       .stack = stack,
+					       .refname = refname,
+					       .target = target,
+					       .logmsg = logmsg };
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+	err = reftable_stack_add(stack, &write_create_symref_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+struct write_rename_arg {
+	struct reftable_stack *stack;
+	const char *oldname;
+	const char *newname;
+	const char *logmsg;
+};
+
+static int write_rename_table(struct reftable_writer *writer, void *argv)
+{
+	struct write_rename_arg *arg = (struct write_rename_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	struct reftable_ref_record ref = { NULL };
+	int err = reftable_stack_read_ref(arg->stack, arg->oldname, &ref);
+
+	if (err) {
+		goto done;
+	}
+
+	/* XXX do ref renames overwrite the target? */
+	if (reftable_stack_read_ref(arg->stack, arg->newname, &ref) == 0) {
+		goto done;
+	}
+
+	free(ref.refname);
+	ref.refname = strdup(arg->newname);
+	reftable_writer_set_limits(writer, ts, ts);
+	ref.update_index = ts;
+
+	{
+		struct reftable_ref_record todo[2] = { { NULL } };
+		todo[0].refname = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		/* leave todo[0] empty */
+		todo[1] = ref;
+		todo[1].update_index = ts;
+
+		err = reftable_writer_add_refs(writer, todo, 2);
+		if (err < 0) {
+			goto done;
+		}
+	}
+
+	if (ref.value != NULL) {
+		struct reftable_log_record todo[2] = { { NULL } };
+		fill_reftable_log_record(&todo[0]);
+		fill_reftable_log_record(&todo[1]);
+
+		todo[0].refname = (char *)arg->oldname;
+		todo[0].update_index = ts;
+		todo[0].message = (char *)arg->logmsg;
+		todo[0].old_hash = ref.value;
+		todo[0].new_hash = NULL;
+
+		todo[1].refname = (char *)arg->newname;
+		todo[1].update_index = ts;
+		todo[1].old_hash = NULL;
+		todo[1].new_hash = ref.value;
+		todo[1].message = (char *)arg->logmsg;
+
+		err = reftable_writer_add_logs(writer, todo, 2);
+
+		clear_reftable_log_record(&todo[0]);
+		clear_reftable_log_record(&todo[1]);
+
+		if (err < 0) {
+			goto done;
+		}
+
+	} else {
+		/* XXX symrefs? */
+	}
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+static int git_reftable_rename_ref(struct ref_store *ref_store,
+				   const char *oldrefname,
+				   const char *newrefname, const char *logmsg)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, newrefname);
+	struct write_rename_arg arg = {
+		.stack = stack,
+		.oldname = oldrefname,
+		.newname = newrefname,
+		.logmsg = logmsg,
+	};
+	int err = refs->err;
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_add(stack, &write_rename_table, &arg);
+done:
+	assert(err != REFTABLE_API_ERROR);
+	return err;
+}
+
+static int git_reftable_copy_ref(struct ref_store *ref_store,
+				 const char *oldrefname, const char *newrefname,
+				 const char *logmsg)
+{
+	BUG("reftable reference store does not support copying references");
+}
+
+struct git_reftable_reflog_ref_iterator {
+	struct ref_iterator base;
+	struct reftable_iterator iter;
+	struct reftable_log_record log;
+	struct object_id oid;
+
+	/* Used when iterating over worktree & main */
+	struct reftable_merged_table *merged;
+	char *last_name;
+};
+
+static int
+git_reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_reflog_ref_iterator *ri =
+		(struct git_reftable_reflog_ref_iterator *)ref_iterator;
+
+	while (1) {
+		int err = reftable_iterator_next_log(&ri->iter, &ri->log);
+		if (err > 0) {
+			return ITER_DONE;
+		}
+		if (err < 0) {
+			return ITER_ERROR;
+		}
+
+		ri->base.refname = ri->log.refname;
+		if (ri->last_name != NULL &&
+		    !strcmp(ri->log.refname, ri->last_name)) {
+			/* we want the refnames that we have reflogs for, so we
+			 * skip if we've already produced this name. This could
+			 * be faster by seeking directly to
+			 * reflog@update_index==0.
+			 */
+			continue;
+		}
+
+		free(ri->last_name);
+		ri->last_name = xstrdup(ri->log.refname);
+		hashcpy(ri->oid.hash, ri->log.new_hash);
+		return ITER_OK;
+	}
+}
+
+static int
+git_reftable_reflog_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				      struct object_id *peeled)
+{
+	BUG("not supported.");
+	return -1;
+}
+
+static int
+git_reftable_reflog_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct git_reftable_reflog_ref_iterator *ri =
+		(struct git_reftable_reflog_ref_iterator *)ref_iterator;
+	reftable_log_record_clear(&ri->log);
+	reftable_iterator_destroy(&ri->iter);
+	if (ri->merged)
+		reftable_merged_table_free(ri->merged);
+	return 0;
+}
+
+static struct ref_iterator_vtable git_reftable_reflog_ref_iterator_vtable = {
+	git_reftable_reflog_ref_iterator_advance,
+	git_reftable_reflog_ref_iterator_peel,
+	git_reftable_reflog_ref_iterator_abort
+};
+
+static struct ref_iterator *
+git_reftable_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct git_reftable_reflog_ref_iterator *ri = xcalloc(sizeof(*ri), 1);
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+
+	if (refs->worktree_stack == NULL) {
+		struct reftable_stack *stack = refs->main_stack;
+		struct reftable_merged_table *mt =
+			reftable_stack_merged_table(stack);
+		int err = reftable_merged_table_seek_log(mt, &ri->iter, "");
+		if (err < 0) {
+			free(ri);
+			/* XXX is this allowed? */
+			return NULL;
+		}
+	} else {
+		struct reftable_merged_table *mt1 =
+			reftable_stack_merged_table(refs->main_stack);
+		struct reftable_merged_table *mt2 =
+			reftable_stack_merged_table(refs->worktree_stack);
+		struct reftable_table *tabs =
+			xcalloc(2, sizeof(struct reftable_table));
+		int err = 0;
+		reftable_table_from_merged_table(&tabs[0], mt1);
+		reftable_table_from_merged_table(&tabs[1], mt2);
+		err = reftable_new_merged_table(&ri->merged, tabs, 2,
+						the_hash_algo->format_id);
+		if (err < 0) {
+			free(tabs);
+			/* XXX see above */
+			return NULL;
+		}
+		err = reftable_merged_table_seek_ref(ri->merged, &ri->iter, "");
+		if (err < 0) {
+			return NULL;
+		}
+	}
+	base_ref_iterator_init(&ri->base,
+			       &git_reftable_reflog_ref_iterator_vtable, 1);
+	ri->base.oid = &ri->oid;
+
+	return (struct ref_iterator *)ri;
+}
+
+static int git_reftable_for_each_reflog_ent_newest_first(
+	struct ref_store *ref_store, const char *refname, each_reflog_ent_fn fn,
+	void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, refname);
+	struct reftable_merged_table *mt = NULL;
+	int err = 0;
+	struct reftable_log_record log = { NULL };
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	mt = reftable_stack_merged_table(stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	while (err == 0) {
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+
+		if (strcmp(log.refname, refname)) {
+			break;
+		}
+
+		hashcpy(old_oid.hash, log.old_hash);
+		hashcpy(new_oid.hash, log.new_hash);
+
+		full_committer = fmt_ident(log.name, log.email,
+					   WANT_COMMITTER_IDENT,
+					   /*date*/ NULL, IDENT_NO_DATE);
+		err = fn(&old_oid, &new_oid, full_committer, log.time,
+			 log.tz_offset, log.message, cb_data);
+		if (err)
+			break;
+	}
+
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+static int git_reftable_for_each_reflog_ent_oldest_first(
+	struct ref_store *ref_store, const char *refname, each_reflog_ent_fn fn,
+	void *cb_data)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, refname);
+	struct reftable_merged_table *mt = NULL;
+	struct reftable_log_record *logs = NULL;
+	int cap = 0;
+	int len = 0;
+	int err = 0;
+	int i = 0;
+
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	mt = reftable_stack_merged_table(stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+
+	while (err == 0) {
+		struct reftable_log_record log = { NULL };
+		err = reftable_iterator_next_log(&it, &log);
+		if (err > 0) {
+			err = 0;
+			break;
+		}
+		if (err < 0) {
+			break;
+		}
+
+		if (strcmp(log.refname, refname)) {
+			break;
+		}
+
+		if (len == cap) {
+			cap = 2 * cap + 1;
+			logs = realloc(logs, cap * sizeof(*logs));
+		}
+
+		logs[len++] = log;
+	}
+
+	for (i = len; i--;) {
+		struct reftable_log_record *log = &logs[i];
+		struct object_id old_oid;
+		struct object_id new_oid;
+		const char *full_committer = "";
+
+		hashcpy(old_oid.hash, log->old_hash);
+		hashcpy(new_oid.hash, log->new_hash);
+
+		full_committer = fmt_ident(log->name, log->email,
+					   WANT_COMMITTER_IDENT, NULL,
+					   IDENT_NO_DATE);
+		err = fn(&old_oid, &new_oid, full_committer, log->time,
+			 log->tz_offset, log->message, cb_data);
+		if (err) {
+			break;
+		}
+	}
+
+	for (i = 0; i < len; i++) {
+		reftable_log_record_clear(&logs[i]);
+	}
+	free(logs);
+
+	reftable_iterator_destroy(&it);
+	return err;
+}
+
+static int git_reftable_reflog_exists(struct ref_store *ref_store,
+				      const char *refname)
+{
+	struct reftable_iterator it = { NULL };
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, refname);
+	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
+	struct reftable_log_record log = { NULL };
+	int err = refs->err;
+
+	if (err < 0) {
+		goto done;
+	}
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err) {
+		goto done;
+	}
+	err = reftable_iterator_next_log(&it, &log);
+	if (err) {
+		goto done;
+	}
+
+	if (strcmp(log.refname, refname)) {
+		err = 1;
+	}
+
+done:
+	reftable_iterator_destroy(&it);
+	reftable_log_record_clear(&log);
+	return !err;
+}
+
+static int git_reftable_create_reflog(struct ref_store *ref_store,
+				      const char *refname, int force_create,
+				      struct strbuf *err)
+{
+	return 0;
+}
+
+static int git_reftable_delete_reflog(struct ref_store *ref_store,
+				      const char *refname)
+{
+	return 0;
+}
+
+struct reflog_expiry_arg {
+	struct git_reftable_ref_store *refs;
+	struct reftable_stack *stack;
+	struct reftable_log_record *tombstones;
+	int len;
+	int cap;
+};
+
+static void clear_log_tombstones(struct reflog_expiry_arg *arg)
+{
+	int i = 0;
+	for (; i < arg->len; i++) {
+		reftable_log_record_clear(&arg->tombstones[i]);
+	}
+
+	FREE_AND_NULL(arg->tombstones);
+}
+
+static void add_log_tombstone(struct reflog_expiry_arg *arg,
+			      const char *refname, uint64_t ts)
+{
+	struct reftable_log_record tombstone = {
+		.refname = xstrdup(refname),
+		.update_index = ts,
+	};
+	if (arg->len == arg->cap) {
+		arg->cap = 2 * arg->cap + 1;
+		arg->tombstones =
+			realloc(arg->tombstones, arg->cap * sizeof(tombstone));
+	}
+	arg->tombstones[arg->len++] = tombstone;
+}
+
+static int write_reflog_expiry_table(struct reftable_writer *writer, void *argv)
+{
+	struct reflog_expiry_arg *arg = (struct reflog_expiry_arg *)argv;
+	uint64_t ts = reftable_stack_next_update_index(arg->stack);
+	int i = 0;
+	reftable_writer_set_limits(writer, ts, ts);
+	for (i = 0; i < arg->len; i++) {
+		int err = reftable_writer_add_log(writer, &arg->tombstones[i]);
+		if (err) {
+			return err;
+		}
+	}
+	return 0;
+}
+
+static int
+git_reftable_reflog_expire(struct ref_store *ref_store, const char *refname,
+			   const struct object_id *oid, unsigned int flags,
+			   reflog_expiry_prepare_fn prepare_fn,
+			   reflog_expiry_should_prune_fn should_prune_fn,
+			   reflog_expiry_cleanup_fn cleanup_fn,
+			   void *policy_cb_data)
+{
+	/*
+	  For log expiry, we write tombstones in place of the expired entries,
+	  This means that the entries are still retrievable by delving into the
+	  stack, and expiring entries paradoxically takes extra memory.
+
+	  This memory is only reclaimed when some operation issues a
+	  git_reftable_pack_refs(), which will compact the entire stack and get
+	  rid of deletion entries.
+
+	  It would be better if the refs backend supported an API that sets a
+	  criterion for all refs, passing the criterion to pack_refs().
+	*/
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, refname);
+	struct reftable_merged_table *mt = NULL;
+	struct reflog_expiry_arg arg = {
+		.stack = stack,
+		.refs = refs,
+	};
+	struct reftable_log_record log = { NULL };
+	struct reftable_iterator it = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+
+	mt = reftable_stack_merged_table(stack);
+	err = reftable_merged_table_seek_log(mt, &it, refname);
+	if (err < 0) {
+		goto done;
+	}
+
+	while (1) {
+		struct object_id ooid;
+		struct object_id noid;
+
+		int err = reftable_iterator_next_log(&it, &log);
+		if (err < 0) {
+			goto done;
+		}
+
+		if (err > 0 || strcmp(log.refname, refname)) {
+			break;
+		}
+		hashcpy(ooid.hash, log.old_hash);
+		hashcpy(noid.hash, log.new_hash);
+
+		if (should_prune_fn(&ooid, &noid, log.email,
+				    (timestamp_t)log.time, log.tz_offset,
+				    log.message, policy_cb_data)) {
+			add_log_tombstone(&arg, refname, log.update_index);
+		}
+	}
+	err = reftable_stack_add(stack, &write_reflog_expiry_table, &arg);
+
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_log_record_clear(&log);
+	reftable_iterator_destroy(&it);
+	clear_log_tombstones(&arg);
+	return err;
+}
+
+static int git_reftable_read_raw_ref(struct ref_store *ref_store,
+				     const char *refname, struct object_id *oid,
+				     struct strbuf *referent,
+				     unsigned int *type)
+{
+	struct git_reftable_ref_store *refs =
+		(struct git_reftable_ref_store *)ref_store;
+	struct reftable_stack *stack = stack_for(refs, refname);
+
+	struct reftable_ref_record ref = { NULL };
+	int err = 0;
+	if (refs->err < 0) {
+		return refs->err;
+	}
+
+	/* This is usually not needed, but Git doesn't signal to ref backend if
+	   a subprocess updated the ref DB.  So we always check.
+	*/
+	err = reftable_stack_reload(stack);
+	if (err) {
+		goto done;
+	}
+
+	err = reftable_stack_read_ref(stack, refname, &ref);
+	if (err > 0) {
+		errno = ENOENT;
+		err = -1;
+		goto done;
+	}
+	if (err < 0) {
+		errno = reftable_error_to_errno(err);
+		err = -1;
+		goto done;
+	}
+	if (ref.target != NULL) {
+		strbuf_reset(referent);
+		strbuf_addstr(referent, ref.target);
+		*type |= REF_ISSYMREF;
+	} else if (ref.value != NULL) {
+		hashcpy(oid->hash, ref.value);
+	} else {
+		*type |= REF_ISBROKEN;
+		errno = EINVAL;
+		err = -1;
+	}
+done:
+	assert(err != REFTABLE_API_ERROR);
+	reftable_ref_record_clear(&ref);
+	return err;
+}
+
+struct ref_storage_be refs_be_reftable = {
+	&refs_be_files,
+	"reftable",
+	git_reftable_ref_store_create,
+	git_reftable_init_db,
+	git_reftable_transaction_prepare,
+	git_reftable_transaction_finish,
+	git_reftable_transaction_abort,
+	git_reftable_transaction_initial_commit,
+
+	git_reftable_pack_refs,
+	git_reftable_create_symref,
+	git_reftable_delete_refs,
+	git_reftable_rename_ref,
+	git_reftable_copy_ref,
+
+	git_reftable_ref_iterator_begin,
+	git_reftable_read_raw_ref,
+
+	git_reftable_reflog_iterator_begin,
+	git_reftable_for_each_reflog_ent_oldest_first,
+	git_reftable_for_each_reflog_ent_newest_first,
+	git_reftable_reflog_exists,
+	git_reftable_create_reflog,
+	git_reftable_delete_reflog,
+	git_reftable_reflog_expire,
+};
diff --git a/reftable/update.sh b/reftable/update.sh
index 4dadd875f4..d816005db6 100755
--- a/reftable/update.sh
+++ b/reftable/update.sh
@@ -16,7 +16,4 @@ cp reftable-repo/LICENSE reftable/
 git --git-dir reftable-repo/.git show --no-patch --format=oneline HEAD \
   > reftable/VERSION
 
-mv reftable/system.h reftable/system.h~
-sed 's|if REFTABLE_IN_GITCORE|if 1 /* REFTABLE_IN_GITCORE */|'  < reftable/system.h~ > reftable/system.h
-
 git add reftable/*.[ch] reftable/LICENSE reftable/VERSION
diff --git a/repository.c b/repository.c
index 6f7f6f002b..087760bc18 100644
--- a/repository.c
+++ b/repository.c
@@ -178,6 +178,8 @@ int repo_init(struct repository *repo,
 	if (worktree)
 		repo_set_worktree(repo, worktree);
 
+	repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
+
 	clear_repository_format(&format);
 	return 0;
 
diff --git a/repository.h b/repository.h
index 3c1f7d54bd..293b346304 100644
--- a/repository.h
+++ b/repository.h
@@ -74,6 +74,9 @@ struct repository {
 	 */
 	struct ref_store *refs_private;
 
+	/* The format to use for the ref database. */
+	char *ref_storage_format;
+
 	/*
 	 * Contains path to often used file names.
 	 */
diff --git a/setup.c b/setup.c
index 78e7ae1be7..d8ad7e9166 100644
--- a/setup.c
+++ b/setup.c
@@ -490,8 +490,10 @@ static enum extension_result handle_extension(const char *var,
 {
 	if (!strcmp(ext, "noop-v1")) {
 		return EXTENSION_OK;
+	} else if (!strcmp(ext, "refstorage")) {
+		data->ref_storage = xstrdup(value);
+		return EXTENSION_OK;
 	}
-
 	return EXTENSION_UNKNOWN;
 }
 
@@ -642,6 +644,7 @@ void clear_repository_format(struct repository_format *format)
 	string_list_clear(&format->v1_only_extensions, 0);
 	free(format->work_tree);
 	free(format->partial_clone);
+	free(format->ref_storage);
 	init_repository_format(format);
 }
 
@@ -1299,8 +1302,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
 				gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
 			setup_git_env(gitdir);
 		}
-		if (startup_info->have_repository)
+		if (startup_info->have_repository) {
 			repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+			the_repository->ref_storage_format =
+				xstrdup_or_null(repo_fmt.ref_storage);
+		}
 	}
 
 	strbuf_release(&dir);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
new file mode 100755
index 0000000000..a6634bc882
--- /dev/null
+++ b/t/t0031-reftable.sh
@@ -0,0 +1,173 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='reftable basics'
+
+. ./test-lib.sh
+
+INVALID_SHA1=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+initialize ()  {
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	mv .git/hooks .git/hooks-disabled
+}
+
+test_expect_success 'delete ref' '
+	initialize &&
+	test_commit file &&
+	SHA=$(git show-ref -s --verify HEAD) &&
+	test_write_lines "$SHA refs/heads/master" "$SHA refs/tags/file" >expect &&
+	git show-ref > actual &&
+	! git update-ref -d refs/tags/file $INVALID_SHA1 &&
+	test_cmp expect actual &&
+	git update-ref -d refs/tags/file $SHA  &&
+	test_write_lines "$SHA refs/heads/master" >expect &&
+	git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone calls transaction_initial_commit' '
+	test_commit message1 file1 &&
+	git clone . cloned &&
+	(test  -f cloned/file1 || echo "Fixme.")
+'
+
+test_expect_success 'basic operation of reftable storage: commit, show-ref' '
+	initialize &&
+	test_commit file &&
+	test_write_lines refs/heads/master refs/tags/file >expect &&
+	git show-ref &&
+	git show-ref | cut -f2 -d" " > actual &&
+	test_cmp actual expect
+'
+
+test_expect_success 'reflog, repack' '
+	initialize &&
+	for count in $(test_seq 1 10)
+	do
+		test_commit "number $count" file.t $count number-$count ||
+		return 1
+	done &&
+	git pack-refs &&
+	ls -1 .git/reftable >table-files &&
+	test_line_count = 2 table-files &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 10 output &&
+	grep "commit (initial): number 1" output &&
+	grep "commit: number 10" output &&
+	git gc &&
+	git reflog refs/heads/master >output &&
+	test_line_count = 0 output
+'
+
+test_expect_success 'branch switch in reflog output' '
+	initialize &&
+	test_commit file1 &&
+	git checkout -b branch1 &&
+	test_commit file2 &&
+	git checkout -b branch2 &&
+	git switch - &&
+	git rev-parse --symbolic-full-name HEAD > actual &&
+	echo refs/heads/branch1 > expect &&
+	test_cmp actual expect
+'
+
+
+# This matches show-ref's output
+print_ref() {
+	echo "$(git rev-parse "$1") $1"
+}
+
+test_expect_success 'peeled tags are stored' '
+	initialize &&
+	test_commit file &&
+	git tag -m "annotated tag" test_tag HEAD &&
+	{
+		print_ref "refs/heads/master" &&
+		print_ref "refs/tags/file" &&
+		print_ref "refs/tags/test_tag" &&
+		print_ref "refs/tags/test_tag^{}"
+	} >expect &&
+	git show-ref -d >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'show-ref works on fresh repo' '
+	initialize &&
+	rm -rf .git &&
+	git init --ref-storage=reftable &&
+	>expect &&
+	! git show-ref > actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'checkout unborn branch' '
+	initialize &&
+	git checkout -b master
+'
+
+
+test_expect_success 'dir/file conflict' '
+	initialize &&
+	test_commit file &&
+	! git branch master/forbidden
+'
+
+
+test_expect_success 'do not clobber existing repo' '
+	rm -rf .git &&
+	git init --ref-storage=files &&
+	cat .git/HEAD > expect &&
+	test_commit file &&
+	(git init --ref-storage=reftable || true) &&
+	cat .git/HEAD > actual &&
+	test_cmp expect actual
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'pseudo refs' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+	git cherry-pick source &&
+	test -f file2
+'
+
+# cherry-pick uses a pseudo ref.
+test_expect_success 'rebase' '
+	initialize &&
+	test_commit message1 file1 &&
+	test_commit message2 file2 &&
+	git branch source &&
+	git checkout HEAD^ &&
+	test_commit message3 file3 &&
+	git rebase source &&
+	test -f file2
+'
+
+test_expect_success 'worktrees' '
+	git init --ref-storage=reftable start &&
+	(cd start && test_commit file1 && git checkout -b branch1 &&
+	git checkout -b branch2 &&
+	git worktree add  ../wt
+	) &&
+	cd wt &&
+	git checkout branch1 &&
+	git branch
+'
+
+test_expect_success 'worktrees 2' '
+	initialize &&
+	test_commit file1 &&
+	mkdir existing_empty &&
+	git worktree add --detach existing_empty master
+'
+
+test_done
+
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 15/21] Read FETCH_HEAD as loose ref
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (13 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 14/21] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 16/21] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
                                                         ` (5 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

FETCH_HEAD is a loose ref (either symref or OID), followed by further
metadata. It can therefore not be read through a ref backend
normally. Special case this in reftable-backend.c.

This functionality is shared between all backends, but the ref backend
interface doesn't have a method to retrieve the repository directory,
hence this functionality must be pushed down to the backend
implementation.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 refs/reftable-backend.c | 31 +++++++++++++++++++++++++++++++
 t/t0031-reftable.sh     |  9 ++++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 93f3d337b6..8d30f3ea61 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1304,6 +1304,33 @@ git_reftable_reflog_expire(struct ref_store *ref_store, const char *refname,
 	return err;
 }
 
+static int read_fetch_head(struct git_reftable_ref_store *refs,
+			   const char *refname, struct object_id *oid,
+			   struct strbuf *referent, unsigned int *type)
+{
+	struct strbuf sb = STRBUF_INIT;
+	struct strbuf path_sb = STRBUF_INIT;
+	int ret = -1;
+	int fd;
+	strbuf_addf(&path_sb, "%s/%s", refs->repo_dir, refname);
+	fd = open(path_sb.buf, O_RDONLY);
+	if (fd < 0) {
+		goto out;
+	}
+	strbuf_reset(&sb);
+	if (strbuf_read(&sb, fd, 256) < 0) {
+		ret = -1;
+	}
+	strbuf_rtrim(&sb);
+
+	ret = parse_loose_ref_contents(sb.buf, oid, referent, type);
+out:
+	close(fd);
+	strbuf_release(&sb);
+	strbuf_release(&path_sb);
+	return ret;
+}
+
 static int git_reftable_read_raw_ref(struct ref_store *ref_store,
 				     const char *refname, struct object_id *oid,
 				     struct strbuf *referent,
@@ -1319,6 +1346,10 @@ static int git_reftable_read_raw_ref(struct ref_store *ref_store,
 		return refs->err;
 	}
 
+	if (!strcmp(refname, "FETCH_HEAD")) {
+		return read_fetch_head(refs, refname, oid, referent, type);
+	}
+
 	/* This is usually not needed, but Git doesn't signal to ref backend if
 	   a subprocess updated the ref DB.  So we always check.
 	*/
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
index a6634bc882..9f90c10030 100755
--- a/t/t0031-reftable.sh
+++ b/t/t0031-reftable.sh
@@ -169,5 +169,12 @@ test_expect_success 'worktrees 2' '
 	git worktree add --detach existing_empty master
 '
 
-test_done
+test_expect_success 'FETCH_HEAD' '
+        initialize &&
+	test_commit one &&
+	(git init sub && cd sub && test_commit two) &&
+	git fetch sub &&
+	git checkout FETCH_HEAD
+'
 
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 16/21] Hookup unittests for the reftable library.
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (14 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 15/21] Read FETCH_HEAD as loose ref Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 17/21] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
                                                         ` (4 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

The unittests are under reftable/*_test.c, so all of the reftable code stays in
one directory. They are called from t/helpers/test-reftable.c in t0031-reftable.sh

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile                 | 19 +++++++++++++++++--
 t/helper/test-reftable.c | 15 +++++++++++++++
 t/helper/test-tool.c     |  1 +
 t/helper/test-tool.h     |  1 +
 t/t0031-reftable.sh      |  5 +++++
 5 files changed, 39 insertions(+), 2 deletions(-)
 create mode 100644 t/helper/test-reftable.c

diff --git a/Makefile b/Makefile
index 6571ed1246..5e69c16507 100644
--- a/Makefile
+++ b/Makefile
@@ -725,6 +725,7 @@ TEST_BUILTINS_OBJS += test-read-cache.o
 TEST_BUILTINS_OBJS += test-read-graph.o
 TEST_BUILTINS_OBJS += test-read-midx.o
 TEST_BUILTINS_OBJS += test-ref-store.o
+TEST_BUILTINS_OBJS += test-reftable.o
 TEST_BUILTINS_OBJS += test-regex.o
 TEST_BUILTINS_OBJS += test-repository.o
 TEST_BUILTINS_OBJS += test-revision-walking.o
@@ -808,6 +809,7 @@ LIB_FILE = libgit.a
 XDIFF_LIB = xdiff/lib.a
 VCSSVN_LIB = vcs-svn/lib.a
 REFTABLE_LIB = reftable/libreftable.a
+REFTABLE_TEST_LIB = reftable/libreftable_test.a
 
 GENERATED_H += config-list.h
 GENERATED_H += command-list.h
@@ -2371,6 +2373,15 @@ REFTABLE_OBJS += reftable/tree.o
 REFTABLE_OBJS += reftable/writer.o
 REFTABLE_OBJS += reftable/zlib-compat.o
 
+REFTABLE_TEST_OBJS += reftable/block_test.o
+REFTABLE_TEST_OBJS += reftable/merged_test.o
+REFTABLE_TEST_OBJS += reftable/record_test.o
+REFTABLE_TEST_OBJS += reftable/refname_test.o
+REFTABLE_TEST_OBJS += reftable/reftable_test.o
+REFTABLE_TEST_OBJS += reftable/strbuf_test.o
+REFTABLE_TEST_OBJS += reftable/stack_test.o
+REFTABLE_TEST_OBJS += reftable/tree_test.o
+REFTABLE_TEST_OBJS += reftable/test_framework.o
 
 TEST_OBJS := $(patsubst %$X,%.o,$(TEST_PROGRAMS)) $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
@@ -2378,6 +2389,7 @@ OBJECTS := $(LIB_OBJS) $(BUILTIN_OBJS) $(PROGRAM_OBJS) $(TEST_OBJS) \
 	$(VCSSVN_OBJS) \
 	$(FUZZ_OBJS) \
 	$(REFTABLE_OBJS) \
+	$(REFTABLE_TEST_OBJS) \
 	common-main.o \
 	git.o
 ifndef NO_CURL
@@ -2521,6 +2533,9 @@ $(VCSSVN_LIB): $(VCSSVN_OBJS)
 $(REFTABLE_LIB): $(REFTABLE_OBJS)
 	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
 
+$(REFTABLE_TEST_LIB): $(REFTABLE_TEST_OBJS)
+	$(QUIET_AR)$(RM) $@ && $(AR) $(ARFLAGS) $@ $^
+
 export DEFAULT_EDITOR DEFAULT_PAGER
 
 Documentation/GIT-EXCLUDED-PROGRAMS: FORCE
@@ -2803,7 +2818,7 @@ t/helper/test-svn-fe$X: $(VCSSVN_LIB)
 
 t/helper/test-tool$X: $(patsubst %,t/helper/%,$(TEST_BUILTINS_OBJS))
 
-t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS)
+t/helper/test-%$X: t/helper/test-%.o GIT-LDFLAGS $(GITLIBS) $(REFTABLE_TEST_LIB)
 	$(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) $(filter %.a,$^) $(LIBS)
 
 check-sha1:: t/helper/test-tool$X
@@ -3136,7 +3151,7 @@ cocciclean:
 clean: profile-clean coverage-clean cocciclean
 	$(RM) *.res
 	$(RM) $(OBJECTS)
-	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB)
+	$(RM) $(LIB_FILE) $(XDIFF_LIB) $(VCSSVN_LIB) $(REFTABLE_LIB) $(REFTABLE_TEST_LIB)
 	$(RM) $(ALL_PROGRAMS) $(SCRIPT_LIB) $(BUILT_INS) git$X
 	$(RM) $(TEST_PROGRAMS)
 	$(RM) $(FUZZ_PROGRAMS)
diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c
new file mode 100644
index 0000000000..def8883439
--- /dev/null
+++ b/t/helper/test-reftable.c
@@ -0,0 +1,15 @@
+#include "reftable/reftable-tests.h"
+#include "test-tool.h"
+
+int cmd__reftable(int argc, const char **argv)
+{
+	block_test_main(argc, argv);
+	merged_test_main(argc, argv);
+	record_test_main(argc, argv);
+	refname_test_main(argc, argv);
+	reftable_test_main(argc, argv);
+	strbuf_test_main(argc, argv);
+	stack_test_main(argc, argv);
+	tree_test_main(argc, argv);
+	return 0;
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 590b2efca7..10366b7b76 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -52,6 +52,7 @@ static struct test_cmd cmds[] = {
 	{ "read-graph", cmd__read_graph },
 	{ "read-midx", cmd__read_midx },
 	{ "ref-store", cmd__ref_store },
+	{ "reftable", cmd__reftable },
 	{ "regex", cmd__regex },
 	{ "repository", cmd__repository },
 	{ "revision-walking", cmd__revision_walking },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index ddc8e990e9..d52ba2f5e5 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -41,6 +41,7 @@ int cmd__read_cache(int argc, const char **argv);
 int cmd__read_graph(int argc, const char **argv);
 int cmd__read_midx(int argc, const char **argv);
 int cmd__ref_store(int argc, const char **argv);
+int cmd__reftable(int argc, const char **argv);
 int cmd__regex(int argc, const char **argv);
 int cmd__repository(int argc, const char **argv);
 int cmd__revision_walking(int argc, const char **argv);
diff --git a/t/t0031-reftable.sh b/t/t0031-reftable.sh
index 9f90c10030..34cc787a61 100755
--- a/t/t0031-reftable.sh
+++ b/t/t0031-reftable.sh
@@ -15,6 +15,11 @@ initialize ()  {
 	mv .git/hooks .git/hooks-disabled
 }
 
+test_expect_success 'unittests' '
+	test-tool reftable
+'
+
+
 test_expect_success 'delete ref' '
 	initialize &&
 	test_commit file &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 17/21] Add GIT_DEBUG_REFS debugging mechanism
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (15 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 16/21] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 18/21] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
                                                         ` (3 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

When set in the environment, GIT_DEBUG_REFS makes git print operations and
results as they flow through the ref storage backend. This helps debug
discrepancies between different ref backends.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile              |   1 +
 builtin/worktree.c    |   4 +-
 refs.c                |   8 +
 refs.h                |   1 +
 refs/debug.c          | 387 ++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h  |   6 +
 t/t0033-debug-refs.sh |  18 ++
 7 files changed, 424 insertions(+), 1 deletion(-)
 create mode 100644 refs/debug.c
 create mode 100755 t/t0033-debug-refs.sh

diff --git a/Makefile b/Makefile
index 5e69c16507..0948a3bb00 100644
--- a/Makefile
+++ b/Makefile
@@ -961,6 +961,7 @@ LIB_OBJS += rebase.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
+LIB_OBJS += refs/debug.o
 LIB_OBJS += refs/files-backend.o
 LIB_OBJS += refs/reftable-backend.o
 LIB_OBJS += refs/iterator.o
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 1935967072..12d2204239 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -403,7 +403,9 @@ static int add_worktree(const char *path, const char *refname,
 	 * worktree.
 	 */
 	strbuf_reset(&sb);
-	if (get_main_ref_store(the_repository)->be == &refs_be_reftable) {
+
+	if (!strcmp(ref_store_backend_name(get_main_ref_store(the_repository)),
+		    "reftable")) {
 		/* XXX this is cut & paste from reftable_init_db. */
 		strbuf_addf(&sb, "%s/HEAD", sb_repo.buf);
 		write_file(sb.buf, "%s", "ref: refs/heads/.invalid\n");
diff --git a/refs.c b/refs.c
index 3789bc1071..b37ccd4c53 100644
--- a/refs.c
+++ b/refs.c
@@ -44,6 +44,11 @@ int ref_storage_backend_exists(const char *name)
 	return find_ref_storage_backend(name) != NULL;
 }
 
+const char *ref_store_backend_name(struct ref_store *refs)
+{
+	return refs->be->name;
+}
+
 /*
  * How to handle various characters in refnames:
  * 0: An acceptable character for refs
@@ -1758,6 +1763,9 @@ struct ref_store *get_main_ref_store(struct repository *r)
 						 r->ref_storage_format :
 						 default_ref_storage(),
 					 REF_STORE_ALL_CAPS);
+	if (getenv("GIT_DEBUG_REFS")) {
+		r->refs_private = debug_wrap(r->gitdir, r->refs_private);
+	}
 	return r->refs_private;
 }
 
diff --git a/refs.h b/refs.h
index 581ec270dc..f6d619ea4b 100644
--- a/refs.h
+++ b/refs.h
@@ -825,6 +825,7 @@ int reflog_expire(const char *refname, const struct object_id *oid,
 int ref_storage_backend_exists(const char *name);
 
 struct ref_store *get_main_ref_store(struct repository *r);
+const char *ref_store_backend_name(struct ref_store *refs);
 
 /**
  * Submodules
diff --git a/refs/debug.c b/refs/debug.c
new file mode 100644
index 0000000000..865c5ec873
--- /dev/null
+++ b/refs/debug.c
@@ -0,0 +1,387 @@
+
+#include "refs-internal.h"
+
+struct debug_ref_store {
+	struct ref_store base;
+	struct ref_store *refs;
+};
+
+extern struct ref_storage_be refs_be_debug;
+struct ref_store *debug_wrap(const char *gitdir, struct ref_store *store);
+
+struct ref_store *debug_wrap(const char *gitdir, struct ref_store *store)
+{
+	struct debug_ref_store *res = malloc(sizeof(struct debug_ref_store));
+	struct ref_storage_be *be_copy = malloc(sizeof(*be_copy));
+	*be_copy = refs_be_debug;
+	be_copy->name = ref_store_backend_name(store);
+	fprintf(stderr, "ref_store for %s\n", gitdir);
+	res->refs = store;
+	base_ref_store_init((struct ref_store *)res, be_copy);
+	return (struct ref_store *)res;
+}
+
+static int debug_init_db(struct ref_store *refs, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res = drefs->refs->be->init_db(drefs->refs, err);
+	fprintf(stderr, "init_db: %d\n", res);
+	return res;
+}
+
+static int debug_transaction_prepare(struct ref_store *refs,
+				     struct ref_transaction *transaction,
+				     struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_prepare(drefs->refs, transaction,
+						   err);
+	fprintf(stderr, "transaction_prepare: %d\n", res);
+	return res;
+}
+
+static void print_update(int i, const char *refname,
+			 const struct object_id *old_oid,
+			 const struct object_id *new_oid, unsigned int flags,
+			 unsigned int type, const char *msg)
+{
+	char o[200] = "null";
+	char n[200] = "null";
+	if (old_oid)
+		oid_to_hex_r(o, old_oid);
+	if (new_oid)
+		oid_to_hex_r(n, new_oid);
+
+	type &= 0xf; /* see refs.h REF_* */
+	flags &= REF_HAVE_NEW | REF_HAVE_OLD | REF_NO_DEREF |
+		 REF_FORCE_CREATE_REFLOG | REF_LOG_ONLY;
+	fprintf(stderr, "%d: %s %s -> %s (F=0x%x, T=0x%x) \"%s\"\n", i, refname,
+		o, n, flags, type, msg);
+}
+
+static void print_transaction(struct ref_transaction *transaction)
+{
+	fprintf(stderr, "transaction {\n");
+	for (int i = 0; i < transaction->nr; i++) {
+		struct ref_update *u = transaction->updates[i];
+		print_update(i, u->refname, &u->old_oid, &u->new_oid, u->flags,
+			     u->type, u->msg);
+	}
+	fprintf(stderr, "}\n");
+}
+
+static int debug_transaction_finish(struct ref_store *refs,
+				    struct ref_transaction *transaction,
+				    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_finish(drefs->refs, transaction,
+						  err);
+	print_transaction(transaction);
+	fprintf(stderr, "finish: %d\n", res);
+	return res;
+}
+
+static int debug_transaction_abort(struct ref_store *refs,
+				   struct ref_transaction *transaction,
+				   struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->transaction_abort(drefs->refs, transaction, err);
+	return res;
+}
+
+static int debug_initial_transaction_commit(struct ref_store *refs,
+					    struct ref_transaction *transaction,
+					    struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)refs;
+	int res;
+	transaction->ref_store = drefs->refs;
+	res = drefs->refs->be->initial_transaction_commit(drefs->refs,
+							  transaction, err);
+	return res;
+}
+
+static int debug_pack_refs(struct ref_store *ref_store, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->pack_refs(drefs->refs, flags);
+	fprintf(stderr, "pack_refs: %d\n", res);
+	return res;
+}
+
+static int debug_create_symref(struct ref_store *ref_store,
+			       const char *ref_name, const char *target,
+			       const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_symref(drefs->refs, ref_name, target,
+						 logmsg);
+	fprintf(stderr, "create_symref: %s -> %s \"%s\": %d\n", ref_name,
+		target, logmsg, res);
+	return res;
+}
+
+static int debug_delete_refs(struct ref_store *ref_store, const char *msg,
+			     struct string_list *refnames, unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->delete_refs(drefs->refs, msg, refnames, flags);
+	int i = 0;
+	fprintf(stderr, "delete_refs {\n");
+	for (i = 0; i < refnames->nr; i++)
+		fprintf(stderr, "%s\n", refnames->items[i].string);
+	fprintf(stderr, "}: %d\n", res);
+	return res;
+}
+
+static int debug_rename_ref(struct ref_store *ref_store, const char *oldref,
+			    const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->rename_ref(drefs->refs, oldref, newref,
+					      logmsg);
+	fprintf(stderr, "rename_ref: %s -> %s \"%s\": %d\n", oldref, newref,
+		logmsg, res);
+	return res;
+}
+
+static int debug_copy_ref(struct ref_store *ref_store, const char *oldref,
+			  const char *newref, const char *logmsg)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res =
+		drefs->refs->be->copy_ref(drefs->refs, oldref, newref, logmsg);
+	fprintf(stderr, "copy_ref: %s -> %s \"%s\": %d\n", oldref, newref,
+		logmsg, res);
+	return res;
+}
+
+struct debug_ref_iterator {
+	struct ref_iterator base;
+	struct ref_iterator *iter;
+};
+
+static int debug_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->advance(diter->iter);
+	if (res)
+		fprintf(stderr, "iterator_advance: %d\n", res);
+	else
+		fprintf(stderr, "iterator_advance before: %s\n",
+			diter->iter->refname);
+
+	diter->base.ordered = diter->iter->ordered;
+	diter->base.refname = diter->iter->refname;
+	diter->base.oid = diter->iter->oid;
+	diter->base.flags = diter->iter->flags;
+	return res;
+}
+static int debug_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				   struct object_id *peeled)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->peel(diter->iter, peeled);
+	fprintf(stderr, "iterator_peel: %s: %d\n", diter->iter->refname, res);
+	return res;
+}
+
+static int debug_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct debug_ref_iterator *diter =
+		(struct debug_ref_iterator *)ref_iterator;
+	int res = diter->iter->vtable->abort(diter->iter);
+	fprintf(stderr, "iterator_abort: %d\n", res);
+	return res;
+}
+
+static struct ref_iterator_vtable debug_ref_iterator_vtable = {
+	debug_ref_iterator_advance, debug_ref_iterator_peel,
+	debug_ref_iterator_abort
+};
+
+static struct ref_iterator *
+debug_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
+			 unsigned int flags)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->iterator_begin(drefs->refs, prefix, flags);
+	struct debug_ref_iterator *diter = xcalloc(1, sizeof(*diter));
+	base_ref_iterator_init(&diter->base, &debug_ref_iterator_vtable, 1);
+	diter->iter = res;
+	fprintf(stderr, "ref_iterator_begin: %s (0x%x)\n", prefix, flags);
+	return &diter->base;
+}
+
+static int debug_read_raw_ref(struct ref_store *ref_store, const char *refname,
+			      struct object_id *oid, struct strbuf *referent,
+			      unsigned int *type)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = 0;
+
+	oidcpy(oid, &null_oid);
+	res = drefs->refs->be->read_raw_ref(drefs->refs, refname, oid, referent,
+					    type);
+
+	if (res == 0) {
+		fprintf(stderr, "read_raw_ref: %s: %s (=> %s) type %x: %d\n",
+			refname, oid_to_hex(oid), referent->buf, *type, res);
+	} else {
+		fprintf(stderr, "read_raw_ref: %s err %d\n", refname, res);
+	}
+	return res;
+}
+
+static struct ref_iterator *
+debug_reflog_iterator_begin(struct ref_store *ref_store)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct ref_iterator *res =
+		drefs->refs->be->reflog_iterator_begin(drefs->refs);
+	fprintf(stderr, "for_each_reflog_iterator_begin\n");
+	return res;
+}
+
+struct debug_reflog {
+	const char *refname;
+	each_reflog_ent_fn *fn;
+	void *cb_data;
+};
+
+static int debug_print_reflog_ent(struct object_id *old_oid,
+				  struct object_id *new_oid,
+				  const char *committer, timestamp_t timestamp,
+				  int tz, const char *msg, void *cb_data)
+{
+	struct debug_reflog *dbg = (struct debug_reflog *)cb_data;
+	int ret;
+	char o[100] = "null";
+	char n[100] = "null";
+	if (old_oid)
+		oid_to_hex_r(o, old_oid);
+	if (new_oid)
+		oid_to_hex_r(n, new_oid);
+
+	ret = dbg->fn(old_oid, new_oid, committer, timestamp, tz, msg,
+		      dbg->cb_data);
+	fprintf(stderr, "reflog_ent %s (ret %d): %s -> %s, %s %ld \"%s\"\n",
+		dbg->refname, ret, o, n, committer, (long int)timestamp, msg);
+	return ret;
+}
+
+static int debug_for_each_reflog_ent(struct ref_store *ref_store,
+				     const char *refname, each_reflog_ent_fn fn,
+				     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct debug_reflog dbg = {
+		.refname = refname,
+		.fn = fn,
+		.cb_data = cb_data,
+	};
+
+	int res = drefs->refs->be->for_each_reflog_ent(
+		drefs->refs, refname, &debug_print_reflog_ent, &dbg);
+	fprintf(stderr, "for_each_reflog: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_for_each_reflog_ent_reverse(struct ref_store *ref_store,
+					     const char *refname,
+					     each_reflog_ent_fn fn,
+					     void *cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	struct debug_reflog dbg = {
+		.refname = refname,
+		.fn = fn,
+		.cb_data = cb_data,
+	};
+	int res = drefs->refs->be->for_each_reflog_ent_reverse(
+		drefs->refs, refname, &debug_print_reflog_ent, &dbg);
+	fprintf(stderr, "for_each_reflog_reverse: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_reflog_exists(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_exists(drefs->refs, refname);
+	fprintf(stderr, "reflog_exists: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_create_reflog(struct ref_store *ref_store, const char *refname,
+			       int force_create, struct strbuf *err)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->create_reflog(drefs->refs, refname,
+						 force_create, err);
+	fprintf(stderr, "create_reflog: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_delete_reflog(struct ref_store *ref_store, const char *refname)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->delete_reflog(drefs->refs, refname);
+	fprintf(stderr, "delete_reflog: %s: %d\n", refname, res);
+	return res;
+}
+
+static int debug_reflog_expire(struct ref_store *ref_store, const char *refname,
+			       const struct object_id *oid, unsigned int flags,
+			       reflog_expiry_prepare_fn prepare_fn,
+			       reflog_expiry_should_prune_fn should_prune_fn,
+			       reflog_expiry_cleanup_fn cleanup_fn,
+			       void *policy_cb_data)
+{
+	struct debug_ref_store *drefs = (struct debug_ref_store *)ref_store;
+	int res = drefs->refs->be->reflog_expire(drefs->refs, refname, oid,
+						 flags, prepare_fn,
+						 should_prune_fn, cleanup_fn,
+						 policy_cb_data);
+	fprintf(stderr, "reflog_expire: %s: %d\n", refname, res);
+	return res;
+}
+
+struct ref_storage_be refs_be_debug = {
+	NULL,
+	"debug",
+	NULL,
+	debug_init_db,
+	debug_transaction_prepare,
+	debug_transaction_finish,
+	debug_transaction_abort,
+	debug_initial_transaction_commit,
+
+	debug_pack_refs,
+	debug_create_symref,
+	debug_delete_refs,
+	debug_rename_ref,
+	debug_copy_ref,
+
+	debug_ref_iterator_begin,
+	debug_read_raw_ref,
+
+	debug_reflog_iterator_begin,
+	debug_for_each_reflog_ent,
+	debug_for_each_reflog_ent_reverse,
+	debug_reflog_exists,
+	debug_create_reflog,
+	debug_delete_reflog,
+	debug_reflog_expire,
+};
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index b9b5c2e87c..e1da4b3a53 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -695,4 +695,10 @@ int parse_loose_ref_contents(const char *buf, struct object_id *oid,
 void base_ref_store_init(struct ref_store *refs,
 			 const struct ref_storage_be *be);
 
+/*
+ * Print out ref operations as they occur. Useful for debugging alternate ref
+ * backends.
+ */
+struct ref_store *debug_wrap(const char *gitdir, struct ref_store *store);
+
 #endif /* REFS_REFS_INTERNAL_H */
diff --git a/t/t0033-debug-refs.sh b/t/t0033-debug-refs.sh
new file mode 100755
index 0000000000..ed76c2c84a
--- /dev/null
+++ b/t/t0033-debug-refs.sh
@@ -0,0 +1,18 @@
+#!/bin/sh
+#
+# Copyright (c) 2020 Google LLC
+#
+
+test_description='cross-check reftable with files, using GIT_DEBUG_REFS output'
+
+. ./test-lib.sh
+
+test_expect_success 'GIT_DEBUG_REFS' '
+	git init --ref-storage=files files &&
+	git init --ref-storage=reftable reftable &&
+	(cd files && GIT_DEBUG_REFS=1 test_commit message file) > files.txt &&
+	(cd reftable && GIT_DEBUG_REFS=1 test_commit message file) > reftable.txt &&
+	test_cmp files.txt reftable.txt
+'
+
+test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 18/21] vcxproj: adjust for the reftable changes
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (16 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 17/21] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Johannes Schindelin via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 19/21] git-prompt: prepare for reftable refs backend SZEDER Gábor via GitGitGadget
                                                         ` (2 subsequent siblings)
  20 siblings, 0 replies; 409+ messages in thread
From: Johannes Schindelin via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Johannes Schindelin

From: Johannes Schindelin <johannes.schindelin@gmx.de>

This allows Git to be compiled via Visual Studio again after integrating
the `hn/reftable` branch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 config.mak.uname                           |  2 +-
 contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/config.mak.uname b/config.mak.uname
index c7eba69e54..ae4e25a1a4 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -709,7 +709,7 @@ vcxproj:
 	# Make .vcxproj files and add them
 	unset QUIET_GEN QUIET_BUILT_IN; \
 	perl contrib/buildsystems/generate -g Vcxproj
-	git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
+	git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
 
 	# Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
 	(echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
index 5c666f9ac0..33a08d3165 100644
--- a/contrib/buildsystems/Generators/Vcxproj.pm
+++ b/contrib/buildsystems/Generators/Vcxproj.pm
@@ -77,7 +77,7 @@ sub createProject {
     my $libs_release = "\n    ";
     my $libs_debug = "\n    ";
     if (!$static_library) {
-      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
+      $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
       $libs_debug = $libs_release;
       $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
       $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
@@ -231,6 +231,7 @@ sub createProject {
 EOM
     if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
       my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
+      my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
       my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
 
       print F << "EOM";
@@ -240,6 +241,14 @@ sub createProject {
       <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
     </ProjectReference>
 EOM
+      if (!($name =~ /xdiff|libreftable/)) {
+        print F << "EOM";
+    <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
+      <Project>$uuid_libreftable</Project>
+      <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
+    </ProjectReference>
+EOM
+      }
       if (!($name =~ 'xdiff')) {
         print F << "EOM";
     <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 19/21] git-prompt: prepare for reftable refs backend
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (17 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 18/21] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
@ 2020-07-31 15:27                                       ` SZEDER Gábor via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 20/21] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 21/21] Add "test-tool dump-reftable" command Han-Wen Nienhuys via GitGitGadget
  20 siblings, 0 replies; 409+ messages in thread
From: SZEDER Gábor via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, SZEDER Gábor

From: =?UTF-8?q?SZEDER=20G=C3=A1bor?= <szeder.dev@gmail.com>

In our git-prompt script we strive to use Bash builtins wherever
possible, because fork()-ing subshells for command substitutions and
fork()+exec()-ing Git commands are expensive on some platforms.  We
even read and parse '.git/HEAD' using Bash builtins to get the name of
the current branch [1].  However, the upcoming reftable refs backend
won't use '.git/HEAD' at all, but will write an invalid refname as
placeholder for backwards compatibility instead, which will break our
git-prompt script.

Update the git-prompt script to recognize the placeholder '.git/HEAD'
written by the reftable backend (its content is specified in the
reftable specs), and then fall back to use 'git symbolic-ref' to get
the name of the current branch.

[1] 3a43c4b5bd (bash prompt: use bash builtins to find out current
    branch, 2011-03-31)

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
---
 contrib/completion/git-prompt.sh | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/contrib/completion/git-prompt.sh b/contrib/completion/git-prompt.sh
index 16260bab73..5086bbdbfe 100644
--- a/contrib/completion/git-prompt.sh
+++ b/contrib/completion/git-prompt.sh
@@ -476,10 +476,15 @@ __git_ps1 ()
 			if ! __git_eread "$g/HEAD" head; then
 				return $exit
 			fi
-			# is it a symbolic ref?
 			b="${head#ref: }"
 			if [ "$head" = "$b" ]; then
 				detached=yes
+			elif [ "$b" = "refs/heads/.invalid" ]; then
+				# Reftable
+				b="$(git symbolic-ref HEAD 2>/dev/null)" ||
+				detached=yes
+			fi
+			if [ "$detached" = yes ]; then
 				b="$(
 				case "${GIT_PS1_DESCRIBE_STYLE-}" in
 				(contains)
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 20/21] Add reftable testing infrastructure
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (18 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 19/21] git-prompt: prepare for reftable refs backend SZEDER Gábor via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  2020-07-31 15:27                                       ` [PATCH v20 21/21] Add "test-tool dump-reftable" command Han-Wen Nienhuys via GitGitGadget
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

* Add GIT_TEST_REFTABLE environment var to control default ref storage

* Add test_prerequisite REFTABLE.

* Skip some tests that are incompatible:

  * t3210-pack-refs.sh - does not apply
  * t1450-fsck.sh - manipulates .git/ directly to create invalid state

Major test failures:

 * t1400-update-ref.sh - Reads from .git/{refs,logs} directly
 * t1404-update-ref-errors.sh - Manipulates .git/refs/ directly
 * t1405 - inspecs .git/ directly.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 t/t1409-avoid-packing-refs.sh | 6 ++++++
 t/t1450-fsck.sh               | 6 ++++++
 t/t3210-pack-refs.sh          | 6 ++++++
 t/test-lib.sh                 | 5 +++++
 4 files changed, 23 insertions(+)

diff --git a/t/t1409-avoid-packing-refs.sh b/t/t1409-avoid-packing-refs.sh
index be12fb6350..c6f7832556 100755
--- a/t/t1409-avoid-packing-refs.sh
+++ b/t/t1409-avoid-packing-refs.sh
@@ -4,6 +4,12 @@ test_description='avoid rewriting packed-refs unnecessarily'
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 # Add an identifying mark to the packed-refs file header line. This
 # shouldn't upset readers, and it should be omitted if the file is
 # ever rewritten.
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 344a2aad82..0966920324 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -8,6 +8,12 @@ test_description='git fsck random collection of tests
 
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success setup '
 	test_oid_init &&
 	git config gc.auto 0 &&
diff --git a/t/t3210-pack-refs.sh b/t/t3210-pack-refs.sh
index f41b2afb99..edaef2c175 100755
--- a/t/t3210-pack-refs.sh
+++ b/t/t3210-pack-refs.sh
@@ -11,6 +11,12 @@ semantic is still the same.
 '
 . ./test-lib.sh
 
+if test_have_prereq REFTABLE
+then
+  skip_all='skipping pack-refs tests; incompatible with reftable'
+  test_done
+fi
+
 test_expect_success 'enable reflogs' '
 	git config core.logallrefupdates true
 '
diff --git a/t/test-lib.sh b/t/test-lib.sh
index ba224c86f5..7a341c7bc5 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1508,6 +1508,11 @@ parisc* | hppa*)
 	;;
 esac
 
+if test -n "$GIT_TEST_REFTABLE"
+then
+  test_set_prereq REFTABLE
+fi
+
 ( COLUMNS=1 && test $COLUMNS = 1 ) && test_set_prereq COLUMNS_CAN_BE_1
 test -z "$NO_PERL" && test_set_prereq PERL
 test -z "$NO_PTHREADS" && test_set_prereq PTHREADS
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 409+ messages in thread

* [PATCH v20 21/21] Add "test-tool dump-reftable" command.
  2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
                                                         ` (19 preceding siblings ...)
  2020-07-31 15:27                                       ` [PATCH v20 20/21] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
@ 2020-07-31 15:27                                       ` Han-Wen Nienhuys via GitGitGadget
  20 siblings, 0 replies; 409+ messages in thread
From: Han-Wen Nienhuys via GitGitGadget @ 2020-07-31 15:27 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Han-Wen Nienhuys

From: Han-Wen Nienhuys <hanwen@google.com>

This command dumps individual tables or a stack of of tables.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
---
 Makefile                 | 1 +
 t/helper/test-reftable.c | 5 +++++
 t/helper/test-tool.c     | 1 +
 t/helper/test-tool.h     | 1 +
 4 files changed, 8 insertions(+)

diff --git a/Makefile b/Makefile
index 0948a3bb00..c22fd41662 100644
--- a/Makefile
+++ b/Makefile
@@ -2375,6 +2375,7 @@ REFTABLE_OBJS += reftable/writer.o
 REFTABLE_OBJS += reftable/zlib-compat.o
 
 REFTABLE_TEST_OBJS += reftable/block_test.o
+REFTABLE_TEST_OBJS += reftable/dump.o
 REFTABLE_TEST_OBJS += reftable/merged_test.o
 REFTABLE_TEST_OBJS += reftable/record_test.o
 REFTABLE_TEST_OBJS += reftable/refname_test.o
diff --git a/t/helper/test-reftable.c b/t/helper/test-reftable.c
index def8883439..aff4fbccda 100644
--- a/t/helper/test-reftable.c
+++ b/t/helper/test-reftable.c
@@ -13,3 +13,8 @@ int cmd__reftable(int argc, const char **argv)
 	tree_test_main(argc, argv);
 	return 0;
 }
+
+int cmd__dump_reftable(int argc, const char **argv)
+{
+	return reftable_dump_main(argc, (char *const *)argv);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 10366b7b76..9e689f9d2b 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -53,6 +53,7 @@ static struct test_cmd cmds[] = {
 	{ "read-midx", cmd__read_midx },
 	{ "ref-store", cmd__ref_store },
 	{ "reftable", cmd__reftable },
+	{ "dump-reftable", cmd__dump_reftable },
 	{ "regex", cmd__regex },
 	{ "repository", cmd__repository },
 	{ "revision-walking", cmd__revision_walking },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index d52ba2f5e5..bf833e01d4 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -17,6 +17,7 @@ int cmd__dump_cache_tree(int argc, const char **argv);
 int cmd__dump_fsmonitor(int argc, const char **argv);
 int cmd__dump_split_index(int argc, const char **argv);
 int cmd__dump_untracked_cache(int argc, const char **argv);
+int cmd__dump_reftable(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__genzeros(int argc, const char **argv);
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 409+ messages in thread

end of thread, other threads:[~2020-07-31 15:28 UTC | newest]

Thread overview: 409+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-23 19:41 [PATCH 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
2020-01-23 19:41 ` [PATCH 1/5] setup.c: enable repo detection for reftable Han-Wen Nienhuys via GitGitGadget
2020-01-23 19:41 ` [PATCH 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
2020-01-23 19:41 ` [PATCH 3/5] Document how ref iterators and symrefs interact Han-Wen Nienhuys via GitGitGadget
2020-01-23 19:41 ` [PATCH 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-01-23 19:41 ` [PATCH 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-01-23 21:44 ` [PATCH 0/5] Reftable support git-core Junio C Hamano
2020-01-27 13:52   ` Han-Wen Nienhuys
2020-01-27 13:57     ` Han-Wen Nienhuys
2020-01-23 22:45 ` Stephan Beyer
2020-01-27 13:57   ` Han-Wen Nienhuys
2020-01-27 14:22 ` [PATCH v2 " Han-Wen Nienhuys via GitGitGadget
2020-01-27 14:22   ` [PATCH v2 1/5] setup.c: enable repo detection for reftable Han-Wen Nienhuys via GitGitGadget
2020-01-27 14:22   ` [PATCH v2 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
2020-01-27 22:28     ` Junio C Hamano
2020-01-28 15:58       ` Han-Wen Nienhuys
2020-01-30  4:19         ` Junio C Hamano
2020-01-27 14:22   ` [PATCH v2 3/5] Document how ref iterators and symrefs interact Han-Wen Nienhuys via GitGitGadget
2020-01-27 22:53     ` Junio C Hamano
2020-01-28 16:07       ` Han-Wen Nienhuys
2020-01-28 19:35         ` Junio C Hamano
2020-01-27 14:22   ` [PATCH v2 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-01-27 14:22   ` [PATCH v2 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-01-28  7:31     ` Jeff King
2020-01-28 15:36       ` Martin Fick
2020-01-29  8:12         ` Jeff King
2020-01-29 16:49           ` Martin Fick
2020-01-29 18:40             ` Han-Wen Nienhuys
2020-01-29 19:47               ` Martin Fick
2020-01-29 19:50                 ` Han-Wen Nienhuys
2020-01-30  7:21                   ` Jeff King
2020-02-03 16:39                     ` Han-Wen Nienhuys
2020-02-03 17:05                       ` Jeff King
2020-02-03 17:09                         ` Han-Wen Nienhuys
2020-02-04 18:54                         ` Han-Wen Nienhuys
2020-02-04 20:06                           ` Jeff King
2020-02-04 20:26                             ` Han-Wen Nienhuys
2020-01-29 18:34           ` Junio C Hamano
2020-01-28 15:56       ` Han-Wen Nienhuys
2020-01-29 10:47         ` Jeff King
2020-01-29 18:43           ` Junio C Hamano
2020-01-29 18:53             ` Han-Wen Nienhuys
2020-01-30  7:26             ` Jeff King
2020-02-04 19:06           ` Han-Wen Nienhuys
2020-02-04 19:54             ` Jeff King
2020-02-04 20:22       ` Han-Wen Nienhuys
2020-02-04 22:13         ` Jeff King
2020-02-04 20:27   ` [PATCH v3 0/6] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
2020-02-04 20:27     ` [PATCH v3 1/6] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
2020-02-04 20:27     ` [PATCH v3 2/6] setup.c: enable repo detection for reftable Han-Wen Nienhuys via GitGitGadget
2020-02-04 20:31       ` Han-Wen Nienhuys
2020-02-04 20:27     ` [PATCH v3 3/6] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
2020-02-04 21:29       ` Junio C Hamano
2020-02-05 11:34         ` Han-Wen Nienhuys
2020-02-05 11:42       ` SZEDER Gábor
2020-02-05 12:24         ` Jeff King
2020-02-04 20:27     ` [PATCH v3 4/6] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
2020-02-04 20:27     ` [PATCH v3 5/6] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-02-04 20:27     ` [PATCH v3 6/6] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-02-06 22:55     ` [PATCH v4 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
2020-02-06 22:55       ` [PATCH v4 1/5] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
2020-02-06 22:55       ` [PATCH v4 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
2020-02-06 22:55       ` [PATCH v4 3/5] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
2020-02-06 22:55       ` [PATCH v4 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-02-06 23:07         ` Junio C Hamano
2020-02-07  0:16         ` brian m. carlson
2020-02-10 13:16           ` Han-Wen Nienhuys
2020-02-11  0:05             ` brian m. carlson
2020-02-11 14:20               ` Han-Wen Nienhuys
2020-02-11 16:31                 ` Junio C Hamano
2020-02-11 16:40                   ` Han-Wen Nienhuys
2020-02-11 23:40                     ` brian m. carlson
2020-02-18  9:25                       ` Han-Wen Nienhuys
2020-02-11 16:46                   ` Han-Wen Nienhuys
2020-02-20 17:20                     ` Jonathan Nieder
2020-02-06 22:55       ` [PATCH v4 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-02-06 23:49         ` brian m. carlson
2020-02-10 13:18           ` Han-Wen Nienhuys
2020-02-06 23:31       ` [PATCH v4 0/5] Reftable support git-core brian m. carlson
2020-02-10 14:14       ` [PATCH v5 " Han-Wen Nienhuys via GitGitGadget
2020-02-10 14:14         ` [PATCH v5 1/5] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
2020-02-10 14:14         ` [PATCH v5 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
2020-02-10 14:14         ` [PATCH v5 3/5] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
2020-02-10 14:14         ` [PATCH v5 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-02-10 14:14         ` [PATCH v5 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-02-18  8:43         ` [PATCH v6 0/5] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
2020-02-18  8:43           ` [PATCH v6 1/5] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
2020-02-18  8:43           ` [PATCH v6 2/5] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
2020-02-18  8:43           ` [PATCH v6 3/5] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
2020-02-18  8:43           ` [PATCH v6 4/5] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-02-18 21:11             ` Junio C Hamano
2020-02-19  6:55               ` Jeff King
2020-02-19 17:00                 ` Han-Wen Nienhuys
2020-02-18  8:43           ` [PATCH v6 5/5] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-02-18 21:05           ` [PATCH v6 0/5] Reftable support git-core Junio C Hamano
2020-02-19 16:59             ` Han-Wen Nienhuys
2020-02-19 17:02               ` Junio C Hamano
2020-02-19 17:21                 ` Han-Wen Nienhuys
2020-02-19 18:10                   ` Junio C Hamano
2020-02-19 19:14                     ` Han-Wen Nienhuys
2020-02-19 20:09                       ` Junio C Hamano
2020-02-20 11:19                     ` Jeff King
2020-02-21  6:40           ` Jonathan Nieder
2020-02-26 17:16             ` Han-Wen Nienhuys
2020-02-26 20:04               ` Junio C Hamano
2020-02-27  0:01               ` brian m. carlson
     [not found]             ` <CAFQ2z_NQn9O3kFmHk8Cr31FY66ToU4bUdE=asHUfN++zBG+SPw@mail.gmail.com>
2020-02-26 17:41               ` Jonathan Nieder
2020-02-26 17:54                 ` Han-Wen Nienhuys
2020-02-26  8:49           ` [PATCH v7 0/6] " Han-Wen Nienhuys via GitGitGadget
2020-02-26  8:49             ` [PATCH v7 1/6] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
2020-02-26  8:49             ` [PATCH v7 2/6] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
2020-02-26  8:49             ` [PATCH v7 3/6] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
2020-02-26  8:49             ` [PATCH v7 4/6] reftable: file format documentation Jonathan Nieder via GitGitGadget
2020-02-26  8:49             ` [PATCH v7 5/6] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-02-26  8:49             ` [PATCH v7 6/6] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-02-26 18:12               ` Junio C Hamano
2020-02-26 18:59                 ` Han-Wen Nienhuys
2020-02-26 19:59                   ` Junio C Hamano
2020-02-27 16:03                     ` Han-Wen Nienhuys
2020-02-27 16:23                       ` Junio C Hamano
2020-02-27 17:56                         ` Han-Wen Nienhuys
2020-02-26 21:31               ` Junio C Hamano
2020-02-27 16:01                 ` Han-Wen Nienhuys
2020-02-27 16:26                   ` Junio C Hamano
2020-02-26 17:35             ` [PATCH v7 0/6] Reftable support git-core Junio C Hamano
2020-03-24  6:06             ` Jonathan Nieder
2020-04-01 11:28             ` [PATCH v8 0/9] " Han-Wen Nienhuys via GitGitGadget
2020-04-01 11:28               ` [PATCH v8 1/9] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
2020-04-01 11:28               ` [PATCH v8 2/9] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
2020-04-01 11:28               ` [PATCH v8 3/9] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
2020-04-01 11:28               ` [PATCH v8 4/9] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-04-01 11:28               ` [PATCH v8 5/9] reftable: file format documentation Jonathan Nieder via GitGitGadget
2020-04-01 11:28               ` [PATCH v8 6/9] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
2020-04-01 11:28               ` [PATCH v8 7/9] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
2020-04-01 11:28               ` [PATCH v8 8/9] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-04-01 11:28               ` [PATCH v8 9/9] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-04-15 23:29               ` [PATCH v8 0/9] Reftable support git-core Junio C Hamano
2020-04-18  3:22                 ` Danh Doan
2020-04-20 21:14               ` [PATCH v9 00/10] " Han-Wen Nienhuys via GitGitGadget
2020-04-20 21:14                 ` [PATCH v9 01/10] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
2020-04-20 21:14                 ` [PATCH v9 02/10] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
2020-04-20 21:14                 ` [PATCH v9 03/10] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
2020-04-20 21:14                 ` [PATCH v9 04/10] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
2020-04-20 21:14                 ` [PATCH v9 05/10] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-04-20 21:14                 ` [PATCH v9 06/10] reftable: file format documentation Jonathan Nieder via GitGitGadget
2020-04-20 21:14                 ` [PATCH v9 07/10] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
2020-04-20 21:14                 ` [PATCH v9 08/10] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
2020-04-20 21:14                 ` [PATCH v9 09/10] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-04-20 22:06                   ` Junio C Hamano
2020-04-21 19:04                     ` Han-Wen Nienhuys
2020-04-22 17:35                     ` Johannes Schindelin
2020-04-20 21:14                 ` [PATCH v9 10/10] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-04-21 20:13                 ` [PATCH v9 00/10] Reftable support git-core Junio C Hamano
2020-04-23 21:27                   ` Han-Wen Nienhuys
2020-04-23 21:43                     ` Junio C Hamano
2020-04-23 21:52                       ` Junio C Hamano
2020-04-25 13:58                         ` Johannes Schindelin
2020-04-27 20:13                 ` [PATCH v10 00/12] " Han-Wen Nienhuys via GitGitGadget
2020-04-27 20:13                   ` [PATCH v10 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
2020-04-27 20:13                   ` [PATCH v10 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
2020-04-30 21:17                     ` Emily Shaffer
2020-05-04 18:03                       ` Han-Wen Nienhuys
2020-05-05 18:26                         ` Pseudo ref handling (was Re: [PATCH v10 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref) Han-Wen Nienhuys
2020-04-27 20:13                   ` [PATCH v10 03/12] create .git/refs in files-backend.c Han-Wen Nienhuys via GitGitGadget
2020-04-30 21:24                     ` Emily Shaffer
2020-04-30 21:49                       ` Junio C Hamano
2020-05-04 18:10                       ` Han-Wen Nienhuys
2020-04-27 20:13                   ` [PATCH v10 04/12] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
2020-04-27 20:13                   ` [PATCH v10 05/12] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-04-27 20:13                   ` [PATCH v10 06/12] reftable: file format documentation Jonathan Nieder via GitGitGadget
2020-04-27 20:13                   ` [PATCH v10 07/12] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
2020-04-27 20:13                   ` [PATCH v10 08/12] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
2020-04-27 20:13                   ` [PATCH v10 09/12] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-04-28 14:55                     ` Danh Doan
2020-04-28 15:29                       ` Junio C Hamano
2020-04-28 15:31                         ` Junio C Hamano
2020-04-28 20:21                       ` Han-Wen Nienhuys
2020-04-28 20:23                         ` Han-Wen Nienhuys
2020-04-27 20:13                   ` [PATCH v10 10/12] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-04-27 20:13                   ` [PATCH v10 11/12] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
2020-04-27 20:13                   ` [PATCH v10 12/12] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
2020-05-04 19:03                   ` [PATCH v11 00/12] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
2020-05-04 19:03                     ` [PATCH v11 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
2020-05-04 19:03                     ` [PATCH v11 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
2020-05-04 19:03                     ` [PATCH v11 03/12] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
2020-05-04 19:03                     ` [PATCH v11 04/12] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-05-04 19:03                     ` [PATCH v11 05/12] reftable: file format documentation Jonathan Nieder via GitGitGadget
2020-05-04 19:03                     ` [PATCH v11 06/12] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
2020-05-04 19:03                     ` [PATCH v11 07/12] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
2020-05-04 19:03                     ` [PATCH v11 08/12] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-05-04 19:03                     ` [PATCH v11 09/12] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-05-04 19:03                     ` [PATCH v11 10/12] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
2020-05-04 19:03                     ` [PATCH v11 11/12] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
2020-05-04 19:03                     ` [PATCH v11 12/12] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
2020-05-06  4:29                     ` [PATCH v11 00/12] Reftable support git-core Junio C Hamano
2020-05-07  9:59                     ` [PATCH v12 " Han-Wen Nienhuys via GitGitGadget
2020-05-07  9:59                       ` [PATCH v12 01/12] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
2020-05-08 18:54                         ` Junio C Hamano
2020-05-07  9:59                       ` [PATCH v12 02/12] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
2020-05-08 18:54                         ` Junio C Hamano
2020-05-11 11:41                           ` Han-Wen Nienhuys
2020-05-07  9:59                       ` [PATCH v12 03/12] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
2020-05-08 18:58                         ` Junio C Hamano
2020-05-11 11:42                           ` Han-Wen Nienhuys
2020-05-11 14:49                             ` Junio C Hamano
2020-05-11 15:11                               ` Han-Wen Nienhuys
2020-05-07  9:59                       ` [PATCH v12 04/12] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-05-07  9:59                       ` [PATCH v12 05/12] reftable: file format documentation Jonathan Nieder via GitGitGadget
2020-05-07  9:59                       ` [PATCH v12 06/12] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
2020-05-08 19:59                         ` Junio C Hamano
2020-05-07  9:59                       ` [PATCH v12 07/12] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
2020-05-07  9:59                       ` [PATCH v12 08/12] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-05-07  9:59                       ` [PATCH v12 09/12] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-05-07  9:59                       ` [PATCH v12 10/12] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
2020-05-07  9:59                       ` [PATCH v12 11/12] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
2020-05-07  9:59                       ` [PATCH v12 12/12] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
2020-05-11 19:46                       ` [PATCH v13 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
2020-05-11 19:46                         ` [PATCH v13 01/13] refs.h: clarify reflog iteration order Han-Wen Nienhuys via GitGitGadget
2020-05-18 23:31                           ` Junio C Hamano
2020-05-11 19:46                         ` [PATCH v13 02/13] t: use update-ref and show-ref to reading/writing refs Han-Wen Nienhuys via GitGitGadget
2020-05-18 23:34                           ` Junio C Hamano
2020-05-11 19:46                         ` [PATCH v13 03/13] refs: document how ref_iterator_advance_fn should handle symrefs Han-Wen Nienhuys via GitGitGadget
2020-05-18 23:43                           ` Junio C Hamano
2020-05-11 19:46                         ` [PATCH v13 04/13] reftable: file format documentation Jonathan Nieder via GitGitGadget
2020-05-19 22:00                           ` Junio C Hamano
2020-05-20 16:06                             ` Han-Wen Nienhuys
2020-05-20 17:20                               ` Han-Wen Nienhuys
2020-05-20 17:25                                 ` Han-Wen Nienhuys
2020-05-20 17:33                                   ` Junio C Hamano
2020-05-20 18:52                             ` Jonathan Nieder
2020-05-11 19:46                         ` [PATCH v13 05/13] reftable: clarify how empty tables should be written Han-Wen Nienhuys via GitGitGadget
2020-05-19 22:01                           ` Junio C Hamano
2020-05-11 19:46                         ` [PATCH v13 06/13] reftable: define version 2 of the spec to accomodate SHA256 Han-Wen Nienhuys via GitGitGadget
2020-05-19 22:32                           ` Junio C Hamano
2020-05-20 12:38                             ` Han-Wen Nienhuys
2020-05-20 14:40                               ` Junio C Hamano
2020-05-11 19:46                         ` [PATCH v13 07/13] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
2020-05-12 10:22                           ` Phillip Wood
2020-05-12 16:48                             ` Han-Wen Nienhuys
2020-05-13 10:06                               ` Phillip Wood
2020-05-13 18:10                                 ` Phillip Wood
2020-05-11 19:46                         ` [PATCH v13 08/13] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
2020-05-11 19:46                         ` [PATCH v13 09/13] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-05-11 19:46                         ` [PATCH v13 10/13] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-05-11 19:46                         ` [PATCH v13 11/13] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-05-13 19:55                           ` Junio C Hamano
2020-05-11 19:46                         ` [PATCH v13 12/13] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
2020-05-11 19:46                         ` [PATCH v13 13/13] Add some reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
2020-05-13 19:57                           ` Junio C Hamano
2020-05-19 13:54                             ` Han-Wen Nienhuys
2020-05-19 15:21                               ` Junio C Hamano
2020-05-12  0:41                         ` [PATCH v13 00/13] Reftable support git-core Junio C Hamano
2020-05-12  7:49                           ` Han-Wen Nienhuys
2020-05-13 21:21                             ` Junio C Hamano
2020-05-18 20:31                         ` [PATCH v14 0/9] " Han-Wen Nienhuys via GitGitGadget
2020-05-18 20:31                           ` [PATCH v14 1/9] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
2020-05-18 20:31                           ` [PATCH v14 2/9] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
2020-05-18 20:31                           ` [PATCH v14 3/9] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
2020-05-18 20:31                           ` [PATCH v14 4/9] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-05-18 20:31                           ` [PATCH v14 5/9] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-05-18 20:31                           ` [PATCH v14 6/9] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-05-18 20:31                           ` [PATCH v14 7/9] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
2020-05-18 20:31                           ` [PATCH v14 8/9] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
2020-05-18 20:31                           ` [PATCH v14 9/9] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
2020-05-28 19:46                           ` [PATCH v15 00/13] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
2020-05-28 19:46                             ` [PATCH v15 01/13] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
2020-05-28 19:46                             ` [PATCH v15 02/13] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
2020-05-28 19:46                             ` [PATCH v15 03/13] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
2020-05-28 20:52                               ` Junio C Hamano
2020-05-28 19:46                             ` [PATCH v15 04/13] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
2020-05-28 19:46                             ` [PATCH v15 05/13] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
2020-05-28 19:46                             ` [PATCH v15 06/13] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
2020-05-28 19:46                             ` [PATCH v15 07/13] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
2020-05-28 19:46                             ` [PATCH v15 08/13] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-05-28 19:46                             ` [PATCH v15 09/13] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-05-28 19:46                             ` [PATCH v15 10/13] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-05-28 19:46                             ` [PATCH v15 11/13] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
2020-05-28 19:46                             ` [PATCH v15 12/13] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
2020-05-28 19:46                             ` [PATCH v15 13/13] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
2020-05-28 20:15                             ` [PATCH v15 00/13] Reftable support git-core Junio C Hamano
2020-05-28 21:21                             ` Junio C Hamano
2020-06-05 18:03                             ` [PATCH v16 00/14] " Han-Wen Nienhuys via GitGitGadget
2020-06-05 18:03                               ` [PATCH v16 01/14] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
2020-06-09 10:16                                 ` Phillip Wood
2020-06-05 18:03                               ` [PATCH v16 02/14] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
2020-06-09 10:36                                 ` Phillip Wood
2020-06-10 18:05                                   ` Han-Wen Nienhuys
2020-06-11 14:59                                     ` Phillip Wood
2020-06-12  9:51                                       ` Phillip Wood
2020-06-15 11:32                                         ` Han-Wen Nienhuys
2020-06-05 18:03                               ` [PATCH v16 03/14] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
2020-06-05 18:03                               ` [PATCH v16 04/14] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
2020-06-05 18:03                               ` [PATCH v16 05/14] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
2020-06-05 18:03                               ` [PATCH v16 06/14] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
2020-06-05 18:03                               ` [PATCH v16 07/14] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
2020-06-05 18:03                               ` [PATCH v16 08/14] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-06-05 18:03                               ` [PATCH v16 09/14] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-06-05 18:03                               ` [PATCH v16 10/14] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-06-05 18:03                               ` [PATCH v16 11/14] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
2020-06-08 19:39                                 ` Junio C Hamano
2020-06-09 17:22                                   ` [PATCH] Fixup! Add t/helper/test-reftable.c hanwen
2020-06-09 20:45                                     ` Junio C Hamano
2020-06-05 18:03                               ` [PATCH v16 12/14] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
2020-06-05 18:03                               ` [PATCH v16 13/14] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
2020-06-05 18:03                               ` [PATCH v16 14/14] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
2020-06-09 23:14                               ` [PATCH v16 00/14] Reftable support git-core Junio C Hamano
2020-06-10  6:56                                 ` Han-Wen Nienhuys
2020-06-10 17:09                                   ` Junio C Hamano
2020-06-10 17:38                                     ` Junio C Hamano
2020-06-10 18:59                                     ` Johannes Schindelin
2020-06-10 19:04                                       ` Han-Wen Nienhuys
2020-06-10 19:20                                         ` Johannes Schindelin
2020-06-10 16:57                                 ` Han-Wen Nienhuys
2020-06-16 19:20                               ` [PATCH v17 00/17] " Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 01/17] lib-t6000.sh: write tag using git-update-ref Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 02/17] checkout: add '\n' to reflog message Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 03/17] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 04/17] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 05/17] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 06/17] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 07/17] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 08/17] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 09/17] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 10/17] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 11/17] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 12/17] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-06-19 14:24                                   ` SZEDER Gábor
2020-06-16 19:20                                 ` [PATCH v17 13/17] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 14/17] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 15/17] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
2020-06-16 19:20                                 ` [PATCH v17 16/17] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
2020-06-19 16:03                                   ` SZEDER Gábor
2020-06-16 19:20                                 ` [PATCH v17 17/17] Add "test-tool dump-reftable" command Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                 ` [PATCH v18 00/19] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 01/19] lib-t6000.sh: write tag using git-update-ref Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 02/19] checkout: add '\n' to reflog message Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 03/19] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 04/19] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 05/19] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 06/19] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 07/19] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 08/19] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 09/19] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 10/19] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 11/19] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 12/19] Add standalone build infrastructure for reftable Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 13/19] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 14/19] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 15/19] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 16/19] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 17/19] git-prompt: prepare for reftable refs backend SZEDER Gábor via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 18/19] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
2020-06-22 21:55                                   ` [PATCH v18 19/19] Add "test-tool dump-reftable" command Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                   ` [PATCH v19 00/20] Reftable support git-core Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 01/20] lib-t6000.sh: write tag using git-update-ref Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 02/20] t3432: use git-reflog to inspect the reflog for HEAD Han-Wen Nienhuys via GitGitGadget
2020-06-30 15:23                                       ` Denton Liu
2020-06-29 18:56                                     ` [PATCH v19 03/20] checkout: add '\n' to reflog message Han-Wen Nienhuys via GitGitGadget
2020-06-29 20:07                                       ` Junio C Hamano
2020-06-30  8:30                                         ` Han-Wen Nienhuys
2020-06-30 23:58                                           ` Junio C Hamano
2020-07-01 16:56                                             ` Han-Wen Nienhuys
2020-07-01 20:22                                               ` Re* " Junio C Hamano
2020-07-06 15:56                                                 ` Han-Wen Nienhuys
2020-07-06 18:53                                                   ` Junio C Hamano
2020-06-29 18:56                                     ` [PATCH v19 04/20] Write pseudorefs through ref backends Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 05/20] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 06/20] Treat BISECT_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 07/20] Treat CHERRY_PICK_HEAD " Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 08/20] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 09/20] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 10/20] Iterate over the "refs/" namespace in for_each_[raw]ref Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 11/20] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 12/20] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 13/20] Add standalone build infrastructure for reftable Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 14/20] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 15/20] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 16/20] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 17/20] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 18/20] git-prompt: prepare for reftable refs backend SZEDER Gábor via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 19/20] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
2020-06-29 18:56                                     ` [PATCH v19 20/20] Add "test-tool dump-reftable" command Han-Wen Nienhuys via GitGitGadget
2020-06-29 22:54                                     ` [PATCH v19 00/20] Reftable support git-core Junio C Hamano
2020-06-30  9:28                                       ` Han-Wen Nienhuys
2020-07-01  0:03                                         ` Junio C Hamano
2020-07-01 10:16                                           ` Han-Wen Nienhuys
2020-07-01 20:56                                             ` Junio C Hamano
2020-07-31 15:26                                     ` [PATCH v20 00/21] " Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:26                                       ` [PATCH v20 01/21] refs: add \t to reflog in the files backend Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:26                                       ` [PATCH v20 02/21] Split off reading loose ref data in separate function Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:26                                       ` [PATCH v20 03/21] t1400: use git rev-parse for testing PSEUDOREF existence Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 04/21] Modify pseudo refs through ref backend storage Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 05/21] Make HEAD a PSEUDOREF rather than PER_WORKTREE Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 06/21] Make refs_ref_exists public Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 07/21] Treat CHERRY_PICK_HEAD as a pseudo ref Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 08/21] Treat REVERT_HEAD " Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 09/21] Move REF_LOG_ONLY to refs-internal.h Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 10/21] Iteration over entire ref namespace is iterating over "refs/" Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 11/21] Add .gitattributes for the reftable/ directory Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 12/21] Add reftable library Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 13/21] Add standalone build infrastructure for reftable Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 14/21] Reftable support for git-core Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 15/21] Read FETCH_HEAD as loose ref Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 16/21] Hookup unittests for the reftable library Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 17/21] Add GIT_DEBUG_REFS debugging mechanism Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 18/21] vcxproj: adjust for the reftable changes Johannes Schindelin via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 19/21] git-prompt: prepare for reftable refs backend SZEDER Gábor via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 20/21] Add reftable testing infrastructure Han-Wen Nienhuys via GitGitGadget
2020-07-31 15:27                                       ` [PATCH v20 21/21] Add "test-tool dump-reftable" command Han-Wen Nienhuys via GitGitGadget

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).