* [PATCH 0/9] reftable: optimize table and block iterators
@ 2024-03-27  6:36 Patrick Steinhardt
  2024-03-27  6:36 ` [PATCH 1/9] reftable/block: rename `block_reader_start()` Patrick Steinhardt
                   ` (10 more replies)
  0 siblings, 11 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-03-27  6:36 UTC (permalink / raw)
  To: git

Hi,

this patch series optimizes the code that iterates over reftable blocks
so as to reduce memory allocations and deallocations.

Preceding patch series have optimized memory allocation patterns when
iterating through ref or log records. This has led to a constant number
of allocations when decoding and yielding those records to our callers.
What is still missing, though, is an optimization for the table and
block iterators, which are responsible for iterating through the
separate blocks in the table. So while the number of allocations no
longer scales (directly) with the number of refs, it still scales with
the number of blocks.

This patch series tackles that by refactoring the table and block
iterators such that the former can reuse the latter without
reallocations. With this change, iterating through records uses a truly
constant number of allocations.

Patrick

Patrick Steinhardt (9):
  reftable/block: rename `block_reader_start()`
  reftable/block: merge `block_iter_seek()` and `block_reader_seek()`
  reftable/block: better grouping of functions
  reftable/block: introduce `block_reader_release()`
  reftable/block: move ownership of block reader into `struct
    table_iter`
  reftable/reader: iterate to next block in place
  reftable/block: reuse uncompressed blocks
  reftable/block: open-code call to `uncompress2()`
  reftable/block: reuse `zstream` state on inflation

 reftable/block.c      | 152 ++++++++++++++++++++++--------------
 reftable/block.h      |  49 +++++++-----
 reftable/block_test.c |   6 +-
 reftable/iter.c       |   2 +-
 reftable/reader.c     | 176 ++++++++++++++++++++++--------------------
 5 files changed, 222 insertions(+), 163 deletions(-)

-- 
2.44.GIT



* [PATCH 1/9] reftable/block: rename `block_reader_start()`
  2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
@ 2024-03-27  6:36 ` Patrick Steinhardt
  2024-03-27  6:37 ` [PATCH 2/9] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-03-27  6:36 UTC (permalink / raw)
  To: git

The function `block_reader_start()` does not really apply to the block
reader, but to the block iterator. Its name is thus somewhat confusing.
Rename it to `block_iter_seek_start()` to clarify this.

We will rename `block_reader_seek()` in similar spirit in the next
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c      | 2 +-
 reftable/block.h      | 2 +-
 reftable/block_test.c | 2 +-
 reftable/iter.c       | 2 +-
 reftable/reader.c     | 4 ++--
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index e2a2cee58d..d5f32867bb 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -266,7 +266,7 @@ static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
 	return get_be24(br->restart_bytes + 3 * i);
 }
 
-void block_reader_start(struct block_reader *br, struct block_iter *it)
+void block_iter_seek_start(struct block_iter *it, struct block_reader *br)
 {
 	it->br = br;
 	strbuf_reset(&it->last_key);
diff --git a/reftable/block.h b/reftable/block.h
index 47acc62c0a..44a9a8de7d 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -98,7 +98,7 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
 		      int hash_size);
 
 /* Position `it` at start of the block */
-void block_reader_start(struct block_reader *br, struct block_iter *it);
+void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
 
 /* Position `it` to the `want` key in the block */
 int block_reader_seek(struct block_reader *br, struct block_iter *it,
diff --git a/reftable/block_test.c b/reftable/block_test.c
index e162c6e33f..a719be7c50 100644
--- a/reftable/block_test.c
+++ b/reftable/block_test.c
@@ -69,7 +69,7 @@ static void test_block_read_write(void)
 
 	block_reader_init(&br, &block, header_off, block_size, GIT_SHA1_RAWSZ);
 
-	block_reader_start(&br, &it);
+	block_iter_seek_start(&it, &br);
 
 	while (1) {
 		int r = block_iter_next(&it, &rec);
diff --git a/reftable/iter.c b/reftable/iter.c
index 7aa30c4a51..aa9ac199b1 100644
--- a/reftable/iter.c
+++ b/reftable/iter.c
@@ -115,7 +115,7 @@ static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
 		/* indexed block does not exist. */
 		return REFTABLE_FORMAT_ERROR;
 	}
-	block_reader_start(&it->block_reader, &it->cur);
+	block_iter_seek_start(&it->cur, &it->block_reader);
 	return 0;
 }
 
diff --git a/reftable/reader.c b/reftable/reader.c
index b113daab77..d7d47e8640 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -345,7 +345,7 @@ static int table_iter_next_block(struct table_iter *dest,
 		*brp = br;
 
 		dest->is_finished = 0;
-		block_reader_start(brp, &dest->bi);
+		block_iter_seek_start(&dest->bi, brp);
 	}
 	return 0;
 }
@@ -429,7 +429,7 @@ static int reader_table_iter_at(struct reftable_reader *r,
 	ti->r = r;
 	ti->typ = block_reader_type(brp);
 	ti->block_off = off;
-	block_reader_start(brp, &ti->bi);
+	block_iter_seek_start(&ti->bi, brp);
 	return 0;
 }
 
-- 
2.44.GIT



* [PATCH 2/9] reftable/block: merge `block_iter_seek()` and `block_reader_seek()`
  2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
  2024-03-27  6:36 ` [PATCH 1/9] reftable/block: rename `block_reader_start()` Patrick Steinhardt
@ 2024-03-27  6:37 ` Patrick Steinhardt
  2024-03-27  6:37 ` [PATCH 3/9] reftable/block: better grouping of functions Patrick Steinhardt
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-03-27  6:37 UTC (permalink / raw)
  To: git

The function `block_iter_seek()` is a thin wrapper around
`block_reader_seek()`. Merge those two functions into a new function
`block_iter_seek_key()` whose name says more clearly what it actually
does.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c      | 9 ++-------
 reftable/block.h      | 7 ++-----
 reftable/block_test.c | 4 ++--
 reftable/reader.c     | 4 ++--
 4 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index d5f32867bb..3f182c5d1f 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -362,19 +362,14 @@ int block_reader_first_key(struct block_reader *br, struct strbuf *key)
 	return 0;
 }
 
-int block_iter_seek(struct block_iter *it, struct strbuf *want)
-{
-	return block_reader_seek(it->br, it, want);
-}
-
 void block_iter_close(struct block_iter *it)
 {
 	strbuf_release(&it->last_key);
 	strbuf_release(&it->scratch);
 }
 
-int block_reader_seek(struct block_reader *br, struct block_iter *it,
-		      struct strbuf *want)
+int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
+			struct strbuf *want)
 {
 	struct restart_find_args args = {
 		.key = *want,
diff --git a/reftable/block.h b/reftable/block.h
index 44a9a8de7d..1734bee917 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -101,8 +101,8 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
 void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
 
 /* Position `it` to the `want` key in the block */
-int block_reader_seek(struct block_reader *br, struct block_iter *it,
-		      struct strbuf *want);
+int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
+			struct strbuf *want);
 
 /* Returns the block type (eg. 'r' for refs) */
 uint8_t block_reader_type(struct block_reader *r);
@@ -115,9 +115,6 @@ void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
 /* return < 0 for error, 0 for OK, > 0 for EOF. */
 int block_iter_next(struct block_iter *it, struct reftable_record *rec);
 
-/* Seek to `want` with in the block pointed to by `it` */
-int block_iter_seek(struct block_iter *it, struct strbuf *want);
-
 /* deallocate memory for `it`. The block reader and its block is left intact. */
 void block_iter_close(struct block_iter *it);
 
diff --git a/reftable/block_test.c b/reftable/block_test.c
index a719be7c50..26a9cfbc83 100644
--- a/reftable/block_test.c
+++ b/reftable/block_test.c
@@ -89,7 +89,7 @@ static void test_block_read_write(void)
 		strbuf_reset(&want);
 		strbuf_addstr(&want, names[i]);
 
-		n = block_reader_seek(&br, &it, &want);
+		n = block_iter_seek_key(&it, &br, &want);
 		EXPECT(n == 0);
 
 		n = block_iter_next(&it, &rec);
@@ -98,7 +98,7 @@ static void test_block_read_write(void)
 		EXPECT_STREQ(names[i], rec.u.ref.refname);
 
 		want.len--;
-		n = block_reader_seek(&br, &it, &want);
+		n = block_iter_seek_key(&it, &br, &want);
 		EXPECT(n == 0);
 
 		n = block_iter_next(&it, &rec);
diff --git a/reftable/reader.c b/reftable/reader.c
index d7d47e8640..f70efa2b7c 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -483,7 +483,7 @@ static int reader_seek_linear(struct table_iter *ti,
 		table_iter_copy_from(ti, &next);
 	}
 
-	err = block_iter_seek(&ti->bi, &want_key);
+	err = block_iter_seek_key(&ti->bi, ti->bi.br, &want_key);
 	if (err < 0)
 		goto done;
 	err = 0;
@@ -558,7 +558,7 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		if (err != 0)
 			goto done;
 
-		err = block_iter_seek(&next.bi, &want_index.u.idx.last_key);
+		err = block_iter_seek_key(&next.bi, next.bi.br, &want_index.u.idx.last_key);
 		if (err < 0)
 			goto done;
 
-- 
2.44.GIT



* [PATCH 3/9] reftable/block: better grouping of functions
  2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
  2024-03-27  6:36 ` [PATCH 1/9] reftable/block: rename `block_reader_start()` Patrick Steinhardt
  2024-03-27  6:37 ` [PATCH 2/9] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
@ 2024-03-27  6:37 ` Patrick Steinhardt
  2024-03-27  6:37 ` [PATCH 4/9] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-03-27  6:37 UTC (permalink / raw)
  To: git

The function definitions and the declarations of `struct block_reader`
and `struct block_iter` are somewhat intermixed, making it hard to see
which functions belong together. Rearrange them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 50 ++++++++++++++++++++++++------------------------
 reftable/block.h | 22 ++++++++++-----------
 2 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 3f182c5d1f..6b78dd3424 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -175,11 +175,6 @@ int block_writer_finish(struct block_writer *w)
 	return w->next;
 }
 
-uint8_t block_reader_type(struct block_reader *r)
-{
-	return r->block.data[r->header_off];
-}
-
 int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		      uint32_t header_off, uint32_t table_block_size,
 		      int hash_size)
@@ -261,6 +256,31 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	return err;
 }
 
+uint8_t block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_first_key(struct block_reader *br, struct strbuf *key)
+{
+	int off = br->header_off + 4, n;
+	struct string_view in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+	uint8_t extra = 0;
+
+	strbuf_reset(key);
+
+	n = reftable_decode_key(key, &extra, in);
+	if (n < 0)
+		return n;
+	if (!key->len)
+		return REFTABLE_FORMAT_ERROR;
+
+	return 0;
+}
+
 static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
 {
 	return get_be24(br->restart_bytes + 3 * i);
@@ -342,26 +362,6 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 	return 0;
 }
 
-int block_reader_first_key(struct block_reader *br, struct strbuf *key)
-{
-	int off = br->header_off + 4, n;
-	struct string_view in = {
-		.buf = br->block.data + off,
-		.len = br->block_len - off,
-	};
-	uint8_t extra = 0;
-
-	strbuf_reset(key);
-
-	n = reftable_decode_key(key, &extra, in);
-	if (n < 0)
-		return n;
-	if (!key->len)
-		return REFTABLE_FORMAT_ERROR;
-
-	return 0;
-}
-
 void block_iter_close(struct block_iter *it)
 {
 	strbuf_release(&it->last_key);
diff --git a/reftable/block.h b/reftable/block.h
index 1734bee917..d73ed73549 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -76,6 +76,17 @@ struct block_reader {
 	uint32_t full_block_size;
 };
 
+/* initializes a block reader. */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Returns the block type (eg. 'r' for refs) */
+uint8_t block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct strbuf *key);
+
 /* Iterate over entries in a block */
 struct block_iter {
 	/* offset within the block of the next entry to read. */
@@ -92,11 +103,6 @@ struct block_iter {
 	.scratch = STRBUF_INIT, \
 }
 
-/* initializes a block reader. */
-int block_reader_init(struct block_reader *br, struct reftable_block *bl,
-		      uint32_t header_off, uint32_t table_block_size,
-		      int hash_size);
-
 /* Position `it` at start of the block */
 void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
 
@@ -104,12 +110,6 @@ void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
 int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
 			struct strbuf *want);
 
-/* Returns the block type (eg. 'r' for refs) */
-uint8_t block_reader_type(struct block_reader *r);
-
-/* Decodes the first key in the block */
-int block_reader_first_key(struct block_reader *br, struct strbuf *key);
-
 void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
 
 /* return < 0 for error, 0 for OK, > 0 for EOF. */
-- 
2.44.GIT



* [PATCH 4/9] reftable/block: introduce `block_reader_release()`
  2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2024-03-27  6:37 ` [PATCH 3/9] reftable/block: better grouping of functions Patrick Steinhardt
@ 2024-03-27  6:37 ` Patrick Steinhardt
  2024-04-03 13:16   ` Karthik Nayak
  2024-03-27  6:37 ` [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Patrick Steinhardt @ 2024-03-27  6:37 UTC (permalink / raw)
  To: git

Introduce a new function `block_reader_release()` that releases
resources acquired by the block reader. This function will be extended
in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c  | 5 +++++
 reftable/block.h  | 2 ++
 reftable/reader.c | 2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/reftable/block.c b/reftable/block.c
index 6b78dd3424..013849f028 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -256,6 +256,11 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	return err;
 }
 
+void block_reader_release(struct block_reader *br)
+{
+	reftable_block_done(&br->block);
+}
+
 uint8_t block_reader_type(struct block_reader *r)
 {
 	return r->block.data[r->header_off];
diff --git a/reftable/block.h b/reftable/block.h
index d73ed73549..601a1e0e89 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -81,6 +81,8 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
 		      uint32_t header_off, uint32_t table_block_size,
 		      int hash_size);
 
+void block_reader_release(struct block_reader *br);
+
 /* Returns the block type (eg. 'r' for refs) */
 uint8_t block_reader_type(struct block_reader *r);
 
diff --git a/reftable/reader.c b/reftable/reader.c
index f70efa2b7c..f925570bf3 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -253,7 +253,7 @@ static void table_iter_block_done(struct table_iter *ti)
 	if (!ti->bi.br) {
 		return;
 	}
-	reftable_block_done(&ti->bi.br->block);
+	block_reader_release(ti->bi.br);
 	FREE_AND_NULL(ti->bi.br);
 
 	ti->bi.last_key.len = 0;
-- 
2.44.GIT



* [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter`
  2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2024-03-27  6:37 ` [PATCH 4/9] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
@ 2024-03-27  6:37 ` Patrick Steinhardt
  2024-04-03  4:52   ` Justin Tobler
  2024-03-27  6:37 ` [PATCH 6/9] reftable/reader: iterate to next block in place Patrick Steinhardt
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 31+ messages in thread
From: Patrick Steinhardt @ 2024-03-27  6:37 UTC (permalink / raw)
  To: git

The table iterator allows the caller to iterate through all records in a
reftable table. To do so it iterates through all blocks of the desired
type one by one, where for each block it creates a new block iterator
and yields all its entries.

One of the things that is somewhat confusing in this context is who owns
the block reader that is being used to read the blocks and pass them to
the block iterator. Intuitively, as the table iterator is responsible
for iterating through the blocks, one would assume that this iterator is
also responsible for managing the lifecycle of the reader. And while it
somewhat is, the block reader is ultimately stored inside of the block
iterator.

Refactor the code such that the block reader is instead fully managed by
the table iterator. Instead of passing the reader to the block iterator,
we now only end up passing the block data to it. Besides clearing up the
lifecycle of the reader, this will also allow for better reuse of the
reader in subsequent patches.
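
To illustrate the new ownership layout, here is a simplified and purely
hypothetical sketch (struct and field names are illustrative, not the
exact definitions from reftable/block.h): the table iterator owns the
block reader, while the block iterator merely borrows the raw block
data. Seeking functions like `block_iter_seek_start()` now copy the
data pointer, length and hash size into the iterator instead of storing
a pointer to the reader itself, as the diff below shows.

  #include <stddef.h>
  #include <stdint.h>

  /* Borrower: only holds pointers into the reader's block. */
  struct sketch_block_iter {
          const unsigned char *block;
          size_t block_len;
          int hash_size;
          uint32_t next_off;
  };

  /* Owner: allocates the block data and releases it when done. */
  struct sketch_block_reader {
          unsigned char *data;
          size_t len;
  };

  /* The table iterator embeds both; releasing it releases the reader. */
  struct sketch_table_iter {
          struct sketch_block_reader br;
          struct sketch_block_iter bi;
  };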

The following benchmark prints a single matching ref out of 1 million
refs. Before:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated

Note that while there are more allocation and free calls now, the
overall number of bytes allocated is significantly lower. The next patch
will then cut down on the number of allocations as well.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c  |  43 ++++++++++------
 reftable/block.h  |  19 ++++---
 reftable/reader.c | 123 ++++++++++++++++++++++------------------------
 3 files changed, 102 insertions(+), 83 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 013849f028..8f5dfe10bf 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -261,12 +261,12 @@ void block_reader_release(struct block_reader *br)
 	reftable_block_done(&br->block);
 }
 
-uint8_t block_reader_type(struct block_reader *r)
+uint8_t block_reader_type(const struct block_reader *r)
 {
 	return r->block.data[r->header_off];
 }
 
-int block_reader_first_key(struct block_reader *br, struct strbuf *key)
+int block_reader_first_key(const struct block_reader *br, struct strbuf *key)
 {
 	int off = br->header_off + 4, n;
 	struct string_view in = {
@@ -286,14 +286,16 @@ int block_reader_first_key(struct block_reader *br, struct strbuf *key)
 	return 0;
 }
 
-static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+static uint32_t block_reader_restart_offset(const struct block_reader *br, int i)
 {
 	return get_be24(br->restart_bytes + 3 * i);
 }
 
-void block_iter_seek_start(struct block_iter *it, struct block_reader *br)
+void block_iter_seek_start(struct block_iter *it, const struct block_reader *br)
 {
-	it->br = br;
+	it->block = br->block.data;
+	it->block_len = br->block_len;
+	it->hash_size = br->hash_size;
 	strbuf_reset(&it->last_key);
 	it->next_off = br->header_off + 4;
 }
@@ -301,7 +303,7 @@ void block_iter_seek_start(struct block_iter *it, struct block_reader *br)
 struct restart_find_args {
 	int error;
 	struct strbuf key;
-	struct block_reader *r;
+	const struct block_reader *r;
 };
 
 static int restart_key_less(size_t idx, void *args)
@@ -329,9 +331,11 @@ static int restart_key_less(size_t idx, void *args)
 	return result < 0;
 }
 
-void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src)
 {
-	dest->br = src->br;
+	dest->block = src->block;
+	dest->block_len = src->block_len;
+	dest->hash_size = src->hash_size;
 	dest->next_off = src->next_off;
 	strbuf_reset(&dest->last_key);
 	strbuf_addbuf(&dest->last_key, &src->last_key);
@@ -340,14 +344,14 @@ void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
 int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 {
 	struct string_view in = {
-		.buf = it->br->block.data + it->next_off,
-		.len = it->br->block_len - it->next_off,
+		.buf = (unsigned char *) it->block + it->next_off,
+		.len = it->block_len - it->next_off,
 	};
 	struct string_view start = in;
 	uint8_t extra = 0;
 	int n = 0;
 
-	if (it->next_off >= it->br->block_len)
+	if (it->next_off >= it->block_len)
 		return 1;
 
 	n = reftable_decode_key(&it->last_key, &extra, in);
@@ -357,7 +361,7 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 		return REFTABLE_FORMAT_ERROR;
 
 	string_view_consume(&in, n);
-	n = reftable_record_decode(rec, it->last_key, extra, in, it->br->hash_size,
+	n = reftable_record_decode(rec, it->last_key, extra, in, it->hash_size,
 				   &it->scratch);
 	if (n < 0)
 		return -1;
@@ -367,13 +371,22 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 	return 0;
 }
 
+void block_iter_reset(struct block_iter *it)
+{
+	strbuf_reset(&it->last_key);
+	it->next_off = 0;
+	it->block = NULL;
+	it->block_len = 0;
+	it->hash_size = 0;
+}
+
 void block_iter_close(struct block_iter *it)
 {
 	strbuf_release(&it->last_key);
 	strbuf_release(&it->scratch);
 }
 
-int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
+int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
 			struct strbuf *want)
 {
 	struct restart_find_args args = {
@@ -394,7 +407,9 @@ int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
 		it->next_off = block_reader_restart_offset(br, i - 1);
 	else
 		it->next_off = br->header_off + 4;
-	it->br = br;
+	it->block = br->block.data;
+	it->block_len = br->block_len;
+	it->hash_size = br->hash_size;
 
 	reftable_record_init(&rec, block_reader_type(br));
 
diff --git a/reftable/block.h b/reftable/block.h
index 601a1e0e89..b41efa5042 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -84,16 +84,18 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
 void block_reader_release(struct block_reader *br);
 
 /* Returns the block type (eg. 'r' for refs) */
-uint8_t block_reader_type(struct block_reader *r);
+uint8_t block_reader_type(const struct block_reader *r);
 
 /* Decodes the first key in the block */
-int block_reader_first_key(struct block_reader *br, struct strbuf *key);
+int block_reader_first_key(const struct block_reader *br, struct strbuf *key);
 
 /* Iterate over entries in a block */
 struct block_iter {
 	/* offset within the block of the next entry to read. */
 	uint32_t next_off;
-	struct block_reader *br;
+	const unsigned char *block;
+	size_t block_len;
+	int hash_size;
 
 	/* key for last entry we read. */
 	struct strbuf last_key;
@@ -106,17 +108,22 @@ struct block_iter {
 }
 
 /* Position `it` at start of the block */
-void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
+void block_iter_seek_start(struct block_iter *it, const struct block_reader *br);
 
 /* Position `it` to the `want` key in the block */
-int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
+int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
 			struct strbuf *want);
 
-void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src);
 
 /* return < 0 for error, 0 for OK, > 0 for EOF. */
 int block_iter_next(struct block_iter *it, struct reftable_record *rec);
 
+/*
+ * Reset the block iterator to pristine state without releasing its memory.
+ */
+void block_iter_reset(struct block_iter *it);
+
 /* deallocate memory for `it`. The block reader and its block is left intact. */
 void block_iter_close(struct block_iter *it);
 
diff --git a/reftable/reader.c b/reftable/reader.c
index f925570bf3..b77b639751 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -220,6 +220,7 @@ struct table_iter {
 	struct reftable_reader *r;
 	uint8_t typ;
 	uint64_t block_off;
+	struct block_reader br;
 	struct block_iter bi;
 	int is_finished;
 };
@@ -227,16 +228,6 @@ struct table_iter {
 	.bi = BLOCK_ITER_INIT \
 }
 
-static void table_iter_copy_from(struct table_iter *dest,
-				 struct table_iter *src)
-{
-	dest->r = src->r;
-	dest->typ = src->typ;
-	dest->block_off = src->block_off;
-	dest->is_finished = src->is_finished;
-	block_iter_copy_from(&dest->bi, &src->bi);
-}
-
 static int table_iter_next_in_block(struct table_iter *ti,
 				    struct reftable_record *rec)
 {
@@ -250,14 +241,8 @@ static int table_iter_next_in_block(struct table_iter *ti,
 
 static void table_iter_block_done(struct table_iter *ti)
 {
-	if (!ti->bi.br) {
-		return;
-	}
-	block_reader_release(ti->bi.br);
-	FREE_AND_NULL(ti->bi.br);
-
-	ti->bi.last_key.len = 0;
-	ti->bi.next_off = 0;
+	block_reader_release(&ti->br);
+	block_iter_reset(&ti->bi);
 }
 
 static int32_t extract_block_size(uint8_t *data, uint8_t *typ, uint64_t off,
@@ -321,32 +306,33 @@ int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
 	return err;
 }
 
+static void table_iter_close(struct table_iter *ti)
+{
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
 static int table_iter_next_block(struct table_iter *dest,
 				 struct table_iter *src)
 {
-	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
-	struct block_reader br = { 0 };
-	int err = 0;
+	uint64_t next_block_off = src->block_off + src->br.full_block_size;
+	int err;
 
 	dest->r = src->r;
 	dest->typ = src->typ;
 	dest->block_off = next_block_off;
 
-	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
-	if (err > 0) {
+	err = reader_init_block_reader(src->r, &dest->br, next_block_off, src->typ);
+	if (err > 0)
 		dest->is_finished = 1;
-		return 1;
-	}
-	if (err != 0)
+	if (err) {
+		table_iter_block_done(dest);
 		return err;
-	else {
-		struct block_reader *brp =
-			reftable_malloc(sizeof(struct block_reader));
-		*brp = br;
-
-		dest->is_finished = 0;
-		block_iter_seek_start(&dest->bi, brp);
 	}
+
+	dest->is_finished = 0;
+	block_iter_seek_start(&dest->bi, &dest->br);
+
 	return 0;
 }
 
@@ -377,14 +363,13 @@ static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
 		 * iterator is drained.
 		 */
 		err = table_iter_next_block(&next, ti);
-		table_iter_block_done(ti);
 		if (err) {
 			ti->is_finished = 1;
 			return err;
 		}
 
-		table_iter_copy_from(ti, &next);
-		block_iter_close(&next.bi);
+		table_iter_close(ti);
+		*ti = next;
 	}
 }
 
@@ -393,16 +378,14 @@ static int table_iter_next_void(void *ti, struct reftable_record *rec)
 	return table_iter_next(ti, rec);
 }
 
-static void table_iter_close(void *p)
+static void table_iter_close_void(void *ti)
 {
-	struct table_iter *ti = p;
-	table_iter_block_done(ti);
-	block_iter_close(&ti->bi);
+	table_iter_close(ti);
 }
 
 static struct reftable_iterator_vtable table_iter_vtable = {
 	.next = &table_iter_next_void,
-	.close = &table_iter_close,
+	.close = &table_iter_close_void,
 };
 
 static void iterator_from_table_iter(struct reftable_iterator *it,
@@ -417,19 +400,16 @@ static int reader_table_iter_at(struct reftable_reader *r,
 				struct table_iter *ti, uint64_t off,
 				uint8_t typ)
 {
-	struct block_reader br = { 0 };
-	struct block_reader *brp = NULL;
+	int err;
 
-	int err = reader_init_block_reader(r, &br, off, typ);
+	err = reader_init_block_reader(r, &ti->br, off, typ);
 	if (err != 0)
 		return err;
 
-	brp = reftable_malloc(sizeof(struct block_reader));
-	*brp = br;
 	ti->r = r;
-	ti->typ = block_reader_type(brp);
+	ti->typ = block_reader_type(&ti->br);
 	ti->block_off = off;
-	block_iter_seek_start(&ti->bi, brp);
+	block_iter_seek_start(&ti->bi, &ti->br);
 	return 0;
 }
 
@@ -454,23 +434,34 @@ static int reader_seek_linear(struct table_iter *ti,
 {
 	struct strbuf want_key = STRBUF_INIT;
 	struct strbuf got_key = STRBUF_INIT;
-	struct table_iter next = TABLE_ITER_INIT;
 	struct reftable_record rec;
 	int err = -1;
 
 	reftable_record_init(&rec, reftable_record_type(want));
 	reftable_record_key(want, &want_key);
 
+	/*
+	 * First we need to locate the block that must contain our record. To
+	 * do so we scan through blocks linearly until we find the first block
+	 * whose first key is bigger than our wanted key. Once we have found
+	 * that block we know that the key must be contained in the preceding
+	 * block.
+	 *
+	 * This algorithm is somewhat unfortunate because it means that we
+	 * always have to seek one block too far and then back up. But as we
+	 * can only decode the _first_ key of a block but not its _last_ key we
+	 * have no other way to do this.
+	 */
 	while (1) {
+		struct table_iter next = TABLE_ITER_INIT;
+
 		err = table_iter_next_block(&next, ti);
 		if (err < 0)
 			goto done;
-
-		if (err > 0) {
+		if (err > 0)
 			break;
-		}
 
-		err = block_reader_first_key(next.bi.br, &got_key);
+		err = block_reader_first_key(&next.br, &got_key);
 		if (err < 0)
 			goto done;
 
@@ -480,16 +471,20 @@ static int reader_seek_linear(struct table_iter *ti,
 		}
 
 		table_iter_block_done(ti);
-		table_iter_copy_from(ti, &next);
+		*ti = next;
 	}
 
-	err = block_iter_seek_key(&ti->bi, ti->bi.br, &want_key);
+	/*
+	 * We have located the block that must contain our record, so we seek
+	 * the wanted key inside of it. If the block does not contain our key
+	 * we know that the corresponding record does not exist.
+	 */
+	err = block_iter_seek_key(&ti->bi, &ti->br, &want_key);
 	if (err < 0)
 		goto done;
 	err = 0;
 
 done:
-	block_iter_close(&next.bi);
 	reftable_record_release(&rec);
 	strbuf_release(&want_key);
 	strbuf_release(&got_key);
@@ -508,6 +503,7 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		.u.idx = { .last_key = STRBUF_INIT },
 	};
 	struct table_iter index_iter = TABLE_ITER_INIT;
+	struct table_iter empty = TABLE_ITER_INIT;
 	struct table_iter next = TABLE_ITER_INIT;
 	int err = 0;
 
@@ -549,7 +545,6 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		 * not exist.
 		 */
 		err = table_iter_next(&index_iter, &index_result);
-		table_iter_block_done(&index_iter);
 		if (err != 0)
 			goto done;
 
@@ -558,7 +553,7 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		if (err != 0)
 			goto done;
 
-		err = block_iter_seek_key(&next.bi, next.bi.br, &want_index.u.idx.last_key);
+		err = block_iter_seek_key(&next.bi, &next.br, &want_index.u.idx.last_key);
 		if (err < 0)
 			goto done;
 
@@ -572,18 +567,20 @@ static int reader_seek_indexed(struct reftable_reader *r,
 			break;
 		}
 
-		table_iter_copy_from(&index_iter, &next);
+		table_iter_close(&index_iter);
+		index_iter = next;
+		next = empty;
 	}
 
 	if (err == 0) {
-		struct table_iter empty = TABLE_ITER_INIT;
 		struct table_iter *malloced = reftable_calloc(1, sizeof(*malloced));
-		*malloced = empty;
-		table_iter_copy_from(malloced, &next);
+		*malloced = next;
+		next = empty;
 		iterator_from_table_iter(it, malloced);
 	}
+
 done:
-	block_iter_close(&next.bi);
+	table_iter_close(&next);
 	table_iter_close(&index_iter);
 	reftable_record_release(&want_index);
 	reftable_record_release(&index_result);
-- 
2.44.GIT



* [PATCH 6/9] reftable/reader: iterate to next block in place
  2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2024-03-27  6:37 ` [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
@ 2024-03-27  6:37 ` Patrick Steinhardt
  2024-03-27  6:37 ` [PATCH 7/9] reftable/block: reuse uncompressed blocks Patrick Steinhardt
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-03-27  6:37 UTC (permalink / raw)
  To: git

The table iterator has to advance to the next block once it has yielded
all records of the current block. This is currently done by creating a
new table iterator, initializing it to the next block, releasing the old
iterator and then copying over the data.

Refactor the code to instead advance the table iterator in place. This
is simpler and unlocks some optimizations in subsequent patches. Also,
it allows us to avoid some allocations.

The following measurements show a single matching ref out of 1 million
refs. Before this change:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 315 allocs, 190 frees, 107,027 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c  |  2 ++
 reftable/reader.c | 47 ++++++++++++++++++++++++++---------------------
 2 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 8f5dfe10bf..471ebd8580 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -188,6 +188,8 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	uint8_t *restart_bytes = NULL;
 	uint8_t *uncompressed = NULL;
 
+	reftable_block_done(&br->block);
+
 	if (!reftable_is_block_type(typ)) {
 		err =  REFTABLE_FORMAT_ERROR;
 		goto done;
diff --git a/reftable/reader.c b/reftable/reader.c
index b77b639751..dd4de294a1 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -312,26 +312,20 @@ static void table_iter_close(struct table_iter *ti)
 	block_iter_close(&ti->bi);
 }
 
-static int table_iter_next_block(struct table_iter *dest,
-				 struct table_iter *src)
+static int table_iter_next_block(struct table_iter *ti)
 {
-	uint64_t next_block_off = src->block_off + src->br.full_block_size;
+	uint64_t next_block_off = ti->block_off + ti->br.full_block_size;
 	int err;
 
-	dest->r = src->r;
-	dest->typ = src->typ;
-	dest->block_off = next_block_off;
-
-	err = reader_init_block_reader(src->r, &dest->br, next_block_off, src->typ);
+	err = reader_init_block_reader(ti->r, &ti->br, next_block_off, ti->typ);
 	if (err > 0)
-		dest->is_finished = 1;
-	if (err) {
-		table_iter_block_done(dest);
+		ti->is_finished = 1;
+	if (err)
 		return err;
-	}
 
-	dest->is_finished = 0;
-	block_iter_seek_start(&dest->bi, &dest->br);
+	ti->block_off = next_block_off;
+	ti->is_finished = 0;
+	block_iter_seek_start(&ti->bi, &ti->br);
 
 	return 0;
 }
@@ -342,7 +336,6 @@ static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
 		return REFTABLE_API_ERROR;
 
 	while (1) {
-		struct table_iter next = TABLE_ITER_INIT;
 		int err;
 
 		if (ti->is_finished)
@@ -362,14 +355,11 @@ static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
 		 * table and retry. If there are no more blocks then the
 		 * iterator is drained.
 		 */
-		err = table_iter_next_block(&next, ti);
+		err = table_iter_next_block(ti);
 		if (err) {
 			ti->is_finished = 1;
 			return err;
 		}
-
-		table_iter_close(ti);
-		*ti = next;
 	}
 }
 
@@ -453,9 +443,24 @@ static int reader_seek_linear(struct table_iter *ti,
 	 * have no other way to do this.
 	 */
 	while (1) {
-		struct table_iter next = TABLE_ITER_INIT;
+		struct table_iter next = *ti;
+
+		/*
+		 * We must be careful to not modify underlying data of `ti`
+		 * because we may find that `next` does not contain our desired
+		 * block, but that `ti` does. In that case, we would discard
+		 * `next` and continue with `ti`.
+		 *
+		 * This also means that we cannot reuse allocated memory for
+		 * `next` here. While it would be great if we could, it should
+		 * in practice not be too bad given that we should only ever
+		 * end up doing linear seeks with at most three blocks. As soon
+		 * as we have more than three blocks we would have an index, so
+		 * we would not do a linear search there anymore.
+		 */
+		memset(&next.br.block, 0, sizeof(next.br.block));
 
-		err = table_iter_next_block(&next, ti);
+		err = table_iter_next_block(&next);
 		if (err < 0)
 			goto done;
 		if (err > 0)
-- 
2.44.GIT



* [PATCH 7/9] reftable/block: reuse uncompressed blocks
  2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
                   ` (5 preceding siblings ...)
  2024-03-27  6:37 ` [PATCH 6/9] reftable/reader: iterate to next block in place Patrick Steinhardt
@ 2024-03-27  6:37 ` Patrick Steinhardt
  2024-03-27  6:37 ` [PATCH 8/9] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-03-27  6:37 UTC (permalink / raw)
  To: git

The reftable backend stores reflog entries in a compressed format and
thus needs to uncompress blocks before one can read records from them.
For each reflog block we therefore have to allocate an array that we can
decompress the block contents into. This buffer is discarded whenever
the table iterator moves to the next block. Consequently, we reallocate
a new array for every block, which is quite wasteful.

Refactor the code to reuse the uncompressed block data when moving the
block reader to a new block. This significantly reduces the number of
allocations when iterating through many compressed blocks. The following
measurements are done with `git reflog list` when listing 100k reflogs.
Before:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 45,755 allocs, 45,633 frees, 254,779,456 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 23,028 allocs, 22,906 frees, 162,813,547 bytes allocated
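
The reuse boils down to keeping the uncompressed buffer and its capacity
across blocks and only growing it when a block needs more space. A
minimal, hypothetical sketch of that idea follows (the actual code uses
the `REFTABLE_ALLOC_GROW` macro and stores the buffer in `struct
block_reader`, where `block_reader_release()` eventually frees it once
instead of once per block):

  #include <stdlib.h>

  /*
   * Grow `*buf` to at least `need` bytes, keeping whatever was already
   * allocated for previous blocks. Returns 0 on success, -1 on failure.
   */
  static int grow_buffer(unsigned char **buf, size_t *cap, size_t need)
  {
          if (need > *cap) {
                  unsigned char *p = realloc(*buf, need);
                  if (!p)
                          return -1;
                  *buf = p;
                  *cap = need;
          }
          return 0;
  }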

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c  | 14 ++++++--------
 reftable/block.h  |  4 ++++
 reftable/reader.c | 27 ++++++++++++++++-----------
 3 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 471ebd8580..31af075c1d 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -186,7 +186,6 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	uint16_t restart_count = 0;
 	uint32_t restart_start = 0;
 	uint8_t *restart_bytes = NULL;
-	uint8_t *uncompressed = NULL;
 
 	reftable_block_done(&br->block);
 
@@ -202,14 +201,15 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		uLongf src_len = block->len - block_header_skip;
 
 		/* Log blocks specify the *uncompressed* size in their header. */
-		REFTABLE_ALLOC_ARRAY(uncompressed, sz);
+		REFTABLE_ALLOC_GROW(br->uncompressed_data, sz,
+				    br->uncompressed_cap);
 
 		/* Copy over the block header verbatim. It's not compressed. */
-		memcpy(uncompressed, block->data, block_header_skip);
+		memcpy(br->uncompressed_data, block->data, block_header_skip);
 
 		/* Uncompress */
 		if (Z_OK !=
-		    uncompress2(uncompressed + block_header_skip, &dst_len,
+		    uncompress2(br->uncompressed_data + block_header_skip, &dst_len,
 				block->data + block_header_skip, &src_len)) {
 			err = REFTABLE_ZLIB_ERROR;
 			goto done;
@@ -222,10 +222,8 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 
 		/* We're done with the input data. */
 		reftable_block_done(block);
-		block->data = uncompressed;
-		uncompressed = NULL;
+		block->data = br->uncompressed_data;
 		block->len = sz;
-		block->source = malloc_block_source();
 		full_block_size = src_len + block_header_skip;
 	} else if (full_block_size == 0) {
 		full_block_size = sz;
@@ -254,12 +252,12 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	br->restart_bytes = restart_bytes;
 
 done:
-	reftable_free(uncompressed);
 	return err;
 }
 
 void block_reader_release(struct block_reader *br)
 {
+	reftable_free(br->uncompressed_data);
 	reftable_block_done(&br->block);
 }
 
diff --git a/reftable/block.h b/reftable/block.h
index b41efa5042..79275d67f1 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -66,6 +66,10 @@ struct block_reader {
 	struct reftable_block block;
 	int hash_size;
 
+	/* Uncompressed data for log entries. */
+	unsigned char *uncompressed_data;
+	size_t uncompressed_cap;
+
 	/* size of the data, excluding restart data. */
 	uint32_t block_len;
 	uint8_t *restart_bytes;
diff --git a/reftable/reader.c b/reftable/reader.c
index dd4de294a1..aacd5f1337 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -459,6 +459,8 @@ static int reader_seek_linear(struct table_iter *ti,
 		 * we would not do a linear search there anymore.
 		 */
 		memset(&next.br.block, 0, sizeof(next.br.block));
+		next.br.uncompressed_data = NULL;
+		next.br.uncompressed_cap = 0;
 
 		err = table_iter_next_block(&next);
 		if (err < 0)
@@ -599,25 +601,28 @@ static int reader_seek_internal(struct reftable_reader *r,
 	struct reftable_reader_offsets *offs =
 		reader_offsets_for(r, reftable_record_type(rec));
 	uint64_t idx = offs->index_offset;
-	struct table_iter ti = TABLE_ITER_INIT;
-	int err = 0;
+	struct table_iter ti = TABLE_ITER_INIT, *p;
+	int err;
+
 	if (idx > 0)
 		return reader_seek_indexed(r, it, rec);
 
 	err = reader_start(r, &ti, reftable_record_type(rec), 0);
 	if (err < 0)
-		return err;
+		goto out;
+
 	err = reader_seek_linear(&ti, rec);
 	if (err < 0)
-		return err;
-	else {
-		struct table_iter *p =
-			reftable_malloc(sizeof(struct table_iter));
-		*p = ti;
-		iterator_from_table_iter(it, p);
-	}
+		goto out;
 
-	return 0;
+	REFTABLE_ALLOC_ARRAY(p, 1);
+	*p = ti;
+	iterator_from_table_iter(it, p);
+
+out:
+	if (err)
+		table_iter_close(&ti);
+	return err;
 }
 
 static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
-- 
2.44.GIT



* [PATCH 8/9] reftable/block: open-code call to `uncompress2()`
  2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
                   ` (6 preceding siblings ...)
  2024-03-27  6:37 ` [PATCH 7/9] reftable/block: reuse uncompressed blocks Patrick Steinhardt
@ 2024-03-27  6:37 ` Patrick Steinhardt
  2024-03-27  6:37 ` [PATCH 9/9] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-03-27  6:37 UTC (permalink / raw)
  To: git

The reftable format stores log blocks in a compressed format. Thus,
whenever we want to read such a block we first need to decompress it.
This is done by calling the convenience function `uncompress2()` of the
zlib library, which is a simple wrapper that manages the lifecycle of
the `zstream` structure for us.

While nice for one-off inflation of data, when iterating through reflogs
we will likely end up inflating many such log blocks. This requires us
to reallocate the state of the `zstream` every single time, which adds
up over time. It would thus be great to reuse the `zstream` instead of
discarding it after every inflation.

Open-code the call to `uncompress2()` such that we can start reusing the
`zstream` in the subsequent commit. Note that our open-coded variant is
different from `uncompress2()` in two ways:

  - We do not loop around `inflate()` until we have processed all input.
    As our input is limited by the maximum block size, which is 16MB, we
    should not hit limits of `inflate()`.

  - We use `Z_FINISH` instead of `Z_NO_FLUSH`. Quoting the `inflate()`
    documentation: "inflate() should normally be called until it returns
    Z_STREAM_END or an error. However if all decompression is to be
    performed in a single step (a single call of inflate), the parameter
    flush should be set to Z_FINISH."

    Furthermore, "Z_FINISH also informs inflate to not maintain a
    sliding window if the stream completes, which reduces inflate's
    memory footprint."

Other than that this commit is expected to be functionally equivalent
and does not yet reuse the `zstream`.
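
For reference, the open-coded pattern looks as follows when written as a
standalone, self-contained sketch. It only uses standard zlib calls;
buffer handling and error reporting are simplified compared to the
actual block reader code:

  #include <stddef.h>
  #include <zlib.h>

  /* Inflate `src` into `dst` in a single step; 0 on success, -1 on error. */
  static int inflate_once(unsigned char *dst, size_t dst_len,
                          const unsigned char *src, size_t src_len)
  {
          z_stream stream = { 0 };
          int ret;

          if (inflateInit(&stream) != Z_OK)
                  return -1;

          stream.next_in = (unsigned char *) src;
          stream.avail_in = (uInt) src_len;
          stream.next_out = dst;
          stream.avail_out = (uInt) dst_len;

          /*
           * Input and output both fit into a single call, so Z_FINISH
           * lets zlib inflate everything in one go without maintaining
           * a sliding window.
           */
          ret = inflate(&stream, Z_FINISH);
          inflateEnd(&stream);

          return ret == Z_STREAM_END ? 0 : -1;
  }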

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 31af075c1d..31e7255056 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -195,10 +195,10 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	}
 
 	if (typ == BLOCK_TYPE_LOG) {
-		int block_header_skip = 4 + header_off;
-		uLongf dst_len = sz - block_header_skip; /* total size of dest
-							    buffer. */
-		uLongf src_len = block->len - block_header_skip;
+		uint32_t block_header_skip = 4 + header_off;
+		uLong dst_len = sz - block_header_skip;
+		uLong src_len = block->len - block_header_skip;
+		z_stream stream = {0};
 
 		/* Log blocks specify the *uncompressed* size in their header. */
 		REFTABLE_ALLOC_GROW(br->uncompressed_data, sz,
@@ -207,15 +207,33 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		/* Copy over the block header verbatim. It's not compressed. */
 		memcpy(br->uncompressed_data, block->data, block_header_skip);
 
-		/* Uncompress */
-		if (Z_OK !=
-		    uncompress2(br->uncompressed_data + block_header_skip, &dst_len,
-				block->data + block_header_skip, &src_len)) {
+		err = inflateInit(&stream);
+		if (err != Z_OK) {
 			err = REFTABLE_ZLIB_ERROR;
 			goto done;
 		}
 
-		if (dst_len + block_header_skip != sz) {
+		stream.next_in = block->data + block_header_skip;
+		stream.avail_in = src_len;
+		stream.next_out = br->uncompressed_data + block_header_skip;
+		stream.avail_out = dst_len;
+
+		/*
+		 * We know both input as well as output size, and we know that
+		 * the sizes should never be bigger than `uInt_MAX` because
+		 * blocks can at most be 16MB large. We can thus use `Z_FINISH`
+		 * here to instruct zlib to inflate the data in one go, which
+		 * is more efficient than using `Z_NO_FLUSH`.
+		 */
+		err = inflate(&stream, Z_FINISH);
+		inflateEnd(&stream);
+		if (err != Z_STREAM_END) {
+			err = REFTABLE_ZLIB_ERROR;
+			goto done;
+		}
+		err = 0;
+
+		if (stream.total_out + block_header_skip != sz) {
 			err = REFTABLE_FORMAT_ERROR;
 			goto done;
 		}
@@ -224,7 +242,7 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		reftable_block_done(block);
 		block->data = br->uncompressed_data;
 		block->len = sz;
-		full_block_size = src_len + block_header_skip;
+		full_block_size = src_len + block_header_skip - stream.avail_in;
 	} else if (full_block_size == 0) {
 		full_block_size = sz;
 	} else if (sz < full_block_size && sz < block->len &&
-- 
2.44.GIT



* [PATCH 9/9] reftable/block: reuse `zstream` state on inflation
  2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
                   ` (7 preceding siblings ...)
  2024-03-27  6:37 ` [PATCH 8/9] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
@ 2024-03-27  6:37 ` Patrick Steinhardt
  2024-04-03 13:33 ` [PATCH 0/9] reftable: optimize table and block iterators Karthik Nayak
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
  10 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-03-27  6:37 UTC (permalink / raw)
  To: git

When calling `inflateInit()` and `inflate()`, the zlib library will
allocate several data structures for the underlying `zstream` to keep
track of various information. Thus, when inflating repeatedly, it is
possible to optimize memory allocation patterns by reusing the `zstream`
and then calling `inflateReset()` on it to prepare it for the next chunk
of data to inflate.

This is exactly what the reftable code is doing: when iterating through
reflogs we need to potentially inflate many log blocks, but we discard
the `zstream` every single time. Instead, as we reuse the `block_reader`
for each of the blocks anyway, we can initialize the `zstream` once and
then reuse it for subsequent inflations.

Refactor the code to do so, which leads to a significant reduction in
the number of allocations. The following measurements were done when
iterating through 1 million reflog entries. Before:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 23,028 allocs, 22,906 frees, 162,813,552 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 302 allocs, 180 frees, 88,352 bytes allocated
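
Written as standalone code, the reuse pattern looks roughly like the
sketch below. Names are illustrative; the actual code keeps the
`zstream` pointer in `struct block_reader` and tears it down in
`block_reader_release()`:

  #include <stddef.h>
  #include <stdlib.h>
  #include <zlib.h>

  struct inflater {
          z_stream *zstream;
  };

  /* Inflate one block, lazily initializing the stream and reusing it. */
  static int inflater_inflate(struct inflater *in,
                              unsigned char *dst, size_t dst_len,
                              const unsigned char *src, size_t src_len)
  {
          if (!in->zstream) {
                  in->zstream = calloc(1, sizeof(*in->zstream));
                  if (!in->zstream)
                          return -1;
                  if (inflateInit(in->zstream) != Z_OK) {
                          free(in->zstream);
                          in->zstream = NULL;
                          return -1;
                  }
          } else if (inflateReset(in->zstream) != Z_OK) {
                  return -1;
          }

          in->zstream->next_in = (unsigned char *) src;
          in->zstream->avail_in = (uInt) src_len;
          in->zstream->next_out = dst;
          in->zstream->avail_out = (uInt) dst_len;

          /* Single-shot inflation, as introduced by the previous commit. */
          return inflate(in->zstream, Z_FINISH) == Z_STREAM_END ? 0 : -1;
  }

  /* Tear the stream down once, when the reader is released. */
  static void inflater_release(struct inflater *in)
  {
          if (in->zstream) {
                  inflateEnd(in->zstream);
                  free(in->zstream);
                  in->zstream = NULL;
          }
  }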

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c  | 25 +++++++++++++++----------
 reftable/block.h  |  3 +++
 reftable/reader.c |  1 +
 3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 31e7255056..2e39640b1c 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -198,7 +198,6 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		uint32_t block_header_skip = 4 + header_off;
 		uLong dst_len = sz - block_header_skip;
 		uLong src_len = block->len - block_header_skip;
-		z_stream stream = {0};
 
 		/* Log blocks specify the *uncompressed* size in their header. */
 		REFTABLE_ALLOC_GROW(br->uncompressed_data, sz,
@@ -207,16 +206,21 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		/* Copy over the block header verbatim. It's not compressed. */
 		memcpy(br->uncompressed_data, block->data, block_header_skip);
 
-		err = inflateInit(&stream);
+		if (!br->zstream) {
+			REFTABLE_CALLOC_ARRAY(br->zstream, 1);
+			err = inflateInit(br->zstream);
+		} else {
+			err = inflateReset(br->zstream);
+		}
 		if (err != Z_OK) {
 			err = REFTABLE_ZLIB_ERROR;
 			goto done;
 		}
 
-		stream.next_in = block->data + block_header_skip;
-		stream.avail_in = src_len;
-		stream.next_out = br->uncompressed_data + block_header_skip;
-		stream.avail_out = dst_len;
+		br->zstream->next_in = block->data + block_header_skip;
+		br->zstream->avail_in = src_len;
+		br->zstream->next_out = br->uncompressed_data + block_header_skip;
+		br->zstream->avail_out = dst_len;
 
 		/*
 		 * We know both input as well as output size, and we know that
@@ -225,15 +229,14 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		 * here to instruct zlib to inflate the data in one go, which
 		 * is more efficient than using `Z_NO_FLUSH`.
 		 */
-		err = inflate(&stream, Z_FINISH);
-		inflateEnd(&stream);
+		err = inflate(br->zstream, Z_FINISH);
 		if (err != Z_STREAM_END) {
 			err = REFTABLE_ZLIB_ERROR;
 			goto done;
 		}
 		err = 0;
 
-		if (stream.total_out + block_header_skip != sz) {
+		if (br->zstream->total_out + block_header_skip != sz) {
 			err = REFTABLE_FORMAT_ERROR;
 			goto done;
 		}
@@ -242,7 +245,7 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		reftable_block_done(block);
 		block->data = br->uncompressed_data;
 		block->len = sz;
-		full_block_size = src_len + block_header_skip - stream.avail_in;
+		full_block_size = src_len + block_header_skip - br->zstream->avail_in;
 	} else if (full_block_size == 0) {
 		full_block_size = sz;
 	} else if (sz < full_block_size && sz < block->len &&
@@ -275,6 +278,8 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 
 void block_reader_release(struct block_reader *br)
 {
+	inflateEnd(br->zstream);
+	reftable_free(br->zstream);
 	reftable_free(br->uncompressed_data);
 	reftable_block_done(&br->block);
 }
diff --git a/reftable/block.h b/reftable/block.h
index 79275d67f1..93f4338771 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -56,6 +56,8 @@ int block_writer_finish(struct block_writer *w);
 /* clears out internally allocated block_writer members. */
 void block_writer_release(struct block_writer *bw);
 
+struct z_stream;
+
 /* Read a block. */
 struct block_reader {
 	/* offset of the block header; nonzero for the first block in a
@@ -67,6 +69,7 @@ struct block_reader {
 	int hash_size;
 
 	/* Uncompressed data for log entries. */
+	z_stream *zstream;
 	unsigned char *uncompressed_data;
 	size_t uncompressed_cap;
 
diff --git a/reftable/reader.c b/reftable/reader.c
index aacd5f1337..481dff10d4 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -459,6 +459,7 @@ static int reader_seek_linear(struct table_iter *ti,
 		 * we would not do a linear search there anymore.
 		 */
 		memset(&next.br.block, 0, sizeof(next.br.block));
+		next.br.zstream = NULL;
 		next.br.uncompressed_data = NULL;
 		next.br.uncompressed_cap = 0;
 
-- 
2.44.GIT



* Re: [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter`
  2024-03-27  6:37 ` [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
@ 2024-04-03  4:52   ` Justin Tobler
  2024-04-03 13:10     ` Patrick Steinhardt
  0 siblings, 1 reply; 31+ messages in thread
From: Justin Tobler @ 2024-04-03  4:52 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 24/03/27 07:37AM, Patrick Steinhardt wrote:
> The table iterator allows the caller to iterate through all records in a
> reftable table. To do so it iterates through all blocks of the desired
> type one by one, where for each block it creates a new block iterator
> and yields all its entries.
> 
> One of the things that is somewhat confusing in this context is who owns
> the block reader that is being used to read the blocks and pass them to
> the block iterator. Intuitively, as the table iterator is responsible
> for iterating through the blocks, one would assume that this iterator is
> also responsible for managing the lifecycle of the reader. And while it
> somewhat is, the block reader is ultimately stored inside of the block
> iterator.
> 
> Refactor the code such that the block reader is instead fully managed by
> the table iterator. Instead of passing the reader to the block iterator,
> we now only end up passing the block data to it. Despite clearing up the
> lifecycle of the reader, it will also allow for better reuse of the
> reader in subsequent patches.
> 
> The following benchmark prints a single matching ref out of 1 million
> refs. Before:
> 
>   HEAP SUMMARY:
>       in use at exit: 13,603 bytes in 125 blocks
>     total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated
> 
> After:
> 
>   HEAP SUMMARY:
>       in use at exit: 13,603 bytes in 125 blocks
>     total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated
> 
> Note that while there are more allocation and free calls now, the
> overall number of bytes allocated is significantly lower. The number of
> allocations will be reduced significantly by the next patch though.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
...
> @@ -340,14 +344,14 @@ void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
>  int block_iter_next(struct block_iter *it, struct reftable_record *rec)
>  {
>  	struct string_view in = {
> -		.buf = it->br->block.data + it->next_off,
> -		.len = it->br->block_len - it->next_off,
> +		.buf = (unsigned char *) it->block + it->next_off,

Would it be best to use the `uint8_t *` type instead of `unsigned char *`
to match `string_view.buf`? Not sure if it matters in this case.

> +		.len = it->block_len - it->next_off,
>  	};
...  
> diff --git a/reftable/block.h b/reftable/block.h
> index 601a1e0e89..b41efa5042 100644
> --- a/reftable/block.h
> +++ b/reftable/block.h
> @@ -84,16 +84,18 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
>  void block_reader_release(struct block_reader *br);
>  
>  /* Returns the block type (eg. 'r' for refs) */
> -uint8_t block_reader_type(struct block_reader *r);
> +uint8_t block_reader_type(const struct block_reader *r);
>  
>  /* Decodes the first key in the block */
> -int block_reader_first_key(struct block_reader *br, struct strbuf *key);
> +int block_reader_first_key(const struct block_reader *br, struct strbuf *key);
>  
>  /* Iterate over entries in a block */
>  struct block_iter {
>  	/* offset within the block of the next entry to read. */
>  	uint32_t next_off;
> -	struct block_reader *br;
> +	const unsigned char *block;

Same question here. Would it be better to use `uint8_t *`? Or does it not
really matter?

> +	size_t block_len;
> +	int hash_size;
>  
>  	/* key for last entry we read. */
>  	struct strbuf last_key;
> @@ -106,17 +108,22 @@ struct block_iter {
>  }
>  
>  /* Position `it` at start of the block */
> -void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
> +void block_iter_seek_start(struct block_iter *it, const struct block_reader *br);
>  
>  /* Position `it` to the `want` key in the block */
> -int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
> +int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
>  			struct strbuf *want);
>  
> -void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
> +void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src);
>  
>  /* return < 0 for error, 0 for OK, > 0 for EOF. */
>  int block_iter_next(struct block_iter *it, struct reftable_record *rec);
>  
> +/*
> + * Reset the block iterator to pristine state without releasing its memory.
> + */

Do we want to make the comment a single line to match the other adjacent
examples?

> +void block_iter_reset(struct block_iter *it);
> +
>  /* deallocate memory for `it`. The block reader and its block is left intact. */
>  void block_iter_close(struct block_iter *it);
>  
...

-Justin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter`
  2024-04-03  4:52   ` Justin Tobler
@ 2024-04-03 13:10     ` Patrick Steinhardt
  0 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-03 13:10 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 5406 bytes --]

On Tue, Apr 02, 2024 at 11:52:58PM -0500, Justin Tobler wrote:
> On 24/03/27 07:37AM, Patrick Steinhardt wrote:
> > The table iterator allows the caller to iterate through all records in a
> > reftable table. To do so it iterates through all blocks of the desired
> > type one by one, where for each block it creates a new block iterator
> > and yields all its entries.
> > 
> > One of the things that is somewhat confusing in this context is who owns
> > the block reader that is being used to read the blocks and pass them to
> > the block iterator. Intuitively, as the table iterator is responsible
> > for iterating through the blocks, one would assume that this iterator is
> > also responsible for managing the lifecycle of the reader. And while it
> > somewhat is, the block reader is ultimately stored inside of the block
> > iterator.
> > 
> > Refactor the code such that the block reader is instead fully managed by
> > the table iterator. Instead of passing the reader to the block iterator,
> > we now only end up passing the block data to it. Despite clearing up the
> > lifecycle of the reader, it will also allow for better reuse of the
> > reader in subsequent patches.
> > 
> > The following benchmark prints a single matching ref out of 1 million
> > refs. Before:
> > 
> >   HEAP SUMMARY:
> >       in use at exit: 13,603 bytes in 125 blocks
> >     total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated
> > 
> > After:
> > 
> >   HEAP SUMMARY:
> >       in use at exit: 13,603 bytes in 125 blocks
> >     total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated
> > 
> > Note that while there are more allocation and free calls now, the
> > overall number of bytes allocated is significantly lower. The number of
> > allocations will be reduced significantly by the next patch though.
> > 
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ...
> > @@ -340,14 +344,14 @@ void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
> >  int block_iter_next(struct block_iter *it, struct reftable_record *rec)
> >  {
> >  	struct string_view in = {
> > -		.buf = it->br->block.data + it->next_off,
> > -		.len = it->br->block_len - it->next_off,
> > +		.buf = (unsigned char *) it->block + it->next_off,
> 
> Would it be best to use the `uint8_t *` type instead of `unsigned char *`
> to match `string_view.buf`? Not sure if it matters in this case.

It doesn't really. `uint8_t` should always be equivalent to `unsigned
char`. I'd also think that `unsigned char` is a bit more idiomatic in
our codebase, but I may be mistaken there.
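
If one really wanted to double-check that assumption, a compile-time
check along these lines would do (purely illustrative, not something I
am proposing to add):

  #include <stdint.h>

  /*
   * Selects 1 if and only if uint8_t is a typedef of unsigned char on
   * this platform, and fails the build otherwise.
   */
  _Static_assert(_Generic((uint8_t)0, unsigned char: 1, default: 0),
                 "uint8_t is not unsigned char");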

> > +		.len = it->block_len - it->next_off,
> >  	};
> ...  
> > diff --git a/reftable/block.h b/reftable/block.h
> > index 601a1e0e89..b41efa5042 100644
> > --- a/reftable/block.h
> > +++ b/reftable/block.h
> > @@ -84,16 +84,18 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
> >  void block_reader_release(struct block_reader *br);
> >  
> >  /* Returns the block type (eg. 'r' for refs) */
> > -uint8_t block_reader_type(struct block_reader *r);
> > +uint8_t block_reader_type(const struct block_reader *r);
> >  
> >  /* Decodes the first key in the block */
> > -int block_reader_first_key(struct block_reader *br, struct strbuf *key);
> > +int block_reader_first_key(const struct block_reader *br, struct strbuf *key);
> >  
> >  /* Iterate over entries in a block */
> >  struct block_iter {
> >  	/* offset within the block of the next entry to read. */
> >  	uint32_t next_off;
> > -	struct block_reader *br;
> > +	const unsigned char *block;
> 
> Same question here. Would it be better to use `uint8_t *`? Or does it not
> really matter?

Same answer :)

> > +	size_t block_len;
> > +	int hash_size;
> >  
> >  	/* key for last entry we read. */
> >  	struct strbuf last_key;
> > @@ -106,17 +108,22 @@ struct block_iter {
> >  }
> >  
> >  /* Position `it` at start of the block */
> > -void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
> > +void block_iter_seek_start(struct block_iter *it, const struct block_reader *br);
> >  
> >  /* Position `it` to the `want` key in the block */
> > -int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
> > +int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
> >  			struct strbuf *want);
> >  
> > -void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
> > +void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src);
> >  
> >  /* return < 0 for error, 0 for OK, > 0 for EOF. */
> >  int block_iter_next(struct block_iter *it, struct reftable_record *rec);
> >  
> > +/*
> > + * Reset the block iterator to pristine state without releasing its memory.
> > + */
> 
> Do we want to make the comment a single line to match the other adjacent
> examples?

Can do. I'll refrain from sending a new version of this patch series for
now though as none of the comments really need addressing, in my
opinion. But please, feel free to push back in case you disagree.

Patrick

> > +void block_iter_reset(struct block_iter *it);
> > +
> >  /* deallocate memory for `it`. The block reader and its block is left intact. */
> >  void block_iter_close(struct block_iter *it);
> >  
> ...
> 
> -Justin

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/9] reftable/block: introduce `block_reader_release()`
  2024-03-27  6:37 ` [PATCH 4/9] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
@ 2024-04-03 13:16   ` Karthik Nayak
  2024-04-08 12:10     ` Patrick Steinhardt
  0 siblings, 1 reply; 31+ messages in thread
From: Karthik Nayak @ 2024-04-03 13:16 UTC (permalink / raw)
  To: Patrick Steinhardt, git

[-- Attachment #1: Type: text/plain, Size: 606 bytes --]

Patrick Steinhardt <ps@pks.im> writes:
>
> diff --git a/reftable/reader.c b/reftable/reader.c
> index f70efa2b7c..f925570bf3 100644
> --- a/reftable/reader.c
> +++ b/reftable/reader.c
> @@ -253,7 +253,7 @@ static void table_iter_block_done(struct table_iter *ti)
>  	if (!ti->bi.br) {
>  		return;
>  	}
> -	reftable_block_done(&ti->bi.br->block);
> +	block_reader_release(ti->bi.br);
>  	FREE_AND_NULL(ti->bi.br);
>
>  	ti->bi.last_key.len = 0;

I would expect `FREE_AND_NULL(ti->bi.br)` to also be within
`block_reader_release`, but then you'd have to pass `table_iter` or
`**`. So I guess this is okay.
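
Just to illustrate what I mean, the `**` variant would need to look
roughly like this (a hypothetical sketch, not a proposal):

  /*
   * Hypothetical: release the reader's resources and also free and
   * clear the caller's pointer, at the cost of taking a double pointer.
   */
  static void block_reader_free(struct block_reader **br)
  {
          if (!*br)
                  return;
          block_reader_release(*br);
          FREE_AND_NULL(*br);
  }

which arguably reads more awkwardly than keeping the `FREE_AND_NULL()`
call at the call site.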

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/9] reftable: optimize table and block iterators
  2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
                   ` (8 preceding siblings ...)
  2024-03-27  6:37 ` [PATCH 9/9] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
@ 2024-04-03 13:33 ` Karthik Nayak
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
  10 siblings, 0 replies; 31+ messages in thread
From: Karthik Nayak @ 2024-04-03 13:33 UTC (permalink / raw)
  To: Patrick Steinhardt, git

[-- Attachment #1: Type: text/plain, Size: 1050 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
>  * The code to iterate over reftable blocks has seen some optimization
>    to reduce memory allocation and deallocation.
>
> Preceding patch series have optimized memory allocation patterns when
> iterating through ref or log records. This has led to a constant number
> of allocations when decoding and yielding those records to our callers.
> One thing that is still missing though is an optimization for the table
> and block iterators which are responsible for iterating through the
> separate blocks in the table. So while the number of allocations does
> not scale with the number of refs (directly) anymore, it still scales
> with the number of blocks.
>
> This is getting tackled by this patch series now which refactors the
> table and block iterators such that the former can reuse the latter
> without reallocations. With this change iterating through records is now
> using a truly constant number of allocations.

This patch series already looks good to me; I have nothing to add here.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 4/9] reftable/block: introduce `block_reader_release()`
  2024-04-03 13:16   ` Karthik Nayak
@ 2024-04-08 12:10     ` Patrick Steinhardt
  0 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 12:10 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1107 bytes --]

On Wed, Apr 03, 2024 at 06:16:35AM -0700, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> >
> > diff --git a/reftable/reader.c b/reftable/reader.c
> > index f70efa2b7c..f925570bf3 100644
> > --- a/reftable/reader.c
> > +++ b/reftable/reader.c
> > @@ -253,7 +253,7 @@ static void table_iter_block_done(struct table_iter *ti)
> >  	if (!ti->bi.br) {
> >  		return;
> >  	}
> > -	reftable_block_done(&ti->bi.br->block);
> > +	block_reader_release(ti->bi.br);
> >  	FREE_AND_NULL(ti->bi.br);
> >
> >  	ti->bi.last_key.len = 0;
> 
> I would expect `FREE_AND_NULL(ti->bi.br)` to also be within
> `block_reader_release`, but then you'd have to pass `table_iter` or
> `**`. So I guess this is okay.

It is customary in the Git codebase to distinguish between `release` and
`free` functions. `release` will release all memory hosted by a structure,
but will not release the structure itself. `free`, on the other hand, is
expected to free both the memory hosted by a structure and the structure
itself. `block_reader_release()` thus shouldn't free the
`struct block_reader`.
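
In other words, if we ever grew a `free` counterpart, it would be layered
on top of the `release` function, roughly like this (a hypothetical
sketch, not something this series adds):

  /* Releases memory owned by `br`, but leaves `br` itself intact. */
  void block_reader_release(struct block_reader *br);

  /* Hypothetical counterpart: frees both the owned memory and `br`. */
  static void block_reader_free(struct block_reader *br)
  {
          if (!br)
                  return;
          block_reader_release(br);
          reftable_free(br);
  }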

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 00/10] reftable: optimize table and block iterators
  2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
                   ` (9 preceding siblings ...)
  2024-04-03 13:33 ` [PATCH 0/9] reftable: optimize table and block iterators Karthik Nayak
@ 2024-04-08 12:16 ` Patrick Steinhardt
  2024-04-08 12:16   ` [PATCH v2 01/10] reftable/block: rename `block_reader_start()` Patrick Steinhardt
                     ` (11 more replies)
  10 siblings, 12 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 12:16 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Karthik Nayak, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 4982 bytes --]

Hi,

this is the second version of my patch series that aims to optimize
the reftable table and block iterators.

Changes compared to v1:

    - The series now depends on ps/reftable-binsearch-update at
      d51d8cc368 (reftable/block: avoid decoding keys when searching
      restart points, 2024-04-03). This is to resolve a merge conflict
      with that other series which has landed in "next" already.

    - Rewrote a comment to be single-line to fit into the style of other
      comments better.

    - A new patch on top that avoids copying block iterators altogether
      for another speedup.

Thanks!

Patrick

Patrick Steinhardt (10):
  reftable/block: rename `block_reader_start()`
  reftable/block: merge `block_iter_seek()` and `block_reader_seek()`
  reftable/block: better grouping of functions
  reftable/block: introduce `block_reader_release()`
  reftable/block: move ownership of block reader into `struct
    table_iter`
  reftable/reader: iterate to next block in place
  reftable/block: reuse uncompressed blocks
  reftable/block: open-code call to `uncompress2()`
  reftable/block: reuse `zstream` state on inflation
  reftable/block: avoid copying block iterators on seek

 reftable/block.c      | 176 +++++++++++++++++++++++++-----------------
 reftable/block.h      |  47 ++++++-----
 reftable/block_test.c |   6 +-
 reftable/iter.c       |   2 +-
 reftable/reader.c     | 176 ++++++++++++++++++++++--------------------
 5 files changed, 229 insertions(+), 178 deletions(-)

Range-diff against v1:
 1:  24b0dda29e =  1:  eb487557a8 reftable/block: rename `block_reader_start()`
 2:  a2b7f0f559 !  2:  d0b318b8ee reftable/block: merge `block_iter_seek()` and `block_reader_seek()`
    @@ reftable/block.c: int block_reader_first_key(struct block_reader *br, struct str
     +int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
     +			struct strbuf *want)
      {
    - 	struct restart_find_args args = {
    - 		.key = *want,
    + 	struct restart_needle_less_args args = {
    + 		.needle = *want,
     
      ## reftable/block.h ##
     @@ reftable/block.h: int block_reader_init(struct block_reader *br, struct reftable_block *bl,
 3:  88a705b3e2 =  3:  c3f928d1e9 reftable/block: better grouping of functions
 4:  9a1253649a =  4:  35f1bf5072 reftable/block: introduce `block_reader_release()`
 5:  f10882a084 !  5:  e8e8bbae62 reftable/block: move ownership of block reader into `struct table_iter`
    @@ reftable/block.c: int block_reader_first_key(struct block_reader *br, struct str
      	it->next_off = br->header_off + 4;
      }
     @@ reftable/block.c: void block_iter_seek_start(struct block_iter *it, struct block_reader *br)
    - struct restart_find_args {
    + struct restart_needle_less_args {
      	int error;
    - 	struct strbuf key;
    --	struct block_reader *r;
    -+	const struct block_reader *r;
    + 	struct strbuf needle;
    +-	struct block_reader *reader;
    ++	const struct block_reader *reader;
      };
      
    - static int restart_key_less(size_t idx, void *args)
    -@@ reftable/block.c: static int restart_key_less(size_t idx, void *args)
    - 	return result < 0;
    + static int restart_needle_less(size_t idx, void *_args)
    +@@ reftable/block.c: static int restart_needle_less(size_t idx, void *_args)
    + 	return args->needle.len < suffix_len;
      }
      
     -void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
    @@ reftable/block.c: int block_iter_next(struct block_iter *it, struct reftable_rec
     +int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
      			struct strbuf *want)
      {
    - 	struct restart_find_args args = {
    + 	struct restart_needle_less_args args = {
     @@ reftable/block.c: int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
      		it->next_off = block_reader_restart_offset(br, i - 1);
      	else
    @@ reftable/block.h: struct block_iter {
      /* return < 0 for error, 0 for OK, > 0 for EOF. */
      int block_iter_next(struct block_iter *it, struct reftable_record *rec);
      
    -+/*
    -+ * Reset the block iterator to pristine state without releasing its memory.
    -+ */
    ++/* Reset the block iterator to pristine state without releasing its memory. */
     +void block_iter_reset(struct block_iter *it);
     +
      /* deallocate memory for `it`. The block reader and its block is left intact. */
 6:  ae359cb714 =  6:  685f0a40bc reftable/reader: iterate to next block in place
 7:  1e4eba7e9b =  7:  a7906a3383 reftable/block: reuse uncompressed blocks
 8:  bf4c1ab797 =  8:  6635c7b986 reftable/block: open-code call to `uncompress2()`
 9:  43e6538968 =  9:  587b5601c0 reftable/block: reuse `zstream` state on inflation
 -:  ---------- > 10:  cc5ff0d598 reftable/block: avoid copying block iterators on seek
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 01/10] reftable/block: rename `block_reader_start()`
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
@ 2024-04-08 12:16   ` Patrick Steinhardt
  2024-04-08 12:16   ` [PATCH v2 02/10] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 12:16 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Karthik Nayak, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 3109 bytes --]

The function `block_reader_start()` does not really apply to the block
reader, but to the block iterator. Its name is thus somewhat confusing.
Rename it to `block_iter_seek_start()` to clarify.

We will rename `block_reader_seek()` in a similar spirit in the next
commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c      | 2 +-
 reftable/block.h      | 2 +-
 reftable/block_test.c | 2 +-
 reftable/iter.c       | 2 +-
 reftable/reader.c     | 4 ++--
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 298e8c56b9..53ea92d04e 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -266,7 +266,7 @@ static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
 	return get_be24(br->restart_bytes + 3 * i);
 }
 
-void block_reader_start(struct block_reader *br, struct block_iter *it)
+void block_iter_seek_start(struct block_iter *it, struct block_reader *br)
 {
 	it->br = br;
 	strbuf_reset(&it->last_key);
diff --git a/reftable/block.h b/reftable/block.h
index 47acc62c0a..44a9a8de7d 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -98,7 +98,7 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
 		      int hash_size);
 
 /* Position `it` at start of the block */
-void block_reader_start(struct block_reader *br, struct block_iter *it);
+void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
 
 /* Position `it` to the `want` key in the block */
 int block_reader_seek(struct block_reader *br, struct block_iter *it,
diff --git a/reftable/block_test.c b/reftable/block_test.c
index e162c6e33f..a719be7c50 100644
--- a/reftable/block_test.c
+++ b/reftable/block_test.c
@@ -69,7 +69,7 @@ static void test_block_read_write(void)
 
 	block_reader_init(&br, &block, header_off, block_size, GIT_SHA1_RAWSZ);
 
-	block_reader_start(&br, &it);
+	block_iter_seek_start(&it, &br);
 
 	while (1) {
 		int r = block_iter_next(&it, &rec);
diff --git a/reftable/iter.c b/reftable/iter.c
index 7aa30c4a51..aa9ac199b1 100644
--- a/reftable/iter.c
+++ b/reftable/iter.c
@@ -115,7 +115,7 @@ static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
 		/* indexed block does not exist. */
 		return REFTABLE_FORMAT_ERROR;
 	}
-	block_reader_start(&it->block_reader, &it->cur);
+	block_iter_seek_start(&it->cur, &it->block_reader);
 	return 0;
 }
 
diff --git a/reftable/reader.c b/reftable/reader.c
index b113daab77..d7d47e8640 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -345,7 +345,7 @@ static int table_iter_next_block(struct table_iter *dest,
 		*brp = br;
 
 		dest->is_finished = 0;
-		block_reader_start(brp, &dest->bi);
+		block_iter_seek_start(&dest->bi, brp);
 	}
 	return 0;
 }
@@ -429,7 +429,7 @@ static int reader_table_iter_at(struct reftable_reader *r,
 	ti->r = r;
 	ti->typ = block_reader_type(brp);
 	ti->block_off = off;
-	block_reader_start(brp, &ti->bi);
+	block_iter_seek_start(&ti->bi, brp);
 	return 0;
 }
 
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 02/10] reftable/block: merge `block_iter_seek()` and `block_reader_seek()`
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
  2024-04-08 12:16   ` [PATCH v2 01/10] reftable/block: rename `block_reader_start()` Patrick Steinhardt
@ 2024-04-08 12:16   ` Patrick Steinhardt
  2024-04-08 12:16   ` [PATCH v2 03/10] reftable/block: better grouping of functions Patrick Steinhardt
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 12:16 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Karthik Nayak, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 3824 bytes --]

The function `block_iter_seek()` is merely a thin wrapper around
`block_reader_seek()`. Merge those two functions into a new function
`block_iter_seek_key()` that more clearly says what it is actually
doing.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c      | 9 ++-------
 reftable/block.h      | 7 ++-----
 reftable/block_test.c | 4 ++--
 reftable/reader.c     | 4 ++--
 4 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 53ea92d04e..1015f9c04c 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -373,19 +373,14 @@ int block_reader_first_key(struct block_reader *br, struct strbuf *key)
 	return 0;
 }
 
-int block_iter_seek(struct block_iter *it, struct strbuf *want)
-{
-	return block_reader_seek(it->br, it, want);
-}
-
 void block_iter_close(struct block_iter *it)
 {
 	strbuf_release(&it->last_key);
 	strbuf_release(&it->scratch);
 }
 
-int block_reader_seek(struct block_reader *br, struct block_iter *it,
-		      struct strbuf *want)
+int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
+			struct strbuf *want)
 {
 	struct restart_needle_less_args args = {
 		.needle = *want,
diff --git a/reftable/block.h b/reftable/block.h
index 44a9a8de7d..1734bee917 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -101,8 +101,8 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
 void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
 
 /* Position `it` to the `want` key in the block */
-int block_reader_seek(struct block_reader *br, struct block_iter *it,
-		      struct strbuf *want);
+int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
+			struct strbuf *want);
 
 /* Returns the block type (eg. 'r' for refs) */
 uint8_t block_reader_type(struct block_reader *r);
@@ -115,9 +115,6 @@ void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
 /* return < 0 for error, 0 for OK, > 0 for EOF. */
 int block_iter_next(struct block_iter *it, struct reftable_record *rec);
 
-/* Seek to `want` with in the block pointed to by `it` */
-int block_iter_seek(struct block_iter *it, struct strbuf *want);
-
 /* deallocate memory for `it`. The block reader and its block is left intact. */
 void block_iter_close(struct block_iter *it);
 
diff --git a/reftable/block_test.c b/reftable/block_test.c
index a719be7c50..26a9cfbc83 100644
--- a/reftable/block_test.c
+++ b/reftable/block_test.c
@@ -89,7 +89,7 @@ static void test_block_read_write(void)
 		strbuf_reset(&want);
 		strbuf_addstr(&want, names[i]);
 
-		n = block_reader_seek(&br, &it, &want);
+		n = block_iter_seek_key(&it, &br, &want);
 		EXPECT(n == 0);
 
 		n = block_iter_next(&it, &rec);
@@ -98,7 +98,7 @@ static void test_block_read_write(void)
 		EXPECT_STREQ(names[i], rec.u.ref.refname);
 
 		want.len--;
-		n = block_reader_seek(&br, &it, &want);
+		n = block_iter_seek_key(&it, &br, &want);
 		EXPECT(n == 0);
 
 		n = block_iter_next(&it, &rec);
diff --git a/reftable/reader.c b/reftable/reader.c
index d7d47e8640..f70efa2b7c 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -483,7 +483,7 @@ static int reader_seek_linear(struct table_iter *ti,
 		table_iter_copy_from(ti, &next);
 	}
 
-	err = block_iter_seek(&ti->bi, &want_key);
+	err = block_iter_seek_key(&ti->bi, ti->bi.br, &want_key);
 	if (err < 0)
 		goto done;
 	err = 0;
@@ -558,7 +558,7 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		if (err != 0)
 			goto done;
 
-		err = block_iter_seek(&next.bi, &want_index.u.idx.last_key);
+		err = block_iter_seek_key(&next.bi, next.bi.br, &want_index.u.idx.last_key);
 		if (err < 0)
 			goto done;
 
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 03/10] reftable/block: better grouping of functions
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
  2024-04-08 12:16   ` [PATCH v2 01/10] reftable/block: rename `block_reader_start()` Patrick Steinhardt
  2024-04-08 12:16   ` [PATCH v2 02/10] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
@ 2024-04-08 12:16   ` Patrick Steinhardt
  2024-04-08 12:16   ` [PATCH v2 04/10] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 12:16 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Karthik Nayak, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 4026 bytes --]

Function definitions and declarations of `struct block_reader` and
`struct block_iter` are somewhat mixed up, making it hard to see which
functions belong together. Rearrange them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 50 ++++++++++++++++++++++++------------------------
 reftable/block.h | 22 ++++++++++-----------
 2 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 1015f9c04c..e65453e11b 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -175,11 +175,6 @@ int block_writer_finish(struct block_writer *w)
 	return w->next;
 }
 
-uint8_t block_reader_type(struct block_reader *r)
-{
-	return r->block.data[r->header_off];
-}
-
 int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		      uint32_t header_off, uint32_t table_block_size,
 		      int hash_size)
@@ -261,6 +256,31 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	return err;
 }
 
+uint8_t block_reader_type(struct block_reader *r)
+{
+	return r->block.data[r->header_off];
+}
+
+int block_reader_first_key(struct block_reader *br, struct strbuf *key)
+{
+	int off = br->header_off + 4, n;
+	struct string_view in = {
+		.buf = br->block.data + off,
+		.len = br->block_len - off,
+	};
+	uint8_t extra = 0;
+
+	strbuf_reset(key);
+
+	n = reftable_decode_key(key, &extra, in);
+	if (n < 0)
+		return n;
+	if (!key->len)
+		return REFTABLE_FORMAT_ERROR;
+
+	return 0;
+}
+
 static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
 {
 	return get_be24(br->restart_bytes + 3 * i);
@@ -353,26 +373,6 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 	return 0;
 }
 
-int block_reader_first_key(struct block_reader *br, struct strbuf *key)
-{
-	int off = br->header_off + 4, n;
-	struct string_view in = {
-		.buf = br->block.data + off,
-		.len = br->block_len - off,
-	};
-	uint8_t extra = 0;
-
-	strbuf_reset(key);
-
-	n = reftable_decode_key(key, &extra, in);
-	if (n < 0)
-		return n;
-	if (!key->len)
-		return REFTABLE_FORMAT_ERROR;
-
-	return 0;
-}
-
 void block_iter_close(struct block_iter *it)
 {
 	strbuf_release(&it->last_key);
diff --git a/reftable/block.h b/reftable/block.h
index 1734bee917..d73ed73549 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -76,6 +76,17 @@ struct block_reader {
 	uint32_t full_block_size;
 };
 
+/* initializes a block reader. */
+int block_reader_init(struct block_reader *br, struct reftable_block *bl,
+		      uint32_t header_off, uint32_t table_block_size,
+		      int hash_size);
+
+/* Returns the block type (eg. 'r' for refs) */
+uint8_t block_reader_type(struct block_reader *r);
+
+/* Decodes the first key in the block */
+int block_reader_first_key(struct block_reader *br, struct strbuf *key);
+
 /* Iterate over entries in a block */
 struct block_iter {
 	/* offset within the block of the next entry to read. */
@@ -92,11 +103,6 @@ struct block_iter {
 	.scratch = STRBUF_INIT, \
 }
 
-/* initializes a block reader. */
-int block_reader_init(struct block_reader *br, struct reftable_block *bl,
-		      uint32_t header_off, uint32_t table_block_size,
-		      int hash_size);
-
 /* Position `it` at start of the block */
 void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
 
@@ -104,12 +110,6 @@ void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
 int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
 			struct strbuf *want);
 
-/* Returns the block type (eg. 'r' for refs) */
-uint8_t block_reader_type(struct block_reader *r);
-
-/* Decodes the first key in the block */
-int block_reader_first_key(struct block_reader *br, struct strbuf *key);
-
 void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
 
 /* return < 0 for error, 0 for OK, > 0 for EOF. */
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 04/10] reftable/block: introduce `block_reader_release()`
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2024-04-08 12:16   ` [PATCH v2 03/10] reftable/block: better grouping of functions Patrick Steinhardt
@ 2024-04-08 12:16   ` Patrick Steinhardt
  2024-04-08 12:16   ` [PATCH v2 05/10] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 12:16 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Karthik Nayak, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1693 bytes --]

Introduce a new function `block_reader_release()` that releases
resources acquired by the block reader. This function will be extended
in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c  | 5 +++++
 reftable/block.h  | 2 ++
 reftable/reader.c | 2 +-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/reftable/block.c b/reftable/block.c
index e65453e11b..fe836c21e5 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -256,6 +256,11 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	return err;
 }
 
+void block_reader_release(struct block_reader *br)
+{
+	reftable_block_done(&br->block);
+}
+
 uint8_t block_reader_type(struct block_reader *r)
 {
 	return r->block.data[r->header_off];
diff --git a/reftable/block.h b/reftable/block.h
index d73ed73549..601a1e0e89 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -81,6 +81,8 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
 		      uint32_t header_off, uint32_t table_block_size,
 		      int hash_size);
 
+void block_reader_release(struct block_reader *br);
+
 /* Returns the block type (eg. 'r' for refs) */
 uint8_t block_reader_type(struct block_reader *r);
 
diff --git a/reftable/reader.c b/reftable/reader.c
index f70efa2b7c..f925570bf3 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -253,7 +253,7 @@ static void table_iter_block_done(struct table_iter *ti)
 	if (!ti->bi.br) {
 		return;
 	}
-	reftable_block_done(&ti->bi.br->block);
+	block_reader_release(ti->bi.br);
 	FREE_AND_NULL(ti->bi.br);
 
 	ti->bi.last_key.len = 0;
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 05/10] reftable/block: move ownership of block reader into `struct table_iter`
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2024-04-08 12:16   ` [PATCH v2 04/10] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
@ 2024-04-08 12:16   ` Patrick Steinhardt
  2024-04-08 12:16   ` [PATCH v2 06/10] reftable/reader: iterate to next block in place Patrick Steinhardt
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 12:16 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Karthik Nayak, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 15761 bytes --]

The table iterator allows the caller to iterate through all records in a
reftable table. To do so it iterates through all blocks of the desired
type one by one, where for each block it creates a new block iterator
and yields all its entries.

One of the things that is somewhat confusing in this context is who owns
the block reader that is being used to read the blocks and pass them to
the block iterator. Intuitively, as the table iterator is responsible
for iterating through the blocks, one would assume that this iterator is
also responsible for managing the lifecycle of the reader. And while it
somewhat is, the block reader is ultimately stored inside of the block
iterator.

Refactor the code such that the block reader is instead fully managed by
the table iterator. Instead of passing the reader to the block iterator,
we now only end up passing the block data to it. Besides clearing up the
lifecycle of the reader, it will also allow for better reuse of the
reader in subsequent patches.

The following benchmark prints a single matching ref out of 1 million
refs. Before:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 6,607 allocs, 6,482 frees, 509,635 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated

Note that while there are more allocation and free calls now, the
overall number of bytes allocated is significantly lower. The number of
allocations will be reduced significantly by the next patch though.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c  |  43 ++++++++++------
 reftable/block.h  |  17 ++++---
 reftable/reader.c | 123 ++++++++++++++++++++++------------------------
 3 files changed, 100 insertions(+), 83 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index fe836c21e5..2d8d0668b3 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -261,12 +261,12 @@ void block_reader_release(struct block_reader *br)
 	reftable_block_done(&br->block);
 }
 
-uint8_t block_reader_type(struct block_reader *r)
+uint8_t block_reader_type(const struct block_reader *r)
 {
 	return r->block.data[r->header_off];
 }
 
-int block_reader_first_key(struct block_reader *br, struct strbuf *key)
+int block_reader_first_key(const struct block_reader *br, struct strbuf *key)
 {
 	int off = br->header_off + 4, n;
 	struct string_view in = {
@@ -286,14 +286,16 @@ int block_reader_first_key(struct block_reader *br, struct strbuf *key)
 	return 0;
 }
 
-static uint32_t block_reader_restart_offset(struct block_reader *br, int i)
+static uint32_t block_reader_restart_offset(const struct block_reader *br, int i)
 {
 	return get_be24(br->restart_bytes + 3 * i);
 }
 
-void block_iter_seek_start(struct block_iter *it, struct block_reader *br)
+void block_iter_seek_start(struct block_iter *it, const struct block_reader *br)
 {
-	it->br = br;
+	it->block = br->block.data;
+	it->block_len = br->block_len;
+	it->hash_size = br->hash_size;
 	strbuf_reset(&it->last_key);
 	it->next_off = br->header_off + 4;
 }
@@ -301,7 +303,7 @@ void block_iter_seek_start(struct block_iter *it, struct block_reader *br)
 struct restart_needle_less_args {
 	int error;
 	struct strbuf needle;
-	struct block_reader *reader;
+	const struct block_reader *reader;
 };
 
 static int restart_needle_less(size_t idx, void *_args)
@@ -340,9 +342,11 @@ static int restart_needle_less(size_t idx, void *_args)
 	return args->needle.len < suffix_len;
 }
 
-void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
+void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src)
 {
-	dest->br = src->br;
+	dest->block = src->block;
+	dest->block_len = src->block_len;
+	dest->hash_size = src->hash_size;
 	dest->next_off = src->next_off;
 	strbuf_reset(&dest->last_key);
 	strbuf_addbuf(&dest->last_key, &src->last_key);
@@ -351,14 +355,14 @@ void block_iter_copy_from(struct block_iter *dest, struct block_iter *src)
 int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 {
 	struct string_view in = {
-		.buf = it->br->block.data + it->next_off,
-		.len = it->br->block_len - it->next_off,
+		.buf = (unsigned char *) it->block + it->next_off,
+		.len = it->block_len - it->next_off,
 	};
 	struct string_view start = in;
 	uint8_t extra = 0;
 	int n = 0;
 
-	if (it->next_off >= it->br->block_len)
+	if (it->next_off >= it->block_len)
 		return 1;
 
 	n = reftable_decode_key(&it->last_key, &extra, in);
@@ -368,7 +372,7 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 		return REFTABLE_FORMAT_ERROR;
 
 	string_view_consume(&in, n);
-	n = reftable_record_decode(rec, it->last_key, extra, in, it->br->hash_size,
+	n = reftable_record_decode(rec, it->last_key, extra, in, it->hash_size,
 				   &it->scratch);
 	if (n < 0)
 		return -1;
@@ -378,13 +382,22 @@ int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 	return 0;
 }
 
+void block_iter_reset(struct block_iter *it)
+{
+	strbuf_reset(&it->last_key);
+	it->next_off = 0;
+	it->block = NULL;
+	it->block_len = 0;
+	it->hash_size = 0;
+}
+
 void block_iter_close(struct block_iter *it)
 {
 	strbuf_release(&it->last_key);
 	strbuf_release(&it->scratch);
 }
 
-int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
+int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
 			struct strbuf *want)
 {
 	struct restart_needle_less_args args = {
@@ -436,7 +449,9 @@ int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
 		it->next_off = block_reader_restart_offset(br, i - 1);
 	else
 		it->next_off = br->header_off + 4;
-	it->br = br;
+	it->block = br->block.data;
+	it->block_len = br->block_len;
+	it->hash_size = br->hash_size;
 
 	reftable_record_init(&rec, block_reader_type(br));
 
diff --git a/reftable/block.h b/reftable/block.h
index 601a1e0e89..d733d45ee0 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -84,16 +84,18 @@ int block_reader_init(struct block_reader *br, struct reftable_block *bl,
 void block_reader_release(struct block_reader *br);
 
 /* Returns the block type (eg. 'r' for refs) */
-uint8_t block_reader_type(struct block_reader *r);
+uint8_t block_reader_type(const struct block_reader *r);
 
 /* Decodes the first key in the block */
-int block_reader_first_key(struct block_reader *br, struct strbuf *key);
+int block_reader_first_key(const struct block_reader *br, struct strbuf *key);
 
 /* Iterate over entries in a block */
 struct block_iter {
 	/* offset within the block of the next entry to read. */
 	uint32_t next_off;
-	struct block_reader *br;
+	const unsigned char *block;
+	size_t block_len;
+	int hash_size;
 
 	/* key for last entry we read. */
 	struct strbuf last_key;
@@ -106,17 +108,20 @@ struct block_iter {
 }
 
 /* Position `it` at start of the block */
-void block_iter_seek_start(struct block_iter *it, struct block_reader *br);
+void block_iter_seek_start(struct block_iter *it, const struct block_reader *br);
 
 /* Position `it` to the `want` key in the block */
-int block_iter_seek_key(struct block_iter *it, struct block_reader *br,
+int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
 			struct strbuf *want);
 
-void block_iter_copy_from(struct block_iter *dest, struct block_iter *src);
+void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src);
 
 /* return < 0 for error, 0 for OK, > 0 for EOF. */
 int block_iter_next(struct block_iter *it, struct reftable_record *rec);
 
+/* Reset the block iterator to pristine state without releasing its memory. */
+void block_iter_reset(struct block_iter *it);
+
 /* deallocate memory for `it`. The block reader and its block is left intact. */
 void block_iter_close(struct block_iter *it);
 
diff --git a/reftable/reader.c b/reftable/reader.c
index f925570bf3..b77b639751 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -220,6 +220,7 @@ struct table_iter {
 	struct reftable_reader *r;
 	uint8_t typ;
 	uint64_t block_off;
+	struct block_reader br;
 	struct block_iter bi;
 	int is_finished;
 };
@@ -227,16 +228,6 @@ struct table_iter {
 	.bi = BLOCK_ITER_INIT \
 }
 
-static void table_iter_copy_from(struct table_iter *dest,
-				 struct table_iter *src)
-{
-	dest->r = src->r;
-	dest->typ = src->typ;
-	dest->block_off = src->block_off;
-	dest->is_finished = src->is_finished;
-	block_iter_copy_from(&dest->bi, &src->bi);
-}
-
 static int table_iter_next_in_block(struct table_iter *ti,
 				    struct reftable_record *rec)
 {
@@ -250,14 +241,8 @@ static int table_iter_next_in_block(struct table_iter *ti,
 
 static void table_iter_block_done(struct table_iter *ti)
 {
-	if (!ti->bi.br) {
-		return;
-	}
-	block_reader_release(ti->bi.br);
-	FREE_AND_NULL(ti->bi.br);
-
-	ti->bi.last_key.len = 0;
-	ti->bi.next_off = 0;
+	block_reader_release(&ti->br);
+	block_iter_reset(&ti->bi);
 }
 
 static int32_t extract_block_size(uint8_t *data, uint8_t *typ, uint64_t off,
@@ -321,32 +306,33 @@ int reader_init_block_reader(struct reftable_reader *r, struct block_reader *br,
 	return err;
 }
 
+static void table_iter_close(struct table_iter *ti)
+{
+	table_iter_block_done(ti);
+	block_iter_close(&ti->bi);
+}
+
 static int table_iter_next_block(struct table_iter *dest,
 				 struct table_iter *src)
 {
-	uint64_t next_block_off = src->block_off + src->bi.br->full_block_size;
-	struct block_reader br = { 0 };
-	int err = 0;
+	uint64_t next_block_off = src->block_off + src->br.full_block_size;
+	int err;
 
 	dest->r = src->r;
 	dest->typ = src->typ;
 	dest->block_off = next_block_off;
 
-	err = reader_init_block_reader(src->r, &br, next_block_off, src->typ);
-	if (err > 0) {
+	err = reader_init_block_reader(src->r, &dest->br, next_block_off, src->typ);
+	if (err > 0)
 		dest->is_finished = 1;
-		return 1;
-	}
-	if (err != 0)
+	if (err) {
+		table_iter_block_done(dest);
 		return err;
-	else {
-		struct block_reader *brp =
-			reftable_malloc(sizeof(struct block_reader));
-		*brp = br;
-
-		dest->is_finished = 0;
-		block_iter_seek_start(&dest->bi, brp);
 	}
+
+	dest->is_finished = 0;
+	block_iter_seek_start(&dest->bi, &dest->br);
+
 	return 0;
 }
 
@@ -377,14 +363,13 @@ static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
 		 * iterator is drained.
 		 */
 		err = table_iter_next_block(&next, ti);
-		table_iter_block_done(ti);
 		if (err) {
 			ti->is_finished = 1;
 			return err;
 		}
 
-		table_iter_copy_from(ti, &next);
-		block_iter_close(&next.bi);
+		table_iter_close(ti);
+		*ti = next;
 	}
 }
 
@@ -393,16 +378,14 @@ static int table_iter_next_void(void *ti, struct reftable_record *rec)
 	return table_iter_next(ti, rec);
 }
 
-static void table_iter_close(void *p)
+static void table_iter_close_void(void *ti)
 {
-	struct table_iter *ti = p;
-	table_iter_block_done(ti);
-	block_iter_close(&ti->bi);
+	table_iter_close(ti);
 }
 
 static struct reftable_iterator_vtable table_iter_vtable = {
 	.next = &table_iter_next_void,
-	.close = &table_iter_close,
+	.close = &table_iter_close_void,
 };
 
 static void iterator_from_table_iter(struct reftable_iterator *it,
@@ -417,19 +400,16 @@ static int reader_table_iter_at(struct reftable_reader *r,
 				struct table_iter *ti, uint64_t off,
 				uint8_t typ)
 {
-	struct block_reader br = { 0 };
-	struct block_reader *brp = NULL;
+	int err;
 
-	int err = reader_init_block_reader(r, &br, off, typ);
+	err = reader_init_block_reader(r, &ti->br, off, typ);
 	if (err != 0)
 		return err;
 
-	brp = reftable_malloc(sizeof(struct block_reader));
-	*brp = br;
 	ti->r = r;
-	ti->typ = block_reader_type(brp);
+	ti->typ = block_reader_type(&ti->br);
 	ti->block_off = off;
-	block_iter_seek_start(&ti->bi, brp);
+	block_iter_seek_start(&ti->bi, &ti->br);
 	return 0;
 }
 
@@ -454,23 +434,34 @@ static int reader_seek_linear(struct table_iter *ti,
 {
 	struct strbuf want_key = STRBUF_INIT;
 	struct strbuf got_key = STRBUF_INIT;
-	struct table_iter next = TABLE_ITER_INIT;
 	struct reftable_record rec;
 	int err = -1;
 
 	reftable_record_init(&rec, reftable_record_type(want));
 	reftable_record_key(want, &want_key);
 
+	/*
+	 * First we need to locate the block that must contain our record. To
+	 * do so we scan through blocks linearly until we find the first block
+	 * whose first key is bigger than our wanted key. Once we have found
+	 * that block we know that the key must be contained in the preceding
+	 * block.
+	 *
+	 * This algorithm is somewhat unfortunate because it means that we
+	 * always have to seek one block too far and then back up. But as we
+	 * can only decode the _first_ key of a block but not its _last_ key we
+	 * have no other way to do this.
+	 */
 	while (1) {
+		struct table_iter next = TABLE_ITER_INIT;
+
 		err = table_iter_next_block(&next, ti);
 		if (err < 0)
 			goto done;
-
-		if (err > 0) {
+		if (err > 0)
 			break;
-		}
 
-		err = block_reader_first_key(next.bi.br, &got_key);
+		err = block_reader_first_key(&next.br, &got_key);
 		if (err < 0)
 			goto done;
 
@@ -480,16 +471,20 @@ static int reader_seek_linear(struct table_iter *ti,
 		}
 
 		table_iter_block_done(ti);
-		table_iter_copy_from(ti, &next);
+		*ti = next;
 	}
 
-	err = block_iter_seek_key(&ti->bi, ti->bi.br, &want_key);
+	/*
+	 * We have located the block that must contain our record, so we seek
+	 * the wanted key inside of it. If the block does not contain our key
+	 * we know that the corresponding record does not exist.
+	 */
+	err = block_iter_seek_key(&ti->bi, &ti->br, &want_key);
 	if (err < 0)
 		goto done;
 	err = 0;
 
 done:
-	block_iter_close(&next.bi);
 	reftable_record_release(&rec);
 	strbuf_release(&want_key);
 	strbuf_release(&got_key);
@@ -508,6 +503,7 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		.u.idx = { .last_key = STRBUF_INIT },
 	};
 	struct table_iter index_iter = TABLE_ITER_INIT;
+	struct table_iter empty = TABLE_ITER_INIT;
 	struct table_iter next = TABLE_ITER_INIT;
 	int err = 0;
 
@@ -549,7 +545,6 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		 * not exist.
 		 */
 		err = table_iter_next(&index_iter, &index_result);
-		table_iter_block_done(&index_iter);
 		if (err != 0)
 			goto done;
 
@@ -558,7 +553,7 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		if (err != 0)
 			goto done;
 
-		err = block_iter_seek_key(&next.bi, next.bi.br, &want_index.u.idx.last_key);
+		err = block_iter_seek_key(&next.bi, &next.br, &want_index.u.idx.last_key);
 		if (err < 0)
 			goto done;
 
@@ -572,18 +567,20 @@ static int reader_seek_indexed(struct reftable_reader *r,
 			break;
 		}
 
-		table_iter_copy_from(&index_iter, &next);
+		table_iter_close(&index_iter);
+		index_iter = next;
+		next = empty;
 	}
 
 	if (err == 0) {
-		struct table_iter empty = TABLE_ITER_INIT;
 		struct table_iter *malloced = reftable_calloc(1, sizeof(*malloced));
-		*malloced = empty;
-		table_iter_copy_from(malloced, &next);
+		*malloced = next;
+		next = empty;
 		iterator_from_table_iter(it, malloced);
 	}
+
 done:
-	block_iter_close(&next.bi);
+	table_iter_close(&next);
 	table_iter_close(&index_iter);
 	reftable_record_release(&want_index);
 	reftable_record_release(&index_result);
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 06/10] reftable/reader: iterate to next block in place
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2024-04-08 12:16   ` [PATCH v2 05/10] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
@ 2024-04-08 12:16   ` Patrick Steinhardt
  2024-04-08 12:16   ` [PATCH v2 07/10] reftable/block: reuse uncompressed blocks Patrick Steinhardt
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 12:16 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Karthik Nayak, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 4330 bytes --]

The table iterator has to iterate towards the next block once it has
yielded all records of the current block. This is done by creating a new
table iterator, initializing it to the next block, releasing the old
iterator and then copying over the data.

Refactor the code to instead advance the table iterator in place. This
is simpler and unlocks some optimizations in subsequent patches. Also,
it allows us to avoid some allocations.

The following measurements were taken while printing a single matching
ref out of 1 million refs. Before this change:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 7,235 allocs, 7,110 frees, 301,481 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 315 allocs, 190 frees, 107,027 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c  |  2 ++
 reftable/reader.c | 47 ++++++++++++++++++++++++++---------------------
 2 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 2d8d0668b3..0c4e71eae3 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -188,6 +188,8 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	uint8_t *restart_bytes = NULL;
 	uint8_t *uncompressed = NULL;
 
+	reftable_block_done(&br->block);
+
 	if (!reftable_is_block_type(typ)) {
 		err =  REFTABLE_FORMAT_ERROR;
 		goto done;
diff --git a/reftable/reader.c b/reftable/reader.c
index b77b639751..dd4de294a1 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -312,26 +312,20 @@ static void table_iter_close(struct table_iter *ti)
 	block_iter_close(&ti->bi);
 }
 
-static int table_iter_next_block(struct table_iter *dest,
-				 struct table_iter *src)
+static int table_iter_next_block(struct table_iter *ti)
 {
-	uint64_t next_block_off = src->block_off + src->br.full_block_size;
+	uint64_t next_block_off = ti->block_off + ti->br.full_block_size;
 	int err;
 
-	dest->r = src->r;
-	dest->typ = src->typ;
-	dest->block_off = next_block_off;
-
-	err = reader_init_block_reader(src->r, &dest->br, next_block_off, src->typ);
+	err = reader_init_block_reader(ti->r, &ti->br, next_block_off, ti->typ);
 	if (err > 0)
-		dest->is_finished = 1;
-	if (err) {
-		table_iter_block_done(dest);
+		ti->is_finished = 1;
+	if (err)
 		return err;
-	}
 
-	dest->is_finished = 0;
-	block_iter_seek_start(&dest->bi, &dest->br);
+	ti->block_off = next_block_off;
+	ti->is_finished = 0;
+	block_iter_seek_start(&ti->bi, &ti->br);
 
 	return 0;
 }
@@ -342,7 +336,6 @@ static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
 		return REFTABLE_API_ERROR;
 
 	while (1) {
-		struct table_iter next = TABLE_ITER_INIT;
 		int err;
 
 		if (ti->is_finished)
@@ -362,14 +355,11 @@ static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
 		 * table and retry. If there are no more blocks then the
 		 * iterator is drained.
 		 */
-		err = table_iter_next_block(&next, ti);
+		err = table_iter_next_block(ti);
 		if (err) {
 			ti->is_finished = 1;
 			return err;
 		}
-
-		table_iter_close(ti);
-		*ti = next;
 	}
 }
 
@@ -453,9 +443,24 @@ static int reader_seek_linear(struct table_iter *ti,
 	 * have no other way to do this.
 	 */
 	while (1) {
-		struct table_iter next = TABLE_ITER_INIT;
+		struct table_iter next = *ti;
+
+		/*
+		 * We must be careful to not modify underlying data of `ti`
+		 * because we may find that `next` does not contain our desired
+		 * block, but that `ti` does. In that case, we would discard
+		 * `next` and continue with `ti`.
+		 *
+		 * This also means that we cannot reuse allocated memory for
+		 * `next` here. While it would be great if we could, it should
+		 * in practice not be too bad given that we should only ever
+		 * end up doing linear seeks with at most three blocks. As soon
+		 * as we have more than three blocks we would have an index, so
+		 * we would not do a linear search there anymore.
+		 */
+		memset(&next.br.block, 0, sizeof(next.br.block));
 
-		err = table_iter_next_block(&next, ti);
+		err = table_iter_next_block(&next);
 		if (err < 0)
 			goto done;
 		if (err > 0)
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 07/10] reftable/block: reuse uncompressed blocks
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2024-04-08 12:16   ` [PATCH v2 06/10] reftable/reader: iterate to next block in place Patrick Steinhardt
@ 2024-04-08 12:16   ` Patrick Steinhardt
  2024-04-08 12:16   ` [PATCH v2 08/10] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 12:16 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Karthik Nayak, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 5187 bytes --]

The reftable backend stores reflog entries in a compressed format and
thus needs to uncompress blocks before one can read records from them.
For each reflog block we thus have to allocate an array that we can
decompress the block contents into. This array is discarded whenever the
table iterator moves to the next block. Consequently, we allocate a new
array for every block, which is quite wasteful.

Refactor the code to reuse the uncompressed block data when moving the
block reader to a new block. This significantly reduces the number of
allocations when iterating through many compressed blocks. The following
measurements were taken with `git reflog list` when listing 100k reflogs.
Before:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 45,755 allocs, 45,633 frees, 254,779,456 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 23,028 allocs, 22,906 frees, 162,813,547 bytes allocated
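
For illustration, a minimal sketch of the grow-and-reuse pattern (the
`grow_buffer()` helper below is a hypothetical stand-in for the
`REFTABLE_ALLOC_GROW()` macro used by the actual patch, not the code it
adds):

  #include <stdlib.h>

  /*
   * Grow the buffer only when the requested size exceeds its current
   * capacity. Repeated calls with sizes at or below the largest size
   * seen so far thus cause no further allocations.
   */
  static int grow_buffer(unsigned char **buf, size_t *cap, size_t want)
  {
          if (want > *cap) {
                  unsigned char *p = realloc(*buf, want);
                  if (!p)
                          return -1;
                  *buf = p;
                  *cap = want;
          }
          return 0;
  }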

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c  | 14 ++++++--------
 reftable/block.h  |  4 ++++
 reftable/reader.c | 27 ++++++++++++++++-----------
 3 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 0c4e71eae3..9460273290 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -186,7 +186,6 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	uint16_t restart_count = 0;
 	uint32_t restart_start = 0;
 	uint8_t *restart_bytes = NULL;
-	uint8_t *uncompressed = NULL;
 
 	reftable_block_done(&br->block);
 
@@ -202,14 +201,15 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		uLongf src_len = block->len - block_header_skip;
 
 		/* Log blocks specify the *uncompressed* size in their header. */
-		REFTABLE_ALLOC_ARRAY(uncompressed, sz);
+		REFTABLE_ALLOC_GROW(br->uncompressed_data, sz,
+				    br->uncompressed_cap);
 
 		/* Copy over the block header verbatim. It's not compressed. */
-		memcpy(uncompressed, block->data, block_header_skip);
+		memcpy(br->uncompressed_data, block->data, block_header_skip);
 
 		/* Uncompress */
 		if (Z_OK !=
-		    uncompress2(uncompressed + block_header_skip, &dst_len,
+		    uncompress2(br->uncompressed_data + block_header_skip, &dst_len,
 				block->data + block_header_skip, &src_len)) {
 			err = REFTABLE_ZLIB_ERROR;
 			goto done;
@@ -222,10 +222,8 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 
 		/* We're done with the input data. */
 		reftable_block_done(block);
-		block->data = uncompressed;
-		uncompressed = NULL;
+		block->data = br->uncompressed_data;
 		block->len = sz;
-		block->source = malloc_block_source();
 		full_block_size = src_len + block_header_skip;
 	} else if (full_block_size == 0) {
 		full_block_size = sz;
@@ -254,12 +252,12 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	br->restart_bytes = restart_bytes;
 
 done:
-	reftable_free(uncompressed);
 	return err;
 }
 
 void block_reader_release(struct block_reader *br)
 {
+	reftable_free(br->uncompressed_data);
 	reftable_block_done(&br->block);
 }
 
diff --git a/reftable/block.h b/reftable/block.h
index d733d45ee0..12414eb642 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -66,6 +66,10 @@ struct block_reader {
 	struct reftable_block block;
 	int hash_size;
 
+	/* Uncompressed data for log entries. */
+	unsigned char *uncompressed_data;
+	size_t uncompressed_cap;
+
 	/* size of the data, excluding restart data. */
 	uint32_t block_len;
 	uint8_t *restart_bytes;
diff --git a/reftable/reader.c b/reftable/reader.c
index dd4de294a1..aacd5f1337 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -459,6 +459,8 @@ static int reader_seek_linear(struct table_iter *ti,
 		 * we would not do a linear search there anymore.
 		 */
 		memset(&next.br.block, 0, sizeof(next.br.block));
+		next.br.uncompressed_data = NULL;
+		next.br.uncompressed_cap = 0;
 
 		err = table_iter_next_block(&next);
 		if (err < 0)
@@ -599,25 +601,28 @@ static int reader_seek_internal(struct reftable_reader *r,
 	struct reftable_reader_offsets *offs =
 		reader_offsets_for(r, reftable_record_type(rec));
 	uint64_t idx = offs->index_offset;
-	struct table_iter ti = TABLE_ITER_INIT;
-	int err = 0;
+	struct table_iter ti = TABLE_ITER_INIT, *p;
+	int err;
+
 	if (idx > 0)
 		return reader_seek_indexed(r, it, rec);
 
 	err = reader_start(r, &ti, reftable_record_type(rec), 0);
 	if (err < 0)
-		return err;
+		goto out;
+
 	err = reader_seek_linear(&ti, rec);
 	if (err < 0)
-		return err;
-	else {
-		struct table_iter *p =
-			reftable_malloc(sizeof(struct table_iter));
-		*p = ti;
-		iterator_from_table_iter(it, p);
-	}
+		goto out;
 
-	return 0;
+	REFTABLE_ALLOC_ARRAY(p, 1);
+	*p = ti;
+	iterator_from_table_iter(it, p);
+
+out:
+	if (err)
+		table_iter_close(&ti);
+	return err;
 }
 
 static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 08/10] reftable/block: open-code call to `uncompress2()`
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2024-04-08 12:16   ` [PATCH v2 07/10] reftable/block: reuse uncompressed blocks Patrick Steinhardt
@ 2024-04-08 12:16   ` Patrick Steinhardt
  2024-04-08 12:17   ` [PATCH v2 09/10] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 12:16 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Karthik Nayak, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 4364 bytes --]

The reftable format stores log blocks in a compressed format. Thus,
whenever we want to read such a block we first need to decompress it.
This is done by calling the convenience function `uncompress2()` of the
zlib library, which is a simple wrapper that manages the lifecycle of
the `zstream` structure for us.

While nice for one-off inflation of data, when iterating through reflogs
we will likely end up inflating many such log blocks. This requires us
to reallocate the state of the `zstream` every single time, which adds
up over time. It would thus be great to reuse the `zstream` instead of
discarding it after every inflation.

Open-code the call to `uncompress2()` such that we can start reusing the
`zstream` in the subsequent commit. Note that our open-coded variant is
different from `uncompress2()` in two ways:

  - We do not loop around `inflate()` until we have processed all input.
    As our input is limited by the maximum block size, which is 16MB, we
    should not hit limits of `inflate()`.

  - We use `Z_FINISH` instead of `Z_NO_FLUSH`. Quoting the `inflate()`
    documentation: "inflate() should normally be called until it returns
    Z_STREAM_END or an error. However if all decompression is to be
    performed in a single step (a single call of inflate), the parameter
    flush should be set to Z_FINISH."

    Furthermore, "Z_FINISH also informs inflate to not maintain a
    sliding window if the stream completes, which reduces inflate's
    memory footprint."

Other than that this commit is expected to be functionally equivalent
and does not yet reuse the `zstream`.
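
As an illustration of the zlib calls described above, here is a
self-contained sketch of single-shot inflation with `Z_FINISH`
(simplified, with hypothetical parameter names; not the code added by
this patch):

  #include <zlib.h>

  /* Inflate a complete, size-bounded buffer in a single inflate() call. */
  static int inflate_once(unsigned char *dst, uInt dst_len,
                          unsigned char *src, uInt src_len)
  {
          z_stream stream = {0};
          int ret;

          if (inflateInit(&stream) != Z_OK)
                  return -1;

          stream.next_in = src;
          stream.avail_in = src_len;
          stream.next_out = dst;
          stream.avail_out = dst_len;

          /* All input is present up front, so Z_FINISH is safe here. */
          ret = inflate(&stream, Z_FINISH);
          inflateEnd(&stream);

          return ret == Z_STREAM_END ? 0 : -1;
  }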

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 9460273290..435922b569 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -195,10 +195,10 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 	}
 
 	if (typ == BLOCK_TYPE_LOG) {
-		int block_header_skip = 4 + header_off;
-		uLongf dst_len = sz - block_header_skip; /* total size of dest
-							    buffer. */
-		uLongf src_len = block->len - block_header_skip;
+		uint32_t block_header_skip = 4 + header_off;
+		uLong dst_len = sz - block_header_skip;
+		uLong src_len = block->len - block_header_skip;
+		z_stream stream = {0};
 
 		/* Log blocks specify the *uncompressed* size in their header. */
 		REFTABLE_ALLOC_GROW(br->uncompressed_data, sz,
@@ -207,15 +207,33 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		/* Copy over the block header verbatim. It's not compressed. */
 		memcpy(br->uncompressed_data, block->data, block_header_skip);
 
-		/* Uncompress */
-		if (Z_OK !=
-		    uncompress2(br->uncompressed_data + block_header_skip, &dst_len,
-				block->data + block_header_skip, &src_len)) {
+		err = inflateInit(&stream);
+		if (err != Z_OK) {
 			err = REFTABLE_ZLIB_ERROR;
 			goto done;
 		}
 
-		if (dst_len + block_header_skip != sz) {
+		stream.next_in = block->data + block_header_skip;
+		stream.avail_in = src_len;
+		stream.next_out = br->uncompressed_data + block_header_skip;
+		stream.avail_out = dst_len;
+
+		/*
+		 * We know both input as well as output size, and we know that
+		 * the sizes should never be bigger than `uInt_MAX` because
+		 * blocks can at most be 16MB large. We can thus use `Z_FINISH`
+		 * here to instruct zlib to inflate the data in one go, which
+		 * is more efficient than using `Z_NO_FLUSH`.
+		 */
+		err = inflate(&stream, Z_FINISH);
+		inflateEnd(&stream);
+		if (err != Z_STREAM_END) {
+			err = REFTABLE_ZLIB_ERROR;
+			goto done;
+		}
+		err = 0;
+
+		if (stream.total_out + block_header_skip != sz) {
 			err = REFTABLE_FORMAT_ERROR;
 			goto done;
 		}
@@ -224,7 +242,7 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		reftable_block_done(block);
 		block->data = br->uncompressed_data;
 		block->len = sz;
-		full_block_size = src_len + block_header_skip;
+		full_block_size = src_len + block_header_skip - stream.avail_in;
 	} else if (full_block_size == 0) {
 		full_block_size = sz;
 	} else if (sz < full_block_size && sz < block->len &&
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 09/10] reftable/block: reuse `zstream` state on inflation
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2024-04-08 12:16   ` [PATCH v2 08/10] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
@ 2024-04-08 12:17   ` Patrick Steinhardt
  2024-04-10 10:15     ` Karthik Nayak
  2024-04-08 12:17   ` [PATCH v2 10/10] reftable/block: avoid copying block iterators on seek Patrick Steinhardt
                     ` (2 subsequent siblings)
  11 siblings, 1 reply; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 12:17 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Karthik Nayak, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 5387 bytes --]

When calling `inflateInit()` and `inflate()`, the zlib library will
allocate several data structures for the underlying `zstream` to keep
track of various information. Thus, when inflating repeatedly, it is
possible to optimize memory allocation patterns by reusing the `zstream`
and then calling `inflateReset()` on it to prepare it for the next chunk
of data to inflate.

This is exactly what the reftable code is doing: when iterating through
reflogs we need to potentially inflate many log blocks, but we discard
the `zstream` every single time. Instead, as we reuse the `block_reader`
for each of the blocks anyway, we can initialize the `zstream` once and
then reuse it for subsequent inflations.

Refactor the code to do so, which leads to a significant reduction in
the number of allocations. The following measurements were done when
iterating through 1 million reflog entries. Before:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 23,028 allocs, 22,906 frees, 162,813,552 bytes allocated

After:

  HEAP SUMMARY:
      in use at exit: 13,473 bytes in 122 blocks
    total heap usage: 302 allocs, 180 frees, 88,352 bytes allocated
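
A condensed sketch of the reuse pattern (the `struct reader` below is a
hypothetical placeholder, not the reftable `block_reader`): the stream
is allocated and initialized once and merely reset between blocks:

  #include <stdlib.h>
  #include <zlib.h>

  struct reader {
          z_stream *zstream; /* lazily allocated, reused across blocks */
  };

  static int prepare_zstream(struct reader *r)
  {
          if (!r->zstream) {
                  r->zstream = calloc(1, sizeof(*r->zstream));
                  if (!r->zstream)
                          return -1;
                  /* First use: pay the allocation cost of inflateInit() once. */
                  if (inflateInit(r->zstream) != Z_OK) {
                          free(r->zstream);
                          r->zstream = NULL;
                          return -1;
                  }
                  return 0;
          }
          /* Subsequent blocks: inflateReset() keeps the allocated state. */
          return inflateReset(r->zstream) == Z_OK ? 0 : -1;
  }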

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c  | 25 +++++++++++++++----------
 reftable/block.h  |  3 +++
 reftable/reader.c |  1 +
 3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 435922b569..c6c4a68ea1 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -198,7 +198,6 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		uint32_t block_header_skip = 4 + header_off;
 		uLong dst_len = sz - block_header_skip;
 		uLong src_len = block->len - block_header_skip;
-		z_stream stream = {0};
 
 		/* Log blocks specify the *uncompressed* size in their header. */
 		REFTABLE_ALLOC_GROW(br->uncompressed_data, sz,
@@ -207,16 +206,21 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		/* Copy over the block header verbatim. It's not compressed. */
 		memcpy(br->uncompressed_data, block->data, block_header_skip);
 
-		err = inflateInit(&stream);
+		if (!br->zstream) {
+			REFTABLE_CALLOC_ARRAY(br->zstream, 1);
+			err = inflateInit(br->zstream);
+		} else {
+			err = inflateReset(br->zstream);
+		}
 		if (err != Z_OK) {
 			err = REFTABLE_ZLIB_ERROR;
 			goto done;
 		}
 
-		stream.next_in = block->data + block_header_skip;
-		stream.avail_in = src_len;
-		stream.next_out = br->uncompressed_data + block_header_skip;
-		stream.avail_out = dst_len;
+		br->zstream->next_in = block->data + block_header_skip;
+		br->zstream->avail_in = src_len;
+		br->zstream->next_out = br->uncompressed_data + block_header_skip;
+		br->zstream->avail_out = dst_len;
 
 		/*
 		 * We know both input as well as output size, and we know that
@@ -225,15 +229,14 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		 * here to instruct zlib to inflate the data in one go, which
 		 * is more efficient than using `Z_NO_FLUSH`.
 		 */
-		err = inflate(&stream, Z_FINISH);
-		inflateEnd(&stream);
+		err = inflate(br->zstream, Z_FINISH);
 		if (err != Z_STREAM_END) {
 			err = REFTABLE_ZLIB_ERROR;
 			goto done;
 		}
 		err = 0;
 
-		if (stream.total_out + block_header_skip != sz) {
+		if (br->zstream->total_out + block_header_skip != sz) {
 			err = REFTABLE_FORMAT_ERROR;
 			goto done;
 		}
@@ -242,7 +245,7 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 		reftable_block_done(block);
 		block->data = br->uncompressed_data;
 		block->len = sz;
-		full_block_size = src_len + block_header_skip - stream.avail_in;
+		full_block_size = src_len + block_header_skip - br->zstream->avail_in;
 	} else if (full_block_size == 0) {
 		full_block_size = sz;
 	} else if (sz < full_block_size && sz < block->len &&
@@ -275,6 +278,8 @@ int block_reader_init(struct block_reader *br, struct reftable_block *block,
 
 void block_reader_release(struct block_reader *br)
 {
+	inflateEnd(br->zstream);
+	reftable_free(br->zstream);
 	reftable_free(br->uncompressed_data);
 	reftable_block_done(&br->block);
 }
diff --git a/reftable/block.h b/reftable/block.h
index 12414eb642..c1bd1892cb 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -56,6 +56,8 @@ int block_writer_finish(struct block_writer *w);
 /* clears out internally allocated block_writer members. */
 void block_writer_release(struct block_writer *bw);
 
+struct z_stream;
+
 /* Read a block. */
 struct block_reader {
 	/* offset of the block header; nonzero for the first block in a
@@ -67,6 +69,7 @@ struct block_reader {
 	int hash_size;
 
 	/* Uncompressed data for log entries. */
+	z_stream *zstream;
 	unsigned char *uncompressed_data;
 	size_t uncompressed_cap;
 
diff --git a/reftable/reader.c b/reftable/reader.c
index aacd5f1337..481dff10d4 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -459,6 +459,7 @@ static int reader_seek_linear(struct table_iter *ti,
 		 * we would not do a linear search there anymore.
 		 */
 		memset(&next.br.block, 0, sizeof(next.br.block));
+		next.br.zstream = NULL;
 		next.br.uncompressed_data = NULL;
 		next.br.uncompressed_cap = 0;
 
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v2 10/10] reftable/block: avoid copying block iterators on seek
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2024-04-08 12:17   ` [PATCH v2 09/10] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
@ 2024-04-08 12:17   ` Patrick Steinhardt
  2024-04-09  1:29     ` Justin Tobler
  2024-04-09  1:32   ` [PATCH v2 00/10] reftable: optimize table and block iterators Justin Tobler
  2024-04-10 11:35   ` Karthik Nayak
  11 siblings, 1 reply; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-08 12:17 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Karthik Nayak, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 4548 bytes --]

When seeking a reftable record in a block we need to position the
iterator _before_ the sought-after record so that the next call to
`block_iter_next()` would yield that record. To achieve this, the loop
that performs the linear needs to restore the previous position once it
has found the record.

This is done by advancing two `block_iter`s: one to check whether the
next record is our sought-after record, and one that we update after
every iteration. This of course involves quite a lot of copying and also
leads to needless memory allocations.

Refactor the code to get rid of the `next` iterator and the copying this
involves. Instead, we can restore the previous offset such that the call
to `next` will return the correct record.

Next to being simpler conceptually this also leads to a nice speedup.
The following benchmark prints 10k refs out of 100k existing refs via
`git-rev-list --no-walk`:

  Benchmark 1: rev-list: print many refs (HEAD~)
    Time (mean ± σ):     170.2 ms ±   1.7 ms    [User: 86.1 ms, System: 83.6 ms]
    Range (min … max):   166.4 ms … 180.3 ms    500 runs

  Benchmark 2: rev-list: print many refs (HEAD)
    Time (mean ± σ):     161.6 ms ±   1.6 ms    [User: 78.1 ms, System: 83.0 ms]
    Range (min … max):   158.4 ms … 172.3 ms    500 runs

  Summary
    rev-list: print many refs (HEAD) ran
      1.05 ± 0.01 times faster than rev-list: print many refs (HEAD~)
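
The underlying idea can be illustrated with a generic cursor over a
sorted array (a hypothetical example, not the reftable iterator): rather
than copying the cursor to peek ahead, remember the previous position
and restore it once we have stepped past the sought-after element:

  #include <string.h>

  struct cursor {
          const char **items;
          size_t len;
          size_t next;
  };

  /* Position the cursor just before the first element that is >= want. */
  static void seek_before(struct cursor *c, const char *want)
  {
          while (c->next < c->len) {
                  size_t prev = c->next; /* remember, instead of copying */

                  if (strcmp(c->items[c->next++], want) >= 0) {
                          c->next = prev; /* step back over the match */
                          return;
                  }
          }
  }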

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 32 ++++++++++++++------------------
 reftable/block.h |  2 --
 2 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index c6c4a68ea1..3e87460cba 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -365,16 +365,6 @@ static int restart_needle_less(size_t idx, void *_args)
 	return args->needle.len < suffix_len;
 }
 
-void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src)
-{
-	dest->block = src->block;
-	dest->block_len = src->block_len;
-	dest->hash_size = src->hash_size;
-	dest->next_off = src->next_off;
-	strbuf_reset(&dest->last_key);
-	strbuf_addbuf(&dest->last_key, &src->last_key);
-}
-
 int block_iter_next(struct block_iter *it, struct reftable_record *rec)
 {
 	struct string_view in = {
@@ -427,7 +417,6 @@ int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
 		.needle = *want,
 		.reader = br,
 	};
-	struct block_iter next = BLOCK_ITER_INIT;
 	struct reftable_record rec;
 	int err = 0;
 	size_t i;
@@ -486,11 +475,13 @@ int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
 	 * far and then back up.
 	 */
 	while (1) {
-		block_iter_copy_from(&next, it);
-		err = block_iter_next(&next, &rec);
+		size_t prev_off = it->next_off;
+
+		err = block_iter_next(it, &rec);
 		if (err < 0)
 			goto done;
 		if (err > 0) {
+			it->next_off = prev_off;
 			err = 0;
 			goto done;
 		}
@@ -501,18 +492,23 @@ int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
 		 * record does not exist in the block and can thus abort early.
 		 * In case it is equal to the sought-after key we have found
 		 * the desired record.
+		 *
+		 * Note that we store the next record's key record directly in
+		 * `last_key` without restoring the key of the preceding record
+		 * in case we need to go one record back. This is safe to do as
+		 * `block_iter_next()` would return the ref whose key is equal
+		 * to `last_key` now, and naturally all keys share a prefix
+		 * with themselves.
 		 */
 		reftable_record_key(&rec, &it->last_key);
-		if (strbuf_cmp(&it->last_key, want) >= 0)
+		if (strbuf_cmp(&it->last_key, want) >= 0) {
+			it->next_off = prev_off;
 			goto done;
-
-		block_iter_copy_from(it, &next);
+		}
 	}
 
 done:
-	block_iter_close(&next);
 	reftable_record_release(&rec);
-
 	return err;
 }
 
diff --git a/reftable/block.h b/reftable/block.h
index c1bd1892cb..ea4384a7e2 100644
--- a/reftable/block.h
+++ b/reftable/block.h
@@ -121,8 +121,6 @@ void block_iter_seek_start(struct block_iter *it, const struct block_reader *br)
 int block_iter_seek_key(struct block_iter *it, const struct block_reader *br,
 			struct strbuf *want);
 
-void block_iter_copy_from(struct block_iter *dest, const struct block_iter *src);
-
 /* return < 0 for error, 0 for OK, > 0 for EOF. */
 int block_iter_next(struct block_iter *it, struct reftable_record *rec);
 
-- 
2.44.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] reftable/block: avoid copying block iterators on seek
  2024-04-08 12:17   ` [PATCH v2 10/10] reftable/block: avoid copying block iterators on seek Patrick Steinhardt
@ 2024-04-09  1:29     ` Justin Tobler
  2024-04-09  3:18       ` Patrick Steinhardt
  0 siblings, 1 reply; 31+ messages in thread
From: Justin Tobler @ 2024-04-09  1:29 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Karthik Nayak

On 24/04/08 02:17PM, Patrick Steinhardt wrote:
> When seeking a reftable record in a block we need to position the
> iterator _before_ the sought-after record so that the next call to
> `block_iter_next()` would yield that record. To achieve this, the loop
> that performs the linear needs to restore the previous position once it

Did we mean to say "linear seek" here? Otherwise this looks good to me.

-Justin

> has found the record.
> 
> This is done by advancing two `block_iter`s: one to check whether the
> next record is our sought-after record, and one that we update after
> every iteration. This of course involves quite a lot of copying and also
> leads to needless memory allocations.
> 
> Refactor the code to get rid of the `next` iterator and the copying this
> involves. Instead, we can restore the previous offset such that the call
> to `next` will return the correct record.
> 
> Next to being simpler conceptually this also leads to a nice speedup.
> The following benchmark prints 10k refs out of 100k existing refs via
> `git-rev-list --no-walk`:
> 
>   Benchmark 1: rev-list: print many refs (HEAD~)
>     Time (mean ± σ):     170.2 ms ±   1.7 ms    [User: 86.1 ms, System: 83.6 ms]
>     Range (min … max):   166.4 ms … 180.3 ms    500 runs
> 
>   Benchmark 2: rev-list: print many refs (HEAD)
>     Time (mean ± σ):     161.6 ms ±   1.6 ms    [User: 78.1 ms, System: 83.0 ms]
>     Range (min … max):   158.4 ms … 172.3 ms    500 runs
> 
>   Summary
>     rev-list: print many refs (HEAD) ran
>       1.05 ± 0.01 times faster than rev-list: print many refs (HEAD~)
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
...

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] reftable: optimize table and block iterators
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2024-04-08 12:17   ` [PATCH v2 10/10] reftable/block: avoid copying block iterators on seek Patrick Steinhardt
@ 2024-04-09  1:32   ` Justin Tobler
  2024-04-10 11:35   ` Karthik Nayak
  11 siblings, 0 replies; 31+ messages in thread
From: Justin Tobler @ 2024-04-09  1:32 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Han-Wen Nienhuys, Karthik Nayak

On 24/04/08 02:16PM, Patrick Steinhardt wrote:
> Hi,
> 
> this is the second version of my patch series that aims to optimize
> the reftable table and block iterators.
> 
> Changes compared to v1:
> 
>     - The series now depends on ps/reftable-binsearch-update at
>       d51d8cc368 (reftable/block: avoid decoding keys when searching
>       restart points, 2024-04-03). This is to resolve a merge conflict
>       with that other series which has landed in "next" already.
> 
>     - Rewrote a comment to be single-line to fit into the style of other
>       comments better.
> 
>     - A new patch on top that avoids copying block iterators altogether
>       for another speedup.
> 
> Thanks!
> 
> Patrick

Thanks, other than a small question about one of the commit messages,
this version looks good to me.

-Justin

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 10/10] reftable/block: avoid copying block iterators on seek
  2024-04-09  1:29     ` Justin Tobler
@ 2024-04-09  3:18       ` Patrick Steinhardt
  0 siblings, 0 replies; 31+ messages in thread
From: Patrick Steinhardt @ 2024-04-09  3:18 UTC (permalink / raw)
  To: git, Han-Wen Nienhuys, Karthik Nayak

[-- Attachment #1: Type: text/plain, Size: 1911 bytes --]

On Mon, Apr 08, 2024 at 08:29:14PM -0500, Justin Tobler wrote:
> On 24/04/08 02:17PM, Patrick Steinhardt wrote:
> > When seeking a reftable record in a block we need to position the
> > iterator _before_ the sought-after record so that the next call to
> > `block_iter_next()` would yield that record. To achieve this, the loop
> > that performs the linear needs to restore the previous position once it
> 
> Did we mean to say "linear seek" here? Otherwise this looks good to me.
> 
> -Justin

Oh, yes, of course. Thanks for reading this carefully!

Patrick

> > has found the record.
> > 
> > This is done by advancing two `block_iter`s: one to check whether the
> > next record is our sought-after record, and one that we update after
> > every iteration. This of course involves quite a lot of copying and also
> > leads to needless memory allocations.
> > 
> > Refactor the code to get rid of the `next` iterator and the copying this
> > involves. Instead, we can restore the previous offset such that the call
> > to `next` will return the correct record.
> > 
> > Next to being simpler conceptually this also leads to a nice speedup.
> > The following benchmark prints 10k refs out of 100k existing refs via
> > `git-rev-list --no-walk`:
> > 
> >   Benchmark 1: rev-list: print many refs (HEAD~)
> >     Time (mean ± σ):     170.2 ms ±   1.7 ms    [User: 86.1 ms, System: 83.6 ms]
> >     Range (min … max):   166.4 ms … 180.3 ms    500 runs
> > 
> >   Benchmark 2: rev-list: print many refs (HEAD)
> >     Time (mean ± σ):     161.6 ms ±   1.6 ms    [User: 78.1 ms, System: 83.0 ms]
> >     Range (min … max):   158.4 ms … 172.3 ms    500 runs
> > 
> >   Summary
> >     rev-list: print many refs (HEAD) ran
> >       1.05 ± 0.01 times faster than rev-list: print many refs (HEAD~)
> > 
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ...

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 09/10] reftable/block: reuse `zstream` state on inflation
  2024-04-08 12:17   ` [PATCH v2 09/10] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
@ 2024-04-10 10:15     ` Karthik Nayak
  0 siblings, 0 replies; 31+ messages in thread
From: Karthik Nayak @ 2024-04-10 10:15 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Han-Wen Nienhuys, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1351 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> When calling `inflateInit()` and `inflate()`, the zlib library will
> allocate several data structures for the underlying `zstream` to keep
> track of various information. Thus, when inflating repeatedly, it is
> possible to optimize memory allocation patterns by reusing the `zstream`
> and then calling `inflateReset()` on it to prepare it for the next chunk
> of data to inflate.
>
> This is exactly what the reftable code is doing: when iterating through
> reflogs we need to potentially inflate many log blocks, but we discard
> the `zstream` every single time. Instead, as we reuse the `block_reader`
> for each of the blocks anyway, we can initialize the `zstream` once and
> then reuse it for subsequent inflations.
>
> Refactor the code to do so, which leads to a significant reduction in
> the number of allocations. The following measurements were done when
> iterating through 1 million reflog entries. Before:
>
>   HEAP SUMMARY:
>       in use at exit: 13,473 bytes in 122 blocks
>     total heap usage: 23,028 allocs, 22,906 frees, 162,813,552 bytes allocated
>
> After:
>
>   HEAP SUMMARY:
>       in use at exit: 13,473 bytes in 122 blocks
>     total heap usage: 302 allocs, 180 frees, 88,352 bytes allocated
>

Really nice how just reusing the data structure has such a significant
impact.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 00/10] reftable: optimize table and block iterators
  2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
                     ` (10 preceding siblings ...)
  2024-04-09  1:32   ` [PATCH v2 00/10] reftable: optimize table and block iterators Justin Tobler
@ 2024-04-10 11:35   ` Karthik Nayak
  11 siblings, 0 replies; 31+ messages in thread
From: Karthik Nayak @ 2024-04-10 11:35 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Han-Wen Nienhuys, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 723 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> this is the second version of my patch series that aims to optimize
> the reftable table and block iterators.
>
> Changes compared to v1:
>
>     - The series now depends on ps/reftable-binsearch-update at
>       d51d8cc368 (reftable/block: avoid decoding keys when searching
>       restart points, 2024-04-03). This is to resolve a merge conflict
>       with that other series which has landed in "next" already.
>
>     - Rewrote a comment to be single-line to fit into the style of other
>       comments better.
>
>     - A new patch on top that avoids copying block iterators altogether
>       for another speedup.
>

Overall this series looks good to me. Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2024-04-10 11:35 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-27  6:36 [PATCH 0/9] reftable: optimize table and block iterators Patrick Steinhardt
2024-03-27  6:36 ` [PATCH 1/9] reftable/block: rename `block_reader_start()` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 2/9] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 3/9] reftable/block: better grouping of functions Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 4/9] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
2024-04-03 13:16   ` Karthik Nayak
2024-04-08 12:10     ` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 5/9] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
2024-04-03  4:52   ` Justin Tobler
2024-04-03 13:10     ` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 6/9] reftable/reader: iterate to next block in place Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 7/9] reftable/block: reuse uncompressed blocks Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 8/9] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
2024-03-27  6:37 ` [PATCH 9/9] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
2024-04-03 13:33 ` [PATCH 0/9] reftable: optimize table and block iterators Karthik Nayak
2024-04-08 12:16 ` [PATCH v2 00/10] " Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 01/10] reftable/block: rename `block_reader_start()` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 02/10] reftable/block: merge `block_iter_seek()` and `block_reader_seek()` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 03/10] reftable/block: better grouping of functions Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 04/10] reftable/block: introduce `block_reader_release()` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 05/10] reftable/block: move ownership of block reader into `struct table_iter` Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 06/10] reftable/reader: iterate to next block in place Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 07/10] reftable/block: reuse uncompressed blocks Patrick Steinhardt
2024-04-08 12:16   ` [PATCH v2 08/10] reftable/block: open-code call to `uncompress2()` Patrick Steinhardt
2024-04-08 12:17   ` [PATCH v2 09/10] reftable/block: reuse `zstream` state on inflation Patrick Steinhardt
2024-04-10 10:15     ` Karthik Nayak
2024-04-08 12:17   ` [PATCH v2 10/10] reftable/block: avoid copying block iterators on seek Patrick Steinhardt
2024-04-09  1:29     ` Justin Tobler
2024-04-09  3:18       ` Patrick Steinhardt
2024-04-09  1:32   ` [PATCH v2 00/10] reftable: optimize table and block iterators Justin Tobler
2024-04-10 11:35   ` Karthik Nayak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).