git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/13] reftable: prepare for re-seekable iterators
@ 2024-05-08 11:03 Patrick Steinhardt
  2024-05-08 11:03 ` [PATCH 01/13] reftable/block: use `size_t` to track restart point index Patrick Steinhardt
                   ` (14 more replies)
  0 siblings, 15 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:03 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3313 bytes --]

Hi,

the reftable library uses iterators both to iterate through a set of
records, but also to look up a single record. In past patch series, I
have focussed quite a lot to optimize the case where we iterate through
a large set of records. But looking up a records is still quite
inefficient when doing multiple lookups. This is because whenever we
want to look up a record, we need to create a new iterator, including
all of its internal data structures.

To address this inefficiency, the patch series at hand refactors the
reftable library such that creation of iterators and seeking on an
iterator are separate steps. This refactoring prepares us for reusing
iterators to perform multiple seeks, which in turn will allow us to
reuse internal data structures for subsequent seeks.

The patch series is structured as follows:

  - Patches 1 to 5 perform some general cleanups to make the reftable
    iterators easier to understand.

  - Patchges 6 to 9 refactor the iterators internally such that creation
    of the iterator and seeking on it is clearly separated.

  - Patches 10 to 13 adapt the external interfaces such that they allow
    for reuse of iterators.

Note: this series does not yet go all the way to re-seekable iterators,
and there are no users yet. The patch series is complex enough as-is
already, so I decided to defer that to the next iteration. Thus, the
whole refactoring here should essentially be a large no-op that prepares
the infrastructure for re-seekable iterators.

The series depends on pks/reftable-write-optim at fa74f32291
(reftable/block: reuse compressed array, 2024-04-08).

Thanks!

Patrick

Patrick Steinhardt (13):
  reftable/block: use `size_t` to track restart point index
  reftable/reader: avoid copying index iterator
  reftable/reader: unify indexed and linear seeking
  reftable/reader: separate concerns of table iter and reftable reader
  reftable/reader: inline `reader_seek_internal()`
  reftable/reader: set up the reader when initializing table iterator
  reftable/merged: split up initialization and seeking of records
  reftable/merged: simplify indices for subiterators
  reftable/generic: move seeking of records into the iterator
  reftable/generic: adapt interface to allow reuse of iterators
  reftable/reader: adapt interface to allow reuse of iterators
  reftable/stack: provide convenience functions to create iterators
  reftable/merged: adapt interface to allow reuse of iterators

 refs/reftable-backend.c      |  48 ++++----
 reftable/block.c             |   4 +-
 reftable/generic.c           |  94 +++++++++++----
 reftable/generic.h           |   9 +-
 reftable/iter.c              |  23 +++-
 reftable/merged.c            | 148 ++++++++----------------
 reftable/merged.h            |   6 +
 reftable/merged_test.c       |  19 ++-
 reftable/reader.c            | 218 +++++++++++++++--------------------
 reftable/readwrite_test.c    |  35 ++++--
 reftable/reftable-generic.h  |   8 +-
 reftable/reftable-iterator.h |  21 ++++
 reftable/reftable-merged.h   |  15 ---
 reftable/reftable-reader.h   |  45 ++------
 reftable/reftable-stack.h    |  18 +++
 reftable/stack.c             |  29 ++++-
 16 files changed, 378 insertions(+), 362 deletions(-)

-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 01/13] reftable/block: use `size_t` to track restart point index
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
@ 2024-05-08 11:03 ` Patrick Steinhardt
  2024-05-08 11:03 ` [PATCH 02/13] reftable/reader: avoid copying index iterator Patrick Steinhardt
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:03 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1060 bytes --]

The function `block_reader_restart_offset()` gets the offset of the
`i`th restart point. `i` is a signed integer though, which is certainly
not the correct type to track indices like this. Furthermore, both
callers end up passing a `size_t`.

Refactor the code to use a `size_t` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 5942cb4053..00030eee06 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -326,9 +326,9 @@ int block_reader_first_key(const struct block_reader *br, struct strbuf *key)
 	return 0;
 }
 
-static uint32_t block_reader_restart_offset(const struct block_reader *br, int i)
+static uint32_t block_reader_restart_offset(const struct block_reader *br, size_t idx)
 {
-	return get_be24(br->restart_bytes + 3 * i);
+	return get_be24(br->restart_bytes + 3 * idx);
 }
 
 void block_iter_seek_start(struct block_iter *it, const struct block_reader *br)
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 02/13] reftable/reader: avoid copying index iterator
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
  2024-05-08 11:03 ` [PATCH 01/13] reftable/block: use `size_t` to track restart point index Patrick Steinhardt
@ 2024-05-08 11:03 ` Patrick Steinhardt
  2024-05-08 11:03 ` [PATCH 03/13] reftable/reader: unify indexed and linear seeking Patrick Steinhardt
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:03 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3550 bytes --]

When doing an indexed seek we need to walk down the multi-level index
until we finally hit a record of the desired indexed type. This loop
performs a copy of the index iterator on every iteration, which is both
hard to understand and completely unnecessary.

Refactor the code so that we use a single iterator to walk down the
indices, only.

Note that while this should improve performance, the improvement is
negligible in all but the most unreasonable repositories. This is
because the effect is only really noticeable when we have to walk down
many levels of indices, which is not something that a repository would
typically have. So the motivation for this change is really only about
readability.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 38 ++++++++++++++------------------------
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index 481dff10d4..6bfadcad71 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -510,13 +510,11 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		.type = BLOCK_TYPE_INDEX,
 		.u.idx = { .last_key = STRBUF_INIT },
 	};
-	struct table_iter index_iter = TABLE_ITER_INIT;
-	struct table_iter empty = TABLE_ITER_INIT;
-	struct table_iter next = TABLE_ITER_INIT;
+	struct table_iter ti = TABLE_ITER_INIT, *malloced;
 	int err = 0;
 
 	reftable_record_key(rec, &want_index.u.idx.last_key);
-	err = reader_start(r, &index_iter, reftable_record_type(rec), 1);
+	err = reader_start(r, &ti, reftable_record_type(rec), 1);
 	if (err < 0)
 		goto done;
 
@@ -526,7 +524,7 @@ static int reader_seek_indexed(struct reftable_reader *r,
 	 * highest layer that identifies the relevant index block as well as
 	 * the record inside that block that corresponds to our wanted key.
 	 */
-	err = reader_seek_linear(&index_iter, &want_index);
+	err = reader_seek_linear(&ti, &want_index);
 	if (err < 0)
 		goto done;
 
@@ -552,44 +550,36 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		 * all levels of the index only to find out that the key does
 		 * not exist.
 		 */
-		err = table_iter_next(&index_iter, &index_result);
+		err = table_iter_next(&ti, &index_result);
 		if (err != 0)
 			goto done;
 
-		err = reader_table_iter_at(r, &next, index_result.u.idx.offset,
-					   0);
+		err = reader_table_iter_at(r, &ti, index_result.u.idx.offset, 0);
 		if (err != 0)
 			goto done;
 
-		err = block_iter_seek_key(&next.bi, &next.br, &want_index.u.idx.last_key);
+		err = block_iter_seek_key(&ti.bi, &ti.br, &want_index.u.idx.last_key);
 		if (err < 0)
 			goto done;
 
-		if (next.typ == reftable_record_type(rec)) {
+		if (ti.typ == reftable_record_type(rec)) {
 			err = 0;
 			break;
 		}
 
-		if (next.typ != BLOCK_TYPE_INDEX) {
+		if (ti.typ != BLOCK_TYPE_INDEX) {
 			err = REFTABLE_FORMAT_ERROR;
-			break;
+			goto done;
 		}
-
-		table_iter_close(&index_iter);
-		index_iter = next;
-		next = empty;
 	}
 
-	if (err == 0) {
-		struct table_iter *malloced = reftable_calloc(1, sizeof(*malloced));
-		*malloced = next;
-		next = empty;
-		iterator_from_table_iter(it, malloced);
-	}
+	REFTABLE_ALLOC_ARRAY(malloced, 1);
+	*malloced = ti;
+	iterator_from_table_iter(it, malloced);
 
 done:
-	table_iter_close(&next);
-	table_iter_close(&index_iter);
+	if (err)
+		table_iter_close(&ti);
 	reftable_record_release(&want_index);
 	reftable_record_release(&index_result);
 	return err;
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 03/13] reftable/reader: unify indexed and linear seeking
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
  2024-05-08 11:03 ` [PATCH 01/13] reftable/block: use `size_t` to track restart point index Patrick Steinhardt
  2024-05-08 11:03 ` [PATCH 02/13] reftable/reader: avoid copying index iterator Patrick Steinhardt
@ 2024-05-08 11:03 ` Patrick Steinhardt
  2024-05-08 11:03 ` [PATCH 04/13] reftable/reader: separate concerns of table iter and reftable reader Patrick Steinhardt
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:03 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3957 bytes --]

In `reader_seek_internal()` we either end up doing an indexed seek when
there is one or a linear seek otherwise. These two code paths are
disjunct without a good reason, where the indexed seek will cause us to
exit early.

Refactor the two code paths such that it becomes possible to share a bit
more code between them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 42 ++++++++++++++++--------------------------
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index 6bfadcad71..cf7f126d8d 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -425,7 +425,7 @@ static int reader_seek_linear(struct table_iter *ti,
 	struct strbuf want_key = STRBUF_INIT;
 	struct strbuf got_key = STRBUF_INIT;
 	struct reftable_record rec;
-	int err = -1;
+	int err;
 
 	reftable_record_init(&rec, reftable_record_type(want));
 	reftable_record_key(want, &want_key);
@@ -499,8 +499,8 @@ static int reader_seek_linear(struct table_iter *ti,
 	return err;
 }
 
-static int reader_seek_indexed(struct reftable_reader *r,
-			       struct reftable_iterator *it,
+static int reader_seek_indexed(struct table_iter *ti,
+			       struct reftable_reader *r,
 			       struct reftable_record *rec)
 {
 	struct reftable_record want_index = {
@@ -510,13 +510,9 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		.type = BLOCK_TYPE_INDEX,
 		.u.idx = { .last_key = STRBUF_INIT },
 	};
-	struct table_iter ti = TABLE_ITER_INIT, *malloced;
-	int err = 0;
+	int err;
 
 	reftable_record_key(rec, &want_index.u.idx.last_key);
-	err = reader_start(r, &ti, reftable_record_type(rec), 1);
-	if (err < 0)
-		goto done;
 
 	/*
 	 * The index may consist of multiple levels, where each level may have
@@ -524,7 +520,7 @@ static int reader_seek_indexed(struct reftable_reader *r,
 	 * highest layer that identifies the relevant index block as well as
 	 * the record inside that block that corresponds to our wanted key.
 	 */
-	err = reader_seek_linear(&ti, &want_index);
+	err = reader_seek_linear(ti, &want_index);
 	if (err < 0)
 		goto done;
 
@@ -550,36 +546,30 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		 * all levels of the index only to find out that the key does
 		 * not exist.
 		 */
-		err = table_iter_next(&ti, &index_result);
+		err = table_iter_next(ti, &index_result);
 		if (err != 0)
 			goto done;
 
-		err = reader_table_iter_at(r, &ti, index_result.u.idx.offset, 0);
+		err = reader_table_iter_at(r, ti, index_result.u.idx.offset, 0);
 		if (err != 0)
 			goto done;
 
-		err = block_iter_seek_key(&ti.bi, &ti.br, &want_index.u.idx.last_key);
+		err = block_iter_seek_key(&ti->bi, &ti->br, &want_index.u.idx.last_key);
 		if (err < 0)
 			goto done;
 
-		if (ti.typ == reftable_record_type(rec)) {
+		if (ti->typ == reftable_record_type(rec)) {
 			err = 0;
 			break;
 		}
 
-		if (ti.typ != BLOCK_TYPE_INDEX) {
+		if (ti->typ != BLOCK_TYPE_INDEX) {
 			err = REFTABLE_FORMAT_ERROR;
 			goto done;
 		}
 	}
 
-	REFTABLE_ALLOC_ARRAY(malloced, 1);
-	*malloced = ti;
-	iterator_from_table_iter(it, malloced);
-
 done:
-	if (err)
-		table_iter_close(&ti);
 	reftable_record_release(&want_index);
 	reftable_record_release(&index_result);
 	return err;
@@ -595,15 +585,15 @@ static int reader_seek_internal(struct reftable_reader *r,
 	struct table_iter ti = TABLE_ITER_INIT, *p;
 	int err;
 
-	if (idx > 0)
-		return reader_seek_indexed(r, it, rec);
-
-	err = reader_start(r, &ti, reftable_record_type(rec), 0);
+	err = reader_start(r, &ti, reftable_record_type(rec), !!idx);
 	if (err < 0)
 		goto out;
 
-	err = reader_seek_linear(&ti, rec);
-	if (err < 0)
+	if (idx)
+		err = reader_seek_indexed(&ti, r, rec);
+	else
+		err = reader_seek_linear(&ti, rec);
+	if (err)
 		goto out;
 
 	REFTABLE_ALLOC_ARRAY(p, 1);
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 04/13] reftable/reader: separate concerns of table iter and reftable reader
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2024-05-08 11:03 ` [PATCH 03/13] reftable/reader: unify indexed and linear seeking Patrick Steinhardt
@ 2024-05-08 11:03 ` Patrick Steinhardt
  2024-05-08 11:03 ` [PATCH 05/13] reftable/reader: inline `reader_seek_internal()` Patrick Steinhardt
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:03 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 4207 bytes --]

In "reftable/reader.c" we implement two different interfaces:

  - The reftable reader contains the logic to read reftables.

  - The table iterator is used to iterate through a single reftable read
    by the reader.

The way those two types are used in the code is somewhat confusing
though because seeking inside a table is implemented as if it was part
of the reftable reader, even though it is ultimately more of a detail
implemented by the table iterator.

Make the boundary between those two types clearer by renaming functions
that seek records in a table such that they clearly belong to the table
iterator's logic.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index cf7f126d8d..b210753441 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -386,9 +386,8 @@ static void iterator_from_table_iter(struct reftable_iterator *it,
 	it->ops = &table_iter_vtable;
 }
 
-static int reader_table_iter_at(struct reftable_reader *r,
-				struct table_iter *ti, uint64_t off,
-				uint8_t typ)
+static int table_iter_seek_to(struct table_iter *ti, struct reftable_reader *r,
+			      uint64_t off, uint8_t typ)
 {
 	int err;
 
@@ -403,8 +402,8 @@ static int reader_table_iter_at(struct reftable_reader *r,
 	return 0;
 }
 
-static int reader_start(struct reftable_reader *r, struct table_iter *ti,
-			uint8_t typ, int index)
+static int table_iter_seek_start(struct table_iter *ti, struct reftable_reader *r,
+				 uint8_t typ, int index)
 {
 	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
 	uint64_t off = offs->offset;
@@ -416,11 +415,11 @@ static int reader_start(struct reftable_reader *r, struct table_iter *ti,
 		typ = BLOCK_TYPE_INDEX;
 	}
 
-	return reader_table_iter_at(r, ti, off, typ);
+	return table_iter_seek_to(ti, r, off, typ);
 }
 
-static int reader_seek_linear(struct table_iter *ti,
-			      struct reftable_record *want)
+static int table_iter_seek_linear(struct table_iter *ti,
+				  struct reftable_record *want)
 {
 	struct strbuf want_key = STRBUF_INIT;
 	struct strbuf got_key = STRBUF_INIT;
@@ -499,9 +498,8 @@ static int reader_seek_linear(struct table_iter *ti,
 	return err;
 }
 
-static int reader_seek_indexed(struct table_iter *ti,
-			       struct reftable_reader *r,
-			       struct reftable_record *rec)
+static int table_iter_seek_indexed(struct table_iter *ti,
+				   struct reftable_record *rec)
 {
 	struct reftable_record want_index = {
 		.type = BLOCK_TYPE_INDEX, .u.idx = { .last_key = STRBUF_INIT }
@@ -520,7 +518,7 @@ static int reader_seek_indexed(struct table_iter *ti,
 	 * highest layer that identifies the relevant index block as well as
 	 * the record inside that block that corresponds to our wanted key.
 	 */
-	err = reader_seek_linear(ti, &want_index);
+	err = table_iter_seek_linear(ti, &want_index);
 	if (err < 0)
 		goto done;
 
@@ -550,7 +548,7 @@ static int reader_seek_indexed(struct table_iter *ti,
 		if (err != 0)
 			goto done;
 
-		err = reader_table_iter_at(r, ti, index_result.u.idx.offset, 0);
+		err = table_iter_seek_to(ti, ti->r, index_result.u.idx.offset, 0);
 		if (err != 0)
 			goto done;
 
@@ -585,14 +583,14 @@ static int reader_seek_internal(struct reftable_reader *r,
 	struct table_iter ti = TABLE_ITER_INIT, *p;
 	int err;
 
-	err = reader_start(r, &ti, reftable_record_type(rec), !!idx);
+	err = table_iter_seek_start(&ti, r, reftable_record_type(rec), !!idx);
 	if (err < 0)
 		goto out;
 
 	if (idx)
-		err = reader_seek_indexed(&ti, r, rec);
+		err = table_iter_seek_indexed(&ti, rec);
 	else
-		err = reader_seek_linear(&ti, rec);
+		err = table_iter_seek_linear(&ti, rec);
 	if (err)
 		goto out;
 
@@ -742,7 +740,7 @@ static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
 	int err;
 
 	*ti = ti_empty;
-	err = reader_start(r, ti, BLOCK_TYPE_REF, 0);
+	err = table_iter_seek_start(ti, r, BLOCK_TYPE_REF, 0);
 	if (err < 0) {
 		reftable_free(ti);
 		return err;
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 05/13] reftable/reader: inline `reader_seek_internal()`
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
                   ` (3 preceding siblings ...)
  2024-05-08 11:03 ` [PATCH 04/13] reftable/reader: separate concerns of table iter and reftable reader Patrick Steinhardt
@ 2024-05-08 11:03 ` Patrick Steinhardt
  2024-05-08 11:04 ` [PATCH 06/13] reftable/reader: set up the reader when initializing table iterator Patrick Steinhardt
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:03 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2238 bytes --]

We have both `reader_seek()` and `reader_seek_internal()`, where the
former function only exists so that we can exit early in case the given
table has no records of the sought-after type.

Merge these two functions into one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index b210753441..c3541e2c43 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -573,21 +573,25 @@ static int table_iter_seek_indexed(struct table_iter *ti,
 	return err;
 }
 
-static int reader_seek_internal(struct reftable_reader *r,
-				struct reftable_iterator *it,
-				struct reftable_record *rec)
+static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		       struct reftable_record *rec)
 {
-	struct reftable_reader_offsets *offs =
-		reader_offsets_for(r, reftable_record_type(rec));
-	uint64_t idx = offs->index_offset;
+	uint8_t typ = reftable_record_type(rec);
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
 	struct table_iter ti = TABLE_ITER_INIT, *p;
 	int err;
 
-	err = table_iter_seek_start(&ti, r, reftable_record_type(rec), !!idx);
+	if (!offs->is_present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	err = table_iter_seek_start(&ti, r, reftable_record_type(rec),
+				    !!offs->index_offset);
 	if (err < 0)
 		goto out;
 
-	if (idx)
+	if (offs->index_offset)
 		err = table_iter_seek_indexed(&ti, rec);
 	else
 		err = table_iter_seek_linear(&ti, rec);
@@ -604,20 +608,6 @@ static int reader_seek_internal(struct reftable_reader *r,
 	return err;
 }
 
-static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
-		       struct reftable_record *rec)
-{
-	uint8_t typ = reftable_record_type(rec);
-
-	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
-	if (!offs->is_present) {
-		iterator_set_empty(it);
-		return 0;
-	}
-
-	return reader_seek_internal(r, it, rec);
-}
-
 int reftable_reader_seek_ref(struct reftable_reader *r,
 			     struct reftable_iterator *it, const char *name)
 {
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 06/13] reftable/reader: set up the reader when initializing table iterator
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
                   ` (4 preceding siblings ...)
  2024-05-08 11:03 ` [PATCH 05/13] reftable/reader: inline `reader_seek_internal()` Patrick Steinhardt
@ 2024-05-08 11:04 ` Patrick Steinhardt
  2024-05-08 11:04 ` [PATCH 07/13] reftable/merged: split up initialization and seeking of records Patrick Steinhardt
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:04 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 4237 bytes --]

All the seeking functions accept a `struct reftable_reader` as input
such that they can use the reader to look up the respective blocks.
Refactor the code to instead set up the reader as a member of `struct
table_iter` during initialization such that we don't have to pass the
reader on every single call.

This step is required to move seeking of records into the generic
`struct reftable_iterator` infrastructure.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index c3541e2c43..021608f638 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -224,8 +224,14 @@ struct table_iter {
 	struct block_iter bi;
 	int is_finished;
 };
-#define TABLE_ITER_INIT { \
-	.bi = BLOCK_ITER_INIT \
+
+static int table_iter_init(struct table_iter *ti, struct reftable_reader *r)
+{
+	struct block_iter bi = BLOCK_ITER_INIT;
+	memset(ti, 0, sizeof(*ti));
+	ti->r = r;
+	ti->bi = bi;
+	return 0;
 }
 
 static int table_iter_next_in_block(struct table_iter *ti,
@@ -386,26 +392,23 @@ static void iterator_from_table_iter(struct reftable_iterator *it,
 	it->ops = &table_iter_vtable;
 }
 
-static int table_iter_seek_to(struct table_iter *ti, struct reftable_reader *r,
-			      uint64_t off, uint8_t typ)
+static int table_iter_seek_to(struct table_iter *ti, uint64_t off, uint8_t typ)
 {
 	int err;
 
-	err = reader_init_block_reader(r, &ti->br, off, typ);
+	err = reader_init_block_reader(ti->r, &ti->br, off, typ);
 	if (err != 0)
 		return err;
 
-	ti->r = r;
 	ti->typ = block_reader_type(&ti->br);
 	ti->block_off = off;
 	block_iter_seek_start(&ti->bi, &ti->br);
 	return 0;
 }
 
-static int table_iter_seek_start(struct table_iter *ti, struct reftable_reader *r,
-				 uint8_t typ, int index)
+static int table_iter_seek_start(struct table_iter *ti, uint8_t typ, int index)
 {
-	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	struct reftable_reader_offsets *offs = reader_offsets_for(ti->r, typ);
 	uint64_t off = offs->offset;
 	if (index) {
 		off = offs->index_offset;
@@ -415,7 +418,7 @@ static int table_iter_seek_start(struct table_iter *ti, struct reftable_reader *
 		typ = BLOCK_TYPE_INDEX;
 	}
 
-	return table_iter_seek_to(ti, r, off, typ);
+	return table_iter_seek_to(ti, off, typ);
 }
 
 static int table_iter_seek_linear(struct table_iter *ti,
@@ -548,7 +551,7 @@ static int table_iter_seek_indexed(struct table_iter *ti,
 		if (err != 0)
 			goto done;
 
-		err = table_iter_seek_to(ti, ti->r, index_result.u.idx.offset, 0);
+		err = table_iter_seek_to(ti, index_result.u.idx.offset, 0);
 		if (err != 0)
 			goto done;
 
@@ -578,7 +581,7 @@ static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
 {
 	uint8_t typ = reftable_record_type(rec);
 	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
-	struct table_iter ti = TABLE_ITER_INIT, *p;
+	struct table_iter ti, *p;
 	int err;
 
 	if (!offs->is_present) {
@@ -586,7 +589,9 @@ static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
 		return 0;
 	}
 
-	err = table_iter_seek_start(&ti, r, reftable_record_type(rec),
+	table_iter_init(&ti, r);
+
+	err = table_iter_seek_start(&ti, reftable_record_type(rec),
 				    !!offs->index_offset);
 	if (err < 0)
 		goto out;
@@ -722,15 +727,15 @@ static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
 					      struct reftable_iterator *it,
 					      uint8_t *oid)
 {
-	struct table_iter ti_empty = TABLE_ITER_INIT;
-	struct table_iter *ti = reftable_calloc(1, sizeof(*ti));
+	struct table_iter *ti;
 	struct filtering_ref_iterator *filter = NULL;
 	struct filtering_ref_iterator empty = FILTERING_REF_ITERATOR_INIT;
 	int oid_len = hash_size(r->hash_id);
 	int err;
 
-	*ti = ti_empty;
-	err = table_iter_seek_start(ti, r, BLOCK_TYPE_REF, 0);
+	REFTABLE_ALLOC_ARRAY(ti, 1);
+	table_iter_init(ti, r);
+	err = table_iter_seek_start(ti, BLOCK_TYPE_REF, 0);
 	if (err < 0) {
 		reftable_free(ti);
 		return err;
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 07/13] reftable/merged: split up initialization and seeking of records
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
                   ` (5 preceding siblings ...)
  2024-05-08 11:04 ` [PATCH 06/13] reftable/reader: set up the reader when initializing table iterator Patrick Steinhardt
@ 2024-05-08 11:04 ` Patrick Steinhardt
  2024-05-10 19:18   ` Justin Tobler
  2024-05-08 11:04 ` [PATCH 08/13] reftable/merged: simplify indices for subiterators Patrick Steinhardt
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:04 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3428 bytes --]

To initialize a `struct merged_iter`, we need to seek all subiterators
to the wanted record and then add their results to the priority queue
used to sort the records. This logic is split up across two functions,
`merged_table_seek_record()` and `merged_table_iter()`. The scope of
these functions is somewhat weird though, where `merged_table_iter()` is
only responsible for adding the records of the subiterators to the
priority queue.

Clarify the scope of those functions such that `merged_table_iter()` is
only responsible for initializing the iterator's structure. Performing
the subiterator seeks are now part of `merged_table_seek_record()`.

This step is required to move seeking of records into the generic
`struct reftable_iterator` infrastructure.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 59 ++++++++++++++++++-----------------------------
 1 file changed, 22 insertions(+), 37 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index f85a24c678..4e1b78e93f 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -25,34 +25,18 @@ struct merged_subiter {
 struct merged_iter {
 	struct merged_subiter *subiters;
 	struct merged_iter_pqueue pq;
-	uint32_t hash_id;
 	size_t stack_len;
-	uint8_t typ;
 	int suppress_deletions;
 	ssize_t advance_index;
 };
 
-static int merged_iter_init(struct merged_iter *mi)
+static void merged_iter_init(struct merged_iter *mi,
+			     struct reftable_merged_table *mt)
 {
-	for (size_t i = 0; i < mi->stack_len; i++) {
-		struct pq_entry e = {
-			.index = i,
-			.rec = &mi->subiters[i].rec,
-		};
-		int err;
-
-		reftable_record_init(&mi->subiters[i].rec, mi->typ);
-		err = iterator_next(&mi->subiters[i].iter,
-				    &mi->subiters[i].rec);
-		if (err < 0)
-			return err;
-		if (err > 0)
-			continue;
-
-		merged_iter_pqueue_add(&mi->pq, &e);
-	}
-
-	return 0;
+	memset(mi, 0, sizeof(*mi));
+	mi->advance_index = -1;
+	mi->suppress_deletions = mt->suppress_deletions;
+	REFTABLE_CALLOC_ARRAY(mi->subiters, mt->stack_len);
 }
 
 static void merged_iter_close(void *p)
@@ -246,32 +230,33 @@ static int merged_table_seek_record(struct reftable_merged_table *mt,
 				    struct reftable_iterator *it,
 				    struct reftable_record *rec)
 {
-	struct merged_iter merged = {
-		.typ = reftable_record_type(rec),
-		.hash_id = mt->hash_id,
-		.suppress_deletions = mt->suppress_deletions,
-		.advance_index = -1,
-	};
-	struct merged_iter *p;
+	struct merged_iter merged, *p;
 	int err;
 
-	REFTABLE_CALLOC_ARRAY(merged.subiters, mt->stack_len);
+	merged_iter_init(&merged, mt);
+
 	for (size_t i = 0; i < mt->stack_len; i++) {
+		reftable_record_init(&merged.subiters[merged.stack_len].rec,
+				     reftable_record_type(rec));
+
 		err = reftable_table_seek_record(&mt->stack[i],
 						 &merged.subiters[merged.stack_len].iter, rec);
 		if (err < 0)
 			goto out;
-		if (!err)
-			merged.stack_len++;
-	}
+		if (err > 0)
+			continue;
 
-	err = merged_iter_init(&merged);
-	if (err < 0)
-		goto out;
+		err = merged_iter_advance_subiter(&merged, merged.stack_len);
+		if (err < 0)
+			goto out;
+
+		merged.stack_len++;
+	}
 
-	p = reftable_malloc(sizeof(struct merged_iter));
+	p = reftable_malloc(sizeof(*p));
 	*p = merged;
 	iterator_from_merged_iter(it, p);
+	err = 0;
 
 out:
 	if (err < 0)
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 08/13] reftable/merged: simplify indices for subiterators
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
                   ` (6 preceding siblings ...)
  2024-05-08 11:04 ` [PATCH 07/13] reftable/merged: split up initialization and seeking of records Patrick Steinhardt
@ 2024-05-08 11:04 ` Patrick Steinhardt
  2024-05-10 19:25   ` Justin Tobler
  2024-05-08 11:04 ` [PATCH 09/13] reftable/generic: move seeking of records into the iterator Patrick Steinhardt
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:04 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 2031 bytes --]

When seeking on a merged table, we perform the seek for each of the
subiterators. If the subiterator hasa the desired record we add it to
the priority queue, otherwise we skip it and don't add it to the stack
of subiterators hosted by the merged table.

The consequence of this is that the index of the subiterator in the
merged table does not necessarily correspond to the index of it in the
merged iterator. Next to being potentially confusing, it also means that
we won't easily be able to re-seek the merged iterator because we have
no clear connection between both of the data structures.

Refactor the code so that the index stays the same in both structures.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index 4e1b78e93f..18a2a6f09b 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -37,6 +37,7 @@ static void merged_iter_init(struct merged_iter *mi,
 	mi->advance_index = -1;
 	mi->suppress_deletions = mt->suppress_deletions;
 	REFTABLE_CALLOC_ARRAY(mi->subiters, mt->stack_len);
+	mi->stack_len = mt->stack_len;
 }
 
 static void merged_iter_close(void *p)
@@ -236,21 +237,19 @@ static int merged_table_seek_record(struct reftable_merged_table *mt,
 	merged_iter_init(&merged, mt);
 
 	for (size_t i = 0; i < mt->stack_len; i++) {
-		reftable_record_init(&merged.subiters[merged.stack_len].rec,
+		reftable_record_init(&merged.subiters[i].rec,
 				     reftable_record_type(rec));
 
 		err = reftable_table_seek_record(&mt->stack[i],
-						 &merged.subiters[merged.stack_len].iter, rec);
+						 &merged.subiters[i].iter, rec);
 		if (err < 0)
 			goto out;
 		if (err > 0)
 			continue;
 
-		err = merged_iter_advance_subiter(&merged, merged.stack_len);
+		err = merged_iter_advance_subiter(&merged, i);
 		if (err < 0)
 			goto out;
-
-		merged.stack_len++;
 	}
 
 	p = reftable_malloc(sizeof(*p));
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 09/13] reftable/generic: move seeking of records into the iterator
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
                   ` (7 preceding siblings ...)
  2024-05-08 11:04 ` [PATCH 08/13] reftable/merged: simplify indices for subiterators Patrick Steinhardt
@ 2024-05-08 11:04 ` Patrick Steinhardt
  2024-05-10 21:44   ` Justin Tobler
  2024-05-08 11:04 ` [PATCH 10/13] reftable/generic: adapt interface to allow reuse of iterators Patrick Steinhardt
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:04 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 17826 bytes --]

Reftable iterators are created by seeking on the parent structure of a
corresponding record. For example, to create an iterator for the merged
table you would call `reftable_merged_table_seek_ref()`. Most notably,
it is not posible to create an iterator and then seek it afterwards.

While this may be a bit easier to reason about, it comes with two
significant downsides. The first downside is that the logic to find
records is split up between the parent data structure and the iterator
itself. Conceptually, it is more straight forward if all that logic was
contained in a single place, which should be the iterator.

The second and more significant downside is that it is impossible to
reuse iterators for multiple seeks. Whenever you want to look up a
record, you need to re-create the whole infrastructure again, which is
quite a waste of time. Furthermore, it is impossible to for example
optimize seeks, for example when seeking the same record multiple times.

To address this, we essentially split up the concerns properly such that
the parent data structure is responsible for setting up the iterator via
a new `init_iter()` callback, whereas the iterator handles seeks via a
new `seek()` callback. This will eventually allow us to call `seek()` on
the iterator multiple times, where every iterator can potentially
optimize for certain cases.

Note that at this point in time we are not yet ready to reuse the
iterators. This will be left for a future patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/generic.c |  47 +++++++++++++----
 reftable/generic.h |   9 +++-
 reftable/iter.c    |  15 ++++++
 reftable/merged.c  |  97 +++++++++++++++++-----------------
 reftable/reader.c  | 126 +++++++++++++++++++++++++--------------------
 5 files changed, 177 insertions(+), 117 deletions(-)

diff --git a/reftable/generic.c b/reftable/generic.c
index b9f1c7c18a..1cf68fe124 100644
--- a/reftable/generic.c
+++ b/reftable/generic.c
@@ -12,25 +12,39 @@ license that can be found in the LICENSE file or at
 #include "reftable-iterator.h"
 #include "reftable-generic.h"
 
+void table_init_iter(struct reftable_table *tab,
+		     struct reftable_iterator *it,
+		     uint8_t typ)
+{
+
+	tab->ops->init_iter(tab->table_arg, it, typ);
+}
+
 int reftable_table_seek_ref(struct reftable_table *tab,
 			    struct reftable_iterator *it, const char *name)
 {
-	struct reftable_record rec = { .type = BLOCK_TYPE_REF,
-				       .u.ref = {
-					       .refname = (char *)name,
-				       } };
-	return tab->ops->seek_record(tab->table_arg, it, &rec);
+	struct reftable_record want = {
+		.type = BLOCK_TYPE_REF,
+		.u.ref = {
+			.refname = (char *)name,
+		},
+	};
+	table_init_iter(tab, it, BLOCK_TYPE_REF);
+	return it->ops->seek(it->iter_arg, &want);
 }
 
 int reftable_table_seek_log(struct reftable_table *tab,
 			    struct reftable_iterator *it, const char *name)
 {
-	struct reftable_record rec = { .type = BLOCK_TYPE_LOG,
-				       .u.log = {
-					       .refname = (char *)name,
-					       .update_index = ~((uint64_t)0),
-				       } };
-	return tab->ops->seek_record(tab->table_arg, it, &rec);
+	struct reftable_record want = {
+		.type = BLOCK_TYPE_LOG,
+		.u.log = {
+			.refname = (char *)name,
+			.update_index = ~((uint64_t)0),
+		},
+	};
+	table_init_iter(tab, it, BLOCK_TYPE_LOG);
+	return it->ops->seek(it->iter_arg, &want);
 }
 
 int reftable_table_read_ref(struct reftable_table *tab, const char *name,
@@ -152,11 +166,21 @@ int reftable_iterator_next_log(struct reftable_iterator *it,
 	return err;
 }
 
+int iterator_seek(struct reftable_iterator *it, struct reftable_record *want)
+{
+	return it->ops->seek(it->iter_arg, want);
+}
+
 int iterator_next(struct reftable_iterator *it, struct reftable_record *rec)
 {
 	return it->ops->next(it->iter_arg, rec);
 }
 
+static int empty_iterator_seek(void *arg, struct reftable_record *want)
+{
+	return 0;
+}
+
 static int empty_iterator_next(void *arg, struct reftable_record *rec)
 {
 	return 1;
@@ -167,6 +191,7 @@ static void empty_iterator_close(void *arg)
 }
 
 static struct reftable_iterator_vtable empty_vtable = {
+	.seek = &empty_iterator_seek,
 	.next = &empty_iterator_next,
 	.close = &empty_iterator_close,
 };
diff --git a/reftable/generic.h b/reftable/generic.h
index 98886a0640..8341fa570e 100644
--- a/reftable/generic.h
+++ b/reftable/generic.h
@@ -14,19 +14,24 @@ license that can be found in the LICENSE file or at
 
 /* generic interface to reftables */
 struct reftable_table_vtable {
-	int (*seek_record)(void *tab, struct reftable_iterator *it,
-			   struct reftable_record *);
+	void (*init_iter)(void *tab, struct reftable_iterator *it, uint8_t typ);
 	uint32_t (*hash_id)(void *tab);
 	uint64_t (*min_update_index)(void *tab);
 	uint64_t (*max_update_index)(void *tab);
 };
 
+void table_init_iter(struct reftable_table *tab,
+		     struct reftable_iterator *it,
+		     uint8_t typ);
+
 struct reftable_iterator_vtable {
+	int (*seek)(void *iter_arg, struct reftable_record *want);
 	int (*next)(void *iter_arg, struct reftable_record *rec);
 	void (*close)(void *iter_arg);
 };
 
 void iterator_set_empty(struct reftable_iterator *it);
+int iterator_seek(struct reftable_iterator *it, struct reftable_record *want);
 int iterator_next(struct reftable_iterator *it, struct reftable_record *rec);
 
 #endif
diff --git a/reftable/iter.c b/reftable/iter.c
index aa9ac199b1..b4528fab47 100644
--- a/reftable/iter.c
+++ b/reftable/iter.c
@@ -23,6 +23,13 @@ static void filtering_ref_iterator_close(void *iter_arg)
 	reftable_iterator_destroy(&fri->it);
 }
 
+static int filtering_ref_iterator_seek(void *iter_arg,
+				       struct reftable_record *want)
+{
+	struct filtering_ref_iterator *fri = iter_arg;
+	return iterator_seek(&fri->it, want);
+}
+
 static int filtering_ref_iterator_next(void *iter_arg,
 				       struct reftable_record *rec)
 {
@@ -73,6 +80,7 @@ static int filtering_ref_iterator_next(void *iter_arg,
 }
 
 static struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.seek = &filtering_ref_iterator_seek,
 	.next = &filtering_ref_iterator_next,
 	.close = &filtering_ref_iterator_close,
 };
@@ -119,6 +127,12 @@ static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
 	return 0;
 }
 
+static int indexed_table_ref_iter_seek(void *p, struct reftable_record *want)
+{
+	BUG("seeking indexed table is not supported");
+	return -1;
+}
+
 static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec)
 {
 	struct indexed_table_ref_iter *it = p;
@@ -175,6 +189,7 @@ int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
 }
 
 static struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.seek = &indexed_table_ref_iter_seek,
 	.next = &indexed_table_ref_iter_next,
 	.close = &indexed_table_ref_iter_close,
 };
diff --git a/reftable/merged.c b/reftable/merged.c
index 18a2a6f09b..fc7946d90d 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -31,12 +31,18 @@ struct merged_iter {
 };
 
 static void merged_iter_init(struct merged_iter *mi,
-			     struct reftable_merged_table *mt)
+			     struct reftable_merged_table *mt,
+			     uint8_t typ)
 {
 	memset(mi, 0, sizeof(*mi));
 	mi->advance_index = -1;
 	mi->suppress_deletions = mt->suppress_deletions;
+
 	REFTABLE_CALLOC_ARRAY(mi->subiters, mt->stack_len);
+	for (size_t i = 0; i < mt->stack_len; i++) {
+		reftable_record_init(&mi->subiters[i].rec, typ);
+		table_init_iter(&mt->stack[i], &mi->subiters[i].iter, typ);
+	}
 	mi->stack_len = mt->stack_len;
 }
 
@@ -68,6 +74,27 @@ static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
 	return 0;
 }
 
+static int merged_iter_seek(struct merged_iter *mi, struct reftable_record *want)
+{
+	int err;
+
+	mi->advance_index = -1;
+
+	for (size_t i = 0; i < mi->stack_len; i++) {
+		err = iterator_seek(&mi->subiters[i].iter, want);
+		if (err < 0)
+			return err;
+		if (err > 0)
+			continue;
+
+		err = merged_iter_advance_subiter(mi, i);
+		if (err < 0)
+			return err;
+	}
+
+	return 0;
+}
+
 static int merged_iter_next_entry(struct merged_iter *mi,
 				  struct reftable_record *rec)
 {
@@ -133,6 +160,11 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 	return 0;
 }
 
+static int merged_iter_seek_void(void *it, struct reftable_record *want)
+{
+	return merged_iter_seek(it, want);
+}
+
 static int merged_iter_next_void(void *p, struct reftable_record *rec)
 {
 	struct merged_iter *mi = p;
@@ -147,6 +179,7 @@ static int merged_iter_next_void(void *p, struct reftable_record *rec)
 }
 
 static struct reftable_iterator_vtable merged_iter_vtable = {
+	.seek = merged_iter_seek_void,
 	.next = &merged_iter_next_void,
 	.close = &merged_iter_close,
 };
@@ -220,47 +253,13 @@ reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
 	return mt->min;
 }
 
-static int reftable_table_seek_record(struct reftable_table *tab,
-				      struct reftable_iterator *it,
-				      struct reftable_record *rec)
-{
-	return tab->ops->seek_record(tab->table_arg, it, rec);
-}
-
-static int merged_table_seek_record(struct reftable_merged_table *mt,
-				    struct reftable_iterator *it,
-				    struct reftable_record *rec)
+static void merged_table_init_iter(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   uint8_t typ)
 {
-	struct merged_iter merged, *p;
-	int err;
-
-	merged_iter_init(&merged, mt);
-
-	for (size_t i = 0; i < mt->stack_len; i++) {
-		reftable_record_init(&merged.subiters[i].rec,
-				     reftable_record_type(rec));
-
-		err = reftable_table_seek_record(&mt->stack[i],
-						 &merged.subiters[i].iter, rec);
-		if (err < 0)
-			goto out;
-		if (err > 0)
-			continue;
-
-		err = merged_iter_advance_subiter(&merged, i);
-		if (err < 0)
-			goto out;
-	}
-
-	p = reftable_malloc(sizeof(*p));
-	*p = merged;
-	iterator_from_merged_iter(it, p);
-	err = 0;
-
-out:
-	if (err < 0)
-		merged_iter_close(&merged);
-	return err;
+	struct merged_iter *mi = reftable_malloc(sizeof(*mi));
+	merged_iter_init(mi, mt, typ);
+	iterator_from_merged_iter(it, mi);
 }
 
 int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
@@ -273,7 +272,8 @@ int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
 			.refname = (char *)name,
 		},
 	};
-	return merged_table_seek_record(mt, it, &rec);
+	merged_table_init_iter(mt, it, BLOCK_TYPE_REF);
+	return iterator_seek(it, &rec);
 }
 
 int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
@@ -285,7 +285,8 @@ int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
 					       .refname = (char *)name,
 					       .update_index = update_index,
 				       } };
-	return merged_table_seek_record(mt, it, &rec);
+	merged_table_init_iter(mt, it, BLOCK_TYPE_LOG);
+	return iterator_seek(it, &rec);
 }
 
 int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
@@ -301,11 +302,11 @@ uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *mt)
 	return mt->hash_id;
 }
 
-static int reftable_merged_table_seek_void(void *tab,
-					   struct reftable_iterator *it,
-					   struct reftable_record *rec)
+static void reftable_merged_table_init_iter_void(void *tab,
+						 struct reftable_iterator *it,
+						 uint8_t typ)
 {
-	return merged_table_seek_record(tab, it, rec);
+	merged_table_init_iter(tab, it, typ);
 }
 
 static uint32_t reftable_merged_table_hash_id_void(void *tab)
@@ -324,7 +325,7 @@ static uint64_t reftable_merged_table_max_update_index_void(void *tab)
 }
 
 static struct reftable_table_vtable merged_table_vtable = {
-	.seek_record = reftable_merged_table_seek_void,
+	.init_iter = reftable_merged_table_init_iter_void,
 	.hash_id = reftable_merged_table_hash_id_void,
 	.min_update_index = reftable_merged_table_min_update_index_void,
 	.max_update_index = reftable_merged_table_max_update_index_void,
diff --git a/reftable/reader.c b/reftable/reader.c
index 021608f638..a5a13cb0b9 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -369,29 +369,6 @@ static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
 	}
 }
 
-static int table_iter_next_void(void *ti, struct reftable_record *rec)
-{
-	return table_iter_next(ti, rec);
-}
-
-static void table_iter_close_void(void *ti)
-{
-	table_iter_close(ti);
-}
-
-static struct reftable_iterator_vtable table_iter_vtable = {
-	.next = &table_iter_next_void,
-	.close = &table_iter_close_void,
-};
-
-static void iterator_from_table_iter(struct reftable_iterator *it,
-				     struct table_iter *ti)
-{
-	assert(!it->ops);
-	it->iter_arg = ti;
-	it->ops = &table_iter_vtable;
-}
-
 static int table_iter_seek_to(struct table_iter *ti, uint64_t off, uint8_t typ)
 {
 	int err;
@@ -576,43 +553,74 @@ static int table_iter_seek_indexed(struct table_iter *ti,
 	return err;
 }
 
-static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
-		       struct reftable_record *rec)
+static int table_iter_seek(struct table_iter *ti,
+			   struct reftable_record *want)
 {
-	uint8_t typ = reftable_record_type(rec);
-	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
-	struct table_iter ti, *p;
+	uint8_t typ = reftable_record_type(want);
+	struct reftable_reader_offsets *offs = reader_offsets_for(ti->r, typ);
 	int err;
 
-	if (!offs->is_present) {
-		iterator_set_empty(it);
-		return 0;
-	}
-
-	table_iter_init(&ti, r);
-
-	err = table_iter_seek_start(&ti, reftable_record_type(rec),
+	err = table_iter_seek_start(ti, reftable_record_type(want),
 				    !!offs->index_offset);
 	if (err < 0)
 		goto out;
 
 	if (offs->index_offset)
-		err = table_iter_seek_indexed(&ti, rec);
+		err = table_iter_seek_indexed(ti, want);
 	else
-		err = table_iter_seek_linear(&ti, rec);
+		err = table_iter_seek_linear(ti, want);
 	if (err)
 		goto out;
 
-	REFTABLE_ALLOC_ARRAY(p, 1);
-	*p = ti;
-	iterator_from_table_iter(it, p);
-
 out:
-	if (err)
-		table_iter_close(&ti);
 	return err;
 }
 
+static int table_iter_seek_void(void *ti, struct reftable_record *want)
+{
+	return table_iter_seek(ti, want);
+}
+
+static int table_iter_next_void(void *ti, struct reftable_record *rec)
+{
+	return table_iter_next(ti, rec);
+}
+
+static void table_iter_close_void(void *ti)
+{
+	table_iter_close(ti);
+}
+
+static struct reftable_iterator_vtable table_iter_vtable = {
+	.seek = &table_iter_seek_void,
+	.next = &table_iter_next_void,
+	.close = &table_iter_close_void,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	assert(!it->ops);
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static void reader_init_iter(struct reftable_reader *r,
+			     struct reftable_iterator *it,
+			     uint8_t typ)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+
+	if (offs->is_present) {
+		struct table_iter *ti;
+		REFTABLE_ALLOC_ARRAY(ti, 1);
+		table_iter_init(ti, r);
+		iterator_from_table_iter(it, ti);
+	} else {
+		iterator_set_empty(it);
+	}
+}
+
 int reftable_reader_seek_ref(struct reftable_reader *r,
 			     struct reftable_iterator *it, const char *name)
 {
@@ -622,19 +630,23 @@ int reftable_reader_seek_ref(struct reftable_reader *r,
 			.refname = (char *)name,
 		},
 	};
-	return reader_seek(r, it, &rec);
+	reader_init_iter(r, it, BLOCK_TYPE_REF);
+	return iterator_seek(it, &rec);
 }
 
 int reftable_reader_seek_log_at(struct reftable_reader *r,
 				struct reftable_iterator *it, const char *name,
 				uint64_t update_index)
 {
-	struct reftable_record rec = { .type = BLOCK_TYPE_LOG,
-				       .u.log = {
-					       .refname = (char *)name,
-					       .update_index = update_index,
-				       } };
-	return reader_seek(r, it, &rec);
+	struct reftable_record rec = {
+		.type = BLOCK_TYPE_LOG,
+		.u.log = {
+			.refname = (char *)name,
+			.update_index = update_index,
+		},
+	};
+	reader_init_iter(r, it, BLOCK_TYPE_LOG);
+	return iterator_seek(it, &rec);
 }
 
 int reftable_reader_seek_log(struct reftable_reader *r,
@@ -692,7 +704,8 @@ static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
 	struct indexed_table_ref_iter *itr = NULL;
 
 	/* Look through the reverse index. */
-	err = reader_seek(r, &oit, &want);
+	reader_init_iter(r, &oit, BLOCK_TYPE_OBJ);
+	err = iterator_seek(&oit, &want);
 	if (err != 0)
 		goto done;
 
@@ -773,10 +786,11 @@ uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
 
 /* generic table interface. */
 
-static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
-				     struct reftable_record *rec)
+static void reftable_reader_init_iter_void(void *tab,
+					   struct reftable_iterator *it,
+					   uint8_t typ)
 {
-	return reader_seek(tab, it, rec);
+	reader_init_iter(tab, it, typ);
 }
 
 static uint32_t reftable_reader_hash_id_void(void *tab)
@@ -795,7 +809,7 @@ static uint64_t reftable_reader_max_update_index_void(void *tab)
 }
 
 static struct reftable_table_vtable reader_vtable = {
-	.seek_record = reftable_reader_seek_void,
+	.init_iter = reftable_reader_init_iter_void,
 	.hash_id = reftable_reader_hash_id_void,
 	.min_update_index = reftable_reader_min_update_index_void,
 	.max_update_index = reftable_reader_max_update_index_void,
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 10/13] reftable/generic: adapt interface to allow reuse of iterators
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
                   ` (8 preceding siblings ...)
  2024-05-08 11:04 ` [PATCH 09/13] reftable/generic: move seeking of records into the iterator Patrick Steinhardt
@ 2024-05-08 11:04 ` Patrick Steinhardt
  2024-05-08 11:04 ` [PATCH 11/13] reftable/reader: " Patrick Steinhardt
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:04 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 6394 bytes --]

Refactor the interfaces exposed by `struct reftable_table` and `struct
reftable_iterator` such that they support iterator reuse. This is done
by separating initialization of the iterator and seeking on it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/generic.c           | 53 ++++++++++++++++++++++++++----------
 reftable/iter.c              |  8 +++---
 reftable/reftable-generic.h  |  8 +++---
 reftable/reftable-iterator.h | 21 ++++++++++++++
 4 files changed, 68 insertions(+), 22 deletions(-)

diff --git a/reftable/generic.c b/reftable/generic.c
index 1cf68fe124..28ae26145e 100644
--- a/reftable/generic.c
+++ b/reftable/generic.c
@@ -20,8 +20,20 @@ void table_init_iter(struct reftable_table *tab,
 	tab->ops->init_iter(tab->table_arg, it, typ);
 }
 
-int reftable_table_seek_ref(struct reftable_table *tab,
-			    struct reftable_iterator *it, const char *name)
+void reftable_table_init_ref_iter(struct reftable_table *tab,
+				  struct reftable_iterator *it)
+{
+	table_init_iter(tab, it, BLOCK_TYPE_REF);
+}
+
+void reftable_table_init_log_iter(struct reftable_table *tab,
+				  struct reftable_iterator *it)
+{
+	table_init_iter(tab, it, BLOCK_TYPE_LOG);
+}
+
+int reftable_iterator_seek_ref(struct reftable_iterator *it,
+			       const char *name)
 {
 	struct reftable_record want = {
 		.type = BLOCK_TYPE_REF,
@@ -29,29 +41,37 @@ int reftable_table_seek_ref(struct reftable_table *tab,
 			.refname = (char *)name,
 		},
 	};
-	table_init_iter(tab, it, BLOCK_TYPE_REF);
 	return it->ops->seek(it->iter_arg, &want);
 }
 
-int reftable_table_seek_log(struct reftable_table *tab,
-			    struct reftable_iterator *it, const char *name)
+int reftable_iterator_seek_log_at(struct reftable_iterator *it,
+				  const char *name, uint64_t update_index)
 {
 	struct reftable_record want = {
 		.type = BLOCK_TYPE_LOG,
 		.u.log = {
 			.refname = (char *)name,
-			.update_index = ~((uint64_t)0),
+			.update_index = update_index,
 		},
 	};
-	table_init_iter(tab, it, BLOCK_TYPE_LOG);
 	return it->ops->seek(it->iter_arg, &want);
 }
 
+int reftable_iterator_seek_log(struct reftable_iterator *it,
+			       const char *name)
+{
+	return reftable_iterator_seek_log_at(it, name, ~((uint64_t) 0));
+}
+
 int reftable_table_read_ref(struct reftable_table *tab, const char *name,
 			    struct reftable_ref_record *ref)
 {
 	struct reftable_iterator it = { NULL };
-	int err = reftable_table_seek_ref(tab, &it, name);
+	int err;
+
+	reftable_table_init_ref_iter(tab, &it);
+
+	err = reftable_iterator_seek_ref(&it, name);
 	if (err)
 		goto done;
 
@@ -76,10 +96,13 @@ int reftable_table_print(struct reftable_table *tab) {
 	struct reftable_ref_record ref = { NULL };
 	struct reftable_log_record log = { NULL };
 	uint32_t hash_id = reftable_table_hash_id(tab);
-	int err = reftable_table_seek_ref(tab, &it, "");
-	if (err < 0) {
+	int err;
+
+	reftable_table_init_ref_iter(tab, &it);
+
+	err = reftable_iterator_seek_ref(&it, "");
+	if (err < 0)
 		return err;
-	}
 
 	while (1) {
 		err = reftable_iterator_next_ref(&it, &ref);
@@ -94,10 +117,12 @@ int reftable_table_print(struct reftable_table *tab) {
 	reftable_iterator_destroy(&it);
 	reftable_ref_record_release(&ref);
 
-	err = reftable_table_seek_log(tab, &it, "");
-	if (err < 0) {
+	reftable_table_init_log_iter(tab, &it);
+
+	err = reftable_iterator_seek_log(&it, "");
+	if (err < 0)
 		return err;
-	}
+
 	while (1) {
 		err = reftable_iterator_next_log(&it, &log);
 		if (err > 0) {
diff --git a/reftable/iter.c b/reftable/iter.c
index b4528fab47..fddea31e51 100644
--- a/reftable/iter.c
+++ b/reftable/iter.c
@@ -45,11 +45,11 @@ static int filtering_ref_iterator_next(void *iter_arg,
 		if (fri->double_check) {
 			struct reftable_iterator it = { NULL };
 
-			err = reftable_table_seek_ref(&fri->tab, &it,
-						      ref->refname);
-			if (err == 0) {
+			reftable_table_init_ref_iter(&fri->tab, &it);
+
+			err = reftable_iterator_seek_ref(&it, ref->refname);
+			if (err == 0)
 				err = reftable_iterator_next_ref(&it, ref);
-			}
 
 			reftable_iterator_destroy(&it);
 
diff --git a/reftable/reftable-generic.h b/reftable/reftable-generic.h
index d239751a77..65670ea093 100644
--- a/reftable/reftable-generic.h
+++ b/reftable/reftable-generic.h
@@ -21,11 +21,11 @@ struct reftable_table {
 	void *table_arg;
 };
 
-int reftable_table_seek_log(struct reftable_table *tab,
-			    struct reftable_iterator *it, const char *name);
+void reftable_table_init_ref_iter(struct reftable_table *tab,
+				  struct reftable_iterator *it);
 
-int reftable_table_seek_ref(struct reftable_table *tab,
-			    struct reftable_iterator *it, const char *name);
+void reftable_table_init_log_iter(struct reftable_table *tab,
+				  struct reftable_iterator *it);
 
 /* returns the hash ID from a generic reftable_table */
 uint32_t reftable_table_hash_id(struct reftable_table *tab);
diff --git a/reftable/reftable-iterator.h b/reftable/reftable-iterator.h
index d3eee7af35..e3bf688d53 100644
--- a/reftable/reftable-iterator.h
+++ b/reftable/reftable-iterator.h
@@ -21,12 +21,33 @@ struct reftable_iterator {
 	void *iter_arg;
 };
 
+/*
+ * Position the iterator at the ref record with given name such that the next
+ * call to `next_ref()` would yield the record.
+ */
+int reftable_iterator_seek_ref(struct reftable_iterator *it,
+			       const char *name);
+
 /* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
  * end of iteration.
  */
 int reftable_iterator_next_ref(struct reftable_iterator *it,
 			       struct reftable_ref_record *ref);
 
+/*
+ * Position the iterator at the log record with given name and update index
+ * such that the next call to `next_log()` would yield the record.
+ */
+int reftable_iterator_seek_log_at(struct reftable_iterator *it,
+				  const char *name, uint64_t update_index);
+
+/*
+ * Position the iterator at the newest log record with given name such that the
+ * next call to `next_log()` would yield the record.
+ */
+int reftable_iterator_seek_log(struct reftable_iterator *it,
+			       const char *name);
+
 /* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
  * end of iteration.
  */
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 11/13] reftable/reader: adapt interface to allow reuse of iterators
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
                   ` (9 preceding siblings ...)
  2024-05-08 11:04 ` [PATCH 10/13] reftable/generic: adapt interface to allow reuse of iterators Patrick Steinhardt
@ 2024-05-08 11:04 ` Patrick Steinhardt
  2024-05-10 21:48   ` Justin Tobler
  2024-05-08 11:04 ` [PATCH 12/13] reftable/stack: provide convenience functions to create iterators Patrick Steinhardt
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:04 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 8299 bytes --]

Refactor the interfaces exposed by `struct reftable_reader` and `struct
table_iterator` such that they support iterator reuse. This is done by
separating initialization of the iterator and seeking on it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c          | 31 ++++----------------------
 reftable/readwrite_test.c  | 35 +++++++++++++++++++----------
 reftable/reftable-reader.h | 45 ++++++--------------------------------
 3 files changed, 35 insertions(+), 76 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index a5a13cb0b9..bbdb4bdafa 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -621,39 +621,16 @@ static void reader_init_iter(struct reftable_reader *r,
 	}
 }
 
-int reftable_reader_seek_ref(struct reftable_reader *r,
-			     struct reftable_iterator *it, const char *name)
+void reftable_reader_init_ref_iterator(struct reftable_reader *r,
+				       struct reftable_iterator *it)
 {
-	struct reftable_record rec = {
-		.type = BLOCK_TYPE_REF,
-		.u.ref = {
-			.refname = (char *)name,
-		},
-	};
 	reader_init_iter(r, it, BLOCK_TYPE_REF);
-	return iterator_seek(it, &rec);
 }
 
-int reftable_reader_seek_log_at(struct reftable_reader *r,
-				struct reftable_iterator *it, const char *name,
-				uint64_t update_index)
+void reftable_reader_init_log_iterator(struct reftable_reader *r,
+				       struct reftable_iterator *it)
 {
-	struct reftable_record rec = {
-		.type = BLOCK_TYPE_LOG,
-		.u.log = {
-			.refname = (char *)name,
-			.update_index = update_index,
-		},
-	};
 	reader_init_iter(r, it, BLOCK_TYPE_LOG);
-	return iterator_seek(it, &rec);
-}
-
-int reftable_reader_seek_log(struct reftable_reader *r,
-			     struct reftable_iterator *it, const char *name)
-{
-	uint64_t max = ~((uint64_t)0);
-	return reftable_reader_seek_log_at(r, it, name, max);
 }
 
 void reader_close(struct reftable_reader *r)
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index a6dbd214c5..d99543bbd6 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -239,7 +239,9 @@ static void test_log_write_read(void)
 	err = init_reader(&rd, &source, "file.log");
 	EXPECT_ERR(err);
 
-	err = reftable_reader_seek_ref(&rd, &it, names[N - 1]);
+	reftable_reader_init_ref_iterator(&rd, &it);
+
+	err = reftable_iterator_seek_ref(&it, names[N - 1]);
 	EXPECT_ERR(err);
 
 	err = reftable_iterator_next_ref(&it, &ref);
@@ -252,7 +254,9 @@ static void test_log_write_read(void)
 	reftable_iterator_destroy(&it);
 	reftable_ref_record_release(&ref);
 
-	err = reftable_reader_seek_log(&rd, &it, "");
+	reftable_reader_init_log_iterator(&rd, &it);
+
+	err = reftable_iterator_seek_log(&it, "");
 	EXPECT_ERR(err);
 
 	i = 0;
@@ -330,7 +334,8 @@ static void test_log_zlib_corruption(void)
 	err = init_reader(&rd, &source, "file.log");
 	EXPECT_ERR(err);
 
-	err = reftable_reader_seek_log(&rd, &it, "refname");
+	reftable_reader_init_log_iterator(&rd, &it);
+	err = reftable_iterator_seek_log(&it, "refname");
 	EXPECT(err == REFTABLE_ZLIB_ERROR);
 
 	reftable_iterator_destroy(&it);
@@ -358,7 +363,8 @@ static void test_table_read_write_sequential(void)
 	err = init_reader(&rd, &source, "file.ref");
 	EXPECT_ERR(err);
 
-	err = reftable_reader_seek_ref(&rd, &it, "");
+	reftable_reader_init_ref_iterator(&rd, &it);
+	err = reftable_iterator_seek_ref(&it, "");
 	EXPECT_ERR(err);
 
 	while (1) {
@@ -412,7 +418,8 @@ static void test_table_read_api(void)
 	err = init_reader(&rd, &source, "file.ref");
 	EXPECT_ERR(err);
 
-	err = reftable_reader_seek_ref(&rd, &it, names[0]);
+	reftable_reader_init_ref_iterator(&rd, &it);
+	err = reftable_iterator_seek_ref(&it, names[0]);
 	EXPECT_ERR(err);
 
 	err = reftable_iterator_next_log(&it, &log);
@@ -457,7 +464,8 @@ static void test_table_read_write_seek(int index, int hash_id)
 	}
 
 	for (i = 1; i < N; i++) {
-		int err = reftable_reader_seek_ref(&rd, &it, names[i]);
+		reftable_reader_init_ref_iterator(&rd, &it);
+		err = reftable_iterator_seek_ref(&it, names[i]);
 		EXPECT_ERR(err);
 		err = reftable_iterator_next_ref(&it, &ref);
 		EXPECT_ERR(err);
@@ -472,7 +480,8 @@ static void test_table_read_write_seek(int index, int hash_id)
 	strbuf_addstr(&pastLast, names[N - 1]);
 	strbuf_addstr(&pastLast, "/");
 
-	err = reftable_reader_seek_ref(&rd, &it, pastLast.buf);
+	reftable_reader_init_ref_iterator(&rd, &it);
+	err = reftable_iterator_seek_ref(&it, pastLast.buf);
 	if (err == 0) {
 		struct reftable_ref_record ref = { NULL };
 		int err = reftable_iterator_next_ref(&it, &ref);
@@ -576,7 +585,8 @@ static void test_table_refs_for(int indexed)
 		rd.obj_offsets.is_present = 0;
 	}
 
-	err = reftable_reader_seek_ref(&rd, &it, "");
+	reftable_reader_init_ref_iterator(&rd, &it);
+	err = reftable_iterator_seek_ref(&it, "");
 	EXPECT_ERR(err);
 	reftable_iterator_destroy(&it);
 
@@ -639,7 +649,8 @@ static void test_write_empty_table(void)
 	err = reftable_new_reader(&rd, &source, "filename");
 	EXPECT_ERR(err);
 
-	err = reftable_reader_seek_ref(rd, &it, "");
+	reftable_reader_init_ref_iterator(rd, &it);
+	err = reftable_iterator_seek_ref(&it, "");
 	EXPECT_ERR(err);
 
 	err = reftable_iterator_next_ref(&it, &rec);
@@ -846,7 +857,8 @@ static void test_write_multiple_indices(void)
 	 * Seeking the log uses the log index now. In case there is any
 	 * confusion regarding indices we would notice here.
 	 */
-	err = reftable_reader_seek_log(reader, &it, "");
+	reftable_reader_init_log_iterator(reader, &it);
+	err = reftable_iterator_seek_log(&it, "");
 	EXPECT_ERR(err);
 
 	reftable_iterator_destroy(&it);
@@ -901,7 +913,8 @@ static void test_write_multi_level_index(void)
 	/*
 	 * Seeking the last ref should work as expected.
 	 */
-	err = reftable_reader_seek_ref(reader, &it, "refs/heads/199");
+	reftable_reader_init_ref_iterator(reader, &it);
+	err = reftable_iterator_seek_ref(&it, "refs/heads/199");
 	EXPECT_ERR(err);
 
 	reftable_iterator_destroy(&it);
diff --git a/reftable/reftable-reader.h b/reftable/reftable-reader.h
index 4a4bc2fdf8..35b9fb66e4 100644
--- a/reftable/reftable-reader.h
+++ b/reftable/reftable-reader.h
@@ -36,48 +36,17 @@ struct reftable_table;
 int reftable_new_reader(struct reftable_reader **pp,
 			struct reftable_block_source *src, const char *name);
 
-/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
-   in the table.  To seek to the start of the table, use name = "".
-
-   example:
-
-   struct reftable_reader *r = NULL;
-   int err = reftable_new_reader(&r, &src, "filename");
-   if (err < 0) { ... }
-   struct reftable_iterator it  = {0};
-   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
-   if (err < 0) { ... }
-   struct reftable_ref_record ref  = {0};
-   while (1) {
-   err = reftable_iterator_next_ref(&it, &ref);
-   if (err > 0) {
-   break;
-   }
-   if (err < 0) {
-   ..error handling..
-   }
-   ..found..
-   }
-   reftable_iterator_destroy(&it);
-   reftable_ref_record_release(&ref);
-*/
-int reftable_reader_seek_ref(struct reftable_reader *r,
-			     struct reftable_iterator *it, const char *name);
+/* Initialize a reftable iterator for reading refs. */
+void reftable_reader_init_ref_iterator(struct reftable_reader *r,
+				       struct reftable_iterator *it);
+
+/* Initialize a reftable iterator for reading refs. */
+void reftable_reader_init_log_iterator(struct reftable_reader *r,
+				       struct reftable_iterator *it);
 
 /* returns the hash ID used in this table. */
 uint32_t reftable_reader_hash_id(struct reftable_reader *r);
 
-/* seek to logs for the given name, older than update_index. To seek to the
-   start of the table, use name = "".
-*/
-int reftable_reader_seek_log_at(struct reftable_reader *r,
-				struct reftable_iterator *it, const char *name,
-				uint64_t update_index);
-
-/* seek to newest log entry for given name. */
-int reftable_reader_seek_log(struct reftable_reader *r,
-			     struct reftable_iterator *it, const char *name);
-
 /* closes and deallocates a reader. */
 void reftable_reader_free(struct reftable_reader *);
 
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 12/13] reftable/stack: provide convenience functions to create iterators
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
                   ` (10 preceding siblings ...)
  2024-05-08 11:04 ` [PATCH 11/13] reftable/reader: " Patrick Steinhardt
@ 2024-05-08 11:04 ` Patrick Steinhardt
  2024-05-08 11:04 ` [PATCH 13/13] reftable/merged: adapt interface to allow reuse of iterators Patrick Steinhardt
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:04 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 12261 bytes --]

There exist a bunch of call sites in the reftable backend that want to
create iterators for a reftable stack. This is rather convoluted right
now, where you always have to go via the merged table. And it is about
to become even more convoluted when we split up iterator initialization
and seeking in the next commit.

Introduce convenience functions that allow the caller to create an
iterator from a reftable stack directly without going through the merged
table. Adapt callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c   | 48 +++++++++++++++++----------------------
 reftable/merged.c         |  6 ++---
 reftable/merged.h         |  6 +++++
 reftable/reftable-stack.h | 18 +++++++++++++++
 reftable/stack.c          | 15 ++++++++++++
 5 files changed, 63 insertions(+), 30 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 010ef811b6..7ac2772bcb 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -15,7 +15,6 @@
 #include "../reftable/reftable-record.h"
 #include "../reftable/reftable-error.h"
 #include "../reftable/reftable-iterator.h"
-#include "../reftable/reftable-merged.h"
 #include "../setup.h"
 #include "../strmap.h"
 #include "parse.h"
@@ -462,7 +461,6 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 							    const char *prefix,
 							    int flags)
 {
-	struct reftable_merged_table *merged_table;
 	struct reftable_ref_iterator *iter;
 	int ret;
 
@@ -482,9 +480,8 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	merged_table = reftable_stack_merged_table(stack);
-
-	ret = reftable_merged_table_seek_ref(merged_table, &iter->iter, prefix);
+	reftable_stack_init_ref_iterator(stack, &iter->iter);
+	ret = reftable_iterator_seek_ref(&iter->iter, prefix);
 	if (ret)
 		goto done;
 
@@ -1015,8 +1012,6 @@ static int transaction_update_cmp(const void *a, const void *b)
 static int write_transaction_table(struct reftable_writer *writer, void *cb_data)
 {
 	struct write_transaction_table_arg *arg = cb_data;
-	struct reftable_merged_table *mt =
-		reftable_stack_merged_table(arg->stack);
 	uint64_t ts = reftable_stack_next_update_index(arg->stack);
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
@@ -1051,6 +1046,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 			struct reftable_log_record log = {0};
 			struct reftable_iterator it = {0};
 
+			reftable_stack_init_log_iterator(arg->stack, &it);
+
 			/*
 			 * When deleting refs we also delete all reflog entries
 			 * with them. While it is not strictly required to
@@ -1060,7 +1057,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 			 * Unfortunately, we have no better way than to delete
 			 * all reflog entries one by one.
 			 */
-			ret = reftable_merged_table_seek_log(mt, &it, u->refname);
+			ret = reftable_iterator_seek_log(&it, u->refname);
 			while (ret == 0) {
 				struct reftable_log_record *tombstone;
 
@@ -1354,7 +1351,6 @@ static int write_copy_table(struct reftable_writer *writer, void *cb_data)
 {
 	struct write_copy_arg *arg = cb_data;
 	uint64_t deletion_ts, creation_ts;
-	struct reftable_merged_table *mt = reftable_stack_merged_table(arg->stack);
 	struct reftable_ref_record old_ref = {0}, refs[2] = {0};
 	struct reftable_log_record old_log = {0}, *logs = NULL;
 	struct reftable_iterator it = {0};
@@ -1488,7 +1484,8 @@ static int write_copy_table(struct reftable_writer *writer, void *cb_data)
 	 * copy over all log entries from the old reflog. Last but not least,
 	 * when renaming we also have to delete all the old reflog entries.
 	 */
-	ret = reftable_merged_table_seek_log(mt, &it, arg->oldname);
+	reftable_stack_init_log_iterator(arg->stack, &it);
+	ret = reftable_iterator_seek_log(&it, arg->oldname);
 	if (ret < 0)
 		goto done;
 
@@ -1694,7 +1691,6 @@ static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
 								  struct reftable_stack *stack)
 {
-	struct reftable_merged_table *merged_table;
 	struct reftable_reflog_iterator *iter;
 	int ret;
 
@@ -1711,9 +1707,8 @@ static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
 	if (ret < 0)
 		goto done;
 
-	merged_table = reftable_stack_merged_table(stack);
-
-	ret = reftable_merged_table_seek_log(merged_table, &iter->iter, "");
+	reftable_stack_init_log_iterator(stack, &iter->iter);
+	ret = reftable_iterator_seek_log(&iter->iter, "");
 	if (ret < 0)
 		goto done;
 
@@ -1771,7 +1766,6 @@ static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
 	struct reftable_ref_store *refs =
 		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent_reverse");
 	struct reftable_stack *stack = stack_for(refs, refname, &refname);
-	struct reftable_merged_table *mt = NULL;
 	struct reftable_log_record log = {0};
 	struct reftable_iterator it = {0};
 	int ret;
@@ -1779,8 +1773,8 @@ static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
 	if (refs->err < 0)
 		return refs->err;
 
-	mt = reftable_stack_merged_table(stack);
-	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	reftable_stack_init_log_iterator(stack, &it);
+	ret = reftable_iterator_seek_log(&it, refname);
 	while (!ret) {
 		ret = reftable_iterator_next_log(&it, &log);
 		if (ret < 0)
@@ -1808,7 +1802,6 @@ static int reftable_be_for_each_reflog_ent(struct ref_store *ref_store,
 	struct reftable_ref_store *refs =
 		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent");
 	struct reftable_stack *stack = stack_for(refs, refname, &refname);
-	struct reftable_merged_table *mt = NULL;
 	struct reftable_log_record *logs = NULL;
 	struct reftable_iterator it = {0};
 	size_t logs_alloc = 0, logs_nr = 0, i;
@@ -1817,8 +1810,8 @@ static int reftable_be_for_each_reflog_ent(struct ref_store *ref_store,
 	if (refs->err < 0)
 		return refs->err;
 
-	mt = reftable_stack_merged_table(stack);
-	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	reftable_stack_init_log_iterator(stack, &it);
+	ret = reftable_iterator_seek_log(&it, refname);
 	while (!ret) {
 		struct reftable_log_record log = {0};
 
@@ -1855,7 +1848,6 @@ static int reftable_be_reflog_exists(struct ref_store *ref_store,
 	struct reftable_ref_store *refs =
 		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_exists");
 	struct reftable_stack *stack = stack_for(refs, refname, &refname);
-	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
 	struct reftable_log_record log = {0};
 	struct reftable_iterator it = {0};
 	int ret;
@@ -1868,7 +1860,8 @@ static int reftable_be_reflog_exists(struct ref_store *ref_store,
 	if (ret < 0)
 		goto done;
 
-	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	reftable_stack_init_log_iterator(stack, &it);
+	ret = reftable_iterator_seek_log(&it, refname);
 	if (ret < 0)
 		goto done;
 
@@ -1966,8 +1959,6 @@ struct write_reflog_delete_arg {
 static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_data)
 {
 	struct write_reflog_delete_arg *arg = cb_data;
-	struct reftable_merged_table *mt =
-		reftable_stack_merged_table(arg->stack);
 	struct reftable_log_record log = {0}, tombstone = {0};
 	struct reftable_iterator it = {0};
 	uint64_t ts = reftable_stack_next_update_index(arg->stack);
@@ -1975,12 +1966,14 @@ static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_da
 
 	reftable_writer_set_limits(writer, ts, ts);
 
+	reftable_stack_init_log_iterator(arg->stack, &it);
+
 	/*
 	 * In order to delete a table we need to delete all reflog entries one
 	 * by one. This is inefficient, but the reftable format does not have a
 	 * better marker right now.
 	 */
-	ret = reftable_merged_table_seek_log(mt, &it, arg->refname);
+	ret = reftable_iterator_seek_log(&it, arg->refname);
 	while (ret == 0) {
 		ret = reftable_iterator_next_log(&it, &log);
 		if (ret < 0)
@@ -2116,7 +2109,6 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 	struct reftable_ref_store *refs =
 		reftable_be_downcast(ref_store, REF_STORE_WRITE, "reflog_expire");
 	struct reftable_stack *stack = stack_for(refs, refname, &refname);
-	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
 	struct reftable_log_record *logs = NULL;
 	struct reftable_log_record *rewritten = NULL;
 	struct reftable_ref_record ref_record = {0};
@@ -2135,7 +2127,9 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 	if (ret < 0)
 		goto done;
 
-	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	reftable_stack_init_log_iterator(stack, &it);
+
+	ret = reftable_iterator_seek_log(&it, refname);
 	if (ret < 0)
 		goto done;
 
diff --git a/reftable/merged.c b/reftable/merged.c
index fc7946d90d..d127f99360 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -253,9 +253,9 @@ reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
 	return mt->min;
 }
 
-static void merged_table_init_iter(struct reftable_merged_table *mt,
-				   struct reftable_iterator *it,
-				   uint8_t typ)
+void merged_table_init_iter(struct reftable_merged_table *mt,
+			    struct reftable_iterator *it,
+			    uint8_t typ)
 {
 	struct merged_iter *mi = reftable_malloc(sizeof(*mi));
 	merged_iter_init(mi, mt, typ);
diff --git a/reftable/merged.h b/reftable/merged.h
index a2571dbc99..a10469f58e 100644
--- a/reftable/merged.h
+++ b/reftable/merged.h
@@ -26,4 +26,10 @@ struct reftable_merged_table {
 
 void merged_table_release(struct reftable_merged_table *mt);
 
+struct reftable_iterator;
+
+void merged_table_init_iter(struct reftable_merged_table *mt,
+			    struct reftable_iterator *it,
+			    uint8_t typ);
+
 #endif
diff --git a/reftable/reftable-stack.h b/reftable/reftable-stack.h
index 1b602dda58..26740e6073 100644
--- a/reftable/reftable-stack.h
+++ b/reftable/reftable-stack.h
@@ -66,6 +66,24 @@ int reftable_stack_add(struct reftable_stack *st,
 					  void *write_arg),
 		       void *write_arg);
 
+struct reftable_iterator;
+
+/*
+ * Initialize an iterator for the merged tables contained in the stack that can
+ * be used to iterate through refs. The iterator is valid until the next reload
+ * or write.
+ */
+void reftable_stack_init_ref_iterator(struct reftable_stack *st,
+				      struct reftable_iterator *it);
+
+/*
+ * Initialize an iterator for the merged tables contained in the stack that can
+ * be used to iterate through logs. The iterator is valid until the next reload
+ * or write.
+ */
+void reftable_stack_init_log_iterator(struct reftable_stack *st,
+				      struct reftable_iterator *it);
+
 /* returns the merged_table for seeking. This table is valid until the
  * next write or reload, and should not be closed or deleted.
  */
diff --git a/reftable/stack.c b/reftable/stack.c
index a59ebe038d..03f95935e1 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -10,6 +10,7 @@ license that can be found in the LICENSE file or at
 
 #include "../write-or-die.h"
 #include "system.h"
+#include "constants.h"
 #include "merged.h"
 #include "reader.h"
 #include "reftable-error.h"
@@ -130,6 +131,20 @@ int read_lines(const char *filename, char ***namesp)
 	return err;
 }
 
+void reftable_stack_init_ref_iterator(struct reftable_stack *st,
+				      struct reftable_iterator *it)
+{
+	merged_table_init_iter(reftable_stack_merged_table(st),
+			       it, BLOCK_TYPE_REF);
+}
+
+void reftable_stack_init_log_iterator(struct reftable_stack *st,
+				      struct reftable_iterator *it)
+{
+	merged_table_init_iter(reftable_stack_merged_table(st),
+			       it, BLOCK_TYPE_LOG);
+}
+
 struct reftable_merged_table *
 reftable_stack_merged_table(struct reftable_stack *st)
 {
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 13/13] reftable/merged: adapt interface to allow reuse of iterators
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
                   ` (11 preceding siblings ...)
  2024-05-08 11:04 ` [PATCH 12/13] reftable/stack: provide convenience functions to create iterators Patrick Steinhardt
@ 2024-05-08 11:04 ` Patrick Steinhardt
  2024-05-08 23:42 ` [PATCH 00/13] reftable: prepare for re-seekable iterators Junio C Hamano
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-08 11:04 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 6937 bytes --]

Refactor the interfaces exposed by `struct reftable_merged_table` and
`struct merged_iter` such that they support iterator reuse. This is done
by separating initialization of the iterator and seeking on it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c          | 35 -----------------------------------
 reftable/merged_test.c     | 19 +++++++++++++------
 reftable/reftable-merged.h | 15 ---------------
 reftable/stack.c           | 14 +++++++++-----
 4 files changed, 22 insertions(+), 61 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index d127f99360..0da9dba265 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -262,41 +262,6 @@ void merged_table_init_iter(struct reftable_merged_table *mt,
 	iterator_from_merged_iter(it, mi);
 }
 
-int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
-				   struct reftable_iterator *it,
-				   const char *name)
-{
-	struct reftable_record rec = {
-		.type = BLOCK_TYPE_REF,
-		.u.ref = {
-			.refname = (char *)name,
-		},
-	};
-	merged_table_init_iter(mt, it, BLOCK_TYPE_REF);
-	return iterator_seek(it, &rec);
-}
-
-int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
-				      struct reftable_iterator *it,
-				      const char *name, uint64_t update_index)
-{
-	struct reftable_record rec = { .type = BLOCK_TYPE_LOG,
-				       .u.log = {
-					       .refname = (char *)name,
-					       .update_index = update_index,
-				       } };
-	merged_table_init_iter(mt, it, BLOCK_TYPE_LOG);
-	return iterator_seek(it, &rec);
-}
-
-int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
-				   struct reftable_iterator *it,
-				   const char *name)
-{
-	uint64_t max = ~((uint64_t)0);
-	return reftable_merged_table_seek_log_at(mt, it, name, max);
-}
-
 uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *mt)
 {
 	return mt->hash_id;
diff --git a/reftable/merged_test.c b/reftable/merged_test.c
index 530fc82d1c..33a17efcde 100644
--- a/reftable/merged_test.c
+++ b/reftable/merged_test.c
@@ -12,6 +12,7 @@ license that can be found in the LICENSE file or at
 
 #include "basics.h"
 #include "blocksource.h"
+#include "constants.h"
 #include "reader.h"
 #include "record.h"
 #include "test_framework.h"
@@ -145,7 +146,10 @@ static void test_merged_between(void)
 	int i;
 	struct reftable_ref_record ref = { NULL };
 	struct reftable_iterator it = { NULL };
-	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	int err;
+
+	merged_table_init_iter(mt, &it, BLOCK_TYPE_REF);
+	err = reftable_iterator_seek_ref(&it, "a");
 	EXPECT_ERR(err);
 
 	err = reftable_iterator_next_ref(&it, &ref);
@@ -217,14 +221,15 @@ static void test_merged(void)
 	struct reftable_reader **readers = NULL;
 	struct reftable_merged_table *mt =
 		merged_table_from_records(refs, &bs, &readers, sizes, bufs, 3);
-
 	struct reftable_iterator it = { NULL };
-	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	int err;
 	struct reftable_ref_record *out = NULL;
 	size_t len = 0;
 	size_t cap = 0;
 	int i = 0;
 
+	merged_table_init_iter(mt, &it, BLOCK_TYPE_REF);
+	err = reftable_iterator_seek_ref(&it, "a");
 	EXPECT_ERR(err);
 	EXPECT(reftable_merged_table_hash_id(mt) == GIT_SHA1_FORMAT_ID);
 	EXPECT(reftable_merged_table_min_update_index(mt) == 1);
@@ -348,14 +353,15 @@ static void test_merged_logs(void)
 	struct reftable_reader **readers = NULL;
 	struct reftable_merged_table *mt = merged_table_from_log_records(
 		logs, &bs, &readers, sizes, bufs, 3);
-
 	struct reftable_iterator it = { NULL };
-	int err = reftable_merged_table_seek_log(mt, &it, "a");
+	int err;
 	struct reftable_log_record *out = NULL;
 	size_t len = 0;
 	size_t cap = 0;
 	int i = 0;
 
+	merged_table_init_iter(mt, &it, BLOCK_TYPE_LOG);
+	err = reftable_iterator_seek_log(&it, "a");
 	EXPECT_ERR(err);
 	EXPECT(reftable_merged_table_hash_id(mt) == GIT_SHA1_FORMAT_ID);
 	EXPECT(reftable_merged_table_min_update_index(mt) == 1);
@@ -377,7 +383,8 @@ static void test_merged_logs(void)
 						 GIT_SHA1_RAWSZ));
 	}
 
-	err = reftable_merged_table_seek_log_at(mt, &it, "a", 2);
+	merged_table_init_iter(mt, &it, BLOCK_TYPE_LOG);
+	err = reftable_iterator_seek_log_at(&it, "a", 2);
 	EXPECT_ERR(err);
 	reftable_log_record_release(&out[0]);
 	err = reftable_iterator_next_log(&it, &out[0]);
diff --git a/reftable/reftable-merged.h b/reftable/reftable-merged.h
index c91a2d83a2..14d5fc9f05 100644
--- a/reftable/reftable-merged.h
+++ b/reftable/reftable-merged.h
@@ -36,21 +36,6 @@ int reftable_new_merged_table(struct reftable_merged_table **dest,
 			      struct reftable_table *stack, size_t n,
 			      uint32_t hash_id);
 
-/* returns an iterator positioned just before 'name' */
-int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
-				   struct reftable_iterator *it,
-				   const char *name);
-
-/* returns an iterator for log entry, at given update_index */
-int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
-				      struct reftable_iterator *it,
-				      const char *name, uint64_t update_index);
-
-/* like reftable_merged_table_seek_log_at but look for the newest entry. */
-int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
-				   struct reftable_iterator *it,
-				   const char *name);
-
 /* returns the max update_index covered by this merged table. */
 uint64_t
 reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
diff --git a/reftable/stack.c b/reftable/stack.c
index 03f95935e1..7af4c3fd66 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -925,7 +925,8 @@ static int stack_write_compact(struct reftable_stack *st,
 		goto done;
 	}
 
-	err = reftable_merged_table_seek_ref(mt, &it, "");
+	merged_table_init_iter(mt, &it, BLOCK_TYPE_REF);
+	err = reftable_iterator_seek_ref(&it, "");
 	if (err < 0)
 		goto done;
 
@@ -949,7 +950,8 @@ static int stack_write_compact(struct reftable_stack *st,
 	}
 	reftable_iterator_destroy(&it);
 
-	err = reftable_merged_table_seek_log(mt, &it, "");
+	merged_table_init_iter(mt, &it, BLOCK_TYPE_LOG);
+	err = reftable_iterator_seek_log(&it, "");
 	if (err < 0)
 		goto done;
 
@@ -1340,9 +1342,11 @@ int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
 int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
 			    struct reftable_log_record *log)
 {
-	struct reftable_iterator it = { NULL };
-	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
-	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	struct reftable_iterator it = {0};
+	int err;
+
+	reftable_stack_init_log_iterator(st, &it);
+	err = reftable_iterator_seek_log(&it, refname);
 	if (err)
 		goto done;
 
-- 
2.45.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/13] reftable: prepare for re-seekable iterators
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
                   ` (12 preceding siblings ...)
  2024-05-08 11:04 ` [PATCH 13/13] reftable/merged: adapt interface to allow reuse of iterators Patrick Steinhardt
@ 2024-05-08 23:42 ` Junio C Hamano
  2024-05-09  0:16   ` Junio C Hamano
  2024-05-10  7:48   ` Patrick Steinhardt
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
  14 siblings, 2 replies; 50+ messages in thread
From: Junio C Hamano @ 2024-05-08 23:42 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> To address this inefficiency, the patch series at hand refactors the
> reftable library such that creation of iterators and seeking on an
> iterator are separate steps. This refactoring prepares us for reusing
> iterators to perform multiple seeks, which in turn will allow us to
> reuse internal data structures for subsequent seeks.

;-)

> Note: this series does not yet go all the way to re-seekable iterators,
> and there are no users yet. The patch series is complex enough as-is
> already, so I decided to defer that to the next iteration. Thus, the
> whole refactoring here should essentially be a large no-op that prepares
> the infrastructure for re-seekable iterators.
>
> The series depends on pks/reftable-write-optim at fa74f32291
> (reftable/block: reuse compressed array, 2024-04-08).

There is another topic on reftable to make write options tweakable,
whose addition of reftable/dump and reader_print_blocks interface
needs to be adjusted to this change, I think.



 reftable/reader.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git c/reftable/reader.c w/reftable/reader.c
index 2ea830bdb6..fd516e01db 100644
--- c/reftable/reader.c
+++ w/reftable/reader.c
@@ -841,7 +841,7 @@ int reftable_reader_print_blocks(const char *tablename)
 		},
 	};
 	struct reftable_block_source src = { 0 };
-	struct table_iter ti = TABLE_ITER_INIT;
+	struct table_iter *ti;
 	struct reftable_reader *r = NULL;
 	size_t i;
 	int err;
@@ -854,11 +854,14 @@ int reftable_reader_print_blocks(const char *tablename)
 	if (err < 0)
 		goto done;
 
+	REFTABLE_ALLOC_ARRAY(ti, 1);
+	table_iter_init(ti, r);
+
 	printf("header:\n");
 	printf("  block_size: %d\n", r->block_size);
 
 	for (i = 0; i < ARRAY_SIZE(sections); i++) {
-		err = reader_start(r, &ti, sections[i].type, 0);
+		err = table_iter_seek_start(ti, sections[i].type, 0);
 		if (err < 0)
 			goto done;
 		if (err > 0)
@@ -867,10 +870,10 @@ int reftable_reader_print_blocks(const char *tablename)
 		printf("%s:\n", sections[i].name);
 
 		while (1) {
-			printf("  - length: %u\n", ti.br.block_len);
-			printf("    restarts: %u\n", ti.br.restart_count);
+			printf("  - length: %u\n", ti->br.block_len);
+			printf("    restarts: %u\n", ti->br.restart_count);
 
-			err = table_iter_next_block(&ti);
+			err = table_iter_next_block(ti);
 			if (err < 0)
 				goto done;
 			if (err > 0)
@@ -880,6 +883,6 @@ int reftable_reader_print_blocks(const char *tablename)
 
 done:
 	reftable_reader_free(r);
-	table_iter_close(&ti);
+	table_iter_close(ti);
 	return err;
 }

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/13] reftable: prepare for re-seekable iterators
  2024-05-08 23:42 ` [PATCH 00/13] reftable: prepare for re-seekable iterators Junio C Hamano
@ 2024-05-09  0:16   ` Junio C Hamano
  2024-05-10  7:48   ` Patrick Steinhardt
  1 sibling, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2024-05-09  0:16 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> diff --git c/reftable/reader.c w/reftable/reader.c
> index 2ea830bdb6..fd516e01db 100644
> --- c/reftable/reader.c
> +++ w/reftable/reader.c
> @@ -841,7 +841,7 @@ int reftable_reader_print_blocks(const char *tablename)
>  		},
>  	};
>  	struct reftable_block_source src = { 0 };
> -	struct table_iter ti = TABLE_ITER_INIT;
> +	struct table_iter *ti;
>  	struct reftable_reader *r = NULL;
>  	size_t i;
>  	int err;

... an unusually early error could skip everything here ...

> @@ -880,6 +883,6 @@ int reftable_reader_print_blocks(const char *tablename)
>  
>  done:
>  	reftable_reader_free(r);
> -	table_iter_close(&ti);
> +	table_iter_close(ti);

... and cause this to break.  In the version of the merge-fix I
used, *ti is initialized to NULL and the _close() is called only
when ti is not NULL.

>  	return err;
>  }

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/13] reftable: prepare for re-seekable iterators
  2024-05-08 23:42 ` [PATCH 00/13] reftable: prepare for re-seekable iterators Junio C Hamano
  2024-05-09  0:16   ` Junio C Hamano
@ 2024-05-10  7:48   ` Patrick Steinhardt
  2024-05-10 15:40     ` Junio C Hamano
  1 sibling, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-10  7:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 3234 bytes --]

On Wed, May 08, 2024 at 04:42:11PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > To address this inefficiency, the patch series at hand refactors the
> > reftable library such that creation of iterators and seeking on an
> > iterator are separate steps. This refactoring prepares us for reusing
> > iterators to perform multiple seeks, which in turn will allow us to
> > reuse internal data structures for subsequent seeks.
> 
> ;-)
> 
> > Note: this series does not yet go all the way to re-seekable iterators,
> > and there are no users yet. The patch series is complex enough as-is
> > already, so I decided to defer that to the next iteration. Thus, the
> > whole refactoring here should essentially be a large no-op that prepares
> > the infrastructure for re-seekable iterators.
> >
> > The series depends on pks/reftable-write-optim at fa74f32291
> > (reftable/block: reuse compressed array, 2024-04-08).
> 
> There is another topic on reftable to make write options tweakable,
> whose addition of reftable/dump and reader_print_blocks interface
> needs to be adjusted to this change, I think.

True, I forgot to double check that one. Your proposed resolution isn't
quite correct as we now leak the `ti` pointer -- `table_iter_close()`
only closes the iterator and releases its underlying resources, but does
not free the iterator itself.

The below diff would be needed on top of what you currently have in
`seen`. In any case though, I can also resend this topic with
ps/reftable-write-options pulled in as a dependency. Please let me know
your preference.

Patrick

-- >8 --

diff --git a/reftable/reader.c b/reftable/reader.c
index 2d9630b7c2..83f6772e5d 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -841,8 +841,8 @@ int reftable_reader_print_blocks(const char *tablename)
 		},
 	};
 	struct reftable_block_source src = { 0 };
-	struct table_iter *ti = NULL;
 	struct reftable_reader *r = NULL;
+	struct table_iter ti = {0};
 	size_t i;
 	int err;
 
@@ -854,14 +854,13 @@ int reftable_reader_print_blocks(const char *tablename)
 	if (err < 0)
 		goto done;
 
-	REFTABLE_ALLOC_ARRAY(ti, 1);
-	table_iter_init(ti, r);
+	table_iter_init(&ti, r);
 
 	printf("header:\n");
 	printf("  block_size: %d\n", r->block_size);
 
 	for (i = 0; i < ARRAY_SIZE(sections); i++) {
-		err = table_iter_seek_start(ti, sections[i].type, 0);
+		err = table_iter_seek_start(&ti, sections[i].type, 0);
 		if (err < 0)
 			goto done;
 		if (err > 0)
@@ -870,10 +869,10 @@ int reftable_reader_print_blocks(const char *tablename)
 		printf("%s:\n", sections[i].name);
 
 		while (1) {
-			printf("  - length: %u\n", ti->br.block_len);
-			printf("    restarts: %u\n", ti->br.restart_count);
+			printf("  - length: %u\n", ti.br.block_len);
+			printf("    restarts: %u\n", ti.br.restart_count);
 
-			err = table_iter_next_block(ti);
+			err = table_iter_next_block(&ti);
 			if (err < 0)
 				goto done;
 			if (err > 0)
@@ -883,7 +882,6 @@ int reftable_reader_print_blocks(const char *tablename)
 
 done:
 	reftable_reader_free(r);
-	if (ti)
-		table_iter_close(ti);
+	table_iter_close(&ti);
 	return err;
 }

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/13] reftable: prepare for re-seekable iterators
  2024-05-10  7:48   ` Patrick Steinhardt
@ 2024-05-10 15:40     ` Junio C Hamano
  2024-05-10 16:13       ` Patrick Steinhardt
  0 siblings, 1 reply; 50+ messages in thread
From: Junio C Hamano @ 2024-05-10 15:40 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> The below diff would be needed on top of what you currently have in
> `seen`. In any case though, I can also resend this topic with
> ps/reftable-write-options pulled in as a dependency. Please let me know
> your preference.

Is it "needed", in the sense that "without the fix what you posted
is broken in such and such ways", or is it "I think it is niceR to
have it on stack because this one instance does not have to be on
heap"?  To me, they look equivalent and I have no problems with the
"nicer" variant, but your "needed" makes me wonder if I am missing
some correctness invariants I am violating without realizing.

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/13] reftable: prepare for re-seekable iterators
  2024-05-10 15:40     ` Junio C Hamano
@ 2024-05-10 16:13       ` Patrick Steinhardt
  2024-05-10 17:17         ` Junio C Hamano
  0 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-10 16:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 921 bytes --]

On Fri, May 10, 2024 at 08:40:56AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > The below diff would be needed on top of what you currently have in
> > `seen`. In any case though, I can also resend this topic with
> > ps/reftable-write-options pulled in as a dependency. Please let me know
> > your preference.
> 
> Is it "needed", in the sense that "without the fix what you posted
> is broken in such and such ways", or is it "I think it is niceR to
> have it on stack because this one instance does not have to be on
> heap"?  To me, they look equivalent and I have no problems with the
> "nicer" variant, but your "needed" makes me wonder if I am missing
> some correctness invariants I am violating without realizing.

It's needed in the sense that your version leaks memory -- the `ti`
pointer is never free'd. Other than that they are equivalent indeed.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/13] reftable: prepare for re-seekable iterators
  2024-05-10 16:13       ` Patrick Steinhardt
@ 2024-05-10 17:17         ` Junio C Hamano
  0 siblings, 0 replies; 50+ messages in thread
From: Junio C Hamano @ 2024-05-10 17:17 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

Patrick Steinhardt <ps@pks.im> writes:

> It's needed in the sense that your version leaks memory -- the `ti`
> pointer is never free'd. Other than that they are equivalent indeed.

Ah, OK, I somehow misread that _close() will free the resource, but
it only frees the resources held by the shell and not the shell
itself.

Thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 07/13] reftable/merged: split up initialization and seeking of records
  2024-05-08 11:04 ` [PATCH 07/13] reftable/merged: split up initialization and seeking of records Patrick Steinhardt
@ 2024-05-10 19:18   ` Justin Tobler
  2024-05-13  8:36     ` Patrick Steinhardt
  0 siblings, 1 reply; 50+ messages in thread
From: Justin Tobler @ 2024-05-10 19:18 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 24/05/08 01:04PM, Patrick Steinhardt wrote:
> To initialize a `struct merged_iter`, we need to seek all subiterators
> to the wanted record and then add their results to the priority queue
> used to sort the records. This logic is split up across two functions,
> `merged_table_seek_record()` and `merged_table_iter()`. The scope of

Did we mean `merged_iter_init` instead of `merged_table_iter()` here?

> these functions is somewhat weird though, where `merged_table_iter()` is
> only responsible for adding the records of the subiterators to the
> priority queue.
> 
> Clarify the scope of those functions such that `merged_table_iter()` is
> only responsible for initializing the iterator's structure. Performing
> the subiterator seeks are now part of `merged_table_seek_record()`.
> 
> This step is required to move seeking of records into the generic
> `struct reftable_iterator` infrastructure.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
...
> @@ -246,32 +230,33 @@ static int merged_table_seek_record(struct reftable_merged_table *mt,
>  				    struct reftable_iterator *it,
>  				    struct reftable_record *rec)
>  {
> -	struct merged_iter merged = {
> -		.typ = reftable_record_type(rec),
> -		.hash_id = mt->hash_id,
> -		.suppress_deletions = mt->suppress_deletions,
> -		.advance_index = -1,
> -	};
> -	struct merged_iter *p;
> +	struct merged_iter merged, *p;
>  	int err;
>  
> -	REFTABLE_CALLOC_ARRAY(merged.subiters, mt->stack_len);
> +	merged_iter_init(&merged, mt);
> +
>  	for (size_t i = 0; i < mt->stack_len; i++) {
> +		reftable_record_init(&merged.subiters[merged.stack_len].rec,
> +				     reftable_record_type(rec));

I find it somewhat confusing how we increment the subiterator index
here. I assume this is because when a record is not found we don't want
to add it to the index. Might be nice to add a comment here explaining
this.

Edit: Looks like you address this in the next commit so I guess we are
good :)

> +
>  		err = reftable_table_seek_record(&mt->stack[i],
>  						 &merged.subiters[merged.stack_len].iter, rec);
>  		if (err < 0)
>  			goto out;
> -		if (!err)
> -			merged.stack_len++;
> -	}
> +		if (err > 0)
> +			continue;
>  
> -	err = merged_iter_init(&merged);
> -	if (err < 0)
> -		goto out;
> +		err = merged_iter_advance_subiter(&merged, merged.stack_len);
> +		if (err < 0)
> +			goto out;
> +
> +		merged.stack_len++;
> +	}
>  
> -	p = reftable_malloc(sizeof(struct merged_iter));
> +	p = reftable_malloc(sizeof(*p));
>  	*p = merged;
>  	iterator_from_merged_iter(it, p);
> +	err = 0;
>  
>  out:
>  	if (err < 0)
> -- 
> 2.45.0
> 



^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 08/13] reftable/merged: simplify indices for subiterators
  2024-05-08 11:04 ` [PATCH 08/13] reftable/merged: simplify indices for subiterators Patrick Steinhardt
@ 2024-05-10 19:25   ` Justin Tobler
  2024-05-13  8:36     ` Patrick Steinhardt
  0 siblings, 1 reply; 50+ messages in thread
From: Justin Tobler @ 2024-05-10 19:25 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 24/05/08 01:04PM, Patrick Steinhardt wrote:
> When seeking on a merged table, we perform the seek for each of the
> subiterators. If the subiterator hasa the desired record we add it to

s/hasa/has/

> the priority queue, otherwise we skip it and don't add it to the stack
> of subiterators hosted by the merged table.
> 
> The consequence of this is that the index of the subiterator in the
> merged table does not necessarily correspond to the index of it in the
> merged iterator. Next to being potentially confusing, it also means that
> we won't easily be able to re-seek the merged iterator because we have
> no clear connection between both of the data structures.

Ah, I also found this a bit confusing. I think this is a good change.
> 
> Refactor the code so that the index stays the same in both structures.

Was there any advantage to not adding subiterators to the stack
originally? It looks like it adding them doesn't affect anything.

-Justin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 09/13] reftable/generic: move seeking of records into the iterator
  2024-05-08 11:04 ` [PATCH 09/13] reftable/generic: move seeking of records into the iterator Patrick Steinhardt
@ 2024-05-10 21:44   ` Justin Tobler
  2024-05-13  8:36     ` Patrick Steinhardt
  0 siblings, 1 reply; 50+ messages in thread
From: Justin Tobler @ 2024-05-10 21:44 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 24/05/08 01:04PM, Patrick Steinhardt wrote:
> Reftable iterators are created by seeking on the parent structure of a
> corresponding record. For example, to create an iterator for the merged
> table you would call `reftable_merged_table_seek_ref()`. Most notably,
> it is not posible to create an iterator and then seek it afterwards.
> 
> While this may be a bit easier to reason about, it comes with two
> significant downsides. The first downside is that the logic to find
> records is split up between the parent data structure and the iterator
> itself. Conceptually, it is more straight forward if all that logic was
> contained in a single place, which should be the iterator.
> 
> The second and more significant downside is that it is impossible to
> reuse iterators for multiple seeks. Whenever you want to look up a
> record, you need to re-create the whole infrastructure again, which is
> quite a waste of time. Furthermore, it is impossible to for example
> optimize seeks, for example when seeking the same record multiple times.

The last setence could use some rewording.

"Furthermore, it is impossible to optimize seeks, such as when seeking
the same record multiple times."

> 
> To address this, we essentially split up the concerns properly such that
> the parent data structure is responsible for setting up the iterator via
> a new `init_iter()` callback, whereas the iterator handles seeks via a
> new `seek()` callback. This will eventually allow us to call `seek()` on
> the iterator multiple times, where every iterator can potentially
> optimize for certain cases.
> 
> Note that at this point in time we are not yet ready to reuse the
> iterators. This will be left for a future patch series.
> 
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  reftable/generic.c |  47 +++++++++++++----
>  reftable/generic.h |   9 +++-
>  reftable/iter.c    |  15 ++++++
>  reftable/merged.c  |  97 +++++++++++++++++-----------------
>  reftable/reader.c  | 126 +++++++++++++++++++++++++--------------------
>  5 files changed, 177 insertions(+), 117 deletions(-)
> 
> diff --git a/reftable/generic.c b/reftable/generic.c
> index b9f1c7c18a..1cf68fe124 100644
> --- a/reftable/generic.c
> +++ b/reftable/generic.c
> @@ -12,25 +12,39 @@ license that can be found in the LICENSE file or at
>  #include "reftable-iterator.h"
>  #include "reftable-generic.h"
>  
> +void table_init_iter(struct reftable_table *tab,

The following table related functions are prefixed with `reftable_`. Do
we want to do the same here?

> +		     struct reftable_iterator *it,
> +		     uint8_t typ)
> +{
> +
> +	tab->ops->init_iter(tab->table_arg, it, typ);
> +}
> +
>  int reftable_table_seek_ref(struct reftable_table *tab,
>  			    struct reftable_iterator *it, const char *name)
>  {
> -	struct reftable_record rec = { .type = BLOCK_TYPE_REF,
> -				       .u.ref = {
> -					       .refname = (char *)name,
> -				       } };
> -	return tab->ops->seek_record(tab->table_arg, it, &rec);
> +	struct reftable_record want = {
> +		.type = BLOCK_TYPE_REF,
> +		.u.ref = {
> +			.refname = (char *)name,
> +		},
> +	};
> +	table_init_iter(tab, it, BLOCK_TYPE_REF);
> +	return it->ops->seek(it->iter_arg, &want);
>  }
>  
>  int reftable_table_seek_log(struct reftable_table *tab,
>  			    struct reftable_iterator *it, const char *name)
>  {
> -	struct reftable_record rec = { .type = BLOCK_TYPE_LOG,
> -				       .u.log = {
> -					       .refname = (char *)name,
> -					       .update_index = ~((uint64_t)0),
> -				       } };
> -	return tab->ops->seek_record(tab->table_arg, it, &rec);
> +	struct reftable_record want = {
> +		.type = BLOCK_TYPE_LOG,
> +		.u.log = {
> +			.refname = (char *)name,
> +			.update_index = ~((uint64_t)0),
> +		},
> +	};
> +	table_init_iter(tab, it, BLOCK_TYPE_LOG);
> +	return it->ops->seek(it->iter_arg, &want);
>  }
>  
>  int reftable_table_read_ref(struct reftable_table *tab, const char *name,
...
> @@ -23,6 +23,13 @@ static void filtering_ref_iterator_close(void *iter_arg)
>  	reftable_iterator_destroy(&fri->it);
>  }
>  
> +static int filtering_ref_iterator_seek(void *iter_arg,
> +				       struct reftable_record *want)
> +{
> +	struct filtering_ref_iterator *fri = iter_arg;
> +	return iterator_seek(&fri->it, want);
> +}

I've found the `filtering_ref_iterator_seek()` here to be a little
confusing. At first, I assumed that the `filtering_ref_iterator` would
have referenced `filtering_ref_iterator_vtable` thus resulting in a
cycle, but on closer inspection this does not seem to be the case and is
in face always set to some other iterator operation.

Am I understanding this correctly?

-Justin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/13] reftable/reader: adapt interface to allow reuse of iterators
  2024-05-08 11:04 ` [PATCH 11/13] reftable/reader: " Patrick Steinhardt
@ 2024-05-10 21:48   ` Justin Tobler
  2024-05-13  8:36     ` Patrick Steinhardt
  0 siblings, 1 reply; 50+ messages in thread
From: Justin Tobler @ 2024-05-10 21:48 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git

On 24/05/08 01:04PM, Patrick Steinhardt wrote:

> +/* Initialize a reftable iterator for reading refs. */
> +void reftable_reader_init_ref_iterator(struct reftable_reader *r,
> +				       struct reftable_iterator *it);
> +
> +/* Initialize a reftable iterator for reading refs. */

I think you meant to say "reading logs" here.

> +void reftable_reader_init_log_iterator(struct reftable_reader *r,
> +				       struct reftable_iterator *it);
>  

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 07/13] reftable/merged: split up initialization and seeking of records
  2024-05-10 19:18   ` Justin Tobler
@ 2024-05-13  8:36     ` Patrick Steinhardt
  0 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:36 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1711 bytes --]

On Fri, May 10, 2024 at 02:18:16PM -0500, Justin Tobler wrote:
> On 24/05/08 01:04PM, Patrick Steinhardt wrote:
> > To initialize a `struct merged_iter`, we need to seek all subiterators
> > to the wanted record and then add their results to the priority queue
> > used to sort the records. This logic is split up across two functions,
> > `merged_table_seek_record()` and `merged_table_iter()`. The scope of
> 
> Did we mean `merged_iter_init` instead of `merged_table_iter()` here?

Indeed, good catch.

> > @@ -246,32 +230,33 @@ static int merged_table_seek_record(struct reftable_merged_table *mt,
> >  				    struct reftable_iterator *it,
> >  				    struct reftable_record *rec)
> >  {
> > -	struct merged_iter merged = {
> > -		.typ = reftable_record_type(rec),
> > -		.hash_id = mt->hash_id,
> > -		.suppress_deletions = mt->suppress_deletions,
> > -		.advance_index = -1,
> > -	};
> > -	struct merged_iter *p;
> > +	struct merged_iter merged, *p;
> >  	int err;
> >  
> > -	REFTABLE_CALLOC_ARRAY(merged.subiters, mt->stack_len);
> > +	merged_iter_init(&merged, mt);
> > +
> >  	for (size_t i = 0; i < mt->stack_len; i++) {
> > +		reftable_record_init(&merged.subiters[merged.stack_len].rec,
> > +				     reftable_record_type(rec));
> 
> I find it somewhat confusing how we increment the subiterator index
> here. I assume this is because when a record is not found we don't want
> to add it to the index. Might be nice to add a comment here explaining
> this.
> 
> Edit: Looks like you address this in the next commit so I guess we are
> good :)

Seconded. And yes, as you noticed, this is one of the reasons why I
change this in the next patch.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 08/13] reftable/merged: simplify indices for subiterators
  2024-05-10 19:25   ` Justin Tobler
@ 2024-05-13  8:36     ` Patrick Steinhardt
  0 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:36 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1139 bytes --]

On Fri, May 10, 2024 at 02:25:09PM -0500, Justin Tobler wrote:
> On 24/05/08 01:04PM, Patrick Steinhardt wrote:
> > When seeking on a merged table, we perform the seek for each of the
> > subiterators. If the subiterator hasa the desired record we add it to
> 
> s/hasa/has/

Fixed.

> > the priority queue, otherwise we skip it and don't add it to the stack
> > of subiterators hosted by the merged table.
> > 
> > The consequence of this is that the index of the subiterator in the
> > merged table does not necessarily correspond to the index of it in the
> > merged iterator. Next to being potentially confusing, it also means that
> > we won't easily be able to re-seek the merged iterator because we have
> > no clear connection between both of the data structures.
> 
> Ah, I also found this a bit confusing. I think this is a good change.
> > 
> > Refactor the code so that the index stays the same in both structures.
> 
> Was there any advantage to not adding subiterators to the stack
> originally? It looks like it adding them doesn't affect anything.

Not to the best of my knowledge, no.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 09/13] reftable/generic: move seeking of records into the iterator
  2024-05-10 21:44   ` Justin Tobler
@ 2024-05-13  8:36     ` Patrick Steinhardt
  0 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:36 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 3114 bytes --]

On Fri, May 10, 2024 at 04:44:54PM -0500, Justin Tobler wrote:
> On 24/05/08 01:04PM, Patrick Steinhardt wrote:
> > Reftable iterators are created by seeking on the parent structure of a
> > corresponding record. For example, to create an iterator for the merged
> > table you would call `reftable_merged_table_seek_ref()`. Most notably,
> > it is not posible to create an iterator and then seek it afterwards.
> > 
> > While this may be a bit easier to reason about, it comes with two
> > significant downsides. The first downside is that the logic to find
> > records is split up between the parent data structure and the iterator
> > itself. Conceptually, it is more straight forward if all that logic was
> > contained in a single place, which should be the iterator.
> > 
> > The second and more significant downside is that it is impossible to
> > reuse iterators for multiple seeks. Whenever you want to look up a
> > record, you need to re-create the whole infrastructure again, which is
> > quite a waste of time. Furthermore, it is impossible to for example
> > optimize seeks, for example when seeking the same record multiple times.
> 
> The last setence could use some rewording.
> 
> "Furthermore, it is impossible to optimize seeks, such as when seeking
> the same record multiple times."

Done.

[snip]
> > diff --git a/reftable/generic.c b/reftable/generic.c
> > index b9f1c7c18a..1cf68fe124 100644
> > --- a/reftable/generic.c
> > +++ b/reftable/generic.c
> > @@ -12,25 +12,39 @@ license that can be found in the LICENSE file or at
> >  #include "reftable-iterator.h"
> >  #include "reftable-generic.h"
> >  
> > +void table_init_iter(struct reftable_table *tab,
> 
> The following table related functions are prefixed with `reftable_`. Do
> we want to do the same here?

Functions with the `reftable_` prefix are supposed to be public, whereas
functions without them are private. So this is intentionally missing the
prefix.

[snip]
> > @@ -23,6 +23,13 @@ static void filtering_ref_iterator_close(void *iter_arg)
> >  	reftable_iterator_destroy(&fri->it);
> >  }
> >  
> > +static int filtering_ref_iterator_seek(void *iter_arg,
> > +				       struct reftable_record *want)
> > +{
> > +	struct filtering_ref_iterator *fri = iter_arg;
> > +	return iterator_seek(&fri->it, want);
> > +}
> 
> I've found the `filtering_ref_iterator_seek()` here to be a little
> confusing. At first, I assumed that the `filtering_ref_iterator` would
> have referenced `filtering_ref_iterator_vtable` thus resulting in a
> cycle, but on closer inspection this does not seem to be the case and is
> in face always set to some other iterator operation.
> 
> Am I understanding this correctly?

Yes. The filtering ref iterator wraps a _different_ iterator, which is
`fri->it` in the above case, and only returns a subset of the records of
that wrapped iterator. So we eventually end up calling the callbacks of
the wrapped iterator, which are likely not a filtering ref iterator
themselves (even though that would in theory be possible).

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 11/13] reftable/reader: adapt interface to allow reuse of iterators
  2024-05-10 21:48   ` Justin Tobler
@ 2024-05-13  8:36     ` Patrick Steinhardt
  0 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:36 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 579 bytes --]

On Fri, May 10, 2024 at 04:48:45PM -0500, Justin Tobler wrote:
> On 24/05/08 01:04PM, Patrick Steinhardt wrote:
> 
> > +/* Initialize a reftable iterator for reading refs. */
> > +void reftable_reader_init_ref_iterator(struct reftable_reader *r,
> > +				       struct reftable_iterator *it);
> > +
> > +/* Initialize a reftable iterator for reading refs. */
> 
> I think you meant to say "reading logs" here.

Indeed, thanks!

Patrick

> > +void reftable_reader_init_log_iterator(struct reftable_reader *r,
> > +				       struct reftable_iterator *it);
> >  

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 00/13] reftable: prepare for re-seekable iterators
  2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
                   ` (13 preceding siblings ...)
  2024-05-08 23:42 ` [PATCH 00/13] reftable: prepare for re-seekable iterators Junio C Hamano
@ 2024-05-13  8:46 ` Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 01/13] reftable/block: use `size_t` to track restart point index Patrick Steinhardt
                     ` (14 more replies)
  14 siblings, 15 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:46 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 6572 bytes --]

Hi,

this is the second version of my patch series that prepares the reftable
iterators to become re-seekable. These refactorings will eventually
allow us to reuse data structures by iterators and optimize for certain
cases.

Changes compared to v1:

  - Various fixes to commit messages.

  - Fixed a copy & pasted comment to refer to logs instead of refs.

The series continues to build on top of ps/reftable-write-optim. There
is a merge conflict with the in-flight ps/reftable-write-options, but
given that this series has not yet been merged to next and because Junio
has already resolved the conflict in "seen" I decided to not pull it in
as an additional dependency.

Thanks!

Patrick

Patrick Steinhardt (13):
  reftable/block: use `size_t` to track restart point index
  reftable/reader: avoid copying index iterator
  reftable/reader: unify indexed and linear seeking
  reftable/reader: separate concerns of table iter and reftable reader
  reftable/reader: inline `reader_seek_internal()`
  reftable/reader: set up the reader when initializing table iterator
  reftable/merged: split up initialization and seeking of records
  reftable/merged: simplify indices for subiterators
  reftable/generic: move seeking of records into the iterator
  reftable/generic: adapt interface to allow reuse of iterators
  reftable/reader: adapt interface to allow reuse of iterators
  reftable/stack: provide convenience functions to create iterators
  reftable/merged: adapt interface to allow reuse of iterators

 refs/reftable-backend.c      |  48 ++++----
 reftable/block.c             |   4 +-
 reftable/generic.c           |  94 +++++++++++----
 reftable/generic.h           |   9 +-
 reftable/iter.c              |  23 +++-
 reftable/merged.c            | 148 ++++++++----------------
 reftable/merged.h            |   6 +
 reftable/merged_test.c       |  19 ++-
 reftable/reader.c            | 218 +++++++++++++++--------------------
 reftable/readwrite_test.c    |  35 ++++--
 reftable/reftable-generic.h  |   8 +-
 reftable/reftable-iterator.h |  21 ++++
 reftable/reftable-merged.h   |  15 ---
 reftable/reftable-reader.h   |  45 ++------
 reftable/reftable-stack.h    |  18 +++
 reftable/stack.c             |  29 ++++-
 16 files changed, 378 insertions(+), 362 deletions(-)

Range-diff against v1:
 1:  ca86a8b58d =  1:  ca86a8b58d reftable/block: use `size_t` to track restart point index
 2:  bdbebdbd36 =  2:  bdbebdbd36 reftable/reader: avoid copying index iterator
 3:  716863a580 =  3:  716863a580 reftable/reader: unify indexed and linear seeking
 4:  91db2f18c1 =  4:  91db2f18c1 reftable/reader: separate concerns of table iter and reftable reader
 5:  4d498ef342 =  5:  4d498ef342 reftable/reader: inline `reader_seek_internal()`
 6:  5a10a11584 =  6:  5a10a11584 reftable/reader: set up the reader when initializing table iterator
 7:  21b3e3ab5f !  7:  12c10fd366 reftable/merged: split up initialization and seeking of records
    @@ Commit message
         To initialize a `struct merged_iter`, we need to seek all subiterators
         to the wanted record and then add their results to the priority queue
         used to sort the records. This logic is split up across two functions,
    -    `merged_table_seek_record()` and `merged_table_iter()`. The scope of
    -    these functions is somewhat weird though, where `merged_table_iter()` is
    +    `merged_table_seek_record()` and `merged_iter_init()`. The scope of
    +    these functions is somewhat weird though, where `merged_iter_init()` is
         only responsible for adding the records of the subiterators to the
         priority queue.
     
    -    Clarify the scope of those functions such that `merged_table_iter()` is
    +    Clarify the scope of those functions such that `merged_iter_init()` is
         only responsible for initializing the iterator's structure. Performing
         the subiterator seeks are now part of `merged_table_seek_record()`.
     
 8:  f0f42cd56b !  8:  31bdfc1e34 reftable/merged: simplify indices for subiterators
    @@ Commit message
         reftable/merged: simplify indices for subiterators
     
         When seeking on a merged table, we perform the seek for each of the
    -    subiterators. If the subiterator hasa the desired record we add it to
    -    the priority queue, otherwise we skip it and don't add it to the stack
    -    of subiterators hosted by the merged table.
    +    subiterators. If the subiterator has the desired record we add it to the
    +    priority queue, otherwise we skip it and don't add it to the stack of
    +    subiterators hosted by the merged table.
     
         The consequence of this is that the index of the subiterator in the
         merged table does not necessarily correspond to the index of it in the
 9:  859b399e71 !  9:  3e91c3830e reftable/generic: move seeking of records into the iterator
    @@ Commit message
         The second and more significant downside is that it is impossible to
         reuse iterators for multiple seeks. Whenever you want to look up a
         record, you need to re-create the whole infrastructure again, which is
    -    quite a waste of time. Furthermore, it is impossible to for example
    -    optimize seeks, for example when seeking the same record multiple times.
    +    quite a waste of time. Furthermore, it is impossible to optimize seeks,
    +    such as when seeking the same record multiple times.
     
         To address this, we essentially split up the concerns properly such that
         the parent data structure is responsible for setting up the iterator via
10:  727b8fa432 = 10:  b0641dd800 reftable/generic: adapt interface to allow reuse of iterators
11:  f688f8f59a ! 11:  0745951650 reftable/reader: adapt interface to allow reuse of iterators
    @@ reftable/reftable-reader.h: struct reftable_table;
     +void reftable_reader_init_ref_iterator(struct reftable_reader *r,
     +				       struct reftable_iterator *it);
     +
    -+/* Initialize a reftable iterator for reading refs. */
    ++/* Initialize a reftable iterator for reading logs. */
     +void reftable_reader_init_log_iterator(struct reftable_reader *r,
     +				       struct reftable_iterator *it);
      
12:  68cc35919b = 12:  51480c4328 reftable/stack: provide convenience functions to create iterators
13:  be4da295c6 = 13:  2586e93c44 reftable/merged: adapt interface to allow reuse of iterators
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 01/13] reftable/block: use `size_t` to track restart point index
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
@ 2024-05-13  8:47   ` Patrick Steinhardt
  2024-05-21 13:34     ` Karthik Nayak
  2024-05-13  8:47   ` [PATCH v2 02/13] reftable/reader: avoid copying index iterator Patrick Steinhardt
                     ` (13 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1062 bytes --]

The function `block_reader_restart_offset()` gets the offset of the
`i`th restart point. `i` is a signed integer though, which is certainly
not the correct type to track indices like this. Furthermore, both
callers end up passing a `size_t`.

Refactor the code to use a `size_t` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/block.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/reftable/block.c b/reftable/block.c
index 5942cb4053..00030eee06 100644
--- a/reftable/block.c
+++ b/reftable/block.c
@@ -326,9 +326,9 @@ int block_reader_first_key(const struct block_reader *br, struct strbuf *key)
 	return 0;
 }
 
-static uint32_t block_reader_restart_offset(const struct block_reader *br, int i)
+static uint32_t block_reader_restart_offset(const struct block_reader *br, size_t idx)
 {
-	return get_be24(br->restart_bytes + 3 * i);
+	return get_be24(br->restart_bytes + 3 * idx);
 }
 
 void block_iter_seek_start(struct block_iter *it, const struct block_reader *br)
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 02/13] reftable/reader: avoid copying index iterator
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 01/13] reftable/block: use `size_t` to track restart point index Patrick Steinhardt
@ 2024-05-13  8:47   ` Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 03/13] reftable/reader: unify indexed and linear seeking Patrick Steinhardt
                     ` (12 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 3552 bytes --]

When doing an indexed seek we need to walk down the multi-level index
until we finally hit a record of the desired indexed type. This loop
performs a copy of the index iterator on every iteration, which is both
hard to understand and completely unnecessary.

Refactor the code so that we use a single iterator to walk down the
indices, only.

Note that while this should improve performance, the improvement is
negligible in all but the most unreasonable repositories. This is
because the effect is only really noticeable when we have to walk down
many levels of indices, which is not something that a repository would
typically have. So the motivation for this change is really only about
readability.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 38 ++++++++++++++------------------------
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index 481dff10d4..6bfadcad71 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -510,13 +510,11 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		.type = BLOCK_TYPE_INDEX,
 		.u.idx = { .last_key = STRBUF_INIT },
 	};
-	struct table_iter index_iter = TABLE_ITER_INIT;
-	struct table_iter empty = TABLE_ITER_INIT;
-	struct table_iter next = TABLE_ITER_INIT;
+	struct table_iter ti = TABLE_ITER_INIT, *malloced;
 	int err = 0;
 
 	reftable_record_key(rec, &want_index.u.idx.last_key);
-	err = reader_start(r, &index_iter, reftable_record_type(rec), 1);
+	err = reader_start(r, &ti, reftable_record_type(rec), 1);
 	if (err < 0)
 		goto done;
 
@@ -526,7 +524,7 @@ static int reader_seek_indexed(struct reftable_reader *r,
 	 * highest layer that identifies the relevant index block as well as
 	 * the record inside that block that corresponds to our wanted key.
 	 */
-	err = reader_seek_linear(&index_iter, &want_index);
+	err = reader_seek_linear(&ti, &want_index);
 	if (err < 0)
 		goto done;
 
@@ -552,44 +550,36 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		 * all levels of the index only to find out that the key does
 		 * not exist.
 		 */
-		err = table_iter_next(&index_iter, &index_result);
+		err = table_iter_next(&ti, &index_result);
 		if (err != 0)
 			goto done;
 
-		err = reader_table_iter_at(r, &next, index_result.u.idx.offset,
-					   0);
+		err = reader_table_iter_at(r, &ti, index_result.u.idx.offset, 0);
 		if (err != 0)
 			goto done;
 
-		err = block_iter_seek_key(&next.bi, &next.br, &want_index.u.idx.last_key);
+		err = block_iter_seek_key(&ti.bi, &ti.br, &want_index.u.idx.last_key);
 		if (err < 0)
 			goto done;
 
-		if (next.typ == reftable_record_type(rec)) {
+		if (ti.typ == reftable_record_type(rec)) {
 			err = 0;
 			break;
 		}
 
-		if (next.typ != BLOCK_TYPE_INDEX) {
+		if (ti.typ != BLOCK_TYPE_INDEX) {
 			err = REFTABLE_FORMAT_ERROR;
-			break;
+			goto done;
 		}
-
-		table_iter_close(&index_iter);
-		index_iter = next;
-		next = empty;
 	}
 
-	if (err == 0) {
-		struct table_iter *malloced = reftable_calloc(1, sizeof(*malloced));
-		*malloced = next;
-		next = empty;
-		iterator_from_table_iter(it, malloced);
-	}
+	REFTABLE_ALLOC_ARRAY(malloced, 1);
+	*malloced = ti;
+	iterator_from_table_iter(it, malloced);
 
 done:
-	table_iter_close(&next);
-	table_iter_close(&index_iter);
+	if (err)
+		table_iter_close(&ti);
 	reftable_record_release(&want_index);
 	reftable_record_release(&index_result);
 	return err;
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 03/13] reftable/reader: unify indexed and linear seeking
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 01/13] reftable/block: use `size_t` to track restart point index Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 02/13] reftable/reader: avoid copying index iterator Patrick Steinhardt
@ 2024-05-13  8:47   ` Patrick Steinhardt
  2024-05-21 14:41     ` Karthik Nayak
  2024-05-13  8:47   ` [PATCH v2 04/13] reftable/reader: separate concerns of table iter and reftable reader Patrick Steinhardt
                     ` (11 subsequent siblings)
  14 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 3959 bytes --]

In `reader_seek_internal()` we either end up doing an indexed seek when
there is one or a linear seek otherwise. These two code paths are
disjunct without a good reason, where the indexed seek will cause us to
exit early.

Refactor the two code paths such that it becomes possible to share a bit
more code between them.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 42 ++++++++++++++++--------------------------
 1 file changed, 16 insertions(+), 26 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index 6bfadcad71..cf7f126d8d 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -425,7 +425,7 @@ static int reader_seek_linear(struct table_iter *ti,
 	struct strbuf want_key = STRBUF_INIT;
 	struct strbuf got_key = STRBUF_INIT;
 	struct reftable_record rec;
-	int err = -1;
+	int err;
 
 	reftable_record_init(&rec, reftable_record_type(want));
 	reftable_record_key(want, &want_key);
@@ -499,8 +499,8 @@ static int reader_seek_linear(struct table_iter *ti,
 	return err;
 }
 
-static int reader_seek_indexed(struct reftable_reader *r,
-			       struct reftable_iterator *it,
+static int reader_seek_indexed(struct table_iter *ti,
+			       struct reftable_reader *r,
 			       struct reftable_record *rec)
 {
 	struct reftable_record want_index = {
@@ -510,13 +510,9 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		.type = BLOCK_TYPE_INDEX,
 		.u.idx = { .last_key = STRBUF_INIT },
 	};
-	struct table_iter ti = TABLE_ITER_INIT, *malloced;
-	int err = 0;
+	int err;
 
 	reftable_record_key(rec, &want_index.u.idx.last_key);
-	err = reader_start(r, &ti, reftable_record_type(rec), 1);
-	if (err < 0)
-		goto done;
 
 	/*
 	 * The index may consist of multiple levels, where each level may have
@@ -524,7 +520,7 @@ static int reader_seek_indexed(struct reftable_reader *r,
 	 * highest layer that identifies the relevant index block as well as
 	 * the record inside that block that corresponds to our wanted key.
 	 */
-	err = reader_seek_linear(&ti, &want_index);
+	err = reader_seek_linear(ti, &want_index);
 	if (err < 0)
 		goto done;
 
@@ -550,36 +546,30 @@ static int reader_seek_indexed(struct reftable_reader *r,
 		 * all levels of the index only to find out that the key does
 		 * not exist.
 		 */
-		err = table_iter_next(&ti, &index_result);
+		err = table_iter_next(ti, &index_result);
 		if (err != 0)
 			goto done;
 
-		err = reader_table_iter_at(r, &ti, index_result.u.idx.offset, 0);
+		err = reader_table_iter_at(r, ti, index_result.u.idx.offset, 0);
 		if (err != 0)
 			goto done;
 
-		err = block_iter_seek_key(&ti.bi, &ti.br, &want_index.u.idx.last_key);
+		err = block_iter_seek_key(&ti->bi, &ti->br, &want_index.u.idx.last_key);
 		if (err < 0)
 			goto done;
 
-		if (ti.typ == reftable_record_type(rec)) {
+		if (ti->typ == reftable_record_type(rec)) {
 			err = 0;
 			break;
 		}
 
-		if (ti.typ != BLOCK_TYPE_INDEX) {
+		if (ti->typ != BLOCK_TYPE_INDEX) {
 			err = REFTABLE_FORMAT_ERROR;
 			goto done;
 		}
 	}
 
-	REFTABLE_ALLOC_ARRAY(malloced, 1);
-	*malloced = ti;
-	iterator_from_table_iter(it, malloced);
-
 done:
-	if (err)
-		table_iter_close(&ti);
 	reftable_record_release(&want_index);
 	reftable_record_release(&index_result);
 	return err;
@@ -595,15 +585,15 @@ static int reader_seek_internal(struct reftable_reader *r,
 	struct table_iter ti = TABLE_ITER_INIT, *p;
 	int err;
 
-	if (idx > 0)
-		return reader_seek_indexed(r, it, rec);
-
-	err = reader_start(r, &ti, reftable_record_type(rec), 0);
+	err = reader_start(r, &ti, reftable_record_type(rec), !!idx);
 	if (err < 0)
 		goto out;
 
-	err = reader_seek_linear(&ti, rec);
-	if (err < 0)
+	if (idx)
+		err = reader_seek_indexed(&ti, r, rec);
+	else
+		err = reader_seek_linear(&ti, rec);
+	if (err)
 		goto out;
 
 	REFTABLE_ALLOC_ARRAY(p, 1);
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 04/13] reftable/reader: separate concerns of table iter and reftable reader
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2024-05-13  8:47   ` [PATCH v2 03/13] reftable/reader: unify indexed and linear seeking Patrick Steinhardt
@ 2024-05-13  8:47   ` Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 05/13] reftable/reader: inline `reader_seek_internal()` Patrick Steinhardt
                     ` (10 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 4209 bytes --]

In "reftable/reader.c" we implement two different interfaces:

  - The reftable reader contains the logic to read reftables.

  - The table iterator is used to iterate through a single reftable read
    by the reader.

The way those two types are used in the code is somewhat confusing
though because seeking inside a table is implemented as if it was part
of the reftable reader, even though it is ultimately more of a detail
implemented by the table iterator.

Make the boundary between those two types clearer by renaming functions
that seek records in a table such that they clearly belong to the table
iterator's logic.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index cf7f126d8d..b210753441 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -386,9 +386,8 @@ static void iterator_from_table_iter(struct reftable_iterator *it,
 	it->ops = &table_iter_vtable;
 }
 
-static int reader_table_iter_at(struct reftable_reader *r,
-				struct table_iter *ti, uint64_t off,
-				uint8_t typ)
+static int table_iter_seek_to(struct table_iter *ti, struct reftable_reader *r,
+			      uint64_t off, uint8_t typ)
 {
 	int err;
 
@@ -403,8 +402,8 @@ static int reader_table_iter_at(struct reftable_reader *r,
 	return 0;
 }
 
-static int reader_start(struct reftable_reader *r, struct table_iter *ti,
-			uint8_t typ, int index)
+static int table_iter_seek_start(struct table_iter *ti, struct reftable_reader *r,
+				 uint8_t typ, int index)
 {
 	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
 	uint64_t off = offs->offset;
@@ -416,11 +415,11 @@ static int reader_start(struct reftable_reader *r, struct table_iter *ti,
 		typ = BLOCK_TYPE_INDEX;
 	}
 
-	return reader_table_iter_at(r, ti, off, typ);
+	return table_iter_seek_to(ti, r, off, typ);
 }
 
-static int reader_seek_linear(struct table_iter *ti,
-			      struct reftable_record *want)
+static int table_iter_seek_linear(struct table_iter *ti,
+				  struct reftable_record *want)
 {
 	struct strbuf want_key = STRBUF_INIT;
 	struct strbuf got_key = STRBUF_INIT;
@@ -499,9 +498,8 @@ static int reader_seek_linear(struct table_iter *ti,
 	return err;
 }
 
-static int reader_seek_indexed(struct table_iter *ti,
-			       struct reftable_reader *r,
-			       struct reftable_record *rec)
+static int table_iter_seek_indexed(struct table_iter *ti,
+				   struct reftable_record *rec)
 {
 	struct reftable_record want_index = {
 		.type = BLOCK_TYPE_INDEX, .u.idx = { .last_key = STRBUF_INIT }
@@ -520,7 +518,7 @@ static int reader_seek_indexed(struct table_iter *ti,
 	 * highest layer that identifies the relevant index block as well as
 	 * the record inside that block that corresponds to our wanted key.
 	 */
-	err = reader_seek_linear(ti, &want_index);
+	err = table_iter_seek_linear(ti, &want_index);
 	if (err < 0)
 		goto done;
 
@@ -550,7 +548,7 @@ static int reader_seek_indexed(struct table_iter *ti,
 		if (err != 0)
 			goto done;
 
-		err = reader_table_iter_at(r, ti, index_result.u.idx.offset, 0);
+		err = table_iter_seek_to(ti, ti->r, index_result.u.idx.offset, 0);
 		if (err != 0)
 			goto done;
 
@@ -585,14 +583,14 @@ static int reader_seek_internal(struct reftable_reader *r,
 	struct table_iter ti = TABLE_ITER_INIT, *p;
 	int err;
 
-	err = reader_start(r, &ti, reftable_record_type(rec), !!idx);
+	err = table_iter_seek_start(&ti, r, reftable_record_type(rec), !!idx);
 	if (err < 0)
 		goto out;
 
 	if (idx)
-		err = reader_seek_indexed(&ti, r, rec);
+		err = table_iter_seek_indexed(&ti, rec);
 	else
-		err = reader_seek_linear(&ti, rec);
+		err = table_iter_seek_linear(&ti, rec);
 	if (err)
 		goto out;
 
@@ -742,7 +740,7 @@ static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
 	int err;
 
 	*ti = ti_empty;
-	err = reader_start(r, ti, BLOCK_TYPE_REF, 0);
+	err = table_iter_seek_start(ti, r, BLOCK_TYPE_REF, 0);
 	if (err < 0) {
 		reftable_free(ti);
 		return err;
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 05/13] reftable/reader: inline `reader_seek_internal()`
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2024-05-13  8:47   ` [PATCH v2 04/13] reftable/reader: separate concerns of table iter and reftable reader Patrick Steinhardt
@ 2024-05-13  8:47   ` Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 06/13] reftable/reader: set up the reader when initializing table iterator Patrick Steinhardt
                     ` (9 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2240 bytes --]

We have both `reader_seek()` and `reader_seek_internal()`, where the
former function only exists so that we can exit early in case the given
table has no records of the sought-after type.

Merge these two functions into one.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 34 ++++++++++++----------------------
 1 file changed, 12 insertions(+), 22 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index b210753441..c3541e2c43 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -573,21 +573,25 @@ static int table_iter_seek_indexed(struct table_iter *ti,
 	return err;
 }
 
-static int reader_seek_internal(struct reftable_reader *r,
-				struct reftable_iterator *it,
-				struct reftable_record *rec)
+static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
+		       struct reftable_record *rec)
 {
-	struct reftable_reader_offsets *offs =
-		reader_offsets_for(r, reftable_record_type(rec));
-	uint64_t idx = offs->index_offset;
+	uint8_t typ = reftable_record_type(rec);
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
 	struct table_iter ti = TABLE_ITER_INIT, *p;
 	int err;
 
-	err = table_iter_seek_start(&ti, r, reftable_record_type(rec), !!idx);
+	if (!offs->is_present) {
+		iterator_set_empty(it);
+		return 0;
+	}
+
+	err = table_iter_seek_start(&ti, r, reftable_record_type(rec),
+				    !!offs->index_offset);
 	if (err < 0)
 		goto out;
 
-	if (idx)
+	if (offs->index_offset)
 		err = table_iter_seek_indexed(&ti, rec);
 	else
 		err = table_iter_seek_linear(&ti, rec);
@@ -604,20 +608,6 @@ static int reader_seek_internal(struct reftable_reader *r,
 	return err;
 }
 
-static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
-		       struct reftable_record *rec)
-{
-	uint8_t typ = reftable_record_type(rec);
-
-	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
-	if (!offs->is_present) {
-		iterator_set_empty(it);
-		return 0;
-	}
-
-	return reader_seek_internal(r, it, rec);
-}
-
 int reftable_reader_seek_ref(struct reftable_reader *r,
 			     struct reftable_iterator *it, const char *name)
 {
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 06/13] reftable/reader: set up the reader when initializing table iterator
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2024-05-13  8:47   ` [PATCH v2 05/13] reftable/reader: inline `reader_seek_internal()` Patrick Steinhardt
@ 2024-05-13  8:47   ` Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 07/13] reftable/merged: split up initialization and seeking of records Patrick Steinhardt
                     ` (8 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 4239 bytes --]

All the seeking functions accept a `struct reftable_reader` as input
such that they can use the reader to look up the respective blocks.
Refactor the code to instead set up the reader as a member of `struct
table_iter` during initialization such that we don't have to pass the
reader on every single call.

This step is required to move seeking of records into the generic
`struct reftable_iterator` infrastructure.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index c3541e2c43..021608f638 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -224,8 +224,14 @@ struct table_iter {
 	struct block_iter bi;
 	int is_finished;
 };
-#define TABLE_ITER_INIT { \
-	.bi = BLOCK_ITER_INIT \
+
+static int table_iter_init(struct table_iter *ti, struct reftable_reader *r)
+{
+	struct block_iter bi = BLOCK_ITER_INIT;
+	memset(ti, 0, sizeof(*ti));
+	ti->r = r;
+	ti->bi = bi;
+	return 0;
 }
 
 static int table_iter_next_in_block(struct table_iter *ti,
@@ -386,26 +392,23 @@ static void iterator_from_table_iter(struct reftable_iterator *it,
 	it->ops = &table_iter_vtable;
 }
 
-static int table_iter_seek_to(struct table_iter *ti, struct reftable_reader *r,
-			      uint64_t off, uint8_t typ)
+static int table_iter_seek_to(struct table_iter *ti, uint64_t off, uint8_t typ)
 {
 	int err;
 
-	err = reader_init_block_reader(r, &ti->br, off, typ);
+	err = reader_init_block_reader(ti->r, &ti->br, off, typ);
 	if (err != 0)
 		return err;
 
-	ti->r = r;
 	ti->typ = block_reader_type(&ti->br);
 	ti->block_off = off;
 	block_iter_seek_start(&ti->bi, &ti->br);
 	return 0;
 }
 
-static int table_iter_seek_start(struct table_iter *ti, struct reftable_reader *r,
-				 uint8_t typ, int index)
+static int table_iter_seek_start(struct table_iter *ti, uint8_t typ, int index)
 {
-	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+	struct reftable_reader_offsets *offs = reader_offsets_for(ti->r, typ);
 	uint64_t off = offs->offset;
 	if (index) {
 		off = offs->index_offset;
@@ -415,7 +418,7 @@ static int table_iter_seek_start(struct table_iter *ti, struct reftable_reader *
 		typ = BLOCK_TYPE_INDEX;
 	}
 
-	return table_iter_seek_to(ti, r, off, typ);
+	return table_iter_seek_to(ti, off, typ);
 }
 
 static int table_iter_seek_linear(struct table_iter *ti,
@@ -548,7 +551,7 @@ static int table_iter_seek_indexed(struct table_iter *ti,
 		if (err != 0)
 			goto done;
 
-		err = table_iter_seek_to(ti, ti->r, index_result.u.idx.offset, 0);
+		err = table_iter_seek_to(ti, index_result.u.idx.offset, 0);
 		if (err != 0)
 			goto done;
 
@@ -578,7 +581,7 @@ static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
 {
 	uint8_t typ = reftable_record_type(rec);
 	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
-	struct table_iter ti = TABLE_ITER_INIT, *p;
+	struct table_iter ti, *p;
 	int err;
 
 	if (!offs->is_present) {
@@ -586,7 +589,9 @@ static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
 		return 0;
 	}
 
-	err = table_iter_seek_start(&ti, r, reftable_record_type(rec),
+	table_iter_init(&ti, r);
+
+	err = table_iter_seek_start(&ti, reftable_record_type(rec),
 				    !!offs->index_offset);
 	if (err < 0)
 		goto out;
@@ -722,15 +727,15 @@ static int reftable_reader_refs_for_unindexed(struct reftable_reader *r,
 					      struct reftable_iterator *it,
 					      uint8_t *oid)
 {
-	struct table_iter ti_empty = TABLE_ITER_INIT;
-	struct table_iter *ti = reftable_calloc(1, sizeof(*ti));
+	struct table_iter *ti;
 	struct filtering_ref_iterator *filter = NULL;
 	struct filtering_ref_iterator empty = FILTERING_REF_ITERATOR_INIT;
 	int oid_len = hash_size(r->hash_id);
 	int err;
 
-	*ti = ti_empty;
-	err = table_iter_seek_start(ti, r, BLOCK_TYPE_REF, 0);
+	REFTABLE_ALLOC_ARRAY(ti, 1);
+	table_iter_init(ti, r);
+	err = table_iter_seek_start(ti, BLOCK_TYPE_REF, 0);
 	if (err < 0) {
 		reftable_free(ti);
 		return err;
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 07/13] reftable/merged: split up initialization and seeking of records
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2024-05-13  8:47   ` [PATCH v2 06/13] reftable/reader: set up the reader when initializing table iterator Patrick Steinhardt
@ 2024-05-13  8:47   ` Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 08/13] reftable/merged: simplify indices for subiterators Patrick Steinhardt
                     ` (7 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 3427 bytes --]

To initialize a `struct merged_iter`, we need to seek all subiterators
to the wanted record and then add their results to the priority queue
used to sort the records. This logic is split up across two functions,
`merged_table_seek_record()` and `merged_iter_init()`. The scope of
these functions is somewhat weird though, where `merged_iter_init()` is
only responsible for adding the records of the subiterators to the
priority queue.

Clarify the scope of those functions such that `merged_iter_init()` is
only responsible for initializing the iterator's structure. Performing
the subiterator seeks are now part of `merged_table_seek_record()`.

This step is required to move seeking of records into the generic
`struct reftable_iterator` infrastructure.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 59 ++++++++++++++++++-----------------------------
 1 file changed, 22 insertions(+), 37 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index f85a24c678..4e1b78e93f 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -25,34 +25,18 @@ struct merged_subiter {
 struct merged_iter {
 	struct merged_subiter *subiters;
 	struct merged_iter_pqueue pq;
-	uint32_t hash_id;
 	size_t stack_len;
-	uint8_t typ;
 	int suppress_deletions;
 	ssize_t advance_index;
 };
 
-static int merged_iter_init(struct merged_iter *mi)
+static void merged_iter_init(struct merged_iter *mi,
+			     struct reftable_merged_table *mt)
 {
-	for (size_t i = 0; i < mi->stack_len; i++) {
-		struct pq_entry e = {
-			.index = i,
-			.rec = &mi->subiters[i].rec,
-		};
-		int err;
-
-		reftable_record_init(&mi->subiters[i].rec, mi->typ);
-		err = iterator_next(&mi->subiters[i].iter,
-				    &mi->subiters[i].rec);
-		if (err < 0)
-			return err;
-		if (err > 0)
-			continue;
-
-		merged_iter_pqueue_add(&mi->pq, &e);
-	}
-
-	return 0;
+	memset(mi, 0, sizeof(*mi));
+	mi->advance_index = -1;
+	mi->suppress_deletions = mt->suppress_deletions;
+	REFTABLE_CALLOC_ARRAY(mi->subiters, mt->stack_len);
 }
 
 static void merged_iter_close(void *p)
@@ -246,32 +230,33 @@ static int merged_table_seek_record(struct reftable_merged_table *mt,
 				    struct reftable_iterator *it,
 				    struct reftable_record *rec)
 {
-	struct merged_iter merged = {
-		.typ = reftable_record_type(rec),
-		.hash_id = mt->hash_id,
-		.suppress_deletions = mt->suppress_deletions,
-		.advance_index = -1,
-	};
-	struct merged_iter *p;
+	struct merged_iter merged, *p;
 	int err;
 
-	REFTABLE_CALLOC_ARRAY(merged.subiters, mt->stack_len);
+	merged_iter_init(&merged, mt);
+
 	for (size_t i = 0; i < mt->stack_len; i++) {
+		reftable_record_init(&merged.subiters[merged.stack_len].rec,
+				     reftable_record_type(rec));
+
 		err = reftable_table_seek_record(&mt->stack[i],
 						 &merged.subiters[merged.stack_len].iter, rec);
 		if (err < 0)
 			goto out;
-		if (!err)
-			merged.stack_len++;
-	}
+		if (err > 0)
+			continue;
 
-	err = merged_iter_init(&merged);
-	if (err < 0)
-		goto out;
+		err = merged_iter_advance_subiter(&merged, merged.stack_len);
+		if (err < 0)
+			goto out;
+
+		merged.stack_len++;
+	}
 
-	p = reftable_malloc(sizeof(struct merged_iter));
+	p = reftable_malloc(sizeof(*p));
 	*p = merged;
 	iterator_from_merged_iter(it, p);
+	err = 0;
 
 out:
 	if (err < 0)
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 08/13] reftable/merged: simplify indices for subiterators
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2024-05-13  8:47   ` [PATCH v2 07/13] reftable/merged: split up initialization and seeking of records Patrick Steinhardt
@ 2024-05-13  8:47   ` Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 09/13] reftable/generic: move seeking of records into the iterator Patrick Steinhardt
                     ` (6 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 2032 bytes --]

When seeking on a merged table, we perform the seek for each of the
subiterators. If the subiterator has the desired record we add it to the
priority queue, otherwise we skip it and don't add it to the stack of
subiterators hosted by the merged table.

The consequence of this is that the index of the subiterator in the
merged table does not necessarily correspond to the index of it in the
merged iterator. Next to being potentially confusing, it also means that
we won't easily be able to re-seek the merged iterator because we have
no clear connection between both of the data structures.

Refactor the code so that the index stays the same in both structures.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index 4e1b78e93f..18a2a6f09b 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -37,6 +37,7 @@ static void merged_iter_init(struct merged_iter *mi,
 	mi->advance_index = -1;
 	mi->suppress_deletions = mt->suppress_deletions;
 	REFTABLE_CALLOC_ARRAY(mi->subiters, mt->stack_len);
+	mi->stack_len = mt->stack_len;
 }
 
 static void merged_iter_close(void *p)
@@ -236,21 +237,19 @@ static int merged_table_seek_record(struct reftable_merged_table *mt,
 	merged_iter_init(&merged, mt);
 
 	for (size_t i = 0; i < mt->stack_len; i++) {
-		reftable_record_init(&merged.subiters[merged.stack_len].rec,
+		reftable_record_init(&merged.subiters[i].rec,
 				     reftable_record_type(rec));
 
 		err = reftable_table_seek_record(&mt->stack[i],
-						 &merged.subiters[merged.stack_len].iter, rec);
+						 &merged.subiters[i].iter, rec);
 		if (err < 0)
 			goto out;
 		if (err > 0)
 			continue;
 
-		err = merged_iter_advance_subiter(&merged, merged.stack_len);
+		err = merged_iter_advance_subiter(&merged, i);
 		if (err < 0)
 			goto out;
-
-		merged.stack_len++;
 	}
 
 	p = reftable_malloc(sizeof(*p));
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 09/13] reftable/generic: move seeking of records into the iterator
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2024-05-13  8:47   ` [PATCH v2 08/13] reftable/merged: simplify indices for subiterators Patrick Steinhardt
@ 2024-05-13  8:47   ` Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 10/13] reftable/generic: adapt interface to allow reuse of iterators Patrick Steinhardt
                     ` (5 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 17812 bytes --]

Reftable iterators are created by seeking on the parent structure of a
corresponding record. For example, to create an iterator for the merged
table you would call `reftable_merged_table_seek_ref()`. Most notably,
it is not posible to create an iterator and then seek it afterwards.

While this may be a bit easier to reason about, it comes with two
significant downsides. The first downside is that the logic to find
records is split up between the parent data structure and the iterator
itself. Conceptually, it is more straight forward if all that logic was
contained in a single place, which should be the iterator.

The second and more significant downside is that it is impossible to
reuse iterators for multiple seeks. Whenever you want to look up a
record, you need to re-create the whole infrastructure again, which is
quite a waste of time. Furthermore, it is impossible to optimize seeks,
such as when seeking the same record multiple times.

To address this, we essentially split up the concerns properly such that
the parent data structure is responsible for setting up the iterator via
a new `init_iter()` callback, whereas the iterator handles seeks via a
new `seek()` callback. This will eventually allow us to call `seek()` on
the iterator multiple times, where every iterator can potentially
optimize for certain cases.

Note that at this point in time we are not yet ready to reuse the
iterators. This will be left for a future patch series.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/generic.c |  47 +++++++++++++----
 reftable/generic.h |   9 +++-
 reftable/iter.c    |  15 ++++++
 reftable/merged.c  |  97 +++++++++++++++++-----------------
 reftable/reader.c  | 126 +++++++++++++++++++++++++--------------------
 5 files changed, 177 insertions(+), 117 deletions(-)

diff --git a/reftable/generic.c b/reftable/generic.c
index b9f1c7c18a..1cf68fe124 100644
--- a/reftable/generic.c
+++ b/reftable/generic.c
@@ -12,25 +12,39 @@ license that can be found in the LICENSE file or at
 #include "reftable-iterator.h"
 #include "reftable-generic.h"
 
+void table_init_iter(struct reftable_table *tab,
+		     struct reftable_iterator *it,
+		     uint8_t typ)
+{
+
+	tab->ops->init_iter(tab->table_arg, it, typ);
+}
+
 int reftable_table_seek_ref(struct reftable_table *tab,
 			    struct reftable_iterator *it, const char *name)
 {
-	struct reftable_record rec = { .type = BLOCK_TYPE_REF,
-				       .u.ref = {
-					       .refname = (char *)name,
-				       } };
-	return tab->ops->seek_record(tab->table_arg, it, &rec);
+	struct reftable_record want = {
+		.type = BLOCK_TYPE_REF,
+		.u.ref = {
+			.refname = (char *)name,
+		},
+	};
+	table_init_iter(tab, it, BLOCK_TYPE_REF);
+	return it->ops->seek(it->iter_arg, &want);
 }
 
 int reftable_table_seek_log(struct reftable_table *tab,
 			    struct reftable_iterator *it, const char *name)
 {
-	struct reftable_record rec = { .type = BLOCK_TYPE_LOG,
-				       .u.log = {
-					       .refname = (char *)name,
-					       .update_index = ~((uint64_t)0),
-				       } };
-	return tab->ops->seek_record(tab->table_arg, it, &rec);
+	struct reftable_record want = {
+		.type = BLOCK_TYPE_LOG,
+		.u.log = {
+			.refname = (char *)name,
+			.update_index = ~((uint64_t)0),
+		},
+	};
+	table_init_iter(tab, it, BLOCK_TYPE_LOG);
+	return it->ops->seek(it->iter_arg, &want);
 }
 
 int reftable_table_read_ref(struct reftable_table *tab, const char *name,
@@ -152,11 +166,21 @@ int reftable_iterator_next_log(struct reftable_iterator *it,
 	return err;
 }
 
+int iterator_seek(struct reftable_iterator *it, struct reftable_record *want)
+{
+	return it->ops->seek(it->iter_arg, want);
+}
+
 int iterator_next(struct reftable_iterator *it, struct reftable_record *rec)
 {
 	return it->ops->next(it->iter_arg, rec);
 }
 
+static int empty_iterator_seek(void *arg, struct reftable_record *want)
+{
+	return 0;
+}
+
 static int empty_iterator_next(void *arg, struct reftable_record *rec)
 {
 	return 1;
@@ -167,6 +191,7 @@ static void empty_iterator_close(void *arg)
 }
 
 static struct reftable_iterator_vtable empty_vtable = {
+	.seek = &empty_iterator_seek,
 	.next = &empty_iterator_next,
 	.close = &empty_iterator_close,
 };
diff --git a/reftable/generic.h b/reftable/generic.h
index 98886a0640..8341fa570e 100644
--- a/reftable/generic.h
+++ b/reftable/generic.h
@@ -14,19 +14,24 @@ license that can be found in the LICENSE file or at
 
 /* generic interface to reftables */
 struct reftable_table_vtable {
-	int (*seek_record)(void *tab, struct reftable_iterator *it,
-			   struct reftable_record *);
+	void (*init_iter)(void *tab, struct reftable_iterator *it, uint8_t typ);
 	uint32_t (*hash_id)(void *tab);
 	uint64_t (*min_update_index)(void *tab);
 	uint64_t (*max_update_index)(void *tab);
 };
 
+void table_init_iter(struct reftable_table *tab,
+		     struct reftable_iterator *it,
+		     uint8_t typ);
+
 struct reftable_iterator_vtable {
+	int (*seek)(void *iter_arg, struct reftable_record *want);
 	int (*next)(void *iter_arg, struct reftable_record *rec);
 	void (*close)(void *iter_arg);
 };
 
 void iterator_set_empty(struct reftable_iterator *it);
+int iterator_seek(struct reftable_iterator *it, struct reftable_record *want);
 int iterator_next(struct reftable_iterator *it, struct reftable_record *rec);
 
 #endif
diff --git a/reftable/iter.c b/reftable/iter.c
index aa9ac199b1..b4528fab47 100644
--- a/reftable/iter.c
+++ b/reftable/iter.c
@@ -23,6 +23,13 @@ static void filtering_ref_iterator_close(void *iter_arg)
 	reftable_iterator_destroy(&fri->it);
 }
 
+static int filtering_ref_iterator_seek(void *iter_arg,
+				       struct reftable_record *want)
+{
+	struct filtering_ref_iterator *fri = iter_arg;
+	return iterator_seek(&fri->it, want);
+}
+
 static int filtering_ref_iterator_next(void *iter_arg,
 				       struct reftable_record *rec)
 {
@@ -73,6 +80,7 @@ static int filtering_ref_iterator_next(void *iter_arg,
 }
 
 static struct reftable_iterator_vtable filtering_ref_iterator_vtable = {
+	.seek = &filtering_ref_iterator_seek,
 	.next = &filtering_ref_iterator_next,
 	.close = &filtering_ref_iterator_close,
 };
@@ -119,6 +127,12 @@ static int indexed_table_ref_iter_next_block(struct indexed_table_ref_iter *it)
 	return 0;
 }
 
+static int indexed_table_ref_iter_seek(void *p, struct reftable_record *want)
+{
+	BUG("seeking indexed table is not supported");
+	return -1;
+}
+
 static int indexed_table_ref_iter_next(void *p, struct reftable_record *rec)
 {
 	struct indexed_table_ref_iter *it = p;
@@ -175,6 +189,7 @@ int new_indexed_table_ref_iter(struct indexed_table_ref_iter **dest,
 }
 
 static struct reftable_iterator_vtable indexed_table_ref_iter_vtable = {
+	.seek = &indexed_table_ref_iter_seek,
 	.next = &indexed_table_ref_iter_next,
 	.close = &indexed_table_ref_iter_close,
 };
diff --git a/reftable/merged.c b/reftable/merged.c
index 18a2a6f09b..fc7946d90d 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -31,12 +31,18 @@ struct merged_iter {
 };
 
 static void merged_iter_init(struct merged_iter *mi,
-			     struct reftable_merged_table *mt)
+			     struct reftable_merged_table *mt,
+			     uint8_t typ)
 {
 	memset(mi, 0, sizeof(*mi));
 	mi->advance_index = -1;
 	mi->suppress_deletions = mt->suppress_deletions;
+
 	REFTABLE_CALLOC_ARRAY(mi->subiters, mt->stack_len);
+	for (size_t i = 0; i < mt->stack_len; i++) {
+		reftable_record_init(&mi->subiters[i].rec, typ);
+		table_init_iter(&mt->stack[i], &mi->subiters[i].iter, typ);
+	}
 	mi->stack_len = mt->stack_len;
 }
 
@@ -68,6 +74,27 @@ static int merged_iter_advance_subiter(struct merged_iter *mi, size_t idx)
 	return 0;
 }
 
+static int merged_iter_seek(struct merged_iter *mi, struct reftable_record *want)
+{
+	int err;
+
+	mi->advance_index = -1;
+
+	for (size_t i = 0; i < mi->stack_len; i++) {
+		err = iterator_seek(&mi->subiters[i].iter, want);
+		if (err < 0)
+			return err;
+		if (err > 0)
+			continue;
+
+		err = merged_iter_advance_subiter(mi, i);
+		if (err < 0)
+			return err;
+	}
+
+	return 0;
+}
+
 static int merged_iter_next_entry(struct merged_iter *mi,
 				  struct reftable_record *rec)
 {
@@ -133,6 +160,11 @@ static int merged_iter_next_entry(struct merged_iter *mi,
 	return 0;
 }
 
+static int merged_iter_seek_void(void *it, struct reftable_record *want)
+{
+	return merged_iter_seek(it, want);
+}
+
 static int merged_iter_next_void(void *p, struct reftable_record *rec)
 {
 	struct merged_iter *mi = p;
@@ -147,6 +179,7 @@ static int merged_iter_next_void(void *p, struct reftable_record *rec)
 }
 
 static struct reftable_iterator_vtable merged_iter_vtable = {
+	.seek = merged_iter_seek_void,
 	.next = &merged_iter_next_void,
 	.close = &merged_iter_close,
 };
@@ -220,47 +253,13 @@ reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
 	return mt->min;
 }
 
-static int reftable_table_seek_record(struct reftable_table *tab,
-				      struct reftable_iterator *it,
-				      struct reftable_record *rec)
-{
-	return tab->ops->seek_record(tab->table_arg, it, rec);
-}
-
-static int merged_table_seek_record(struct reftable_merged_table *mt,
-				    struct reftable_iterator *it,
-				    struct reftable_record *rec)
+static void merged_table_init_iter(struct reftable_merged_table *mt,
+				   struct reftable_iterator *it,
+				   uint8_t typ)
 {
-	struct merged_iter merged, *p;
-	int err;
-
-	merged_iter_init(&merged, mt);
-
-	for (size_t i = 0; i < mt->stack_len; i++) {
-		reftable_record_init(&merged.subiters[i].rec,
-				     reftable_record_type(rec));
-
-		err = reftable_table_seek_record(&mt->stack[i],
-						 &merged.subiters[i].iter, rec);
-		if (err < 0)
-			goto out;
-		if (err > 0)
-			continue;
-
-		err = merged_iter_advance_subiter(&merged, i);
-		if (err < 0)
-			goto out;
-	}
-
-	p = reftable_malloc(sizeof(*p));
-	*p = merged;
-	iterator_from_merged_iter(it, p);
-	err = 0;
-
-out:
-	if (err < 0)
-		merged_iter_close(&merged);
-	return err;
+	struct merged_iter *mi = reftable_malloc(sizeof(*mi));
+	merged_iter_init(mi, mt, typ);
+	iterator_from_merged_iter(it, mi);
 }
 
 int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
@@ -273,7 +272,8 @@ int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
 			.refname = (char *)name,
 		},
 	};
-	return merged_table_seek_record(mt, it, &rec);
+	merged_table_init_iter(mt, it, BLOCK_TYPE_REF);
+	return iterator_seek(it, &rec);
 }
 
 int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
@@ -285,7 +285,8 @@ int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
 					       .refname = (char *)name,
 					       .update_index = update_index,
 				       } };
-	return merged_table_seek_record(mt, it, &rec);
+	merged_table_init_iter(mt, it, BLOCK_TYPE_LOG);
+	return iterator_seek(it, &rec);
 }
 
 int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
@@ -301,11 +302,11 @@ uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *mt)
 	return mt->hash_id;
 }
 
-static int reftable_merged_table_seek_void(void *tab,
-					   struct reftable_iterator *it,
-					   struct reftable_record *rec)
+static void reftable_merged_table_init_iter_void(void *tab,
+						 struct reftable_iterator *it,
+						 uint8_t typ)
 {
-	return merged_table_seek_record(tab, it, rec);
+	merged_table_init_iter(tab, it, typ);
 }
 
 static uint32_t reftable_merged_table_hash_id_void(void *tab)
@@ -324,7 +325,7 @@ static uint64_t reftable_merged_table_max_update_index_void(void *tab)
 }
 
 static struct reftable_table_vtable merged_table_vtable = {
-	.seek_record = reftable_merged_table_seek_void,
+	.init_iter = reftable_merged_table_init_iter_void,
 	.hash_id = reftable_merged_table_hash_id_void,
 	.min_update_index = reftable_merged_table_min_update_index_void,
 	.max_update_index = reftable_merged_table_max_update_index_void,
diff --git a/reftable/reader.c b/reftable/reader.c
index 021608f638..a5a13cb0b9 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -369,29 +369,6 @@ static int table_iter_next(struct table_iter *ti, struct reftable_record *rec)
 	}
 }
 
-static int table_iter_next_void(void *ti, struct reftable_record *rec)
-{
-	return table_iter_next(ti, rec);
-}
-
-static void table_iter_close_void(void *ti)
-{
-	table_iter_close(ti);
-}
-
-static struct reftable_iterator_vtable table_iter_vtable = {
-	.next = &table_iter_next_void,
-	.close = &table_iter_close_void,
-};
-
-static void iterator_from_table_iter(struct reftable_iterator *it,
-				     struct table_iter *ti)
-{
-	assert(!it->ops);
-	it->iter_arg = ti;
-	it->ops = &table_iter_vtable;
-}
-
 static int table_iter_seek_to(struct table_iter *ti, uint64_t off, uint8_t typ)
 {
 	int err;
@@ -576,43 +553,74 @@ static int table_iter_seek_indexed(struct table_iter *ti,
 	return err;
 }
 
-static int reader_seek(struct reftable_reader *r, struct reftable_iterator *it,
-		       struct reftable_record *rec)
+static int table_iter_seek(struct table_iter *ti,
+			   struct reftable_record *want)
 {
-	uint8_t typ = reftable_record_type(rec);
-	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
-	struct table_iter ti, *p;
+	uint8_t typ = reftable_record_type(want);
+	struct reftable_reader_offsets *offs = reader_offsets_for(ti->r, typ);
 	int err;
 
-	if (!offs->is_present) {
-		iterator_set_empty(it);
-		return 0;
-	}
-
-	table_iter_init(&ti, r);
-
-	err = table_iter_seek_start(&ti, reftable_record_type(rec),
+	err = table_iter_seek_start(ti, reftable_record_type(want),
 				    !!offs->index_offset);
 	if (err < 0)
 		goto out;
 
 	if (offs->index_offset)
-		err = table_iter_seek_indexed(&ti, rec);
+		err = table_iter_seek_indexed(ti, want);
 	else
-		err = table_iter_seek_linear(&ti, rec);
+		err = table_iter_seek_linear(ti, want);
 	if (err)
 		goto out;
 
-	REFTABLE_ALLOC_ARRAY(p, 1);
-	*p = ti;
-	iterator_from_table_iter(it, p);
-
 out:
-	if (err)
-		table_iter_close(&ti);
 	return err;
 }
 
+static int table_iter_seek_void(void *ti, struct reftable_record *want)
+{
+	return table_iter_seek(ti, want);
+}
+
+static int table_iter_next_void(void *ti, struct reftable_record *rec)
+{
+	return table_iter_next(ti, rec);
+}
+
+static void table_iter_close_void(void *ti)
+{
+	table_iter_close(ti);
+}
+
+static struct reftable_iterator_vtable table_iter_vtable = {
+	.seek = &table_iter_seek_void,
+	.next = &table_iter_next_void,
+	.close = &table_iter_close_void,
+};
+
+static void iterator_from_table_iter(struct reftable_iterator *it,
+				     struct table_iter *ti)
+{
+	assert(!it->ops);
+	it->iter_arg = ti;
+	it->ops = &table_iter_vtable;
+}
+
+static void reader_init_iter(struct reftable_reader *r,
+			     struct reftable_iterator *it,
+			     uint8_t typ)
+{
+	struct reftable_reader_offsets *offs = reader_offsets_for(r, typ);
+
+	if (offs->is_present) {
+		struct table_iter *ti;
+		REFTABLE_ALLOC_ARRAY(ti, 1);
+		table_iter_init(ti, r);
+		iterator_from_table_iter(it, ti);
+	} else {
+		iterator_set_empty(it);
+	}
+}
+
 int reftable_reader_seek_ref(struct reftable_reader *r,
 			     struct reftable_iterator *it, const char *name)
 {
@@ -622,19 +630,23 @@ int reftable_reader_seek_ref(struct reftable_reader *r,
 			.refname = (char *)name,
 		},
 	};
-	return reader_seek(r, it, &rec);
+	reader_init_iter(r, it, BLOCK_TYPE_REF);
+	return iterator_seek(it, &rec);
 }
 
 int reftable_reader_seek_log_at(struct reftable_reader *r,
 				struct reftable_iterator *it, const char *name,
 				uint64_t update_index)
 {
-	struct reftable_record rec = { .type = BLOCK_TYPE_LOG,
-				       .u.log = {
-					       .refname = (char *)name,
-					       .update_index = update_index,
-				       } };
-	return reader_seek(r, it, &rec);
+	struct reftable_record rec = {
+		.type = BLOCK_TYPE_LOG,
+		.u.log = {
+			.refname = (char *)name,
+			.update_index = update_index,
+		},
+	};
+	reader_init_iter(r, it, BLOCK_TYPE_LOG);
+	return iterator_seek(it, &rec);
 }
 
 int reftable_reader_seek_log(struct reftable_reader *r,
@@ -692,7 +704,8 @@ static int reftable_reader_refs_for_indexed(struct reftable_reader *r,
 	struct indexed_table_ref_iter *itr = NULL;
 
 	/* Look through the reverse index. */
-	err = reader_seek(r, &oit, &want);
+	reader_init_iter(r, &oit, BLOCK_TYPE_OBJ);
+	err = iterator_seek(&oit, &want);
 	if (err != 0)
 		goto done;
 
@@ -773,10 +786,11 @@ uint64_t reftable_reader_min_update_index(struct reftable_reader *r)
 
 /* generic table interface. */
 
-static int reftable_reader_seek_void(void *tab, struct reftable_iterator *it,
-				     struct reftable_record *rec)
+static void reftable_reader_init_iter_void(void *tab,
+					   struct reftable_iterator *it,
+					   uint8_t typ)
 {
-	return reader_seek(tab, it, rec);
+	reader_init_iter(tab, it, typ);
 }
 
 static uint32_t reftable_reader_hash_id_void(void *tab)
@@ -795,7 +809,7 @@ static uint64_t reftable_reader_max_update_index_void(void *tab)
 }
 
 static struct reftable_table_vtable reader_vtable = {
-	.seek_record = reftable_reader_seek_void,
+	.init_iter = reftable_reader_init_iter_void,
 	.hash_id = reftable_reader_hash_id_void,
 	.min_update_index = reftable_reader_min_update_index_void,
 	.max_update_index = reftable_reader_max_update_index_void,
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 10/13] reftable/generic: adapt interface to allow reuse of iterators
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2024-05-13  8:47   ` [PATCH v2 09/13] reftable/generic: move seeking of records into the iterator Patrick Steinhardt
@ 2024-05-13  8:47   ` Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 11/13] reftable/reader: " Patrick Steinhardt
                     ` (4 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 6396 bytes --]

Refactor the interfaces exposed by `struct reftable_table` and `struct
reftable_iterator` such that they support iterator reuse. This is done
by separating initialization of the iterator and seeking on it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/generic.c           | 53 ++++++++++++++++++++++++++----------
 reftable/iter.c              |  8 +++---
 reftable/reftable-generic.h  |  8 +++---
 reftable/reftable-iterator.h | 21 ++++++++++++++
 4 files changed, 68 insertions(+), 22 deletions(-)

diff --git a/reftable/generic.c b/reftable/generic.c
index 1cf68fe124..28ae26145e 100644
--- a/reftable/generic.c
+++ b/reftable/generic.c
@@ -20,8 +20,20 @@ void table_init_iter(struct reftable_table *tab,
 	tab->ops->init_iter(tab->table_arg, it, typ);
 }
 
-int reftable_table_seek_ref(struct reftable_table *tab,
-			    struct reftable_iterator *it, const char *name)
+void reftable_table_init_ref_iter(struct reftable_table *tab,
+				  struct reftable_iterator *it)
+{
+	table_init_iter(tab, it, BLOCK_TYPE_REF);
+}
+
+void reftable_table_init_log_iter(struct reftable_table *tab,
+				  struct reftable_iterator *it)
+{
+	table_init_iter(tab, it, BLOCK_TYPE_LOG);
+}
+
+int reftable_iterator_seek_ref(struct reftable_iterator *it,
+			       const char *name)
 {
 	struct reftable_record want = {
 		.type = BLOCK_TYPE_REF,
@@ -29,29 +41,37 @@ int reftable_table_seek_ref(struct reftable_table *tab,
 			.refname = (char *)name,
 		},
 	};
-	table_init_iter(tab, it, BLOCK_TYPE_REF);
 	return it->ops->seek(it->iter_arg, &want);
 }
 
-int reftable_table_seek_log(struct reftable_table *tab,
-			    struct reftable_iterator *it, const char *name)
+int reftable_iterator_seek_log_at(struct reftable_iterator *it,
+				  const char *name, uint64_t update_index)
 {
 	struct reftable_record want = {
 		.type = BLOCK_TYPE_LOG,
 		.u.log = {
 			.refname = (char *)name,
-			.update_index = ~((uint64_t)0),
+			.update_index = update_index,
 		},
 	};
-	table_init_iter(tab, it, BLOCK_TYPE_LOG);
 	return it->ops->seek(it->iter_arg, &want);
 }
 
+int reftable_iterator_seek_log(struct reftable_iterator *it,
+			       const char *name)
+{
+	return reftable_iterator_seek_log_at(it, name, ~((uint64_t) 0));
+}
+
 int reftable_table_read_ref(struct reftable_table *tab, const char *name,
 			    struct reftable_ref_record *ref)
 {
 	struct reftable_iterator it = { NULL };
-	int err = reftable_table_seek_ref(tab, &it, name);
+	int err;
+
+	reftable_table_init_ref_iter(tab, &it);
+
+	err = reftable_iterator_seek_ref(&it, name);
 	if (err)
 		goto done;
 
@@ -76,10 +96,13 @@ int reftable_table_print(struct reftable_table *tab) {
 	struct reftable_ref_record ref = { NULL };
 	struct reftable_log_record log = { NULL };
 	uint32_t hash_id = reftable_table_hash_id(tab);
-	int err = reftable_table_seek_ref(tab, &it, "");
-	if (err < 0) {
+	int err;
+
+	reftable_table_init_ref_iter(tab, &it);
+
+	err = reftable_iterator_seek_ref(&it, "");
+	if (err < 0)
 		return err;
-	}
 
 	while (1) {
 		err = reftable_iterator_next_ref(&it, &ref);
@@ -94,10 +117,12 @@ int reftable_table_print(struct reftable_table *tab) {
 	reftable_iterator_destroy(&it);
 	reftable_ref_record_release(&ref);
 
-	err = reftable_table_seek_log(tab, &it, "");
-	if (err < 0) {
+	reftable_table_init_log_iter(tab, &it);
+
+	err = reftable_iterator_seek_log(&it, "");
+	if (err < 0)
 		return err;
-	}
+
 	while (1) {
 		err = reftable_iterator_next_log(&it, &log);
 		if (err > 0) {
diff --git a/reftable/iter.c b/reftable/iter.c
index b4528fab47..fddea31e51 100644
--- a/reftable/iter.c
+++ b/reftable/iter.c
@@ -45,11 +45,11 @@ static int filtering_ref_iterator_next(void *iter_arg,
 		if (fri->double_check) {
 			struct reftable_iterator it = { NULL };
 
-			err = reftable_table_seek_ref(&fri->tab, &it,
-						      ref->refname);
-			if (err == 0) {
+			reftable_table_init_ref_iter(&fri->tab, &it);
+
+			err = reftable_iterator_seek_ref(&it, ref->refname);
+			if (err == 0)
 				err = reftable_iterator_next_ref(&it, ref);
-			}
 
 			reftable_iterator_destroy(&it);
 
diff --git a/reftable/reftable-generic.h b/reftable/reftable-generic.h
index d239751a77..65670ea093 100644
--- a/reftable/reftable-generic.h
+++ b/reftable/reftable-generic.h
@@ -21,11 +21,11 @@ struct reftable_table {
 	void *table_arg;
 };
 
-int reftable_table_seek_log(struct reftable_table *tab,
-			    struct reftable_iterator *it, const char *name);
+void reftable_table_init_ref_iter(struct reftable_table *tab,
+				  struct reftable_iterator *it);
 
-int reftable_table_seek_ref(struct reftable_table *tab,
-			    struct reftable_iterator *it, const char *name);
+void reftable_table_init_log_iter(struct reftable_table *tab,
+				  struct reftable_iterator *it);
 
 /* returns the hash ID from a generic reftable_table */
 uint32_t reftable_table_hash_id(struct reftable_table *tab);
diff --git a/reftable/reftable-iterator.h b/reftable/reftable-iterator.h
index d3eee7af35..e3bf688d53 100644
--- a/reftable/reftable-iterator.h
+++ b/reftable/reftable-iterator.h
@@ -21,12 +21,33 @@ struct reftable_iterator {
 	void *iter_arg;
 };
 
+/*
+ * Position the iterator at the ref record with given name such that the next
+ * call to `next_ref()` would yield the record.
+ */
+int reftable_iterator_seek_ref(struct reftable_iterator *it,
+			       const char *name);
+
 /* reads the next reftable_ref_record. Returns < 0 for error, 0 for OK and > 0:
  * end of iteration.
  */
 int reftable_iterator_next_ref(struct reftable_iterator *it,
 			       struct reftable_ref_record *ref);
 
+/*
+ * Position the iterator at the log record with given name and update index
+ * such that the next call to `next_log()` would yield the record.
+ */
+int reftable_iterator_seek_log_at(struct reftable_iterator *it,
+				  const char *name, uint64_t update_index);
+
+/*
+ * Position the iterator at the newest log record with given name such that the
+ * next call to `next_log()` would yield the record.
+ */
+int reftable_iterator_seek_log(struct reftable_iterator *it,
+			       const char *name);
+
 /* reads the next reftable_log_record. Returns < 0 for error, 0 for OK and > 0:
  * end of iteration.
  */
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 11/13] reftable/reader: adapt interface to allow reuse of iterators
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2024-05-13  8:47   ` [PATCH v2 10/13] reftable/generic: adapt interface to allow reuse of iterators Patrick Steinhardt
@ 2024-05-13  8:47   ` Patrick Steinhardt
  2024-05-13  8:47   ` [PATCH v2 12/13] reftable/stack: provide convenience functions to create iterators Patrick Steinhardt
                     ` (3 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 8301 bytes --]

Refactor the interfaces exposed by `struct reftable_reader` and `struct
table_iterator` such that they support iterator reuse. This is done by
separating initialization of the iterator and seeking on it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/reader.c          | 31 ++++----------------------
 reftable/readwrite_test.c  | 35 +++++++++++++++++++----------
 reftable/reftable-reader.h | 45 ++++++--------------------------------
 3 files changed, 35 insertions(+), 76 deletions(-)

diff --git a/reftable/reader.c b/reftable/reader.c
index a5a13cb0b9..bbdb4bdafa 100644
--- a/reftable/reader.c
+++ b/reftable/reader.c
@@ -621,39 +621,16 @@ static void reader_init_iter(struct reftable_reader *r,
 	}
 }
 
-int reftable_reader_seek_ref(struct reftable_reader *r,
-			     struct reftable_iterator *it, const char *name)
+void reftable_reader_init_ref_iterator(struct reftable_reader *r,
+				       struct reftable_iterator *it)
 {
-	struct reftable_record rec = {
-		.type = BLOCK_TYPE_REF,
-		.u.ref = {
-			.refname = (char *)name,
-		},
-	};
 	reader_init_iter(r, it, BLOCK_TYPE_REF);
-	return iterator_seek(it, &rec);
 }
 
-int reftable_reader_seek_log_at(struct reftable_reader *r,
-				struct reftable_iterator *it, const char *name,
-				uint64_t update_index)
+void reftable_reader_init_log_iterator(struct reftable_reader *r,
+				       struct reftable_iterator *it)
 {
-	struct reftable_record rec = {
-		.type = BLOCK_TYPE_LOG,
-		.u.log = {
-			.refname = (char *)name,
-			.update_index = update_index,
-		},
-	};
 	reader_init_iter(r, it, BLOCK_TYPE_LOG);
-	return iterator_seek(it, &rec);
-}
-
-int reftable_reader_seek_log(struct reftable_reader *r,
-			     struct reftable_iterator *it, const char *name)
-{
-	uint64_t max = ~((uint64_t)0);
-	return reftable_reader_seek_log_at(r, it, name, max);
 }
 
 void reader_close(struct reftable_reader *r)
diff --git a/reftable/readwrite_test.c b/reftable/readwrite_test.c
index a6dbd214c5..d99543bbd6 100644
--- a/reftable/readwrite_test.c
+++ b/reftable/readwrite_test.c
@@ -239,7 +239,9 @@ static void test_log_write_read(void)
 	err = init_reader(&rd, &source, "file.log");
 	EXPECT_ERR(err);
 
-	err = reftable_reader_seek_ref(&rd, &it, names[N - 1]);
+	reftable_reader_init_ref_iterator(&rd, &it);
+
+	err = reftable_iterator_seek_ref(&it, names[N - 1]);
 	EXPECT_ERR(err);
 
 	err = reftable_iterator_next_ref(&it, &ref);
@@ -252,7 +254,9 @@ static void test_log_write_read(void)
 	reftable_iterator_destroy(&it);
 	reftable_ref_record_release(&ref);
 
-	err = reftable_reader_seek_log(&rd, &it, "");
+	reftable_reader_init_log_iterator(&rd, &it);
+
+	err = reftable_iterator_seek_log(&it, "");
 	EXPECT_ERR(err);
 
 	i = 0;
@@ -330,7 +334,8 @@ static void test_log_zlib_corruption(void)
 	err = init_reader(&rd, &source, "file.log");
 	EXPECT_ERR(err);
 
-	err = reftable_reader_seek_log(&rd, &it, "refname");
+	reftable_reader_init_log_iterator(&rd, &it);
+	err = reftable_iterator_seek_log(&it, "refname");
 	EXPECT(err == REFTABLE_ZLIB_ERROR);
 
 	reftable_iterator_destroy(&it);
@@ -358,7 +363,8 @@ static void test_table_read_write_sequential(void)
 	err = init_reader(&rd, &source, "file.ref");
 	EXPECT_ERR(err);
 
-	err = reftable_reader_seek_ref(&rd, &it, "");
+	reftable_reader_init_ref_iterator(&rd, &it);
+	err = reftable_iterator_seek_ref(&it, "");
 	EXPECT_ERR(err);
 
 	while (1) {
@@ -412,7 +418,8 @@ static void test_table_read_api(void)
 	err = init_reader(&rd, &source, "file.ref");
 	EXPECT_ERR(err);
 
-	err = reftable_reader_seek_ref(&rd, &it, names[0]);
+	reftable_reader_init_ref_iterator(&rd, &it);
+	err = reftable_iterator_seek_ref(&it, names[0]);
 	EXPECT_ERR(err);
 
 	err = reftable_iterator_next_log(&it, &log);
@@ -457,7 +464,8 @@ static void test_table_read_write_seek(int index, int hash_id)
 	}
 
 	for (i = 1; i < N; i++) {
-		int err = reftable_reader_seek_ref(&rd, &it, names[i]);
+		reftable_reader_init_ref_iterator(&rd, &it);
+		err = reftable_iterator_seek_ref(&it, names[i]);
 		EXPECT_ERR(err);
 		err = reftable_iterator_next_ref(&it, &ref);
 		EXPECT_ERR(err);
@@ -472,7 +480,8 @@ static void test_table_read_write_seek(int index, int hash_id)
 	strbuf_addstr(&pastLast, names[N - 1]);
 	strbuf_addstr(&pastLast, "/");
 
-	err = reftable_reader_seek_ref(&rd, &it, pastLast.buf);
+	reftable_reader_init_ref_iterator(&rd, &it);
+	err = reftable_iterator_seek_ref(&it, pastLast.buf);
 	if (err == 0) {
 		struct reftable_ref_record ref = { NULL };
 		int err = reftable_iterator_next_ref(&it, &ref);
@@ -576,7 +585,8 @@ static void test_table_refs_for(int indexed)
 		rd.obj_offsets.is_present = 0;
 	}
 
-	err = reftable_reader_seek_ref(&rd, &it, "");
+	reftable_reader_init_ref_iterator(&rd, &it);
+	err = reftable_iterator_seek_ref(&it, "");
 	EXPECT_ERR(err);
 	reftable_iterator_destroy(&it);
 
@@ -639,7 +649,8 @@ static void test_write_empty_table(void)
 	err = reftable_new_reader(&rd, &source, "filename");
 	EXPECT_ERR(err);
 
-	err = reftable_reader_seek_ref(rd, &it, "");
+	reftable_reader_init_ref_iterator(rd, &it);
+	err = reftable_iterator_seek_ref(&it, "");
 	EXPECT_ERR(err);
 
 	err = reftable_iterator_next_ref(&it, &rec);
@@ -846,7 +857,8 @@ static void test_write_multiple_indices(void)
 	 * Seeking the log uses the log index now. In case there is any
 	 * confusion regarding indices we would notice here.
 	 */
-	err = reftable_reader_seek_log(reader, &it, "");
+	reftable_reader_init_log_iterator(reader, &it);
+	err = reftable_iterator_seek_log(&it, "");
 	EXPECT_ERR(err);
 
 	reftable_iterator_destroy(&it);
@@ -901,7 +913,8 @@ static void test_write_multi_level_index(void)
 	/*
 	 * Seeking the last ref should work as expected.
 	 */
-	err = reftable_reader_seek_ref(reader, &it, "refs/heads/199");
+	reftable_reader_init_ref_iterator(reader, &it);
+	err = reftable_iterator_seek_ref(&it, "refs/heads/199");
 	EXPECT_ERR(err);
 
 	reftable_iterator_destroy(&it);
diff --git a/reftable/reftable-reader.h b/reftable/reftable-reader.h
index 4a4bc2fdf8..52e4942b7b 100644
--- a/reftable/reftable-reader.h
+++ b/reftable/reftable-reader.h
@@ -36,48 +36,17 @@ struct reftable_table;
 int reftable_new_reader(struct reftable_reader **pp,
 			struct reftable_block_source *src, const char *name);
 
-/* reftable_reader_seek_ref returns an iterator where 'name' would be inserted
-   in the table.  To seek to the start of the table, use name = "".
-
-   example:
-
-   struct reftable_reader *r = NULL;
-   int err = reftable_new_reader(&r, &src, "filename");
-   if (err < 0) { ... }
-   struct reftable_iterator it  = {0};
-   err = reftable_reader_seek_ref(r, &it, "refs/heads/master");
-   if (err < 0) { ... }
-   struct reftable_ref_record ref  = {0};
-   while (1) {
-   err = reftable_iterator_next_ref(&it, &ref);
-   if (err > 0) {
-   break;
-   }
-   if (err < 0) {
-   ..error handling..
-   }
-   ..found..
-   }
-   reftable_iterator_destroy(&it);
-   reftable_ref_record_release(&ref);
-*/
-int reftable_reader_seek_ref(struct reftable_reader *r,
-			     struct reftable_iterator *it, const char *name);
+/* Initialize a reftable iterator for reading refs. */
+void reftable_reader_init_ref_iterator(struct reftable_reader *r,
+				       struct reftable_iterator *it);
+
+/* Initialize a reftable iterator for reading logs. */
+void reftable_reader_init_log_iterator(struct reftable_reader *r,
+				       struct reftable_iterator *it);
 
 /* returns the hash ID used in this table. */
 uint32_t reftable_reader_hash_id(struct reftable_reader *r);
 
-/* seek to logs for the given name, older than update_index. To seek to the
-   start of the table, use name = "".
-*/
-int reftable_reader_seek_log_at(struct reftable_reader *r,
-				struct reftable_iterator *it, const char *name,
-				uint64_t update_index);
-
-/* seek to newest log entry for given name. */
-int reftable_reader_seek_log(struct reftable_reader *r,
-			     struct reftable_iterator *it, const char *name);
-
 /* closes and deallocates a reader. */
 void reftable_reader_free(struct reftable_reader *);
 
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 12/13] reftable/stack: provide convenience functions to create iterators
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
                     ` (10 preceding siblings ...)
  2024-05-13  8:47   ` [PATCH v2 11/13] reftable/reader: " Patrick Steinhardt
@ 2024-05-13  8:47   ` Patrick Steinhardt
  2024-05-13  8:48   ` [PATCH v2 13/13] reftable/merged: adapt interface to allow reuse of iterators Patrick Steinhardt
                     ` (2 subsequent siblings)
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:47 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 12263 bytes --]

There exist a bunch of call sites in the reftable backend that want to
create iterators for a reftable stack. This is rather convoluted right
now, where you always have to go via the merged table. And it is about
to become even more convoluted when we split up iterator initialization
and seeking in the next commit.

Introduce convenience functions that allow the caller to create an
iterator from a reftable stack directly without going through the merged
table. Adapt callers accordingly.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs/reftable-backend.c   | 48 +++++++++++++++++----------------------
 reftable/merged.c         |  6 ++---
 reftable/merged.h         |  6 +++++
 reftable/reftable-stack.h | 18 +++++++++++++++
 reftable/stack.c          | 15 ++++++++++++
 5 files changed, 63 insertions(+), 30 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 010ef811b6..7ac2772bcb 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -15,7 +15,6 @@
 #include "../reftable/reftable-record.h"
 #include "../reftable/reftable-error.h"
 #include "../reftable/reftable-iterator.h"
-#include "../reftable/reftable-merged.h"
 #include "../setup.h"
 #include "../strmap.h"
 #include "parse.h"
@@ -462,7 +461,6 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 							    const char *prefix,
 							    int flags)
 {
-	struct reftable_merged_table *merged_table;
 	struct reftable_ref_iterator *iter;
 	int ret;
 
@@ -482,9 +480,8 @@ static struct reftable_ref_iterator *ref_iterator_for_stack(struct reftable_ref_
 	if (ret)
 		goto done;
 
-	merged_table = reftable_stack_merged_table(stack);
-
-	ret = reftable_merged_table_seek_ref(merged_table, &iter->iter, prefix);
+	reftable_stack_init_ref_iterator(stack, &iter->iter);
+	ret = reftable_iterator_seek_ref(&iter->iter, prefix);
 	if (ret)
 		goto done;
 
@@ -1015,8 +1012,6 @@ static int transaction_update_cmp(const void *a, const void *b)
 static int write_transaction_table(struct reftable_writer *writer, void *cb_data)
 {
 	struct write_transaction_table_arg *arg = cb_data;
-	struct reftable_merged_table *mt =
-		reftable_stack_merged_table(arg->stack);
 	uint64_t ts = reftable_stack_next_update_index(arg->stack);
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
@@ -1051,6 +1046,8 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 			struct reftable_log_record log = {0};
 			struct reftable_iterator it = {0};
 
+			reftable_stack_init_log_iterator(arg->stack, &it);
+
 			/*
 			 * When deleting refs we also delete all reflog entries
 			 * with them. While it is not strictly required to
@@ -1060,7 +1057,7 @@ static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 			 * Unfortunately, we have no better way than to delete
 			 * all reflog entries one by one.
 			 */
-			ret = reftable_merged_table_seek_log(mt, &it, u->refname);
+			ret = reftable_iterator_seek_log(&it, u->refname);
 			while (ret == 0) {
 				struct reftable_log_record *tombstone;
 
@@ -1354,7 +1351,6 @@ static int write_copy_table(struct reftable_writer *writer, void *cb_data)
 {
 	struct write_copy_arg *arg = cb_data;
 	uint64_t deletion_ts, creation_ts;
-	struct reftable_merged_table *mt = reftable_stack_merged_table(arg->stack);
 	struct reftable_ref_record old_ref = {0}, refs[2] = {0};
 	struct reftable_log_record old_log = {0}, *logs = NULL;
 	struct reftable_iterator it = {0};
@@ -1488,7 +1484,8 @@ static int write_copy_table(struct reftable_writer *writer, void *cb_data)
 	 * copy over all log entries from the old reflog. Last but not least,
 	 * when renaming we also have to delete all the old reflog entries.
 	 */
-	ret = reftable_merged_table_seek_log(mt, &it, arg->oldname);
+	reftable_stack_init_log_iterator(arg->stack, &it);
+	ret = reftable_iterator_seek_log(&it, arg->oldname);
 	if (ret < 0)
 		goto done;
 
@@ -1694,7 +1691,6 @@ static struct ref_iterator_vtable reftable_reflog_iterator_vtable = {
 static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftable_ref_store *refs,
 								  struct reftable_stack *stack)
 {
-	struct reftable_merged_table *merged_table;
 	struct reftable_reflog_iterator *iter;
 	int ret;
 
@@ -1711,9 +1707,8 @@ static struct reftable_reflog_iterator *reflog_iterator_for_stack(struct reftabl
 	if (ret < 0)
 		goto done;
 
-	merged_table = reftable_stack_merged_table(stack);
-
-	ret = reftable_merged_table_seek_log(merged_table, &iter->iter, "");
+	reftable_stack_init_log_iterator(stack, &iter->iter);
+	ret = reftable_iterator_seek_log(&iter->iter, "");
 	if (ret < 0)
 		goto done;
 
@@ -1771,7 +1766,6 @@ static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
 	struct reftable_ref_store *refs =
 		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent_reverse");
 	struct reftable_stack *stack = stack_for(refs, refname, &refname);
-	struct reftable_merged_table *mt = NULL;
 	struct reftable_log_record log = {0};
 	struct reftable_iterator it = {0};
 	int ret;
@@ -1779,8 +1773,8 @@ static int reftable_be_for_each_reflog_ent_reverse(struct ref_store *ref_store,
 	if (refs->err < 0)
 		return refs->err;
 
-	mt = reftable_stack_merged_table(stack);
-	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	reftable_stack_init_log_iterator(stack, &it);
+	ret = reftable_iterator_seek_log(&it, refname);
 	while (!ret) {
 		ret = reftable_iterator_next_log(&it, &log);
 		if (ret < 0)
@@ -1808,7 +1802,6 @@ static int reftable_be_for_each_reflog_ent(struct ref_store *ref_store,
 	struct reftable_ref_store *refs =
 		reftable_be_downcast(ref_store, REF_STORE_READ, "for_each_reflog_ent");
 	struct reftable_stack *stack = stack_for(refs, refname, &refname);
-	struct reftable_merged_table *mt = NULL;
 	struct reftable_log_record *logs = NULL;
 	struct reftable_iterator it = {0};
 	size_t logs_alloc = 0, logs_nr = 0, i;
@@ -1817,8 +1810,8 @@ static int reftable_be_for_each_reflog_ent(struct ref_store *ref_store,
 	if (refs->err < 0)
 		return refs->err;
 
-	mt = reftable_stack_merged_table(stack);
-	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	reftable_stack_init_log_iterator(stack, &it);
+	ret = reftable_iterator_seek_log(&it, refname);
 	while (!ret) {
 		struct reftable_log_record log = {0};
 
@@ -1855,7 +1848,6 @@ static int reftable_be_reflog_exists(struct ref_store *ref_store,
 	struct reftable_ref_store *refs =
 		reftable_be_downcast(ref_store, REF_STORE_READ, "reflog_exists");
 	struct reftable_stack *stack = stack_for(refs, refname, &refname);
-	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
 	struct reftable_log_record log = {0};
 	struct reftable_iterator it = {0};
 	int ret;
@@ -1868,7 +1860,8 @@ static int reftable_be_reflog_exists(struct ref_store *ref_store,
 	if (ret < 0)
 		goto done;
 
-	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	reftable_stack_init_log_iterator(stack, &it);
+	ret = reftable_iterator_seek_log(&it, refname);
 	if (ret < 0)
 		goto done;
 
@@ -1966,8 +1959,6 @@ struct write_reflog_delete_arg {
 static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_data)
 {
 	struct write_reflog_delete_arg *arg = cb_data;
-	struct reftable_merged_table *mt =
-		reftable_stack_merged_table(arg->stack);
 	struct reftable_log_record log = {0}, tombstone = {0};
 	struct reftable_iterator it = {0};
 	uint64_t ts = reftable_stack_next_update_index(arg->stack);
@@ -1975,12 +1966,14 @@ static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_da
 
 	reftable_writer_set_limits(writer, ts, ts);
 
+	reftable_stack_init_log_iterator(arg->stack, &it);
+
 	/*
 	 * In order to delete a table we need to delete all reflog entries one
 	 * by one. This is inefficient, but the reftable format does not have a
 	 * better marker right now.
 	 */
-	ret = reftable_merged_table_seek_log(mt, &it, arg->refname);
+	ret = reftable_iterator_seek_log(&it, arg->refname);
 	while (ret == 0) {
 		ret = reftable_iterator_next_log(&it, &log);
 		if (ret < 0)
@@ -2116,7 +2109,6 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 	struct reftable_ref_store *refs =
 		reftable_be_downcast(ref_store, REF_STORE_WRITE, "reflog_expire");
 	struct reftable_stack *stack = stack_for(refs, refname, &refname);
-	struct reftable_merged_table *mt = reftable_stack_merged_table(stack);
 	struct reftable_log_record *logs = NULL;
 	struct reftable_log_record *rewritten = NULL;
 	struct reftable_ref_record ref_record = {0};
@@ -2135,7 +2127,9 @@ static int reftable_be_reflog_expire(struct ref_store *ref_store,
 	if (ret < 0)
 		goto done;
 
-	ret = reftable_merged_table_seek_log(mt, &it, refname);
+	reftable_stack_init_log_iterator(stack, &it);
+
+	ret = reftable_iterator_seek_log(&it, refname);
 	if (ret < 0)
 		goto done;
 
diff --git a/reftable/merged.c b/reftable/merged.c
index fc7946d90d..d127f99360 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -253,9 +253,9 @@ reftable_merged_table_min_update_index(struct reftable_merged_table *mt)
 	return mt->min;
 }
 
-static void merged_table_init_iter(struct reftable_merged_table *mt,
-				   struct reftable_iterator *it,
-				   uint8_t typ)
+void merged_table_init_iter(struct reftable_merged_table *mt,
+			    struct reftable_iterator *it,
+			    uint8_t typ)
 {
 	struct merged_iter *mi = reftable_malloc(sizeof(*mi));
 	merged_iter_init(mi, mt, typ);
diff --git a/reftable/merged.h b/reftable/merged.h
index a2571dbc99..a10469f58e 100644
--- a/reftable/merged.h
+++ b/reftable/merged.h
@@ -26,4 +26,10 @@ struct reftable_merged_table {
 
 void merged_table_release(struct reftable_merged_table *mt);
 
+struct reftable_iterator;
+
+void merged_table_init_iter(struct reftable_merged_table *mt,
+			    struct reftable_iterator *it,
+			    uint8_t typ);
+
 #endif
diff --git a/reftable/reftable-stack.h b/reftable/reftable-stack.h
index 1b602dda58..26740e6073 100644
--- a/reftable/reftable-stack.h
+++ b/reftable/reftable-stack.h
@@ -66,6 +66,24 @@ int reftable_stack_add(struct reftable_stack *st,
 					  void *write_arg),
 		       void *write_arg);
 
+struct reftable_iterator;
+
+/*
+ * Initialize an iterator for the merged tables contained in the stack that can
+ * be used to iterate through refs. The iterator is valid until the next reload
+ * or write.
+ */
+void reftable_stack_init_ref_iterator(struct reftable_stack *st,
+				      struct reftable_iterator *it);
+
+/*
+ * Initialize an iterator for the merged tables contained in the stack that can
+ * be used to iterate through logs. The iterator is valid until the next reload
+ * or write.
+ */
+void reftable_stack_init_log_iterator(struct reftable_stack *st,
+				      struct reftable_iterator *it);
+
 /* returns the merged_table for seeking. This table is valid until the
  * next write or reload, and should not be closed or deleted.
  */
diff --git a/reftable/stack.c b/reftable/stack.c
index a59ebe038d..03f95935e1 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -10,6 +10,7 @@ license that can be found in the LICENSE file or at
 
 #include "../write-or-die.h"
 #include "system.h"
+#include "constants.h"
 #include "merged.h"
 #include "reader.h"
 #include "reftable-error.h"
@@ -130,6 +131,20 @@ int read_lines(const char *filename, char ***namesp)
 	return err;
 }
 
+void reftable_stack_init_ref_iterator(struct reftable_stack *st,
+				      struct reftable_iterator *it)
+{
+	merged_table_init_iter(reftable_stack_merged_table(st),
+			       it, BLOCK_TYPE_REF);
+}
+
+void reftable_stack_init_log_iterator(struct reftable_stack *st,
+				      struct reftable_iterator *it)
+{
+	merged_table_init_iter(reftable_stack_merged_table(st),
+			       it, BLOCK_TYPE_LOG);
+}
+
 struct reftable_merged_table *
 reftable_stack_merged_table(struct reftable_stack *st)
 {
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 13/13] reftable/merged: adapt interface to allow reuse of iterators
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
                     ` (11 preceding siblings ...)
  2024-05-13  8:47   ` [PATCH v2 12/13] reftable/stack: provide convenience functions to create iterators Patrick Steinhardt
@ 2024-05-13  8:48   ` Patrick Steinhardt
  2024-05-15 20:15   ` [PATCH v2 00/13] reftable: prepare for re-seekable iterators Justin Tobler
  2024-05-21 15:31   ` Karthik Nayak
  14 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-13  8:48 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 6939 bytes --]

Refactor the interfaces exposed by `struct reftable_merged_table` and
`struct merged_iter` such that they support iterator reuse. This is done
by separating initialization of the iterator and seeking on it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c          | 35 -----------------------------------
 reftable/merged_test.c     | 19 +++++++++++++------
 reftable/reftable-merged.h | 15 ---------------
 reftable/stack.c           | 14 +++++++++-----
 4 files changed, 22 insertions(+), 61 deletions(-)

diff --git a/reftable/merged.c b/reftable/merged.c
index d127f99360..0da9dba265 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -262,41 +262,6 @@ void merged_table_init_iter(struct reftable_merged_table *mt,
 	iterator_from_merged_iter(it, mi);
 }
 
-int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
-				   struct reftable_iterator *it,
-				   const char *name)
-{
-	struct reftable_record rec = {
-		.type = BLOCK_TYPE_REF,
-		.u.ref = {
-			.refname = (char *)name,
-		},
-	};
-	merged_table_init_iter(mt, it, BLOCK_TYPE_REF);
-	return iterator_seek(it, &rec);
-}
-
-int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
-				      struct reftable_iterator *it,
-				      const char *name, uint64_t update_index)
-{
-	struct reftable_record rec = { .type = BLOCK_TYPE_LOG,
-				       .u.log = {
-					       .refname = (char *)name,
-					       .update_index = update_index,
-				       } };
-	merged_table_init_iter(mt, it, BLOCK_TYPE_LOG);
-	return iterator_seek(it, &rec);
-}
-
-int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
-				   struct reftable_iterator *it,
-				   const char *name)
-{
-	uint64_t max = ~((uint64_t)0);
-	return reftable_merged_table_seek_log_at(mt, it, name, max);
-}
-
 uint32_t reftable_merged_table_hash_id(struct reftable_merged_table *mt)
 {
 	return mt->hash_id;
diff --git a/reftable/merged_test.c b/reftable/merged_test.c
index 530fc82d1c..33a17efcde 100644
--- a/reftable/merged_test.c
+++ b/reftable/merged_test.c
@@ -12,6 +12,7 @@ license that can be found in the LICENSE file or at
 
 #include "basics.h"
 #include "blocksource.h"
+#include "constants.h"
 #include "reader.h"
 #include "record.h"
 #include "test_framework.h"
@@ -145,7 +146,10 @@ static void test_merged_between(void)
 	int i;
 	struct reftable_ref_record ref = { NULL };
 	struct reftable_iterator it = { NULL };
-	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	int err;
+
+	merged_table_init_iter(mt, &it, BLOCK_TYPE_REF);
+	err = reftable_iterator_seek_ref(&it, "a");
 	EXPECT_ERR(err);
 
 	err = reftable_iterator_next_ref(&it, &ref);
@@ -217,14 +221,15 @@ static void test_merged(void)
 	struct reftable_reader **readers = NULL;
 	struct reftable_merged_table *mt =
 		merged_table_from_records(refs, &bs, &readers, sizes, bufs, 3);
-
 	struct reftable_iterator it = { NULL };
-	int err = reftable_merged_table_seek_ref(mt, &it, "a");
+	int err;
 	struct reftable_ref_record *out = NULL;
 	size_t len = 0;
 	size_t cap = 0;
 	int i = 0;
 
+	merged_table_init_iter(mt, &it, BLOCK_TYPE_REF);
+	err = reftable_iterator_seek_ref(&it, "a");
 	EXPECT_ERR(err);
 	EXPECT(reftable_merged_table_hash_id(mt) == GIT_SHA1_FORMAT_ID);
 	EXPECT(reftable_merged_table_min_update_index(mt) == 1);
@@ -348,14 +353,15 @@ static void test_merged_logs(void)
 	struct reftable_reader **readers = NULL;
 	struct reftable_merged_table *mt = merged_table_from_log_records(
 		logs, &bs, &readers, sizes, bufs, 3);
-
 	struct reftable_iterator it = { NULL };
-	int err = reftable_merged_table_seek_log(mt, &it, "a");
+	int err;
 	struct reftable_log_record *out = NULL;
 	size_t len = 0;
 	size_t cap = 0;
 	int i = 0;
 
+	merged_table_init_iter(mt, &it, BLOCK_TYPE_LOG);
+	err = reftable_iterator_seek_log(&it, "a");
 	EXPECT_ERR(err);
 	EXPECT(reftable_merged_table_hash_id(mt) == GIT_SHA1_FORMAT_ID);
 	EXPECT(reftable_merged_table_min_update_index(mt) == 1);
@@ -377,7 +383,8 @@ static void test_merged_logs(void)
 						 GIT_SHA1_RAWSZ));
 	}
 
-	err = reftable_merged_table_seek_log_at(mt, &it, "a", 2);
+	merged_table_init_iter(mt, &it, BLOCK_TYPE_LOG);
+	err = reftable_iterator_seek_log_at(&it, "a", 2);
 	EXPECT_ERR(err);
 	reftable_log_record_release(&out[0]);
 	err = reftable_iterator_next_log(&it, &out[0]);
diff --git a/reftable/reftable-merged.h b/reftable/reftable-merged.h
index c91a2d83a2..14d5fc9f05 100644
--- a/reftable/reftable-merged.h
+++ b/reftable/reftable-merged.h
@@ -36,21 +36,6 @@ int reftable_new_merged_table(struct reftable_merged_table **dest,
 			      struct reftable_table *stack, size_t n,
 			      uint32_t hash_id);
 
-/* returns an iterator positioned just before 'name' */
-int reftable_merged_table_seek_ref(struct reftable_merged_table *mt,
-				   struct reftable_iterator *it,
-				   const char *name);
-
-/* returns an iterator for log entry, at given update_index */
-int reftable_merged_table_seek_log_at(struct reftable_merged_table *mt,
-				      struct reftable_iterator *it,
-				      const char *name, uint64_t update_index);
-
-/* like reftable_merged_table_seek_log_at but look for the newest entry. */
-int reftable_merged_table_seek_log(struct reftable_merged_table *mt,
-				   struct reftable_iterator *it,
-				   const char *name);
-
 /* returns the max update_index covered by this merged table. */
 uint64_t
 reftable_merged_table_max_update_index(struct reftable_merged_table *mt);
diff --git a/reftable/stack.c b/reftable/stack.c
index 03f95935e1..7af4c3fd66 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -925,7 +925,8 @@ static int stack_write_compact(struct reftable_stack *st,
 		goto done;
 	}
 
-	err = reftable_merged_table_seek_ref(mt, &it, "");
+	merged_table_init_iter(mt, &it, BLOCK_TYPE_REF);
+	err = reftable_iterator_seek_ref(&it, "");
 	if (err < 0)
 		goto done;
 
@@ -949,7 +950,8 @@ static int stack_write_compact(struct reftable_stack *st,
 	}
 	reftable_iterator_destroy(&it);
 
-	err = reftable_merged_table_seek_log(mt, &it, "");
+	merged_table_init_iter(mt, &it, BLOCK_TYPE_LOG);
+	err = reftable_iterator_seek_log(&it, "");
 	if (err < 0)
 		goto done;
 
@@ -1340,9 +1342,11 @@ int reftable_stack_read_ref(struct reftable_stack *st, const char *refname,
 int reftable_stack_read_log(struct reftable_stack *st, const char *refname,
 			    struct reftable_log_record *log)
 {
-	struct reftable_iterator it = { NULL };
-	struct reftable_merged_table *mt = reftable_stack_merged_table(st);
-	int err = reftable_merged_table_seek_log(mt, &it, refname);
+	struct reftable_iterator it = {0};
+	int err;
+
+	reftable_stack_init_log_iterator(st, &it);
+	err = reftable_iterator_seek_log(&it, refname);
 	if (err)
 		goto done;
 
-- 
2.45.GIT


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 00/13] reftable: prepare for re-seekable iterators
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
                     ` (12 preceding siblings ...)
  2024-05-13  8:48   ` [PATCH v2 13/13] reftable/merged: adapt interface to allow reuse of iterators Patrick Steinhardt
@ 2024-05-15 20:15   ` Justin Tobler
  2024-05-21 15:31   ` Karthik Nayak
  14 siblings, 0 replies; 50+ messages in thread
From: Justin Tobler @ 2024-05-15 20:15 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Junio C Hamano

On 24/05/13 10:46AM, Patrick Steinhardt wrote:
> Hi,
> 
> this is the second version of my patch series that prepares the reftable
> iterators to become re-seekable. These refactorings will eventually
> allow us to reuse data structures by iterators and optimize for certain
> cases.
> 
> Changes compared to v1:
> 
>   - Various fixes to commit messages.
> 
>   - Fixed a copy & pasted comment to refer to logs instead of refs.
> 
> The series continues to build on top of ps/reftable-write-optim. There
> is a merge conflict with the in-flight ps/reftable-write-options, but
> given that this series has not yet been merged to next and because Junio
> has already resolved the conflict in "seen" I decided to not pull it in
> as an additional dependency.

Thanks Patrick! From the range-diff, this version looks good to me :)

-Justin

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 01/13] reftable/block: use `size_t` to track restart point index
  2024-05-13  8:47   ` [PATCH v2 01/13] reftable/block: use `size_t` to track restart point index Patrick Steinhardt
@ 2024-05-21 13:34     ` Karthik Nayak
  2024-05-22  7:23       ` Patrick Steinhardt
  0 siblings, 1 reply; 50+ messages in thread
From: Karthik Nayak @ 2024-05-21 13:34 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1302 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> The function `block_reader_restart_offset()` gets the offset of the
> `i`th restart point. `i` is a signed integer though, which is certainly
> not the correct type to track indices like this. Furthermore, both
> callers end up passing a `size_t`.
>
> Refactor the code to use a `size_t` instead.

More of a question for my understanding: Why use `size_t` vs `uint16_t`
here? I'm asking since the restart count is defined as `uint16_t
restart_count` in `struct block_reader`.

>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  reftable/block.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/reftable/block.c b/reftable/block.c
> index 5942cb4053..00030eee06 100644
> --- a/reftable/block.c
> +++ b/reftable/block.c
> @@ -326,9 +326,9 @@ int block_reader_first_key(const struct block_reader *br, struct strbuf *key)
>  	return 0;
>  }
>
> -static uint32_t block_reader_restart_offset(const struct block_reader *br, int i)
> +static uint32_t block_reader_restart_offset(const struct block_reader *br, size_t idx)
>  {
> -	return get_be24(br->restart_bytes + 3 * i);
> +	return get_be24(br->restart_bytes + 3 * idx);
>  }
>
>  void block_iter_seek_start(struct block_iter *it, const struct block_reader *br)
> --
> 2.45.GIT

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 03/13] reftable/reader: unify indexed and linear seeking
  2024-05-13  8:47   ` [PATCH v2 03/13] reftable/reader: unify indexed and linear seeking Patrick Steinhardt
@ 2024-05-21 14:41     ` Karthik Nayak
  2024-05-22  7:23       ` Patrick Steinhardt
  0 siblings, 1 reply; 50+ messages in thread
From: Karthik Nayak @ 2024-05-21 14:41 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 224 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> In `reader_seek_internal()` we either end up doing an indexed seek when
> there is one or a linear seek otherwise. These two code paths are

Are we missing the subject here?

[snip]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 00/13] reftable: prepare for re-seekable iterators
  2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
                     ` (13 preceding siblings ...)
  2024-05-15 20:15   ` [PATCH v2 00/13] reftable: prepare for re-seekable iterators Justin Tobler
@ 2024-05-21 15:31   ` Karthik Nayak
  2024-05-22  7:23     ` Patrick Steinhardt
  14 siblings, 1 reply; 50+ messages in thread
From: Karthik Nayak @ 2024-05-21 15:31 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 978 bytes --]

Hello,

Patrick Steinhardt <ps@pks.im> writes:
> Hi,
>
> this is the second version of my patch series that prepares the reftable
> iterators to become re-seekable. These refactorings will eventually
> allow us to reuse data structures by iterators and optimize for certain
> cases.
>
> Changes compared to v1:
>
>   - Various fixes to commit messages.
>
>   - Fixed a copy & pasted comment to refer to logs instead of refs.
>
> The series continues to build on top of ps/reftable-write-optim. There
> is a merge conflict with the in-flight ps/reftable-write-options, but
> given that this series has not yet been merged to next and because Junio
> has already resolved the conflict in "seen" I decided to not pull it in
> as an additional dependency.
>
> Thanks!
>
> Patrick

I didn't review the earlier version. I did go through the patches in
this version, I only have a nit in the first patch, not worth a reroll.
Thanks for the series, I have nothing more to add.

Karthik

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 01/13] reftable/block: use `size_t` to track restart point index
  2024-05-21 13:34     ` Karthik Nayak
@ 2024-05-22  7:23       ` Patrick Steinhardt
  0 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-22  7:23 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 830 bytes --]

On Tue, May 21, 2024 at 01:34:48PM +0000, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > The function `block_reader_restart_offset()` gets the offset of the
> > `i`th restart point. `i` is a signed integer though, which is certainly
> > not the correct type to track indices like this. Furthermore, both
> > callers end up passing a `size_t`.
> >
> > Refactor the code to use a `size_t` instead.
> 
> More of a question for my understanding: Why use `size_t` vs `uint16_t`
> here? I'm asking since the restart count is defined as `uint16_t
> restart_count` in `struct block_reader`.

Mostly because all callers already use `size_t`, and it's customary to
use them when talking about indices. We could use `uint16_t`, too, but
it didn't really feel worth it to adjust all callers.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 03/13] reftable/reader: unify indexed and linear seeking
  2024-05-21 14:41     ` Karthik Nayak
@ 2024-05-22  7:23       ` Patrick Steinhardt
  2024-05-22  7:56         ` Karthik Nayak
  0 siblings, 1 reply; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-22  7:23 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 632 bytes --]

On Tue, May 21, 2024 at 02:41:27PM +0000, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > In `reader_seek_internal()` we either end up doing an indexed seek when
> > there is one or a linear seek otherwise. These two code paths are
> 
> Are we missing the subject here?

Hm. "one" refers back to "indexed" and is supposed to mean "index" here.
I agree that this is easy to misread though and may not even be proper
English.

It doesn't really feel worth it to rebase this series just for this
issue, but if I did it would be 's/one/an index/'. Let me know in case
you disagree though.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 00/13] reftable: prepare for re-seekable iterators
  2024-05-21 15:31   ` Karthik Nayak
@ 2024-05-22  7:23     ` Patrick Steinhardt
  0 siblings, 0 replies; 50+ messages in thread
From: Patrick Steinhardt @ 2024-05-22  7:23 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 1142 bytes --]

On Tue, May 21, 2024 at 03:31:15PM +0000, Karthik Nayak wrote:
> Hello,
> 
> Patrick Steinhardt <ps@pks.im> writes:
> > Hi,
> >
> > this is the second version of my patch series that prepares the reftable
> > iterators to become re-seekable. These refactorings will eventually
> > allow us to reuse data structures by iterators and optimize for certain
> > cases.
> >
> > Changes compared to v1:
> >
> >   - Various fixes to commit messages.
> >
> >   - Fixed a copy & pasted comment to refer to logs instead of refs.
> >
> > The series continues to build on top of ps/reftable-write-optim. There
> > is a merge conflict with the in-flight ps/reftable-write-options, but
> > given that this series has not yet been merged to next and because Junio
> > has already resolved the conflict in "seen" I decided to not pull it in
> > as an additional dependency.
> >
> > Thanks!
> >
> > Patrick
> 
> I didn't review the earlier version. I did go through the patches in
> this version, I only have a nit in the first patch, not worth a reroll.
> Thanks for the series, I have nothing more to add.

Thanks!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 03/13] reftable/reader: unify indexed and linear seeking
  2024-05-22  7:23       ` Patrick Steinhardt
@ 2024-05-22  7:56         ` Karthik Nayak
  0 siblings, 0 replies; 50+ messages in thread
From: Karthik Nayak @ 2024-05-22  7:56 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Junio C Hamano, Justin Tobler

[-- Attachment #1: Type: text/plain, Size: 772 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> On Tue, May 21, 2024 at 02:41:27PM +0000, Karthik Nayak wrote:
>> Patrick Steinhardt <ps@pks.im> writes:
>>
>> > In `reader_seek_internal()` we either end up doing an indexed seek when
>> > there is one or a linear seek otherwise. These two code paths are
>>
>> Are we missing the subject here?
>
> Hm. "one" refers back to "indexed" and is supposed to mean "index" here.
> I agree that this is easy to misread though and may not even be proper
> English.
>
> It doesn't really feel worth it to rebase this series just for this
> issue, but if I did it would be 's/one/an index/'. Let me know in case
> you disagree though.
>
> Patrick

Okay that makes a bunch of sense, I think it is okay though and not
worth a reroll.

Karthik

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2024-05-22  7:56 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-08 11:03 [PATCH 00/13] reftable: prepare for re-seekable iterators Patrick Steinhardt
2024-05-08 11:03 ` [PATCH 01/13] reftable/block: use `size_t` to track restart point index Patrick Steinhardt
2024-05-08 11:03 ` [PATCH 02/13] reftable/reader: avoid copying index iterator Patrick Steinhardt
2024-05-08 11:03 ` [PATCH 03/13] reftable/reader: unify indexed and linear seeking Patrick Steinhardt
2024-05-08 11:03 ` [PATCH 04/13] reftable/reader: separate concerns of table iter and reftable reader Patrick Steinhardt
2024-05-08 11:03 ` [PATCH 05/13] reftable/reader: inline `reader_seek_internal()` Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 06/13] reftable/reader: set up the reader when initializing table iterator Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 07/13] reftable/merged: split up initialization and seeking of records Patrick Steinhardt
2024-05-10 19:18   ` Justin Tobler
2024-05-13  8:36     ` Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 08/13] reftable/merged: simplify indices for subiterators Patrick Steinhardt
2024-05-10 19:25   ` Justin Tobler
2024-05-13  8:36     ` Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 09/13] reftable/generic: move seeking of records into the iterator Patrick Steinhardt
2024-05-10 21:44   ` Justin Tobler
2024-05-13  8:36     ` Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 10/13] reftable/generic: adapt interface to allow reuse of iterators Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 11/13] reftable/reader: " Patrick Steinhardt
2024-05-10 21:48   ` Justin Tobler
2024-05-13  8:36     ` Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 12/13] reftable/stack: provide convenience functions to create iterators Patrick Steinhardt
2024-05-08 11:04 ` [PATCH 13/13] reftable/merged: adapt interface to allow reuse of iterators Patrick Steinhardt
2024-05-08 23:42 ` [PATCH 00/13] reftable: prepare for re-seekable iterators Junio C Hamano
2024-05-09  0:16   ` Junio C Hamano
2024-05-10  7:48   ` Patrick Steinhardt
2024-05-10 15:40     ` Junio C Hamano
2024-05-10 16:13       ` Patrick Steinhardt
2024-05-10 17:17         ` Junio C Hamano
2024-05-13  8:46 ` [PATCH v2 " Patrick Steinhardt
2024-05-13  8:47   ` [PATCH v2 01/13] reftable/block: use `size_t` to track restart point index Patrick Steinhardt
2024-05-21 13:34     ` Karthik Nayak
2024-05-22  7:23       ` Patrick Steinhardt
2024-05-13  8:47   ` [PATCH v2 02/13] reftable/reader: avoid copying index iterator Patrick Steinhardt
2024-05-13  8:47   ` [PATCH v2 03/13] reftable/reader: unify indexed and linear seeking Patrick Steinhardt
2024-05-21 14:41     ` Karthik Nayak
2024-05-22  7:23       ` Patrick Steinhardt
2024-05-22  7:56         ` Karthik Nayak
2024-05-13  8:47   ` [PATCH v2 04/13] reftable/reader: separate concerns of table iter and reftable reader Patrick Steinhardt
2024-05-13  8:47   ` [PATCH v2 05/13] reftable/reader: inline `reader_seek_internal()` Patrick Steinhardt
2024-05-13  8:47   ` [PATCH v2 06/13] reftable/reader: set up the reader when initializing table iterator Patrick Steinhardt
2024-05-13  8:47   ` [PATCH v2 07/13] reftable/merged: split up initialization and seeking of records Patrick Steinhardt
2024-05-13  8:47   ` [PATCH v2 08/13] reftable/merged: simplify indices for subiterators Patrick Steinhardt
2024-05-13  8:47   ` [PATCH v2 09/13] reftable/generic: move seeking of records into the iterator Patrick Steinhardt
2024-05-13  8:47   ` [PATCH v2 10/13] reftable/generic: adapt interface to allow reuse of iterators Patrick Steinhardt
2024-05-13  8:47   ` [PATCH v2 11/13] reftable/reader: " Patrick Steinhardt
2024-05-13  8:47   ` [PATCH v2 12/13] reftable/stack: provide convenience functions to create iterators Patrick Steinhardt
2024-05-13  8:48   ` [PATCH v2 13/13] reftable/merged: adapt interface to allow reuse of iterators Patrick Steinhardt
2024-05-15 20:15   ` [PATCH v2 00/13] reftable: prepare for re-seekable iterators Justin Tobler
2024-05-21 15:31   ` Karthik Nayak
2024-05-22  7:23     ` Patrick Steinhardt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).