git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, stolee@gmail.com,
	git@jeffhostetler.com, Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 1/3] csum-file: add nested_hashfile()
Date: Fri, 26 Mar 2021 19:12:05 +0000	[thread overview]
Message-ID: <0eca529766fcbe70147bc28ce7f0692e645fd118.1616785928.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.916.git.1616785928.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The index writing code in do_write_index() uses a custom set of hashing
code, in part because it was introduced before the hashfile API. But
also, the End of Index Entries extension computes a hash of just the
extension data, not the entire file preceding that extension.

Before converting the index writing code to use the hashfile API, create
a concept of a "nested hashfile". By adding a 'base' member to 'struct
hashfile', we indicate that any writes to this hashfile should be passed
along to the base hashfile, too.

In the next change, the index code will use this to create a new
hashfile wose base is the hashfile for the index. The outer hashfile
will compute the hash just for the extension details. Thus, it will
finalize earlier than the base hashfile, hence there is no modification
to finalize_hashfile() here.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 csum-file.c | 22 ++++++++++++++++++++++
 csum-file.h |  9 +++++++++
 2 files changed, 31 insertions(+)

diff --git a/csum-file.c b/csum-file.c
index 0f35fa5ee47c..e73b35316e66 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -13,6 +13,9 @@
 
 static void flush(struct hashfile *f, const void *buf, unsigned int count)
 {
+	if (f->base)
+		return;
+
 	if (0 <= f->check_fd && count)  {
 		unsigned char check_buffer[8192];
 		ssize_t ret = read_in_full(f->check_fd, check_buffer, count);
@@ -116,6 +119,9 @@ void hashwrite(struct hashfile *f, const void *buf, unsigned int count)
 		}
 		f->offset = offset;
 	}
+
+	if (f->base)
+		hashwrite(f->base, buf, count);
 }
 
 struct hashfile *hashfd(int fd, const char *name)
@@ -150,6 +156,7 @@ struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp
 	f->name = name;
 	f->do_crc = 0;
 	the_hash_algo->init_fn(&f->ctx);
+	f->base = NULL;
 	return f;
 }
 
@@ -184,3 +191,18 @@ uint32_t crc32_end(struct hashfile *f)
 	f->do_crc = 0;
 	return f->crc32;
 }
+
+struct hashfile *nested_hashfile(struct hashfile *f)
+{
+	struct hashfile *n = xmalloc(sizeof(*f));
+	n->fd = -1;
+	n->check_fd = -1;
+	n->offset = 0;
+	n->total = 0;
+	n->tp = NULL;
+	n->name = NULL;
+	n->do_crc = 0;
+	the_hash_algo->init_fn(&n->ctx);
+	n->base = f;
+	return n;
+}
diff --git a/csum-file.h b/csum-file.h
index e54d53d1d0b3..b8785e7ecb46 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -16,6 +16,7 @@ struct hashfile {
 	const char *name;
 	int do_crc;
 	uint32_t crc32;
+	struct hashfile *base;
 	unsigned char buffer[8192];
 };
 
@@ -42,6 +43,14 @@ void hashflush(struct hashfile *f);
 void crc32_begin(struct hashfile *);
 uint32_t crc32_end(struct hashfile *);
 
+/*
+ * A nested hashfile uses the same interface as a hashfile, and computes
+ * a hash for the input bytes while passing them to the base hashfile
+ * instead of writing them to its own file. This is useful for computing
+ * a hash of a region within a file during the write.
+ */
+struct hashfile *nested_hashfile(struct hashfile *f);
+
 /*
  * Returns the total number of bytes fed to the hashfile so far (including ones
  * that have not been written out to the descriptor yet).
-- 
gitgitgadget


  reply	other threads:[~2021-03-26 19:12 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-26 19:12 [PATCH 0/3] Convert index writes to use hashfile API Derrick Stolee via GitGitGadget
2021-03-26 19:12 ` Derrick Stolee via GitGitGadget [this message]
2021-03-26 19:12 ` [PATCH 2/3] read-cache: use hashfile instead of git_hash_ctx Derrick Stolee via GitGitGadget
2021-03-29 15:04   ` Derrick Stolee
2021-03-29 19:10     ` Derrick Stolee
2021-03-26 19:12 ` [PATCH 3/3] read-cache: delete unused hashing methods Derrick Stolee via GitGitGadget
2021-03-26 20:16 ` [PATCH 0/3] Convert index writes to use hashfile API Derrick Stolee
2021-05-17 12:24 ` [PATCH v2 0/4] " Derrick Stolee via GitGitGadget
2021-05-17 12:24   ` [PATCH v2 1/4] hashfile: use write_in_full() Derrick Stolee via GitGitGadget
2021-05-17 12:24   ` [PATCH v2 2/4] csum-file.h: increase hashfile buffer size Derrick Stolee via GitGitGadget
2021-05-17 21:54     ` Junio C Hamano
2021-05-18  7:33       ` Jeff King
2021-05-18 14:44         ` Derrick Stolee
2021-05-18  7:31     ` Jeff King
2021-05-18  7:42       ` Jeff King
2021-05-17 12:24   ` [PATCH v2 3/4] read-cache: use hashfile instead of git_hash_ctx Derrick Stolee via GitGitGadget
2021-05-17 22:13     ` Junio C Hamano
2021-05-18 14:16       ` Derrick Stolee
2021-05-17 12:24   ` [PATCH v2 4/4] read-cache: delete unused hashing methods Derrick Stolee via GitGitGadget
2021-05-18 18:32   ` [PATCH v3 0/4] Convert index writes to use hashfile API Derrick Stolee via GitGitGadget
2021-05-18 18:32     ` [PATCH v3 1/4] hashfile: use write_in_full() Derrick Stolee via GitGitGadget
2021-05-18 18:32     ` [PATCH v3 2/4] csum-file.h: increase hashfile buffer size Derrick Stolee via GitGitGadget
2021-11-25 12:14       ` t4216-log-bloom.sh fails with -v (but not --verbose-log) Ævar Arnfjörð Bjarmason
2021-11-26  4:08         ` Jeff King
2021-11-29 13:49           ` Derrick Stolee
2021-05-18 18:32     ` [PATCH v3 3/4] read-cache: use hashfile instead of git_hash_ctx Derrick Stolee via GitGitGadget
2021-05-18 18:32     ` [PATCH v3 4/4] read-cache: delete unused hashing methods Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0eca529766fcbe70147bc28ce7f0692e645fd118.1616785928.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    --subject='Re: [PATCH 1/3] csum-file: add nested_hashfile()' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).