git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH v2] builtin/pack-objects.c: introduce `pack.extraCruftTips`
Date: Fri, 5 May 2023 18:13:57 -0400	[thread overview]
Message-ID: <20230505221357.GD3321533@coredump.intra.peff.net> (raw)
In-Reply-To: <20230505212631.GB3321533@coredump.intra.peff.net>

On Fri, May 05, 2023 at 05:26:31PM -0400, Jeff King wrote:

> But since the cruft-pack implementation diverges quite a bit from the
> regular "-A" handling, I guess it makes the code more complex rather
> than less. The asymmetry feels like a wart to me, but I guess in the
> long run we'd hope that the "turn unreachable loose" strategy is
> deprecated anyway.

One thing that could make this a lot simpler is if the code was added to
"are we recent" code paths in the first place.

Something like this:

diff --git a/reachable.c b/reachable.c
index 55bb114353..86bb5e021e 100644
--- a/reachable.c
+++ b/reachable.c
@@ -16,6 +16,9 @@
 #include "object-store.h"
 #include "pack-bitmap.h"
 #include "pack-mtimes.h"
+#include "oidset.h"
+#include "run-command.h"
+#include "config.h"
 
 struct connectivity_progress {
 	struct progress *progress;
@@ -67,8 +70,71 @@ struct recent_data {
 	timestamp_t timestamp;
 	report_recent_object_fn *cb;
 	int ignore_in_core_kept_packs;
+	struct oidset fake_recent_oids;
+	int fake_recent_oids_loaded;
 };
 
+static int run_one_recent_cmd(struct oidset *set, const char *args)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	FILE *out;
+	int ret = 0;
+
+	cmd.use_shell = 1;
+	cmd.out = -1;
+
+	strvec_push(&cmd.args, args);
+
+	if (start_command(&cmd))
+		return -1;
+
+	out = xfdopen(cmd.out, "r");
+	while (strbuf_getline(&buf, out) != EOF) {
+		struct object_id oid;
+		const char *rest;
+
+		if (parse_oid_hex(buf.buf, &oid, &rest) || *rest)
+			die(_("invalid extra cruft tip: '%s'"), buf.buf);
+
+		oidset_insert(set, &oid);
+	}
+
+	if (finish_command(&cmd))
+		die(_("cruft tip hook returned error"));
+
+	fclose(out);
+	strbuf_release(&buf);
+
+	return ret;
+}
+
+static int obj_is_recent(const struct object_id *oid, timestamp_t mtime,
+			 struct recent_data *data)
+{
+	if (mtime > data->timestamp)
+		return 1;
+
+	if (!data->fake_recent_oids_loaded) {
+		const struct string_list *programs;
+
+		if (!git_config_get_string_multi("pack.extracrufttips", &programs)) {
+			size_t i;
+			for (i = 0; i < programs->nr; i++) {
+				if (run_one_recent_cmd(&data->fake_recent_oids,
+						       programs->items[i].string) < 0)
+					die(_("unable to enumerate additional cruft tips"));
+			}
+		}
+		data->fake_recent_oids_loaded = 1;
+	}
+
+	if (oidset_contains(&data->fake_recent_oids, oid))
+		return 1;
+
+	return 0;
+}
+
 static void add_recent_object(const struct object_id *oid,
 			      struct packed_git *pack,
 			      off_t offset,
@@ -78,7 +144,7 @@ static void add_recent_object(const struct object_id *oid,
 	struct object *obj;
 	enum object_type type;
 
-	if (mtime <= data->timestamp)
+	if (!obj_is_recent(oid, mtime, data))
 		return;
 
 	/*
@@ -193,6 +259,10 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs,
 	data.cb = cb;
 	data.ignore_in_core_kept_packs = ignore_in_core_kept_packs;
 
+	/* XXX must free before exiting function */
+	oidset_init(&data.fake_recent_oids, 0);
+	data.fake_recent_oids_loaded = 0;
+
 	r = for_each_loose_object(add_recent_loose, &data,
 				  FOR_EACH_OBJECT_LOCAL_ONLY);
 	if (r)

That would affect both "repack -A" and "repack --cruft". It would also
affect "git prune", but that seems like a good thing to me.

It would not affect a straight "repack -ad", since there we do not ask
about recency at all. I'm not sure if that use case is important or not.
If it is, we could easily trigger the "is it recent" code in that case
only if a hook is configured (we'd just set the cutoff to "0" so that
everything looks "too old").

Note that this inverts the loop logic from what you have. In your patch,
you ask the hook to enumerate all faux-recent objects, and then you
treat them separately from objects we found on disk. Here we enumerate
the objects on disk as usual, and just adjust our idea of "recent" based
on the hook. I think this is a bit simpler because objects we don't even
bother to ask it about objects we've already handled, are already marked
as recent, or are not present in the repository.

-Peff

  reply	other threads:[~2023-05-05 22:14 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-20 17:27 [PATCH] builtin/pack-objects.c: introduce `pack.extraCruftTips` Taylor Blau
2023-04-20 18:12 ` Junio C Hamano
2023-04-20 19:30   ` Taylor Blau
2023-04-20 19:52     ` Junio C Hamano
2023-04-20 20:48       ` Taylor Blau
2023-04-21  0:10 ` Chris Torek
2023-04-21  2:14   ` Taylor Blau
2023-04-25 19:42 ` Derrick Stolee
2023-04-25 21:25   ` Taylor Blau
2023-04-26 10:52     ` Derrick Stolee
2023-05-03  0:06       ` Taylor Blau
2023-05-03  0:09 ` [PATCH v2] " Taylor Blau
2023-05-03 14:01   ` Derrick Stolee
2023-05-03 19:59   ` Jeff King
2023-05-03 21:22     ` Taylor Blau
2023-05-05 21:23       ` Jeff King
2023-05-06  0:06         ` Taylor Blau
2023-05-06  0:14           ` Taylor Blau
2023-05-03 21:28     ` Taylor Blau
2023-05-05 21:26       ` Jeff King
2023-05-05 22:13         ` Jeff King [this message]
2023-05-06  0:13           ` Taylor Blau
2023-05-06  0:20             ` Taylor Blau
2023-05-06  2:12             ` Jeff King
2023-05-03 22:05 ` [PATCH v3] " Taylor Blau
2023-05-03 23:18   ` Junio C Hamano
2023-05-03 23:42     ` Junio C Hamano
2023-05-03 23:48       ` Taylor Blau
2023-05-03 23:50       ` Taylor Blau
2023-05-05 21:39     ` Jeff King
2023-05-05 22:19   ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230505221357.GD3321533@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).