git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH v2 2/3] object-file: emit corruption errors when detected
Date: Wed, 7 Dec 2022 01:42:29 -0500	[thread overview]
Message-ID: <Y5A11dOFgHP/ADcS@coredump.intra.peff.net> (raw)
In-Reply-To: <9ddfff3585c293c9801570e395b514505796a43f.1670373420.git.jonathantanmy@google.com>

On Tue, Dec 06, 2022 at 04:40:52PM -0800, Jonathan Tan wrote:

> Note that in the RHS of this patch's diff, a check for ENOENT that was
> introduced in 3ba7a06552 (A loose object is not corrupt if it cannot
> be read due to EMFILE, 2010-10-28) is also removed. The purpose of this
> check is to avoid a false report of corruption if the errno contains
> something like EMFILE (or anything that is not ENOENT), in which case
> a more generic report is presented. Because, as of this patch, we no
> longer rely on such a heuristic to determine corruption, but surface
> the error message at the point when we read something that we did not
> expect, this check is no longer necessary.

You're right that this will not say "oops, the object is corrupted" when
we get EMFILE, etc, which is what 3ba7a06552 was fixing. But I think
after your patch, we would also never actually say "could not open
$path: too many open descriptors". I don't think that kicked in reliably
with the current code, but it seems like something we are losing in this
patch.

I.e., I think you'd want to also complain when map_loose_object()
returns anything but ENOENT. The errno value there is reliable-ish,
though it might be worth spending the extra work to preserve it across
the close() calls, just in case. Or if you split the open/mmap as Ævar
suggested, then it becomes:

  fd = open_loose_object(r, oid, &path);
  if (fd < 0) {
	if (errno != ENOENT)
		error_errno("unable to open loose object %s", path);
	return -1;
  }

  buf = map_loose_object_1(fd, path);
  if (!buf) {
	/* not much point in complaining here, as xmmap will die on
	 * error. In theory this could catch fstat() problems, but
	 * probably map_loose_object_1() itself should do so, since
	 * it already is special-casing empty files.
	 */
        close(fd);
	return -1;
  }

> diff --git a/object-file.c b/object-file.c
> index 596dd049fd..c7a513d123 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1215,7 +1215,8 @@ static int quick_has_loose(struct repository *r,
>   * searching for a loose object named "oid".
>   */
>  static void *map_loose_object_1(struct repository *r, const char *path,
> -			     const struct object_id *oid, unsigned long *size)
> +				const struct object_id *oid, unsigned long *size,
> +				char **mapped_path)
>  {
>  	void *map;
>  	int fd;
> @@ -1224,6 +1225,9 @@ static void *map_loose_object_1(struct repository *r, const char *path,
>  		fd = git_open(path);
>  	else
>  		fd = open_loose_object(r, oid, &path);
> +	if (mapped_path)
> +		*mapped_path = xstrdup(path);
> +

This introduces an extra malloc/free in every object lookup, even in the
success case where we don't even bother using the value. It's probably
not really noticeable, but it kind of feels wrong. Especially when
open_loose_object() already returns a pointer to long-ish term storage.

One solution is...

> @@ -1497,8 +1504,13 @@ static int loose_object_info(struct repository *r,
>  		break;
>  	}
>  
> +	if (status && (flags & OBJECT_INFO_DIE_IF_CORRUPT))
> +		die(_("loose object %s (stored in %s) is corrupt"),
> +		    oid_to_hex(oid), mapped_path);
> +

...we could just say "loose object %s is corrupt", which is generally
sufficient (outside of alternates, you can only have one copy anyway).

But if we want to retain it, we could just redo the lookup in the error
path, which is what the existing code (that you're deleting) does, via
stat_loose_object(). That's even more wasteful than an extra
malloc/free, but you only pay the cost for a corrupted object. It's
racy, of course, but probably not in a meaningful way in practice.

Alternatively, I like Ævar's suggestion to just split
map_loose_object(). Then this code would naturally have the path, via
calling open_loose_object() itself.

> diff --git a/object-store.h b/object-store.h
> index 1be57abaf1..01134ab5ec 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -447,6 +447,9 @@ struct object_info {
>   */
>  #define OBJECT_INFO_FOR_PREFETCH (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK)
>  
> +/* Die if object corruption (not just an object being missing) was detected. */
> +#define OBJECT_INFO_DIE_IF_CORRUPT 64

I have a suspicion that the world would be a better place if these die()
calls simply went away, in favor of returning -1 up the stack. But I'm
OK leaving it as-is for the sake of trying not to do too many things at
once (I probably wouldn't have even mentioned it, except that if we do
want to end up there in the long run, we'd eventually have to rip out
this new flag and its associated plumbing).

-Peff

  parent reply	other threads:[~2022-12-07  6:42 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-30 20:30 [PATCH 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-11-30 20:30 ` [PATCH 1/4] object-file: reread object with exact same args Jonathan Tan
2022-11-30 20:30 ` [PATCH 2/4] object-file: refactor corrupt object diagnosis Jonathan Tan
2022-11-30 20:47   ` Jeff King
2022-11-30 23:42     ` Junio C Hamano
2022-12-01 19:06       ` Jonathan Tan
2022-11-30 20:30 ` [PATCH 3/4] object-file: refactor replace object lookup Jonathan Tan
2022-11-30 20:54   ` Jeff King
2022-11-30 20:30 ` [PATCH 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-11-30 21:04   ` Jeff King
2022-12-01 19:11     ` Jonathan Tan
2022-12-01 19:33       ` Jeff King
2022-11-30 23:56   ` Junio C Hamano
2022-11-30 21:06 ` [PATCH 0/4] Don't lazy-fetch commits when parsing them Jeff King
2022-12-01 19:27 ` [PATCH v2 " Jonathan Tan
2022-12-01 19:27   ` [PATCH v2 1/4] object-file: reread object with exact same args Jonathan Tan
2022-12-01 19:27   ` [PATCH v2 2/4] object-file: refactor corrupt object diagnosis Jonathan Tan
2022-12-01 19:27   ` [PATCH v2 3/4] object-file: refactor replace object lookup Jonathan Tan
2022-12-01 19:27   ` [PATCH v2 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-01 19:54   ` [PATCH v2 0/4] Don't lazy-fetch commits when parsing them Jeff King
2022-12-01 21:26     ` Jonathan Tan
2022-12-02  0:23       ` Jeff King
2022-12-06  0:49         ` Jonathan Tan
2022-12-06  2:03           ` Jeff King
2022-12-01 23:09     ` Junio C Hamano
2022-12-07  0:40 ` [PATCH v2 0/3] " Jonathan Tan
2022-12-07  0:40   ` [PATCH v2 1/3] object-file: don't exit early if skipping loose Jonathan Tan
2022-12-07  1:12     ` Junio C Hamano
2022-12-07  6:14       ` Jeff King
2022-12-07  6:43         ` Junio C Hamano
2022-12-07 23:20           ` Jonathan Tan
2022-12-07  0:40   ` [PATCH v2 2/3] object-file: emit corruption errors when detected Jonathan Tan
2022-12-07  1:16     ` Junio C Hamano
2022-12-07  4:05     ` Ævar Arnfjörð Bjarmason
2022-12-07  7:07       ` Jeff King
2022-12-07 10:33         ` Ævar Arnfjörð Bjarmason
2022-12-07 23:26           ` Jonathan Tan
2022-12-07 23:50             ` Ævar Arnfjörð Bjarmason
2022-12-08  6:33               ` Jeff King
2022-12-07  6:42     ` Jeff King [this message]
2022-12-07  0:40   ` [PATCH v2 3/3] commit: don't lazy-fetch commits Jonathan Tan
2022-12-07  1:17     ` Junio C Hamano
2022-12-07  6:47     ` Jeff King
2022-12-08 20:57 ` [PATCH v3 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-08 20:57   ` [PATCH v3 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-08 20:57   ` [PATCH v3 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-09  2:00     ` Jeff King
2022-12-09 18:17       ` Jonathan Tan
2022-12-09 20:27         ` Jeff King
2022-12-09 20:27           ` Jeff King
2022-12-08 20:57   ` [PATCH v3 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-09  1:56     ` Jeff King
2022-12-09 18:26       ` Jonathan Tan
2022-12-09 14:19     ` Ævar Arnfjörð Bjarmason
2022-12-09 18:33       ` Jonathan Tan
2022-12-08 20:57   ` [PATCH v3 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-09 14:14     ` Ævar Arnfjörð Bjarmason
2022-12-09 21:44 ` [PATCH v4 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-09 21:44   ` [PATCH v4 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-09 21:44   ` [PATCH v4 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-09 21:44   ` [PATCH v4 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-10  0:16     ` Junio C Hamano
2022-12-12 20:38       ` Jonathan Tan
2022-12-12 20:49       ` Jeff King
2022-12-12 20:59         ` Jonathan Tan
2022-12-12 21:20           ` Jeff King
2022-12-12 21:29             ` Jonathan Tan
2022-12-12 22:17               ` Jeff King
2022-12-12 22:52             ` Jonathan Tan
2022-12-13 10:37               ` Jeff King
2022-12-09 21:44   ` [PATCH v4 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-12 22:48 ` [PATCH v5 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-12 22:48   ` [PATCH v5 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-12 22:48   ` [PATCH v5 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-12 22:48   ` [PATCH v5 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-13  1:51     ` Junio C Hamano
2022-12-13 10:38       ` Jeff King
2022-12-12 22:48   ` [PATCH v5 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-14 19:17 ` [PATCH v6 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-14 19:17   ` [PATCH v6 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-14 19:17   ` [PATCH v6 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-14 19:17   ` [PATCH v6 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-14 19:17   ` [PATCH v6 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-14 20:43   ` [PATCH v6 0/4] Don't lazy-fetch commits when parsing them Jeff King
2022-12-15  0:07     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y5A11dOFgHP/ADcS@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).