git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Tan <jonathantanmy@google.com>
To: Jeff King <peff@peff.net>
Cc: Jonathan Tan <jonathantanmy@google.com>,
	git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH v2 0/4] Don't lazy-fetch commits when parsing them
Date: Mon,  5 Dec 2022 16:49:34 -0800	[thread overview]
Message-ID: <20221206004935.1794596-1-jonathantanmy@google.com> (raw)
In-Reply-To: <Y4lFbemK4HHiCsyJ@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:
> Yeah, I don't see any difference in the new caller versus what the
> original was doing. The errno we care about comes from inside
> oid_object_info_extended(). So in any case, we'll see at least
> obj_read_unlock() followed by obj_read_lock() between the syscalls of
> interest and this check. And I don't even really see any indication that
> oid_object_info_extended() tries to set or preserve errno itself. The
> likely sequence is:
> 
>   - find_pack_entry() fails to find it; errno isn't set at all
>   - loose_object_info() tries to open it and probably gets ENOENT
>   - we check find_pack_entry() again after reprepare_packed_git()
>   - that fails so we return -1, barring submodule or partial clone
>     tricks
> 
> So it really seems like we're quite likely to get an errno from opening
> or mapping packs. Which implies the original suffers from the same
> issue, but we simply never triggered it meaningfully in a test.

Thanks for checking. I'm still not sure how the current code passes CI, but my
patches don't. 

> I'm not entirely sure on just removing the check. It comes from
> 3ba7a06552 (A loose object is not corrupt if it cannot be read due to
> EMFILE, 2010-10-28), so we'd lose what that commit is trying to do.
> Though I think even back then, I think it would have suffered from the
> same problems (minus the lock/unlock; I'm still unclear which syscall is
> the actual culprit here).

Ah, thanks for the pointer to that commit. Without that, my patch would report
corruption even if the real issue was EMFILE, as the commit message of that
commit describes.

> If we assume that errno from reading the object isn't reliable, I think
> you'd have to actually re-check things. Something like:
> 
>   if (find_pack_entry(...) || !stat_loose_object(...))
>     /* ok, it's not missing */
> 
> but of course we don't have the actual errno that _did_ cause us to
> fail, which makes the error message we'd print a lot less useful. Maybe
> this check should be ditched and we should complain much closer to the
> source of the problem:
> 
> diff --git a/object-file.c b/object-file.c
> index 26290554bb..743ba8210e 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -1460,8 +1460,12 @@ static int loose_object_info(struct repository *r,
>  	}
>  
>  	map = map_loose_object(r, oid, &mapsize);
> -	if (!map)
> +	if (!map) {
> +		if (errno != ENOENT)
> +			error_errno("unable to open loose object %s",
> +				    oid_to_hex(oid));
>  		return -1;
> +	}
>  
>  	if (!oi->sizep)
>  		oi->sizep = &size_scratch;
> 
> That might make things more verbose for other code paths, but that kind
> of seems like a good thing. If you have an object file that we can't
> open, we probably _should_ be complaining loudly about it.
> 
> We may need to be a little more careful about preserving errno in
> map_loose_object_1(), though (gee, another place where the existing
> check could run into trouble).

Besides needing to be careful in map_loose_object_1(), I'm not sure if this
fully solves the problem. This is non-fatal, so the EMFILE commit's work would
still remain undone. If this were made fatal, I think this would change the
behavior of too much code, especially those that can tolerate loose objects
being missing.
 
What do you think of not putting any die_if_corrupt() calls in the commit
parsing code at all? The error message printed would then be different (just a
generic message about being unable to parse a commit, versus the specific one
here) but it does pass CI [1]. Also, I don't think that we should be doing errno
diagnostics separate from what causes the errno anyway.

[1] https://github.com/jonathantanmy/git/actions/runs/3624495729

  reply	other threads:[~2022-12-06  0:49 UTC|newest]

Thread overview: 85+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-30 20:30 [PATCH 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-11-30 20:30 ` [PATCH 1/4] object-file: reread object with exact same args Jonathan Tan
2022-11-30 20:30 ` [PATCH 2/4] object-file: refactor corrupt object diagnosis Jonathan Tan
2022-11-30 20:47   ` Jeff King
2022-11-30 23:42     ` Junio C Hamano
2022-12-01 19:06       ` Jonathan Tan
2022-11-30 20:30 ` [PATCH 3/4] object-file: refactor replace object lookup Jonathan Tan
2022-11-30 20:54   ` Jeff King
2022-11-30 20:30 ` [PATCH 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-11-30 21:04   ` Jeff King
2022-12-01 19:11     ` Jonathan Tan
2022-12-01 19:33       ` Jeff King
2022-11-30 23:56   ` Junio C Hamano
2022-11-30 21:06 ` [PATCH 0/4] Don't lazy-fetch commits when parsing them Jeff King
2022-12-01 19:27 ` [PATCH v2 " Jonathan Tan
2022-12-01 19:27   ` [PATCH v2 1/4] object-file: reread object with exact same args Jonathan Tan
2022-12-01 19:27   ` [PATCH v2 2/4] object-file: refactor corrupt object diagnosis Jonathan Tan
2022-12-01 19:27   ` [PATCH v2 3/4] object-file: refactor replace object lookup Jonathan Tan
2022-12-01 19:27   ` [PATCH v2 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-01 19:54   ` [PATCH v2 0/4] Don't lazy-fetch commits when parsing them Jeff King
2022-12-01 21:26     ` Jonathan Tan
2022-12-02  0:23       ` Jeff King
2022-12-06  0:49         ` Jonathan Tan [this message]
2022-12-06  2:03           ` Jeff King
2022-12-01 23:09     ` Junio C Hamano
2022-12-07  0:40 ` [PATCH v2 0/3] " Jonathan Tan
2022-12-07  0:40   ` [PATCH v2 1/3] object-file: don't exit early if skipping loose Jonathan Tan
2022-12-07  1:12     ` Junio C Hamano
2022-12-07  6:14       ` Jeff King
2022-12-07  6:43         ` Junio C Hamano
2022-12-07 23:20           ` Jonathan Tan
2022-12-07  0:40   ` [PATCH v2 2/3] object-file: emit corruption errors when detected Jonathan Tan
2022-12-07  1:16     ` Junio C Hamano
2022-12-07  4:05     ` Ævar Arnfjörð Bjarmason
2022-12-07  7:07       ` Jeff King
2022-12-07 10:33         ` Ævar Arnfjörð Bjarmason
2022-12-07 23:26           ` Jonathan Tan
2022-12-07 23:50             ` Ævar Arnfjörð Bjarmason
2022-12-08  6:33               ` Jeff King
2022-12-07  6:42     ` Jeff King
2022-12-07  0:40   ` [PATCH v2 3/3] commit: don't lazy-fetch commits Jonathan Tan
2022-12-07  1:17     ` Junio C Hamano
2022-12-07  6:47     ` Jeff King
2022-12-08 20:57 ` [PATCH v3 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-08 20:57   ` [PATCH v3 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-08 20:57   ` [PATCH v3 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-09  2:00     ` Jeff King
2022-12-09 18:17       ` Jonathan Tan
2022-12-09 20:27         ` Jeff King
2022-12-09 20:27           ` Jeff King
2022-12-08 20:57   ` [PATCH v3 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-09  1:56     ` Jeff King
2022-12-09 18:26       ` Jonathan Tan
2022-12-09 14:19     ` Ævar Arnfjörð Bjarmason
2022-12-09 18:33       ` Jonathan Tan
2022-12-08 20:57   ` [PATCH v3 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-09 14:14     ` Ævar Arnfjörð Bjarmason
2022-12-09 21:44 ` [PATCH v4 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-09 21:44   ` [PATCH v4 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-09 21:44   ` [PATCH v4 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-09 21:44   ` [PATCH v4 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-10  0:16     ` Junio C Hamano
2022-12-12 20:38       ` Jonathan Tan
2022-12-12 20:49       ` Jeff King
2022-12-12 20:59         ` Jonathan Tan
2022-12-12 21:20           ` Jeff King
2022-12-12 21:29             ` Jonathan Tan
2022-12-12 22:17               ` Jeff King
2022-12-12 22:52             ` Jonathan Tan
2022-12-13 10:37               ` Jeff King
2022-12-09 21:44   ` [PATCH v4 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-12 22:48 ` [PATCH v5 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-12 22:48   ` [PATCH v5 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-12 22:48   ` [PATCH v5 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-12 22:48   ` [PATCH v5 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-13  1:51     ` Junio C Hamano
2022-12-13 10:38       ` Jeff King
2022-12-12 22:48   ` [PATCH v5 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-14 19:17 ` [PATCH v6 0/4] Don't lazy-fetch commits when parsing them Jonathan Tan
2022-12-14 19:17   ` [PATCH v6 1/4] object-file: remove OBJECT_INFO_IGNORE_LOOSE Jonathan Tan
2022-12-14 19:17   ` [PATCH v6 2/4] object-file: refactor map_loose_object_1() Jonathan Tan
2022-12-14 19:17   ` [PATCH v6 3/4] object-file: emit corruption errors when detected Jonathan Tan
2022-12-14 19:17   ` [PATCH v6 4/4] commit: don't lazy-fetch commits Jonathan Tan
2022-12-14 20:43   ` [PATCH v6 0/4] Don't lazy-fetch commits when parsing them Jeff King
2022-12-15  0:07     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221206004935.1794596-1-jonathantanmy@google.com \
    --to=jonathantanmy@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).