git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "René Scharfe" <l.s.r@web.de>
To: Junio C Hamano <gitster@pobox.com>
Cc: Brandon Williams <bwilliamseng@gmail.com>,
	git <git@vger.kernel.org>, Jeff King <peff@peff.net>
Subject: Re: invalid tree and commit object
Date: Sat, 9 May 2020 21:24:43 +0200	[thread overview]
Message-ID: <938f0818-7e57-b883-009f-01db88ef8f65@web.de> (raw)
In-Reply-To: <xmqqpnbduiec.fsf@gitster.c.googlers.com>

Am 09.05.20 um 19:28 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> So I got curious if such trees might be in popular repos, wrote the patch
>> below and checked around a bit, but couldn't find any.
>>
>> Is there a smarter way to check for duplicates?  One that doesn't need
>> allocations?  Perhaps by having a version of tree_entry_extract() that
>> seeks backwards somehow?
>
> I've never looked into seeking backwards in a tree object, but in
> unpack-trees, I had to deal with this exact problem that a blob
> 'hello' sorts before 'hello.c' which in turn sorts after a tree
> 'hello' because of the "implicit slash after the name for an entry
> for a tree object" rule by introducing the "cache_bottom" hack in
> the traversal logic to limit how far we must scan back.
>
> We may be able to limit the list of "seen recently" names in a
> similar way.
>
> If the tree we are scanning has 'hello' (blob), 'hello.c' and
> 'hellp', the bottom pointer initially would be at 'hello' (blob),
> then stay there when we see 'hello.c' (because the next entry might
> be 'hello' (tree) that would crash with 'hello'), and when we see
> the entry 'hellp', we know that the entry at the bottom pointer
> 'hello' (blob) cannot crash with any entry that comes later in the
> tree object we are scanning, so we can advance the bottom pointer
> forward.  To decide if we can advance the bottom pointer beyond
> 'hello.c' (blob), we see if 'hello.c' (tree) can appear after the
> current entry we are looking at (i.e. 'hellp'), and we know it
> cannot without violating the sort order.  So the bottom would move
> to point at 'hellp' we just saw.
>
> If we had 'hello' (tree) instead of 'hellp', when we look at it
> after looking at 'hello' (blob) and 'hello.c', we scan from the
> bottom pointer up to the previous entry, which is still pointing at
> 'hello' (blob), and notice the crash.
>

Hmm, this could lead to quadratic behavior in the worst case, can't it?
Imagine you have:

  a
  a.b
  a.b.c
  ...
  a.b.c/
  a.b/
  a/

If you have a single bottom pointer, it needs to stay at "a" the whole
time, so that you can detect the d/f conflict with "a/" at the end.
And for all entries in between you need to scan from "a" on and compare
each entry -- quadratic.  The blobs "a.b" and "a.b.c" don't need to
actually be present, but we need to scan all entries to determine
"a.b/" and "a.b.c/" are not conflicting anyway.  Right?

We could, however, reduce the names we add to the string_list to
those that are possible candidates for conflict -- blobs followed by an
entry whose name starts with the blob name followed by a dot and trees
that follow an entry whose name matches in the same way.

René

  reply	other threads:[~2020-05-09 19:24 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-09  6:19 invalid tree and commit object Brandon Williams
2020-05-09 10:16 ` René Scharfe
2020-05-09  7:16   ` Johannes Schindelin
2020-05-09 11:51     ` René Scharfe
2020-05-09 17:28   ` Junio C Hamano
2020-05-09 19:24     ` René Scharfe [this message]
2020-05-09 20:27       ` Junio C Hamano
2020-05-10  9:07         ` René Scharfe
2020-05-10 16:12           ` René Scharfe
2020-05-11 16:25             ` Junio C Hamano
2020-05-13 16:27               ` Brandon Williams
2020-05-21  9:51               ` René Scharfe
2020-05-21  9:52               ` [PATCH 1/4] fsck: fix a typo in a comment René Scharfe
2020-05-21 10:10                 ` Denton Liu
2020-05-21 11:15                 ` René Scharfe
2020-05-21  9:52               ` [PATCH 2/4] t1450: increase test coverage of in-tree d/f detection René Scharfe
2020-05-21 10:20                 ` Denton Liu
2020-05-21 13:31                   ` René Scharfe
2020-05-21 18:01                     ` Junio C Hamano
2020-05-21  9:52               ` [PATCH 3/4] t1450: demonstrate undetected in-tree d/f conflict René Scharfe
2020-05-21  9:52               ` [PATCH 4/4] fsck: detect more in-tree d/f conflicts René Scharfe
2020-05-10 16:37           ` invalid tree and commit object Junio C Hamano
2020-05-21  9:51             ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=938f0818-7e57-b883-009f-01db88ef8f65@web.de \
    --to=l.s.r@web.de \
    --cc=bwilliamseng@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).