All of lore.kernel.org
 help / color / mirror / Atom feed
* git describe --contains doesn't work properly for a commit
@ 2015-02-26 13:35 Michal Hocko
  2015-02-26 14:23 ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2015-02-26 13:35 UTC (permalink / raw)
  To: git

Hi,
I have just encountered an old kernel git commit:
commit c854363e80b49dd04a4de18ebc379eb8c8806674
Author: Dave Chinner <david@fromorbit.com>
Date:   Sat Feb 6 12:39:36 2010 +1100

    xfs: Use delayed write for inodes rather than async V2
[...]

which cannot be described properly:
$ git describe --contains c854363e80b49dd04a4de18ebc379eb8c8806674
fatal: cannot describe 'c854363e80b49dd04a4de18ebc379eb8c8806674'

but it seems to find a tag on which the commit is based:
$ git describe c854363e80b49dd04a4de18ebc379eb8c8806674
v2.6.33-rc4-49-gc854363e80b4

if I follow parents
sha=c854363e80b49dd04a4de18ebc379eb8c8806674; 
while true
do 
	parent=$(git show --format=%P $sha | head -1)
	echo $sha $parent
	git describe --contains $parent && break
	sha=$parent
done
c854363e80b49dd04a4de18ebc379eb8c8806674 777df5afdb26c71634edd60582be620ff94e87a0
fatal: cannot describe '777df5afdb26c71634edd60582be620ff94e87a0'
777df5afdb26c71634edd60582be620ff94e87a0 d5db0f97fbbeff11c88dec1aaf1536a975afbaeb
fatal: cannot describe 'd5db0f97fbbeff11c88dec1aaf1536a975afbaeb'
d5db0f97fbbeff11c88dec1aaf1536a975afbaeb 388f1f0c346b533b06d8bc792f7204ebc3e4b7da
v2.6.34-rc1~278^2~14

I am using:
$ git --version
git version 2.1.4

but the same seems to be the case with older git version (1.8.5.6).
$ git rev-list c854363e80b49dd04a4de18ebc379eb8c8806674..v2.6.34 | wc -l
11648

So there seems to be a line between the two commits AFAIU.

Is the history somehow broken or is it a bug in git?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: git describe --contains doesn't work properly for a commit
  2015-02-26 13:35 git describe --contains doesn't work properly for a commit Michal Hocko
@ 2015-02-26 14:23 ` Michal Hocko
  2015-03-04 10:54   ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2015-02-26 14:23 UTC (permalink / raw)
  To: git

On Thu 26-02-15 14:35:34, Michal Hocko wrote:
> Hi,
> I have just encountered an old kernel git commit:
> commit c854363e80b49dd04a4de18ebc379eb8c8806674
> Author: Dave Chinner <david@fromorbit.com>
> Date:   Sat Feb 6 12:39:36 2010 +1100
> 
>     xfs: Use delayed write for inodes rather than async V2
> [...]
 
OK, I've managed to recreate this in a simple repo with 3 commits:
$ git log --format="%H %cd"
ab0efec2b697f2f9f864bb0e2cd77308d1f04561 Thu Feb 26 15:18:36 2015 +0100
d63972e4e4e7eda0444e56739ad09bfbc476b9bd Wed Feb 26 15:18:30 2014 +0100
108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3 Thu Feb 26 14:57:29 2015 +0100

The commit in the middle was ammended to have committer date in the
past.
$ git describe --contains d63972e4e4e7eda0444e56739ad09bfbc476b9bd
tag~1

but
$ git describe --contains 108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3
fatal: cannot describe '108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3'

I guess this is the same issue reported previously here:
http://git.661346.n2.nabble.com/git-describe-contains-fails-on-given-tree-td5448286.html

Can this be fixed somehow or it would lead to other kind of issues?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: git describe --contains doesn't work properly for a commit
  2015-02-26 14:23 ` Michal Hocko
@ 2015-03-04 10:54   ` Jeff King
  2015-03-04 15:06     ` Michael J Gruber
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2015-03-04 10:54 UTC (permalink / raw)
  To: Michal Hocko; +Cc: git

On Thu, Feb 26, 2015 at 03:23:14PM +0100, Michal Hocko wrote:

> The commit in the middle was ammended to have committer date in the
> past.
> $ git describe --contains d63972e4e4e7eda0444e56739ad09bfbc476b9bd
> tag~1
> 
> but
> $ git describe --contains 108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3
> fatal: cannot describe '108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3'
> 
> I guess this is the same issue reported previously here:
> http://git.661346.n2.nabble.com/git-describe-contains-fails-on-given-tree-td5448286.html

Yes, the "describe --contains" algorithm uses timestamps to cut off the
traversal, so it can do the wrong thing if there's clock skew. It has a
"slop" margin of one day, but skew larger than that can fool it.

> Can this be fixed somehow or it would lead to other kind of issues?

The options are basically:

  1. Stop cutting off the traversal based on timestamps. This will make
     the common case of valid timestamps much slower, though, as it will
     have to walk all the way to the roots.

  2. Use a different slop mechanism. For example, keep walking up to 5
     commits past a commit suspected to be past the cutoff. This is
     relatively easy to do (we do it for "--since" checks), and would
     catch your case above. But of course it does not catch all cases of
     skew.

  3. Introduce a more trust-worthy mechanism for ordering commits. The
     timestamp here is really just a proxy for the oft-discussed
     "generation number" of the commit within the graph. We've avoided
     adding generation numbers because of the storage/complexity issues.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: git describe --contains doesn't work properly for a commit
  2015-03-04 10:54   ` Jeff King
@ 2015-03-04 15:06     ` Michael J Gruber
  2015-03-04 18:05       ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Michael J Gruber @ 2015-03-04 15:06 UTC (permalink / raw)
  To: Jeff King, Michal Hocko; +Cc: git

Jeff King venit, vidit, dixit 04.03.2015 11:54:
> On Thu, Feb 26, 2015 at 03:23:14PM +0100, Michal Hocko wrote:
> 
>> The commit in the middle was ammended to have committer date in the
>> past.
>> $ git describe --contains d63972e4e4e7eda0444e56739ad09bfbc476b9bd
>> tag~1
>>
>> but
>> $ git describe --contains 108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3
>> fatal: cannot describe '108a0d5972fd2e5f25b2f38cfd2fee73031ff9d3'
>>
>> I guess this is the same issue reported previously here:
>> http://git.661346.n2.nabble.com/git-describe-contains-fails-on-given-tree-td5448286.html
> 
> Yes, the "describe --contains" algorithm uses timestamps to cut off the
> traversal, so it can do the wrong thing if there's clock skew. It has a
> "slop" margin of one day, but skew larger than that can fool it.
> 
>> Can this be fixed somehow or it would lead to other kind of issues?
> 
> The options are basically:
> 
>   1. Stop cutting off the traversal based on timestamps. This will make
>      the common case of valid timestamps much slower, though, as it will
>      have to walk all the way to the roots.
> 
>   2. Use a different slop mechanism. For example, keep walking up to 5
>      commits past a commit suspected to be past the cutoff. This is
>      relatively easy to do (we do it for "--since" checks), and would
>      catch your case above. But of course it does not catch all cases of
>      skew.
> 
>   3. Introduce a more trust-worthy mechanism for ordering commits. The
>      timestamp here is really just a proxy for the oft-discussed
>      "generation number" of the commit within the graph. We've avoided
>      adding generation numbers because of the storage/complexity issues.

Hmmh.

Storage: one int (or maybe less) per commit doesn't sound too bad. We
can probably do without on bare repos by default.

Complexity: Was that due to replace refs? Other than that, it seemed to
be simple: max(parent generation numbers)+1.

... or can reachability bitmaps help???

Michael

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: git describe --contains doesn't work properly for a commit
  2015-03-04 15:06     ` Michael J Gruber
@ 2015-03-04 18:05       ` Jeff King
  2015-03-04 20:41         ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2015-03-04 18:05 UTC (permalink / raw)
  To: Michael J Gruber; +Cc: Michal Hocko, git

On Wed, Mar 04, 2015 at 04:06:17PM +0100, Michael J Gruber wrote:

> >   3. Introduce a more trust-worthy mechanism for ordering commits. The
> >      timestamp here is really just a proxy for the oft-discussed
> >      "generation number" of the commit within the graph. We've avoided
> >      adding generation numbers because of the storage/complexity issues.
> 
> Hmmh.
> 
> Storage: one int (or maybe less) per commit doesn't sound too bad. We
> can probably do without on bare repos by default.
> 
> Complexity: Was that due to replace refs? Other than that, it seemed to
> be simple: max(parent generation numbers)+1.

Calculating them is simple. Caching and storage is the bigger question.
When we do we generate them? Where do we store them? What do we do with
replace-refs and grafts?

I think the answers are "at repack time", "in an auxiliary file alongside
the pack idx", and "we turn it off completely when these features are in
use".

See:

  http://thread.gmane.org/gmane.comp.version-control.git/214916

for a sample implementation.

> ... or can reachability bitmaps help???

Sometimes. If you are asking about --contains traversals, then bitmaps
can let you stop the traversal early. We have some patches that do this
that are running in production at GitHub, but they are kind of gnarly.
One of my goals is to clean them up and get them upstream.

It's also part of why I didn't pursue the series above further. Making
"--contains" faster is one goal, but making "rev-list --objects --all"
faster was more important (since we do it for every fetch). And making
commits faster is only half the equation there.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: git describe --contains doesn't work properly for a commit
  2015-03-04 18:05       ` Jeff King
@ 2015-03-04 20:41         ` Junio C Hamano
  2015-03-04 22:05           ` Mike Hommey
  2015-03-05  5:12           ` Jeff King
  0 siblings, 2 replies; 9+ messages in thread
From: Junio C Hamano @ 2015-03-04 20:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael J Gruber, Michal Hocko, git

Jeff King <peff@peff.net> writes:

> On Wed, Mar 04, 2015 at 04:06:17PM +0100, Michael J Gruber wrote:
>
>> Complexity: Was that due to replace refs? Other than that, it seemed to
>> be simple: max(parent generation numbers)+1.
>
> Calculating them is simple. Caching and storage is the bigger question.

Yes, also having to handle the ones whose generation numbers haven't
been computed yet adds to the complexity.

This one, and $gmane/264101, are a few instances of this known issue
raised here recently.  I have been wondering if we can do something
along the following (these are not alternatives) as a cheaper
workaround:

 (1) Introduce '--skewed-timestamps[=(allow|warn|reject)' to all
     commands that create new commit objects.  If the committer
     timestamp being used is older than any of the parent commits,
     "warn" or "reject" depending on the setting.

     Make 'reject' the default for commands that are purely local
     (i.e. recording your own progress by cherry-picking,
     committing, rebasing, reverting, etc.) and 'warn' the default
     for commands that merge other peoples' history that you may
     lack the power to rewind and correct (e.g. 'pull' and 'merge'
     from remote tracking refs).

 (2) Compute a bitmap whose timestamps are suspect when we pack to
     mark commits.  When revision.c:limit_list() tries to see if
     there still are interesting commits, an UNINTERESTING commit
     marked as such shouldn't be counted as "not interesting because
     it is old enough".  Use the same hint in the walker used in
     "describe --contains".

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: git describe --contains doesn't work properly for a commit
  2015-03-04 20:41         ` Junio C Hamano
@ 2015-03-04 22:05           ` Mike Hommey
  2015-03-05  5:12           ` Jeff King
  1 sibling, 0 replies; 9+ messages in thread
From: Mike Hommey @ 2015-03-04 22:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Michael J Gruber, Michal Hocko, git

On Wed, Mar 04, 2015 at 12:41:57PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > On Wed, Mar 04, 2015 at 04:06:17PM +0100, Michael J Gruber wrote:
> >
> >> Complexity: Was that due to replace refs? Other than that, it seemed to
> >> be simple: max(parent generation numbers)+1.
> >
> > Calculating them is simple. Caching and storage is the bigger question.
> 
> Yes, also having to handle the ones whose generation numbers haven't
> been computed yet adds to the complexity.
> 
> This one, and $gmane/264101, are a few instances of this known issue
> raised here recently.  I have been wondering if we can do something
> along the following (these are not alternatives) as a cheaper
> workaround:
> 
>  (1) Introduce '--skewed-timestamps[=(allow|warn|reject)' to all
>      commands that create new commit objects.  If the committer
>      timestamp being used is older than any of the parent commits,
>      "warn" or "reject" depending on the setting.
> 
>      Make 'reject' the default for commands that are purely local
>      (i.e. recording your own progress by cherry-picking,
>      committing, rebasing, reverting, etc.) and 'warn' the default
>      for commands that merge other peoples' history that you may
>      lack the power to rewind and correct (e.g. 'pull' and 'merge'
>      from remote tracking refs).

Please note that a common cause (for me, at least) of skewed timestamps
is importing from a foreign VCS.

Mike

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: git describe --contains doesn't work properly for a commit
  2015-03-04 20:41         ` Junio C Hamano
  2015-03-04 22:05           ` Mike Hommey
@ 2015-03-05  5:12           ` Jeff King
  2015-03-05  6:00             ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Jeff King @ 2015-03-05  5:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael J Gruber, Michal Hocko, git

On Wed, Mar 04, 2015 at 12:41:57PM -0800, Junio C Hamano wrote:

> > Calculating them is simple. Caching and storage is the bigger question.
> 
> Yes, also having to handle the ones whose generation numbers haven't
> been computed yet adds to the complexity.

I'm not sure it's that bad. If you cache generation numbers for all
known commits when you repack, then worst case you have to traverse all
commits not in the pack.

> This one, and $gmane/264101, are a few instances of this known issue
> raised here recently.

If $gmane/264101 is caused by clock skew, I'd find that disturbing.
Those algorithms are supposed to be "correct, but slower" in the face of
skew, not ever incorrect.

> I have been wondering if we can do something
> along the following (these are not alternatives) as a cheaper
> workaround:
> 
>  (1) Introduce '--skewed-timestamps[=(allow|warn|reject)' to all
>      commands that create new commit objects.  If the committer
>      timestamp being used is older than any of the parent commits,
>      "warn" or "reject" depending on the setting.

I think this idea has come up before. If it's _your_ timestamp that is
screwed up, this detects it, which is good. But if it's somebody else's
timestamp that is screwed up, there's often not much you can do. It's
already baked into the history.

I don't mind it as an extra layer of protection, I guess. But my
recollection of the great skew survey[1] is that most of these problems
don't come from actual clock skew, but from software bugs or bogus data
in imported commits. True skew is generally less than a day, and can be
handled with a fixed slop time.

[1] http://article.gmane.org/gmane.comp.version-control.git/159065

>  (2) Compute a bitmap whose timestamps are suspect when we pack to
>      mark commits.  When revision.c:limit_list() tries to see if
>      there still are interesting commits, an UNINTERESTING commit
>      marked as such shouldn't be counted as "not interesting because
>      it is old enough".  Use the same hint in the walker used in
>      "describe --contains".

If you see mismatched timestamps between a parent and child commit, it's
often not clear which one is suspicious.  Is the parent skewed to the
future, or is the child skewed to the past? Which one do you mark as
suspect?

IMHO, if you are going to go to the trouble to detect and store skew,
you should just go to the trouble to calculate and store reliable
generation numbers.

-Peff

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: git describe --contains doesn't work properly for a commit
  2015-03-05  5:12           ` Jeff King
@ 2015-03-05  6:00             ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2015-03-05  6:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael J Gruber, Michal Hocko, git

Jeff King <peff@peff.net> writes:

> On Wed, Mar 04, 2015 at 12:41:57PM -0800, Junio C Hamano wrote:
>
>
>> This one, and $gmane/264101, are a few instances of this known issue
>> raised here recently.
>
> If $gmane/264101 is caused by clock skew, I'd find that disturbing.
> Those algorithms are supposed to be "correct, but slower" in the face of
> skew, not ever incorrect.

My understanding is that the commit painting done by merge-base is
designed to be always correct, but the A..B revision traversal
depends on SLOP being big enough.  When the traversal queue is
filled with all UNINTERESTING commits, some of which needs to be dug
to reveal newer commits that are not yet painted as UNINTERESTING in
order to get the complete picture, the still_interesting() logic
will end up stopping prematurely, leaving commits that are actually
UNINTERESTING in the topological sense unpainted in the resulting
newlist that is assigned to revs->commits in limit_list(), yielding
an incorrect result.

>> > Calculating them is simple. Caching and storage is the bigger question.
>> 
>> Yes, also having to handle the ones whose generation numbers haven't
>> been computed yet adds to the complexity.
>
> I'm not sure it's that bad. If you cache generation numbers for all
> known commits when you repack, then worst case you have to traverse all
> commits not in the pack.
> ...
> IMHO, if you are going to go to the trouble to detect and store skew,
> you should just go to the trouble to calculate and store reliable
> generation numbers.

I would actually prefer a solution with generation numbers, of
course, because that would give us provably correct result.  If it
can be done without too much hassle, I am all for it.  Scrap the
half-baked "I've been wondering" compromise ;-)

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-03-05  6:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-26 13:35 git describe --contains doesn't work properly for a commit Michal Hocko
2015-02-26 14:23 ` Michal Hocko
2015-03-04 10:54   ` Jeff King
2015-03-04 15:06     ` Michael J Gruber
2015-03-04 18:05       ` Jeff King
2015-03-04 20:41         ` Junio C Hamano
2015-03-04 22:05           ` Mike Hommey
2015-03-05  5:12           ` Jeff King
2015-03-05  6:00             ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.