All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Shawn O. Pearce" <spearce@spearce.org>
To: Robin Rosenberg <robin.rosenberg@dewire.com>
Cc: git@vger.kernel.org
Subject: [JGIT PATCH 4/5 v2] Fix RevWalk with Linus Torvald's occasional bad commit date hack
Date: Fri, 13 Mar 2009 17:54:42 -0700	[thread overview]
Message-ID: <20090314005442.GJ22920@spearce.org> (raw)
In-Reply-To: <1236910062-18476-5-git-send-email-spearce@spearce.org>

In git-core commit 7d004199d134c9d465e013064f72dbc04507f6c0 Linus
describes a hack he created to improve the handling of cases where
commit dates are out of order, such as a child commit having a date
older than its parent.  These cases can show up when there is bad
imported data, or significant clock skew between distributed peers.

The original git-core bug report identified a failure in:

  git rev-list A..B C

where commits contained in both A and B were reported, due to out
of order commit dates.

We keep running until the most recent commit in the pending queue
is more recent than the last commit sent to the caller.  If the
pending queue suddenly goes "backwards" due to one of our parent's
having a more recent commit date, this new check ensures we will
keep processing the graph until we see a more consistent cut.

We process an extra OVER_SCAN commits after we decide that all
remaining commits are UNINTERESTING.  Processing these extra
commits ensures flags are carried back further in the graph,
increasing the chances that we correctly mark relevant nodes.

As a result of this hack, we may produce a commit to our caller,
but then later mark it UNINTERESTING if we discover date skew.
To handle such cases, callers could buffer produced commits and
filter out those that are UNINTERESTING, but this somewhat costly
and may not always be necessary.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---

 This version inverts the "n.commitTime <= last.commitTime",
 I confused myself and wrote it backwards.

 The next patch I'm going to send with the tests has a test for
 this condition, to verify its correct.

 .../org/spearce/jgit/revwalk/PendingGenerator.java |   51 ++++++++++++++++++--
 1 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/revwalk/PendingGenerator.java b/org.spearce.jgit/src/org/spearce/jgit/revwalk/PendingGenerator.java
index 144e909..4e24431 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/revwalk/PendingGenerator.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/revwalk/PendingGenerator.java
@@ -42,6 +42,7 @@
 import org.spearce.jgit.errors.IncorrectObjectTypeException;
 import org.spearce.jgit.errors.MissingObjectException;
 import org.spearce.jgit.errors.StopWalkException;
+import org.spearce.jgit.lib.ObjectId;
 import org.spearce.jgit.revwalk.filter.RevFilter;
 
 /**
@@ -60,6 +61,24 @@
 
 	private static final int UNINTERESTING = RevWalk.UNINTERESTING;
 
+	/**
+	 * Number of additional commits to scan after we think we are done.
+	 * <p>
+	 * This small buffer of commits is scanned to ensure we didn't miss anything
+	 * as a result of clock skew when the commits were made. We need to set our
+	 * constant to 1 additional commit due to the use of a pre-increment
+	 * operator when accessing the value.
+	 */
+	private static final int OVER_SCAN = 5 + 1;
+
+	/** A commit near the end of time, to initialize {@link #last} with. */
+	private static final RevCommit INIT_LAST;
+
+	static {
+		INIT_LAST = new RevCommit(ObjectId.zeroId());
+		INIT_LAST.commitTime = Integer.MAX_VALUE;
+	}
+
 	private final RevWalk walker;
 
 	private final DateRevQueue pending;
@@ -68,6 +87,17 @@
 
 	private final int output;
 
+	/** Last commit produced to the caller from {@link #next()}. */
+	private RevCommit last = INIT_LAST;
+
+	/**
+	 * Number of commits we have remaining in our over-scan allotment.
+	 * <p>
+	 * Only relevant if there are {@link #UNINTERESTING} commits in the
+	 * {@link #pending} queue.
+	 */
+	private int overScan = OVER_SCAN;
+
 	boolean canDispose;
 
 	PendingGenerator(final RevWalk w, final DateRevQueue p,
@@ -112,14 +142,27 @@ RevCommit next() throws MissingObjectException,
 				walker.carryFlagsImpl(c);
 
 				if ((c.flags & UNINTERESTING) != 0) {
-					if (pending.everbodyHasFlag(UNINTERESTING))
-						throw StopWalkException.INSTANCE;
-					c.dispose();
+					if (pending.everbodyHasFlag(UNINTERESTING)) {
+						final RevCommit n = pending.peek();
+						if (n != null && n.commitTime >= last.commitTime) {
+							// This is too close to call. The next commit we
+							// would pop is dated after the last one produced.
+							// We have to keep going to ensure that we carry
+							// flags as much as necessary.
+							//
+							overScan = OVER_SCAN;
+						} else if (--overScan == 0)
+							throw StopWalkException.INSTANCE;
+					} else {
+						overScan = OVER_SCAN;
+					}
+					if (canDispose)
+						c.dispose();
 					continue;
 				}
 
 				if (produce)
-					return c;
+					return last = c;
 				else if (canDispose)
 					c.dispose();
 			}
-- 
1.6.2.288.gc3f22

  parent reply	other threads:[~2009-03-14  0:57 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-03-13  2:07 [JGIT PATCH 0/5] RevWalk fixes for UNINTERESTING Shawn O. Pearce
2009-03-13  2:07 ` [JGIT PATCH 1/5] Show critical flags in debug toString() descriptions of rev queues Shawn O. Pearce
2009-03-13  2:07   ` [JGIT PATCH 2/5] Make RevObject.getType implementations final Shawn O. Pearce
2009-03-13  2:07     ` [JGIT PATCH 3/5] Remove the horribly stupid RevSort.START_ORDER Shawn O. Pearce
2009-03-13  2:07       ` [JGIT PATCH 4/5] Fix RevWalk with Linus Torvald's occasional bad commit date hack Shawn O. Pearce
2009-03-13  2:07         ` [JGIT PATCH 5/5] Avoid incorrect output of UNINTERESTING commits when clock skew occurs Shawn O. Pearce
2009-03-14  0:54         ` Shawn O. Pearce [this message]
2009-03-13 20:00 ` [JGIT PATCH 0/5] RevWalk fixes for UNINTERESTING Robin Rosenberg
2009-03-13 22:39   ` [JGIT PATCH 0/6] Add tests for RevWalk and its supporting code Shawn O. Pearce
2009-03-14  0:13     ` Shawn O. Pearce
2009-03-14  0:56       ` [JGIT PATCH 6/5 v2] " Shawn O. Pearce
2009-03-15 11:34         ` Robin Rosenberg
2009-03-16 14:33           ` Shawn O. Pearce

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090314005442.GJ22920@spearce.org \
    --to=spearce@spearce.org \
    --cc=git@vger.kernel.org \
    --cc=robin.rosenberg@dewire.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.