git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [JGIT PATCH 00/18] Misc. performance tweaks
@ 2008-12-25  2:10 Shawn O. Pearce
  2008-12-25  2:10 ` [JGIT PATCH 01/23] Improve hit performance on the UnpackedObjectCache Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:10 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Cloning linux-2.6.git through JGit was painful at best.  I found
and fixed some small bottlenecks after a day of profiling and
experimentation, but we're still slower than C git.

With this series I managed to drop the time for "git clone --bare"
over git:// using "jgit daemon" server and "C git" client.
Any difference between jgit and "C git" is in the server side.

  before:  7m42.488s
  after :  2m33.882s
  C git :  1m26.158s     ("git daemon" server)


So I'm still seeing a major bottleneck that I can't quite fix.

Object enumeration (aka "Counting ...") takes too long, because we
spend a huge amount of time unpacking delta chains for trees so we
can enumerate their referenced items.

Our UnpackedObjectCache gets <4% hit ratio when doing the trees
for linux-2.6.git.  Increasing the cache doesn't have a noticable
improvement on performance.

I tried rewriting UnpackedObjectCache to permit multiple objects
per hash bucket.  Even with that (and the maximum chain length
per bucket not exceeding 4 items) our hit ratio was still <5%,
so I tossed that implementation out.

"jgit rev-list --objects" vs. "git rev-list --objects" is a huge
difference, about 1m difference.  That's most of the time difference
I noted above between jgit and C git on the server side.

So with this series, we're better.  Its actually almost tolerable
to clone linux-2.6 through a jgit backed server.


Shawn O. Pearce (23):
  Improve hit performance on the UnpackedObjectCache
  Add MutableObjectId.clear() to set the id to zeroId
  Allow TreeWalk callers to pass a MutableObjectId to get the current
    id
  Switch ObjectWalk to use the new MutableObjectId form in TreeWalk
  Change walker based fetch to use TreeWalk's MutableObjectId accessor
  Reduce garbage allocation when using TreeWalk
  Switch ObjectWalk to use a naked CanonicalTreeParser because its
    faster
  Remove the unused PackFile.get(ObjectId) form
  Remove getId from ObjectLoader API as its unnecessary overhead
  Make mmap mode more reliable by forcing GC at the correct spot
  Rewrite WindowCache to use a hash table
  Change ByteArrayWindow to read content outside of WindowCache's lock
  Dispose of RevCommit buffers when they aren't used in PackWriter
  Don't unpack delta chains while writing a pack from a pack v1 index
  Don't unpack delta chains while converting delta to whole object
  Defer parsing of the ObjectId while walking a PackIndex Iterator
  Only do one getCachedBytes per whole object written
  Correctly use a long for the offsets within a generated pack
  Allow more direct access to determine isWritten
  Move "wantWrite" field of ObjectToPack into the flags field
  Use an ArrayList for the reuseLoader collection in PackWriter
  Don't cut off existing delta chains if we are reusing deltas
  Correctly honor the thin parameter to PackWriter.writePack

 .../jgit/pgm/opt/AbstractTreeIteratorHandler.java  |    6 +-
 .../tst/org/spearce/jgit/lib/PackIndexTest.java    |    4 +-
 .../tst/org/spearce/jgit/lib/PackWriterTest.java   |   14 +-
 .../tst/org/spearce/jgit/lib/T0004_PackReader.java |    4 +-
 .../jgit/errors/CorruptObjectException.java        |   12 +
 .../src/org/spearce/jgit/lib/ByteArrayWindow.java  |   31 ++
 .../src/org/spearce/jgit/lib/ByteBufferWindow.java |   17 +
 .../src/org/spearce/jgit/lib/ByteWindow.java       |   20 ++-
 .../src/org/spearce/jgit/lib/Constants.java        |    2 +-
 .../spearce/jgit/lib/DeltaPackedObjectLoader.java  |    3 +-
 .../src/org/spearce/jgit/lib/MutableObjectId.java  |    9 +
 .../src/org/spearce/jgit/lib/ObjectLoader.java     |   38 ---
 .../src/org/spearce/jgit/lib/PackFile.java         |   49 +---
 .../src/org/spearce/jgit/lib/PackIndex.java        |   48 ++--
 .../src/org/spearce/jgit/lib/PackIndexV1.java      |   20 +-
 .../src/org/spearce/jgit/lib/PackIndexV2.java      |   27 +-
 .../src/org/spearce/jgit/lib/PackWriter.java       |   63 +++--
 .../src/org/spearce/jgit/lib/Repository.java       |   29 +--
 .../org/spearce/jgit/lib/UnpackedObjectCache.java  |   21 +-
 .../org/spearce/jgit/lib/UnpackedObjectLoader.java |   12 +-
 .../spearce/jgit/lib/WholePackedObjectLoader.java  |    3 +-
 .../src/org/spearce/jgit/lib/WindowCache.java      |  323 ++++++++++++--------
 .../src/org/spearce/jgit/lib/WindowCursor.java     |   16 +
 .../src/org/spearce/jgit/lib/WindowedFile.java     |   61 +++--
 .../src/org/spearce/jgit/revwalk/ObjectWalk.java   |   51 ++--
 .../src/org/spearce/jgit/revwalk/RevWalk.java      |    8 +-
 .../spearce/jgit/transport/PackedObjectInfo.java   |    2 +-
 .../src/org/spearce/jgit/transport/UploadPack.java |    1 +
 .../jgit/transport/WalkFetchConnection.java        |   48 ++-
 .../jgit/treewalk/AbstractTreeIterator.java        |   48 +++
 .../spearce/jgit/treewalk/CanonicalTreeParser.java |   85 +++++-
 .../src/org/spearce/jgit/treewalk/TreeWalk.java    |   88 +++++-
 .../spearce/jgit/util/CountingOutputStream.java    |    5 +-
 33 files changed, 752 insertions(+), 416 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [JGIT PATCH 01/23] Improve hit performance on the UnpackedObjectCache
  2008-12-25  2:10 [JGIT PATCH 00/18] Misc. performance tweaks Shawn O. Pearce
@ 2008-12-25  2:10 ` Shawn O. Pearce
  2008-12-25  2:10   ` [JGIT PATCH 02/23] Add MutableObjectId.clear() to set the id to zeroId Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:10 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

If the JVM cleared one of our SoftReferences we "leaked" the space in
the cache that the entry occupied.  Over time this meant we lost room
in the cache and didn't have a way to recover that when we replaced
the evicted entry.

The hash function was also too complex for the hit ratio we were
getting.  The old function on one of my linux-2.6 clones was giving
us <7% hit ratio; this new function is a little simpler to compute
and is getting ~11%.  Increasing the size of the hash table helps
matters considerably.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../org/spearce/jgit/lib/UnpackedObjectCache.java  |   21 +++++++++----------
 1 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectCache.java b/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectCache.java
index ee6a680..677b3a7 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectCache.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectCache.java
@@ -40,17 +40,14 @@
 import java.lang.ref.SoftReference;
 
 class UnpackedObjectCache {
-	private static final int CACHE_SZ = 256;
+	private static final int CACHE_SZ = 1024;
 
 	private static final int MB = 1024 * 1024;
 
 	private static final SoftReference<Entry> DEAD;
 
-	private static int hash(final WindowedFile pack, final long position) {
-		int h = pack.hash + (int) position;
-		h += h >>> 16;
-		h += h >>> 8;
-		return h & (CACHE_SZ - 1);
+	private static int hash(final long position) {
+		return (((int) position) << 22) >>> 22;
 	}
 
 	private static int maxByteCount;
@@ -80,7 +77,7 @@ static synchronized void reconfigure(final int dbLimit) {
 	}
 
 	static synchronized Entry get(final WindowedFile pack, final long position) {
-		final Slot e = cache[hash(pack, position)];
+		final Slot e = cache[hash(position)];
 		if (e.provider == pack && e.position == position) {
 			final Entry buf = e.data.get();
 			if (buf != null) {
@@ -96,7 +93,7 @@ static synchronized void store(final WindowedFile pack,
 		if (data.length > maxByteCount)
 			return; // Too large to cache.
 
-		final Slot e = cache[hash(pack, position)];
+		final Slot e = cache[hash(position)];
 		clearEntry(e);
 
 		openByteCount += data.length;
@@ -104,6 +101,7 @@ static synchronized void store(final WindowedFile pack,
 
 		e.provider = pack;
 		e.position = position;
+		e.sz = data.length;
 		e.data = new SoftReference<Entry>(new Entry(data, objectType));
 		moveToHead(e);
 	}
@@ -155,11 +153,10 @@ private static void unlink(final Slot e) {
 	}
 
 	private static void clearEntry(final Slot e) {
-		final Entry old = e.data.get();
-		if (old != null)
-			openByteCount -= old.data.length;
+		openByteCount -= e.sz;
 		e.provider = null;
 		e.data = DEAD;
+		e.sz = 0;
 	}
 
 	private UnpackedObjectCache() {
@@ -186,6 +183,8 @@ Entry(final byte[] aData, final int aType) {
 
 		long position;
 
+		int sz;
+
 		SoftReference<Entry> data = DEAD;
 	}
 }
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 02/23] Add MutableObjectId.clear() to set the id to zeroId
  2008-12-25  2:10 ` [JGIT PATCH 01/23] Improve hit performance on the UnpackedObjectCache Shawn O. Pearce
@ 2008-12-25  2:10   ` Shawn O. Pearce
  2008-12-25  2:10     ` [JGIT PATCH 03/23] Allow TreeWalk callers to pass a MutableObjectId to get the current id Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:10 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

This makes it easier to set any MutableObjectId to the same value
as the zeroId.  Its logical to ask for a ".clear()" invocation when
you are writing application code, and its more efficient for the
JIT if we do 5 zero writes then to perform a copy from the zeroId
singleton.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/MutableObjectId.java  |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/MutableObjectId.java b/org.spearce.jgit/src/org/spearce/jgit/lib/MutableObjectId.java
index 5b16345..fadebab 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/MutableObjectId.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/MutableObjectId.java
@@ -66,6 +66,15 @@ MutableObjectId(MutableObjectId src) {
 		this.w5 = src.w5;
 	}
 
+	/** Make this id match {@link ObjectId#zeroId()}. */
+	public void clear() {
+		w1 = 0;
+		w2 = 0;
+		w3 = 0;
+		w4 = 0;
+		w5 = 0;
+	}
+
 	/**
 	 * Convert an ObjectId from raw binary representation.
 	 * 
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 03/23] Allow TreeWalk callers to pass a MutableObjectId to get the current id
  2008-12-25  2:10   ` [JGIT PATCH 02/23] Add MutableObjectId.clear() to set the id to zeroId Shawn O. Pearce
@ 2008-12-25  2:10     ` Shawn O. Pearce
  2008-12-25  2:11       ` [JGIT PATCH 04/23] Switch ObjectWalk to use the new MutableObjectId form in TreeWalk Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:10 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

This avoids a memory allocation associated with getting the entry
object for each name.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/treewalk/TreeWalk.java    |   28 +++++++++++++++++++-
 1 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java b/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java
index 26544b5..38a726b 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java
@@ -46,6 +46,7 @@
 import org.spearce.jgit.errors.StopWalkException;
 import org.spearce.jgit.lib.Constants;
 import org.spearce.jgit.lib.FileMode;
+import org.spearce.jgit.lib.MutableObjectId;
 import org.spearce.jgit.lib.ObjectId;
 import org.spearce.jgit.lib.Repository;
 import org.spearce.jgit.revwalk.RevTree;
@@ -513,7 +514,8 @@ public FileMode getFileMode(final int nth) {
 	 * <p>
 	 * Using this method to compare ObjectId values between trees of this walker
 	 * is very inefficient. Applications should try to use
-	 * {@link #idEqual(int, int)} whenever possible.
+	 * {@link #idEqual(int, int)} or {@link #getObjectId(MutableObjectId, int)}
+	 * whenever possible.
 	 * <p>
 	 * Every tree supplies an object id, even if the tree does not contain the
 	 * current entry. In the latter case {@link ObjectId#zeroId()} is returned.
@@ -521,6 +523,7 @@ public FileMode getFileMode(final int nth) {
 	 * @param nth
 	 *            tree to obtain the object identifier from.
 	 * @return object identifier for the current tree entry.
+	 * @see #getObjectId(MutableObjectId, int)
 	 * @see #idEqual(int, int)
 	 */
 	public ObjectId getObjectId(final int nth) {
@@ -530,6 +533,29 @@ public ObjectId getObjectId(final int nth) {
 	}
 
 	/**
+	 * Obtain the ObjectId for the current entry.
+	 * <p>
+	 * Every tree supplies an object id, even if the tree does not contain the
+	 * current entry. In the latter case {@link ObjectId#zeroId()} is supplied.
+	 * <p>
+	 * Applications should try to use {@link #idEqual(int, int)} when possible
+	 * as it avoids conversion overheads.
+	 * 
+	 * @param out
+	 *            buffer to copy the object id into.
+	 * @param nth
+	 *            tree to obtain the object identifier from.
+	 * @see #idEqual(int, int)
+	 */
+	public void getObjectId(final MutableObjectId out, final int nth) {
+		final AbstractTreeIterator t = trees[nth];
+		if (t.matches == currentHead)
+			out.fromRaw(t.idBuffer(), t.idOffset());
+		else
+			out.clear();
+	}
+
+	/**
 	 * Compare two tree's current ObjectId values for equality.
 	 * 
 	 * @param nthA
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 04/23] Switch ObjectWalk to use the new MutableObjectId form in TreeWalk
  2008-12-25  2:10     ` [JGIT PATCH 03/23] Allow TreeWalk callers to pass a MutableObjectId to get the current id Shawn O. Pearce
@ 2008-12-25  2:11       ` Shawn O. Pearce
  2008-12-25  2:11         ` [JGIT PATCH 05/23] Change walker based fetch to use TreeWalk's MutableObjectId accessor Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

RevWalk already has an idBuffer field available, which is used by
the commit and tree parsers to hold an id they need to probe for
in the hash table.  We should be using that buffer during tree
entry iteration as it avoids object allocation.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/revwalk/ObjectWalk.java   |   20 ++++++++++++--------
 1 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java b/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
index 69a20aa..454cb4a 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
@@ -251,7 +251,8 @@ public RevObject nextObject() throws MissingObjectException,
 
 			switch (sType) {
 			case Constants.OBJ_BLOB: {
-				final RevObject o = lookupAny(treeWalk.getObjectId(0), sType);
+				treeWalk.getObjectId(idBuffer, 0);
+				final RevObject o = lookupAny(idBuffer, sType);
 				if ((o.flags & SEEN) != 0)
 					continue;
 				o.flags |= SEEN;
@@ -262,7 +263,8 @@ public RevObject nextObject() throws MissingObjectException,
 				return o;
 			}
 			case Constants.OBJ_TREE: {
-				final RevObject o = lookupAny(treeWalk.getObjectId(0), sType);
+				treeWalk.getObjectId(idBuffer, 0);
+				final RevObject o = lookupAny(idBuffer, sType);
 				if ((o.flags & SEEN) != 0)
 					continue;
 				o.flags |= SEEN;
@@ -276,8 +278,9 @@ public RevObject nextObject() throws MissingObjectException,
 			default:
 				if (FileMode.GITLINK.equals(mode))
 					continue;
+				treeWalk.getObjectId(idBuffer, 0);
 				throw new CorruptObjectException("Invalid mode " + mode
-						+ " for " + treeWalk.getObjectId(0).name() + " "
+						+ " for " + idBuffer.name() + " "
 						+ treeWalk.getPathString() + " in " + currentTree + ".");
 			}
 		}
@@ -390,13 +393,13 @@ private void markTreeUninteresting(final RevTree tree)
 
 			switch (sType) {
 			case Constants.OBJ_BLOB: {
-				final ObjectId sid = treeWalk.getObjectId(0);
-				lookupAny(sid, sType).flags |= UNINTERESTING;
+				treeWalk.getObjectId(idBuffer, 0);
+				lookupAny(idBuffer, sType).flags |= UNINTERESTING;
 				continue;
 			}
 			case Constants.OBJ_TREE: {
-				final ObjectId sid = treeWalk.getObjectId(0);
-				final RevObject subtree = lookupAny(sid, sType);
+				treeWalk.getObjectId(idBuffer, 0);
+				final RevObject subtree = lookupAny(idBuffer, sType);
 				if ((subtree.flags & UNINTERESTING) == 0) {
 					subtree.flags |= UNINTERESTING;
 					treeWalk.enterSubtree();
@@ -406,8 +409,9 @@ private void markTreeUninteresting(final RevTree tree)
 			default:
 				if (FileMode.GITLINK.equals(mode))
 					continue;
+				treeWalk.getObjectId(idBuffer, 0);
 				throw new CorruptObjectException("Invalid mode " + mode
-						+ " for " + treeWalk.getObjectId(0).name() + " "
+						+ " for " + idBuffer.name() + " "
 						+ treeWalk.getPathString() + " in " + tree + ".");
 			}
 		}
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 05/23] Change walker based fetch to use TreeWalk's MutableObjectId accessor
  2008-12-25  2:11       ` [JGIT PATCH 04/23] Switch ObjectWalk to use the new MutableObjectId form in TreeWalk Shawn O. Pearce
@ 2008-12-25  2:11         ` Shawn O. Pearce
  2008-12-25  2:11           ` [JGIT PATCH 06/23] Reduce garbage allocation when using TreeWalk Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Fetch time is usually dominated by network traffic, but while marking
objects reachable locally we can still benefit from the reduction in
object allocations this variant of getObjectId offers.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../jgit/transport/WalkFetchConnection.java        |   29 +++++++++++--------
 1 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/transport/WalkFetchConnection.java b/org.spearce.jgit/src/org/spearce/jgit/transport/WalkFetchConnection.java
index 91c5ea8..6300f10 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/transport/WalkFetchConnection.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/transport/WalkFetchConnection.java
@@ -59,6 +59,7 @@
 import org.spearce.jgit.lib.AnyObjectId;
 import org.spearce.jgit.lib.Constants;
 import org.spearce.jgit.lib.FileMode;
+import org.spearce.jgit.lib.MutableObjectId;
 import org.spearce.jgit.lib.ObjectChecker;
 import org.spearce.jgit.lib.ObjectId;
 import org.spearce.jgit.lib.PackIndex;
@@ -152,6 +153,8 @@
 	 */
 	private final Set<String> packsConsidered;
 
+	private final MutableObjectId idBuffer = new MutableObjectId();
+
 	/**
 	 * Errors received while trying to obtain an object.
 	 * <p>
@@ -300,16 +303,17 @@ private void processTree(final RevObject obj) throws TransportException {
 
 				switch (sType) {
 				case Constants.OBJ_BLOB:
-				case Constants.OBJ_TREE: {
-					final ObjectId sId = treeWalk.getObjectId(0);
-					needs(revWalk.lookupAny(sId, sType));
+				case Constants.OBJ_TREE:
+					treeWalk.getObjectId(idBuffer, 0);
+					needs(revWalk.lookupAny(idBuffer, sType));
 					continue;
-				}
+
 				default:
 					if (FileMode.GITLINK.equals(mode))
 						continue;
+					treeWalk.getObjectId(idBuffer, 0);
 					throw new CorruptObjectException("Invalid mode " + mode
-							+ " for " + treeWalk.getObjectId(0).name() + " "
+							+ " for " + idBuffer.name() + " "
 							+ treeWalk.getPathString() + " in "
 							+ obj.getId().name() + ".");
 				}
@@ -722,14 +726,14 @@ private void markTreeComplete(final RevTree tree) throws IOException {
 			final int sType = mode.getObjectType();
 
 			switch (sType) {
-			case Constants.OBJ_BLOB: {
-				final ObjectId sid = treeWalk.getObjectId(0);
-				revWalk.lookupAny(sid, sType).add(COMPLETE);
+			case Constants.OBJ_BLOB:
+				treeWalk.getObjectId(idBuffer, 0);
+				revWalk.lookupAny(idBuffer, sType).add(COMPLETE);
 				continue;
-			}
+
 			case Constants.OBJ_TREE: {
-				final ObjectId sid = treeWalk.getObjectId(0);
-				final RevObject o = revWalk.lookupAny(sid, sType);
+				treeWalk.getObjectId(idBuffer, 0);
+				final RevObject o = revWalk.lookupAny(idBuffer, sType);
 				if (!o.has(COMPLETE)) {
 					o.add(COMPLETE);
 					treeWalk.enterSubtree();
@@ -739,8 +743,9 @@ private void markTreeComplete(final RevTree tree) throws IOException {
 			default:
 				if (FileMode.GITLINK.equals(mode))
 					continue;
+				treeWalk.getObjectId(idBuffer, 0);
 				throw new CorruptObjectException("Invalid mode " + mode
-						+ " for " + treeWalk.getObjectId(0).name() + " "
+						+ " for " + idBuffer.name() + " "
 						+ treeWalk.getPathString() + " in " + tree.name() + ".");
 			}
 		}
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 06/23] Reduce garbage allocation when using TreeWalk
  2008-12-25  2:11         ` [JGIT PATCH 05/23] Change walker based fetch to use TreeWalk's MutableObjectId accessor Shawn O. Pearce
@ 2008-12-25  2:11           ` Shawn O. Pearce
  2008-12-25  2:11             ` [JGIT PATCH 07/23] Switch ObjectWalk to use a naked CanonicalTreeParser because its faster Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

We now support more efficiently resetting a TreeWalk when the
walker is using a single tree traversal from the repository.
This is very common during an ObjectWalk for a PackWriter, so
optimizing the code path is worth while.  By using a special
form of reset we can avoid unnecessary temporary arrays and
also skip some loop conditional instructions.

When diving into a subtree iterator (which is also quite a
common operation in ObjectWalk) we want to reuse an idBuffer
and a WindowCursor as much as possible, ensuring that these
aren't created temporarily on the stack.  This should help
to reduce the stress on the Inflater cache by allowing it
to reuse the same Inflater on each tree construction.  We
also can avoid creating a temporary ObjectId by filling out
the mutable idBuffer during the repository lookup.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../jgit/pgm/opt/AbstractTreeIteratorHandler.java  |    6 ++-
 .../src/org/spearce/jgit/revwalk/ObjectWalk.java   |    5 +-
 .../jgit/transport/WalkFetchConnection.java        |    4 +-
 .../jgit/treewalk/AbstractTreeIterator.java        |   28 ++++++++++
 .../spearce/jgit/treewalk/CanonicalTreeParser.java |   46 ++++++++++++----
 .../src/org/spearce/jgit/treewalk/TreeWalk.java    |   58 ++++++++++++++++++--
 6 files changed, 124 insertions(+), 23 deletions(-)

diff --git a/org.spearce.jgit.pgm/src/org/spearce/jgit/pgm/opt/AbstractTreeIteratorHandler.java b/org.spearce.jgit.pgm/src/org/spearce/jgit/pgm/opt/AbstractTreeIteratorHandler.java
index 9432d5f..12d4191 100644
--- a/org.spearce.jgit.pgm/src/org/spearce/jgit/pgm/opt/AbstractTreeIteratorHandler.java
+++ b/org.spearce.jgit.pgm/src/org/spearce/jgit/pgm/opt/AbstractTreeIteratorHandler.java
@@ -51,6 +51,7 @@
 import org.spearce.jgit.errors.IncorrectObjectTypeException;
 import org.spearce.jgit.errors.MissingObjectException;
 import org.spearce.jgit.lib.ObjectId;
+import org.spearce.jgit.lib.WindowCursor;
 import org.spearce.jgit.treewalk.AbstractTreeIterator;
 import org.spearce.jgit.treewalk.CanonicalTreeParser;
 import org.spearce.jgit.treewalk.FileTreeIterator;
@@ -110,8 +111,9 @@ public int parseArguments(final Parameters params) throws CmdLineException {
 			throw new CmdLineException(name + " is not a tree");
 
 		final CanonicalTreeParser p = new CanonicalTreeParser();
+		final WindowCursor curs = new WindowCursor();
 		try {
-			p.reset(clp.getRepository(), clp.getRevWalk().parseTree(id));
+			p.reset(clp.getRepository(), clp.getRevWalk().parseTree(id), curs);
 		} catch (MissingObjectException e) {
 			throw new CmdLineException(name + " is not a tree");
 		} catch (IncorrectObjectTypeException e) {
@@ -119,6 +121,8 @@ public int parseArguments(final Parameters params) throws CmdLineException {
 		} catch (IOException e) {
 			throw new CmdLineException("cannot read " + name + ": "
 					+ e.getMessage());
+		} finally {
+			curs.release();
 		}
 
 		setter.addValue(p);
diff --git a/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java b/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
index 454cb4a..cfdbe18 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
@@ -45,7 +45,6 @@
 import org.spearce.jgit.lib.AnyObjectId;
 import org.spearce.jgit.lib.Constants;
 import org.spearce.jgit.lib.FileMode;
-import org.spearce.jgit.lib.ObjectId;
 import org.spearce.jgit.lib.Repository;
 import org.spearce.jgit.treewalk.TreeWalk;
 
@@ -296,7 +295,7 @@ public RevObject nextObject() throws MissingObjectException,
 				continue;
 			if (o instanceof RevTree) {
 				currentTree = (RevTree) o;
-				treeWalk.reset(new ObjectId[] { currentTree });
+				treeWalk.reset(currentTree);
 			}
 			return o;
 		}
@@ -386,7 +385,7 @@ private void markTreeUninteresting(final RevTree tree)
 			return;
 		tree.flags |= UNINTERESTING;
 
-		treeWalk.reset(new ObjectId[] { tree });
+		treeWalk.reset(tree);
 		while (treeWalk.next()) {
 			final FileMode mode = treeWalk.getFileMode(0);
 			final int sType = mode.getObjectType();
diff --git a/org.spearce.jgit/src/org/spearce/jgit/transport/WalkFetchConnection.java b/org.spearce.jgit/src/org/spearce/jgit/transport/WalkFetchConnection.java
index 6300f10..5d0c6bc 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/transport/WalkFetchConnection.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/transport/WalkFetchConnection.java
@@ -296,7 +296,7 @@ private void processBlob(final RevObject obj) throws TransportException {
 
 	private void processTree(final RevObject obj) throws TransportException {
 		try {
-			treeWalk.reset(new ObjectId[] { obj });
+			treeWalk.reset(obj);
 			while (treeWalk.next()) {
 				final FileMode mode = treeWalk.getFileMode(0);
 				final int sType = mode.getObjectType();
@@ -720,7 +720,7 @@ private void markTreeComplete(final RevTree tree) throws IOException {
 		if (tree.has(COMPLETE))
 			return;
 		tree.add(COMPLETE);
-		treeWalk.reset(new ObjectId[] { tree });
+		treeWalk.reset(tree);
 		while (treeWalk.next()) {
 			final FileMode mode = treeWalk.getFileMode(0);
 			final int sType = mode.getObjectType();
diff --git a/org.spearce.jgit/src/org/spearce/jgit/treewalk/AbstractTreeIterator.java b/org.spearce.jgit/src/org/spearce/jgit/treewalk/AbstractTreeIterator.java
index 5226ab6..c404cdc 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/treewalk/AbstractTreeIterator.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/treewalk/AbstractTreeIterator.java
@@ -46,8 +46,10 @@
 import org.spearce.jgit.errors.IncorrectObjectTypeException;
 import org.spearce.jgit.lib.Constants;
 import org.spearce.jgit.lib.FileMode;
+import org.spearce.jgit.lib.MutableObjectId;
 import org.spearce.jgit.lib.ObjectId;
 import org.spearce.jgit.lib.Repository;
+import org.spearce.jgit.lib.WindowCursor;
 import org.spearce.jgit.treewalk.filter.TreeFilter;
 
 /**
@@ -347,6 +349,32 @@ public abstract AbstractTreeIterator createSubtreeIterator(Repository repo)
 			throws IncorrectObjectTypeException, IOException;
 
 	/**
+	 * Create a new iterator for the current entry's subtree.
+	 * <p>
+	 * The parent reference of the iterator must be <code>this</code>, otherwise
+	 * the caller would not be able to exit out of the subtree iterator
+	 * correctly and return to continue walking <code>this</code>.
+	 * 
+	 * @param repo
+	 *            repository to load the tree data from.
+	 * @param idBuffer
+	 *            temporary ObjectId buffer for use by this method.
+	 * @param curs
+	 *            window cursor to use during repository access.
+	 * @return a new parser that walks over the current subtree.
+	 * @throws IncorrectObjectTypeException
+	 *             the current entry is not actually a tree and cannot be parsed
+	 *             as though it were a tree.
+	 * @throws IOException
+	 *             a loose object or pack file could not be read.
+	 */
+	public AbstractTreeIterator createSubtreeIterator(final Repository repo,
+			final MutableObjectId idBuffer, final WindowCursor curs)
+			throws IncorrectObjectTypeException, IOException {
+		return createSubtreeIterator(repo);
+	}
+
+	/**
 	 * Is this tree iterator positioned on its first entry?
 	 * <p>
 	 * An iterator is positioned on the first entry if <code>back(1)</code>
diff --git a/org.spearce.jgit/src/org/spearce/jgit/treewalk/CanonicalTreeParser.java b/org.spearce.jgit/src/org/spearce/jgit/treewalk/CanonicalTreeParser.java
index dcc53cd..df3384d 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/treewalk/CanonicalTreeParser.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/treewalk/CanonicalTreeParser.java
@@ -41,11 +41,14 @@
 
 import org.spearce.jgit.errors.IncorrectObjectTypeException;
 import org.spearce.jgit.errors.MissingObjectException;
+import org.spearce.jgit.lib.AnyObjectId;
 import org.spearce.jgit.lib.Constants;
 import org.spearce.jgit.lib.FileMode;
+import org.spearce.jgit.lib.MutableObjectId;
 import org.spearce.jgit.lib.ObjectId;
 import org.spearce.jgit.lib.ObjectLoader;
 import org.spearce.jgit.lib.Repository;
+import org.spearce.jgit.lib.WindowCursor;
 
 /** Parses raw Git trees from the canonical semi-text/semi-binary format. */
 public class CanonicalTreeParser extends AbstractTreeIterator {
@@ -87,6 +90,8 @@ public void reset(final byte[] treeData) {
 	 * @param id
 	 *            identity of the tree being parsed; used only in exception
 	 *            messages if data corruption is found.
+	 * @param curs
+	 *            window cursor to use during repository access.
 	 * @throws MissingObjectException
 	 *             the object supplied is not available from the repository.
 	 * @throws IncorrectObjectTypeException
@@ -95,27 +100,46 @@ public void reset(final byte[] treeData) {
 	 * @throws IOException
 	 *             a loose object or pack file could not be read.
 	 */
-	public void reset(final Repository repo, final ObjectId id)
+	public void reset(final Repository repo, final AnyObjectId id,
+			final WindowCursor curs)
 			throws IncorrectObjectTypeException, IOException {
-		final ObjectLoader ldr = repo.openObject(id);
-		if (ldr == null)
-			throw new MissingObjectException(id, Constants.TYPE_TREE);
+		final ObjectLoader ldr = repo.openObject(curs, id);
+		if (ldr == null) {
+			final ObjectId me = id.toObjectId();
+			throw new MissingObjectException(me, Constants.TYPE_TREE);
+		}
 		final byte[] subtreeData = ldr.getCachedBytes();
-		if (ldr.getType() != Constants.OBJ_TREE)
-			throw new IncorrectObjectTypeException(id, Constants.TYPE_TREE);
+		if (ldr.getType() != Constants.OBJ_TREE) {
+			final ObjectId me = id.toObjectId();
+			throw new IncorrectObjectTypeException(me, Constants.TYPE_TREE);
+		}
 		reset(subtreeData);
 	}
 
-	public CanonicalTreeParser createSubtreeIterator(final Repository repo)
+	@Override
+	public CanonicalTreeParser createSubtreeIterator(final Repository repo,
+			final MutableObjectId idBuffer, final WindowCursor curs)
 			throws IncorrectObjectTypeException, IOException {
-		final ObjectId id = getEntryObjectId();
-		if (!FileMode.TREE.equals(mode))
-			throw new IncorrectObjectTypeException(id, Constants.TYPE_TREE);
+		idBuffer.fromRaw(idBuffer(), idOffset());
+		if (!FileMode.TREE.equals(mode)) {
+			final ObjectId me = idBuffer.toObjectId();
+			throw new IncorrectObjectTypeException(me, Constants.TYPE_TREE);
+		}
 		final CanonicalTreeParser p = new CanonicalTreeParser(this);
-		p.reset(repo, id);
+		p.reset(repo, idBuffer, curs);
 		return p;
 	}
 
+	public CanonicalTreeParser createSubtreeIterator(final Repository repo)
+			throws IncorrectObjectTypeException, IOException {
+		final WindowCursor curs = new WindowCursor();
+		try {
+			return createSubtreeIterator(repo, new MutableObjectId(), curs);
+		} finally {
+			curs.release();
+		}
+	}
+
 	@Override
 	public byte[] idBuffer() {
 		return raw;
diff --git a/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java b/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java
index 38a726b..cf66bf4 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java
@@ -44,11 +44,13 @@
 import org.spearce.jgit.errors.IncorrectObjectTypeException;
 import org.spearce.jgit.errors.MissingObjectException;
 import org.spearce.jgit.errors.StopWalkException;
+import org.spearce.jgit.lib.AnyObjectId;
 import org.spearce.jgit.lib.Constants;
 import org.spearce.jgit.lib.FileMode;
 import org.spearce.jgit.lib.MutableObjectId;
 import org.spearce.jgit.lib.ObjectId;
 import org.spearce.jgit.lib.Repository;
+import org.spearce.jgit.lib.WindowCursor;
 import org.spearce.jgit.revwalk.RevTree;
 import org.spearce.jgit.treewalk.filter.PathFilterGroup;
 import org.spearce.jgit.treewalk.filter.TreeFilter;
@@ -99,7 +101,7 @@
 	 *             a tree object was not found.
 	 */
 	public static TreeWalk forPath(final Repository db, final String path,
-			final ObjectId[] trees) throws MissingObjectException,
+			final AnyObjectId[] trees) throws MissingObjectException,
 			IncorrectObjectTypeException, CorruptObjectException, IOException {
 		final TreeWalk r = new TreeWalk(db);
 		r.setFilter(PathFilterGroup.createFromStrings(Collections
@@ -142,6 +144,10 @@ public static TreeWalk forPath(final Repository db, final String path,
 
 	private final Repository db;
 
+	private final MutableObjectId idBuffer = new MutableObjectId();
+
+	private final WindowCursor curs = new WindowCursor();
+
 	private TreeFilter filter;
 
 	AbstractTreeIterator[] trees;
@@ -278,6 +284,46 @@ public void reset() {
 	}
 
 	/**
+	 * Reset this walker to run over a single existing tree.
+	 * 
+	 * @param id
+	 *            the tree we need to parse. The walker will execute over this
+	 *            single tree if the reset is successful.
+	 * @throws MissingObjectException
+	 *             the given tree object does not exist in this repository.
+	 * @throws IncorrectObjectTypeException
+	 *             the given object id does not denote a tree, but instead names
+	 *             some other non-tree type of object. Note that commits are not
+	 *             trees, even if they are sometimes called a "tree-ish".
+	 * @throws CorruptObjectException
+	 *             the object claimed to be a tree, but its contents did not
+	 *             appear to be a tree. The repository may have data corruption.
+	 * @throws IOException
+	 *             a loose object or pack file could not be read.
+	 */
+	public void reset(final AnyObjectId id) throws MissingObjectException,
+			IncorrectObjectTypeException, CorruptObjectException, IOException {
+		if (trees.length == 1) {
+			AbstractTreeIterator o = trees[0];
+			while (o.parent != null)
+				o = o.parent;
+			if (o instanceof CanonicalTreeParser) {
+				o.matches = null;
+				o.matchShift = 0;
+				((CanonicalTreeParser) o).reset(db, id, curs);
+				trees[0] = o;
+			} else {
+				trees[0] = parserFor(id);
+			}
+		} else {
+			trees = new AbstractTreeIterator[] { parserFor(id) };
+		}
+
+		advance = false;
+		depth = 0;
+	}
+
+	/**
 	 * Reset this walker to run over a set of existing trees.
 	 * 
 	 * @param ids
@@ -295,7 +341,7 @@ public void reset() {
 	 * @throws IOException
 	 *             a loose object or pack file could not be read.
 	 */
-	public void reset(final ObjectId[] ids) throws MissingObjectException,
+	public void reset(final AnyObjectId[] ids) throws MissingObjectException,
 			IncorrectObjectTypeException, CorruptObjectException, IOException {
 		final int oldLen = trees.length;
 		final int newLen = ids.length;
@@ -311,7 +357,7 @@ public void reset(final ObjectId[] ids) throws MissingObjectException,
 				if (o instanceof CanonicalTreeParser) {
 					o.matches = null;
 					o.matchShift = 0;
-					((CanonicalTreeParser) o).reset(db, ids[i]);
+					((CanonicalTreeParser) o).reset(db, ids[i], curs);
 					r[i] = o;
 					continue;
 				}
@@ -709,7 +755,7 @@ public void enterSubtree() throws MissingObjectException,
 			final AbstractTreeIterator t = trees[i];
 			final AbstractTreeIterator n;
 			if (t.matches == ch && !t.eof() && FileMode.TREE.equals(t.mode))
-				n = t.createSubtreeIterator(db);
+				n = t.createSubtreeIterator(db, idBuffer, curs);
 			else
 				n = new EmptyTreeIterator(t);
 			tmp[i] = n;
@@ -781,10 +827,10 @@ private void exitSubtree() {
 		currentHead = minRef;
 	}
 
-	private CanonicalTreeParser parserFor(final ObjectId id)
+	private CanonicalTreeParser parserFor(final AnyObjectId id)
 			throws IncorrectObjectTypeException, IOException {
 		final CanonicalTreeParser p = new CanonicalTreeParser();
-		p.reset(db, id);
+		p.reset(db, id, curs);
 		return p;
 	}
 
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 07/23] Switch ObjectWalk to use a naked CanonicalTreeParser because its faster
  2008-12-25  2:11           ` [JGIT PATCH 06/23] Reduce garbage allocation when using TreeWalk Shawn O. Pearce
@ 2008-12-25  2:11             ` Shawn O. Pearce
  2008-12-25  2:11               ` [JGIT PATCH 08/23] Remove the unused PackFile.get(ObjectId) form Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Removing the indirection of the TreeWalk during an ObjectWalk's loop
over the trees saves us 20 seconds on one of my linux-2.6 clones.  On
average we went from 1m37s for "jgit rev-list --objects" down to 1m17s.

C git still does the same operation in 0m20s, so we're a long way from
matching C git's performance, but this is a worthwhile improvement, even
though the code has become slightly more complex.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/revwalk/ObjectWalk.java   |   42 ++++++++++---------
 .../jgit/treewalk/AbstractTreeIterator.java        |   20 +++++++++
 .../spearce/jgit/treewalk/CanonicalTreeParser.java |   39 ++++++++++++++++++-
 .../src/org/spearce/jgit/treewalk/TreeWalk.java    |    4 +-
 4 files changed, 82 insertions(+), 23 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java b/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
index cfdbe18..0d67a38 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/revwalk/ObjectWalk.java
@@ -46,7 +46,7 @@
 import org.spearce.jgit.lib.Constants;
 import org.spearce.jgit.lib.FileMode;
 import org.spearce.jgit.lib.Repository;
-import org.spearce.jgit.treewalk.TreeWalk;
+import org.spearce.jgit.treewalk.CanonicalTreeParser;
 
 /**
  * Specialized subclass of RevWalk to include trees, blobs and tags.
@@ -74,7 +74,7 @@
 	 */
 	private static final int IN_PENDING = RevWalk.REWRITE;
 
-	private final TreeWalk treeWalk;
+	private CanonicalTreeParser treeWalk;
 
 	private BlockObjQueue pendingObjects;
 
@@ -92,8 +92,8 @@
 	 */
 	public ObjectWalk(final Repository repo) {
 		super(repo);
-		treeWalk = new TreeWalk(repo);
 		pendingObjects = new BlockObjQueue();
+		treeWalk = new CanonicalTreeParser();
 	}
 
 	/**
@@ -240,17 +240,17 @@ public RevObject nextObject() throws MissingObjectException,
 		fromTreeWalk = false;
 
 		if (enterSubtree) {
-			treeWalk.enterSubtree();
+			treeWalk = treeWalk.createSubtreeIterator(db, idBuffer, curs);
 			enterSubtree = false;
 		}
 
-		while (treeWalk.next()) {
-			final FileMode mode = treeWalk.getFileMode(0);
+		for (; !treeWalk.eof(); treeWalk = treeWalk.next()) {
+			final FileMode mode = treeWalk.getEntryFileMode();
 			final int sType = mode.getObjectType();
 
 			switch (sType) {
 			case Constants.OBJ_BLOB: {
-				treeWalk.getObjectId(idBuffer, 0);
+				treeWalk.getEntryObjectId(idBuffer);
 				final RevObject o = lookupAny(idBuffer, sType);
 				if ((o.flags & SEEN) != 0)
 					continue;
@@ -262,7 +262,7 @@ public RevObject nextObject() throws MissingObjectException,
 				return o;
 			}
 			case Constants.OBJ_TREE: {
-				treeWalk.getObjectId(idBuffer, 0);
+				treeWalk.getEntryObjectId(idBuffer);
 				final RevObject o = lookupAny(idBuffer, sType);
 				if ((o.flags & SEEN) != 0)
 					continue;
@@ -277,10 +277,11 @@ public RevObject nextObject() throws MissingObjectException,
 			default:
 				if (FileMode.GITLINK.equals(mode))
 					continue;
-				treeWalk.getObjectId(idBuffer, 0);
+				treeWalk.getEntryObjectId(idBuffer);
 				throw new CorruptObjectException("Invalid mode " + mode
 						+ " for " + idBuffer.name() + " "
-						+ treeWalk.getPathString() + " in " + currentTree + ".");
+						+ treeWalk.getEntryPathString() + " in " + currentTree
+						+ ".");
 			}
 		}
 
@@ -295,7 +296,7 @@ public RevObject nextObject() throws MissingObjectException,
 				continue;
 			if (o instanceof RevTree) {
 				currentTree = (RevTree) o;
-				treeWalk.reset(currentTree);
+				treeWalk = treeWalk.resetRoot(db, currentTree, curs);
 			}
 			return o;
 		}
@@ -353,7 +354,7 @@ public void checkConnectivity() throws MissingObjectException,
 	 *         has no path, such as for annotated tags or root level trees.
 	 */
 	public String getPathString() {
-		return fromTreeWalk ? treeWalk.getPathString() : null;
+		return fromTreeWalk ? treeWalk.getEntryPathString() : null;
 	}
 
 	@Override
@@ -385,33 +386,34 @@ private void markTreeUninteresting(final RevTree tree)
 			return;
 		tree.flags |= UNINTERESTING;
 
-		treeWalk.reset(tree);
-		while (treeWalk.next()) {
-			final FileMode mode = treeWalk.getFileMode(0);
+		treeWalk = treeWalk.resetRoot(db, tree, curs);
+		for (;!treeWalk.eof(); treeWalk = treeWalk.next()) {
+			final FileMode mode = treeWalk.getEntryFileMode();
 			final int sType = mode.getObjectType();
 
 			switch (sType) {
 			case Constants.OBJ_BLOB: {
-				treeWalk.getObjectId(idBuffer, 0);
+				treeWalk.getEntryObjectId(idBuffer);
 				lookupAny(idBuffer, sType).flags |= UNINTERESTING;
 				continue;
 			}
 			case Constants.OBJ_TREE: {
-				treeWalk.getObjectId(idBuffer, 0);
+				treeWalk.getEntryObjectId(idBuffer);
 				final RevObject subtree = lookupAny(idBuffer, sType);
 				if ((subtree.flags & UNINTERESTING) == 0) {
 					subtree.flags |= UNINTERESTING;
-					treeWalk.enterSubtree();
+					treeWalk = treeWalk.createSubtreeIterator(db, idBuffer,
+							curs);
 				}
 				continue;
 			}
 			default:
 				if (FileMode.GITLINK.equals(mode))
 					continue;
-				treeWalk.getObjectId(idBuffer, 0);
+				treeWalk.getEntryObjectId(idBuffer);
 				throw new CorruptObjectException("Invalid mode " + mode
 						+ " for " + idBuffer.name() + " "
-						+ treeWalk.getPathString() + " in " + tree + ".");
+						+ treeWalk.getEntryPathString() + " in " + tree + ".");
 			}
 		}
 	}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/treewalk/AbstractTreeIterator.java b/org.spearce.jgit/src/org/spearce/jgit/treewalk/AbstractTreeIterator.java
index c404cdc..0229582 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/treewalk/AbstractTreeIterator.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/treewalk/AbstractTreeIterator.java
@@ -308,6 +308,26 @@ public ObjectId getEntryObjectId() {
 	}
 
 	/**
+	 * Obtain the ObjectId for the current entry.
+	 * 
+	 * @param out
+	 *            buffer to copy the object id into.
+	 */
+	public void getEntryObjectId(final MutableObjectId out) {
+		out.fromRaw(idBuffer(), idOffset());
+	}
+
+	/** @return the file mode of the current entry. */
+	public FileMode getEntryFileMode() {
+		return FileMode.fromBits(mode);
+	}
+
+	/** @return path of the current entry, as a string. */
+	public String getEntryPathString() {
+		return TreeWalk.pathOf(this);
+	}
+
+	/**
 	 * Get the byte array buffer object IDs must be copied out of.
 	 * <p>
 	 * The id buffer contains the bytes necessary to construct an ObjectId for
diff --git a/org.spearce.jgit/src/org/spearce/jgit/treewalk/CanonicalTreeParser.java b/org.spearce.jgit/src/org/spearce/jgit/treewalk/CanonicalTreeParser.java
index df3384d..6a5bada 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/treewalk/CanonicalTreeParser.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/treewalk/CanonicalTreeParser.java
@@ -52,6 +52,8 @@
 
 /** Parses raw Git trees from the canonical semi-text/semi-binary format. */
 public class CanonicalTreeParser extends AbstractTreeIterator {
+	private static final byte[] EMPTY = {};
+
 	private byte[] raw;
 
 	/** First offset within {@link #raw} of the current entry's data. */
@@ -62,7 +64,7 @@
 
 	/** Create a new parser. */
 	public CanonicalTreeParser() {
-		// Nothing necessary.
+		raw = EMPTY;
 	}
 
 	private CanonicalTreeParser(final CanonicalTreeParser p) {
@@ -92,6 +94,41 @@ public void reset(final byte[] treeData) {
 	 *            messages if data corruption is found.
 	 * @param curs
 	 *            window cursor to use during repository access.
+	 * @return the root level parser.
+	 * @throws MissingObjectException
+	 *             the object supplied is not available from the repository.
+	 * @throws IncorrectObjectTypeException
+	 *             the object supplied as an argument is not actually a tree and
+	 *             cannot be parsed as though it were a tree.
+	 * @throws IOException
+	 *             a loose object or pack file could not be read.
+	 */
+	public CanonicalTreeParser resetRoot(final Repository repo,
+			final AnyObjectId id, final WindowCursor curs)
+			throws IncorrectObjectTypeException, IOException {
+		CanonicalTreeParser p = this;
+		while (p.parent != null)
+			p = (CanonicalTreeParser) p.parent;
+		p.reset(repo, id, curs);
+		return p;
+	}
+
+	/** @return this iterator, or its parent, if the tree is at eof. */
+	public CanonicalTreeParser next() {
+		next(1);
+		return eof() && parent != null ? (CanonicalTreeParser) parent : this;
+	}
+
+	/**
+	 * Reset this parser to walk through the given tree.
+	 * 
+	 * @param repo
+	 *            repository to load the tree data from.
+	 * @param id
+	 *            identity of the tree being parsed; used only in exception
+	 *            messages if data corruption is found.
+	 * @param curs
+	 *            window cursor to use during repository access.
 	 * @throws MissingObjectException
 	 *             the object supplied is not available from the repository.
 	 * @throws IncorrectObjectTypeException
diff --git a/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java b/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java
index cf66bf4..d1841b3 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/treewalk/TreeWalk.java
@@ -596,7 +596,7 @@ public ObjectId getObjectId(final int nth) {
 	public void getObjectId(final MutableObjectId out, final int nth) {
 		final AbstractTreeIterator t = trees[nth];
 		if (t.matches == currentHead)
-			out.fromRaw(t.idBuffer(), t.idOffset());
+			t.getEntryObjectId(out);
 		else
 			out.clear();
 	}
@@ -834,7 +834,7 @@ private CanonicalTreeParser parserFor(final AnyObjectId id)
 		return p;
 	}
 
-	private static String pathOf(final AbstractTreeIterator t) {
+	static String pathOf(final AbstractTreeIterator t) {
 		return RawParseUtils.decode(Constants.CHARSET, t.path, 0, t.pathLen);
 	}
 }
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 08/23] Remove the unused PackFile.get(ObjectId) form
  2008-12-25  2:11             ` [JGIT PATCH 07/23] Switch ObjectWalk to use a naked CanonicalTreeParser because its faster Shawn O. Pearce
@ 2008-12-25  2:11               ` Shawn O. Pearce
  2008-12-25  2:11                 ` [JGIT PATCH 09/23] Remove getId from ObjectLoader API as its unnecessary overhead Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Everyone passes in a WindowCursor, except this one test case.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../tst/org/spearce/jgit/lib/T0004_PackReader.java |    2 +-
 .../src/org/spearce/jgit/lib/PackFile.java         |   19 -------------------
 2 files changed, 1 insertions(+), 20 deletions(-)

diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/T0004_PackReader.java b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/T0004_PackReader.java
index 8288e56..f6fff52 100644
--- a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/T0004_PackReader.java
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/T0004_PackReader.java
@@ -55,7 +55,7 @@ public void test003_lookupCompressedObject() throws IOException {
 
 		id = ObjectId.fromString("902d5476fa249b7abc9d84c611577a81381f0327");
 		pr = new PackFile(db, TEST_IDX, TEST_PACK);
-		or = pr.get(id);
+		or = pr.get(new WindowCursor(), id);
 		assertNotNull(or);
 		assertEquals(id, or.getId());
 		assertEquals(Constants.OBJ_TREE, or.getType());
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
index fc1b6ea..cd17bd4 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
@@ -117,25 +117,6 @@ public boolean hasObject(final AnyObjectId id) {
 	/**
 	 * Get an object from this pack.
 	 * 
-	 * @param id
-	 *            the object to obtain from the pack. Must not be null.
-	 * @return the object loader for the requested object if it is contained in
-	 *         this pack; null if the object was not found.
-	 * @throws IOException
-	 *             the pack file or the index could not be read.
-	 */
-	public PackedObjectLoader get(final AnyObjectId id) throws IOException {
-		final WindowCursor wc = new WindowCursor();
-		try {
-			return get(wc, id);
-		} finally {
-			wc.release();
-		}
-	}
-
-	/**
-	 * Get an object from this pack.
-	 * 
 	 * @param curs
 	 *            temporary working space associated with the calling thread.
 	 * @param id
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 09/23] Remove getId from ObjectLoader API as its unnecessary overhead
  2008-12-25  2:11               ` [JGIT PATCH 08/23] Remove the unused PackFile.get(ObjectId) form Shawn O. Pearce
@ 2008-12-25  2:11                 ` Shawn O. Pearce
  2008-12-25  2:11                   ` [JGIT PATCH 10/23] Make mmap mode more reliable by forcing GC at the correct spot Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Apparently nobody actually requires the getId() method on an ObjectLoader.
In every single location where the getId() is being used the caller has
an AnyObjectId wich the proper value already in-scope, typically the value
they passed into Repository.openObject().  Avoiding having the id in the
loader API means PackFile doesn't have to copy a MutableObjectId into an
immutable ObjectId, saving an object allocation per object lookup.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../tst/org/spearce/jgit/lib/T0004_PackReader.java |    2 -
 .../jgit/errors/CorruptObjectException.java        |   12 ++++++
 .../src/org/spearce/jgit/lib/Constants.java        |    2 +-
 .../spearce/jgit/lib/DeltaPackedObjectLoader.java  |    3 +-
 .../src/org/spearce/jgit/lib/ObjectLoader.java     |   38 --------------------
 .../src/org/spearce/jgit/lib/PackFile.java         |   12 +-----
 .../src/org/spearce/jgit/lib/Repository.java       |    2 +-
 .../org/spearce/jgit/lib/UnpackedObjectLoader.java |   12 +++----
 .../spearce/jgit/lib/WholePackedObjectLoader.java  |    3 +-
 .../src/org/spearce/jgit/revwalk/RevWalk.java      |    8 ++--
 .../jgit/transport/WalkFetchConnection.java        |   15 +++++++-
 11 files changed, 42 insertions(+), 67 deletions(-)

diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/T0004_PackReader.java b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/T0004_PackReader.java
index f6fff52..6ffb904 100644
--- a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/T0004_PackReader.java
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/T0004_PackReader.java
@@ -57,7 +57,6 @@ public void test003_lookupCompressedObject() throws IOException {
 		pr = new PackFile(db, TEST_IDX, TEST_PACK);
 		or = pr.get(new WindowCursor(), id);
 		assertNotNull(or);
-		assertEquals(id, or.getId());
 		assertEquals(Constants.OBJ_TREE, or.getType());
 		assertEquals(35, or.getSize());
 		assertEquals(7738, or.getDataOffset());
@@ -72,7 +71,6 @@ public void test004_lookupDeltifiedObject() throws IOException {
 		or = db.openObject(id);
 		assertNotNull(or);
 		assertTrue(or instanceof PackedObjectLoader);
-		assertEquals(id, or.getId());
 		assertEquals(Constants.OBJ_BLOB, or.getType());
 		assertEquals(18009, or.getSize());
 		assertEquals(537, ((PackedObjectLoader) or).getDataOffset());
diff --git a/org.spearce.jgit/src/org/spearce/jgit/errors/CorruptObjectException.java b/org.spearce.jgit/src/org/spearce/jgit/errors/CorruptObjectException.java
index 4eb7aa6..2b0e287 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/errors/CorruptObjectException.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/errors/CorruptObjectException.java
@@ -40,6 +40,7 @@
 
 import java.io.IOException;
 
+import org.spearce.jgit.lib.AnyObjectId;
 import org.spearce.jgit.lib.ObjectId;
 
 /**
@@ -55,6 +56,17 @@
 	 * @param id
 	 * @param why
 	 */
+	public CorruptObjectException(final AnyObjectId id, final String why) {
+		this(id.toObjectId(), why);
+	}
+
+	/**
+	 * Construct a CorruptObjectException for reporting a problem specified
+	 * object id
+	 *
+	 * @param id
+	 * @param why
+	 */
 	public CorruptObjectException(final ObjectId id, final String why) {
 		super("Object " + id.name() + " is corrupt: " + why);
 	}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/Constants.java b/org.spearce.jgit/src/org/spearce/jgit/lib/Constants.java
index 9613d07..8f093d6 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/Constants.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/Constants.java
@@ -311,7 +311,7 @@ public static String typeString(final int typeCode) {
 	 * @throws CorruptObjectException
 	 *             there is no valid type identified by <code>typeString</code>.
 	 */
-	public static int decodeTypeString(final ObjectId id,
+	public static int decodeTypeString(final AnyObjectId id,
 			final byte[] typeString, final byte endMark,
 			final MutableInteger offset) throws CorruptObjectException {
 		try {
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaPackedObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaPackedObjectLoader.java
index e73f8e5..85f2bda 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaPackedObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/DeltaPackedObjectLoader.java
@@ -92,7 +92,8 @@ public long getSize() throws IOException {
 			return data;
 		} catch (DataFormatException dfe) {
 			final CorruptObjectException coe;
-			coe = new CorruptObjectException(getId(), "bad stream");
+			coe = new CorruptObjectException("Object at " + dataOffset + " in "
+					+ pack.getPackFile() + " has bad zlib stream");
 			coe.initCause(dfe);
 			throw coe;
 		}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/ObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/ObjectLoader.java
index 8d745dd..5485d8d 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/ObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/ObjectLoader.java
@@ -39,50 +39,12 @@
 package org.spearce.jgit.lib;
 
 import java.io.IOException;
-import java.security.MessageDigest;
 
 /**
  * Base class for a set of loaders for different representations of Git objects.
  * New loaders are constructed for every object.
  */
 public abstract class ObjectLoader {
-	private ObjectId objectId;
-
-	/**
-	 * @return the id of this object, possibly computed on demand
-	 * @throws IOException
-	 */
-	public ObjectId getId() throws IOException {
-		if (objectId == null) {
-			final MessageDigest md = Constants.newMessageDigest();
-			md.update(Constants.encodedTypeString(getType()));
-			md.update((byte) ' ');
-			md.update(Constants.encodeASCII(getSize()));
-			md.update((byte) 0);
-			md.update(getCachedBytes());
-			objectId = ObjectId.fromRaw(md.digest());
-		}
-		return objectId;
-	}
-
-	/**
-	 * @return true if id of loaded object is already known, false otherwise.
-	 */
-	protected boolean hasComputedId() {
-		return objectId != null;
-	}
-
-	/**
-	 * Set the SHA-1 id of the object handled by this loader
-	 * 
-	 * @param id
-	 */
-	protected void setId(final ObjectId id) {
-		if (objectId != null)
-			throw new IllegalStateException("Id already set.");
-		objectId = id;
-	}
-
 	/**
 	 * @return Git in pack object type, see {@link Constants}.
 	 * @throws IOException
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
index cd17bd4..3cdca8f 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
@@ -129,11 +129,7 @@ public boolean hasObject(final AnyObjectId id) {
 	public PackedObjectLoader get(final WindowCursor curs, final AnyObjectId id)
 			throws IOException {
 		final long offset = idx.findOffset(id);
-		if (offset == -1)
-			return null;
-		final PackedObjectLoader objReader = reader(curs, offset);
-		objReader.setId(id.toObjectId());
-		return objReader;
+		return 0 < offset ? reader(curs, offset) : null;
 	}
 
 	/**
@@ -219,11 +215,7 @@ final void copyRawData(final PackedObjectLoader loader,
 			pack.copyToStream(dataOffset, buf, cnt, crcOut, curs);
 			final long computed = crc.getValue();
 
-			ObjectId id;
-			if (loader.hasComputedId())
-				id = loader.getId();
-			else
-				id = findObjectForOffset(objectOffset);
+			final ObjectId id = findObjectForOffset(objectOffset);
 			final long expected = idx.findCRC32(id);
 			if (computed != expected)
 				throw new CorruptObjectException(id,
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/Repository.java b/org.spearce.jgit/src/org/spearce/jgit/lib/Repository.java
index 8f6e96e..02f8103 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/Repository.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/Repository.java
@@ -317,7 +317,7 @@ public ObjectLoader openObject(final WindowCursor curs, final AnyObjectId id)
 			} while (k > 0);
 		}
 		try {
-			return new UnpackedObjectLoader(this, id.toObjectId());
+			return new UnpackedObjectLoader(this, id);
 		} catch (FileNotFoundException fnfe) {
 			return null;
 		}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectLoader.java
index 3ad273f..0560c3a 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/UnpackedObjectLoader.java
@@ -67,13 +67,13 @@
 	 *            SHA-1
 	 * @throws IOException
 	 */
-	public UnpackedObjectLoader(final Repository db, final ObjectId id)
+	public UnpackedObjectLoader(final Repository db, final AnyObjectId id)
 			throws IOException {
 		this(readCompressed(db, id), id);
 	}
 
-	private static byte[] readCompressed(final Repository db, final ObjectId id)
-			throws FileNotFoundException, IOException {
+	private static byte[] readCompressed(final Repository db,
+			final AnyObjectId id) throws FileNotFoundException, IOException {
 		final FileInputStream objStream = new FileInputStream(db.toFile(id));
 		final byte[] compressed;
 		try {
@@ -101,10 +101,8 @@ public UnpackedObjectLoader(final byte[] compressed)
 		this(compressed, null);
 	}
 
-	private UnpackedObjectLoader(final byte[] compressed, final ObjectId id)
+	private UnpackedObjectLoader(final byte[] compressed, final AnyObjectId id)
 			throws CorruptObjectException {
-		setId(id);
-
 		// Try to determine if this is a legacy format loose object or
 		// a new style loose object. The legacy format was completely
 		// compressed with zlib so the first byte must be 0x78 (15-bit
@@ -177,7 +175,7 @@ private UnpackedObjectLoader(final byte[] compressed, final ObjectId id)
 		}
 	}
 
-	private void decompress(final ObjectId id, final Inflater inf, int p)
+	private void decompress(final AnyObjectId id, final Inflater inf, int p)
 			throws CorruptObjectException {
 		try {
 			while (!inf.finished())
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WholePackedObjectLoader.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WholePackedObjectLoader.java
index 7185df5..3b4f90d 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/WholePackedObjectLoader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WholePackedObjectLoader.java
@@ -72,7 +72,8 @@ WholePackedObjectLoader(final WindowCursor curs, final PackFile pr,
 			return data;
 		} catch (DataFormatException dfe) {
 			final CorruptObjectException coe;
-			coe = new CorruptObjectException(getId(), "bad stream");
+			coe = new CorruptObjectException("Object at " + dataOffset + " in "
+					+ pack.getPackFile() + " has bad zlib stream");
 			coe.initCause(dfe);
 			throw coe;
 		}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/revwalk/RevWalk.java b/org.spearce.jgit/src/org/spearce/jgit/revwalk/RevWalk.java
index b1571ab..dfb34d9 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/revwalk/RevWalk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/revwalk/RevWalk.java
@@ -687,23 +687,23 @@ public RevObject parseAny(final AnyObjectId id)
 			final int type = ldr.getType();
 			switch (type) {
 			case Constants.OBJ_COMMIT: {
-				final RevCommit c = createCommit(ldr.getId());
+				final RevCommit c = createCommit(id);
 				c.parseCanonical(this, data);
 				r = c;
 				break;
 			}
 			case Constants.OBJ_TREE: {
-				r = new RevTree(ldr.getId());
+				r = new RevTree(id);
 				r.flags |= PARSED;
 				break;
 			}
 			case Constants.OBJ_BLOB: {
-				r = new RevBlob(ldr.getId());
+				r = new RevBlob(id);
 				r.flags |= PARSED;
 				break;
 			}
 			case Constants.OBJ_TAG: {
-				final RevTag t = new RevTag(ldr.getId());
+				final RevTag t = new RevTag(id);
 				t.parseCanonical(this, data);
 				r = t;
 				break;
diff --git a/org.spearce.jgit/src/org/spearce/jgit/transport/WalkFetchConnection.java b/org.spearce.jgit/src/org/spearce/jgit/transport/WalkFetchConnection.java
index 5d0c6bc..93b5bd2 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/transport/WalkFetchConnection.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/transport/WalkFetchConnection.java
@@ -42,6 +42,7 @@
 import java.io.FileNotFoundException;
 import java.io.FileOutputStream;
 import java.io.IOException;
+import java.security.MessageDigest;
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.HashMap;
@@ -155,6 +156,8 @@
 
 	private final MutableObjectId idBuffer = new MutableObjectId();
 
+	private final MessageDigest objectDigest = Constants.newMessageDigest();
+
 	/**
 	 * Errors received while trying to obtain an object.
 	 * <p>
@@ -573,9 +576,17 @@ private void verifyLooseObject(final AnyObjectId id, final byte[] compressed)
 			throw e;
 		}
 
-		if (!AnyObjectId.equals(id, uol.getId())) {
+		objectDigest.reset();
+		objectDigest.update(Constants.encodedTypeString(uol.getType()));
+		objectDigest.update((byte) ' ');
+		objectDigest.update(Constants.encodeASCII(uol.getSize()));
+		objectDigest.update((byte) 0);
+		objectDigest.update(uol.getCachedBytes());
+		idBuffer.fromRaw(objectDigest.digest(), 0);
+
+		if (!AnyObjectId.equals(id, idBuffer)) {
 			throw new TransportException("Incorrect hash for " + id.name()
-					+ "; computed " + uol.getId().name() + " as a "
+					+ "; computed " + idBuffer.name() + " as a "
 					+ Constants.encodedTypeString(uol.getType()) + " from "
 					+ compressed.length + " bytes.");
 		}
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 10/23] Make mmap mode more reliable by forcing GC at the correct spot
  2008-12-25  2:11                 ` [JGIT PATCH 09/23] Remove getId from ObjectLoader API as its unnecessary overhead Shawn O. Pearce
@ 2008-12-25  2:11                   ` Shawn O. Pearce
  2008-12-25  2:11                     ` [JGIT PATCH 11/23] Rewrite WindowCache to use a hash table Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

ObjectLoaders can defer the bulk of their data access to the
getCachedBytes() method invocation, rather than at the time
of their construction.  That means there's code paths into
the mmap allocator which aren't protected by the "try GC and
try again" within openObject, resulting in some random crashes
due to mmap failures.

The right place to check for mmap failure and retry is within
WindowedFile.loadWindow, where we actually invoke the mmap call.
If the call fails, we need can immediately retry it, and if it
fails again, we can temporarily degrade to the byte[] variant
until the JVM is able to evict and GC enough space.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/Repository.java       |   27 +-----------
 .../src/org/spearce/jgit/lib/WindowedFile.java     |   43 ++++++++++++++-----
 2 files changed, 34 insertions(+), 36 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/Repository.java b/org.spearce.jgit/src/org/spearce/jgit/lib/Repository.java
index 02f8103..e1c4049 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/Repository.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/Repository.java
@@ -290,30 +290,9 @@ public ObjectLoader openObject(final WindowCursor curs, final AnyObjectId id)
 		int k = packs.length;
 		if (k > 0) {
 			do {
-				try {
-					final ObjectLoader ol = packs[--k].get(curs, id);
-					if (ol != null)
-						return ol;
-				} catch (IOException ioe) {
-					// This shouldn't happen unless the pack was corrupted
-					// after we opened it or the VM runs out of memory. This is
-					// a know problem with memory mapped I/O in java and have
-					// been noticed with JDK < 1.6. Tell the gc that now is a good
-					// time to collect and try once more.
-					try {
-						curs.release();
-						System.gc();
-						final ObjectLoader ol = packs[k].get(curs, id);
-						if (ol != null)
-							return ol;
-					} catch (IOException ioe2) {
-						ioe2.printStackTrace();
-						ioe.printStackTrace();
-						// Still fails.. that's BAD, maybe the pack has
-						// been corrupted after all, or the gc didn't manage
-						// to release enough previously mmaped areas.
-					}
-				}
+				final ObjectLoader ol = packs[--k].get(curs, id);
+				if (ol != null)
+					return ol;
 			} while (k > 0);
 		}
 		try {
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
index f640c42..dd94f24 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
@@ -299,22 +299,41 @@ void cacheClose() {
 	}
 
 	void loadWindow(final WindowCursor curs, final int windowId,
-			final long pos, final int windowSize) throws IOException {
+			final long pos, final int size) throws IOException {
 		if (WindowCache.mmap) {
-			final MappedByteBuffer map = fd.getChannel().map(MapMode.READ_ONLY,
-					pos, windowSize);
-			if (map.hasArray()) {
-				final byte[] b = map.array();
-				curs.window = new ByteArrayWindow(this, pos, windowId, b);
-				curs.handle = b;
-			} else {
-				curs.window = new ByteBufferWindow(this, pos, windowId, map);
-				curs.handle = map;
+			MappedByteBuffer map;
+			try {
+				map = fd.getChannel().map(MapMode.READ_ONLY, pos, size);
+			} catch (IOException e) {
+				// The most likely reason this failed is the JVM has run out
+				// of virtual memory. We need to discard quickly, and try to
+				// force the GC to finalize and release any existing mappings.
+				try {
+					curs.release();
+					System.gc();
+					System.runFinalization();
+					map = fd.getChannel().map(MapMode.READ_ONLY, pos, size);
+				} catch (IOException ioe2) {
+					// Temporarily disable mmap and do buffered disk IO.
+					//
+					map = null;
+					System.err.println("warning: mmap failure: "+ioe2);
+				}
+			}
+			if (map != null) {
+				if (map.hasArray()) {
+					final byte[] b = map.array();
+					curs.window = new ByteArrayWindow(this, pos, windowId, b);
+					curs.handle = b;
+				} else {
+					curs.window = new ByteBufferWindow(this, pos, windowId, map);
+					curs.handle = map;
+				}
+				return;
 			}
-			return;
 		}
 
-		final byte[] b = new byte[windowSize];
+		final byte[] b = new byte[size];
 		synchronized (fd) {
 			fd.seek(pos);
 			fd.readFully(b);
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 11/23] Rewrite WindowCache to use a hash table
  2008-12-25  2:11                   ` [JGIT PATCH 10/23] Make mmap mode more reliable by forcing GC at the correct spot Shawn O. Pearce
@ 2008-12-25  2:11                     ` Shawn O. Pearce
  2008-12-25  2:11                       ` [JGIT PATCH 12/23] Change ByteArrayWindow to read content outside of WindowCache's lock Shawn O. Pearce
  2008-12-27 13:30                       ` [JGIT PATCH 11/23] Rewrite WindowCache to use a hash table Robin Rosenberg
  0 siblings, 2 replies; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

This shaved 10s off my linux-2.6 "jgit rev-list --objects" test.
We're also getting slightly better cache hit ratios, approaching
95% during commit access and 80% during tree access on the same
repository.  Its a significant win.

The cache is also a bit better about memory usage.  We now make
sure the byte[] or ByteBuffer handle is cleared out of the soft
reference when we push the reference into the queue manually. A
small change, but it makes a marked difference with the overall
JVM memory usage.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/ByteWindow.java       |   10 +-
 .../src/org/spearce/jgit/lib/WindowCache.java      |  323 ++++++++++++--------
 .../src/org/spearce/jgit/lib/WindowedFile.java     |    2 +-
 3 files changed, 199 insertions(+), 136 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/ByteWindow.java b/org.spearce.jgit/src/org/spearce/jgit/lib/ByteWindow.java
index 1f6beb8..8d37de7 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/ByteWindow.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/ByteWindow.java
@@ -56,14 +56,20 @@
  *            type of object reference used to manage the window data.
  */
 abstract class ByteWindow<T> extends SoftReference<T> {
+	boolean sizeActive = true;
+
+	ByteWindow<?> chainNext;
+
+	ByteWindow<?> lruPrev;
+
+	ByteWindow<?> lruNext;
+
 	final WindowedFile provider;
 
 	final int id;
 
 	final int size;
 
-	int lastAccessed;
-
 	final long start;
 
 	final long end;
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java
index f617845..f478f04 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2007, Robin Rosenberg <robin.rosenberg@dewire.com>
+ * Copyright (C) 2008, Google Inc.
  * Copyright (C) 2008, Shawn O. Pearce <spearce@spearce.org>
  *
  * All rights reserved.
@@ -41,11 +41,7 @@
 import java.io.IOException;
 import java.lang.ref.ReferenceQueue;
 
-/**
- * The WindowCache manages reusable <code>Windows</code> and inflaters used by
- * the other windowed file access classes.
- */
-public class WindowCache {
+class WindowCache {
 	private static final int KB = 1024;
 
 	private static final int MB = 1024 * KB;
@@ -68,23 +64,51 @@ private static final int bits(int newSize) {
 
 	static final ReferenceQueue<?> clearedWindowQueue;
 
-	private static ByteWindow[] windows;
+	private static ByteWindow[] cache;
+
+	private static ByteWindow lruHead;
 
-	private static int openWindowCount;
+	private static ByteWindow lruTail;
 
 	private static int openByteCount;
 
-	private static int accessClock;
+	private static int hits;
+
+	private static int reqs;
+
+	private static int opens;
+
+	private static int closes;
 
 	static {
 		maxByteCount = 10 * MB;
 		windowSizeShift = bits(8 * KB);
 		windowSize = 1 << windowSizeShift;
 		mmap = false;
-		windows = new ByteWindow[maxByteCount / windowSize];
+		cache = new ByteWindow[cacheTableSize()];
 		clearedWindowQueue = new ReferenceQueue<Object>();
 	}
 
+	static synchronized String statString() {
+		int maxChain = 0, tot = 0;
+		for (ByteWindow<?> e : cache) {
+			int n = 0;
+			for (; e != null; e = e.chainNext) {
+				n++;
+				tot++;
+			}
+			maxChain = Math.max(maxChain, n);
+		}
+		return "WindowCache[ hits: " + (hits * 100 / reqs) + "%"
+				+ "; windows: " + tot + " maxChain: " + maxChain + "; kb:"
+				+ (openByteCount / KB) + "; cache: " + cache.length + " fds: "
+				+ (opens - closes) + "," + opens + "," + closes + " ]";
+	}
+
+	private static int cacheTableSize() {
+		return 5 * (maxByteCount / windowSize) / 2;
+	}
+
 	/**
 	 * Modify the configuration of the window cache.
 	 * <p>
@@ -135,27 +159,36 @@ private static synchronized void reconfigureImpl(final int packedGitLimit,
 			// We have to throw away every window we have. None
 			// of them are suitable for the new configuration.
 			//
-			for (int i = 0; i < openWindowCount; i++) {
-				final ByteWindow win = windows[i];
-				if (--win.provider.openCount == 0)
-					win.provider.cacheClose();
-				windows[i] = null;
+			for (ByteWindow<?> e : cache) {
+				for (; e != null; e = e.chainNext)
+					clear(e);
+			}
+			runClearedWindowQueue();
+			cache = new ByteWindow[cacheTableSize()];
+
+		} else {
+			if (prune) {
+				// We should decrease our memory usage.
+				//
+				releaseMemory();
+				runClearedWindowQueue();
 			}
-			windows = new ByteWindow[maxByteCount / windowSize];
-			openWindowCount = 0;
-			openByteCount = 0;
-		} else if (prune) {
-			// Our memory limit was decreased so we should try
-			// to drop windows to ensure we meet the new lower
-			// limit we were just given.
-			//
-			final int wincnt = maxByteCount / windowSize;
-			releaseMemory(wincnt, null, 0, 0);
 
-			if (wincnt != windows.length) {
-				final ByteWindow[] n = new ByteWindow[wincnt];
-				System.arraycopy(windows, 0, n, 0, openWindowCount);
-				windows = n;
+			if (cache.length != cacheTableSize()) {
+				// The cache table should be resized.
+				// Rehash every entry.
+				//
+				final ByteWindow[] priorTable = cache;
+
+				cache = new ByteWindow[cacheTableSize()];
+				for (ByteWindow<?> e : priorTable) {
+					for (ByteWindow<?> n; e != null; e = n) {
+						n = e.chainNext;
+						final int idx = hash(e.provider, e.id);
+						e.chainNext = cache[idx];
+						cache[idx] = e;
+					}
+				}
 			}
 		}
 	}
@@ -178,19 +211,27 @@ private static synchronized void reconfigureImpl(final int packedGitLimit,
 	 */
 	public static synchronized final void get(final WindowCursor curs,
 			final WindowedFile wp, final long position) throws IOException {
+		reqs++;
 		final int id = (int) (position >> windowSizeShift);
-		int idx = binarySearch(wp, id);
-		if (0 <= idx) {
-			final ByteWindow<?> w = windows[idx];
-			if ((curs.handle = w.get()) != null) {
-				w.lastAccessed = ++accessClock;
-				curs.window = w;
-				return;
+		final int idx = hash(wp, id);
+		for (ByteWindow<?> e = cache[idx]; e != null; e = e.chainNext) {
+			if (e.provider == wp && e.id == id) {
+				if ((curs.handle = e.get()) != null) {
+					curs.window = e;
+					makeMostRecent(e);
+					hits++;
+					return;
+				}
+
+				clear(e);
+				break;
 			}
 		}
 
-		if (++wp.openCount == 1) {
+		if (wp.openCount == 0) {
 			try {
+				opens++;
+				wp.openCount = 1;
 				wp.cacheOpen();
 			} catch (IOException ioe) {
 				wp.openCount = 0;
@@ -201,106 +242,55 @@ public static synchronized final void get(final WindowCursor curs,
 			} catch (Error ioe) {
 				wp.openCount = 0;
 				throw ioe;
+			} finally {
+				wp.openCount--;
 			}
 
 			// The cacheOpen may have mapped the window we are trying to
 			// map ourselves. Retrying the search ensures that does not
 			// happen to us.
 			//
-			idx = binarySearch(wp, id);
-			if (0 <= idx) {
-				final ByteWindow<?> w = windows[idx];
-				if ((curs.handle = w.get()) != null) {
-					w.lastAccessed = ++accessClock;
-					curs.window = w;
-					return;
+			for (ByteWindow<?> e = cache[idx]; e != null; e = e.chainNext) {
+				if (e.provider == wp && e.id == id) {
+					if ((curs.handle = e.get()) != null) {
+						curs.window = e;
+						makeMostRecent(e);
+						return;
+					}
+
+					clear(e);
+					break;
 				}
 			}
 		}
 
-		idx = -(idx + 1);
-		final int wSz = windowSize(wp, id);
-		idx = releaseMemory(windows.length, wp, idx, wSz);
-
-		if (idx < 0)
-			idx = 0;
-		final int toMove = openWindowCount - idx;
-		if (toMove > 0)
-			System.arraycopy(windows, idx, windows, idx + 1, toMove);
-		wp.loadWindow(curs, id, id << windowSizeShift, wSz);
-		windows[idx] = curs.window;
-		openWindowCount++;
-		openByteCount += curs.window.size;
+		final int wsz = windowSize(wp, id);
+		wp.openCount++;
+		openByteCount += wsz;
+		releaseMemory();
+		runClearedWindowQueue();
+
+		wp.loadWindow(curs, id, id << windowSizeShift, wsz);
+		final ByteWindow<?> e = curs.window;
+		e.chainNext = cache[idx];
+		cache[idx] = e;
+		insertLRU(e);
 	}
 
-	private static int releaseMemory(final int maxWindowCount,
-			final WindowedFile willRead, int insertionIndex, final int willAdd) {
-		for (;;) {
-			final ByteWindow<?> w = (ByteWindow<?>) clearedWindowQueue.poll();
-			if (w == null)
-				break;
-			final int oldest = binarySearch(w.provider, w.id);
-			if (oldest < 0 || windows[oldest] != w)
-				continue; // Must have been evicted by our other controls.
-
-			final WindowedFile p = w.provider;
-			if (--p.openCount == 0 && p != willRead)
-				p.cacheClose();
-
-			openByteCount -= w.size;
-			final int toMove = openWindowCount - oldest - 1;
-			if (toMove > 0)
-				System.arraycopy(windows, oldest + 1, windows, oldest, toMove);
-			windows[--openWindowCount] = null;
-			if (oldest < insertionIndex)
-				insertionIndex--;
-		}
-
-		while (openWindowCount >= maxWindowCount
-				|| (openWindowCount > 0 && openByteCount + willAdd > maxByteCount)) {
-			int oldest = 0;
-			for (int k = openWindowCount - 1; k > 0; k--) {
-				if (windows[k].lastAccessed < windows[oldest].lastAccessed)
-					oldest = k;
-			}
-
-			final ByteWindow w = windows[oldest];
-			final WindowedFile p = w.provider;
-			if (--p.openCount == 0 && p != willRead)
-				p.cacheClose();
-
-			openByteCount -= w.size;
-			final int toMove = openWindowCount - oldest - 1;
-			if (toMove > 0)
-				System.arraycopy(windows, oldest + 1, windows, oldest, toMove);
-			windows[--openWindowCount] = null;
-			w.enqueue();
-			if (oldest < insertionIndex)
-				insertionIndex--;
+	private static void makeMostRecent(ByteWindow<?> e) {
+		if (lruHead != e) {
+			unlinkLRU(e);
+			insertLRU(e);
 		}
-
-		return insertionIndex;
 	}
 
-	private static final int binarySearch(final WindowedFile sprov,
-			final int sid) {
-		if (openWindowCount == 0)
-			return -1;
-		final int shc = sprov.hash;
-		int high = openWindowCount;
-		int low = 0;
-		do {
-			final int mid = (low + high) / 2;
-			final ByteWindow mw = windows[mid];
-			if (mw.provider == sprov && mw.id == sid)
-				return mid;
-			final int mhc = mw.provider.hash;
-			if (mhc < shc || (shc == mhc && mw.id < sid))
-				low = mid + 1;
-			else
-				high = mid;
-		} while (low < high);
-		return -(low + 1);
+	private static void releaseMemory() {
+		ByteWindow<?> e = lruTail;
+		while (openByteCount > maxByteCount && e != null) {
+			final ByteWindow<?> p = e.lruPrev;
+			clear(e);
+			e = p;
+		}
 	}
 
 	/**
@@ -316,22 +306,89 @@ private static final int binarySearch(final WindowedFile sprov,
 	 *            cache.
 	 */
 	public static synchronized final void purge(final WindowedFile wp) {
-		int d = 0;
-		for (int s = 0; s < openWindowCount; s++) {
-			final ByteWindow win = windows[s];
-			if (win.provider != wp)
-				windows[d++] = win;
-			else
-				openByteCount -= win.size;
+		for (ByteWindow e : cache) {
+			for (; e != null; e = e.chainNext) {
+				if (e.provider == wp)
+					clear(e);
+			}
+		}
+		runClearedWindowQueue();
+	}
+
+	private static void runClearedWindowQueue() {
+		ByteWindow<?> e;
+		while ((e = (ByteWindow) clearedWindowQueue.poll()) != null) {
+			unlinkSize(e);
+			unlinkLRU(e);
+			unlinkCache(e);
+			e.chainNext = null;
+			e.lruNext = null;
+			e.lruPrev = null;
 		}
-		openWindowCount = d;
+	}
+
+	private static void clear(final ByteWindow<?> e) {
+		unlinkSize(e);
+		e.clear();
+		e.enqueue();
+	}
 
-		if (wp.openCount > 0) {
-			wp.openCount = 0;
-			wp.cacheClose();
+	private static void unlinkSize(final ByteWindow<?> e) {
+		if (e.sizeActive) {
+			if (--e.provider.openCount == 0) {
+				closes++;
+				e.provider.cacheClose();
+			}
+			openByteCount -= e.size;
+			e.sizeActive = false;
 		}
 	}
 
+	private static void unlinkCache(final ByteWindow dead) {
+		final int idx = hash(dead.provider, dead.id);
+		ByteWindow<?> e = cache[idx], p = null, n;
+		for (; e != null; p = e, e = n) {
+			n = e.chainNext;
+			if (e == dead) {
+				if (p == null)
+					cache[idx] = n;
+				else
+					p.chainNext = n;
+				break;
+			}
+		}
+	}
+
+	private static void unlinkLRU(final ByteWindow e) {
+		final ByteWindow<?> prev = e.lruPrev;
+		final ByteWindow<?> next = e.lruNext;
+
+		if (prev != null)
+			prev.lruNext = next;
+		else
+			lruHead = next;
+
+		if (next != null)
+			next.lruPrev = prev;
+		else
+			lruTail = prev;
+	}
+
+	private static void insertLRU(final ByteWindow<?> e) {
+		final ByteWindow h = lruHead;
+		e.lruPrev = null;
+		e.lruNext = h;
+		if (h != null)
+			h.lruPrev = e;
+		else
+			lruTail = e;
+		lruHead = e;
+	}
+
+	private static int hash(final WindowedFile wp, final int id) {
+		return ((wp.hash + id) >>> 1) % cache.length;
+	}
+
 	private static int windowSize(final WindowedFile file, final int id) {
 		final long len = file.length();
 		final long pos = id << windowSizeShift;
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
index dd94f24..9c5cf1e 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
@@ -88,7 +88,7 @@
 	 */
 	public WindowedFile(final File file) {
 		fPath = file;
-		hash = System.identityHashCode(this);
+		hash = System.identityHashCode(this) * 31;
 		length = Long.MAX_VALUE;
 	}
 
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 12/23] Change ByteArrayWindow to read content outside of WindowCache's lock
  2008-12-25  2:11                     ` [JGIT PATCH 11/23] Rewrite WindowCache to use a hash table Shawn O. Pearce
@ 2008-12-25  2:11                       ` Shawn O. Pearce
  2008-12-25  2:11                         ` [JGIT PATCH 13/23] Dispose of RevCommit buffers when they aren't used in PackWriter Shawn O. Pearce
  2008-12-27 13:30                       ` [JGIT PATCH 11/23] Rewrite WindowCache to use a hash table Robin Rosenberg
  1 sibling, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

This improves the concurrency limit of WindowCache by allowing the
individual windows to be paged in outside of the cache lock.  By
moving it out multiple threads can read in different windows of
the same (or different) pack files concurrently, but we still do
only one read per window.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/ByteArrayWindow.java  |   17 +++++++++++++++++
 .../src/org/spearce/jgit/lib/WindowCache.java      |    2 +-
 .../src/org/spearce/jgit/lib/WindowedFile.java     |   15 +++++++--------
 3 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/ByteArrayWindow.java b/org.spearce.jgit/src/org/spearce/jgit/lib/ByteArrayWindow.java
index cea71be..b32d4f8 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/ByteArrayWindow.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/ByteArrayWindow.java
@@ -38,6 +38,8 @@
 
 package org.spearce.jgit.lib;
 
+import java.io.IOException;
+import java.nio.ByteBuffer;
 import java.util.zip.DataFormatException;
 import java.util.zip.Inflater;
 
@@ -45,6 +47,8 @@
  * A {@link ByteWindow} with an underlying byte array for storage.
  */
 final class ByteArrayWindow extends ByteWindow<byte[]> {
+	boolean loaded;
+
 	/**
 	 * Constructor for ByteWindow.
 	 * 
@@ -64,6 +68,7 @@ ByteArrayWindow(final WindowedFile o, final long p, final int d,
 	}
 
 	int copy(final byte[] array, final int p, final byte[] b, final int o, int n) {
+		ensureLoaded(array);
 		n = Math.min(array.length - p, n);
 		System.arraycopy(array, p, b, o, n);
 		return n;
@@ -71,6 +76,7 @@ int copy(final byte[] array, final int p, final byte[] b, final int o, int n) {
 
 	int inflate(final byte[] array, final int pos, final byte[] b, int o,
 			final Inflater inf) throws DataFormatException {
+		ensureLoaded(array);
 		while (!inf.finished()) {
 			if (inf.needsInput()) {
 				inf.setInput(array, pos, array.length - pos);
@@ -82,4 +88,15 @@ int inflate(final byte[] array, final int pos, final byte[] b, int o,
 			o += inf.inflate(b, o, b.length - o);
 		return o;
 	}
+
+	private synchronized void ensureLoaded(final byte[] array) {
+		if (!loaded) {
+			try {
+				provider.fd.getChannel().read(ByteBuffer.wrap(array), start);
+			} catch (IOException e) {
+				throw new RuntimeException("Cannot fault in window", e);
+			}
+			loaded = true;
+		}
+	}
 }
\ No newline at end of file
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java
index f478f04..649567b 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java
@@ -270,7 +270,7 @@ public static synchronized final void get(final WindowCursor curs,
 		releaseMemory();
 		runClearedWindowQueue();
 
-		wp.loadWindow(curs, id, id << windowSizeShift, wsz);
+		wp.allocWindow(curs, id, id << windowSizeShift, wsz);
 		final ByteWindow<?> e = curs.window;
 		e.chainNext = cache[idx];
 		cache[idx] = e;
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
index 9c5cf1e..7626693 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
@@ -73,7 +73,7 @@
 
 	final int hash;
 
-	private RandomAccessFile fd;
+	RandomAccessFile fd;
 
 	private long length;
 
@@ -298,8 +298,8 @@ void cacheClose() {
 		fd = null;
 	}
 
-	void loadWindow(final WindowCursor curs, final int windowId,
-			final long pos, final int size) throws IOException {
+	void allocWindow(final WindowCursor curs, final int windowId,
+			final long pos, final int size) {
 		if (WindowCache.mmap) {
 			MappedByteBuffer map;
 			try {
@@ -323,7 +323,10 @@ void loadWindow(final WindowCursor curs, final int windowId,
 			if (map != null) {
 				if (map.hasArray()) {
 					final byte[] b = map.array();
-					curs.window = new ByteArrayWindow(this, pos, windowId, b);
+					final ByteArrayWindow w;
+					w = new ByteArrayWindow(this, pos, windowId, b);
+					w.loaded = true;
+					curs.window = w;
 					curs.handle = b;
 				} else {
 					curs.window = new ByteBufferWindow(this, pos, windowId, map);
@@ -334,10 +337,6 @@ void loadWindow(final WindowCursor curs, final int windowId,
 		}
 
 		final byte[] b = new byte[size];
-		synchronized (fd) {
-			fd.seek(pos);
-			fd.readFully(b);
-		}
 		curs.window = new ByteArrayWindow(this, pos, windowId, b);
 		curs.handle = b;
 	}
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 13/23] Dispose of RevCommit buffers when they aren't used in PackWriter
  2008-12-25  2:11                       ` [JGIT PATCH 12/23] Change ByteArrayWindow to read content outside of WindowCache's lock Shawn O. Pearce
@ 2008-12-25  2:11                         ` Shawn O. Pearce
  2008-12-25  2:11                           ` [JGIT PATCH 14/23] Don't unpack delta chains while writing a pack from a pack v1 index Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

We don't need the commit buffer, so we might as well throw it
away immediately to reduce the total memory usage within the
writer process.

The same goes for the buffers within UploadPack when its doing
a check to see if it has a complete base.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/PackWriter.java       |    2 ++
 .../src/org/spearce/jgit/transport/UploadPack.java |    1 +
 2 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
index 89460f2..88b2b1f 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
@@ -802,10 +802,12 @@ private void findObjectsToPack(final ObjectWalk walker)
 
 		while ((o = walker.next()) != null) {
 			addObject(o);
+			o.dispose();
 			initMonitor.update(1);
 		}
 		while ((o = walker.nextObject()) != null) {
 			addObject(o);
+			o.dispose();
 			initMonitor.update(1);
 		}
 		initMonitor.endTask();
diff --git a/org.spearce.jgit/src/org/spearce/jgit/transport/UploadPack.java b/org.spearce.jgit/src/org/spearce/jgit/transport/UploadPack.java
index 4401951..d57df03 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/transport/UploadPack.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/transport/UploadPack.java
@@ -434,6 +434,7 @@ private boolean wantSatisfied(final RevCommit want) throws IOException {
 				}
 				return true;
 			}
+			c.dispose();
 		}
 		return false;
 	}
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 14/23] Don't unpack delta chains while writing a pack from a pack v1 index
  2008-12-25  2:11                         ` [JGIT PATCH 13/23] Dispose of RevCommit buffers when they aren't used in PackWriter Shawn O. Pearce
@ 2008-12-25  2:11                           ` Shawn O. Pearce
  2008-12-25  2:11                             ` [JGIT PATCH 15/23] Don't unpack delta chains while converting delta to whole object Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

We only need to inflate the delta to verify the content is sane;
going all the way through the delta chain by getCachedBytes is
a massive expensive that we just cannot afford to pay.  This
simple change improves clone time for the Linux kernel from
14m50s to 3m20s, due to the improved throughput in our write
objects phase.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/ByteArrayWindow.java  |   14 ++++++++++++++
 .../src/org/spearce/jgit/lib/ByteBufferWindow.java |   17 +++++++++++++++++
 .../src/org/spearce/jgit/lib/ByteWindow.java       |   10 ++++++++++
 .../src/org/spearce/jgit/lib/PackFile.java         |   18 +++++++++++-------
 .../src/org/spearce/jgit/lib/WindowCursor.java     |   16 ++++++++++++++++
 .../src/org/spearce/jgit/lib/WindowedFile.java     |    5 +++++
 6 files changed, 73 insertions(+), 7 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/ByteArrayWindow.java b/org.spearce.jgit/src/org/spearce/jgit/lib/ByteArrayWindow.java
index b32d4f8..7b7c7a4 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/ByteArrayWindow.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/ByteArrayWindow.java
@@ -89,6 +89,20 @@ int inflate(final byte[] array, final int pos, final byte[] b, int o,
 		return o;
 	}
 
+	void inflateVerify(final byte[] array, final int pos, final Inflater inf)
+			throws DataFormatException {
+		ensureLoaded(array);
+		while (!inf.finished()) {
+			if (inf.needsInput()) {
+				inf.setInput(array, pos, array.length - pos);
+				break;
+			}
+			inf.inflate(verifyGarbageBuffer, 0, verifyGarbageBuffer.length);
+		}
+		while (!inf.finished() && !inf.needsInput())
+			inf.inflate(verifyGarbageBuffer, 0, verifyGarbageBuffer.length);
+	}
+
 	private synchronized void ensureLoaded(final byte[] array) {
 		if (!loaded) {
 			try {
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/ByteBufferWindow.java b/org.spearce.jgit/src/org/spearce/jgit/lib/ByteBufferWindow.java
index 03bafb1..a6757e8 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/ByteBufferWindow.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/ByteBufferWindow.java
@@ -90,4 +90,21 @@ int inflate(final ByteBuffer buffer, final int pos, final byte[] b, int o,
 			o += inf.inflate(b, o, b.length - o);
 		return o;
 	}
+
+	void inflateVerify(final ByteBuffer buffer, final int pos,
+			final Inflater inf) throws DataFormatException {
+		final byte[] tmp = new byte[512];
+		final ByteBuffer s = buffer.slice();
+		s.position(pos);
+		while (s.remaining() > 0 && !inf.finished()) {
+			if (inf.needsInput()) {
+				final int n = Math.min(s.remaining(), tmp.length);
+				s.get(tmp, 0, n);
+				inf.setInput(tmp, 0, n);
+			}
+			inf.inflate(verifyGarbageBuffer, 0, verifyGarbageBuffer.length);
+		}
+		while (!inf.finished() && !inf.needsInput())
+			inf.inflate(verifyGarbageBuffer, 0, verifyGarbageBuffer.length);
+	}
 }
\ No newline at end of file
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/ByteWindow.java b/org.spearce.jgit/src/org/spearce/jgit/lib/ByteWindow.java
index 8d37de7..732664b 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/ByteWindow.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/ByteWindow.java
@@ -207,4 +207,14 @@ final int inflate(T ref, long pos, byte[] dstbuf, int dstoff, Inflater inf)
 	 */
 	abstract int inflate(T ref, int pos, byte[] dstbuf, int dstoff, Inflater inf)
 			throws DataFormatException;
+
+	protected static final byte[] verifyGarbageBuffer = new byte[2048];
+
+	final void inflateVerify(T ref, long pos, Inflater inf)
+			throws DataFormatException {
+		inflateVerify(ref, (int) (pos - start), inf);
+	}
+
+	abstract void inflateVerify(T ref, int pos, Inflater inf)
+			throws DataFormatException;
 }
\ No newline at end of file
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
index 3cdca8f..0de4c55 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackFile.java
@@ -218,15 +218,19 @@ final void copyRawData(final PackedObjectLoader loader,
 			final ObjectId id = findObjectForOffset(objectOffset);
 			final long expected = idx.findCRC32(id);
 			if (computed != expected)
-				throw new CorruptObjectException(id,
-						"Possible data corruption - CRC32 of raw pack data (object offset "
-								+ objectOffset
-								+ ") mismatch CRC32 from pack index");
+				throw new CorruptObjectException("Object at " + dataOffset
+						+ " in " + getPackFile() + " has bad zlib stream");
 		} else {
+			try {
+				pack.verifyCompressed(dataOffset, curs);
+			} catch (DataFormatException dfe) {
+				final CorruptObjectException coe;
+				coe = new CorruptObjectException("Object at " + dataOffset
+						+ " in " + getPackFile() + " has bad zlib stream");
+				coe.initCause(dfe);
+				throw coe;
+			}
 			pack.copyToStream(dataOffset, buf, cnt, out, curs);
-
-			// read to verify against Adler32 zlib checksum
-			loader.getCachedBytes();
 		}
 	}
 
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCursor.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCursor.java
index 7aad081..5c8bec5 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCursor.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCursor.java
@@ -125,6 +125,22 @@ int inflate(final WindowedFile provider, long position,
 		}
 	}
 
+	void inflateVerify(final WindowedFile provider, long position)
+			throws IOException, DataFormatException {
+		if (inf == null)
+			inf = InflaterCache.get();
+		else
+			inf.reset();
+		for (;;) {
+			pin(provider, position);
+			window.inflateVerify(handle, position, inf);
+			if (inf.finished())
+				return;
+			position = window.end;
+		}
+	}
+
+	
 	private void pin(final WindowedFile provider, final long position)
 			throws IOException {
 		final ByteWindow w = window;
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
index 7626693..5eb8465 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
@@ -248,6 +248,11 @@ void readCompressed(final long position, final byte[] dstbuf,
 			throw new EOFException("Short compressed stream at " + position);
 	}
 
+	void verifyCompressed(final long position, final WindowCursor curs)
+			throws IOException, DataFormatException {
+		curs.inflateVerify(this, position);
+	}
+
 	/**
 	 * Overridable hook called after the file is opened.
 	 * <p>
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 15/23] Don't unpack delta chains while converting delta to whole object
  2008-12-25  2:11                           ` [JGIT PATCH 14/23] Don't unpack delta chains while writing a pack from a pack v1 index Shawn O. Pearce
@ 2008-12-25  2:11                             ` Shawn O. Pearce
  2008-12-25  2:11                               ` [JGIT PATCH 16/23] Defer parsing of the ObjectId while walking a PackIndex Iterator Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

If an object is currently a delta but we need to convert it into
a whole object we shouldn't be chasing the delta chain to get its
real type code.  We had the type code at the time we made the list
of objects to pack; so we save it in the ObjectToPack structure.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/PackWriter.java       |   28 ++++++++++++++-----
 1 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
index 88b2b1f..a0823c7 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
@@ -705,14 +705,14 @@ private void writeObject(final ObjectToPack otp) throws IOException {
 	private void writeWholeObject(final ObjectToPack otp) throws IOException {
 		if (otp.hasReuseLoader()) {
 			final PackedObjectLoader loader = otp.getReuseLoader();
-			writeObjectHeader(loader.getType(), loader.getSize());
+			writeObjectHeader(otp.getType(), loader.getSize());
 			loader.copyRawData(out, buf);
 			otp.disposeLoader();
 		} else {
 			final ObjectLoader loader = db.openObject(windowCursor, otp);
 			final DeflaterOutputStream deflaterOut = new DeflaterOutputStream(
 					out, deflater);
-			writeObjectHeader(loader.getType(), loader.getSize());
+			writeObjectHeader(otp.getType(), loader.getSize());
 			deflaterOut.write(loader.getCachedBytes());
 			deflaterOut.finish();
 			deflater.reset();
@@ -833,7 +833,7 @@ public void addObject(final RevObject object)
 			return;
 		}
 
-		final ObjectToPack otp = new ObjectToPack(object);
+		final ObjectToPack otp = new ObjectToPack(object, object.getType());
 		try {
 			objectsLists[object.getType()].add(otp);
 		} catch (ArrayIndexOutOfBoundsException x) {
@@ -858,7 +858,8 @@ public void addObject(final RevObject object)
 
 		private PackedObjectLoader reuseLoader;
 
-		private int deltaDepth;
+		/** Low bits contain the objectType; higher bits the deltaDepth */
+		private int flags;
 
 		private boolean wantWrite;
 
@@ -868,9 +869,12 @@ public void addObject(final RevObject object)
 		 *
 		 * @param src
 		 *            object id of object for packing
+		 * @param type
+		 *            real type code of the object, not its in-pack type.
 		 */
-		ObjectToPack(AnyObjectId src) {
+		ObjectToPack(AnyObjectId src, final int type) {
 			super(src);
+			flags |= type;
 		}
 
 		/**
@@ -946,15 +950,23 @@ void disposeLoader() {
 			this.reuseLoader = null;
 		}
 
+		int getType() {
+			return flags & 0x7;
+		}
+
 		int getDeltaDepth() {
-			return deltaDepth;
+			return flags >>> 3;
 		}
 
 		void updateDeltaDepth() {
+			final int d;
 			if (deltaBase instanceof ObjectToPack)
-				deltaDepth = ((ObjectToPack) deltaBase).deltaDepth + 1;
+				d = ((ObjectToPack) deltaBase).getDeltaDepth() + 1;
 			else if (deltaBase != null)
-				deltaDepth = 1;
+				d = 1;
+			else
+				d = 0;
+			flags = (d << 3) | getType();
 		}
 
 		boolean wantWrite() {
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 16/23] Defer parsing of the ObjectId while walking a PackIndex Iterator
  2008-12-25  2:11                             ` [JGIT PATCH 15/23] Don't unpack delta chains while converting delta to whole object Shawn O. Pearce
@ 2008-12-25  2:11                               ` Shawn O. Pearce
  2008-12-25  2:11                                 ` [JGIT PATCH 17/23] Only do one getCachedBytes per whole object written Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

This is a slight performance improvement for the PackReverseIndex
construction.  The only code that really cares about the ObjectId
from a PackIndex entry is test cases.  By avoiding construction
we can save some CPU time during the several passes we do to make
the reverse index data structure within PackWriter.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../tst/org/spearce/jgit/lib/PackIndexTest.java    |    4 +-
 .../tst/org/spearce/jgit/lib/PackWriterTest.java   |    2 +-
 .../src/org/spearce/jgit/lib/PackIndex.java        |   48 +++++++++++--------
 .../src/org/spearce/jgit/lib/PackIndexV1.java      |   20 +++++---
 .../src/org/spearce/jgit/lib/PackIndexV2.java      |   27 +++++++----
 5 files changed, 61 insertions(+), 40 deletions(-)

diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexTest.java b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexTest.java
index 8ab380e..7163718 100644
--- a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexTest.java
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackIndexTest.java
@@ -134,10 +134,10 @@ assertEquals("c59759f143fb1fe21c197981df75a7ee00290799", iter.next()
 	 */
 	public void testCompareEntriesOffsetsWithFindOffsets() {
 		for (MutableEntry me : smallIdx) {
-			assertEquals(smallIdx.findOffset(me), me.getOffset());
+			assertEquals(smallIdx.findOffset(me.toObjectId()), me.getOffset());
 		}
 		for (MutableEntry me : denseIdx) {
-			assertEquals(denseIdx.findOffset(me), me.getOffset());
+			assertEquals(denseIdx.findOffset(me.toObjectId()), me.getOffset());
 		}
 	}
 
diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java
index 3c02955..ec138a0 100644
--- a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java
@@ -487,7 +487,7 @@ public int compare(MutableEntry o1, MutableEntry o2) {
 
 		int i = 0;
 		for (MutableEntry me : entries) {
-			assertEquals(objectsOrder[i++], me.copy());
+			assertEquals(objectsOrder[i++].toObjectId(), me.toObjectId());
 		}
 	}
 }
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndex.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndex.java
index 5fb41b1..5b7a62d 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndex.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndex.java
@@ -239,15 +239,10 @@ abstract long findCRC32(AnyObjectId objId) throws MissingObjectException,
 	 * in pack (both mutable).
 	 * 
 	 */
-	public static class MutableEntry extends MutableObjectId {
-		private long offset;
+	public static class MutableEntry {
+		protected final MutableObjectId idBuffer = new MutableObjectId();
 
-		/**
-		 * Empty constructor. Object fields should be filled in later.
-		 */
-		public MutableEntry() {
-			super();
-		}
+		long offset;
 
 		/**
 		 * Returns offset for this index object entry
@@ -258,30 +253,43 @@ public long getOffset() {
 			return offset;
 		}
 
-		void setOffset(long offset) {
-			this.offset = offset;
+		/** @return hex string describing the object id of this entry. */
+		public String name() {
+			ensureId();
+			return idBuffer.name();
 		}
 
-		private MutableEntry(MutableEntry src) {
-			super(src);
-			this.offset = src.offset;
+		/** @return a copy of the object id. */
+		public ObjectId toObjectId() {
+			ensureId();
+			return idBuffer.toObjectId();
 		}
 
-		/**
-		 * Returns mutable copy of this mutable entry.
-		 * 
-		 * @return copy of this mutable entry
-		 */
+		/** @return a complete copy of this entry, that won't modify */
 		public MutableEntry cloneEntry() {
-			return new MutableEntry(this);
+			final MutableEntry r = new MutableEntry();
+			ensureId();
+			r.idBuffer.w1 = idBuffer.w1;
+			r.idBuffer.w2 = idBuffer.w2;
+			r.idBuffer.w3 = idBuffer.w3;
+			r.idBuffer.w4 = idBuffer.w4;
+			r.idBuffer.w5 = idBuffer.w5;
+			r.offset = offset;
+			return r;
+		}
+		
+		protected void ensureId() {
+			// Override in implementations.
 		}
 	}
 
 	protected abstract class EntriesIterator implements Iterator<MutableEntry> {
-		protected MutableEntry objectId = new MutableEntry();
+		protected final MutableEntry entry = initEntry();
 
 		protected long returnedNumber = 0;
 
+		protected abstract MutableEntry initEntry();
+
 		public boolean hasNext() {
 			return returnedNumber < getObjectCount();
 		}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV1.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV1.java
index 02465f6..90b5f6f 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV1.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV1.java
@@ -162,21 +162,27 @@ boolean hasCRC32Support() {
 
 		private int levelTwo;
 
+		@Override
+		protected MutableEntry initEntry() {
+			return new MutableEntry() {
+				protected void ensureId() {
+					idBuffer.fromRaw(idxdata[levelOne], levelTwo
+							- Constants.OBJECT_ID_LENGTH);
+				}
+			};
+		}
+
 		public MutableEntry next() {
 			for (; levelOne < idxdata.length; levelOne++) {
 				if (idxdata[levelOne] == null)
 					continue;
-
 				if (levelTwo < idxdata[levelOne].length) {
-					long offset = NB.decodeUInt32(idxdata[levelOne], levelTwo);
-					objectId.setOffset(offset);
-					objectId.fromRaw(idxdata[levelOne], levelTwo + 4);
+					entry.offset = NB.decodeUInt32(idxdata[levelOne], levelTwo);
 					levelTwo += Constants.OBJECT_ID_LENGTH + 4;
 					returnedNumber++;
-					return objectId;
-				} else {
-					levelTwo = 0;
+					return entry;
 				}
+				levelTwo = 0;
 			}
 			throw new NoSuchElementException();
 		}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
index cc3ad65..48a5206 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackIndexV2.java
@@ -234,25 +234,32 @@ else if (cmp == 0) {
 
 		private int levelTwo;
 
+		@Override
+		protected MutableEntry initEntry() {
+			return new MutableEntry() {
+				protected void ensureId() {
+					idBuffer.fromRaw(names[levelOne], levelTwo
+							- Constants.OBJECT_ID_LENGTH / 4);
+				}
+			};
+		}
+
 		public MutableEntry next() {
 			for (; levelOne < names.length; levelOne++) {
 				if (levelTwo < names[levelOne].length) {
-					objectId.fromRaw(names[levelOne], levelTwo);
-					int arrayIdx = levelTwo / (Constants.OBJECT_ID_LENGTH / 4)
-							* 4;
-					long offset = NB.decodeUInt32(offset32[levelOne], arrayIdx);
+					int idx = levelTwo / (Constants.OBJECT_ID_LENGTH / 4) * 4;
+					long offset = NB.decodeUInt32(offset32[levelOne], idx);
 					if ((offset & IS_O64) != 0) {
-						arrayIdx = (8 * (int) (offset & ~IS_O64));
-						offset = NB.decodeUInt64(offset64, arrayIdx);
+						idx = (8 * (int) (offset & ~IS_O64));
+						offset = NB.decodeUInt64(offset64, idx);
 					}
-					objectId.setOffset(offset);
+					entry.offset = offset;
 
 					levelTwo += Constants.OBJECT_ID_LENGTH / 4;
 					returnedNumber++;
-					return objectId;
-				} else {
-					levelTwo = 0;
+					return entry;
 				}
+				levelTwo = 0;
 			}
 			throw new NoSuchElementException();
 		}
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 17/23] Only do one getCachedBytes per whole object written
  2008-12-25  2:11                               ` [JGIT PATCH 16/23] Defer parsing of the ObjectId while walking a PackIndex Iterator Shawn O. Pearce
@ 2008-12-25  2:11                                 ` Shawn O. Pearce
  2008-12-25  2:11                                   ` [JGIT PATCH 18/23] Correctly use a long for the offsets within a generated pack Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Getting the size of an object may cause getCachedBytes to be invoked,
and then have its result discarded.  To save time we should make the
call to getCachedBytes first, hold onto the byte[], then obtain the
size off the byte[], and finally write from the byte[].

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/PackWriter.java       |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
index a0823c7..bb889e8 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
@@ -710,10 +710,11 @@ private void writeWholeObject(final ObjectToPack otp) throws IOException {
 			otp.disposeLoader();
 		} else {
 			final ObjectLoader loader = db.openObject(windowCursor, otp);
+			final byte[] data = loader.getCachedBytes();
 			final DeflaterOutputStream deflaterOut = new DeflaterOutputStream(
 					out, deflater);
-			writeObjectHeader(otp.getType(), loader.getSize());
-			deflaterOut.write(loader.getCachedBytes());
+			writeObjectHeader(otp.getType(), data.length);
+			deflaterOut.write(data);
 			deflaterOut.finish();
 			deflater.reset();
 		}
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 18/23] Correctly use a long for the offsets within a generated pack
  2008-12-25  2:11                                 ` [JGIT PATCH 17/23] Only do one getCachedBytes per whole object written Shawn O. Pearce
@ 2008-12-25  2:11                                   ` Shawn O. Pearce
  2008-12-25  2:11                                     ` [JGIT PATCH 19/23] Allow more direct access to determine isWritten Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Pack files can be larger than 2 GiB, especially if the PackIndexV2
format is being used.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../tst/org/spearce/jgit/lib/PackWriterTest.java   |   12 ++++++------
 .../spearce/jgit/util/CountingOutputStream.java    |    5 ++---
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java
index ec138a0..98e0d3a 100644
--- a/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/lib/PackWriterTest.java
@@ -308,11 +308,11 @@ public void testWritePack4ThinPack() throws IOException {
 	 */
 	public void testWritePack2SizeDeltasVsNoDeltas() throws Exception {
 		testWritePack2();
-		final int sizePack2NoDeltas = cos.getCount();
+		final long sizePack2NoDeltas = cos.getCount();
 		tearDown();
 		setUp();
 		testWritePack2DeltasReuseRefs();
-		final int sizePack2DeltasRefs = cos.getCount();
+		final long sizePack2DeltasRefs = cos.getCount();
 
 		assertTrue(sizePack2NoDeltas > sizePack2DeltasRefs);
 	}
@@ -327,11 +327,11 @@ public void testWritePack2SizeDeltasVsNoDeltas() throws Exception {
 	 */
 	public void testWritePack2SizeOffsetsVsRefs() throws Exception {
 		testWritePack2DeltasReuseRefs();
-		final int sizePack2DeltasRefs = cos.getCount();
+		final long sizePack2DeltasRefs = cos.getCount();
 		tearDown();
 		setUp();
 		testWritePack2DeltasReuseOffsets();
-		final int sizePack2DeltasOffsets = cos.getCount();
+		final long sizePack2DeltasOffsets = cos.getCount();
 
 		assertTrue(sizePack2DeltasRefs > sizePack2DeltasOffsets);
 	}
@@ -345,11 +345,11 @@ public void testWritePack2SizeOffsetsVsRefs() throws Exception {
 	 */
 	public void testWritePack4SizeThinVsNoThin() throws Exception {
 		testWritePack4();
-		final int sizePack4 = cos.getCount();
+		final long sizePack4 = cos.getCount();
 		tearDown();
 		setUp();
 		testWritePack4ThinPack();
-		final int sizePack4Thin = cos.getCount();
+		final long sizePack4Thin = cos.getCount();
 
 		assertTrue(sizePack4 > sizePack4Thin);
 	}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/util/CountingOutputStream.java b/org.spearce.jgit/src/org/spearce/jgit/util/CountingOutputStream.java
index 53935dc..b0b5f7d 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/util/CountingOutputStream.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/util/CountingOutputStream.java
@@ -45,8 +45,7 @@
  * Counting output stream decoration. Counts bytes written to stream.
  */
 public class CountingOutputStream extends FilterOutputStream {
-
-	private int count;
+	private long count;
 
 	/**
 	 * Create counting stream being decorated to provided real output stream.
@@ -76,7 +75,7 @@ public void write(byte[] b, int off, int len) throws IOException {
 	 * @return number of written bytes since last reset (object is reset upon
 	 *         creation)
 	 */
-	public int getCount() {
+	public long getCount() {
 		return count;
 	}
 
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 19/23] Allow more direct access to determine isWritten
  2008-12-25  2:11                                   ` [JGIT PATCH 18/23] Correctly use a long for the offsets within a generated pack Shawn O. Pearce
@ 2008-12-25  2:11                                     ` Shawn O. Pearce
  2008-12-25  2:11                                       ` [JGIT PATCH 20/23] Move "wantWrite" field of ObjectToPack into the flags field Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

This gives the JVM a better chance to inline the isWritten method
when writing a pack file out.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/PackWriter.java       |    2 +-
 .../spearce/jgit/transport/PackedObjectInfo.java   |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
index bb889e8..50d06c2 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
@@ -932,7 +932,7 @@ boolean isDeltaRepresentation() {
 		 * @return true if object is already written; false otherwise.
 		 */
 		boolean isWritten() {
-			return getOffset() != 0;
+			return offset != 0;
 		}
 
 		PackedObjectLoader getReuseLoader() {
diff --git a/org.spearce.jgit/src/org/spearce/jgit/transport/PackedObjectInfo.java b/org.spearce.jgit/src/org/spearce/jgit/transport/PackedObjectInfo.java
index f37f421..045b795 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/transport/PackedObjectInfo.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/transport/PackedObjectInfo.java
@@ -49,7 +49,7 @@
  * objects from the pack. This extension of ObjectId includes the offset.
  */
 public class PackedObjectInfo extends ObjectId {
-	private long offset;
+	protected long offset;
 
 	private int crc;
 
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 20/23] Move "wantWrite" field of ObjectToPack into the flags field
  2008-12-25  2:11                                     ` [JGIT PATCH 19/23] Allow more direct access to determine isWritten Shawn O. Pearce
@ 2008-12-25  2:11                                       ` Shawn O. Pearce
  2008-12-25  2:11                                         ` [JGIT PATCH 21/23] Use an ArrayList for the reuseLoader collection in PackWriter Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

This reduces the ObjectToPack structure by 4 bytes, saving us 4 MB
on the linux kernel repository.  Its only a single bit value, so
keeping it in a full boolean is rather wasteful given how many of
these objects we need in to pack a large project.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/PackWriter.java       |   23 ++++++++++++-------
 1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
index 50d06c2..3ef9154 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
@@ -859,11 +859,16 @@ public void addObject(final RevObject object)
 
 		private PackedObjectLoader reuseLoader;
 
-		/** Low bits contain the objectType; higher bits the deltaDepth */
+		/**
+		 * Bit field, from bit 0 to bit 31:
+		 * <ul>
+		 * <li>1 bit: wantWrite</li>
+		 * <li>3 bits: type</li>
+		 * <li>28 bits: deltaDepth</li>
+		 * </ul>
+		 */
 		private int flags;
 
-		private boolean wantWrite;
-
 		/**
 		 * Construct object for specified object id. <br/> By default object is
 		 * marked as not written and non-delta packed (as a whole object).
@@ -875,7 +880,7 @@ public void addObject(final RevObject object)
 		 */
 		ObjectToPack(AnyObjectId src, final int type) {
 			super(src);
-			flags |= type;
+			flags |= type << 1;
 		}
 
 		/**
@@ -952,11 +957,11 @@ void disposeLoader() {
 		}
 
 		int getType() {
-			return flags & 0x7;
+			return (flags>>1) & 0x7;
 		}
 
 		int getDeltaDepth() {
-			return flags >>> 3;
+			return flags >>> 4;
 		}
 
 		void updateDeltaDepth() {
@@ -967,15 +972,15 @@ else if (deltaBase != null)
 				d = 1;
 			else
 				d = 0;
-			flags = (d << 3) | getType();
+			flags = (d << 4) | flags & 0x15;
 		}
 
 		boolean wantWrite() {
-			return wantWrite;
+			return (flags & 1) == 1;
 		}
 
 		void markWantWrite() {
-			this.wantWrite = true;
+			flags |= 1;
 		}
 	}
 }
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 21/23] Use an ArrayList for the reuseLoader collection in PackWriter
  2008-12-25  2:11                                       ` [JGIT PATCH 20/23] Move "wantWrite" field of ObjectToPack into the flags field Shawn O. Pearce
@ 2008-12-25  2:11                                         ` Shawn O. Pearce
  2008-12-25  2:11                                           ` [JGIT PATCH 22/23] Don't cut off existing delta chains if we are reusing deltas Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Nulling out the elements of the array is quicker than deallocating
LinkedList nodes and allocating new ones for the next use.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/PackWriter.java       |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
index 3ef9154..69b4294 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
@@ -46,7 +46,6 @@
 import java.util.Collection;
 import java.util.Collections;
 import java.util.Iterator;
-import java.util.LinkedList;
 import java.util.List;
 import java.util.zip.Deflater;
 import java.util.zip.DeflaterOutputStream;
@@ -577,8 +576,7 @@ public void writePack(OutputStream packStream) throws IOException {
 
 	private void searchForReuse() throws IOException {
 		initMonitor.beginTask(SEARCHING_REUSE_PROGRESS, getObjectsNumber());
-		final Collection<PackedObjectLoader> reuseLoaders = new LinkedList<PackedObjectLoader>();
-
+		final Collection<PackedObjectLoader> reuseLoaders = new ArrayList<PackedObjectLoader>();
 		for (List<ObjectToPack> list : objectsLists) {
 			for (ObjectToPack otp : list) {
 				if (initMonitor.isCancelled())
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 22/23] Don't cut off existing delta chains if we are reusing deltas
  2008-12-25  2:11                                         ` [JGIT PATCH 21/23] Use an ArrayList for the reuseLoader collection in PackWriter Shawn O. Pearce
@ 2008-12-25  2:11                                           ` Shawn O. Pearce
  2008-12-25  2:11                                             ` [JGIT PATCH 23/23] Correctly honor the thin parameter to PackWriter.writePack Shawn O. Pearce
  0 siblings, 1 reply; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

The existing pack file may have chains longer than our current max
depth, such as if it is a historical pack and was generated with a
maximum chain depth >100.  Cutting off the chains would create a
bigger pack file that we need to transmit, and it costs a lot more
CPU time to expand the delta object in order to recompress it whole,
because we couldn't copy it directly from the pack file.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/PackWriter.java       |    6 ------
 1 files changed, 0 insertions(+), 6 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
index 69b4294..4c01b6e 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
@@ -681,12 +681,6 @@ private void writeObject(final ObjectToPack otp) throws IOException {
 					writeObject(deltaBase);
 				}
 			}
-
-			otp.updateDeltaDepth();
-			if (otp.getDeltaDepth() > maxDeltaDepth) {
-				otp.clearDeltaBase();
-				otp.disposeLoader();
-			}
 		}
 
 		assert !otp.isWritten();
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [JGIT PATCH 23/23] Correctly honor the thin parameter to PackWriter.writePack
  2008-12-25  2:11                                           ` [JGIT PATCH 22/23] Don't cut off existing delta chains if we are reusing deltas Shawn O. Pearce
@ 2008-12-25  2:11                                             ` Shawn O. Pearce
  0 siblings, 0 replies; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-25  2:11 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/lib/PackWriter.java       |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
index 4c01b6e..b2d0227 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/PackWriter.java
@@ -465,8 +465,9 @@ public void preparePack(
 			final Collection<? extends ObjectId> uninterestingObjects,
 			final boolean thin, final boolean ignoreMissingUninteresting)
 			throws IOException {
+		this.thin = thin;
 		ObjectWalk walker = setUpWalker(interestingObjects,
-				uninterestingObjects, thin, ignoreMissingUninteresting);
+				uninterestingObjects, ignoreMissingUninteresting);
 		findObjectsToPack(walker);
 	}
 
@@ -759,7 +760,7 @@ private void writeChecksum() throws IOException {
 	private ObjectWalk setUpWalker(
 			final Collection<? extends ObjectId> interestingObjects,
 			final Collection<? extends ObjectId> uninterestingObjects,
-			final boolean thin, final boolean ignoreMissingUninteresting)
+			final boolean ignoreMissingUninteresting)
 			throws MissingObjectException, IOException,
 			IncorrectObjectTypeException {
 		final ObjectWalk walker = new ObjectWalk(db);
-- 
1.6.1.rc4.301.g5497a

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [JGIT PATCH 11/23] Rewrite WindowCache to use a hash table
  2008-12-25  2:11                     ` [JGIT PATCH 11/23] Rewrite WindowCache to use a hash table Shawn O. Pearce
  2008-12-25  2:11                       ` [JGIT PATCH 12/23] Change ByteArrayWindow to read content outside of WindowCache's lock Shawn O. Pearce
@ 2008-12-27 13:30                       ` Robin Rosenberg
  2008-12-27 17:32                         ` [JGIT PATCH 11/23 v2] " Shawn O. Pearce
  1 sibling, 1 reply; 26+ messages in thread
From: Robin Rosenberg @ 2008-12-27 13:30 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

torsdag 25 december 2008 03:11:07 skrev Shawn O. Pearce:
> diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java
> index f617845..f478f04 100644
> --- a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java
> +++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java
> @@ -41,11 +41,7 @@
>  import java.io.IOException;
>  import java.lang.ref.ReferenceQueue;
>  
> -/**
> - * The WindowCache manages reusable <code>Windows</code> and inflaters used by
> - * the other windowed file access classes.
> - */
> -public class WindowCache {
> +class WindowCache {

This breaks the Eclipse plugin which want to call WindowCache.reconfigure. Package level
class can also have javadocs. Even though we do not require it, I still thinks it's a good idea. Not
sure how useful that one was.

-- robin

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [JGIT PATCH 11/23 v2] Rewrite WindowCache to use a hash table
  2008-12-27 13:30                       ` [JGIT PATCH 11/23] Rewrite WindowCache to use a hash table Robin Rosenberg
@ 2008-12-27 17:32                         ` Shawn O. Pearce
  0 siblings, 0 replies; 26+ messages in thread
From: Shawn O. Pearce @ 2008-12-27 17:32 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

This shaved 10s off my linux-2.6 "jgit rev-list --objects" test.
We're also getting slightly better cache hit ratios, approaching
95% during commit access and 80% during tree access on the same
repository.  Its a significant win.

The cache is also a bit better about memory usage.  We now make
sure the byte[] or ByteBuffer handle is cleared out of the soft
reference when we push the reference into the queue manually. A
small change, but it makes a marked difference with the overall
JVM memory usage.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
  Robin Rosenberg <robin.rosenberg.lists@dewire.com> wrote:
  > > -/**
  > > - * The WindowCache manages reusable <code>Windows</code> and inflaters used by
  > > - * the other windowed file access classes.
  > > - */
  > > -public class WindowCache {
  > > +class WindowCache {
  > 
  > This breaks the Eclipse plugin which want to call WindowCache.reconfigure. Package level
  > class can also have javadocs. Even though we do not require it, I still thinks it's a good idea. Not
  > sure how useful that one was.

  Dammit, good catch.  I was working on this in the context of only
  having the JGit code in my workspace, so I missed the impact on
  the Eclipse plugin, and possibly on the NetBeans one as well.

  Corrected patch follows.

  
 .../src/org/spearce/jgit/lib/ByteWindow.java       |   10 +-
 .../src/org/spearce/jgit/lib/WindowCache.java      |  317 ++++++++++++--------
 .../src/org/spearce/jgit/lib/WindowedFile.java     |    2 +-
 3 files changed, 198 insertions(+), 131 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/ByteWindow.java b/org.spearce.jgit/src/org/spearce/jgit/lib/ByteWindow.java
index 1f6beb8..8d37de7 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/ByteWindow.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/ByteWindow.java
@@ -56,14 +56,20 @@
  *            type of object reference used to manage the window data.
  */
 abstract class ByteWindow<T> extends SoftReference<T> {
+	boolean sizeActive = true;
+
+	ByteWindow<?> chainNext;
+
+	ByteWindow<?> lruPrev;
+
+	ByteWindow<?> lruNext;
+
 	final WindowedFile provider;
 
 	final int id;
 
 	final int size;
 
-	int lastAccessed;
-
 	final long start;
 
 	final long end;
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java
index f617845..e71ca73 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowCache.java
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2007, Robin Rosenberg <robin.rosenberg@dewire.com>
+ * Copyright (C) 2008, Google Inc.
  * Copyright (C) 2008, Shawn O. Pearce <spearce@spearce.org>
  *
  * All rights reserved.
@@ -68,23 +68,51 @@ private static final int bits(int newSize) {
 
 	static final ReferenceQueue<?> clearedWindowQueue;
 
-	private static ByteWindow[] windows;
+	private static ByteWindow[] cache;
 
-	private static int openWindowCount;
+	private static ByteWindow lruHead;
+
+	private static ByteWindow lruTail;
 
 	private static int openByteCount;
 
-	private static int accessClock;
+	private static int hits;
+
+	private static int reqs;
+
+	private static int opens;
+
+	private static int closes;
 
 	static {
 		maxByteCount = 10 * MB;
 		windowSizeShift = bits(8 * KB);
 		windowSize = 1 << windowSizeShift;
 		mmap = false;
-		windows = new ByteWindow[maxByteCount / windowSize];
+		cache = new ByteWindow[cacheTableSize()];
 		clearedWindowQueue = new ReferenceQueue<Object>();
 	}
 
+	static synchronized String statString() {
+		int maxChain = 0, tot = 0;
+		for (ByteWindow<?> e : cache) {
+			int n = 0;
+			for (; e != null; e = e.chainNext) {
+				n++;
+				tot++;
+			}
+			maxChain = Math.max(maxChain, n);
+		}
+		return "WindowCache[ hits: " + (hits * 100 / reqs) + "%"
+				+ "; windows: " + tot + " maxChain: " + maxChain + "; kb:"
+				+ (openByteCount / KB) + "; cache: " + cache.length + " fds: "
+				+ (opens - closes) + "," + opens + "," + closes + " ]";
+	}
+
+	private static int cacheTableSize() {
+		return 5 * (maxByteCount / windowSize) / 2;
+	}
+
 	/**
 	 * Modify the configuration of the window cache.
 	 * <p>
@@ -135,27 +163,36 @@ private static synchronized void reconfigureImpl(final int packedGitLimit,
 			// We have to throw away every window we have. None
 			// of them are suitable for the new configuration.
 			//
-			for (int i = 0; i < openWindowCount; i++) {
-				final ByteWindow win = windows[i];
-				if (--win.provider.openCount == 0)
-					win.provider.cacheClose();
-				windows[i] = null;
+			for (ByteWindow<?> e : cache) {
+				for (; e != null; e = e.chainNext)
+					clear(e);
+			}
+			runClearedWindowQueue();
+			cache = new ByteWindow[cacheTableSize()];
+
+		} else {
+			if (prune) {
+				// We should decrease our memory usage.
+				//
+				releaseMemory();
+				runClearedWindowQueue();
 			}
-			windows = new ByteWindow[maxByteCount / windowSize];
-			openWindowCount = 0;
-			openByteCount = 0;
-		} else if (prune) {
-			// Our memory limit was decreased so we should try
-			// to drop windows to ensure we meet the new lower
-			// limit we were just given.
-			//
-			final int wincnt = maxByteCount / windowSize;
-			releaseMemory(wincnt, null, 0, 0);
 
-			if (wincnt != windows.length) {
-				final ByteWindow[] n = new ByteWindow[wincnt];
-				System.arraycopy(windows, 0, n, 0, openWindowCount);
-				windows = n;
+			if (cache.length != cacheTableSize()) {
+				// The cache table should be resized.
+				// Rehash every entry.
+				//
+				final ByteWindow[] priorTable = cache;
+
+				cache = new ByteWindow[cacheTableSize()];
+				for (ByteWindow<?> e : priorTable) {
+					for (ByteWindow<?> n; e != null; e = n) {
+						n = e.chainNext;
+						final int idx = hash(e.provider, e.id);
+						e.chainNext = cache[idx];
+						cache[idx] = e;
+					}
+				}
 			}
 		}
 	}
@@ -178,19 +215,27 @@ private static synchronized void reconfigureImpl(final int packedGitLimit,
 	 */
 	public static synchronized final void get(final WindowCursor curs,
 			final WindowedFile wp, final long position) throws IOException {
+		reqs++;
 		final int id = (int) (position >> windowSizeShift);
-		int idx = binarySearch(wp, id);
-		if (0 <= idx) {
-			final ByteWindow<?> w = windows[idx];
-			if ((curs.handle = w.get()) != null) {
-				w.lastAccessed = ++accessClock;
-				curs.window = w;
-				return;
+		final int idx = hash(wp, id);
+		for (ByteWindow<?> e = cache[idx]; e != null; e = e.chainNext) {
+			if (e.provider == wp && e.id == id) {
+				if ((curs.handle = e.get()) != null) {
+					curs.window = e;
+					makeMostRecent(e);
+					hits++;
+					return;
+				}
+
+				clear(e);
+				break;
 			}
 		}
 
-		if (++wp.openCount == 1) {
+		if (wp.openCount == 0) {
 			try {
+				opens++;
+				wp.openCount = 1;
 				wp.cacheOpen();
 			} catch (IOException ioe) {
 				wp.openCount = 0;
@@ -201,106 +246,55 @@ public static synchronized final void get(final WindowCursor curs,
 			} catch (Error ioe) {
 				wp.openCount = 0;
 				throw ioe;
+			} finally {
+				wp.openCount--;
 			}
 
 			// The cacheOpen may have mapped the window we are trying to
 			// map ourselves. Retrying the search ensures that does not
 			// happen to us.
 			//
-			idx = binarySearch(wp, id);
-			if (0 <= idx) {
-				final ByteWindow<?> w = windows[idx];
-				if ((curs.handle = w.get()) != null) {
-					w.lastAccessed = ++accessClock;
-					curs.window = w;
-					return;
+			for (ByteWindow<?> e = cache[idx]; e != null; e = e.chainNext) {
+				if (e.provider == wp && e.id == id) {
+					if ((curs.handle = e.get()) != null) {
+						curs.window = e;
+						makeMostRecent(e);
+						return;
+					}
+
+					clear(e);
+					break;
 				}
 			}
 		}
 
-		idx = -(idx + 1);
-		final int wSz = windowSize(wp, id);
-		idx = releaseMemory(windows.length, wp, idx, wSz);
-
-		if (idx < 0)
-			idx = 0;
-		final int toMove = openWindowCount - idx;
-		if (toMove > 0)
-			System.arraycopy(windows, idx, windows, idx + 1, toMove);
-		wp.loadWindow(curs, id, id << windowSizeShift, wSz);
-		windows[idx] = curs.window;
-		openWindowCount++;
-		openByteCount += curs.window.size;
+		final int wsz = windowSize(wp, id);
+		wp.openCount++;
+		openByteCount += wsz;
+		releaseMemory();
+		runClearedWindowQueue();
+
+		wp.loadWindow(curs, id, id << windowSizeShift, wsz);
+		final ByteWindow<?> e = curs.window;
+		e.chainNext = cache[idx];
+		cache[idx] = e;
+		insertLRU(e);
 	}
 
-	private static int releaseMemory(final int maxWindowCount,
-			final WindowedFile willRead, int insertionIndex, final int willAdd) {
-		for (;;) {
-			final ByteWindow<?> w = (ByteWindow<?>) clearedWindowQueue.poll();
-			if (w == null)
-				break;
-			final int oldest = binarySearch(w.provider, w.id);
-			if (oldest < 0 || windows[oldest] != w)
-				continue; // Must have been evicted by our other controls.
-
-			final WindowedFile p = w.provider;
-			if (--p.openCount == 0 && p != willRead)
-				p.cacheClose();
-
-			openByteCount -= w.size;
-			final int toMove = openWindowCount - oldest - 1;
-			if (toMove > 0)
-				System.arraycopy(windows, oldest + 1, windows, oldest, toMove);
-			windows[--openWindowCount] = null;
-			if (oldest < insertionIndex)
-				insertionIndex--;
-		}
-
-		while (openWindowCount >= maxWindowCount
-				|| (openWindowCount > 0 && openByteCount + willAdd > maxByteCount)) {
-			int oldest = 0;
-			for (int k = openWindowCount - 1; k > 0; k--) {
-				if (windows[k].lastAccessed < windows[oldest].lastAccessed)
-					oldest = k;
-			}
-
-			final ByteWindow w = windows[oldest];
-			final WindowedFile p = w.provider;
-			if (--p.openCount == 0 && p != willRead)
-				p.cacheClose();
-
-			openByteCount -= w.size;
-			final int toMove = openWindowCount - oldest - 1;
-			if (toMove > 0)
-				System.arraycopy(windows, oldest + 1, windows, oldest, toMove);
-			windows[--openWindowCount] = null;
-			w.enqueue();
-			if (oldest < insertionIndex)
-				insertionIndex--;
+	private static void makeMostRecent(ByteWindow<?> e) {
+		if (lruHead != e) {
+			unlinkLRU(e);
+			insertLRU(e);
 		}
-
-		return insertionIndex;
 	}
 
-	private static final int binarySearch(final WindowedFile sprov,
-			final int sid) {
-		if (openWindowCount == 0)
-			return -1;
-		final int shc = sprov.hash;
-		int high = openWindowCount;
-		int low = 0;
-		do {
-			final int mid = (low + high) / 2;
-			final ByteWindow mw = windows[mid];
-			if (mw.provider == sprov && mw.id == sid)
-				return mid;
-			final int mhc = mw.provider.hash;
-			if (mhc < shc || (shc == mhc && mw.id < sid))
-				low = mid + 1;
-			else
-				high = mid;
-		} while (low < high);
-		return -(low + 1);
+	private static void releaseMemory() {
+		ByteWindow<?> e = lruTail;
+		while (openByteCount > maxByteCount && e != null) {
+			final ByteWindow<?> p = e.lruPrev;
+			clear(e);
+			e = p;
+		}
 	}
 
 	/**
@@ -316,22 +310,89 @@ private static final int binarySearch(final WindowedFile sprov,
 	 *            cache.
 	 */
 	public static synchronized final void purge(final WindowedFile wp) {
-		int d = 0;
-		for (int s = 0; s < openWindowCount; s++) {
-			final ByteWindow win = windows[s];
-			if (win.provider != wp)
-				windows[d++] = win;
-			else
-				openByteCount -= win.size;
+		for (ByteWindow e : cache) {
+			for (; e != null; e = e.chainNext) {
+				if (e.provider == wp)
+					clear(e);
+			}
+		}
+		runClearedWindowQueue();
+	}
+
+	private static void runClearedWindowQueue() {
+		ByteWindow<?> e;
+		while ((e = (ByteWindow) clearedWindowQueue.poll()) != null) {
+			unlinkSize(e);
+			unlinkLRU(e);
+			unlinkCache(e);
+			e.chainNext = null;
+			e.lruNext = null;
+			e.lruPrev = null;
 		}
-		openWindowCount = d;
+	}
+
+	private static void clear(final ByteWindow<?> e) {
+		unlinkSize(e);
+		e.clear();
+		e.enqueue();
+	}
 
-		if (wp.openCount > 0) {
-			wp.openCount = 0;
-			wp.cacheClose();
+	private static void unlinkSize(final ByteWindow<?> e) {
+		if (e.sizeActive) {
+			if (--e.provider.openCount == 0) {
+				closes++;
+				e.provider.cacheClose();
+			}
+			openByteCount -= e.size;
+			e.sizeActive = false;
 		}
 	}
 
+	private static void unlinkCache(final ByteWindow dead) {
+		final int idx = hash(dead.provider, dead.id);
+		ByteWindow<?> e = cache[idx], p = null, n;
+		for (; e != null; p = e, e = n) {
+			n = e.chainNext;
+			if (e == dead) {
+				if (p == null)
+					cache[idx] = n;
+				else
+					p.chainNext = n;
+				break;
+			}
+		}
+	}
+
+	private static void unlinkLRU(final ByteWindow e) {
+		final ByteWindow<?> prev = e.lruPrev;
+		final ByteWindow<?> next = e.lruNext;
+
+		if (prev != null)
+			prev.lruNext = next;
+		else
+			lruHead = next;
+
+		if (next != null)
+			next.lruPrev = prev;
+		else
+			lruTail = prev;
+	}
+
+	private static void insertLRU(final ByteWindow<?> e) {
+		final ByteWindow h = lruHead;
+		e.lruPrev = null;
+		e.lruNext = h;
+		if (h != null)
+			h.lruPrev = e;
+		else
+			lruTail = e;
+		lruHead = e;
+	}
+
+	private static int hash(final WindowedFile wp, final int id) {
+		return ((wp.hash + id) >>> 1) % cache.length;
+	}
+
 	private static int windowSize(final WindowedFile file, final int id) {
 		final long len = file.length();
 		final long pos = id << windowSizeShift;
diff --git a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
index dd94f24..9c5cf1e 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/lib/WindowedFile.java
@@ -88,7 +88,7 @@
 	 */
 	public WindowedFile(final File file) {
 		fPath = file;
-		hash = System.identityHashCode(this);
+		hash = System.identityHashCode(this) * 31;
 		length = Long.MAX_VALUE;
 	}
 
-- 
1.6.1.302.gccd4d

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2008-12-27 17:34 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-25  2:10 [JGIT PATCH 00/18] Misc. performance tweaks Shawn O. Pearce
2008-12-25  2:10 ` [JGIT PATCH 01/23] Improve hit performance on the UnpackedObjectCache Shawn O. Pearce
2008-12-25  2:10   ` [JGIT PATCH 02/23] Add MutableObjectId.clear() to set the id to zeroId Shawn O. Pearce
2008-12-25  2:10     ` [JGIT PATCH 03/23] Allow TreeWalk callers to pass a MutableObjectId to get the current id Shawn O. Pearce
2008-12-25  2:11       ` [JGIT PATCH 04/23] Switch ObjectWalk to use the new MutableObjectId form in TreeWalk Shawn O. Pearce
2008-12-25  2:11         ` [JGIT PATCH 05/23] Change walker based fetch to use TreeWalk's MutableObjectId accessor Shawn O. Pearce
2008-12-25  2:11           ` [JGIT PATCH 06/23] Reduce garbage allocation when using TreeWalk Shawn O. Pearce
2008-12-25  2:11             ` [JGIT PATCH 07/23] Switch ObjectWalk to use a naked CanonicalTreeParser because its faster Shawn O. Pearce
2008-12-25  2:11               ` [JGIT PATCH 08/23] Remove the unused PackFile.get(ObjectId) form Shawn O. Pearce
2008-12-25  2:11                 ` [JGIT PATCH 09/23] Remove getId from ObjectLoader API as its unnecessary overhead Shawn O. Pearce
2008-12-25  2:11                   ` [JGIT PATCH 10/23] Make mmap mode more reliable by forcing GC at the correct spot Shawn O. Pearce
2008-12-25  2:11                     ` [JGIT PATCH 11/23] Rewrite WindowCache to use a hash table Shawn O. Pearce
2008-12-25  2:11                       ` [JGIT PATCH 12/23] Change ByteArrayWindow to read content outside of WindowCache's lock Shawn O. Pearce
2008-12-25  2:11                         ` [JGIT PATCH 13/23] Dispose of RevCommit buffers when they aren't used in PackWriter Shawn O. Pearce
2008-12-25  2:11                           ` [JGIT PATCH 14/23] Don't unpack delta chains while writing a pack from a pack v1 index Shawn O. Pearce
2008-12-25  2:11                             ` [JGIT PATCH 15/23] Don't unpack delta chains while converting delta to whole object Shawn O. Pearce
2008-12-25  2:11                               ` [JGIT PATCH 16/23] Defer parsing of the ObjectId while walking a PackIndex Iterator Shawn O. Pearce
2008-12-25  2:11                                 ` [JGIT PATCH 17/23] Only do one getCachedBytes per whole object written Shawn O. Pearce
2008-12-25  2:11                                   ` [JGIT PATCH 18/23] Correctly use a long for the offsets within a generated pack Shawn O. Pearce
2008-12-25  2:11                                     ` [JGIT PATCH 19/23] Allow more direct access to determine isWritten Shawn O. Pearce
2008-12-25  2:11                                       ` [JGIT PATCH 20/23] Move "wantWrite" field of ObjectToPack into the flags field Shawn O. Pearce
2008-12-25  2:11                                         ` [JGIT PATCH 21/23] Use an ArrayList for the reuseLoader collection in PackWriter Shawn O. Pearce
2008-12-25  2:11                                           ` [JGIT PATCH 22/23] Don't cut off existing delta chains if we are reusing deltas Shawn O. Pearce
2008-12-25  2:11                                             ` [JGIT PATCH 23/23] Correctly honor the thin parameter to PackWriter.writePack Shawn O. Pearce
2008-12-27 13:30                       ` [JGIT PATCH 11/23] Rewrite WindowCache to use a hash table Robin Rosenberg
2008-12-27 17:32                         ` [JGIT PATCH 11/23 v2] " Shawn O. Pearce

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).