[JGIT PATCH 1/2] Add raw buffer fetch methods to FileHeader, HunkHeader


* [JGIT PATCH 1/2] Add raw buffer fetch methods to FileHeader, HunkHeader
@ 2008-12-13  2:42 Shawn O. Pearce
  2008-12-13  2:42 ` [JGIT PATCH 2/2] Add getPatchText functions to obtain the plain-text version of a patch Shawn O. Pearce
  0 siblings, 1 reply; 5+ messages in thread
From: Shawn O. Pearce @ 2008-12-13  2:42 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

These permit application level code to read back the patch
script, for example to slice it up and output parts into a
UI on a per-file or per-hunk basis.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---

 Two last-minute patches.  While using this code in Gerrit 2 I
 realized I forgot to add a way to get the script back after it's
 been parsed by the library.  :-)

 .../src/org/spearce/jgit/patch/BinaryHunk.java     |   15 +++++++++++++++
 .../src/org/spearce/jgit/patch/FileHeader.java     |   15 +++++++++++++++
 .../src/org/spearce/jgit/patch/HunkHeader.java     |   15 +++++++++++++++
 3 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/org.spearce.jgit/src/org/spearce/jgit/patch/BinaryHunk.java b/org.spearce.jgit/src/org/spearce/jgit/patch/BinaryHunk.java
index 92eab86..f43a1b9 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/patch/BinaryHunk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/patch/BinaryHunk.java
@@ -81,6 +81,21 @@ public FileHeader getFileHeader() {
 		return file;
 	}
 
+	/** @return the byte array holding this hunk's patch script. */
+	public byte[] getBuffer() {
+		return file.buf;
+	}
+
+	/** @return offset of the start of this hunk in {@link #getBuffer()}. */
+	public int getStartOffset() {
+		return startOffset;
+	}
+
+	/** @return offset one past the end of the hunk in {@link #getBuffer()}. */
+	public int getEndOffset() {
+		return endOffset;
+	}
+
 	/** @return type of this binary hunk */
 	public Type getType() {
 		return type;
diff --git a/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java b/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java
index 79e4b0a..7c3a45a 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java
@@ -173,6 +173,21 @@ int getParentCount() {
 		return 1;
 	}
 
+	/** @return the byte array holding this file's patch script. */
+	public byte[] getBuffer() {
+		return buf;
+	}
+
+	/** @return offset of the start of this file's script in {@link #getBuffer()}. */
+	public int getStartOffset() {
+		return startOffset;
+	}
+
+	/** @return offset one past the end of the file script. */
+	public int getEndOffset() {
+		return endOffset;
+	}
+
 	/**
 	 * Get the old name associated with this file.
 	 * <p>
diff --git a/org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java b/org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java
index fc149ac..12c670d 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java
@@ -123,6 +123,21 @@ public FileHeader getFileHeader() {
 		return file;
 	}
 
+	/** @return the byte array holding this hunk's patch script. */
+	public byte[] getBuffer() {
+		return file.buf;
+	}
+
+	/** @return offset of the start of this hunk in {@link #getBuffer()}. */
+	public int getStartOffset() {
+		return startOffset;
+	}
+
+	/** @return offset one past the end of the hunk in {@link #getBuffer()}. */
+	public int getEndOffset() {
+		return endOffset;
+	}
+
 	/** @return information about the old image mentioned in this hunk. */
 	public OldImage getOldImage() {
 		return old;
-- 
1.6.1.rc2.306.ge5d5e
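
The accessors added above can be used to slice one region out of the shared
patch buffer. The following is a minimal sketch, not JGit code: ScriptRegion
is a hypothetical stand-in for any object exposing getBuffer(),
getStartOffset() and getEndOffset() (FileHeader, HunkHeader or BinaryHunk in
the patch), and the offsets in main() are hand-computed for the toy input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for an object exposing the three new accessors. */
final class ScriptRegion {
	private final byte[] buf;
	private final int startOffset;
	private final int endOffset;

	ScriptRegion(byte[] buf, int startOffset, int endOffset) {
		this.buf = buf;
		this.startOffset = startOffset;
		this.endOffset = endOffset;
	}

	byte[] getBuffer() { return buf; }
	int getStartOffset() { return startOffset; }
	int getEndOffset() { return endOffset; } // one past the last byte

	/** Copy only this region's bytes out of the shared buffer. */
	static byte[] slice(ScriptRegion r) {
		return Arrays.copyOfRange(r.getBuffer(), r.getStartOffset(),
				r.getEndOffset());
	}

	public static void main(String[] args) {
		byte[] patch = "--- a/X\n+++ b/X\n@@ -1 +1 @@\n-a\n+b\n"
				.getBytes(StandardCharsets.US_ASCII);
		// The two header lines are 8 bytes each, so the hunk starts at 16.
		ScriptRegion hunk = new ScriptRegion(patch, 16, patch.length);
		System.out.print(new String(slice(hunk), StandardCharsets.US_ASCII));
	}
}
```

Because the buffer is shared by every file and hunk in the patch, a caller
that only wants to display a region can also pass the buffer and the two
offsets straight to its output routine and skip the copy.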


* [JGIT PATCH 2/2] Add getPatchText functions to obtain the plain-text version of a patch
  2008-12-13  2:42 [JGIT PATCH 1/2] Add raw buffer fetch methods to FileHeader, HunkHeader Shawn O. Pearce
@ 2008-12-13  2:42 ` Shawn O. Pearce
  2008-12-13 11:02   ` Robin Rosenberg
  0 siblings, 1 reply; 5+ messages in thread
From: Shawn O. Pearce @ 2008-12-13  2:42 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

The conversion from byte[] to String is performed one line at a time,
in case the patch is a character encoding conversion patch for the
file.  For simplicity we currently assume UTF-8 still as the default
encoding for any content, but eventually we should support using the
.gitattributes encoding property when performing this conversion.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
 .../src/org/spearce/jgit/patch/BinaryHunk.java     |    8 ++
 .../src/org/spearce/jgit/patch/FileHeader.java     |    6 ++
 .../src/org/spearce/jgit/patch/HunkHeader.java     |    7 ++
 .../src/org/spearce/jgit/patch/PatchUtil.java      |   79 ++++++++++++++++++++
 4 files changed, 100 insertions(+), 0 deletions(-)
 create mode 100644 org.spearce.jgit/src/org/spearce/jgit/patch/PatchUtil.java

diff --git a/org.spearce.jgit/src/org/spearce/jgit/patch/BinaryHunk.java b/org.spearce.jgit/src/org/spearce/jgit/patch/BinaryHunk.java
index f43a1b9..f4e2ee3 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/patch/BinaryHunk.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/patch/BinaryHunk.java
@@ -42,6 +42,8 @@
 import static org.spearce.jgit.util.RawParseUtils.nextLF;
 import static org.spearce.jgit.util.RawParseUtils.parseBase10;
 
+import org.spearce.jgit.lib.Constants;
+
 /** Part of a "GIT binary patch" to describe the pre-image or post-image */
 public class BinaryHunk {
 	private static final byte[] LITERAL = encodeASCII("literal ");
@@ -96,6 +98,12 @@ public int getEndOffset() {
 		return endOffset;
 	}
 
+	/** @return text of this hunk's script; best-effort decoded */
+	public String getHunkText() {
+		return PatchUtil.decode(Constants.CHARSET, getBuffer(),
+				getStartOffset(), getEndOffset());
+	}
+
 	/** @return type of this binary hunk */
 	public Type getType() {
 		return type;
diff --git a/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java b/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java
index 7c3a45a..0110f4a 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java
@@ -188,6 +188,12 @@ public int getEndOffset() {
 		return endOffset;
 	}
 
+	/** @return text of this patch file's script; best-effort decoded */
+	public String getScriptText() {
+		return PatchUtil.decode(Constants.CHARSET, getBuffer(),
+				getStartOffset(), getEndOffset());
+	}
+
 	/**
 	 * Get the old name associated with this file.
 	 * <p>
diff --git a/org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java b/org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java
index 12c670d..5a3b590 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java
@@ -42,6 +42,7 @@
 import static org.spearce.jgit.util.RawParseUtils.parseBase10;
 
 import org.spearce.jgit.lib.AbbreviatedObjectId;
+import org.spearce.jgit.lib.Constants;
 import org.spearce.jgit.util.MutableInteger;
 
 /** Hunk header describing the layout of a single block of lines */
@@ -138,6 +139,12 @@ public int getEndOffset() {
 		return endOffset;
 	}
 
+	/** @return text of this hunk's script; best-effort decoded */
+	public String getHunkText() {
+		return PatchUtil.decode(Constants.CHARSET, getBuffer(),
+				getStartOffset(), getEndOffset());
+	}
+
 	/** @return information about the old image mentioned in this hunk. */
 	public OldImage getOldImage() {
 		return old;
diff --git a/org.spearce.jgit/src/org/spearce/jgit/patch/PatchUtil.java b/org.spearce.jgit/src/org/spearce/jgit/patch/PatchUtil.java
new file mode 100644
index 0000000..89136c0
--- /dev/null
+++ b/org.spearce.jgit/src/org/spearce/jgit/patch/PatchUtil.java
@@ -0,0 +1,79 @@
+/*
+ * Copyright (C) 2008, Google Inc.
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.patch;
+
+import java.nio.charset.Charset;
+
+import org.spearce.jgit.util.RawParseUtils;
+
+/** Patch related utility functions. */
+public class PatchUtil {
+	/**
+	 * Decode a region of a buffer one line at a time.
+	 * <p>
+	 * Unlike {@link RawParseUtils#decode(Charset, byte[], int, int)} this
+	 * method reads the input one line at a time and decodes each line
+	 * individually. This permits a decoding of a file converting from
+	 * ISO-8859-1 to UTF-8 encoding (for example), as each line in the patch
+	 * script will be in one encoding or the other.
+	 * 
+	 * @param cs
+	 *            preferred character set to use when decoding the buffer.
+	 * @param buf
+	 *            buffer to pull the raw bytes from.
+	 * @param ptr
+	 *            first position to read.
+	 * @param end
+	 *            one position past the last position to read.
+	 * @return a string representation of the region, decoded per-line.
+	 */
+	public static String decode(final Charset cs, final byte[] buf, int ptr,
+			final int end) {
+		final StringBuilder r = new StringBuilder(end - ptr);
+		while (ptr < end) {
+			final int eol = Math.min(end, RawParseUtils.nextLF(buf, ptr));
+			r.append(RawParseUtils.decode(cs, buf, ptr, eol));
+			ptr = eol;
+		}
+		return r.toString();
+	}
+
+	private PatchUtil() {
+		// No instances
+	}
+}
-- 
1.6.1.rc2.306.ge5d5e
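
The per-line idea in PatchUtil.decode can be illustrated without JGit. The
sketch below is an assumption-laden approximation: it tries a strict decode
of each line in a preferred charset and, if that fails, decodes the line
with an explicit fallback charset. The class PerLineDecode, its helpers and
the explicit fallback parameter are illustrative names, not part of the
patch; the real RawParseUtils.decode applies its own fallback rules.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class PerLineDecode {
	/** Decode buf[ptr, end) strictly; throws on malformed input. */
	static String strict(Charset cs, byte[] buf, int ptr, int end)
			throws CharacterCodingException {
		return cs.newDecoder()
				.onMalformedInput(CodingErrorAction.REPORT)
				.onUnmappableCharacter(CodingErrorAction.REPORT)
				.decode(ByteBuffer.wrap(buf, ptr, end - ptr))
				.toString();
	}

	/** Position just past the next LF, or end if there is none. */
	static int nextLF(byte[] buf, int ptr, int end) {
		while (ptr < end) {
			if (buf[ptr++] == '\n')
				return ptr;
		}
		return end;
	}

	/** Decode one line at a time, so each line may use either charset. */
	static String decode(Charset preferred, Charset fallback, byte[] buf,
			int ptr, int end) {
		StringBuilder r = new StringBuilder(Math.max(0, end - ptr));
		while (ptr < end) {
			int eol = nextLF(buf, ptr, end);
			try {
				r.append(strict(preferred, buf, ptr, eol));
			} catch (CharacterCodingException e) {
				// This line is not valid in the preferred charset.
				r.append(new String(buf, ptr, eol - ptr, fallback));
			}
			ptr = eol;
		}
		return r.toString();
	}

	public static void main(String[] args) {
		byte[] line = "+\u00c5ngstr\u00f6m\n"
				.getBytes(StandardCharsets.ISO_8859_1);
		System.out.print(decode(StandardCharsets.UTF_8,
				StandardCharsets.ISO_8859_1, line, 0, line.length));
	}
}
```

A per-line guess like this can pick the wrong charset for a line whose bytes
happen to be valid in both encodings, which is one reason to prefer guessing
over a larger unit.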


* Re: [JGIT PATCH 2/2] Add getPatchText functions to obtain the plain-text version of a patch
  2008-12-13  2:42 ` [JGIT PATCH 2/2] Add getPatchText functions to obtain the plain-text version of a patch Shawn O. Pearce
@ 2008-12-13 11:02   ` Robin Rosenberg
  2008-12-13 21:26     ` Robin Rosenberg
  0 siblings, 1 reply; 5+ messages in thread
From: Robin Rosenberg @ 2008-12-13 11:02 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

On Saturday 13 December 2008 03:42:26, Shawn O. Pearce wrote:
> The conversion from byte[] to String is performed one line at a time,
> in case the patch is a character encoding conversion patch for the
> file.  For simplicity we currently assume UTF-8 still as the default
> encoding for any content, but eventually we should support using the
> .gitattributes encoding property when performing this conversion.

For usefulness we must be able to pass the encoding from outside,
e.g. the encoding Eclipse uses, which often is not UTF-8.

-- robin


* Re: [JGIT PATCH 2/2] Add getPatchText functions to obtain the plain-text version of a patch
  2008-12-13 11:02   ` Robin Rosenberg
@ 2008-12-13 21:26     ` Robin Rosenberg
  2008-12-17 20:13       ` [JGIT PATCH 2/2 v2] Add getScriptText " Shawn O. Pearce
  0 siblings, 1 reply; 5+ messages in thread
From: Robin Rosenberg @ 2008-12-13 21:26 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: git

On Saturday 13 December 2008 12:02:07, Robin Rosenberg wrote:
> On Saturday 13 December 2008 03:42:26, Shawn O. Pearce wrote:
> > The conversion from byte[] to String is performed one line at a time,
> > in case the patch is a character encoding conversion patch for the
> > file.  For simplicity we currently assume UTF-8 still as the default
> > encoding for any content, but eventually we should support using the
> > .gitattributes encoding property when performing this conversion.
> 
> For usefulness we must be able to pass the encoding from outside,
> e.g. the encoding Eclipse uses, which often is not UTF-8.

It's even worse. You should probably do the encoding guess on the whole
patch, or per file and not per line, to make success possible at all. Reading
and writing as ISO-8859-1 will always work, as that is just padding every
byte with NUL on reading and dropping it on writing. I.e. if you convert
to char at all...

-- robin
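
The claim that ISO-8859-1 always round-trips can be checked directly: every
byte value 0x00..0xFF maps to the Unicode code point of the same number, so
decoding and re-encoding returns the original bytes. A small sketch using
only the JDK (the class name Latin1RoundTrip is illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {
	/** True if decoding raw with cs and re-encoding yields the same bytes. */
	static boolean roundTrips(byte[] raw, Charset cs) {
		String s = new String(raw, cs);
		return Arrays.equals(raw, s.getBytes(cs));
	}

	/** All 256 possible byte values, in order. */
	static byte[] allBytes() {
		byte[] b = new byte[256];
		for (int i = 0; i < b.length; i++)
			b[i] = (byte) i;
		return b;
	}

	public static void main(String[] args) {
		byte[] raw = allBytes();
		System.out.println("ISO-8859-1 lossless: "
				+ roundTrips(raw, StandardCharsets.ISO_8859_1));
		// UTF-8 replaces malformed sequences with U+FFFD, so arbitrary
		// bytes do not survive the same trip.
		System.out.println("UTF-8 lossless: "
				+ roundTrips(raw, StandardCharsets.UTF_8));
	}
}
```

The round trip preserves bytes, not meaning: text that was really UTF-8 comes
back intact on output but is shown as the wrong characters while it is held
as a String.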


* [JGIT PATCH 2/2 v2] Add getScriptText functions to obtain the plain-text version of a patch
  2008-12-13 21:26     ` Robin Rosenberg
@ 2008-12-17 20:13       ` Shawn O. Pearce
  0 siblings, 0 replies; 5+ messages in thread
From: Shawn O. Pearce @ 2008-12-17 20:13 UTC (permalink / raw)
  To: Robin Rosenberg; +Cc: git

The conversion from byte[] to String is performed one file at a time,
in case the patch is a character encoding conversion patch for the
file.  For simplicity we currently assume UTF-8 still as the default
encoding for any content, but eventually we should support using the
.gitattributes encoding property when performing this conversion.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
---
  Robin Rosenberg <robin.rosenberg.lists@dewire.com> wrote:
  > > For usefulness we must be able to pass the encoding from outside,
  > > e.g. the encoding Eclipse uses, which often is not UTF-8.
  > 
  > It's even worse. You should probably do the encoding guess on the whole
  > patch, or per file and not per line, to make success possible at all. Reading
  > and writing as ISO-8859-1 will always work, as that is just padding every
  > byte with NUL on reading and dropping it on writing. I.e. if you convert
  > to char at all...

  So this patch does the "whole file" thing.  But there is a
  fast-path in getScriptText to try and bypass the multiple copies
  we have to make in order to shovel the entire file into the
  CharsetDecoder just to read the patch.  It isn't common to see
  a character set conversion patch, so the fast case of decoding
  the whole patch text at once should happen most of the time.
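
The fast-path and whole-file fallback described above can be sketched
roughly as follows. This is a simplified illustration, not the code in this
patch: it handles only a two-way hunk body (lines starting with ' ', '-' or
'+'), assumes ASCII-compatible charsets so that LF bytes survive decoding,
and uses the made-up names ScriptTextSketch, decodeBody and lineEnd.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ScriptTextSketch {
	static String strict(Charset cs, byte[] buf)
			throws CharacterCodingException {
		return cs.newDecoder()
				.onMalformedInput(CodingErrorAction.REPORT)
				.onUnmappableCharacter(CodingErrorAction.REPORT)
				.decode(ByteBuffer.wrap(buf)).toString();
	}

	static int nextLF(byte[] buf, int ptr) {
		while (ptr < buf.length) {
			if (buf[ptr++] == '\n')
				return ptr;
		}
		return buf.length;
	}

	static int lineEnd(String s, int from) {
		int lf = s.indexOf('\n', from);
		return lf < 0 ? s.length() : lf + 1;
	}

	static String decodeBody(byte[] body, Charset oldCs, Charset newCs) {
		if (oldCs.equals(newCs)) {
			try {
				return strict(oldCs, body); // fast path: one decode
			} catch (CharacterCodingException e) {
				// Fall through to the slower per-file path.
			}
		}

		// Slow path, step 1: rebuild the old and new file images.
		ByteArrayOutputStream oldImg = new ByteArrayOutputStream();
		ByteArrayOutputStream newImg = new ByteArrayOutputStream();
		for (int ptr = 0; ptr < body.length;) {
			int eol = nextLF(body, ptr);
			if (body[ptr] != '+')
				oldImg.write(body, ptr, eol - ptr);
			if (body[ptr] != '-')
				newImg.write(body, ptr, eol - ptr);
			ptr = eol;
		}

		// Step 2: decode each image as a whole, in its own charset.
		String oldText = new String(oldImg.toByteArray(), oldCs);
		String newText = new String(newImg.toByteArray(), newCs);

		// Step 3: walk the hunk again, taking each line from the
		// image it belongs to (context lines come from the old image).
		StringBuilder r = new StringBuilder(body.length);
		int o = 0, n = 0;
		for (int ptr = 0; ptr < body.length; ptr = nextLF(body, ptr)) {
			int oEnd = lineEnd(oldText, o);
			int nEnd = lineEnd(newText, n);
			if (body[ptr] == '+') {
				r.append(newText, n, nEnd);
				n = nEnd;
			} else if (body[ptr] == '-') {
				r.append(oldText, o, oEnd);
				o = oEnd;
			} else {
				r.append(oldText, o, oEnd);
				o = oEnd;
				n = nEnd; // skip the same line in the new image
			}
		}
		return r.toString();
	}

	public static void main(String[] args) {
		byte[] body = " a\n-b\n+c\n".getBytes(StandardCharsets.US_ASCII);
		System.out.print(decodeBody(body, StandardCharsets.UTF_8,
				StandardCharsets.UTF_8));
	}
}
```

The slow path copies every line at least twice, which is the cost the
fast-path is meant to avoid when both sides share one encoding.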

 .../jgit/patch/testGetText_BothISO88591.patch      |   21 +++
 .../spearce/jgit/patch/testGetText_Convert.patch   |   21 +++
 .../spearce/jgit/patch/testGetText_DiffCc.patch    |   13 ++
 .../spearce/jgit/patch/testGetText_NoBinary.patch  |    4 +
 .../tst/org/spearce/jgit/patch/GetTextTest.java    |  142 ++++++++++++++++++++
 .../org/spearce/jgit/patch/CombinedFileHeader.java |   27 ++++
 .../org/spearce/jgit/patch/CombinedHunkHeader.java |  127 +++++++++++++++++
 .../src/org/spearce/jgit/patch/FileHeader.java     |  116 ++++++++++++++++
 .../src/org/spearce/jgit/patch/HunkHeader.java     |   86 ++++++++++++
 .../src/org/spearce/jgit/util/RawParseUtils.java   |   57 ++++++++-
 10 files changed, 611 insertions(+), 3 deletions(-)
 create mode 100644 org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_BothISO88591.patch
 create mode 100644 org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_Convert.patch
 create mode 100644 org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_DiffCc.patch
 create mode 100644 org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_NoBinary.patch
 create mode 100644 org.spearce.jgit.test/tst/org/spearce/jgit/patch/GetTextTest.java

diff --git a/org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_BothISO88591.patch b/org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_BothISO88591.patch
new file mode 100644
index 0000000..8224fcc
--- /dev/null
+++ b/org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_BothISO88591.patch
@@ -0,0 +1,21 @@
+diff --git a/X b/X
+index 014ef30..8c80a36 100644
+--- a/X
++++ b/X
+@@ -1,7 +1,7 @@
+ a
+ b
+ c
+-�ngstr�m
++line 4 �ngstr�m
+ d
+ e
+ f
+@@ -13,6 +13,6 @@ k
+ l
+ m
+ n
+-�ngstr�m
++�ngstr�m; line 16
+ o
+ p
diff --git a/org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_Convert.patch b/org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_Convert.patch
new file mode 100644
index 0000000..a43fef5
--- /dev/null
+++ b/org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_Convert.patch
@@ -0,0 +1,21 @@
+diff --git a/X b/X
+index 014ef30..209db0d 100644
+--- a/X
++++ b/X
+@@ -1,7 +1,7 @@
+ a
+ b
+ c
+-�ngstr�m
++Ångström
+ d
+ e
+ f
+@@ -13,6 +13,6 @@ k
+ l
+ m
+ n
+-�ngstr�m
++Ångström
+ o
+ p
diff --git a/org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_DiffCc.patch b/org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_DiffCc.patch
new file mode 100644
index 0000000..3f74a52
--- /dev/null
+++ b/org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_DiffCc.patch
@@ -0,0 +1,13 @@
+diff --cc X
+index bdfc9f4,209db0d..474bd69
+--- a/X
++++ b/X
+@@@ -1,7 -1,7 +1,7 @@@
+  a
+--b
+  c
+ +test �ngstr�m
++ Ångström
+  d
+  e
+  f
diff --git a/org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_NoBinary.patch b/org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_NoBinary.patch
new file mode 100644
index 0000000..e4968dc
--- /dev/null
+++ b/org.spearce.jgit.test/tst-rsrc/org/spearce/jgit/patch/testGetText_NoBinary.patch
@@ -0,0 +1,4 @@
+diff --git a/org.spearce.egit.ui/icons/toolbar/fetchd.png b/org.spearce.egit.ui/icons/toolbar/fetchd.png
+new file mode 100644
+index 0000000..4433c54
+Binary files /dev/null and b/org.spearce.egit.ui/icons/toolbar/fetchd.png differ
diff --git a/org.spearce.jgit.test/tst/org/spearce/jgit/patch/GetTextTest.java b/org.spearce.jgit.test/tst/org/spearce/jgit/patch/GetTextTest.java
new file mode 100644
index 0000000..04810be
--- /dev/null
+++ b/org.spearce.jgit.test/tst/org/spearce/jgit/patch/GetTextTest.java
@@ -0,0 +1,142 @@
+/*
+ * Copyright (C) 2008, Google Inc.
+ *
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ *   copyright notice, this list of conditions and the following
+ *   disclaimer in the documentation and/or other materials provided
+ *   with the distribution.
+ *
+ * - Neither the name of the Git Development Community nor the
+ *   names of its contributors may be used to endorse or promote
+ *   products derived from this software without specific prior
+ *   written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
+ * CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
+ * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+ * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
+ * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+package org.spearce.jgit.patch;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.charset.Charset;
+
+import junit.framework.TestCase;
+
+public class GetTextTest extends TestCase {
+	public void testGetText_BothISO88591() throws IOException {
+		final Charset cs = Charset.forName("ISO-8859-1");
+		final Patch p = parseTestPatchFile();
+		assertTrue(p.getErrors().isEmpty());
+		assertEquals(1, p.getFiles().size());
+		final FileHeader fh = p.getFiles().get(0);
+		assertEquals(2, fh.getHunks().size());
+		assertEquals(readTestPatchFile(cs), fh.getScriptText(cs, cs));
+	}
+
+	public void testGetText_NoBinary() throws IOException {
+		final Charset cs = Charset.forName("ISO-8859-1");
+		final Patch p = parseTestPatchFile();
+		assertTrue(p.getErrors().isEmpty());
+		assertEquals(1, p.getFiles().size());
+		final FileHeader fh = p.getFiles().get(0);
+		assertEquals(0, fh.getHunks().size());
+		assertEquals(readTestPatchFile(cs), fh.getScriptText(cs, cs));
+	}
+
+	public void testGetText_Convert() throws IOException {
+		final Charset csOld = Charset.forName("ISO-8859-1");
+		final Charset csNew = Charset.forName("UTF-8");
+		final Patch p = parseTestPatchFile();
+		assertTrue(p.getErrors().isEmpty());
+		assertEquals(1, p.getFiles().size());
+		final FileHeader fh = p.getFiles().get(0);
+		assertEquals(2, fh.getHunks().size());
+
+		// Read the original file as ISO-8859-1 and fix up the one place
+		// where we changed the character encoding. That makes the exp
+		// string match what we really expect to get back.
+		//
+		String exp = readTestPatchFile(csOld);
+		exp = exp.replace("\303\205ngstr\303\266m", "\u00c5ngstr\u00f6m");
+
+		assertEquals(exp, fh.getScriptText(csOld, csNew));
+	}
+
+	public void testGetText_DiffCc() throws IOException {
+		final Charset csOld = Charset.forName("ISO-8859-1");
+		final Charset csNew = Charset.forName("UTF-8");
+		final Patch p = parseTestPatchFile();
+		assertTrue(p.getErrors().isEmpty());
+		assertEquals(1, p.getFiles().size());
+		final CombinedFileHeader fh = (CombinedFileHeader) p.getFiles().get(0);
+		assertEquals(1, fh.getHunks().size());
+
+		// Read the original file as ISO-8859-1 and fix up the one place
+		// where we changed the character encoding. That makes the exp
+		// string match what we really expect to get back.
+		//
+		String exp = readTestPatchFile(csOld);
+		exp = exp.replace("\303\205ngstr\303\266m", "\u00c5ngstr\u00f6m");
+
+		assertEquals(exp, fh
+				.getScriptText(new Charset[] { csNew, csOld, csNew }));
+	}
+
+	private Patch parseTestPatchFile() throws IOException {
+		final String patchFile = getName() + ".patch";
+		final InputStream in = getClass().getResourceAsStream(patchFile);
+		if (in == null) {
+			fail("No " + patchFile + " test vector");
+			return null; // Never happens
+		}
+		try {
+			final Patch p = new Patch();
+			p.parse(in);
+			return p;
+		} finally {
+			in.close();
+		}
+	}
+
+	private String readTestPatchFile(final Charset cs) throws IOException {
+		final String patchFile = getName() + ".patch";
+		final InputStream in = getClass().getResourceAsStream(patchFile);
+		if (in == null) {
+			fail("No " + patchFile + " test vector");
+			return null; // Never happens
+		}
+		try {
+			final InputStreamReader r = new InputStreamReader(in, cs);
+			char[] tmp = new char[2048];
+			final StringBuilder s = new StringBuilder();
+			int n;
+			while ((n = r.read(tmp)) > 0)
+				s.append(tmp, 0, n);
+			return s.toString();
+		} finally {
+			in.close();
+		}
+	}
+}
diff --git a/org.spearce.jgit/src/org/spearce/jgit/patch/CombinedFileHeader.java b/org.spearce.jgit/src/org/spearce/jgit/patch/CombinedFileHeader.java
index 3ccc418..a27e0f8 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/patch/CombinedFileHeader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/patch/CombinedFileHeader.java
@@ -41,7 +41,9 @@
 import static org.spearce.jgit.util.RawParseUtils.match;
 import static org.spearce.jgit.util.RawParseUtils.nextLF;
 
+import java.nio.charset.Charset;
 import java.util.ArrayList;
+import java.util.Arrays;
 import java.util.List;
 
 import org.spearce.jgit.lib.AbbreviatedObjectId;
@@ -111,6 +113,31 @@ public AbbreviatedObjectId getOldId(final int nthParent) {
 		return oldIds[nthParent];
 	}
 
+	@Override
+	public String getScriptText(final Charset ocs, final Charset ncs) {
+		final Charset[] cs = new Charset[getParentCount() + 1];
+		Arrays.fill(cs, ocs);
+		cs[getParentCount()] = ncs;
+		return getScriptText(cs);
+	}
+
+	/**
+	 * Convert the patch script for this file into a string.
+	 * 
+	 * @param charsetGuess
+	 *            optional array to suggest the character set to use when
+	 *            decoding each file's line. If supplied the array must have a
+	 *            length of <code>{@link #getParentCount()} + 1</code>
+	 *            representing the old revision character sets and the new
+	 *            revision character set.
+	 * @return the patch script, as a Unicode string.
+	 */
+	@Override
+	public String getScriptText(final Charset[] charsetGuess) {
+		return super.getScriptText(charsetGuess);
+	}
+
+	@Override
 	int parseGitHeaders(int ptr, final int end) {
 		while (ptr < end) {
 			final int eol = nextLF(buf, ptr);
diff --git a/org.spearce.jgit/src/org/spearce/jgit/patch/CombinedHunkHeader.java b/org.spearce.jgit/src/org/spearce/jgit/patch/CombinedHunkHeader.java
index 3e5c465..83ea681 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/patch/CombinedHunkHeader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/patch/CombinedHunkHeader.java
@@ -40,6 +40,9 @@
 import static org.spearce.jgit.util.RawParseUtils.nextLF;
 import static org.spearce.jgit.util.RawParseUtils.parseBase10;
 
+import java.io.IOException;
+import java.io.OutputStream;
+
 import org.spearce.jgit.lib.AbbreviatedObjectId;
 import org.spearce.jgit.util.MutableInteger;
 
@@ -188,4 +191,128 @@ int parseBody(final Patch script, final int end) {
 
 		return c;
 	}
+
+	@Override
+	void extractFileLines(final OutputStream[] out) throws IOException {
+		final byte[] buf = file.buf;
+		int ptr = startOffset;
+		int eol = nextLF(buf, ptr);
+		if (endOffset <= eol)
+			return;
+
+		// Treat the hunk header as though it were from the ancestor,
+		// as it may have a function header appearing after it which
+		// was copied out of the ancestor file.
+		//
+		out[0].write(buf, ptr, eol - ptr);
+
+		SCAN: for (ptr = eol; ptr < endOffset; ptr = eol) {
+			eol = nextLF(buf, ptr);
+
+			if (eol - ptr < old.length + 1) {
+				// Line isn't long enough to mention the state of each
+				// ancestor. It must be the end of the hunk.
+				break SCAN;
+			}
+
+			switch (buf[ptr]) {
+			case ' ':
+			case '-':
+			case '+':
+				break;
+
+			default:
+				// Line can't possibly be part of this hunk; the first
+				// ancestor information isn't recognizable.
+				//
+				break SCAN;
+			}
+
+			int delcnt = 0;
+			for (int ancestor = 0; ancestor < old.length; ancestor++) {
+				switch (buf[ptr + ancestor]) {
+				case '-':
+					delcnt++;
+					out[ancestor].write(buf, ptr, eol - ptr);
+					continue;
+
+				case ' ':
+					out[ancestor].write(buf, ptr, eol - ptr);
+					continue;
+
+				case '+':
+					continue;
+
+				default:
+					break SCAN;
+				}
+			}
+			if (delcnt < old.length) {
+				// This line appears in the new file if it wasn't deleted
+				// relative to all ancestors.
+				//
+				out[old.length].write(buf, ptr, eol - ptr);
+			}
+		}
+	}
+
+	void extractFileLines(final StringBuilder sb, final String[] text,
+			final int[] offsets) {
+		final byte[] buf = file.buf;
+		int ptr = startOffset;
+		int eol = nextLF(buf, ptr);
+		if (endOffset <= eol)
+			return;
+		copyLine(sb, text, offsets, 0);
+		SCAN: for (ptr = eol; ptr < endOffset; ptr = eol) {
+			eol = nextLF(buf, ptr);
+
+			if (eol - ptr < old.length + 1) {
+				// Line isn't long enough to mention the state of each
+				// ancestor. It must be the end of the hunk.
+				break SCAN;
+			}
+
+			switch (buf[ptr]) {
+			case ' ':
+			case '-':
+			case '+':
+				break;
+
+			default:
+				// Line can't possibly be part of this hunk; the first
+				// ancestor information isn't recognizable.
+				//
+				break SCAN;
+			}
+
+			boolean copied = false;
+			for (int ancestor = 0; ancestor < old.length; ancestor++) {
+				switch (buf[ptr + ancestor]) {
+				case ' ':
+				case '-':
+					if (copied)
+						skipLine(text, offsets, ancestor);
+					else {
+						copyLine(sb, text, offsets, ancestor);
+						copied = true;
+					}
+					continue;
+
+				case '+':
+					continue;
+
+				default:
+					break SCAN;
+				}
+			}
+			if (!copied) {
+				// If none of the ancestors caused the copy then this line
+				// must be new across the board, so it only appears in the
+				// text of the new file.
+				//
+				copyLine(sb, text, offsets, old.length);
+			}
+		}
+	}
 }
diff --git a/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java b/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java
index c91f80e..66c785f 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/patch/FileHeader.java
@@ -39,10 +39,15 @@
 
 import static org.spearce.jgit.lib.Constants.encodeASCII;
 import static org.spearce.jgit.util.RawParseUtils.decode;
+import static org.spearce.jgit.util.RawParseUtils.decodeNoFallback;
+import static org.spearce.jgit.util.RawParseUtils.extractBinaryString;
 import static org.spearce.jgit.util.RawParseUtils.match;
 import static org.spearce.jgit.util.RawParseUtils.nextLF;
 import static org.spearce.jgit.util.RawParseUtils.parseBase10;
 
+import java.io.IOException;
+import java.nio.charset.CharacterCodingException;
+import java.nio.charset.Charset;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
@@ -51,6 +56,8 @@
 import org.spearce.jgit.lib.Constants;
 import org.spearce.jgit.lib.FileMode;
 import org.spearce.jgit.util.QuotedString;
+import org.spearce.jgit.util.RawParseUtils;
+import org.spearce.jgit.util.TemporaryBuffer;
 
 /** Patch header describing an action for a single file path. */
 public class FileHeader {
@@ -189,6 +196,115 @@ public int getEndOffset() {
 	}
 
 	/**
+	 * Convert the patch script for this file into a string.
+	 * <p>
+	 * The default character encoding ({@link Constants#CHARSET}) is assumed for
+	 * both the old and new files.
+	 * 
+	 * @return the patch script, as a Unicode string.
+	 */
+	public String getScriptText() {
+		return getScriptText(null, null);
+	}
+
+	/**
+	 * Convert the patch script for this file into a string.
+	 * 
+	 * @param oldCharset
+	 *            hint character set to decode the old lines with.
+	 * @param newCharset
+	 *            hint character set to decode the new lines with.
+	 * @return the patch script, as a Unicode string.
+	 */
+	public String getScriptText(Charset oldCharset, Charset newCharset) {
+		return getScriptText(new Charset[] { oldCharset, newCharset });
+	}
+
+	protected String getScriptText(Charset[] charsetGuess) {
+		if (getHunks().isEmpty()) {
+			// If we have no hunks then we can safely assume the entire
+			// patch is a binary style patch, or a meta-data only style
+			// patch. Either way the encoding of the headers should be
+			// strictly 7-bit US-ASCII and the body is either 7-bit ASCII
+			// (due to the base 85 encoding used for a BinaryHunk) or is
+			// arbitrary noise we have chosen to ignore and not understand
+			// (e.g. the message "Binary files ... differ").
+			//
+			return extractBinaryString(buf, startOffset, endOffset);
+		}
+
+		if (charsetGuess != null && charsetGuess.length != getParentCount() + 1)
+			throw new IllegalArgumentException("Expected "
+					+ (getParentCount() + 1) + " character encoding guesses");
+
+		if (trySimpleConversion(charsetGuess)) {
+			Charset cs = charsetGuess != null ? charsetGuess[0] : null;
+			if (cs == null)
+				cs = Constants.CHARSET;
+			try {
+				return decodeNoFallback(cs, buf, startOffset, endOffset);
+			} catch (CharacterCodingException cee) {
+				// Try the much slower, more-memory intensive version which
+				// can handle a character set conversion patch.
+			}
+		}
+
+		final StringBuilder r = new StringBuilder(endOffset - startOffset);
+
+		// Always treat the headers as US-ASCII; Git file names are encoded
+		// in a C style escape if any character has the high-bit set.
+		//
+		final int hdrEnd = getHunks().get(0).getStartOffset();
+		for (int ptr = startOffset; ptr < hdrEnd;) {
+			final int eol = Math.min(hdrEnd, nextLF(buf, ptr));
+			r.append(extractBinaryString(buf, ptr, eol));
+			ptr = eol;
+		}
+
+		final String[] files = extractFileLines(charsetGuess);
+		final int[] offsets = new int[files.length];
+		for (final HunkHeader h : getHunks())
+			h.extractFileLines(r, files, offsets);
+		return r.toString();
+	}
+
+	private static boolean trySimpleConversion(final Charset[] charsetGuess) {
+		if (charsetGuess == null)
+			return true;
+		for (int i = 1; i < charsetGuess.length; i++) {
+			if (charsetGuess[i] != charsetGuess[0])
+				return false;
+		}
+		return true;
+	}
+
+	private String[] extractFileLines(final Charset[] csGuess) {
+		final TemporaryBuffer[] tmp = new TemporaryBuffer[getParentCount() + 1];
+		try {
+			for (int i = 0; i < tmp.length; i++)
+				tmp[i] = new TemporaryBuffer();
+			for (final HunkHeader h : getHunks())
+				h.extractFileLines(tmp);
+
+			final String[] r = new String[tmp.length];
+			for (int i = 0; i < tmp.length; i++) {
+				Charset cs = csGuess != null ? csGuess[i] : null;
+				if (cs == null)
+					cs = Constants.CHARSET;
+				r[i] = RawParseUtils.decode(cs, tmp[i].toByteArray());
+			}
+			return r;
+		} catch (IOException ioe) {
+			throw new RuntimeException("Cannot convert script to text", ioe);
+		} finally {
+			for (final TemporaryBuffer b : tmp) {
+				if (b != null)
+					b.destroy();
+			}
+		}
+	}
+
+	/**
 	 * Get the old name associated with this file.
 	 * <p>
 	 * The meaning of the old name can differ depending on the semantic meaning
diff --git a/org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java b/org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java
index 12c670d..fc30311 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/patch/HunkHeader.java
@@ -41,6 +41,9 @@
 import static org.spearce.jgit.util.RawParseUtils.nextLF;
 import static org.spearce.jgit.util.RawParseUtils.parseBase10;
 
+import java.io.IOException;
+import java.io.OutputStream;
+
 import org.spearce.jgit.lib.AbbreviatedObjectId;
 import org.spearce.jgit.util.MutableInteger;
 
@@ -240,4 +243,87 @@ int parseBody(final Patch script, final int end) {
 
 		return c;
 	}
+
+	void extractFileLines(final OutputStream[] out) throws IOException {
+		final byte[] buf = file.buf;
+		int ptr = startOffset;
+		int eol = nextLF(buf, ptr);
+		if (endOffset <= eol)
+			return;
+
+		// Treat the hunk header as though it were from the ancestor,
+		// as it may have a function header appearing after it which
+		// was copied out of the ancestor file.
+		//
+		out[0].write(buf, ptr, eol - ptr);
+
+		SCAN: for (ptr = eol; ptr < endOffset; ptr = eol) {
+			eol = nextLF(buf, ptr);
+			switch (buf[ptr]) {
+			case ' ':
+			case '\n':
+			case '\\':
+				out[0].write(buf, ptr, eol - ptr);
+				out[1].write(buf, ptr, eol - ptr);
+				break;
+			case '-':
+				out[0].write(buf, ptr, eol - ptr);
+				break;
+			case '+':
+				out[1].write(buf, ptr, eol - ptr);
+				break;
+			default:
+				break SCAN;
+			}
+		}
+	}
+
+	void extractFileLines(final StringBuilder sb, final String[] text,
+			final int[] offsets) {
+		final byte[] buf = file.buf;
+		int ptr = startOffset;
+		int eol = nextLF(buf, ptr);
+		if (endOffset <= eol)
+			return;
+		copyLine(sb, text, offsets, 0);
+		SCAN: for (ptr = eol; ptr < endOffset; ptr = eol) {
+			eol = nextLF(buf, ptr);
+			switch (buf[ptr]) {
+			case ' ':
+			case '\n':
+			case '\\':
+				copyLine(sb, text, offsets, 0);
+				skipLine(text, offsets, 1);
+				break;
+			case '-':
+				copyLine(sb, text, offsets, 0);
+				break;
+			case '+':
+				copyLine(sb, text, offsets, 1);
+				break;
+			default:
+				break SCAN;
+			}
+		}
+	}
+
+	protected void copyLine(final StringBuilder sb, final String[] text,
+			final int[] offsets, final int fileIdx) {
+		final String s = text[fileIdx];
+		final int start = offsets[fileIdx];
+		int end = s.indexOf('\n', start);
+		if (end < 0)
+			end = s.length();
+		else
+			end++;
+		sb.append(s, start, end);
+		offsets[fileIdx] = end;
+	}
+
+	protected void skipLine(final String[] text, final int[] offsets,
+			final int fileIdx) {
+		final String s = text[fileIdx];
+		final int end = s.indexOf('\n', offsets[fileIdx]);
+		offsets[fileIdx] = end < 0 ? s.length() : end + 1;
+	}
 }
diff --git a/org.spearce.jgit/src/org/spearce/jgit/util/RawParseUtils.java b/org.spearce.jgit/src/org/spearce/jgit/util/RawParseUtils.java
index 55a3001..ff89e9e 100644
--- a/org.spearce.jgit/src/org/spearce/jgit/util/RawParseUtils.java
+++ b/org.spearce.jgit/src/org/spearce/jgit/util/RawParseUtils.java
@@ -472,6 +472,40 @@ public static String decode(final Charset cs, final byte[] buffer) {
 	 */
 	public static String decode(final Charset cs, final byte[] buffer,
 			final int start, final int end) {
+		try {
+			return decodeNoFallback(cs, buffer, start, end);
+		} catch (CharacterCodingException e) {
+			// Fall back to an ISO-8859-1 style encoding. At least all of
+			// the bytes will be present in the output.
+			//
+			return extractBinaryString(buffer, start, end);
+		}
+	}
+
+	/**
+	 * Decode a region of the buffer under the specified character set if
+	 * possible.
+	 * 
+	 * If the byte stream cannot be decoded that way, the platform default is
+	 * tried; if that too fails, an exception is thrown.
+	 * 
+	 * @param cs
+	 *            character set to use when decoding the buffer.
+	 * @param buffer
+	 *            buffer to pull raw bytes from.
+	 * @param start
+	 *            first position within the buffer to take data from.
+	 * @param end
+	 *            one position past the last location within the buffer to take
+	 *            data from.
+	 * @return a string representation of the range <code>[start,end)</code>,
+	 *         after decoding the region through the specified character set.
+	 * @throws CharacterCodingException
+	 *             the input is not in any of the tested character sets.
+	 */
+	public static String decodeNoFallback(final Charset cs,
+			final byte[] buffer, final int start, final int end)
+			throws CharacterCodingException {
 		final ByteBuffer b = ByteBuffer.wrap(buffer, start, end - start);
 		b.mark();
 
@@ -508,9 +542,26 @@ public static String decode(final Charset cs, final byte[] buffer,
 			}
 		}
 
-		// Fall back to an ISO-8859-1 style encoding. At least all of
-		// the bytes will be present in the output.
-		//
+		throw new CharacterCodingException();
+	}
+
+	/**
+	 * Decode a region of the buffer under the ISO-8859-1 encoding.
+	 * 
+	 * Each byte is treated as a single character in the 8859-1 character
+	 * encoding, performing a raw binary->char conversion.
+	 * 
+	 * @param buffer
+	 *            buffer to pull raw bytes from.
+	 * @param start
+	 *            first position within the buffer to take data from.
+	 * @param end
+	 *            one position past the last location within the buffer to take
+	 *            data from.
+	 * @return a string representation of the range <code>[start,end)</code>.
+	 */
+	public static String extractBinaryString(final byte[] buffer,
+			final int start, final int end) {
 		final StringBuilder r = new StringBuilder(end - start);
 		for (int i = start; i < end; i++)
 			r.append((char) (buffer[i] & 0xff));
-- 
1.6.1.rc3.302.gb14d9

-- 
Shawn.
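
The decode()/decodeNoFallback()/extractBinaryString() split in the
RawParseUtils hunk above can be sketched outside of JGit as follows. This is
only an illustration built on java.nio.charset, not the code from the patch:
the class name DecodeSketch and the helper decodeStrict are hypothetical, and
unlike decodeNoFallback() the sketch tries only the one character set it is
given (it does not retry with the platform default).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeSketch {
	/** Strict decode (hypothetical helper): throws on bad input instead of substituting. */
	public static String decodeStrict(Charset cs, byte[] buf, int start, int end)
			throws CharacterCodingException {
		return cs.newDecoder()
				.onMalformedInput(CodingErrorAction.REPORT)
				.onUnmappableCharacter(CodingErrorAction.REPORT)
				.decode(ByteBuffer.wrap(buf, start, end - start))
				.toString();
	}

	/** Raw byte to char conversion, equivalent to reading the bytes as ISO-8859-1. */
	public static String extractBinaryString(byte[] buf, int start, int end) {
		StringBuilder r = new StringBuilder(end - start);
		for (int i = start; i < end; i++)
			r.append((char) (buf[i] & 0xff));
		return r.toString();
	}

	/** Try the strict decode first; on failure keep every byte via the raw conversion. */
	public static String decode(Charset cs, byte[] buf, int start, int end) {
		try {
			return decodeStrict(cs, buf, start, end);
		} catch (CharacterCodingException e) {
			return extractBinaryString(buf, start, end);
		}
	}

	public static void main(String[] args) {
		// Valid UTF-8 for "caf" followed by U+00E9.
		byte[] ok = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
		// The same text in ISO-8859-1; the lone 0xE9 byte is not valid UTF-8.
		byte[] bad = { 'c', 'a', 'f', (byte) 0xe9 };

		String a = decode(StandardCharsets.UTF_8, ok, 0, ok.length);
		String b = decode(StandardCharsets.UTF_8, bad, 0, bad.length);
		System.out.println(a.equals("caf\u00e9")); // strict path succeeded
		System.out.println(b.equals("caf\u00e9")); // fallback path kept the byte
	}
}
```

Keeping the throwing variant separate is what lets a caller such as
getScriptText() notice that the fast whole-buffer conversion failed and switch
to the slower per-file conversion, while ordinary callers of decode() still get
a string with every byte present.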

end of thread, other threads:[~2008-12-17 20:14 UTC | newest]

Thread overview: 5+ messages
2008-12-13  2:42 [JGIT PATCH 1/2] Add raw buffer fetch methods to FileHeader, HunkHeader Shawn O. Pearce
2008-12-13  2:42 ` [JGIT PATCH 2/2] Add getPatchText functions to obtain the plain-text version of a patch Shawn O. Pearce
2008-12-13 11:02   ` Robin Rosenberg
2008-12-13 21:26     ` Robin Rosenberg
2008-12-17 20:13       ` [JGIT PATCH 2/2 v2] Add getScriptText " Shawn O. Pearce
