All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jeff King <peff@peff.net>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH] diff: add a config option to control orderfile
Date: Fri, 20 Sep 2013 00:32:26 +0300	[thread overview]
Message-ID: <20130919213226.GA21291@redhat.com> (raw)
In-Reply-To: <20130917205615.GA20178@sigill.intra.peff.net>

On Tue, Sep 17, 2013 at 04:56:16PM -0400, Jeff King wrote:
> On Tue, Sep 17, 2013 at 11:38:07PM +0300, Michael S. Tsirkin wrote:
> 
> > > A problem with both schemes, though, is that they are not
> > > backwards-compatible with existing git-patch-id implementations.
> > 
> > Could you clarify?
> > We never send patch IDs on the wire - how isn't this compatible?
> 
> I meant that you might be comparing patch-ids generated by different
> implementations, or across time. There are no dedicated tools to do so,
> but it is very easy to do so with standard tools like "join".
> 
> For example, you can do:
> 
>   patch_ids() {
>     git rev-list "$1" |
>     git diff-tree --stdin -p |
>     git patch-id |
>     sort
>   }
> 
>   patch_ids origin..topic1 >us
>   patch_ids origin..topic2 >them
> 
>   join us them | cut -d' ' -f2-3
> 
> to get a list of correlated commits between two branches. If the "them"
> was on another machine with a different implementation (or is saved from
> an earlier time), your patch-ids would not match.
> 
> It may be esoteric enough not to worry about, though. By far the most
> common use of patch-ids is going to be in a single "rev-list
> --cherry-pick" situation where you are trying to omit commits during
> a rebase.
> 
> I am mostly thinking of the problems we had with the "kup" tool, which
> expected stability across diffs that would be signed by both kernel.org.
> But as far as I know, they do not use patch-id. More details in case you
> are curious (including me arguing that we should not care, and it is
> kup's problem!) are here:
> 
>   http://thread.gmane.org/gmane.comp.version-control.git/192331/focus=192424
> 
> rerere is mentioned in that thread, but I believe that it does its own
> hash, and does not rely on patch-id.
> 
> -Peff

OK so far I came up with the following.
Needs documentation and tests obviously.
But besides that -
would something like this be enough to address the
issue Junio raised?

--->

patch-id: make it more stable

Add a new patch-id algorithm making it stable against
hunk reodering:
	- prepend header to each hunk (if not there)
	- calculate SHA1 hash for each hunk separately
	- sum all hashes to get patch id

Add --order-sensitive to get historical unstable behaviour.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 builtin/patch-id.c | 65 +++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 50 insertions(+), 15 deletions(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 3cfe02d..d1902ff 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -1,17 +1,14 @@
 #include "builtin.h"
 
-static void flush_current_id(int patchlen, unsigned char *id, git_SHA_CTX *c)
+static void flush_current_id(int patchlen, unsigned char *id, unsigned char *result)
 {
-	unsigned char result[20];
 	char name[50];
 
 	if (!patchlen)
 		return;
 
-	git_SHA1_Final(result, c);
 	memcpy(name, sha1_to_hex(id), 41);
 	printf("%s %s\n", sha1_to_hex(result), name);
-	git_SHA1_Init(c);
 }
 
 static int remove_space(char *line)
@@ -56,10 +53,30 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
 	return 1;
 }
 
-static int get_one_patchid(unsigned char *next_sha1, git_SHA_CTX *ctx, struct strbuf *line_buf)
+static void flush_one_hunk(unsigned char *result, git_SHA_CTX *ctx)
 {
-	int patchlen = 0, found_next = 0;
+	unsigned char hash[20];
+	unsigned short carry = 0;
+	int i;
+
+	git_SHA1_Final(hash, ctx);
+	git_SHA1_Init(ctx);
+	/* 20-byte sum, with carry */
+	for (i = 0; i < 20; ++i) {
+		carry += result[i] + hash[i];
+		result[i] = carry;
+		carry >>= 8;
+	}
+}
+static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
+			   struct strbuf *line_buf, int stable)
+{
+	int patchlen = 0, found_next = 0, hunks = 0;
 	int before = -1, after = -1;
+	git_SHA_CTX ctx, header_ctx;
+
+	git_SHA1_Init(&ctx);
+	hashclr(result);
 
 	while (strbuf_getwholeline(line_buf, stdin, '\n') != EOF) {
 		char *line = line_buf->buf;
@@ -99,6 +116,18 @@ static int get_one_patchid(unsigned char *next_sha1, git_SHA_CTX *ctx, struct st
 			if (!memcmp(line, "@@ -", 4)) {
 				/* Parse next hunk, but ignore line numbers.  */
 				scan_hunk_header(line, &before, &after);
+				if (stable) {
+					if (hunks) {
+						flush_one_hunk(result, &ctx);
+						memcpy(&ctx, &header_ctx,
+						       sizeof ctx);
+					} else {
+						/* Save ctx for next hunk.  */
+						memcpy(&header_ctx, &ctx,
+						       sizeof ctx);
+					}
+				}
+				hunks++;
 				continue;
 			}
 
@@ -108,6 +137,7 @@ static int get_one_patchid(unsigned char *next_sha1, git_SHA_CTX *ctx, struct st
 
 			/* Else we're parsing another header.  */
 			before = after = -1;
+			hunks = 0;
 		}
 
 		/* If we get here, we're inside a hunk.  */
@@ -119,27 +149,27 @@ static int get_one_patchid(unsigned char *next_sha1, git_SHA_CTX *ctx, struct st
 		/* Compute the sha without whitespace */
 		len = remove_space(line);
 		patchlen += len;
-		git_SHA1_Update(ctx, line, len);
+		git_SHA1_Update(&ctx, line, len);
 	}
 
 	if (!found_next)
 		hashclr(next_sha1);
 
+	flush_one_hunk(result, &ctx);
+
 	return patchlen;
 }
 
-static void generate_id_list(void)
+static void generate_id_list(int stable)
 {
-	unsigned char sha1[20], n[20];
-	git_SHA_CTX ctx;
+	unsigned char sha1[20], n[20], result[20];
 	int patchlen;
 	struct strbuf line_buf = STRBUF_INIT;
 
-	git_SHA1_Init(&ctx);
 	hashclr(sha1);
 	while (!feof(stdin)) {
-		patchlen = get_one_patchid(n, &ctx, &line_buf);
-		flush_current_id(patchlen, sha1, &ctx);
+		patchlen = get_one_patchid(n, result, &line_buf, stable);
+		flush_current_id(patchlen, sha1, result);
 		hashcpy(sha1, n);
 	}
 	strbuf_release(&line_buf);
@@ -149,9 +179,14 @@ static const char patch_id_usage[] = "git patch-id < patch";
 
 int cmd_patch_id(int argc, const char **argv, const char *prefix)
 {
-	if (argc != 1)
+	int stable;
+	if (argc == 2 && !strcmp(argv[1], "--order-sensitive"))
+		stable = 0;
+	else if (argc == 1)
+		stable = 1;
+	else
 		usage(patch_id_usage);
 
-	generate_id_list();
+	generate_id_list(stable);
 	return 0;
 }
-- 
MST

  parent reply	other threads:[~2013-09-19 23:03 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-31 19:44 [PATCH] diff: add a config option to control orderfile Michael S. Tsirkin
2013-09-03 17:12 ` Junio C Hamano
2013-09-03 21:08   ` Michael S. Tsirkin
2013-09-15  7:49     ` Michael S. Tsirkin
2013-09-15  8:08       ` Michael S. Tsirkin
2013-09-17 16:26         ` Junio C Hamano
2013-09-17 16:42           ` Michael S. Tsirkin
2013-09-17 17:24             ` Junio C Hamano
2013-09-17 17:28               ` Michael S. Tsirkin
2013-09-17 18:06                 ` Junio C Hamano
2013-09-17 19:25                   ` Michael S. Tsirkin
2013-09-17 20:14                   ` Michael S. Tsirkin
2013-09-17 20:16                     ` Michael S. Tsirkin
2013-09-17 20:18                       ` Jeff King
2013-09-17 20:38                         ` Michael S. Tsirkin
2013-09-17 20:41                           ` Michael S. Tsirkin
2013-09-17 20:56                           ` Jeff King
2013-09-17 21:03                             ` Michael S. Tsirkin
2013-09-17 21:06                               ` Jeff King
2013-09-17 21:52                             ` Junio C Hamano
2013-09-19 21:32                             ` Michael S. Tsirkin [this message]
2013-09-23 21:09                               ` Michael S. Tsirkin
2013-09-23 21:37                                 ` Jonathan Nieder
2013-09-24  5:45                                   ` Jeff King
2013-09-24  5:54                                   ` Michael S. Tsirkin
2013-09-24 19:36                                     ` Jonathan Nieder
2013-09-24 20:15                                       ` Michael S. Tsirkin
2013-09-24 21:31                                         ` Jonathan Nieder
2013-09-24 21:57                                           ` Michael S. Tsirkin
2013-09-24 22:08                                             ` Jonathan Nieder
2013-09-17 20:31                       ` Michael S. Tsirkin
2013-09-21 21:08               ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130919213226.GA21291@redhat.com \
    --to=mst@redhat.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.