* [JGIT] patch-id
@ 2009-09-28 22:21 Nasser Grainawi
  2009-10-08 16:28 ` Shawn O. Pearce
  0 siblings, 1 reply; 2+ messages in thread
From: Nasser Grainawi @ 2009-09-28 22:21 UTC
  To: Shawn O. Pearce, Robin Rosenberg; +Cc: Git Mailing List

Hello again,

I'm trying to add a public getPatchId method to the jgit Patch class and I
came up with some questions. Shawn previously mentioned that Patch already
does the parsing of the patch; however, I can't quite wrap my head around
how/where/if data from that parsing is stored.

It seems Patch does some statistical number gathering, but at no point does
it store a 'slimmed-down' version of a patch. I had the idea to just iterate
over the FileHeader's and get the byte buffer of each, but I don't think
those buffers have the parsed data.

If I've mis-read the code (quite possible), someone please let me know.
Short of that, suggestions for how to go about acquiring/storing a parsed
representation of the data with maximal existing code re-use would be
appreciated.

Thanks,
Nasser

* Re: [JGIT] patch-id
  2009-09-28 22:21 [JGIT] patch-id Nasser Grainawi
@ 2009-10-08 16:28 ` Shawn O. Pearce
  0 siblings, 0 replies; 2+ messages in thread
From: Shawn O. Pearce @ 2009-10-08 16:28 UTC
  To: Nasser Grainawi; +Cc: Robin Rosenberg, Git Mailing List

Nasser Grainawi <nasser@codeaurora.org> wrote:
> I'm trying to add a public getPatchId method to the jgit Patch class [...]
>
> It seems Patch does some statistical number gathering, but at no point does
> it store a 'slimmed-down' version of a patch.

It parses the patch to create FileHeader objects, one for each
file mentioned in the script.  Within each FileHeader there is a
HunkHeader object, one for each hunk present in the patch.  Within
each HunkHeader there is an EditList composed of Edit instances;
each Edit instance denotes a contiguous line range within that hunk.
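
To make "contiguous line range" concrete, here is a minimal sketch of how
one text hunk could be walked into edit ranges. It uses only the Java
standard library; HunkWalkSketch and walk() are hypothetical names for
illustration and are not JGit's API. It assumes half-open, 0-based ranges
and ignores edge cases such as zero-length header ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of JGit: turns one text hunk into edit ranges. */
final class HunkWalkSketch {
    private static final Pattern HEADER =
        Pattern.compile("^@@ -(\\d+)(?:,(\\d+))? \\+(\\d+)(?:,(\\d+))? @@.*$");

    /** Each int[] is {beginA, endA, beginB, endB}: 0-based, ends exclusive (an assumption). */
    static List<int[]> walk(String hunk) {
        String[] lines = hunk.split("\n");
        Matcher m = HEADER.matcher(lines[0]);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a hunk header: " + lines[0]);
        }
        int a = Integer.parseInt(m.group(1)) - 1; // hunk headers count from 1
        int b = Integer.parseInt(m.group(3)) - 1;
        List<int[]> edits = new ArrayList<>();
        int[] cur = null; // the edit currently being extended, if any
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i];
            char c = line.isEmpty() ? ' ' : line.charAt(0);
            if (c == '-' || c == '+') {
                if (cur == null) {
                    cur = new int[] {a, a, b, b};
                    edits.add(cur);
                }
                if (c == '-') {
                    cur[1] = ++a; // one more line consumed on the A side
                } else {
                    cur[3] = ++b; // one more line produced on the B side
                }
            } else if (c == '\\') {
                continue; // "\ No newline at end of file" marker
            } else {
                a++; // a context line advances both sides and closes the run
                b++;
                cur = null;
            }
        }
        return edits;
    }
}
```

For the hunk "@@ -1,3 +1,4 @@" with body " a", "-b", "+B", " c", "+d", this
yields two ranges: {1, 2, 1, 2} and {3, 3, 3, 4}.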

Edit instances come in one of 3 forms:

  INSERT:  a run of + lines with no - lines
  DELETE:  a run of - lines with no + lines
  REPLACE: a mixture of - and + lines

and their type is actually determined by the line numbers attached
to them.  An INSERT has the same starting and ending line number on
the A side, but on the B side the ending line number is at least
one higher than the starting number.  DELETE is the reverse, and
REPLACE has both ending numbers higher than the starting number.

IIRC Edit uses 0-based offsets, so line 3 is actually position 2.
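
The rule above (type follows from the line numbers alone) can be sketched
as below. EditSketch is a simplified stand-in written for this
illustration, not JGit's Edit class; the half-open range convention and
the EMPTY fallback are assumptions of the sketch.

```java
/** Simplified stand-in for illustration; not JGit's Edit class. */
final class EditSketch {
    enum Type { INSERT, DELETE, REPLACE, EMPTY }

    // 0-based positions; each end is exclusive (assumed convention).
    final int beginA, endA, beginB, endB;

    EditSketch(int beginA, int endA, int beginB, int endB) {
        this.beginA = beginA;
        this.endA = endA;
        this.beginB = beginB;
        this.endB = endB;
    }

    /** Derives the type purely from the two line ranges. */
    Type getType() {
        boolean hasA = beginA < endA; // some lines removed from side A
        boolean hasB = beginB < endB; // some lines added on side B
        if (!hasA && hasB) return Type.INSERT;
        if (hasA && !hasB) return Type.DELETE;
        if (hasA && hasB) return Type.REPLACE;
        return Type.EMPTY; // neither side has lines; added here for completeness
    }
}
```

With 0-based offsets, inserting two lines before line 3 of the old file is
new EditSketch(2, 2, 2, 4), and getType() reports INSERT.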

These HunkHeader and Edit instances are only available on a text
patch; binary patches use a different representation for the
binary delta.  Combined diff patches (--cc format) also lack these
HunkHeader/Edit instances as we don't have a generic n-way patch
parser yet.

> I had the idea to just iterate
> over the FileHeader's and get the byte buffer of each, but I don't think
> those buffers have the parsed data.

The HunkHeader and Edit instances really don't have the actual
line data available to them; they only have the line numbers.
To generate a patch ID you'd need to get the line data too.

Worse, IIRC the patch ID generation in C git favors a 3-line context.

In theory you could modify FileHeader or HunkHeader to produce
a RawText that uses the underlying byte[] returned by getBuffer()
as the backing store, but create a specialized IntList which has the
actual file line numbers mapped to the positions in the patch script.
To do that you'd need to re-walk the patch, like the toEditList()
method in HunkHeader does.
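
A rough sketch of that line-number-to-position index, using a String and
character offsets as stand-ins for the byte[] and IntList mentioned above.
HunkLineMapSketch and its methods are hypothetical names, not JGit API.
The sketch only indexes the post-image (B) side of a single hunk and
skips completely empty lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of JGit: indexes post-image lines inside a hunk. */
final class HunkLineMapSketch {
    /**
     * For each post-image line of the hunk, in order, returns the offset in
     * the script where that line's content begins (just past ' ' or '+').
     */
    static List<Integer> postImageOffsets(String hunk) {
        List<Integer> offsets = new ArrayList<>();
        int pos = hunk.indexOf('\n') + 1; // skip the "@@ ... @@" header line
        if (pos == 0) {
            return offsets; // header only, no body
        }
        while (pos < hunk.length()) {
            int eol = hunk.indexOf('\n', pos);
            if (eol < 0) {
                eol = hunk.length();
            }
            char c = hunk.charAt(pos);
            if (c == ' ' || c == '+') {
                offsets.add(pos + 1); // context and added lines exist in the post-image
            }
            pos = eol + 1;
        }
        return offsets;
    }

    /** Reads the line content that starts at the given offset. */
    static String lineAt(String hunk, int offset) {
        int eol = hunk.indexOf('\n', offset);
        return hunk.substring(offset, eol < 0 ? hunk.length() : eol);
    }
}
```

If the hunk header says the post-image starts at line s, then file line n
maps to postImageOffsets(hunk).get(n - s), and the line data can be read
straight out of the patch script without copying it.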

Given that RawText you could feed it through something like
DiffFormatter to create a patch with 3 lines of context, and hash
the relevant bits.
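
As a sketch of the "hash the relevant bits" step: the git-patch-id
documentation describes a patch ID as a SHA-1 of the diff with whitespace
and line numbers ignored. The code below follows that description loosely
and is an approximation only. PatchIdSketch is a hypothetical name, the
choice of which lines to skip is an assumption, and the result is not
expected to match the IDs C git prints. It also hashes the context lines
exactly as given, so the same change rendered with a different amount of
context hashes differently, which is why a patch normalized to 3 lines of
context would be the input.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch; will not reproduce the IDs printed by C git. */
final class PatchIdSketch {
    static String patchId(String patch) {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        for (String line : patch.split("\n")) {
            if (line.startsWith("index ")) {
                continue; // blob ids are not part of the change itself (assumed skip)
            }
            if (line.startsWith("@@")) {
                continue; // hunk headers carry the line numbers to ignore
            }
            String stripped = line.replaceAll("\\s+", ""); // ignore all whitespace
            if (stripped.isEmpty()) {
                continue;
            }
            sha1.update(stripped.getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Two patches that differ only in hunk positions, blob ids, or whitespace
hash to the same 40-character string here, while a change to any added,
removed, or context line produces a different one.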

But... that seems like a lot of work.

Also, there is a class in Gerrit Code Review called EditList (not
to be confused with JGit's EditList class!) that really should be
moved back over to JGit.  It has some useful routines for walking
through a patch as a series of iterations.

> Short of that, suggestions for how to go about acquiring/storing a parsed
> representation of the data with maximal existing code re-use would be
> appreciated.

I'm coming up short on suggestions right now.  I'm not seeing an
easy path to this without writing a bit of code.  I think you really
just need to walk the patch... :-\

-- 
Shawn.
