* Making GitGitGadget conversion lossless
From: Konstantin Ryabitsev @ 2020-02-26 20:09 UTC (permalink / raw)
To: git; +Cc: vegard.nossum
Hi, all:
GitGitGadget is great, and I'm looking forward to adapting it to the
Linux kernel's needs. There is one area where I think the situation can
be further improved: making the conversion of a pull request into a
patch series fully reversible. As of right now, the following data is
permanently lost from commits as they are converted into patches:
- parent/tree hashes
- author/committer information
- cryptographic attestation (gpgsig)
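
For illustration, the fields in the list above are all headers of the raw commit object, which is what "git cat-file commit" prints. The following is only a minimal sketch: the helper name parse_commit and every value in EXAMPLE are made up for this example and are not taken from the thread or from git's source.

```python
from typing import Dict, List, Tuple


def parse_commit(raw: bytes) -> Tuple[Dict[str, List[str]], str]:
    """Split raw commit-object content into its headers and its message."""
    # Headers end at the first empty line; the rest is the commit message.
    head, _, message = raw.partition(b"\n\n")
    headers: Dict[str, List[str]] = {}
    last = None
    for line in head.decode("utf-8").split("\n"):
        if line.startswith(" ") and last is not None:
            # A leading space continues a multi-line header such as gpgsig.
            headers[last][-1] += "\n" + line[1:]
        else:
            key, _, value = line.partition(" ")
            headers.setdefault(key, []).append(value)
            last = key
    return headers, message.decode("utf-8")


# Made-up commit content; the parent hash and identities are placeholders.
EXAMPLE = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"parent 1111111111111111111111111111111111111111\n"
    b"author A U Thor <author@example.com> 1582747740 +0000\n"
    b"committer C O Mitter <committer@example.com> 1582751000 +0100\n"
    b"\n"
    b"Subject line\n"
    b"\n"
    b"Body.\n"
)

headers, message = parse_commit(EXAMPLE)
for name in ("tree", "parent", "author", "committer"):
    print(name, headers[name])
```

A signed commit would carry one more header, gpgsig, in the same block.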
There is an existing body of work done by Vegard Nossum [1] that makes
it possible to fully reconstruct a git commit from an email message, and
I hope that it can make its way into official upstream. If that were to
happen, it would mean that converting from a pull request into a patch
series would become a lossless operation and tools like GitGitGadget
would be able to preserve full cryptographic attestation of commits.
Vegard, if there is interest in getting this work into upstream, are you
in a position to continue your work on it?
Best regards,
-K
[1]: https://lore.kernel.org/git/20191022114518.32055-1-vegard.nossum@oracle.com/#t
* Re: Making GitGitGadget conversion lossless
From: Junio C Hamano @ 2020-02-26 21:01 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: git, vegard.nossum
Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:
> - parent/tree hashes
Isn't this already available by recording the base-commit
information?
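
As a rough illustration of the base-commit information mentioned above: "git format-patch --base=<commit>" records a "base-commit: <hash>" line in the generated series. The sketch below pulls that line out of a message body. The function name find_base_commit, the sample text, and the hash in it are made up for this example; it also assumes the line sits on its own with no surrounding whitespace.

```python
import re
from typing import Optional

# A full hexadecimal object name: 40 digits for SHA-1, up to 64 for SHA-256.
BASE_RE = re.compile(r"^base-commit: ([0-9a-f]{40,64})$", re.MULTILINE)


def find_base_commit(body: str) -> Optional[str]:
    """Return the hash from a 'base-commit:' line, or None if absent."""
    match = BASE_RE.search(body)
    return match.group(1) if match else None


# Made-up tail of a patch e-mail.
SAMPLE = (
    "-- \n"
    "2.25.1\n"
    "\n"
    "base-commit: 0123456789abcdef0123456789abcdef01234567\n"
)

print(find_base_commit(SAMPLE))
```

This names the commit the series applies on top of. It does not by itself record the tree hash that results from applying each patch.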
> - author/committer information
> - cryptographic attestation (gpgsig)
I think you are aiming to come up with a commit that is bit-for-bit
identical to the one the sender had, and I would imagine that the
easiest and least disruptive way to do so is to add a compressed and
ascii-armored copy of "git cat-file commit" output of the original
commit after the "---" line before the diff/diffstat of the e-mailed
patch. The receiving end can then act on it when given some option by
- first recover the contents of the commit object (call it #1);
- learn the parent commit(s) and check out the tree;
- apply the patch in the remainder of the patch e-mail to the tree;
- make sure that the result of patch application gives the tree object
  recorded in #1;
- run "hash-object -t commit -w" over #1 that gives you a commit
  object that is bit-for-bit identical.
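
The steps above can be sketched in a few lines of Python, without calling git. Everything here is an assumption made for illustration: zlib plus base64 stands in for "compressed and ascii-armored", the repository is assumed to use SHA-1, and the helper names (armor, dearmor, object_id, tree_of, check_tree) are hypothetical, not part of git.

```python
import base64
import hashlib
import zlib


def armor(raw_commit: bytes) -> str:
    """Compress raw commit content and encode it as ASCII text."""
    return base64.encodebytes(zlib.compress(raw_commit)).decode("ascii")


def dearmor(text: str) -> bytes:
    """Recover the raw commit content (call it #1) from the armored text."""
    return zlib.decompress(base64.b64decode(text))


def object_id(obj_type: str, body: bytes) -> str:
    """Object name as git computes it for SHA-1 repositories:
    SHA-1 over '<type> <size>' + NUL + content."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def tree_of(raw_commit: bytes) -> str:
    """Return the tree hash recorded on the first line of the commit."""
    kind, _, value = raw_commit.split(b"\n", 1)[0].partition(b" ")
    if kind != b"tree":
        raise ValueError("commit content does not start with a tree header")
    return value.decode("ascii")


def check_tree(raw_commit: bytes, applied_tree_id: str) -> bool:
    """True if applying the patch produced the tree recorded in #1."""
    return tree_of(raw_commit) == applied_tree_id
```

If the recovered bytes are identical to what the sender had, object_id("commit", recovered) yields the sender's commit hash, which is the effect of running "hash-object -t commit" over #1.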
As I said already, I do not think that the desire to get the
bit-for-bit identical commit is compatible with the idea to discuss
e-mailed patches---the pieces of patch e-mail will become "you may
look at them, you may apply them, but it is no use to comment on
them to get them improved". So, I dunno.
* Re: Making GitGitGadget conversion lossless
From: Vegard Nossum @ 2020-02-26 21:32 UTC (permalink / raw)
To: Junio C Hamano, Konstantin Ryabitsev; +Cc: git
On 2/26/20 10:01 PM, Junio C Hamano wrote:
> As I said already, I do not think that the desire to get the
> bit-for-bit identical commit is compatible with the idea to discuss
> e-mailed patches---the pieces of patch e-mail will become "you may
> look at them, you may apply them, but it is no use to comment on
> them to get them improved". So, I dunno.
For me, at least, the goal was to be able to store previous patch
submissions in git (even if they are not merged into the main tree) so
that you can use git and all its tools (diff, log, blame, grep, notes,
etc.) to browse previous versions and discussions _and_ use the
SHA1 as a stable identifier for a specific submission.
The point of having the stable identifier is so that the submitter can
take comments into account and resubmit their patchset while still
keeping a (stable, universal, unambiguous) reference to their previous
submission.
I don't see the incompatibility at all. The whole point was that the
current email workflow used by Linux and git (that includes discussion,
feedback, and revision) _does not need to change_.
Vegard
* Re: Making GitGitGadget conversion lossless
From: Konstantin Ryabitsev @ 2020-02-26 21:35 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, vegard.nossum
On Wed, Feb 26, 2020 at 01:01:15PM -0800, Junio C Hamano wrote:
> Isn't this already available by recording the base-commit
> information?
>
> > - author/committer information
> > - cryptographic attestation (gpgsig)
>
> I think you are aiming to come up with a commit that is bit-for-bit
> identical to the one the sender had, and I would imagine that the
> easiest and least disruptive way to do so is to add a compressed and
> ascii-armored copy of "git cat-file commit" output of the original
> commit after the "---" line before the diff/diffstat of the e-mailed
> patch. The receiving end can then act on it when given some option by
>
> - first recover the contents of the commit object (call it #1);
> - learn the parent commit(s) and check out the tree;
> - apply the patch in the remainder of the patch e-mail to the tree;
> - make sure that the result of patch application gives the tree object
> recorded in #1;
> - run "hash-object -t commit -w" over #1 that gives you a commit
> object that is bit-for-bit identical.
Right, I just don't want to be doing this in a separate tool. :)
> As I said already, I do not think that the desire to get the
> bit-for-bit identical commit is compatible with the idea to discuss
> e-mailed patches---the pieces of patch e-mail will become "you may
> look at them, you may apply them, but it is no use to comment on
> them to get them improved".
I disagree -- specifically from the attestation point of view. One of
the drawbacks of platforms like lore.kernel.org is that it creates an
opportunity for a malicious actor to compromise it and modify patches
that they know will be downloaded and applied by Linux maintainers -- so
my goal is to ensure that we do not have to trust lore.kernel.org in
order to trust patches downloaded from it. This means some mechanism for
end-to-end patch attestation.
There are two avenues that I am pursuing for this purpose:
1. being able to submit attestation information out-of-band, see
discussion here:
https://lore.kernel.org/workflows/20200226172502.q3fl67ealxsonfgp@chatter.i7.local/T/#u
2. being able to preserve commit signatures as they are converted into
patches and back
I know that it is very uncommon for patches to be applied without any
changes, because the maintainer would almost always add their
Signed-off-by trailer before applying it to their tree. However,
preserving full commit metadata allows checking cryptographic
attestation *before* adding trailers or making any other edits, for
example by making a shallow clone of the worktree, applying the series
"verbatim", as you describe above, and then verifying the signature at
the tip. If "git verify-commit HEAD" is successful, then the maintainer
can be assured that patch contents have not been modified between when
they left the developer's system and arrived at the maintainer's
workstation.
This means nobody needs to trust me or other members of the sysadmin
team responsible for lore.kernel.org in order to trust patches they
retrieve from it.
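
To make the attestation argument concrete: a signed commit stores its signature in a gpgsig header, and the signature covers the commit content with that header removed. That content includes the tree hash, so a patch altered in transit produces a different tree and no longer matches what was signed. The sketch below only separates the two parts and performs no cryptographic check; the helper name split_gpgsig and the sample data are made up, and the signature text is a placeholder, not a real one.

```python
from typing import Tuple


def split_gpgsig(raw_commit: bytes) -> Tuple[bytes, bytes]:
    """Return (signed payload, signature) for raw commit content."""
    head, sep, message = raw_commit.partition(b"\n\n")
    kept, sig = [], []
    in_sig = False
    for line in head.split(b"\n"):
        if line.startswith(b"gpgsig "):
            in_sig = True
            sig.append(line[len(b"gpgsig "):])
        elif in_sig and line.startswith(b" "):
            # Continuation lines of the signature carry one leading space.
            sig.append(line[1:])
        else:
            in_sig = False
            kept.append(line)
    payload = b"\n".join(kept) + sep + message
    signature = b"\n".join(sig) + (b"\n" if sig else b"")
    return payload, signature


# Made-up unsigned commit and placeholder signature.
UNSIGNED = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1582747740 +0000\n"
    b"committer C O Mitter <committer@example.com> 1582751000 +0100\n"
    b"\n"
    b"Subject line\n"
)
FAKE_SIG = (
    b"-----BEGIN PGP SIGNATURE-----\n"
    b"\n"
    b"(placeholder, not a real signature)\n"
    b"-----END PGP SIGNATURE-----\n"
)

# Embed the signature the way a signed commit stores it: as a gpgsig
# header whose continuation lines start with a single space.
_head, _sep, _msg = UNSIGNED.partition(b"\n\n")
_sig_header = b"gpgsig " + b"\n ".join(FAKE_SIG.rstrip(b"\n").split(b"\n"))
SIGNED = _head + b"\n" + _sig_header + _sep + _msg

payload, signature = split_gpgsig(SIGNED)
print(payload == UNSIGNED, signature == FAKE_SIG)
```

In practice "git verify-commit" does this work and also checks the signature against the signer's key; the sketch only shows why the commit bytes must survive the trip through e-mail unchanged.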
Best,
-K
* Re: Making GitGitGadget conversion lossless
From: Junio C Hamano @ 2020-02-26 22:27 UTC (permalink / raw)
To: Konstantin Ryabitsev; +Cc: git, vegard.nossum
Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:
> On Wed, Feb 26, 2020 at 01:01:15PM -0800, Junio C Hamano wrote:
>> Isn't this already available by recording the base-commit
>> information?
>>
>> > - author/committer information
>> > - cryptographic attestation (gpgsig)
>>
>> I think you are aiming to come up with a commit that is bit-for-bit
>> identical to the one the sender had, and I would imagine that the
>> easiest and least disruptive way to do so is to add a compressed and
>> ascii-armored copy of "git cat-file commit" output of the original
>> commit after the "---" line before the diff/diffstat of the e-mailed
>> patch. The receiving end can then act on it when given some option by
>>
>> - first recover the contents of the commit object (call it #1);
>> - learn the parent commit(s) and check out the tree;
>> - apply the patch in the remainder of the patch e-mail to the tree;
>> - make sure that the result of patch application gives the tree object
>> recorded in #1;
>> - run "hash-object -t commit -w" over #1 that gives you a commit
>> object that is bit-for-bit identical.
>
> Right, I just don't want to be doing this in a separate tool. :)
Yes, and I just outlined how it can be expressed in the
"format-patch" output format, and implemented on the "am" side, as
part of "git".