All of lore.kernel.org
 help / color / mirror / Atom feed
* Patch attestation RFC + proof of concept
@ 2020-02-26 17:25 Konstantin Ryabitsev
  2020-02-26 17:50 ` Kees Cook
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Konstantin Ryabitsev @ 2020-02-26 17:25 UTC (permalink / raw)
  To: workflows

Hi, all:

I've been mulling over the idea of submitting and verifying 
cryptographic patch attestation data in a way that would be both useful 
and unobtrusive -- and I would like to propose a mechanism and a 
proof-of-concept script for your discussion.

# TL;DR

1. git clone https://git.kernel.org/pub/scm/linux/kernel/git/mricon/korg-helpers.git
2. cd korg-helpers
3. ./get-lore-mbox.py 20200225051307.6401-1-keescook@chromium.org -aT
4. gpg --locate-keys kees@kernel.org
5. ./attest-patches.py -c v4_20200224_keescook_chromium_org.mbx -tF

PASS | [PATCH v4 1/6] x86/elf: Add table to document READ_IMPLIES_EXEC
PASS | [PATCH v4 2/6] x86/elf: Split READ_IMPLIES_EXEC from executable GNU_STACK
PASS | [PATCH v4 3/6] x86/elf: Disable automatic READ_IMPLIES_EXEC for 64-bit address spaces
PASS | [PATCH v4 4/6] arm32/64, elf: Add tables to document READ_IMPLIES_EXEC
PASS | [PATCH v4 5/6] arm32/64, elf: Split READ_IMPLIES_EXEC from executable GNU_STACK
PASS | [PATCH v4 6/6] arm64, elf: Disable automatic READ_IMPLIES_EXEC for 64-bit address spaces
---
All patches passed attestation:
  Attestation-by: Kees Cook <kees@kernel.org> (pgp:8972F4DFDC6DC026)

# Preamble

PGP is already commonly used by the kernel.org community for signing and 
verifying git tags when working with pull requests. However, even though 
PGP was developed with email in mind, it is rarely used for attesting 
patches submitted via email, for a number of good reasons:

- PGP support in email clients remains poor
- inline PGP is generally disliked because it interferes with the code 
  review process (and can mess up patches due to needing to escape 
  single dashes)
- MIME-wrapped PGP frequently doesn't survive the trip due to being 
  mangled by MTAs, mailing list software, etc, and may also interfere with
  the code review process due to mime-wrapping actual patches
- delegated trust is a hard problem to solve and doesn't easily scale up 
  to the level of thousands of contributors

For this reason, PGP-based attestation of emailed patches has never 
really taken off in the kernel community (or, in fact, in most other 
communities using email-based workflows).

# Proposal

I would like to propose integrating PGP-based patch attestation into the 
tooling we are building to automate a lot of tasks performed by the 
kernel.org maintainer community. As some of you know, I've put together 
get-lore-mbox, which is a client-side tool written with the goal to make 
it easier to retrieve patches from the lore.kernel.org archival system.  
While get-lore-mbox remains a work in progress, it has been generally 
well received by the kernel development community, and its usefulness 
should only grow as it is honed to handle various corner-cases around 
this patch workflow.

I suggest that we start using lore.kernel.org as a mechanism to record 
and retrieve patch attestation data, using a special pseudo-list I set 
up for this purpose: https://lore.kernel.org/signatures/.

## Main concepts

In each patch submission there are three distinct parts of interest:

- basic patch metadata (author, title, date)
- patch description, which becomes the git commit message
- the patch itself

There is frequently other information included with the patch that is 
largely superfluous (diffstat, series versioning info, etc). It is of no 
particular interest to us because it does not become part of the git 
commit and therefore is not useful for attestation purposes.

This separation into three component parts is important, because patch 
description is routinely edited by maintainers to add various trailer 
metadata (Signed-Off-By, Acked-By, etc). Patches may also end up edited 
for typos, formatting, or ease of merging, so what ends up committed 
into git repositories by maintainers may differ from patches as sent to 
the mailing lists.

### Three hashes per patch

If you look at the contents of the patch attestation message 
(https://lore.kernel.org/signatures/202002251425.E7847687B@keescook/), 
you will notice a yaml-style formatted document with a series of three 
hashes. Let's take the first one as example:

2a02abe0-215cf3f1-2acb5798:
  i: 2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e
  m: 215cf3f133478917ad147a6eda1010a9c4bba1846e7dd35295e9a0081559e9b0
  p: 2acb5798c366f97501f8feacb873327bac161951ce83e90f04bbcde32e993865

The source of these hashes is the following patch:
https://lore.kernel.org/kernel-hardening/20200225051307.6401-2-keescook@chromium.org/

To split the patch into its three components we can use the following 
command:

curl -s https://lore.kernel.org/kernel-hardening/20200225051307.6401-2-keescook@chromium.org/raw \
  | git mailinfo m p > i

The three files are:

m: the commit message
p: the patch (plus superfluous surrounding content)
i: author, email, subject, date

We can immediately calculate the m hash, as it requires no munging:

$ sha256sum m
215cf3f133478917ad147a6eda1010a9c4bba1846e7dd35295e9a0081559e9b0  m

To calculate the "p" hash, we first need to remove any surrounding junk 
that isn't just the patch itself. The goal is to get the exact same 
content as produced by "git diff". If we remove all lines preceding the 
first "diff" and then everything following the content of the last hunk, 
we get the p hash:

$ vi p
$ sha256sum p
2acb5798c366f97501f8feacb873327bac161951ce83e90f04bbcde32e993865  p

We cannot use the contents of the "i" file verbatim because it includes 
the Date: header, which is modified by git-send-email. We therefore only 
hash the Author, Email, and Subject lines:

$ egrep '^(Author|Email|Subject)' i | sha256sum
2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e  -

We then take the first 8 characters of the i-m-p hash to create the
attestation-ID: 2a02abe0-215cf3f1-2acb5798.

### One attestation document per many patches

The same process is repeated for all patches in the series. The 
resulting YAML document is then PGP-signed and sent to 
signatures@kernel.org. The metadata of that message is intentionally 
kept to a minimum in order to minimize PII-sensitive data and
reduce the potential for GDPR removal requests.

Since the content of this document is extremely structured, we can 
develop a simple public-inbox filter to almost completely reduce the 
possibility of adding spam into the archive. We can also create a simple 
ingestion URL that can consume POST requests containing attestation data 
that will serve as an alternative to sending out the attestation email.

## Verification process

To perform verification, the same three hashes are generated for each 
patch being attested, along with the i{8}-m{8}-p{8} attestation-id. We 
then use that attestation-id to query the signatures archive on 
lore.kernel.org:

https://lore.kernel.org/signatures/?q=2a02abe0-215cf3f1-2acb5798

For each returned result, we perform PGP validation, and if it passes 
(meaning that we have the key in the local keyring and it is assigned 
sufficient trust), we then parse the message contents.

A patch is considered as "passing attestation" if:

- PGP validation is successful
- We find an entry with the same three hashes
- The email address in the "From:" field of the email being verified 
  matches one of the UIDs on the key used to sign the attestation 
  document

This should be sufficient to provide strong assurance that the patch and 
all significant commit metadata are identical between the system where 
attestation was generated and the system where attestation was checked.

# The attest-patches.py proof of concept

The provided proof-of-concept script is able to both create and verify 
patch attestation data. It should handle most obvious malicious 
corner-cases that I was able to think of, but it hasn't been 
scrutinized, which is why it probably shouldn't be used for real work 
until more people have a chance to weigh in on both the script and the 
overall concept.

## The submitter workflow

As I envision it, the submitter workflow would look as follows:

- the developer runs "git-format-patch"
- the developer reviews the changes and makes any last-minute edits they 
  deem necessary before submitting their work to the list/maintainer
- the developer executes "git-send-email"
- the developer runs "attest-patches -a *.patches"
- the developer sends attestation.eml to signatures@kernel.org
  (or the tool auto-POSTs it to the submission URL, as mentioned)

There can even be a fairly simple wrapper around git-send-email that 
would perform attestation as part of the "sending patches" stage.

## The reviewer workflow

The reviewer does not need to concern themselves with attestation until 
they are ready to apply the patches to their git tree. When that moment 
comes:

- the maintainer runs get-lore-mbox -aA (-A is not implemented yet)
- get-lore-mbox performs attestation before generating the am-ready mbox
- if attestation passes, get-lore-mbox adds two trailers to each patch: 
  "Attestation-by:" and "Attestation-verified:". In our example case 
  those are:
  Attestation-by: Kees Cook <kees@kernel.org> (pgp:8972F4DFDC6DC026)
  Attestation-verified: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
- if attestation does not pass, get-lore-mbox can provide some basic 
  explanation behind the failure before aborting:
  - attestation not found in the archive
  - the PGP key does not have sufficient trust
  - attestation exists but not all hashes pass, in which case the tool 
    can show which hashes failed verification
  - etc

If attestation is not found in the archive (e.g. the submitter didn't 
bother submitting it), the maintainer can request that it is generated 
and submitted post-fact (e.g. by rerunning "git-format-patch", or using 
their "Sent" folder, etc). 

For obvious and trivial patches, the maintainer may forego 
checking/requiring attestation entirely. That said, if a subsystem 
adopts attestation requirements, it should stick to requiring it on all 
submitted patches on the basis of principle.

# Thoughts?

Okay, what do you all think? I believe this scheme has the following 
merits:

- it is opt-in and can be adopted by individual subsystem maintainers
- it builds on top of the PGP trust framework already used extensively 
  by the kernel developers
- it doesn't litter mailing lists with non-human-readable attestation 
  junk
- it doesn't require that attestation data is created at the time when 
  patches are submitted for review; the maintainer can request that it 
  is provided at a later time when they are ready to apply the series to 
  their git tree and want attestation data for the final sanity check 
  and record-keeping
- all attestations are recorded in the public-inbox "signatures" feed 
  that can be mirrored along all other public-inbox repositories on 
  lore.kernel.org

Downsides:

- we aren't solving the problem of delegated trust, which will continue 
  to be the hardest part behind any distributed development effort

I would greatly appreciate any feedback.

Best,
-K

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-26 17:25 Patch attestation RFC + proof of concept Konstantin Ryabitsev
@ 2020-02-26 17:50 ` Kees Cook
  2020-02-26 18:47   ` Konstantin Ryabitsev
  2020-02-26 20:11 ` Jason Gunthorpe
  2020-02-27  4:11 ` Jason A. Donenfeld
  2 siblings, 1 reply; 17+ messages in thread
From: Kees Cook @ 2020-02-26 17:50 UTC (permalink / raw)
  To: workflows

On Wed, Feb 26, 2020 at 12:25:02PM -0500, Konstantin Ryabitsev wrote:
> ## The submitter workflow
> 
> As I envision it, the submitter workflow would look as follows:
> 
> - the developer runs "git-format-patch"
> - the developer reviews the changes and makes any last-minute edits they 
>   deem necessary before submitting their work to the list/maintainer
> - the developer executes "git-send-email"
> - the developer runs "attest-patches -a *.patches"
> - the developer sends attestation.eml to signatures@kernel.org
>   (or the tool auto-POSTs it to the submission URL, as mentioned)
> 
> There can even be a fairly simple wrapper around git-send-email that 
> would perform attestation as part of the "sending patches" stage.

FWIW, I would find this utterly trivial to add to my workflow. (The only
minor difference that doesn't really matter is that in addition to
series-style patches, I also regularly send one-offs, but that would
just be a short attestation email.)

> ## The reviewer workflow
> 
> The reviewer does not need to concern themselves with attestation until 
> they are ready to apply the patches to their git tree. When that moment 
> comes:
> 
> - the maintainer runs get-lore-mbox -aA (-A is not implemented yet)
> - get-lore-mbox performs attestation before generating the am-ready mbox
> - if attestation passes, get-lore-mbox adds two trailers to each patch: 
>   "Attestation-by:" and "Attestation-verified:". In our example case 
>   those are:
>   Attestation-by: Kees Cook <kees@kernel.org> (pgp:8972F4DFDC6DC026)
>   Attestation-verified: Konstantin Ryabitsev <konstantin@linuxfoundation.org>

We'll probably need to have a stable way to deal with key aliases. Even
in this example my email addresses are keescook@chromium.org (patches
and attestation sender), kees@outflux.net (comment in gpg sig), and
kees@kernel.org (since that's what Konstantin used to fetch my key).

And perhaps I lack imagination, but what is the overall purpose of these
proposed tags? If it's just a hint to whether the patch has attestation,
the '-verified:' isn't needed. Anyone wanting to check attestation would
always need to do the full heavy lifting anyway.

> # Thoughts?
> 
> Okay, what do you all think? I believe this scheme has the following 
> merits:
> 
> - it is opt-in and can be adopted by individual subsystem maintainers
> - it builds on top of the PGP trust framework already used extensively 
>   by the kernel developers
> - it doesn't litter mailing lists with non-human-readable attestation 
>   junk
> - it doesn't require that attestation data is created at the time when 
>   patches are submitted for review; the maintainer can request that it 
>   is provided at a later time when they are ready to apply the series to 
>   their git tree and want attestation data for the final sanity check 
>   and record-keeping
> - all attestations are recorded in the public-inbox "signatures" feed 
>   that can be mirrored along all other public-inbox repositories on 
>   lore.kernel.org

Moar blockchains! ;) But, yes, it provides a signed path from author to
committer without interfering with existing workflows. I like it!

> Downsides:
> 
> - we aren't solving the problem of delegated trust, which will continue 
>   to be the hardest part behind any distributed development effort

This was immediately my first question. How does a committer choose the
correct GPG key? /me waits for the kernel.org key server to appear...

Thanks for working on this! It was fun being the alpha guinea pig. ;)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-26 17:50 ` Kees Cook
@ 2020-02-26 18:47   ` Konstantin Ryabitsev
  0 siblings, 0 replies; 17+ messages in thread
From: Konstantin Ryabitsev @ 2020-02-26 18:47 UTC (permalink / raw)
  To: Kees Cook; +Cc: workflows

On Wed, Feb 26, 2020 at 09:50:50AM -0800, Kees Cook wrote:
> > As I envision it, the submitter workflow would look as follows:
> > 
> > - the developer runs "git-format-patch"
> > - the developer reviews the changes and makes any last-minute edits they 
> >   deem necessary before submitting their work to the list/maintainer
> > - the developer executes "git-send-email"
> > - the developer runs "attest-patches -a *.patches"
> > - the developer sends attestation.eml to signatures@kernel.org
> >   (or the tool auto-POSTs it to the submission URL, as mentioned)
> > 
> > There can even be a fairly simple wrapper around git-send-email that 
> > would perform attestation as part of the "sending patches" stage.
> 
> FWIW, I would find this utterly trivial to add to my workflow. (The only
> minor difference that doesn't really matter is that in addition to
> series-style patches, I also regularly send one-offs, but that would
> just be a short attestation email.)

Yes, and you can even send a single attestation email for a bunch of 
patches that have nothing to do with each-other (i.e. they don't need to 
all be part of the same series). E.g. you could have an automated script 
that hooks into your "Sent" folder and sends regular attestations for 
any new patches it finds.

> > ## The reviewer workflow
> > 
> > The reviewer does not need to concern themselves with attestation until 
> > they are ready to apply the patches to their git tree. When that moment 
> > comes:
> > 
> > - the maintainer runs get-lore-mbox -aA (-A is not implemented yet)
> > - get-lore-mbox performs attestation before generating the am-ready mbox
> > - if attestation passes, get-lore-mbox adds two trailers to each patch: 
> >   "Attestation-by:" and "Attestation-verified:". In our example case 
> >   those are:
> >   Attestation-by: Kees Cook <kees@kernel.org> (pgp:8972F4DFDC6DC026)
> >   Attestation-verified: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
> 
> We'll probably need to have a stable way to deal with key aliases. Even
> in this example my email addresses are keescook@chromium.org (patches
> and attestation sender), kees@outflux.net (comment in gpg sig), and
> kees@kernel.org (since that's what Konstantin used to fetch my key).

The TL;DR section uses the Web Key Directory to look up key information, 
which will strip any UIDs not matching @kernel.org. I totally disagree 
with this WKD behaviour, but I can't change it. This is why we pass the 
-F flag to attest-patches, which forces it to ignore From mismatches.

When a key is imported via something like kernel.org's pgpkeys.git 
repository, it will have all of your UIDs and the "Attestation-by" line 
will use the one that matches the email address in the From: field. This 
is the default behaviour.

> And perhaps I lack imagination, but what is the overall purpose of these
> proposed tags? If it's just a hint to whether the patch has attestation,
> the '-verified:' isn't needed. Anyone wanting to check attestation would
> always need to do the full heavy lifting anyway.

It's true, the -verified tag is redundant with Signed-off-by. I'm happy 
to drop it. On the other hand, I think the "Attestation-by" trailer is 
helpful because the Author of the patch may be different from the person 
sending the patches to the maintainer. E.g. in this case, where the 
patch author is different from the sender:
https://lore.kernel.org/linux-mm/20200204013345.IPi4a1jYT%25akpm@linux-foundation.org/

So, a presence of "Attestation-by: Andrew Morton" would help build 
attestation chain. If there is no "Attestation-by: Gang He", then we'll 
know that there was no cryptographic attestation for this patch en route 
to akpm.

> > - all attestations are recorded in the public-inbox "signatures" 
> > feed that can be mirrored along all other public-inbox repositories 
> > on lore.kernel.org
> 
> Moar blockchains! ;) But, yes, it provides a signed path from author to
> committer without interfering with existing workflows. I like it!

I tried not to use the "B" word, but sure. :) There is no 
proof-of-whatnot + global reconciliation in the case of git 
repositories, so it's not technically a blockchain in its strictest 
sense, even if we PGP-sign all commits. </pedant mode>

> > Downsides:
> > 
> > - we aren't solving the problem of delegated trust, which will continue 
> >   to be the hardest part behind any distributed development effort
> 
> This was immediately my first question. How does a committer choose the
> correct GPG key? /me waits for the kernel.org key server to appear...

The attest-patches script offers the -t flag, which forces the "TOFU" 
trust model, meaning that the fist time you come across a key that you 
don't know, it's considered good and trusted. If in the future you come 
across another key with the same email address on the UID, GnuPG will 
mark both keys as untrusted and force you to pick which one you prefer.

There is also some ongoing effort that tries to implement the 
"distributed ID" spec for git [1], which aims to standardize on a way to 
pass developer key information via the repository itself (or via a 
linked repository). It's an ambitious goal, but if it's successful, then 
it will offer a way to record key information for most prominent 
developers inside git itself (similar to pgpkeys.git).

[1]: https://public-inbox.org/git/CACi-FhDeAZecXSM36zroty6kpf2BCWLS=0R+dUwuB96LqFKuTA@mail.gmail.com/

In general terms, though, I see this mostly being relevant in ongoing 
long-term developer relationships. The goal is to prevent malicious 
tampering of patches that appear to be coming from a fellow developer 
with whom you have a long-term collaboration history and whose judgment 
you trust.

If you're receiving a patch submission from someone with whom you have 
no history, then you have no reason to trust that their code is not 
trying to do something malicious -- in which case checking their PGP 
signature will not be meaningful anyway.

> Thanks for working on this! It was fun being the alpha guinea pig. ;)

I totally forgot to thank you for kindly agreeing to guinea-pig this for 
me. :) So, thanks again!

-K

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-26 17:25 Patch attestation RFC + proof of concept Konstantin Ryabitsev
  2020-02-26 17:50 ` Kees Cook
@ 2020-02-26 20:11 ` Jason Gunthorpe
  2020-02-26 20:42   ` Konstantin Ryabitsev
  2020-02-27  4:11 ` Jason A. Donenfeld
  2 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2020-02-26 20:11 UTC (permalink / raw)
  To: workflows

On Wed, Feb 26, 2020 at 12:25:02PM -0500, Konstantin Ryabitsev wrote:

> ### Three hashes per patch
> 
> If you look at the contents of the patch attestation message 
> (https://lore.kernel.org/signatures/202002251425.E7847687B@keescook/), 
> you will notice a yaml-style formatted document with a series of three 
> hashes. Let's take the first one as example:
> 
> 2a02abe0-215cf3f1-2acb5798:
>   i: 2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e
>   m: 215cf3f133478917ad147a6eda1010a9c4bba1846e7dd35295e9a0081559e9b0
>   p: 2acb5798c366f97501f8feacb873327bac161951ce83e90f04bbcde32e993865
> 
> The source of these hashes is the following patch:
> https://lore.kernel.org/kernel-hardening/20200225051307.6401-2-keescook@chromium.org/

If you define an alternative message signature algorithm like this,
then is there still value in detatching the PGP signature from the
patch email?

The usual PGP signature computes a hash of the message in a certain
way (with unquoting etc). If you instead replace that with your method
and then just generate the normal base64 blob using:

  msg_hash = HASH(HASH(i) || HASH(m) || HASH(p))
  sig = RSA_Sign(msg_hash)

Then the base64 of the sig can just be dropped at the end of the
message, and doesn't need to be detached, or need the various ---BEGIN
PGP--- overheads

The magic I see here is defining a way to compute the message hash of
a patch email that doesn't cause a big mess.

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-26 20:11 ` Jason Gunthorpe
@ 2020-02-26 20:42   ` Konstantin Ryabitsev
  2020-02-26 21:04     ` Jason Gunthorpe
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Ryabitsev @ 2020-02-26 20:42 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: workflows

On Wed, Feb 26, 2020 at 04:11:40PM -0400, Jason Gunthorpe wrote:
> > If you look at the contents of the patch attestation message 
> > (https://lore.kernel.org/signatures/202002251425.E7847687B@keescook/), 
> > you will notice a yaml-style formatted document with a series of 
> > three hashes. Let's take the first one as example:
> > 
> > 2a02abe0-215cf3f1-2acb5798:
> >   i: 2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e
> >   m: 215cf3f133478917ad147a6eda1010a9c4bba1846e7dd35295e9a0081559e9b0
> >   p: 2acb5798c366f97501f8feacb873327bac161951ce83e90f04bbcde32e993865
> > 
> > The source of these hashes is the following patch:
> > https://lore.kernel.org/kernel-hardening/20200225051307.6401-2-keescook@chromium.org/
> 
> If you define an alternative message signature algorithm like this,
> then is there still value in detatching the PGP signature from the
> patch email?

I believe that yes, because it offers better workflows around the
following scenarios:

- developer does all their work on a remote VM that doesn't have access 
  to their PGP keys and submits actual attestation when they get back to 
  their workstation
- developer configures their smartcard to require a PIN during each 
  operation and disables the pgp-agent; sending a series of 40 patches 
  requires a single PIN entry instead of 40
- developer submits a v1 of the patch that they don't expect to pass on 
  the first try and doesn't bother submitting attestation; shockingly,
  the maintainer accepts it as-is and the developer can attest their 
  patches post-fact *without* needing to collect all the acked-by's 
  reviewed-by's etc from all others who have already responded to the v1 
  submission

> The usual PGP signature computes a hash of the message in a certain
> way (with unquoting etc). If you instead replace that with your method
> and then just generate the normal base64 blob using:
> 
>   msg_hash = HASH(HASH(i) || HASH(m) || HASH(p))
>   sig = RSA_Sign(msg_hash)

The reason I want to leave i/m/p hashes individually present is because 
it makes it possible to query patch attestation information based on a 
subset of full information.

For example, a maintainer will almost certainly edit the message content 
to add their own Signed-off-by, and may edit the patch for minor 
nitpicking. Full i-m-p attestation will fail in this case, but we can 
then query the signatures archive for each individual hash to identify 
which part of the submission fails validation:
https://lore.kernel.org/signatures/?q=2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e

This lets us present the maintainer with more useful info, like: "full 
attestation failed, but the only changed part is actually the message 
and not the patch content, so it's probably still okay to apply."

> Then the base64 of the sig can just be dropped at the end of the
> message, and doesn't need to be detached, or need the various ---BEGIN
> PGP--- overheads
> 
> The magic I see here is defining a way to compute the message hash of
> a patch email that doesn't cause a big mess.

I still think that one of the key benefits of this proposal is being 
able to submit patch attestation data post-fact. For signatures included 
with patches, I'd rather see this happen during the git-format-patch 
step following Vagard Nossum's work of fully reconstructing commits from 
patches -- see my email to the git list here:
https://lore.kernel.org/git/20200226200929.z4aej74ohbkgcdza@chatter.i7.local/T/#u

Best,
-K

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-26 20:42   ` Konstantin Ryabitsev
@ 2020-02-26 21:04     ` Jason Gunthorpe
  2020-02-26 21:18       ` Konstantin Ryabitsev
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Gunthorpe @ 2020-02-26 21:04 UTC (permalink / raw)
  To: workflows

On Wed, Feb 26, 2020 at 03:42:31PM -0500, Konstantin Ryabitsev wrote:
> On Wed, Feb 26, 2020 at 04:11:40PM -0400, Jason Gunthorpe wrote:
> > > If you look at the contents of the patch attestation message 
> > > (https://lore.kernel.org/signatures/202002251425.E7847687B@keescook/), 
> > > you will notice a yaml-style formatted document with a series of 
> > > three hashes. Let's take the first one as example:
> > > 
> > > 2a02abe0-215cf3f1-2acb5798:
> > >   i: 2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e
> > >   m: 215cf3f133478917ad147a6eda1010a9c4bba1846e7dd35295e9a0081559e9b0
> > >   p: 2acb5798c366f97501f8feacb873327bac161951ce83e90f04bbcde32e993865
> > > 
> > > The source of these hashes is the following patch:
> > > https://lore.kernel.org/kernel-hardening/20200225051307.6401-2-keescook@chromium.org/
> > 
> > If you define an alternative message signature algorithm like this,
> > then is there still value in detatching the PGP signature from the
> > patch email?
> 
> I believe that yes, because it offers better workflows around the
> following scenarios:
> 
> - developer does all their work on a remote VM that doesn't have access 
>   to their PGP keys and submits actual attestation when they get back to 
>   their workstation

Unfortunately this is a challenging work flow for a lot of reasons. :(

> - developer configures their smartcard to require a PIN during each 
>   operation and disables the pgp-agent; sending a series of 40 patches 
>   requires a single PIN entry instead of 40

This is certainly my situation, my PGP key lives in a yubikey token
configured for physical presence to confirm. :\

> - developer submits a v1 of the patch that they don't expect to pass on
>   the first try and doesn't bother submitting attestation; shockingly,
>   the maintainer accepts it as-is and the developer can attest their 
>   patches post-fact *without* needing to collect all the acked-by's 
>   reviewed-by's etc from all others who have already responded to the v1 
>   submission

But there won't be tags in this case, so how do we learn the original
attestation-id to find the detatched signature?

> For example, a maintainer will almost certainly edit the message content 
> to add their own Signed-off-by, and may edit the patch for minor 
> nitpicking. 

The i/p/m will always change once in git:
   - The commit message is always changed, at least to add sign off
   - The email Subject is always changed to strip [PATCH xxx]
   - The diff is almost always changed because the patch is applied
     to a different tree. The git blob IDs will be different at a
     minimum, at a maximum the context lines will be different

If we know the expected ID then you could do some fuzzy scheme to
cancel out, or at least check, the differences..

We have many patches now that have Link: trailer. I think it would be
useful to run some analysis:
 - Fetch the original patch email from the Link and compute the
   detached signature hashes
 - Attempt to validate this sigature against the git commit
 - Is there a fuzzy algorithm that brings this rate higher?

What is the pass/fail rate? It would say a lot about the strength of
this scheme through to the final git commit and thus if post-fact is
relevant or not.

> Full i-m-p attestation will fail in this case, but we can 
> then query the signatures archive for each individual hash to identify 
> which part of the submission fails validation:
> https://lore.kernel.org/signatures/?q=2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e
>
> This lets us present the maintainer with more useful info, like: "full 
> attestation failed, but the only changed part is actually the message 
> and not the patch content, so it's probably still okay to apply."

How does the message body get changed in transit from the submitter to
the maintainer?

> I still think that one of the key benefits of this proposal is being 
> able to submit patch attestation data post-fact. For signatures included 
> with patches, I'd rather see this happen during the git-format-patch 
> step following Vagard Nossum's work of fully reconstructing commits from 
> patches -- see my email to the git list here:
> https://lore.kernel.org/git/20200226200929.z4aej74ohbkgcdza@chatter.i7.local/T/#u

It is too bad that the email flow for git cannot, at least optionally,
reconstruct losslessly the original commit hashes. It would be very
nice if this were possible.

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-26 21:04     ` Jason Gunthorpe
@ 2020-02-26 21:18       ` Konstantin Ryabitsev
  2020-02-27  1:23         ` Jason Gunthorpe
  0 siblings, 1 reply; 17+ messages in thread
From: Konstantin Ryabitsev @ 2020-02-26 21:18 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: workflows

On Wed, Feb 26, 2020 at 05:04:42PM -0400, Jason Gunthorpe wrote:
> > - developer does all their work on a remote VM that doesn't have 
> > access to their PGP keys and submits actual attestation when they 
> > get back to their workstation
> 
> Unfortunately this is a challenging work flow for a lot of reasons. :(

Can you describe why? I would expect that this is done fairly routinely 
due to having more compute power on a remote VM to run various tests.

> > - developer configures their smartcard to require a PIN during each 
> > operation and disables the pgp-agent; sending a series of 40 patches 
> > requires a single PIN entry instead of 40
> 
> This is certainly my situation, my PGP key lives in a yubikey token
> configured for physical presence to confirm. :\

I'm in the same boat. :)

> > - developer submits a v1 of the patch that they don't expect to pass on
> >   the first try and doesn't bother submitting attestation; shockingly,
> >   the maintainer accepts it as-is and the developer can attest their 
> >   patches post-fact *without* needing to collect all the acked-by's 
> >   reviewed-by's etc from all others who have already responded to the v1 
> >   submission
> 
> But there won't be tags in this case, so how do we learn the original
> attestation-id to find the detatched signature?

The attestation would be performed before all the follow-up tags are 
applied, so the attestation-id would be the same. After the patch is 
applied to a git repository, we can still use the "i" hash to look it up 
(see more below).

> > For example, a maintainer will almost certainly edit the message 
> > content to add their own Signed-off-by, and may edit the patch for 
> > minor nitpicking. 
> 
> The i/p/m will always change once in git:
>    - The commit message is always changed, at least to add sign off
>    - The email Subject is always changed to strip [PATCH xxx]

This is already done by "git mailinfo" so I would expect that the i: 
hash almost never changes, actually, unless the maintainer actually 
edits the subject. Subject + Author + Email are sufficiently unique to 
be able to locate the attestation data of the exact patch.

> If we know the expected ID then you could do some fuzzy scheme to
> cancel out, or at least check, the differences..
> 
> We have many patches now that have Link: trailer. I think it would be
> useful to run some analysis:
>  - Fetch the original patch email from the Link and compute the
>    detached signature hashes
>  - Attempt to validate this sigature against the git commit
>  - Is there a fuzzy algorithm that brings this rate higher?

So, the goal is not really to perform attestation once the patches made 
it into the git tree. I am specifically trying to address the following 
cases:

- Someone poses as a trusted developer and submits a malicious patch
- Someone bribes me to edit a patch on lore.kernel.org to introduce a 
  backdoor

Currently, maintainers have no mechanism to check against either of 
these cases. Adopting end-to-end patch attestation resolves both 
problems.

> > Full i-m-p attestation will fail in this case, but we can then query 
> > the signatures archive for each individual hash to identify which 
> > part of the submission fails validation:
> > https://lore.kernel.org/signatures/?q=2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e
> >
> > This lets us present the maintainer with more useful info, like: "full 
> > attestation failed, but the only changed part is actually the message 
> > and not the patch content, so it's probably still okay to apply."
> 
> How does the message body get changed in transit from the submitter to
> the maintainer?

It shouldn't, because this means that the patch becomes mangled. We do 
perform a single normalization routine, which is converting all '\r\n' 
into '\n'. Any other changes done to bodies mean broken patches.

-K

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-26 21:18       ` Konstantin Ryabitsev
@ 2020-02-27  1:23         ` Jason Gunthorpe
  0 siblings, 0 replies; 17+ messages in thread
From: Jason Gunthorpe @ 2020-02-27  1:23 UTC (permalink / raw)
  To: workflows

On Wed, Feb 26, 2020 at 04:18:05PM -0500, Konstantin Ryabitsev wrote:
> On Wed, Feb 26, 2020 at 05:04:42PM -0400, Jason Gunthorpe wrote:
> > > - developer does all their work on a remote VM that doesn't have 
> > > access to their PGP keys and submits actual attestation when they 
> > > get back to their workstation
> > 
> > Unfortunately this is a challenging work flow for a lot of reasons. :(
> 
> Can you describe why? I would expect that this is done fairly routinely 
> due to having more compute power on a remote VM to run various tests.

I've seen enough situations where people have a Linux server and a
Windows laptop. They don't have a Linux desktop, so mixing a local
'secure' PGP key in a windows environment with the remote Linux server
environment is challenging.

If you have a secure 'Linux desktop environment' then I'd imagine
always sending signed emails from there, and just using the server for
test/etc.

Certainly, exposing my email password (aka my cloud account password)
is of even greater risk to me (and my company!) than exposing my PGP
key. I want to keep it on my secure server, encrypted, etc. Actually
I'm trying hard to move all my email access to OAUTH to minimize risks
to my cloud accounts :\

> > > - developer submits a v1 of the patch that they don't expect to pass on
> > >   the first try and doesn't bother submitting attestation; shockingly,
> > >   the maintainer accepts it as-is and the developer can attest their 
> > >   patches post-fact *without* needing to collect all the acked-by's 
> > >   reviewed-by's etc from all others who have already responded to the v1 
> > >   submission
> > 
> > But there won't be tags in this case, so how do we learn the original
> > attestation-id to find the detatched signature?
> 
> The attestation would be performed before all the follow-up tags are 
> applied, so the attestation-id would be the same. After the patch is 
> applied to a git repository, we can still use the "i" hash to look it up 
> (see more below).

I'm not sure, if the '[PATCH xx v5]' is stripped then 'i' is not
unique any more?
 
> > > For example, a maintainer will almost certainly edit the message 
> > > content to add their own Signed-off-by, and may edit the patch for 
> > > minor nitpicking. 
> > 
> > The i/p/m will always change once in git:
> >    - The commit message is always changed, at least to add sign off
> >    - The email Subject is always changed to strip [PATCH xxx]
> 
> This is already done by "git mailinfo" so I would expect that the i:
> hash almost never changes, actually, unless the maintainer actually 
> edits the subject. Subject + Author + Email are sufficiently unique to 
> be able to locate the attestation data of the exact patch.

Ah, using git mailinfos is not how you described 'i':

$ egrep '^(Author|Email|Subject)' i | sha256sum
2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e  -

(also I do edit subjects a fair amount, again it would be interesting
 to see stats on how well this works)

> So, the goal is not really to perform attestation once the patches made 
> it into the git tree. I am specifically trying to address the following 
> cases:
> 
> - Someone poses as a trusted developer and submits a malicious patch
> - Someone bribes me to edit a patch on lore.kernel.org to introduce a 
>   backdoor

Okay, but then post-apply-attestation doesn't help this threat
model. post-apply-attestation is surely only useful if you can check
the signature from the git data?

Like I said it would be very interesting to see data on how well these
signatures could survive, if we could get, what, 80% coverage of git
commits this way then that seems like a powerful argument for this
approach, right?

Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-26 17:25 Patch attestation RFC + proof of concept Konstantin Ryabitsev
  2020-02-26 17:50 ` Kees Cook
  2020-02-26 20:11 ` Jason Gunthorpe
@ 2020-02-27  4:11 ` Jason A. Donenfeld
  2020-02-27 10:05   ` Geert Uytterhoeven
  2020-02-27 14:29   ` Konstantin Ryabitsev
  2 siblings, 2 replies; 17+ messages in thread
From: Jason A. Donenfeld @ 2020-02-27  4:11 UTC (permalink / raw)
  To: workflows; +Cc: Konstantin Ryabitsev

Hi Konstantin,

A few comments on the proposal, inline below:

On Wed, Feb 26, 2020 at 12:25:02PM -0500, Konstantin Ryabitsev wrote:

> 4. gpg --locate-keys kees@kernel.org

I suppose you have in mind WKD instead of the now-DoS-broken key
servers?
 
> $ egrep '^(Author|Email|Subject)' i | sha256sum
> 2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e  -

What if there are multiple such lines? What if there's
Subject-justkidding? That grep doesn't use a colon in it. Are there
other malleability issues to account for here?

> To calculate the "p" hash, we first need to remove any surrounding junk 
> that isn't just the patch itself.

This has to be done extremely carefully, and it's probably worth reading
the source of git apply and gnu patch and such to find any edge cases.
Again, potential malleability issues which could be exploitable if not
done carefully.

> We then take the first 8 characters of the i-m-p hash to create the
> attestation-ID: 2a02abe0-215cf3f1-2acb5798.

It's now easy to forge collisions and complicate the lookup process by
forcing clients to do lots of trial hashing and lore fetching. Remember:
anybody can submit to the signatures list. If you want an attestation ID
that uses each of the three hashes, hash the hashes together instead of
truncating and concatenating them.


Shouldn't this step:

> - the developer executes "git-send-email"

be after these steps:

> - the developer runs "attest-patches -a *.patches"
> - the developer sends attestation.eml to signatures@kernel.org
>   (or the tool auto-POSTs it to the submission URL, as mentioned)

? Otherwise there's a small race window.


> - if attestation passes, get-lore-mbox adds two trailers to each patch: 
>   "Attestation-by:" and "Attestation-verified:". In our example case 
>   those are:
>   Attestation-by: Kees Cook <kees@kernel.org> (pgp:8972F4DFDC6DC026)
>   Attestation-verified: Konstantin Ryabitsev <konstantin@linuxfoundation.org>

What do these lines actually add that is useful? The Signed-off-by is
already a non-cryptographic trail of which maintainers patches flow
through. These two add another, also non-cryptographic, summary. If
they're forgible, then what's the point? Seems like clutter. If
anything, these signatures help maintainers verify that the sender
actually sent it before they merge it into their trees. But after that,
those commit messages could contain anything, including bogus
"Attestation-by" lines. It seems like it's basically just tinfoil virtue
signaling at that point? Ostensibly the maintainers should also be
reading the patches for content anyway...

> Okay, what do you all think? I believe this scheme has the following 
> merits:
> Downsides:
> 
> - we aren't solving the problem of delegated trust, which will continue 
>   to be the hardest part behind any distributed development effort
> 

Another odd quirk worth considering: vanilla patches aren't tied to a
specific commit base, so there could be a "replay attack" where an
attacker resends an old patch that still applies without issue, but
means a different thing in the present state of the tree. For example,
Alice sends patch P in November 2019. Bob discovers it causes a remotely
exploitable vulnerability in December 2019 and submits a revert patch.
The seasons change a few times, and it's now March 2025, maintainers
have changed a bit, but the code is still mostly the same. Eve resubmits
P which has Alice's name on it. Signature verifies. Doom ensues. Real
git pgp signing involves a signature over the whole object, which
contains the hash of the parent, which avoids this issue.

General reflection: this is solving a odd "problem". As mentioned above,
maintainers should be reading commits as they come in on the mailing
list. There might be a practical argument to be made, though, that
sometimes they trust things on face value, even emailed commits, and so
this gives some extra assurance. I also wonder if this adds even more
overhead to what hipsters are already decrying as an overly clunky
contribution process. Even more generally, is the lack of signatures on
email patches a real problem we're facing?

Regards,
Jason

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-27  4:11 ` Jason A. Donenfeld
@ 2020-02-27 10:05   ` Geert Uytterhoeven
  2020-02-27 13:30     ` Jason A. Donenfeld
  2020-02-27 14:29   ` Konstantin Ryabitsev
  1 sibling, 1 reply; 17+ messages in thread
From: Geert Uytterhoeven @ 2020-02-27 10:05 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: workflows, Konstantin Ryabitsev

Hi Jason,

On Thu, Feb 27, 2020 at 5:13 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Another odd quirk worth considering: vanilla patches aren't tied to a
> specific commit base, so there could be a "replay attack" where an
> attacker resends an old patch that still applies without issue, but
> means a different thing in the present state of the tree. For example,
> Alice sends patch P in November 2019. Bob discovers it causes a remotely
> exploitable vulnerability in December 2019 and submits a revert patch.
> The seasons change a few times, and it's now March 2025, maintainers
> have changed a bit, but the code is still mostly the same. Eve resubmits
> P which has Alice's name on it. Signature verifies. Doom ensues. Real
> git pgp signing involves a signature over the whole object, which
> contains the hash of the parent, which avoids this issue.

How would the commit base help here?  It would indicate this is an old
patch, which would be indicated by the signature date, too.

The only thing that would help is time-limiting the window between
attestation and application.  So when applying an old patch, it has to
be attested again by the original author.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-27 10:05   ` Geert Uytterhoeven
@ 2020-02-27 13:30     ` Jason A. Donenfeld
  0 siblings, 0 replies; 17+ messages in thread
From: Jason A. Donenfeld @ 2020-02-27 13:30 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: workflows, Konstantin Ryabitsev

On Thu, Feb 27, 2020 at 6:05 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> How would the commit base help here?  It would indicate this is an old
> patch, which would be indicated by the signature date, too.

For email, not much, since the patch is always disconnected. The point
is that this isn't a problem when verifying commits inside of git
itself because the signatures are over the commit's position in the
tree, so you can't reorder or rearrange commits. Not necessarily an
applicable solution here, but worth noting that other setups don't
encounter the same problem due to other, larger, design decisions.

> The only thing that would help is time-limiting the window between
> attestation and application.

Sure, one can draw up a few bandaids for this, such as: big red text
saying "warning, this commit is kind of old", which of course means
its date needs to be included in the metadata signature, and accurate
too. Maybe there are other bandaids. Or this is just a fundamental
issue with disconnected by-email patches that we'll have to live with.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-27  4:11 ` Jason A. Donenfeld
  2020-02-27 10:05   ` Geert Uytterhoeven
@ 2020-02-27 14:29   ` Konstantin Ryabitsev
  2020-02-28  1:57     ` Jason A. Donenfeld
  1 sibling, 1 reply; 17+ messages in thread
From: Konstantin Ryabitsev @ 2020-02-27 14:29 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: workflows

On Thu, Feb 27, 2020 at 12:11:44PM +0800, Jason A. Donenfeld wrote:
> Hi Konstantin,
> 
> A few comments on the proposal, inline below:
> 
> On Wed, Feb 26, 2020 at 12:25:02PM -0500, Konstantin Ryabitsev wrote:
> 
> > 4. gpg --locate-keys kees@kernel.org
> 
> I suppose you have in mind WKD instead of the now-DoS-broken key
> servers?

I am not fond of WKD because it stupidly strips all UIDs that don't 
match kernel.org. In fact, if someone has a kernel.org account but 
didn't put their kernel.org UID on the key, the WKD lookup will fail.  
This is why I've been keeping pgpkeys.git repository around.
https://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git

> > $ egrep '^(Author|Email|Subject)' i | sha256sum
> > 2a02abe02216f626105622aee2f26ab10c155b6442e23441d90fc5fe4071b86e  -
> 
> What if there are multiple such lines? What if there's
> Subject-justkidding? That grep doesn't use a colon in it. Are there
> other malleability issues to account for here?

I leave all this to be handled by git-mailinfo. We don't process the 
message directly, and whatever is returned by git-mailinfo is what git 
decides to use when you run "git am".

> > To calculate the "p" hash, we first need to remove any surrounding 
> > junk that isn't just the patch itself.
> 
> This has to be done extremely carefully, and it's probably worth reading
> the source of git apply and gnu patch and such to find any edge cases.
> Again, potential malleability issues which could be exploitable if not
> done carefully.

Yes, fully agreed. I am hoping to convince git folks that we need this 
functionality natively present in git-mailinfo:
https://lore.kernel.org/git/20200221171312.xyzsrvebuwiw6pgj@chatter.i7.local/

To reiterate, I want to use whatever git uses and make as few design 
decisions as possible.

> > We then take the first 8 characters of the i-m-p hash to create the
> > attestation-ID: 2a02abe0-215cf3f1-2acb5798.
> 
> It's now easy to forge collisions and complicate the lookup process by
> forcing clients to do lots of trial hashing and lore fetching. Remember:
> anybody can submit to the signatures list. If you want an attestation ID
> that uses each of the three hashes, hash the hashes together instead of
> truncating and concatenating them.

This doesn't really concern me as much. Anyone can submit attestation 
signatures, true, but we throw out anything that doesn't validate and 
also any PGP keys where none of the UIDs match the From: header. So, if 
a malicious person sends a bunch of attestations, then they will just be 
noise and fail at the "gpg --verify" stage because you won't have that 
key on your keyring.

> Shouldn't this step:
> 
> > - the developer executes "git-send-email"
> 
> be after these steps:
> 
> > - the developer runs "attest-patches -a *.patches"
> > - the developer sends attestation.eml to signatures@kernel.org
> >   (or the tool auto-POSTs it to the submission URL, as mentioned)
> 
> ? Otherwise there's a small race window.

Why does it matter? I envision cases where you'd send attestation full 
days or weeks later. As long as what the maintainer received matches 
what is on the developer's system, the validity time window can be large 
enough to cover a few weeks.

> > - if attestation passes, get-lore-mbox adds two trailers to each 
> > patch: "Attestation-by:" and "Attestation-verified:". In our example 
> > case those are:
> >   Attestation-by: Kees Cook <kees@kernel.org> (pgp:8972F4DFDC6DC026)
> >   Attestation-verified: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
> 
> What do these lines actually add that is useful? The Signed-off-by is
> already a non-cryptographic trail of which maintainers patches flow
> through. These two add another, also non-cryptographic, summary. If
> they're forgible, then what's the point? Seems like clutter. If
> anything, these signatures help maintainers verify that the sender
> actually sent it before they merge it into their trees. But after that,
> those commit messages could contain anything, including bogus
> "Attestation-by" lines. It seems like it's basically just tinfoil virtue
> signaling at that point? Ostensibly the maintainers should also be
> reading the patches for content anyway...

I agree that Attestation-verified is redundant with Signed-off-by. I 
think Attestation-by is helpful for the purposes of seeing the 
attestation chain. If Developer A sent patches to Submaintainer B, and 
Submaintainer B sent it to Maintainer C, these headers will help us 
figure out which steps were attested. If you only see:

Attestation-by: Submaintainer B

You'll know that there probably was no attestation submitted/checked for 
when the patches travelled between Developer A and Submaintainer B.

However, I don't feel strongly about this. If the community decides that 
these trailers are not useful, we can drop them. Most of this can be 
otherwise traced via Link: trailers.

> > Okay, what do you all think? I believe this scheme has the following 
> > merits:
> > Downsides:
> > 
> > - we aren't solving the problem of delegated trust, which will continue 
> >   to be the hardest part behind any distributed development effort
> > 
> 
> Another odd quirk worth considering: vanilla patches aren't tied to a
> specific commit base, so there could be a "replay attack" where an
> attacker resends an old patch that still applies without issue, but
> means a different thing in the present state of the tree. For example,
> Alice sends patch P in November 2019. Bob discovers it causes a remotely
> exploitable vulnerability in December 2019 and submits a revert patch.
> The seasons change a few times, and it's now March 2025, maintainers
> have changed a bit, but the code is still mostly the same. Eve resubmits
> P which has Alice's name on it. Signature verifies. Doom ensues. Real
> git pgp signing involves a signature over the whole object, which
> contains the hash of the parent, which avoids this issue.

PGP signatures include time of signing, so this is covered:

$ curl -s https://lore.kernel.org/signatures/202002251425.E7847687B@keescook/raw | gpg --verify
gpg: Signature made Tue 25 Feb 2020 05:25:19 PM EST
gpg:                using RSA key A5C3F68F229DD60F723E6E138972F4DFDC6DC026
gpg: Good signature from "Kees Cook <kees@outflux.net>" [full]

The proof-of-concept tool doesn't do anything with this info, but the 
real tool can implement a "window of validity" check that would reject 
attestation if signatures are too old.

> General reflection: this is solving a odd "problem". As mentioned above,
> maintainers should be reading commits as they come in on the mailing
> list. There might be a practical argument to be made, though, that
> sometimes they trust things on face value, even emailed commits, and so
> this gives some extra assurance. I also wonder if this adds even more
> overhead to what hipsters are already decrying as an overly clunky
> contribution process. Even more generally, is the lack of signatures on
> email patches a real problem we're facing?

So, I have two risk scenarios here:

1. An overworked, tired maintainer that sees the name of a trusted 
   developer on the From: line and doesn't go over the code as 
   thoroughly as they should. I share the "constant vigilance" 
   sentiment, but I also know that humans are fallible and giving 
   maintainers an auxiliary mechanism to verify trust chains will do 
   more good than harm.

2. A maintainer reviews patches that arrived in their mailbox and finds 
   them perfectly good. Then they run "get-lore-mbox" or "git pw" to get 
   these patches in a format that's easy to apply to their git repo, but 
   they suddenly get a *slightly different* set of patches, because now 
   we explicitly trust that whoever is in charge of 
   lore.kernel.org/patchwork.kernel.org isn't being malicious. I don't 
   want us to trust any piece of infrastructure that sits between the 
   developer and the maintainer.

I hope that the procedure I'm proposing is convenient and unobtrusive 
enough to be accepted and used. In my view, it is not significantly more 
cumbersome than adding signatures to git tags.

-K

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-27 14:29   ` Konstantin Ryabitsev
@ 2020-02-28  1:57     ` Jason A. Donenfeld
  2020-02-28  2:30       ` Jason A. Donenfeld
                         ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Jason A. Donenfeld @ 2020-02-28  1:57 UTC (permalink / raw)
  To: Jason A. Donenfeld, workflows

On Thu, Feb 27, 2020 at 10:29 PM Konstantin Ryabitsev
<konstantin@linuxfoundation.org> wrote:
> This is why I've been keeping pgpkeys.git repository around.
> https://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git

Ah, right, that seems good. There's also signify, by the way, if you
ever feel the urge to jump ship from pgp. We can add cgit support for
it too.

> I leave all this to be handled by git-mailinfo. We don't process the
> message directly, and whatever is returned by git-mailinfo is what git
> decides to use when you run "git am".
> Yes, fully agreed. I am hoping to convince git folks that we need this
> functionality natively present in git-mailinfo:
> https://lore.kernel.org/git/20200221171312.xyzsrvebuwiw6pgj@chatter.i7.local/
> To reiterate, I want to use whatever git uses and make as few design
> decisions as possible.

Probably wise. But it doesn't really "solve" the problem, since
somebody still needs to write the code to have the set of properties
you care about for signatures. No matter who you foist the
responsibility on to, you'll probably need to audit it pretty
carefully.

> This doesn't really concern me as much. Anyone can submit attestation
> signatures, true, but we throw out anything that doesn't validate and
> also any PGP keys where none of the UIDs match the From: header. So, if
> a malicious person sends a bunch of attestations, then they will just be
> noise and fail at the "gpg --verify" stage because you won't have that
> key on your keyring.

That From header is forgeable too, by the way. (Some patches even have
two From headers -- one in the body of the email for git and one that
actually sent the email.)

> Why does it matter?

You send the patch to lkml. Then your Internet cuts out while the
firemen deal with an unidentified creature stuck in a nearby tree
meowing loudly. Finally it's back on and you submit the attestation.
However, while the cat was ruining your workflow, tglx went to apply
your patch but wasn't able to find the signature email, since you
hadn't posted it yet. Hence, a better flow is to just post the
signature email first.


> I agree that Attestation-verified is redundant with Signed-off-by. I
> think Attestation-by is helpful for the purposes of seeing the
> attestation chain. If Developer A sent patches to Submaintainer B, and
> Submaintainer B sent it to Maintainer C, these headers will help us
> figure out which steps were attested. If you only see:
>
> Attestation-by: Submaintainer B
>
> You'll know that there probably was no attestation submitted/checked for
> when the patches travelled between Developer A and Submaintainer B.
>
> However, I don't feel strongly about this. If the community decides that
> these trailers are not useful, we can drop them. Most of this can be
> otherwise traced via Link: trailers.

I think this too is redundant and mostly theater. Those trailers
aren't authenticated, so they communicate basically nothing useful
except, "I'm one of the cool maintainers who use this thing!" For
everyone else, it's just clutter.

> The proof-of-concept tool doesn't do anything with this info, but the
> real tool can implement a "window of validity" check that would reject
> attestation if signatures are too old.

Or at least warn in bright red or something. But do note: this is
mostly a compromise-hack to work around a fundamental design issue
here, and it doesn't actually solve all cases; i.e. it depends on
humans having some short term memory during the validity period about
what's going on too.

> 2. A maintainer reviews patches that arrived in their mailbox and finds
>    them perfectly good. Then they run "get-lore-mbox" or "git pw" to get
>    these patches in a format that's easy to apply to their git repo, but
>    they suddenly get a *slightly different* set of patches, because now
>    we explicitly trust that whoever is in charge of
>    lore.kernel.org/patchwork.kernel.org isn't being malicious.

It doesn't help with the reverse scenario, where what's in the
developer's inbox or displayed on the lore webserver running a
backdoored nginx doesn't match the patches that they eventually apply.
That might seem like an unrealistic attack scenario, but then think
about it combined with the get-lore-mbox feature to grab the latest
patchset version and other similar shenanigans. Or, maybe Eve reposts
v1 as v20, and this mailing list thread convinces you to ignore the
[PATCH vX] from the metadata hash, or maintainers don't care about
that hash verifying anyway since it tends to get mangled.


I like that this all uses email and simple python command line tools
and whatnot. But the whole scheme just seems kind of brittle and
clunky.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-28  1:57     ` Jason A. Donenfeld
@ 2020-02-28  2:30       ` Jason A. Donenfeld
  2020-02-28 18:33         ` Konstantin Ryabitsev
  2020-02-28 17:54       ` Konstantin Ryabitsev
  2020-03-06 16:53       ` Geert Uytterhoeven
  2 siblings, 1 reply; 17+ messages in thread
From: Jason A. Donenfeld @ 2020-02-28  2:30 UTC (permalink / raw)
  To: workflows

Some constructive suggestions that might address some (but not all!)
maliability concerns and some clunkiness concerns:

A. What data are you signing?

Your current approach is to split up the email into parts, canonicalize
them, hash them, and then sign those hashes. Instead you could actually
more easily canonicalize the applied email. That is: when you have a
commit in a git tree, you can always git-format-patch it into the same
format. So, if you git-am an email, and then git-format-patch it out,
you'll get some standard format. You could insist all signatures are
made over this standard format. There's still the same set of attacks I
mentioned earlier here, but it's a bit less frail.

B. How are you communicating the signatures?

Your approach sticks these on a separate mailing list, linked by some
hash prefixes. Two approaches that would make the whole thing a lot less
clunky:

1. Include it as a separate part in a multi-part mime message. Lore web
interface could bury it someplace reasonable. vger could learn to accept
these parts, and since hashing is already mega fast, it could even
validate that the hashes are correct and reject emails with bad (or
missing) hashes. (I'm not suggesting validating the signature, but
rather just the hashes.)

2. Switch to using the tiny ed25519 signatures provided by
signify/minisign -- which has numerous benefits over gpg alone -- and
stick it in the mail headers. This is something git-send-email could
learn to add.

X-Git-Format-Patch-Hash: aC1ywMbaJpiMFJY7vK/62eBKtrgKiVIXFHa+WPQwBJk=
X-Git-Format-Patch-Ed25519-Signature: aSscBu2pXbIEDCuRZ7E0uKWVsE5SitNM8UA44tuFc/rg3GQwv5Ur/mpOk2mQJbT6dPDghuxJ1gwZKAZK20BXAQ==

I prefer (2), but (1) would be acceptable if you're some how wedded to
pgp and I can't talk you out of it. I can write whatever cryptographic
code we need to do (2), but I suspect signify/minisign have everything
we need.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-28  1:57     ` Jason A. Donenfeld
  2020-02-28  2:30       ` Jason A. Donenfeld
@ 2020-02-28 17:54       ` Konstantin Ryabitsev
  2020-03-06 16:53       ` Geert Uytterhoeven
  2 siblings, 0 replies; 17+ messages in thread
From: Konstantin Ryabitsev @ 2020-02-28 17:54 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: workflows

On Fri, Feb 28, 2020 at 09:57:47AM +0800, Jason A. Donenfeld wrote:
> On Thu, Feb 27, 2020 at 10:29 PM Konstantin Ryabitsev
> <konstantin@linuxfoundation.org> wrote:
> > This is why I've been keeping pgpkeys.git repository around.
> > https://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git
> 
> Ah, right, that seems good. There's also signify, by the way, if you
> ever feel the urge to jump ship from pgp. We can add cgit support for
> it too.

I like minisign/signify, but not for this. It would be a great addition 
to signing tarballs and other project releases, but it is poorly suited 
for inter-developer attestation, for the following reasons:

- There is no key management framework. If I lose access to my key, or 
  it's stolen, or whatever else, I have no mechanism for revoking the 
  old key beyond sending out a "please don't trust my old key." PGP is 
  not great for this either, because we have to rely on the rotting
  remains of the keyservers network, but at least there is *some* 
  mechanism for propagating key revocation.
- There is no support for cross-attestation, so owner trust must be 
  bootstrapped via some other mechanism -- either via TOFU, or via 
  delegating it to HTTPS (I trust that this key belongs to Alex Dev 
  because I downloaded it from https://alexdev.foo). Everyone likes to 
  dump on the Web of Trust because it is hard to grok and can't possibly 
  scale, but for niche applications like development communities it 
  actually works reasonably well and serves its purpose.
- There is no support for offloading crypto operations to specialized 
  portable hardware tokens (Yubikeys, Nitrokeys, whatnot). This means 
  that unless you always protect your minisign key with a passphrase, it 
  can leak off your disk in any number of ways (careless backups, 
  fat-fingered rsync jobs, vulnerable browsers, etc). If you *do* 
  protect your key with a strong passphrase, though...
- There is no support for any kind of PIN entry agent. If you need to 
  invoke minisign 50 times, then you will be typing in that passphrase 
  50 times. Or you will be copy-pasting it, which means it'll be 
  floating in your copypaste buffer for hours or days after you're done.

Sure, the answer to many of the above is "provide wrapper tooling that 
does what you need" -- but if we provide that kind of tooling for 
minisign, then we will end up with something that is similar to GnuPG 
and everyone will start hating on it just as much as they hate on PGP 
(me included). :)

PGP sucks not because its core concepts suck -- it sucks because 
implementing delegated trust in something resembling a secure way is 
super hard to do. That, and because GnuPG tries to remain compatible 
with all of the legacy crypto of the past 30 years. If it drops support 
for that legacy, it will start sucking a lot less.

The only way I see being able to use signify-style signatures is if 
there is broad adoption of did:git or any number of other nascent 
"sovereign identity" proposals. At this time, PGP/GnuPG is the best tool 
for decentralized delegated trust that we have.

> > This doesn't really concern me as much. Anyone can submit 
> > attestation
> > signatures, true, but we throw out anything that doesn't validate and
> > also any PGP keys where none of the UIDs match the From: header. So, if
> > a malicious person sends a bunch of attestations, then they will just be
> > noise and fail at the "gpg --verify" stage because you won't have that
> > key on your keyring.
> 
> That From header is forgeable too, by the way. (Some patches even have
> two From headers -- one in the body of the email for git and one that
> actually sent the email.)

Right, git-mailinfo will pick the one that's in the body. To reiterate, 
git-mailinfo returns whatever Git decides to use for its commit 
purposes, which is why I defer to it for most decisions.

> > Why does it matter?
> 
> You send the patch to lkml. Then your Internet cuts out while the
> firemen deal with an unidentified creature stuck in a nearby tree
> meowing loudly. Finally it's back on and you submit the attestation.
> However, while the cat was ruining your workflow, tglx went to apply
> your patch but wasn't able to find the signature email, since you
> hadn't posted it yet. Hence, a better flow is to just post the
> signature email first.

I'd expect him to treat it in the same way as "you forgot to 
Signed-off-by your patches, please resubmit." However, signing before 
you send is fine, too, I'm not picky. :)

> > I agree that Attestation-verified is redundant with Signed-off-by. I
> > think Attestation-by is helpful for the purposes of seeing the
> > attestation chain. If Developer A sent patches to Submaintainer B, and
> > Submaintainer B sent it to Maintainer C, these headers will help us
> > figure out which steps were attested. If you only see:
> >
> > Attestation-by: Submaintainer B
> >
> > You'll know that there probably was no attestation submitted/checked for
> > when the patches travelled between Developer A and Submaintainer B.
> >
> > However, I don't feel strongly about this. If the community decides that
> > these trailers are not useful, we can drop them. Most of this can be
> > otherwise traced via Link: trailers.
> 
> I think this too is redundant and mostly theater. Those trailers
> aren't authenticated, so they communicate basically nothing useful
> except, "I'm one of the cool maintainers who use this thing!" For
> everyone else, it's just clutter.

Okay, like I said, I don't feel strongly about it.

> I like that this all uses email and simple python command line tools
> and whatnot. But the whole scheme just seems kind of brittle and
> clunky.

It's not any more brittle and clunky than using email for patches. :)

-K

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-28  2:30       ` Jason A. Donenfeld
@ 2020-02-28 18:33         ` Konstantin Ryabitsev
  0 siblings, 0 replies; 17+ messages in thread
From: Konstantin Ryabitsev @ 2020-02-28 18:33 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: workflows

On Fri, Feb 28, 2020 at 10:30:23AM +0800, Jason A. Donenfeld wrote:
> Some constructive suggestions that might address some (but not all!)
> maliability concerns and some clunkiness concerns:
> 
> A. What data are you signing?
> 
> Your current approach is to split up the email into parts, canonicalize
> them, hash them, and then sign those hashes. Instead you could actually
> more easily canonicalize the applied email. That is: when you have a
> commit in a git tree, you can always git-format-patch it into the same
> format. So, if you git-am an email, and then git-format-patch it out,
> you'll get some standard format. You could insist all signatures are
> made over this standard format. There's still the same set of attacks I
> mentioned earlier here, but it's a bit less frail.

This won't work because we can't count on the "Date" field to remain the 
same between when "git-format-patch" is run on the developer's 
workstation vs. when it's run on the maintainer's workstation. The 
outputs will not be the same.

Furthermore, we need to be able to git-am the patches in order to attest 
them, which poses the following difficulties:

- we need to have the git tree locally
- we need to be able to fork it
- we need to be able to cleanly git-am the patches, which means that
- we need to know at least the base-commit info

This will also break down if series have inter-dependencies that require 
that Series A is applied before Series B can be cleanly applied.

> B. How are you communicating the signatures?
> 
> Your approach sticks these on a separate mailing list, linked by some
> hash prefixes. Two approaches that would make the whole thing a lot less
> clunky:

I don't think it's clunky, because we're gearing up to spend a lot of 
effort enhancing public-inbox cross-list search capabilities. It's 
clunky *right now* because the proof-of-concept script uses a 
search-based atom feed to look up attestation messages, but that 
interface should be dramatically better in the near future. Looking up 
attestation will be a REST call to lore.kernel.org. Once that happens, 
we won't need to care on which list the attestation document is present.  
It can just as well exist on the developer's personal outbox feed -- 
lore.kernel.org or any of its mirrors will be able to find it based on 
the attestation ID.

> 1. Include it as a separate part in a multi-part mime message. Lore web
> interface could bury it someplace reasonable. vger could learn to accept
> these parts, and since hashing is already mega fast, it could even
> validate that the hashes are correct and reject emails with bad (or
> missing) hashes. (I'm not suggesting validating the signature, but
> rather just the hashes.)

Sure, mime-attaching it to the cover letter is one of the options I 
mentioned, but if we call it something other than text/plain, mailman 
will probably strip it off (or corporate MTAs will helpfully quarantine 
it); and if we stick with text/plain, everyone will see it in their mail 
client anyway, so there is little reason to mime-attach it in the first 
place. I'm actually considering making it possible to send it as a 
separate message in the series:

[PATCH 00/10] Implement adding foo to bar
[PATCH 01/10] Move foo into libfoo
...
[PATCH 10/10] Remove old bar interface
[PSIGN XX/10] Attestation: Implement adding foo to bar

I'm just worried that this will create too much clutter, especially on 
lists that mostly see one-off patches. I can imagine it will be annoying 
if most of the stuff you receive is:

[PATCH] Add foo to bar
[PSIGN] Add foo to bar

So, I'm sticking it into signatures@kernel.org for now and using the 
search interface to find it.

> 2. Switch to using the tiny ed25519 signatures provided by
> signify/minisign -- which has numerous benefits over gpg alone -- and
> stick it in the mail headers. This is something git-send-email could
> learn to add.
> 
> X-Git-Format-Patch-Hash: aC1ywMbaJpiMFJY7vK/62eBKtrgKiVIXFHa+WPQwBJk=
> X-Git-Format-Patch-Ed25519-Signature: aSscBu2pXbIEDCuRZ7E0uKWVsE5SitNM8UA44tuFc/rg3GQwv5Ur/mpOk2mQJbT6dPDghuxJ1gwZKAZK20BXAQ==
> 
> I prefer (2), but (1) would be acceptable if you're some how wedded to
> pgp and I can't talk you out of it. I can write whatever cryptographic
> code we need to do (2), but I suspect signify/minisign have everything
> we need.

I'd much rather that git gained the ability to incorporate full commit 
metadata into git-format-patch output. Then we won't have to invent 
anything at all -- we can just "git am" the series, then "git 
verify-commit HEAD" and let git handle all the crypto for us, whatever 
it happens to be (pgp, s/mime, minisign, or any other crypto ability it 
gains in the future). There are still problems with this approach (see 
my comments about git-am above), but at least we can work with this and 
depend on it not suddenly breaking.

However, quite a few of existing developers and maintainers don't work 
directly with git -- they prefer to queue things up with quilt or 
similar tools, so we cannot approach this purely from the git workflow 
perspective. For this reason, I'm pursuing both strategies:

- make git-format-patch/git-am a lossless operation
- provide an out-of-band attestation mechanism


Best,
-K

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Patch attestation RFC + proof of concept
  2020-02-28  1:57     ` Jason A. Donenfeld
  2020-02-28  2:30       ` Jason A. Donenfeld
  2020-02-28 17:54       ` Konstantin Ryabitsev
@ 2020-03-06 16:53       ` Geert Uytterhoeven
  2 siblings, 0 replies; 17+ messages in thread
From: Geert Uytterhoeven @ 2020-03-06 16:53 UTC (permalink / raw)
  To: Jason A. Donenfeld, Konstantin Ryabitsev; +Cc: workflows

On Fri, Feb 28, 2020 at 2:58 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> It doesn't help with the reverse scenario, where what's in the
> developer's inbox or displayed on the lore webserver running a
> backdoored nginx doesn't match the patches that they eventually apply.
> That might seem like an unrealistic attack scenario, but then think
> about it combined with the get-lore-mbox feature to grab the latest
> patchset version and other similar shenanigans. Or, maybe Eve reposts
> v1 as v20, and this mailing list thread convinces you to ignore the
> [PATCH vX] from the metadata hash, or maintainers don't care about
> that hash verifying anyway since it tends to get mangled.

Or different patches with the same subject.

Recently, I accidentally send out a patch with the wrong subject, which
matched an older patch.

"get-lore-mbox.py 20200218112557.5924-1-geert+renesas@glider.be"
downloads the new thread, plus the email with the old patch with the same
one-line summary.

Worse, "get-lore-mbox.py -a 20200218112557.5924-1-geert+renesas@glider.be"
selects the old patch instead of the new one, despite the exact Message-ID
match on the command line.  Fortunately "git am" complained, as the old
patch had already been applied.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-03-06 16:53 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-26 17:25 Patch attestation RFC + proof of concept Konstantin Ryabitsev
2020-02-26 17:50 ` Kees Cook
2020-02-26 18:47   ` Konstantin Ryabitsev
2020-02-26 20:11 ` Jason Gunthorpe
2020-02-26 20:42   ` Konstantin Ryabitsev
2020-02-26 21:04     ` Jason Gunthorpe
2020-02-26 21:18       ` Konstantin Ryabitsev
2020-02-27  1:23         ` Jason Gunthorpe
2020-02-27  4:11 ` Jason A. Donenfeld
2020-02-27 10:05   ` Geert Uytterhoeven
2020-02-27 13:30     ` Jason A. Donenfeld
2020-02-27 14:29   ` Konstantin Ryabitsev
2020-02-28  1:57     ` Jason A. Donenfeld
2020-02-28  2:30       ` Jason A. Donenfeld
2020-02-28 18:33         ` Konstantin Ryabitsev
2020-02-28 17:54       ` Konstantin Ryabitsev
2020-03-06 16:53       ` Geert Uytterhoeven

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.