linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Crypto Mailing List <linux-crypto@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	David Miller <davem@davemloft.net>,
	Greg KH <gregkh@linuxfoundation.org>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>,
	Samuel Neves <sneves@dei.uc.pt>,
	Dan Carpenter <dan.carpenter@oracle.com>,
	Arnd Bergmann <arnd@arndb.de>, Eric Biggers <ebiggers@google.com>,
	Andy Lutomirski <luto@kernel.org>, Will Deacon <will@kernel.org>,
	Marc Zyngier <maz@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>
Subject: Re: [RFC PATCH 18/18] net: wireguard - switch to crypto API for packet encryption
Date: Thu, 26 Sep 2019 13:06:26 +0200	[thread overview]
Message-ID: <CAKv+Gu_YOqvqJ4YC=ixBh-v4fiFTFNpEagHiTRU7Oq4PrhJPkw@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wjYsbxSiV_XKWV3BwGvau_hUvQiQHLOoc7vLUZt0Wqzfw@mail.gmail.com>

On Thu, 26 Sep 2019 at 00:15, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Sep 25, 2019 at 9:14 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
> >
> > Replace the chacha20poly1305() library calls with invocations of the
> > RFC7539 AEAD, as implemented by the generic chacha20poly1305 template.
>
> Honestly, the other patches look fine to me from what I've seen (with
> the small note I had in a separate email for 11/18), but this one I
> consider just nasty, and a prime example of why people hate those
> crypto lookup routines.
>
> Some of it is just the fundamental and pointless silly indirection,
> that just makes things harder to read, less efficient, and less
> straightforward.
>
> That's exemplified by this part of the patch:
>
> >  struct noise_symmetric_key {
> > -       u8 key[NOISE_SYMMETRIC_KEY_LEN];
> > +       struct crypto_aead *tfm;
>
> which is just one of those "we know what we want and we just want to
> use it directly" things, and then the crypto indirection comes along
> and makes that simple inline allocation of a small constant size
> (afaik it is CHACHA20POLY1305_KEY_SIZE, which is 32) be another
> allocation entirely.
>
> And it's some random odd non-typed thing too, so then you have that
> silly and stupid dynamic allocation using a name lookup:
>
>    crypto_alloc_aead("rfc7539(chacha20,poly1305)", 0, CRYPTO_ALG_ASYNC);
>
> to create what used to be (and should be) a simple allocation that was
> has a static type and was just part of the code.
>

That crypto_alloc_aead() call does a lot of things under the hood:
- use an existing instantiation of rfc7539(chacha20,poly1305) if available,
- look for modules that implement the whole transformation directly,
- if none are found, instantiate the rfc7539 template, which will
essentially do the above for chacha20 and poly1305, potentially using
per-arch accelerated implementations if available (for either), or
otherwise, fall back to the generic versions.

What *I* see as the issue here is not that we need to do this at all,
but that we have to do it for each value of the key. IMO, it would be
much better to instantiate this thing only once, and have a way of
passing a per-request key into it, permitting us to hide the whole
thing behind the existing library interface.


> It also ends up doing other bad things, ie that packet-time
>
> +       if (unlikely(crypto_aead_reqsize(key->tfm) > 0)) {
> +               req = aead_request_alloc(key->tfm, GFP_ATOMIC);
> +               if (!req)
> +                       return false;
>
> thing that hopefully _is_ unlikely, but that's just more potential
> breakage from that whole async crypto interface.
>
> This is what people do *not* want to do, and why people often don't
> like the crypto interfaces.
>
> It's exactly why we then have those "bare" routines as a library where
> people just want to access the low-level hashing or whatever directly.
>
> So please don't do things like this for an initial import.
>

This is tied to the zero reqsize patch earlier in the series. If your
crypto_alloc_aead() call gets fulfilled by the template we modified
earlier (and which is the only synchronous implementation we currently
have), this is guaranteed to be unlikely/false. For async
implementations, however, we need this allocation to be on the heap
since the stack will go away before the call completes.

> Leave the thing alone, and just use the direct and synchronous static
> crypto library interface tjhat you imported in patch 14/18 (but see
> below about the incomplete import).
>

Patch #14 only imports the C library version, not any of the
accelerated versions that Jason proposed. So we will need a way to
hook all the existing accelerated drivers into that library interface,
add handling for SIMD etc.

> Now, later on, if you can *show* that some async implementation really
> really helps, and you have numbers for it, and you can convince people
> that the above kind of indirection is _worth_ it, then that's a second
> phase. But don't make code uglier without those actual numbers.
>
> Because it's not just uglier and has that silly extra indirection and
> potential allocation problems, this part just looks very fragile
> indeed:
>
> > The nonce related changes are there to address the mismatch between the
> > 96-bit nonce (aka IV) that the rfc7539() template expects, and the 64-bit
> > nonce that WireGuard uses.
> ...
> >  struct packet_cb {
> > -       u64 nonce;
> > -       struct noise_keypair *keypair;
> >         atomic_t state;
> > +       __le32 ivpad;                   /* pad 64-bit nonce to 96 bits */
> > +       __le64 nonce;
> > +       struct noise_keypair *keypair;
> >         u32 mtu;
> >         u8 ds;
> >  };
>
> The above is subtle and silently depends on the struct layout.
>
> It really really shouldn't.
>
> Can it be acceptable doing something like that? Yeah, but you really
> should be making it very explicit, perhaps using
>
>   struct {
>         __le32 ivpad;
>         __le64 nonce;
>    } __packed;
>
> or something like that.
>

This is what I started out with, put the packed struct causes GCC on
architectures that care about alignment to switch to the unaligned
accessors, which I tried to avoid. I'll add a build_bug to ensure that
offset(ivpad) + 4 == offset(nonce).

> Because right now you're depending on particular layout of those fields:
>
> > +       aead_request_set_crypt(req, sg, sg, skb->len,
> > +                              (u8 *)&PACKET_CB(skb)->ivpad);
>
> but honestly, that's not ok at all.
>
> Somebody makes a slight change to that struct, and it might continue
> to work fine on x86-32 (where 64-bit values are only 32-bit aligned)
> but subtly break on other architectures.
>
> Also, you changed how the nonce works from being in CPU byte order to
> be explicitly LE. That may be ok, and looks like it might be a
> cleanup, but honestly I think it should have been done as a separate
> patch.
>

Fair enough.

> So could you please update that patch 14/18 to also have that
> synchronous chacha20poly1305_decrypt_sg() interface, and then just
> drop this 18/18 for now?
>

Hmm, not really, because then performance is going to suck. The way we
organise the code in the crypto API today is to have generic C
libraries in lib/crypto, and use the full API for per-arch accelerated
code (or async accelerators). Patch #14 uses the former.

We'll need a way to refactor the existing accelerated code so it is
exposed via the library interface, and there are a couple of options:
- modify our AEAD code so it can take per-request keys - that way, we
could instantiate a single TFM in the implementation of the library,
and keep the chacha20poly1305_[en|de]crypt_sg() interfaces intact,
- create arch/*/lib versions of all the accelerated ChaCha20 and
Poly1305 routines we have, so that the ordinary library precedence
rules give you the fastest implementation available - we may need some
tweaks to the module loader for weak symbols etc to make this
seamless, but it should be doable
- add an entirely new accelerated crypto library stack on the side (aka Zinc)


> That would mean that
>
>  (a) you wouldn't need this patch, and you can then do that as a
> separate second phase once you have numbers and it can stand on its
> own.
>
>  (b) you'd actually have something that *builds* when  you import the
> main wireguard patch in 15/18
>
> because right now it looks like you're not only forcing this async
> interface with the unnecessary indirection, you're also basically
> having a tree that doesn't even build or work for a couple of commits.
>

True, but this was intentional, and not intended for merge as-is.

> And I'm still not convinced (a) ever makes sense - the overhead of any
> accelerator is just high enought that I doubt you'll have numbers -
> performance _or_ power.
>
> But even if you're right that it might be a power advantage on some
> platform, that wouldn't make it an advantage on other platforms. Maybe
> it could be done as a config option where you can opt in to the async
> interface when that makes sense - but not force the indirection and
> extra allocations when it doesn't.

I know the code isn't pretty, but it looks worse than it is. I'll look
into using the new static calls framework to instantiate the library
interface based on whatever the platform provides as synchronous
implementations of ChaCha20 and Poly1305.

> As a separate patch, something like
> that doesn't sound horrendous (and I think that's also an argument for
> doing that CPU->LE change as an independent change).
>
> Yes, yes, there's also that 17/18 that switches over to a different
> header file location and Kconfig names but that could easily be folded
> into 15/18 and then it would all be bisectable.
>
> Alternatively, maybe 15/18 could be done with wireguard disabled in
> the Kconfig (just to make the patch identical), and then 17/18 enables
> it when it compiles with a big note about how you wanted to keep 15/18
> pristine to make the changes obvious.
>
> Hmm?
>
> I don't really have a dog in this fight, but on the whole I really
> liked the series. But this 18/18 raised my heckles, and I think I
> understand why it might raise the heckles of the wireguard people.
>

We're likely to spend some time discussing all this before I get
around to respinning this (if ever). But I am also a fan of WireGuard,
and I am eager to finish this discussion once and for all.

  parent reply	other threads:[~2019-09-26 11:06 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-25 16:12 [RFC PATCH 00/18] crypto: wireguard using the existing crypto API Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 01/18] crypto: shash - add plumbing for operating on scatterlists Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 02/18] crypto: x86/poly1305 - implement .update_from_sg method Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 03/18] crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 04/18] crypto: arm64/poly1305 " Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 05/18] crypto: chacha - move existing library code into lib/crypto Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 06/18] crypto: rfc7539 - switch to shash for Poly1305 Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 07/18] crypto: rfc7539 - use zero reqsize for sync instantiations without alignmask Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 08/18] crypto: testmgr - add a chacha20poly1305 test case Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 09/18] crypto: poly1305 - move core algorithm into lib/crypto Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 10/18] crypto: poly1305 - add init/update/final library routines Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 11/18] int128: move __uint128_t compiler test to Kconfig Ard Biesheuvel
2019-09-25 21:01   ` Linus Torvalds
2019-09-25 21:19     ` Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 12/18] crypto: BLAKE2s - generic C library implementation and selftest Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 13/18] crypto: Curve25519 - generic C library implementations " Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 14/18] crypto: chacha20poly1305 - import construction and selftest from Zinc Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 15/18] net: WireGuard secure network tunnel Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 16/18] netlink: use new strict length types in policy for 5.2 Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 17/18] wg switch to lib/crypto algos Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 18/18] net: wireguard - switch to crypto API for packet encryption Ard Biesheuvel
2019-09-25 22:15   ` Linus Torvalds
2019-09-25 22:22     ` Linus Torvalds
2019-09-26  9:40     ` Pascal Van Leeuwen
2019-09-26 16:35       ` Linus Torvalds
2019-09-27  0:15         ` Pascal Van Leeuwen
2019-09-27  1:30           ` Linus Torvalds
2019-09-27  2:54             ` Linus Torvalds
2019-09-27  3:53               ` Herbert Xu
2019-09-27  4:37                 ` Andy Lutomirski
2019-09-27  4:59                   ` Herbert Xu
2019-09-27  4:01               ` Herbert Xu
2019-09-27  4:13                 ` Linus Torvalds
2019-09-27 10:44               ` Pascal Van Leeuwen
2019-09-27 11:08                 ` Pascal Van Leeuwen
2019-09-27  4:36             ` Andy Lutomirski
2019-09-27  9:58             ` Pascal Van Leeuwen
2019-09-27 10:11               ` Herbert Xu
2019-09-27 16:23               ` Linus Torvalds
2019-09-30 11:14                 ` France didn't want GSM encryption Marc Gonzalez
2019-09-30 21:37                   ` Linus Torvalds
2019-09-30 20:44                 ` [RFC PATCH 18/18] net: wireguard - switch to crypto API for packet encryption Pascal Van Leeuwen
2019-09-27  2:06           ` Linus Torvalds
2019-09-27 10:11             ` Pascal Van Leeuwen
2019-09-26 11:06     ` Ard Biesheuvel [this message]
2019-09-26 12:34       ` Ard Biesheuvel
2019-09-26  8:59 ` [RFC PATCH 00/18] crypto: wireguard using the existing crypto API Jason A. Donenfeld
2019-09-26 10:19   ` Pascal Van Leeuwen
2019-09-26 10:59     ` Jason A. Donenfeld
2019-09-26 11:06     ` chapoly acceleration hardware [Was: Re: [RFC PATCH 00/18] crypto: wireguard using the existing crypto API] Jason A. Donenfeld
2019-09-26 11:38       ` Toke Høiland-Jørgensen
2019-09-26 13:52       ` Pascal Van Leeuwen
2019-09-26 23:13         ` Dave Taht
2019-09-27 12:18           ` Pascal Van Leeuwen
2019-09-26 22:47       ` Jakub Kicinski
2019-09-26 12:07   ` [RFC PATCH 00/18] crypto: wireguard using the existing crypto API Ard Biesheuvel
2019-09-26 13:06     ` Pascal Van Leeuwen
2019-09-26 13:15       ` Ard Biesheuvel
2019-09-26 14:03         ` Pascal Van Leeuwen
2019-09-26 14:52           ` Ard Biesheuvel
2019-09-26 15:04             ` Pascal Van Leeuwen
2019-09-26 20:47     ` Jason A. Donenfeld
2019-09-26 21:22       ` Andrew Lunn
2019-09-26 21:36       ` Andy Lutomirski
2019-09-27  7:20         ` Jason A. Donenfeld
2019-10-01  8:56           ` Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKv+Gu_YOqvqJ4YC=ixBh-v4fiFTFNpEagHiTRU7Oq4PrhJPkw@mail.gmail.com' \
    --to=ard.biesheuvel@linaro.org \
    --cc=Jason@zx2c4.com \
    --cc=arnd@arndb.de \
    --cc=catalin.marinas@arm.com \
    --cc=dan.carpenter@oracle.com \
    --cc=davem@davemloft.net \
    --cc=ebiggers@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=maz@kernel.org \
    --cc=sneves@dei.uc.pt \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).