linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Pascal Van Leeuwen <pvanleeuwen@verimatrix.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Linux Crypto Mailing List <linux-crypto@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	David Miller <davem@davemloft.net>,
	Greg KH <gregkh@linuxfoundation.org>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>,
	Samuel Neves <sneves@dei.uc.pt>,
	Dan Carpenter <dan.carpenter@oracle.com>,
	Arnd Bergmann <arnd@arndb.de>, Eric Biggers <ebiggers@google.com>,
	Andy Lutomirski <luto@kernel.org>, Will Deacon <will@kernel.org>,
	Marc Zyngier <maz@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>
Subject: Re: [RFC PATCH 18/18] net: wireguard - switch to crypto API for packet encryption
Date: Thu, 26 Sep 2019 18:30:37 -0700	[thread overview]
Message-ID: <CAHk-=wjr1w7x9Rjre_ALnDLASYNjsEHxu6VJpk4eUwZXN0ntqw@mail.gmail.com> (raw)
In-Reply-To: <MN2PR20MB297317D9870A3B93B5E506C9CA810@MN2PR20MB2973.namprd20.prod.outlook.com>

On Thu, Sep 26, 2019 at 5:15 PM Pascal Van Leeuwen
<pvanleeuwen@verimatrix.com> wrote:
>
> That remark is just very stupid. The hardware ALREADY exists, and
> more hardware is in the pipeline. Once this stuff is designed in, it
> usually stays in for many years to come. And these are chips sold in
> _serious_ quantities, to be used in things like wireless routers and
> DSL, cable and FTTH modems, 5G base stations, etc. etc.

Yes, I very much mentioned routers. I believe those can happen much
more quickly.

But I would very much hope that that is not the only situation where
you'd see wireguard used.

I'd want to see wireguard in an end-to-end situation from the very
client hardware. So laptops, phones, desktops. Not the untrusted (to
me) hw in between.

> No, these are just the routers going into *everyone's* home. And 5G
> basestations arriving at every other street corner. I wouldn't call
> that rare, exactly.

That's fine for a corporate tunnel between devices. Which is certainly
one use case for wireguard.

But if you want VPN for your own needs for security, you want it at
the _client_. Not at the router box. So that case really does matter.

And I really don't see the hardware happening in that space. So the
bad crypto interfaces only make the client _worse_.

See?

But on to the arguments that we actually agree on:

> Hey, no argument there. I don't see any good reason why the key can't
> be on the stack. I doubt any hardware would be able to DMA that as-is
> directly, and in any case, key changes should be infrequent, so copying
> it to some DMA buffer should not be a performance problem.
> So maybe that's an area for improvement: allow that to be on the stack.

It's not even just the stack. It's really that the crypto interfaces
are *designed* so that you have to allocate things separately, and
can't embed these things in your own data structures.

And they are that way, because the crypto interfaces aren't actually
about (just) hiding the hardware interface: they are about hiding
_all_ the encryption details.

There's no way to say "hey, I know the crypto I use, I know the key
size I have, I know the state size it needs, I can preallocate those
AS PART of my own data structures".

Because the interface is designed to be so "generic" that you simply
can't do those things, they are all external allocations, which is
inevitably slower when you don't have hardware.

And you've shown that you don't care about that "don't have hardware"
situation, and seem to think it's the only case that matters. That's
your job, after all.

But however much you try to claim otherwise, there's all these
situations where the hardware just isn't there, and the crypto
interface just forces nasty overhead for absolutely no good reason.

> I already explained the reasons for _not_ doing direct calls above.

And I've tried to explain how direct calls that do the synchronous
thing efficiently would be possible, but then _if_ there is hardware,
they can then fall back to an async interface.

> > So there is absolutely NO DOWNSIDE for hw accelerated crypto to just
> > do it right, and use an interface like this:
> >
> >        if (!chacha20poly1305_decrypt_sg(sg, sg, skb->len, NULL, 0,
> >                                         PACKET_CB(skb)->nonce, key->key,
> >                                         simd_context))
> >                return false;
> >
> Well, for one thing, a HW API should not expect the result to be
> available when the function call returns. (if that's what you
> mean here). That would just be WRONG.

Right. But that also shouldn't mean that when you have synchronous
hardware (ie CPU) you have to set everything up even though it will
never be used.

Put another way: even with hardware acceleration, the queuing
interface should be a simple "do this" interface.

The current crypto interface is basically something that requires all
the setup up-front, whether it's needed or not. And it forces those
very inconvenient and slow external allocations.

And I'm saying that causes problems, because it fundamentally means
that you can't do a good job for the common CPU  case, because you're
paying all those costs even when you need absolutely none of them.
Both at setup time, but also at run-time due to the extra indirection
and cache misses etc.

> Again, HW acceleration does not depend on the indirection _at all_,
> that's there for entirely different purposes I explained above.
> HW acceleration _does_ depend greatly on a truly async ifc though.

Can you realize that the world isn't just all hw acceleration?

Can you admit that the current crypto interface is just horrid for the
non-accelerated case?

Can you perhaps then also think that "maybe there are better models".

> So queue requests on one side, handle results from the other side
> in some callback func off of an interrupt handler.

Actually, what you can do - and what people *have* done - is to admit
that the synchronous case is real and important, and then design
interfaces that work for that one too.

You don't need to allocate resources ahead of time, and you don't have
to disallow just having the state buffer allocated by the caller.

So here's the *wrong* way to do it (and the way that crypto does it):

 - dynamically allocate buffers at "init time"

 - fill in "callback fields" etc before starting the crypto, whether
they are needed or not

 - call a "decrypt" function that then uses the indirect functions you
set up at init time, and possibly waits for it (or calls the callbacks
you set up)

note how it's all this "state machine" model where you add data to the
state machine, and at some point you say "execute" and then either you
wait for things or you get callbacks.

That makes sense for a hw crypto engine. It's how a lot of them work, after all.

But it makes _zero_ sense for the synchronous case. You did a lot of
extra work for that case, and because it was all a styate machine, you
did it particularly inefficiently: not only do you have those separate
allocations with pointer following, the "decrypt()" call ends up doing
an indirect call to the CPU implementation, which is just quite slow
to begin with, particularly in this day and age with retpoline etc.

So what's the alternative?

I claim that a good interface would accept that "Oh, a lot of cases
will be synchronous, and a lot of cases use one fixed
encryption/decryption model".

And it's quite doable. Instead of having those callback fields and
indirection etc, you could have something more akin to this:

 - let the caller know what the state size is and allocate the
synchronous state in its own data structures

 - let the caller just call a static "decrypt_xyz()" function for xyz
decryptrion.

 - if you end up doing it synchronously, that function just returns
"done". No overhead. No extra allocations. No unnecessary stuff. Just
do it, using the buffers provided. End of story. Efficient and simple.

 - BUT.

 - any hardware could have registered itself for "I can do xyz", and
the decrypt_xyz() function would know about those, and *if* it has a
list of accelerators (hopefully sorted by preference etc), it would
try to use them. And if they take the job (they might not - maybe
their queues are full, maybe they don't have room for new keys at the
moment, which might be a separate setup from the queues), the
"decrypt_xyz()" function returns a _cookie_ for that job. It's
probably a pre-allocated one (the hw accelerator might preallocate a
fixed number of in-progress data structures).

And once you have that cookie, and you see "ok, I didn't get the
answer immediately" only THEN do you start filling in things like
callback stuff, or maybe you set up a wait-queue and start waiting for
it, or whatever".

See the difference in models? One forces that asynchronous model, and
actively penalizes the synchronous one.

The other _allows_ an asynchronous model, but is fine with a synchronous one.

> >        aead_request_set_callback(req, 0, NULL, NULL);
> >
> This is just inevitable for HW acceration ...

See above. It really isn't. You could do it *after* the fact, when
you've gotten that ticket from the hardware. Then you say "ok, if the
ticket is done, use these callbacks". Or "I'll now wait for this
ticket to be done" (which is what the above does by setting the
callbacks to zero).

Wouldn't that be lovely for a user?

I suspect it would be a nice model for a hw accelerator too. If you
have full queues or have problems allocating new memory or whatever,
you just let the code fall back to the synchronous interface.

> Trust me, I have whole list of things I don't like about the
> API myself, it's not exacty ideal for HW acceleration  either.

That';s the thing. It's actively detrimental for "I have no HW acceleration".

And apparently it's not optimal for you guys either.

> But the point is - there are those case where you _don't_ know and
> _that_ is what the Crypto API is for. And just generally, crypto
> really _should_ be switchable.

It's very much not what wireguard does.

And honestly, most of the switchable ones have caused way more
security problems than they have "fixed" by being switchable.

                 Linus

  reply	other threads:[~2019-09-27  1:31 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-25 16:12 [RFC PATCH 00/18] crypto: wireguard using the existing crypto API Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 01/18] crypto: shash - add plumbing for operating on scatterlists Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 02/18] crypto: x86/poly1305 - implement .update_from_sg method Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 03/18] crypto: arm/poly1305 - incorporate OpenSSL/CRYPTOGAMS NEON implementation Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 04/18] crypto: arm64/poly1305 " Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 05/18] crypto: chacha - move existing library code into lib/crypto Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 06/18] crypto: rfc7539 - switch to shash for Poly1305 Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 07/18] crypto: rfc7539 - use zero reqsize for sync instantiations without alignmask Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 08/18] crypto: testmgr - add a chacha20poly1305 test case Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 09/18] crypto: poly1305 - move core algorithm into lib/crypto Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 10/18] crypto: poly1305 - add init/update/final library routines Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 11/18] int128: move __uint128_t compiler test to Kconfig Ard Biesheuvel
2019-09-25 21:01   ` Linus Torvalds
2019-09-25 21:19     ` Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 12/18] crypto: BLAKE2s - generic C library implementation and selftest Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 13/18] crypto: Curve25519 - generic C library implementations " Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 14/18] crypto: chacha20poly1305 - import construction and selftest from Zinc Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 15/18] net: WireGuard secure network tunnel Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 16/18] netlink: use new strict length types in policy for 5.2 Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 17/18] wg switch to lib/crypto algos Ard Biesheuvel
2019-09-25 16:12 ` [RFC PATCH 18/18] net: wireguard - switch to crypto API for packet encryption Ard Biesheuvel
2019-09-25 22:15   ` Linus Torvalds
2019-09-25 22:22     ` Linus Torvalds
2019-09-26  9:40     ` Pascal Van Leeuwen
2019-09-26 16:35       ` Linus Torvalds
2019-09-27  0:15         ` Pascal Van Leeuwen
2019-09-27  1:30           ` Linus Torvalds [this message]
2019-09-27  2:54             ` Linus Torvalds
2019-09-27  3:53               ` Herbert Xu
2019-09-27  4:37                 ` Andy Lutomirski
2019-09-27  4:59                   ` Herbert Xu
2019-09-27  4:01               ` Herbert Xu
2019-09-27  4:13                 ` Linus Torvalds
2019-09-27 10:44               ` Pascal Van Leeuwen
2019-09-27 11:08                 ` Pascal Van Leeuwen
2019-09-27  4:36             ` Andy Lutomirski
2019-09-27  9:58             ` Pascal Van Leeuwen
2019-09-27 10:11               ` Herbert Xu
2019-09-27 16:23               ` Linus Torvalds
2019-09-30 11:14                 ` France didn't want GSM encryption Marc Gonzalez
2019-09-30 21:37                   ` Linus Torvalds
2019-09-30 20:44                 ` [RFC PATCH 18/18] net: wireguard - switch to crypto API for packet encryption Pascal Van Leeuwen
2019-09-27  2:06           ` Linus Torvalds
2019-09-27 10:11             ` Pascal Van Leeuwen
2019-09-26 11:06     ` Ard Biesheuvel
2019-09-26 12:34       ` Ard Biesheuvel
2019-09-26  8:59 ` [RFC PATCH 00/18] crypto: wireguard using the existing crypto API Jason A. Donenfeld
2019-09-26 10:19   ` Pascal Van Leeuwen
2019-09-26 10:59     ` Jason A. Donenfeld
2019-09-26 11:06     ` chapoly acceleration hardware [Was: Re: [RFC PATCH 00/18] crypto: wireguard using the existing crypto API] Jason A. Donenfeld
2019-09-26 11:38       ` Toke Høiland-Jørgensen
2019-09-26 13:52       ` Pascal Van Leeuwen
2019-09-26 23:13         ` Dave Taht
2019-09-27 12:18           ` Pascal Van Leeuwen
2019-09-26 22:47       ` Jakub Kicinski
2019-09-26 12:07   ` [RFC PATCH 00/18] crypto: wireguard using the existing crypto API Ard Biesheuvel
2019-09-26 13:06     ` Pascal Van Leeuwen
2019-09-26 13:15       ` Ard Biesheuvel
2019-09-26 14:03         ` Pascal Van Leeuwen
2019-09-26 14:52           ` Ard Biesheuvel
2019-09-26 15:04             ` Pascal Van Leeuwen
2019-09-26 20:47     ` Jason A. Donenfeld
2019-09-26 21:22       ` Andrew Lunn
2019-09-26 21:36       ` Andy Lutomirski
2019-09-27  7:20         ` Jason A. Donenfeld
2019-10-01  8:56           ` Ard Biesheuvel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wjr1w7x9Rjre_ALnDLASYNjsEHxu6VJpk4eUwZXN0ntqw@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=Jason@zx2c4.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=arnd@arndb.de \
    --cc=catalin.marinas@arm.com \
    --cc=dan.carpenter@oracle.com \
    --cc=davem@davemloft.net \
    --cc=ebiggers@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=maz@kernel.org \
    --cc=pvanleeuwen@verimatrix.com \
    --cc=sneves@dei.uc.pt \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).