From: Herbert Xu <herbert@gondor.apana.org.au>
To: Eneas U de Queiroz <cotequeiroz@gmail.com>
Cc: linux-crypto@vger.kernel.org,
"David S. Miller" <davem@davemloft.net>,
Ard Biesheuvel <ardb@kernel.org>
Subject: Re: [PATCH v5 0/3] crypto: qce driver fixes for gcm
Date: Thu, 13 Feb 2020 17:26:15 +0800 [thread overview]
Message-ID: <20200213092615.l7cd3qe6etp55mjc@gondor.apana.org.au> (raw)
In-Reply-To: <20200207150227.31014-1-cotequeiroz@gmail.com>
On Fri, Feb 07, 2020 at 12:02:24PM -0300, Eneas U de Queiroz wrote:
> I've made enough mistakes in this series, I'll just start over. It's
> been hard for me not to be able to run test this in master, and have to
> go back and forth between it and 4.19; that's why I have messed up so
> many times. I apologize for the noise again.
>
> If you've read the cover letter from v1 and v2, there's not anything too
> relevant that I'm changing here.
>
> ---
>
> I've finally managed to get gcm(aes) working with the qce crypto engine.
>
> These first patch fixes a bug where the gcm authentication tag was being
> overwritten during gcm decryption, because it was passed in the same sgl
> buffer as the crypto payload. The qce driver appends some private state
> buffer to the request destination sgl, but it was not checking the
> length of the sgl being passed.
>
> The second patch works around a problem, which I frankly can't pinpoint
> what exactly is the cause, but after some help from Ard Biesheuvel, I
> think it is related to DMA. When gcm sends a request in
> crypto_gcm_setkey, it stores the hash (the crypto payload) and the iv in
> the same data struct. When the driver updates the IV, then the payload
> gets overwritten with the unencrypted data, or all zeroes, it may be a
> coincidence.
>
> However, it works if I pass the request down to the fallback driver--it
> is used by the driver to accept 192-bit-key requests. All I had to do
> was setup the fallback regardless of key size, and then check the
> payload length along with the keysize to pass the request to the
> fallback. This turns out to enhance performance, because of the
> avoided latency that comes with using the hardware.
>
> I've started with checking for a single 16-byte AES block, and that is
> enough to make gcm work. Next thing I've done was to tune the request
> size for performance. What got me started into looking at the qce
> driver was reports of it being detrimental to VPN speed, by the way.
> I've tested this win an Asus RT-AC58U, but the slow VPN reports[1] have
> more devices affected. Access to the device was kindly provided by
> @simsasss.
>
> I've added a 768-byte block size to tcrypt to get some measurements to
> come up with an optimal threshold to transition from software to
> hardware, and encountered another bug in the qce driver: it apparently
> cannot handle aes-xts requests that are greater than 512 bytes, but not
> a multiple of it. It failed with 768, 1280; XTS is usually used with a
> 512-byte sector (or a multiple of it), so I'm concluding that is the
> cause of failure.
>
> With that fixed, I added a module parameter to set the maximum request
> size that will be handled by the software fallback cipher and made some
> speed measurements using tcrypt to come up with an optimum value.
>
> I've documented this briefly in the parameter description, pointing out
> that gcm will not work if you set it to 0, and in better detail in the
> Kconfig help.
>
> TLDR: In the worst (where the hardware is slowest) case, hardware and
> software speed match at around 768 bytes, but I lowered the threshold to
> 512 to benefit the CPU offload.
>
> Here's a sample comparing three runs, using the proposed driver, varying
> the aes_sw_max_len parameter: 1st run will always use fallback, second
> run will use the default fallback for len <= 512, and third run will
> never use the fallback.
>
> testing speed of async cbc(aes) (cbc-aes-qce) encryption
> ------------------ ---------- ---------- ----------
> aes_sw_max_len 32,768 512 0
> ------------------ ---------- ---------- ----------
> 128 bit 16 bytes 8,081,136 5,614,448 430,416
> 128 bit 64 bytes 13,152,768 13,205,952 1,745,088
> 128 bit 256 bytes 16,094,464 16,101,120 6,969,600
> 128 bit 512 bytes 16,701,440 16,705,024 12,866,048
> 128 bit 768 bytes 16,883,712 13,192,704 15,186,432
> 128 bit 1024 bytes 17,036,288 17,149,952 19,716,096
> 128 bit 2048 bytes 17,108,992 30,842,880 32,868,352
> 128 bit 4096 bytes 17,203,200 44,929,024 49,655,808
> 128 bit 8192 bytes 17,219,584 58,966,016 74,186,752
> 256 bit 16 bytes 6,962,432 1,943,616 419,088
> 256 bit 64 bytes 10,485,568 10,421,952 1,681,536
> 256 bit 256 bytes 12,211,712 12,160,000 6,701,312
> 256 bit 512 bytes 12,499,456 12,584,448 9,882,112
> 256 bit 768 bytes 12,622,080 12,550,656 14,701,824
> 256 bit 1024 bytes 12,750,848 16,079,872 19,585,024
> 256 bit 2048 bytes 12,812,288 28,293,120 27,693,056
> 256 bit 4096 bytes 12,939,264 34,234,368 44,142,592
> 256 bit 8192 bytes 12,845,056 50,274,304 63,520,768
>
> The numbers vary from run to run, sometimes greatly.
>
> I've tried running the same tests with the arm-neon drivers, but the
> results don't change with any cipher mode, so I'm assuming the fallback
> is always aes-generic.
>
> I've made the measurements using an Asus RT-AC58U only, so I don't know
> how other hardware performs, but the user can always override the
> parameter, or even its default value.
>
> [1] https://forum.openwrt.org/t/ipsec-performance-issue/39690
>
> Eneas U de Queiroz (3):
> crypto: qce - use cryptlen when adding extra sgl
> crypto: qce - use AES fallback for small requests
> crypto: qce - handle AES-XTS cases that qce fails
>
> drivers/crypto/Kconfig | 23 +++++++++++++++++++++++
> drivers/crypto/qce/common.c | 2 --
> drivers/crypto/qce/common.h | 3 +++
> drivers/crypto/qce/dma.c | 11 ++++++-----
> drivers/crypto/qce/dma.h | 2 +-
> drivers/crypto/qce/skcipher.c | 30 ++++++++++++++++++++----------
> 6 files changed, 53 insertions(+), 18 deletions(-)
All applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
prev parent reply other threads:[~2020-02-13 9:26 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-07 15:02 [PATCH v5 0/3] crypto: qce driver fixes for gcm Eneas U de Queiroz
2020-02-07 15:02 ` [PATCH v5 1/3] crypto: qce - use cryptlen when adding extra sgl Eneas U de Queiroz
2020-02-07 15:02 ` [PATCH v5 2/3] crypto: qce - use AES fallback for small requests Eneas U de Queiroz
2020-02-07 15:02 ` [PATCH v5 3/3] crypto: qce - handle AES-XTS cases that qce fails Eneas U de Queiroz
2020-02-13 9:26 ` Herbert Xu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200213092615.l7cd3qe6etp55mjc@gondor.apana.org.au \
--to=herbert@gondor.apana.org.au \
--cc=ardb@kernel.org \
--cc=cotequeiroz@gmail.com \
--cc=davem@davemloft.net \
--cc=linux-crypto@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).