From: Herbert Xu <herbert@gondor.apana.org.au>
To: Eneas U de Queiroz <cotequeiroz@gmail.com>
Cc: linux-crypto@vger.kernel.org,
"David S. Miller" <davem@davemloft.net>,
Ard Biesheuvel <ardb@kernel.org>
Subject: Re: [PATCH v5 0/3] crypto: qce driver fixes for gcm
Date: Thu, 13 Feb 2020 17:26:15 +0800 [thread overview]
Message-ID: <20200213092615.l7cd3qe6etp55mjc@gondor.apana.org.au> (raw)
In-Reply-To: <20200207150227.31014-1-cotequeiroz@gmail.com>
On Fri, Feb 07, 2020 at 12:02:24PM -0300, Eneas U de Queiroz wrote:
> I've made enough mistakes in this series, I'll just start over. It's
> been hard for me not to be able to run test this in master, and have to
> go back and forth between it and 4.19; that's why I have messed up so
> many times. I apologize for the noise again.
>
> If you've read the cover letter from v1 and v2, there's not anything too
> relevant that I'm changing here.
>
> ---
>
> I've finally managed to get gcm(aes) working with the qce crypto engine.
>
> These first patch fixes a bug where the gcm authentication tag was being
> overwritten during gcm decryption, because it was passed in the same sgl
> buffer as the crypto payload. The qce driver appends some private state
> buffer to the request destination sgl, but it was not checking the
> length of the sgl being passed.
>
> The second patch works around a problem, which I frankly can't pinpoint
> what exactly is the cause, but after some help from Ard Biesheuvel, I
> think it is related to DMA. When gcm sends a request in
> crypto_gcm_setkey, it stores the hash (the crypto payload) and the iv in
> the same data struct. When the driver updates the IV, then the payload
> gets overwritten with the unencrypted data, or all zeroes, it may be a
> coincidence.
>
> However, it works if I pass the request down to the fallback driver--it
> is used by the driver to accept 192-bit-key requests. All I had to do
> was setup the fallback regardless of key size, and then check the
> payload length along with the keysize to pass the request to the
> fallback. This turns out to enhance performance, because of the
> avoided latency that comes with using the hardware.
>
> I've started with checking for a single 16-byte AES block, and that is
> enough to make gcm work. Next thing I've done was to tune the request
> size for performance. What got me started into looking at the qce
> driver was reports of it being detrimental to VPN speed, by the way.
> I've tested this win an Asus RT-AC58U, but the slow VPN reports[1] have
> more devices affected. Access to the device was kindly provided by
> @simsasss.
>
> I've added a 768-byte block size to tcrypt to get some measurements to
> come up with an optimal threshold to transition from software to
> hardware, and encountered another bug in the qce driver: it apparently
> cannot handle aes-xts requests that are greater than 512 bytes, but not
> a multiple of it. It failed with 768, 1280; XTS is usually used with a
> 512-byte sector (or a multiple of it), so I'm concluding that is the
> cause of failure.
>
> With that fixed, I added a module parameter to set the maximum request
> size that will be handled by the software fallback cipher and made some
> speed measurements using tcrypt to come up with an optimum value.
>
> I've documented this briefly in the parameter description, pointing out
> that gcm will not work if you set it to 0, and in better detail in the
> Kconfig help.
>
> TLDR: In the worst (where the hardware is slowest) case, hardware and
> software speed match at around 768 bytes, but I lowered the threshold to
> 512 to benefit the CPU offload.
>
> Here's a sample comparing three runs, using the proposed driver, varying
> the aes_sw_max_len parameter: 1st run will always use fallback, second
> run will use the default fallback for len <= 512, and third run will
> never use the fallback.
>
> testing speed of async cbc(aes) (cbc-aes-qce) encryption
> ------------------ ---------- ---------- ----------
> aes_sw_max_len 32,768 512 0
> ------------------ ---------- ---------- ----------
> 128 bit 16 bytes 8,081,136 5,614,448 430,416
> 128 bit 64 bytes 13,152,768 13,205,952 1,745,088
> 128 bit 256 bytes 16,094,464 16,101,120 6,969,600
> 128 bit 512 bytes 16,701,440 16,705,024 12,866,048
> 128 bit 768 bytes 16,883,712 13,192,704 15,186,432
> 128 bit 1024 bytes 17,036,288 17,149,952 19,716,096
> 128 bit 2048 bytes 17,108,992 30,842,880 32,868,352
> 128 bit 4096 bytes 17,203,200 44,929,024 49,655,808
> 128 bit 8192 bytes 17,219,584 58,966,016 74,186,752
> 256 bit 16 bytes 6,962,432 1,943,616 419,088
> 256 bit 64 bytes 10,485,568 10,421,952 1,681,536
> 256 bit 256 bytes 12,211,712 12,160,000 6,701,312
> 256 bit 512 bytes 12,499,456 12,584,448 9,882,112
> 256 bit 768 bytes 12,622,080 12,550,656 14,701,824
> 256 bit 1024 bytes 12,750,848 16,079,872 19,585,024
> 256 bit 2048 bytes 12,812,288 28,293,120 27,693,056
> 256 bit 4096 bytes 12,939,264 34,234,368 44,142,592
> 256 bit 8192 bytes 12,845,056 50,274,304 63,520,768
>
> The numbers vary from run to run, sometimes greatly.
>
> I've tried running the same tests with the arm-neon drivers, but the
> results don't change with any cipher mode, so I'm assuming the fallback
> is always aes-generic.
>
> I've made the measurements using an Asus RT-AC58U only, so I don't know
> how other hardware performs, but the user can always override the
> parameter, or even its default value.
>
> [1] https://forum.openwrt.org/t/ipsec-performance-issue/39690
>
> Eneas U de Queiroz (3):
> crypto: qce - use cryptlen when adding extra sgl
> crypto: qce - use AES fallback for small requests
> crypto: qce - handle AES-XTS cases that qce fails
>
> drivers/crypto/Kconfig | 23 +++++++++++++++++++++++
> drivers/crypto/qce/common.c | 2 --
> drivers/crypto/qce/common.h | 3 +++
> drivers/crypto/qce/dma.c | 11 ++++++-----
> drivers/crypto/qce/dma.h | 2 +-
> drivers/crypto/qce/skcipher.c | 30 ++++++++++++++++++++----------
> 6 files changed, 53 insertions(+), 18 deletions(-)
All applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
prev parent reply other threads:[~2020-02-13 9:26 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-07 15:02 [PATCH v5 0/3] crypto: qce driver fixes for gcm Eneas U de Queiroz
2020-02-07 15:02 ` [PATCH v5 1/3] crypto: qce - use cryptlen when adding extra sgl Eneas U de Queiroz
2020-02-07 15:02 ` [PATCH v5 2/3] crypto: qce - use AES fallback for small requests Eneas U de Queiroz
2020-02-07 15:02 ` [PATCH v5 3/3] crypto: qce - handle AES-XTS cases that qce fails Eneas U de Queiroz
2020-02-13 9:26 ` Herbert Xu [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200213092615.l7cd3qe6etp55mjc@gondor.apana.org.au \
--to=herbert@gondor.apana.org.au \
--cc=ardb@kernel.org \
--cc=cotequeiroz@gmail.com \
--cc=davem@davemloft.net \
--cc=linux-crypto@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.