linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Herbert Xu <herbert@gondor.apana.org.au>
To: Eneas U de Queiroz <cotequeiroz@gmail.com>
Cc: linux-crypto@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Ard Biesheuvel <ardb@kernel.org>
Subject: Re: [PATCH v5 0/3] crypto: qce driver fixes for gcm
Date: Thu, 13 Feb 2020 17:26:15 +0800	[thread overview]
Message-ID: <20200213092615.l7cd3qe6etp55mjc@gondor.apana.org.au> (raw)
In-Reply-To: <20200207150227.31014-1-cotequeiroz@gmail.com>

On Fri, Feb 07, 2020 at 12:02:24PM -0300, Eneas U de Queiroz wrote:
> I've made enough mistakes in this series, I'll just start over.  It's
> been hard for me not to be able to run test this in master, and have to
> go back and forth between it and 4.19; that's why I have messed up so
> many times.  I apologize for the noise again.
> 
> If you've read the cover letter from v1 and v2, there's not anything too
> relevant that I'm changing here.
> 
> ---
> 
> I've finally managed to get gcm(aes) working with the qce crypto engine.
> 
> These first patch fixes a bug where the gcm authentication tag was being
> overwritten during gcm decryption, because it was passed in the same sgl
> buffer as the crypto payload.  The qce driver appends some private state
> buffer to the request destination sgl, but it was not checking the
> length of the sgl being passed.
> 
> The second patch works around a problem, which I frankly can't pinpoint
> what exactly is the cause, but after some help from Ard Biesheuvel, I
> think it is related to DMA.  When gcm sends a request in
> crypto_gcm_setkey, it stores the hash (the crypto payload) and the iv in
> the same data struct.  When the driver updates the IV, then the payload
> gets overwritten with the unencrypted data, or all zeroes, it may be a
> coincidence.
> 
> However, it works if I pass the request down to the fallback driver--it
> is used by the driver to accept 192-bit-key requests.  All I had to do
> was setup the fallback regardless of key size, and then check the
> payload length along with the keysize to pass the request to the
> fallback.  This turns out to enhance performance, because of the
> avoided latency that comes with using the hardware.
> 
> I've started with checking for a single 16-byte AES block, and that is
> enough to make gcm work.  Next thing I've done was to tune the request
> size for performance.  What got me started into looking at the qce
> driver was reports of it being detrimental to VPN speed, by the way.
> I've tested this win an Asus RT-AC58U, but the slow VPN reports[1] have
> more devices affected.  Access to the device was kindly provided by
> @simsasss.
> 
> I've added a 768-byte block size to tcrypt to get some measurements to
> come up with an optimal threshold to transition from software to
> hardware, and encountered another bug in the qce driver: it apparently
> cannot handle aes-xts requests that are greater than 512 bytes, but not
> a multiple of it.  It failed with 768, 1280; XTS is usually used with a
> 512-byte sector (or a multiple of it), so I'm concluding that is the
> cause of failure.
> 
> With that fixed, I added a module parameter to set the maximum request
> size that will be handled by the software fallback cipher and made some
> speed measurements using tcrypt to come up with an optimum value.
> 
> I've documented this briefly in the parameter description, pointing out
> that gcm will not work if you set it to 0, and in better detail in the
> Kconfig help.
> 
> TLDR: In the worst (where the hardware is slowest) case, hardware and
> software speed match at around 768 bytes, but I lowered the threshold to
> 512 to benefit the CPU offload.
> 
> Here's a sample comparing three runs, using the proposed driver, varying
> the aes_sw_max_len parameter: 1st run will always use fallback, second
> run will use the default fallback for len <= 512, and third run will
> never use the fallback.
> 
> testing speed of async cbc(aes) (cbc-aes-qce) encryption
> ------------------      ----------   ----------    ----------
> aes_sw_max_len              32,768          512             0
> ------------------      ----------   ----------    ----------
> 128 bit   16 bytes       8,081,136    5,614,448       430,416
> 128 bit   64 bytes      13,152,768   13,205,952     1,745,088
> 128 bit  256 bytes      16,094,464   16,101,120     6,969,600
> 128 bit  512 bytes      16,701,440   16,705,024    12,866,048
> 128 bit  768 bytes      16,883,712   13,192,704    15,186,432
> 128 bit 1024 bytes      17,036,288   17,149,952    19,716,096
> 128 bit 2048 bytes      17,108,992   30,842,880    32,868,352
> 128 bit 4096 bytes      17,203,200   44,929,024    49,655,808
> 128 bit 8192 bytes      17,219,584   58,966,016    74,186,752
> 256 bit   16 bytes       6,962,432    1,943,616       419,088
> 256 bit   64 bytes      10,485,568   10,421,952     1,681,536
> 256 bit  256 bytes      12,211,712   12,160,000     6,701,312
> 256 bit  512 bytes      12,499,456   12,584,448     9,882,112
> 256 bit  768 bytes      12,622,080   12,550,656    14,701,824
> 256 bit 1024 bytes      12,750,848   16,079,872    19,585,024
> 256 bit 2048 bytes      12,812,288   28,293,120    27,693,056
> 256 bit 4096 bytes      12,939,264   34,234,368    44,142,592
> 256 bit 8192 bytes      12,845,056   50,274,304    63,520,768
> 
> The numbers vary from run to run, sometimes greatly.
> 
> I've tried running the same tests with the arm-neon drivers, but the
> results don't change with any cipher mode, so I'm assuming the fallback
> is always aes-generic.
> 
> I've made the measurements using an Asus RT-AC58U only, so I don't know
> how other hardware performs, but the user can always override the
> parameter, or even its default value.
> 
> [1] https://forum.openwrt.org/t/ipsec-performance-issue/39690
> 
> Eneas U de Queiroz (3):
>   crypto: qce - use cryptlen when adding extra sgl
>   crypto: qce - use AES fallback for small requests
>   crypto: qce - handle AES-XTS cases that qce fails
> 
>  drivers/crypto/Kconfig        | 23 +++++++++++++++++++++++
>  drivers/crypto/qce/common.c   |  2 --
>  drivers/crypto/qce/common.h   |  3 +++
>  drivers/crypto/qce/dma.c      | 11 ++++++-----
>  drivers/crypto/qce/dma.h      |  2 +-
>  drivers/crypto/qce/skcipher.c | 30 ++++++++++++++++++++----------
>  6 files changed, 53 insertions(+), 18 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

      parent reply	other threads:[~2020-02-13  9:26 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-07 15:02 [PATCH v5 0/3] crypto: qce driver fixes for gcm Eneas U de Queiroz
2020-02-07 15:02 ` [PATCH v5 1/3] crypto: qce - use cryptlen when adding extra sgl Eneas U de Queiroz
2020-02-07 15:02 ` [PATCH v5 2/3] crypto: qce - use AES fallback for small requests Eneas U de Queiroz
2020-02-07 15:02 ` [PATCH v5 3/3] crypto: qce - handle AES-XTS cases that qce fails Eneas U de Queiroz
2020-02-13  9:26 ` Herbert Xu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200213092615.l7cd3qe6etp55mjc@gondor.apana.org.au \
    --to=herbert@gondor.apana.org.au \
    --cc=ardb@kernel.org \
    --cc=cotequeiroz@gmail.com \
    --cc=davem@davemloft.net \
    --cc=linux-crypto@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).