All of lore.kernel.org
 help / color / mirror / Atom feed
From: Herbert Xu <herbert@gondor.apana.org.au>
To: Eneas U de Queiroz <cotequeiroz@gmail.com>
Cc: linux-crypto@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Ard Biesheuvel <ardb@kernel.org>
Subject: Re: [PATCH v5 0/3] crypto: qce driver fixes for gcm
Date: Thu, 13 Feb 2020 17:26:15 +0800	[thread overview]
Message-ID: <20200213092615.l7cd3qe6etp55mjc@gondor.apana.org.au> (raw)
In-Reply-To: <20200207150227.31014-1-cotequeiroz@gmail.com>

On Fri, Feb 07, 2020 at 12:02:24PM -0300, Eneas U de Queiroz wrote:
> I've made enough mistakes in this series, I'll just start over.  It's
> been hard for me not to be able to run test this in master, and have to
> go back and forth between it and 4.19; that's why I have messed up so
> many times.  I apologize for the noise again.
> 
> If you've read the cover letter from v1 and v2, there's not anything too
> relevant that I'm changing here.
> 
> ---
> 
> I've finally managed to get gcm(aes) working with the qce crypto engine.
> 
> These first patch fixes a bug where the gcm authentication tag was being
> overwritten during gcm decryption, because it was passed in the same sgl
> buffer as the crypto payload.  The qce driver appends some private state
> buffer to the request destination sgl, but it was not checking the
> length of the sgl being passed.
> 
> The second patch works around a problem, which I frankly can't pinpoint
> what exactly is the cause, but after some help from Ard Biesheuvel, I
> think it is related to DMA.  When gcm sends a request in
> crypto_gcm_setkey, it stores the hash (the crypto payload) and the iv in
> the same data struct.  When the driver updates the IV, then the payload
> gets overwritten with the unencrypted data, or all zeroes, it may be a
> coincidence.
> 
> However, it works if I pass the request down to the fallback driver--it
> is used by the driver to accept 192-bit-key requests.  All I had to do
> was setup the fallback regardless of key size, and then check the
> payload length along with the keysize to pass the request to the
> fallback.  This turns out to enhance performance, because of the
> avoided latency that comes with using the hardware.
> 
> I've started with checking for a single 16-byte AES block, and that is
> enough to make gcm work.  Next thing I've done was to tune the request
> size for performance.  What got me started into looking at the qce
> driver was reports of it being detrimental to VPN speed, by the way.
> I've tested this win an Asus RT-AC58U, but the slow VPN reports[1] have
> more devices affected.  Access to the device was kindly provided by
> @simsasss.
> 
> I've added a 768-byte block size to tcrypt to get some measurements to
> come up with an optimal threshold to transition from software to
> hardware, and encountered another bug in the qce driver: it apparently
> cannot handle aes-xts requests that are greater than 512 bytes, but not
> a multiple of it.  It failed with 768, 1280; XTS is usually used with a
> 512-byte sector (or a multiple of it), so I'm concluding that is the
> cause of failure.
> 
> With that fixed, I added a module parameter to set the maximum request
> size that will be handled by the software fallback cipher and made some
> speed measurements using tcrypt to come up with an optimum value.
> 
> I've documented this briefly in the parameter description, pointing out
> that gcm will not work if you set it to 0, and in better detail in the
> Kconfig help.
> 
> TLDR: In the worst (where the hardware is slowest) case, hardware and
> software speed match at around 768 bytes, but I lowered the threshold to
> 512 to benefit the CPU offload.
> 
> Here's a sample comparing three runs, using the proposed driver, varying
> the aes_sw_max_len parameter: 1st run will always use fallback, second
> run will use the default fallback for len <= 512, and third run will
> never use the fallback.
> 
> testing speed of async cbc(aes) (cbc-aes-qce) encryption
> ------------------      ----------   ----------    ----------
> aes_sw_max_len              32,768          512             0
> ------------------      ----------   ----------    ----------
> 128 bit   16 bytes       8,081,136    5,614,448       430,416
> 128 bit   64 bytes      13,152,768   13,205,952     1,745,088
> 128 bit  256 bytes      16,094,464   16,101,120     6,969,600
> 128 bit  512 bytes      16,701,440   16,705,024    12,866,048
> 128 bit  768 bytes      16,883,712   13,192,704    15,186,432
> 128 bit 1024 bytes      17,036,288   17,149,952    19,716,096
> 128 bit 2048 bytes      17,108,992   30,842,880    32,868,352
> 128 bit 4096 bytes      17,203,200   44,929,024    49,655,808
> 128 bit 8192 bytes      17,219,584   58,966,016    74,186,752
> 256 bit   16 bytes       6,962,432    1,943,616       419,088
> 256 bit   64 bytes      10,485,568   10,421,952     1,681,536
> 256 bit  256 bytes      12,211,712   12,160,000     6,701,312
> 256 bit  512 bytes      12,499,456   12,584,448     9,882,112
> 256 bit  768 bytes      12,622,080   12,550,656    14,701,824
> 256 bit 1024 bytes      12,750,848   16,079,872    19,585,024
> 256 bit 2048 bytes      12,812,288   28,293,120    27,693,056
> 256 bit 4096 bytes      12,939,264   34,234,368    44,142,592
> 256 bit 8192 bytes      12,845,056   50,274,304    63,520,768
> 
> The numbers vary from run to run, sometimes greatly.
> 
> I've tried running the same tests with the arm-neon drivers, but the
> results don't change with any cipher mode, so I'm assuming the fallback
> is always aes-generic.
> 
> I've made the measurements using an Asus RT-AC58U only, so I don't know
> how other hardware performs, but the user can always override the
> parameter, or even its default value.
> 
> [1] https://forum.openwrt.org/t/ipsec-performance-issue/39690
> 
> Eneas U de Queiroz (3):
>   crypto: qce - use cryptlen when adding extra sgl
>   crypto: qce - use AES fallback for small requests
>   crypto: qce - handle AES-XTS cases that qce fails
> 
>  drivers/crypto/Kconfig        | 23 +++++++++++++++++++++++
>  drivers/crypto/qce/common.c   |  2 --
>  drivers/crypto/qce/common.h   |  3 +++
>  drivers/crypto/qce/dma.c      | 11 ++++++-----
>  drivers/crypto/qce/dma.h      |  2 +-
>  drivers/crypto/qce/skcipher.c | 30 ++++++++++++++++++++----------
>  6 files changed, 53 insertions(+), 18 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

      parent reply	other threads:[~2020-02-13  9:26 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-07 15:02 [PATCH v5 0/3] crypto: qce driver fixes for gcm Eneas U de Queiroz
2020-02-07 15:02 ` [PATCH v5 1/3] crypto: qce - use cryptlen when adding extra sgl Eneas U de Queiroz
2020-02-07 15:02 ` [PATCH v5 2/3] crypto: qce - use AES fallback for small requests Eneas U de Queiroz
2020-02-07 15:02 ` [PATCH v5 3/3] crypto: qce - handle AES-XTS cases that qce fails Eneas U de Queiroz
2020-02-13  9:26 ` Herbert Xu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200213092615.l7cd3qe6etp55mjc@gondor.apana.org.au \
    --to=herbert@gondor.apana.org.au \
    --cc=ardb@kernel.org \
    --cc=cotequeiroz@gmail.com \
    --cc=davem@davemloft.net \
    --cc=linux-crypto@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.