linux-crypto.vger.kernel.org archive mirror
From: Ard Biesheuvel <ardb@kernel.org>
To: Ben Greear <greearb@candelatech.com>
Cc: Linux Crypto Mailing List <linux-crypto@vger.kernel.org>
Subject: Re: Help getting aesni crypto patch upstream
Date: Fri, 31 Jul 2020 13:00:04 +0300
Message-ID: <CAMj1kXFt5XCzJ7xGz2=pg-2dA0-zs94XTFsWoTNpSENuhdC51w@mail.gmail.com>
In-Reply-To: <d71f2800-baef-b97f-62cb-0fbe798c35ed@candelatech.com>

On Fri, 31 Jul 2020 at 01:57, Ben Greear <greearb@candelatech.com> wrote:
>
> On 7/29/20 1:06 PM, Ard Biesheuvel wrote:
> > On Wed, 29 Jul 2020 at 22:29, Ben Greear <greearb@candelatech.com> wrote:
> >>
> >> On 7/29/20 12:09 PM, Ard Biesheuvel wrote:
> >>> On Wed, 29 Jul 2020 at 15:27, Ben Greear <greearb@candelatech.com> wrote:
> >>>>
> >>>> On 7/28/20 11:06 PM, Ard Biesheuvel wrote:
> >>>>> On Wed, 29 Jul 2020 at 01:03, Ben Greear <greearb@candelatech.com> wrote:
> >>>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> As part of my wifi test tool, I need to decrypt AES on the CPU, and the only way this
> >>>>>> performs well is to use aesni.  I've been using a patch for years that does this, but
> >>>>>> recently somewhere between 5.4 and 5.7, the API I've been using has been removed.
> >>>>>>
> >>>>>> Would anyone be interested in getting this support upstream?  I'd be happy to pay for
> >>>>>> the effort.
> >>>>>>
> >>>>>> Here is the patch in question:
> >>>>>>
> >>>>>> https://github.com/greearb/linux-ct-5.7/blob/master/wip/0001-crypto-aesni-add-ccm-aes-algorithm-implementation.patch
> >>>>>>
> >>>>>> Please keep me in CC, I'm not subscribed to this list.
> >>>>>>
> >>>>>
> >>>>> Hi Ben,
> >>>>>
> >>>>> Recently, the x86 FPU handling was improved to remove the overhead of
> >>>>> preserving/restoring of the register state, so the issue that this
> >>>>> patch fixes may no longer exist. Did you try?
> >>>>>
> >>>>> In any case, according to the commit log on that patch, the problem is
> >>>>> in the MAC generation, so it might be better to add a cbcmac(aes)
> >>>>> implementation only, and not duplicate all the CCM boilerplate.
> >>>>>
> >>>>
> >>>> Hello,
> >>>>
> >>>> I don't know all of the details, and do not understand the crypto subsystem,
> >>>> but I am pretty sure that I need at least some of this patch.
> >>>>
> >>>
> >>> Whether this is true is what I am trying to get clarified.
> >>>
> >>> Your patch works around a performance bottleneck related to the use of
> >>> AES-NI instructions in the kernel, which has been addressed recently.
> >>> If the issue still exists, we can attempt to devise a fix for it,
> >>> which may or may not be based on this patch.
> >>
> >> Ok, I can do the testing.  Do you expect 5.7-stable has all the needed
> >> performance improvements?
> >>
> >
> > Yes.
>
> It does not, as far as we can tell.
>
> We did a download test on an apu2 (small embedded AMD CPU, but with
> aesni support).  A WiFi station is in software-decrypt mode (ath10k-ct driver/firmware,
> but ath9k would be valid to reproduce the issue as well.)
>
> On our 5.4 kernel with the aesni patch applied, we get
> about 220Mbps wpa2 download throughput.  With open, we get about 260Mbps
> download throughput.
>
> On 5.7, without any aesni patch, we see about 116Mbps download wpa2 throughput,
> and about 265Mbps open download throughput.
>

Thanks for the excellent data. Apparently, FPU preserve/restore is
still prohibitively expensive on these cores.
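
For scale, the quoted figures can be expressed as the share of open
throughput that survives WPA2 software decryption: roughly 85% on the
patched 5.4 kernel (220 of 260 Mbps) versus roughly 44% on unpatched
5.7 (116 of 265 Mbps). The helper below is only an illustrative
calculation; its name is made up here and it is not part of any patch.

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage of open (unencrypted) throughput
 * that is retained once WPA2 software decryption is enabled.
 */
double retained_percent(double wpa2_mbps, double open_mbps)
{
    assert(open_mbps > 0.0);
    return 100.0 * wpa2_mbps / open_mbps;
}
```

Calling retained_percent(220, 260) and retained_percent(116, 265) gives
the two percentages for the 5.4 and 5.7 measurements respectively.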

I'll have a stab at implementing cbcmac(aesni) early next week: as I
pointed out before, we don't need all the CCM boilerplate if the CTR
and MAC processing are still done in separate passes anyway.
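
To illustrate the idea, here is a rough user-space model, not kernel
code and not the eventual patch. It assumes the cost comes from
entering an FPU section once per block, which is what the
kernel_fpu_begin and crypto_cipher_encrypt_one entries in the quoted
5.7 profile suggest. All names are hypothetical, model_fpu_begin() only
counts calls, and toy_encrypt_block() is a stand-in that is NOT AES.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16

/* Counts simulated kernel_fpu_begin()/kernel_fpu_end() pairs. */
static unsigned long fpu_sections;

static void model_fpu_begin(void) { fpu_sections++; }
static void model_fpu_end(void) { }

/* Stand-in for one block encryption. NOT AES and not secure. */
static void toy_encrypt_block(uint8_t b[BLOCK], const uint8_t key[BLOCK])
{
    for (int i = 0; i < BLOCK; i++)
        b[i] = (uint8_t)((b[i] ^ key[i]) * 167u + 13u);
}

/* Generic-template style: every block enters its own FPU section. */
void cbcmac_per_block(uint8_t mac[BLOCK], const uint8_t key[BLOCK],
                      const uint8_t *data, size_t len)
{
    assert(len % BLOCK == 0);
    for (size_t off = 0; off < len; off += BLOCK) {
        for (int i = 0; i < BLOCK; i++)
            mac[i] ^= data[off + i];
        model_fpu_begin();
        toy_encrypt_block(mac, key);
        model_fpu_end();
    }
}

/* Dedicated style: one FPU section covers the whole update call. */
void cbcmac_batched(uint8_t mac[BLOCK], const uint8_t key[BLOCK],
                    const uint8_t *data, size_t len)
{
    assert(len % BLOCK == 0);
    model_fpu_begin();
    for (size_t off = 0; off < len; off += BLOCK) {
        for (int i = 0; i < BLOCK; i++)
            mac[i] ^= data[off + i];
        toy_encrypt_block(mac, key);
    }
    model_fpu_end();
}

unsigned long fpu_section_count(void) { return fpu_sections; }
void fpu_section_reset(void) { fpu_sections = 0; }
```

Both functions produce the same MAC for the same input; the only
difference is that the batched variant enters one FPU section per call
instead of one per 16-byte block.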


>
> perf-top on 5.4 during download test with our aesni patch looks like this:
>
>     11.73%  libc-2.29.so   [.] __memset_sse2_unaligned_erms
>       4.79%  [kernel]       [k] _aesni_enc1
>       1.71%  [kernel]       [k] ___bpf_prog_run
>       1.66%  [kernel]       [k] memcpy
>       1.25%  [kernel]       [k] copy_user_generic_string
>       1.18%  libjvm.so      [.] InstanceKlass::oop_follow_contents
>       1.07%  [kernel]       [k] _aesni_enc4
>       0.98%  [kernel]       [k] csum_partial_copy_generic
>       0.96%  libjvm.so      [.] SpinPause
>       0.84%  [kernel]       [k] get_data_to_compute
>       0.81%  libjvm.so      [.] ParMarkBitMap::mark_obj
>       0.64%  [kernel]       [k] udp_sendmsg
>       0.62%  [kernel]       [k] __ip_append_data.isra.53
>       0.58%  [kernel]       [k] ipt_do_table
>       0.56%  [kernel]       [k] _aesni_inc
>       0.56%  [kernel]       [k] fib_table_lookup
>       0.55%  [kernel]       [k] __rcu_read_unlock
>       0.52%  libc-2.29.so   [.] __GI___strcmp_ssse3
>       0.50%  [kernel]       [k] igb_xmit_frame_ring
>
>
> on 5.7, we see this:
>
>     11.36%  libc-2.29.so   [.] __memset_sse2_unaligned_erms
>       9.03%  [kernel]       [k] kernel_fpu_begin
>       4.75%  libjvm.so      [.] SpinPause
>       2.89%  [kernel]       [k] __crypto_xor
>       2.35%  [kernel]       [k] _aesni_enc1
>       1.94%  [kernel]       [k] copy_user_generic_string
>       1.29%  [kernel]       [k] aesni_encrypt
>       0.85%  [kernel]       [k] udp_sendmsg
>       0.85%  [kernel]       [k] crypto_cipher_encrypt_one
>       0.71%  [kernel]       [k] crypto_cbcmac_digest_update
>       0.69%  [kernel]       [k] __ip_append_data.isra.53
>       0.69%  [kernel]       [k] memcpy
>       0.68%  [kernel]       [k] crypto_ctr_crypt
>       0.61%  [kernel]       [k] irq_fpu_usable
>       0.58%  [kernel]       [k] ipt_do_table
>       0.55%  [kernel]       [k] __dev_queue_xmit
>       0.54%  [kernel]       [k] crypto_inc
>       0.49%  libc-2.29.so   [.] __GI___strcmp_ssse3
>       0.45%  libjvm.so      [.] InstanceKlass::oop_follow_contents
>       0.45%  [kernel]       [k] ip_route_output_key_hash_rcu
>
>
>
> So, I think there is still some good improvement possible, likely with something like
> the aesni patch I showed, but re-worked to function in 5.7+ kernels.
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com


Thread overview: 9+ messages
2020-07-28 22:03 Help getting aesni crypto patch upstream Ben Greear
2020-07-29  6:06 ` Ard Biesheuvel
2020-07-29 12:27   ` Ben Greear
2020-07-29 19:09     ` Ard Biesheuvel
2020-07-29 19:29       ` Ben Greear
2020-07-29 20:06         ` Ard Biesheuvel
2020-07-30 22:56           ` Ben Greear
2020-07-31 10:00             ` Ard Biesheuvel [this message]
2020-07-31 14:02               ` Ben Greear
