From: Andy Lutomirski <luto@amacapital.net>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	David Miller <davem@davemloft.net>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH v2 net-next] tcp: md5: use kmalloc() backed scratch areas
Date: Wed, 29 Jun 2016 09:41:44 -0700
Message-ID: <CALCETrUR1+au2-11qSAT24832nZcTwjUj8UY09rCMisyHawnaA@mail.gmail.com>
In-Reply-To: <20160629153818.GA29838@gondor.apana.org.au>

On Wed, Jun 29, 2016 at 8:38 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, Jun 29, 2016 at 08:26:43AM -0700, Andy Lutomirski wrote:
>>
>> Two reasons:
>>
>> 1. Code like tcp md5 would be simpler if it could pass a scatterlist
>> to hash the skb but use a virtual address for the header.
>
> True.  But I bet we can make it simpler in other ways without
> creating special interfaces for it.  Look at how we do IPsec
> encryption/hashing, this is what TCP md5 should look like.  But
> nobody in their right mind would bother doing this because this
> is dead code.
>
>> 2. The actual calling sequence and the amount of indirection is much
>> less for shash, so hashing short buffer is probably *much* faster.
>
> Really?
>
> Have you measured the speed difference between the ahash and shash
> interfaces? Are you sure that this would matter when compared
> against the speed of hashing a single MD5 block?
>

The difference is non-negligible.  If I had a CPU with SHA-NI, the
relative difference would probably be even bigger, since the hash
computation itself would be cheaper.

[    0.535717] SHA1 ahash: warming up
[    0.737471] SHA1 ahash: 500000 loops, 386 ns/loop
[    0.737937] SHA1 shash: warming up
[    0.903131] SHA1 shash: 500000 loops, 323 ns/loop
[    1.055409] SHA1 shash_digest: 500000 loops, 303 ns/loop

[    0.574410] SHA1 ahash: warming up
[    0.796927] SHA1 ahash: 500000 loops, 414 ns/loop
[    0.797695] SHA1 shash: warming up
[    0.959772] SHA1 shash: 500000 loops, 316 ns/loop
[    1.100886] SHA1 shash_digest: 500000 loops, 280 ns/loop

Here's MD5:

[    0.504708] MD5 ahash: warming up
[    0.688155] MD5 ahash: 500000 loops, 344 ns/loop
[    0.688971] MD5 shash: warming up
[    0.834953] MD5 shash: 500000 loops, 285 ns/loop
[    0.982021] MD5 shash_digest: 500000 loops, 292 ns/loop

[    0.570780] MD5 ahash: warming up
[    0.754325] MD5 ahash: 500000 loops, 357 ns/loop
[    0.754807] MD5 shash: warming up
[    0.906861] MD5 shash: 500000 loops, 297 ns/loop
[    1.059057] MD5 shash_digest: 500000 loops, 303 ns/loop

If you throw in sub-one-block requests, the relative overhead will be
even worse, because those requests do almost no real hashing work.
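
For illustration, a sub-block request would look something like this
(a hypothetical 20-byte update against the same desc/buf/out setup as
the test code below; the generic md5/sha1 implementations just buffer
a partial block in update() and only run the compression function at
finalization):

    crypto_shash_init(desc);
    /* 20 bytes is less than one 64-byte block: this only copies into
     * the partial-block buffer, no compression happens yet. */
    crypto_shash_update(desc, buf, 20);
    /* The single padded block is actually hashed here. */
    crypto_shash_final(desc, out);

So nearly the entire cost of such a request is call and setup overhead.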

Overall, it looks like there's overhead of something like 50ns for
each ahash invocation vs the shash equivalent.  It's not huge, but
it's there.  (This is cache-hot.  I bet it's considerably worse if
cache-cold, because ahash will require a lot more code cache lines as
well as the extra cache lines involved in the scatterlist and whatever
arch stuff is needed to map back and forth between virtual and
physical addresses.)
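
To make the scatterlist point concrete, here is roughly what the
setup helpers used in the ahash path boil down to (paraphrasing
sg_init_one()/sg_set_buf() from include/linux/scatterlist.h, not any
new interface):

    /* Translate the kernel virtual address into page + offset so the
     * hash walk code can map it again later.  shash skips all of this
     * and just uses the pointer directly. */
    sg_init_table(&sg_in, 1);
    sg_set_page(&sg_in, virt_to_page(buf), 64, offset_in_page(buf));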

Here's the code:

/* Headers needed to build this snippet as-is. */
#include <linux/err.h>
#include <linux/kernel.h>
#include <linux/ktime.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <crypto/hash.h>

static void test_ahash(void)
{
    struct crypto_ahash *hash;
    struct ahash_request *req;
    struct scatterlist sg_in;
    char *buf, *out;
    int i;
    u64 end, start;
    unsigned loops = 500000;

    hash = crypto_alloc_ahash("sha1", 0, CRYPTO_ALG_ASYNC);
    if (IS_ERR(hash))
        return;

    req = ahash_request_alloc(hash, GFP_KERNEL);
    if (!req)
        return;

    ahash_request_set_callback(req, 0, NULL, NULL);

    buf = kmalloc(64, GFP_KERNEL);
    memset(buf, 0, 64);
    out = kmalloc(20, GFP_KERNEL);

    pr_err("SHA1 ahash: warming up\n");

    for (i = 0; i < 10000; i++) {
        crypto_ahash_init(req);
        sg_init_one(&sg_in, buf, 64);
        ahash_request_set_crypt(req, &sg_in, NULL, 64);
        if (crypto_ahash_update(req))
            return;
        ahash_request_set_crypt(req, NULL, out, 0);
        if (crypto_ahash_final(req))
            return;
    }

    start = ktime_get_ns();
    for (i = 0; i < loops; i++) {
        crypto_ahash_init(req);
        sg_init_one(&sg_in, buf, 64);
        ahash_request_set_crypt(req, &sg_in, NULL, 64);
        if (crypto_ahash_update(req))
            return;
        ahash_request_set_crypt(req, NULL, out, 0);
        if (crypto_ahash_final(req))
            return;
    }
    end = ktime_get_ns();

    pr_err("SHA1 ahash: %u loops, %llu ns/loop\n", loops, (unsigned
long long)((end-start)/loops));
}

static void test_shash(void)
{
    struct crypto_shash *hash;
    struct shash_desc *desc;
    char *buf, *out;
    int i;
    u64 end, start;
    unsigned loops = 500000;

    hash = crypto_alloc_shash("sha1", 0, CRYPTO_ALG_ASYNC);
    if (IS_ERR(hash))
        return;

    /* The allocation must cover struct shash_desc itself plus the
     * algorithm's context, not just crypto_shash_descsize(). */
    desc = kmalloc(sizeof(*desc) + crypto_shash_descsize(hash), GFP_KERNEL);
    if (!desc)
        return;
    desc->tfm = hash;
    desc->flags = 0;

    /*
     * Hmm.  This interface isn't conducive to the use of hardware that
     * needs state beyond what lives in a single memory buffer -- how would
     * the shash implementation know when to free the state?
     */

    buf = kmalloc(64, GFP_KERNEL);
    memset(buf, 0, 64);
    out = kmalloc(20, GFP_KERNEL);

    pr_err("SHA1 shash: warming up\n");

    for (i = 0; i < 10000; i++) {
        crypto_shash_init(desc);
        if (crypto_shash_update(desc, buf, 64))
            return;
        if (crypto_shash_final(desc, out))
            return;
    }

    start = ktime_get_ns();
    for (i = 0; i < loops; i++) {
        /*
         * For fairness, this does init; update; final instead of
         * just digest.
         */
        crypto_shash_init(desc);
        if (crypto_shash_update(desc, buf, 64))
            return;
        if (crypto_shash_final(desc, out))
            return;
    }
    end = ktime_get_ns();

    pr_err("SHA1 shash: %u loops, %llu ns/loop\n", loops, (unsigned
long long)((end-start)/loops));

    start = ktime_get_ns();
    for (i = 0; i < loops; i++) {
        crypto_shash_digest(desc, buf, 64, out);
    }
    end = ktime_get_ns();

    pr_err("SHA1 shash_digest: %u loops, %llu ns/loop\n", loops,
(unsigned long long)((end-start)/loops));
}
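
To run it, just call both functions from an init hook; a minimal
module wrapper would look something like this (the module name and
init function are arbitrary, not part of any posted patch):

    #include <linux/module.h>

    static int __init hash_bench_init(void)
    {
        test_ahash();
        test_shash();
        return 0;
    }
    module_init(hash_bench_init);

    MODULE_LICENSE("GPL");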

