From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Eric Biggers <ebiggers3@gmail.com>
Cc: "linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
Herbert Xu <herbert@gondor.apana.org.au>
Subject: Re: [RFC PATCH] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic
Date: Thu, 2 Feb 2017 07:52:23 +0000 [thread overview]
Message-ID: <CAKv+Gu-TU+5EM8YLyxpaCgCkG=r-Pr0655aJq8B3OqT5szOq4A@mail.gmail.com> (raw)
In-Reply-To: <20170202064716.GB582@zzz>
On 2 February 2017 at 06:47, Eric Biggers <ebiggers3@gmail.com> wrote:
> On Mon, Jan 30, 2017 at 02:11:29PM +0000, Ard Biesheuvel wrote:
>> Instead of unconditionally forcing 4 byte alignment for all generic
>> chaining modes that rely on crypto_xor() or crypto_inc() (which may
>> result in unnecessary copying of data when the underlying hardware
>> can perform unaligned accesses efficiently), make those functions
>> deal with unaligned input explicitly, but only if the Kconfig symbol
>> HAVE_EFFICIENT_UNALIGNED_ACCESS is set. This will allow us to drop
>> the alignmasks from the CBC, CMAC, CTR, CTS, PCBC and SEQIV drivers.
>>
>> For crypto_inc(), this simply involves making the 4-byte stride
>> conditional on HAVE_EFFICIENT_UNALIGNED_ACCESS being set, given that
>> it typically operates on 16 byte buffers.
>>
>> For crypto_xor(), an algorithm is implemented that simply runs through
>> the input using the largest strides possible if unaligned accesses are
>> allowed. If they are not, an optimal sequence of memory accesses is
>> emitted that takes the relative alignment of the input buffers into
>> account, e.g., if the relative misalignment of dst and src is 4 bytes,
>> the entire xor operation will be completed using 4 byte loads and stores
>> (modulo unaligned bits at the start and end). Note that all expressions
>> involving startalign and misalign are simply eliminated by the compiler
>> if HAVE_EFFICIENT_UNALIGNED_ACCESS is defined.
>>
>
> Hi Ard,
>
> This is a good idea, and I think it was error-prone to be requiring 4-byte
> alignment always, and also inefficient on many architectures.
>
> The new crypto_inc() looks fine, but the new crypto_xor() is quite complicated.
> I'm wondering whether it has to be that way, especially since it seems to most
> commonly be used on very small input buffers, e.g. 8 or 16-byte blocks. There
> are a couple trivial ways it could be simplified, e.g. using 'dst' and 'src'
> directly instead of 'a' and 'b' (which also seems to improve code generation by
> getting rid of the '+= len & ~mask' parts), or using sizeof(long) directly
> instead of 'size' and 'mask'.
>
> But also when I tried testing the proposed crypto_xor() on MIPS, it didn't work
> correctly on a misaligned buffer. With startalign=1, it did one iteration of
> the following loop and then exited with startalign=0 and entered the "unsigned
> long at a time" loop, which is incorrect since at that point the buffers were
> not yet fully aligned:
>
Right. I knew it was convoluted but I thought that was justified by
its correctness :-)
>> do {
>> if (len < sizeof(u8))
>> break;
>>
>> if (len >= size && !(startalign & 1) && !(misalign & 1))
>> break;
>>
>> *dst++ ^= *src++;
>> len -= sizeof(u8);
>> startalign &= ~sizeof(u8);
>> } while (misalign & 1);
>
> I think it would need to do instead:
>
> startalign += sizeof(u8);
> startalign %= sizeof(unsigned long);
>
> But I am wondering whether you considered something simpler, using the
> get_unaligned/put_unaligned helpers, maybe even using a switch statement for the
> last (sizeof(long) - 1) bytes so it can be compiled as a jump table. Something
> like this:
>
> #define xor_unaligned(dst, src) \
> put_unaligned(get_unaligned(dst) ^ get_unaligned(src), (dst))
>
> void crypto_xor(u8 *dst, const u8 *src, unsigned int len)
> {
> while (len >= sizeof(unsigned long)) {
> xor_unaligned((unsigned long *)dst, (unsigned long *)src);
> dst += sizeof(unsigned long);
> src += sizeof(unsigned long);
> len -= sizeof(unsigned long);
> }
>
> switch (len) {
> #ifdef CONFIG_64BIT
> case 7:
> dst[6] ^= src[6];
> /* fall through */
> case 6:
> xor_unaligned((u16 *)&dst[4], (u16 *)&src[4]);
> goto len_4;
> case 5:
> dst[4] ^= src[4];
> /* fall through */
> case 4:
> len_4:
> xor_unaligned((u32 *)dst, (u32 *)src);
> break;
> #endif
> case 3:
> dst[2] ^= src[2];
> /* fall through */
> case 2:
> xor_unaligned((u16 *)dst, (u16 *)src);
> break;
> case 1:
> dst[0] ^= src[0];
> break;
> }
> }
>
> That would seem like a better choice for small buffers, which seems to be the
> more common case. It should generate slightly faster code on architectures with
> fast unaligned access like x86_64, while still being sufficient on architectures
> without --- perhaps even faster, since it wouldn't have as many branches.
>
Well, what I tried to deal with explicitly is misaligned dst and src
by, e.g., 4 bytes. In my implementation, the idea was that it would
run through the entire input using 32-bit loads and stores, and the
standard unaligned accessors always take the hit of bytewise accesses
on architectures that don't have hardware support.
But I have an idea how I could simplify this, stay tuned please
next prev parent reply other threads:[~2017-02-02 7:52 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-01-30 14:11 [RFC PATCH] crypto: algapi - make crypto_xor() and crypto_inc() alignment agnostic Ard Biesheuvel
2017-02-02 6:47 ` Eric Biggers
2017-02-02 7:52 ` Ard Biesheuvel [this message]
2017-02-04 23:00 ` Jason A. Donenfeld
2017-02-04 23:10 ` Jason A. Donenfeld
2017-02-05 3:05 ` Eric Biggers
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAKv+Gu-TU+5EM8YLyxpaCgCkG=r-Pr0655aJq8B3OqT5szOq4A@mail.gmail.com' \
--to=ard.biesheuvel@linaro.org \
--cc=ebiggers3@gmail.com \
--cc=herbert@gondor.apana.org.au \
--cc=linux-crypto@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).