From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: Eric Biggers <ebiggers@kernel.org>
Cc: "open list:HARDWARE RANDOM NUMBER GENERATOR CORE"
	<linux-crypto@vger.kernel.org>,
	linux-fscrypt@vger.kernel.org,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Paul Crowley <paulcrowley@google.com>,
	Greg Kaiser <gkaiser@google.com>,
	Michael Halcrow <mhalcrow@google.com>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>,
	Samuel Neves <samuel.c.p.neves@gmail.com>,
	Tomer Ashur <tomer.ashur@esat.kuleuven.be>
Subject: Re: [RFC PATCH v2 09/12] crypto: nhpoly1305 - add NHPoly1305 support
Date: Mon, 22 Oct 2018 19:25:27 -0300	[thread overview]
Message-ID: <CAKv+Gu_vXmfNQT8j=G_Zz5C3-zDPPEQ2ne6ZZQw8mD0rifO8qA@mail.gmail.com> (raw)
In-Reply-To: <20181022184236.GA59695@gmail.com>

On 22 October 2018 at 15:42, Eric Biggers <ebiggers@kernel.org> wrote:
> On Sat, Oct 20, 2018 at 11:06:00PM +0800, Ard Biesheuvel wrote:
>> >> > +
>> >> > +#define NH_STRIDE(K0, K1, K2, K3)                              \
>> >> > +({                                                             \
>> >> > +       m_A = get_unaligned_le32(src); src += 4;                \
>> >> > +       m_B = get_unaligned_le32(src); src += 4;                \
>> >> > +       m_C = get_unaligned_le32(src); src += 4;                \
>> >> > +       m_D = get_unaligned_le32(src); src += 4;                \
>> >> > +       K3##_A = *key++;                                        \
>> >> > +       K3##_B = *key++;                                        \
>> >> > +       K3##_C = *key++;                                        \
>> >> > +       K3##_D = *key++;                                        \
>> >> > +       sum0 += (u64)(u32)(m_A + K0##_A) * (u32)(m_C + K0##_C); \
>> >> > +       sum1 += (u64)(u32)(m_A + K1##_A) * (u32)(m_C + K1##_C); \
>> >> > +       sum2 += (u64)(u32)(m_A + K2##_A) * (u32)(m_C + K2##_C); \
>> >> > +       sum3 += (u64)(u32)(m_A + K3##_A) * (u32)(m_C + K3##_C); \
>> >> > +       sum0 += (u64)(u32)(m_B + K0##_B) * (u32)(m_D + K0##_D); \
>> >> > +       sum1 += (u64)(u32)(m_B + K1##_B) * (u32)(m_D + K1##_D); \
>> >> > +       sum2 += (u64)(u32)(m_B + K2##_B) * (u32)(m_D + K2##_D); \
>> >> > +       sum3 += (u64)(u32)(m_B + K3##_B) * (u32)(m_D + K3##_D); \
>> >> > +})
>> >> > +
>> >> > +static void nh_generic(const u32 *key, const u8 *src, size_t srclen,
>> >> > +                      __le64 hash[NH_NUM_PASSES])
>> >> > +{
>> >> > +       u64 sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;
>> >> > +       u32 k0_A = *key++;
>> >> > +       u32 k0_B = *key++;
>> >> > +       u32 k0_C = *key++;
>> >> > +       u32 k0_D = *key++;
>> >> > +       u32 k1_A = *key++;
>> >> > +       u32 k1_B = *key++;
>> >> > +       u32 k1_C = *key++;
>> >> > +       u32 k1_D = *key++;
>> >> > +       u32 k2_A = *key++;
>> >> > +       u32 k2_B = *key++;
>> >> > +       u32 k2_C = *key++;
>> >> > +       u32 k2_D = *key++;
>> >> > +       u32 k3_A, k3_B, k3_C, k3_D;
>> >> > +       u32 m_A, m_B, m_C, m_D;
>> >> > +       size_t n = srclen / NH_MESSAGE_UNIT;
>> >> > +
>> >> > +       BUILD_BUG_ON(NH_PAIR_STRIDE != 2);
>> >> > +       BUILD_BUG_ON(NH_NUM_PASSES != 4);
>> >> > +
>> >> > +       while (n >= 4) {
>> >> > +               NH_STRIDE(k0, k1, k2, k3);
>> >> > +               NH_STRIDE(k1, k2, k3, k0);
>> >> > +               NH_STRIDE(k2, k3, k0, k1);
>> >> > +               NH_STRIDE(k3, k0, k1, k2);
>> >> > +               n -= 4;
>> >> > +       }
>> >> > +       if (n) {
>> >> > +               NH_STRIDE(k0, k1, k2, k3);
>> >> > +               if (--n) {
>> >> > +                       NH_STRIDE(k1, k2, k3, k0);
>> >> > +                       if (--n)
>> >> > +                               NH_STRIDE(k2, k3, k0, k1);
>> >> > +               }
>> >> > +       }
>> >> > +
>> >>
>> >> This all looks a bit clunky to me, with the macro, the *key++s in the
>> >> initializers and these conditionals.
>> >>
>> >> Was it written in this particular way to get GCC to optimize it in the
>> >> right way?
>> >
>> > This does get compiled into something much faster than a naive version, which
>> > you can find commented out at
>> > https://github.com/google/adiantum/blob/master/benchmark/src/nh.c#L14.
>> >
>> > Though, I admit that I haven't put a ton of effort into this C implementation of
>> > NH yet.  Right now it's actually somewhat of a translation of the NEON version.
>> > I'll do some experiments and see if it can be made into something less ugly
>> > without losing performance.
>> >
>>
>> No, that's fine, but please document it.
>>
>
> Hmm, I'm actually leaning towards the following instead.  Unrolling multiple
> strides to try to reduce loads of the keys doesn't seem worthwhile in the C
> implementation; for one, it bloats the code size a lot
> (412 => 2332 bytes on arm32).
>
> static void nh_generic(const u32 *key, const u8 *message, size_t message_len,
>                        __le64 hash[NH_NUM_PASSES])
> {
>         u64 sums[4] = { 0, 0, 0, 0 };
>
>         BUILD_BUG_ON(NH_PAIR_STRIDE != 2);
>         BUILD_BUG_ON(NH_NUM_PASSES != 4);
>
>         while (message_len) {
>                 u32 m0 = get_unaligned_le32(message + 0);
>                 u32 m1 = get_unaligned_le32(message + 4);
>                 u32 m2 = get_unaligned_le32(message + 8);
>                 u32 m3 = get_unaligned_le32(message + 12);
>
>                 sums[0] += (u64)(u32)(m0 + key[ 0]) * (u32)(m2 + key[ 2]);
>                 sums[1] += (u64)(u32)(m0 + key[ 4]) * (u32)(m2 + key[ 6]);
>                 sums[2] += (u64)(u32)(m0 + key[ 8]) * (u32)(m2 + key[10]);
>                 sums[3] += (u64)(u32)(m0 + key[12]) * (u32)(m2 + key[14]);
>                 sums[0] += (u64)(u32)(m1 + key[ 1]) * (u32)(m3 + key[ 3]);
>                 sums[1] += (u64)(u32)(m1 + key[ 5]) * (u32)(m3 + key[ 7]);
>                 sums[2] += (u64)(u32)(m1 + key[ 9]) * (u32)(m3 + key[11]);
>                 sums[3] += (u64)(u32)(m1 + key[13]) * (u32)(m3 + key[15]);

Are these (u32) casts really necessary? All the addends are u32 types,
so I'd expect each (x + y) subexpression to have a u32 type already as
well. Or am I missing something?
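
To illustrate what I mean, here is a tiny standalone sketch of the
promotion rules I have in mind, using stdint.h types rather than the
kernel's u32/u64 (so this is just my reading of C, not code from the
patch):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t m = 0xffffffffu, k = 2;

	/*
	 * With a 32-bit 'int' (every target I am aware of the kernel
	 * supporting), m + k is computed as 'unsigned int', wraps to 1,
	 * and the extra (uint32_t) cast is a no-op:
	 */
	uint64_t with_cast    = (uint64_t)(uint32_t)(m + k) * (uint32_t)(m + k);
	uint64_t without_cast = (uint64_t)(m + k) * (m + k);

	/*
	 * Only if 'int' were wider than 32 bits would m and k promote to
	 * that wider signed type, making m + k equal 0x100000001 and the
	 * explicit truncation necessary to preserve the mod-2^32 addition
	 * that NH relies on.
	 */
	printf("%llu %llu\n",
	       (unsigned long long)with_cast,
	       (unsigned long long)without_cast);
	return 0;
}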

>                 key += NH_MESSAGE_UNIT / sizeof(key[0]);
>                 message += NH_MESSAGE_UNIT;
>                 message_len -= NH_MESSAGE_UNIT;
>         }
>
>         hash[0] = cpu_to_le64(sums[0]);
>         hash[1] = cpu_to_le64(sums[1]);
>         hash[2] = cpu_to_le64(sums[2]);
>         hash[3] = cpu_to_le64(sums[3]);
> }
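
As a quick sanity check on the key indexing in that loop (my own
back-of-the-envelope arithmetic, not code from the patch): each 16-byte
message unit reads key[0..15] but only advances the key pointer by 4
words, so the key windows overlap and the total requirement works out
to 4 * units + 12 words.

#include <assert.h>
#include <stddef.h>

/* Assumes NH_MESSAGE_UNIT == 16 bytes, as in the loop above. */
static size_t nh_key_words_needed(size_t message_len)
{
	size_t units = message_len / 16;

	return units ? 4 * units + 12 : 0;
}

int main(void)
{
	/*
	 * For a full 1024-byte NH message this gives 268 words (1072
	 * bytes), which matches my recollection of the NH key size in
	 * the Adiantum paper: 1024 bytes plus 16 for each of the three
	 * extra passes.
	 */
	assert(nh_key_words_needed(1024) == 268);
	return 0;
}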

In any case, this looks much better to me, so if the performance is
satisfactory, let's use this version.
