From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andy Lutomirski
Subject: Re: [PATCH net-next v6 07/23] zinc: ChaCha20 ARM and ARM64 implementations
Date: Thu, 27 Sep 2018 09:26:59 -0700
Message-ID:
References: <20180925145622.29959-1-Jason@zx2c4.com> <20180925145622.29959-8-Jason@zx2c4.com>
Mime-Version: 1.0 (1.0)
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Cc: Thomas Gleixner , Peter Zijlstra , Ard Biesheuvel , LKML , Netdev , Linux Crypto Mailing List , David Miller , Greg Kroah-Hartman , Samuel Neves , Andrew Lutomirski , Jean-Philippe Aumasson , Russell King - ARM Linux , linux-arm-kernel@lists.infradead.org
To: "Jason A. Donenfeld"
Return-path:
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-Id: linux-crypto.vger.kernel.org

> On Sep 27, 2018, at 8:19 AM, Jason A. Donenfeld wrote:
>
> Hey again Thomas,
>
>> On Thu, Sep 27, 2018 at 3:26 PM Jason A. Donenfeld wrote:
>>
>> Hi Thomas,
>>
>> I'm trying to optimize this for crypto performance while still taking
>> into account preemption concerns. I'm having a bit of trouble figuring
>> out a way to determine numerically what the upper bounds for this
>> stuff looks like. I'm sure I could pick a pretty sane number that's
>> arguably okay -- and way under the limit -- but I still am interested
>> in determining what that limit actually is. I was hoping there'd be a
>> debugging option called, "warn if preemption is disabled for too
>> long", or something, but I couldn't find anything like that. I'm also
>> not quite sure what the latency limits are, to just compute this with
>> a formula. Essentially what I'm trying to determine is:
>>
>> preempt_disable();
>> asm volatile(".fill N, 1, 0x90;");
>> preempt_enable();
>>
>> What is the maximum value of N for which the above is okay? What
>> technique would you generally use in measuring this?
>>
>> Thanks,
>> Jason
>
> From talking to Peter (now CC'd) on IRC, it sounds like what you're
> mostly interested in is clocktime latency on reasonable hardware, with
> a goal of around ~20µs as a maximum upper bound? I don't expect to get
> anywhere near this value at all, but if you can confirm that's a
> decent ballpark, it would make for some interesting calculations.
>
>

I would add another consideration: if you can get better latency with negligible overhead (0.1%? 0.05%), then that might make sense too. For example, it seems plausible that checking need_resched() every few blocks adds basically no overhead, and the SIMD helpers could do this themselves or perhaps only ever do a block at a time.

need_resched() costs a cacheline access, but it’s usually a hot cacheline, and the actual check is just whether a certain bit in memory is set.
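To make the "check need_resched() every few blocks" idea concrete, here is a rough userspace C sketch. It is only an illustration under stated assumptions, not the Zinc code or the kernel's SIMD API: every name ending in _stub, along with fake_resched_flag, simd_sections, BLOCKS_PER_CHECK and crypt_with_resched_checks(), is hypothetical and stands in for the real kernel primitives so that the loop can be compiled and run outside the kernel. The value 4 for BLOCKS_PER_CHECK is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for kernel primitives (assumptions). */
static bool fake_resched_flag;     /* plays the role of the "need resched" bit */
static unsigned int simd_sections; /* number of non-preemptible sections entered */

static bool need_resched_stub(void) { return fake_resched_flag; }
static void simd_begin_stub(void) { simd_sections++; } /* would disable preemption */
static void simd_end_stub(void) { }                    /* would re-enable preemption */

#define BLOCK_SIZE 64        /* ChaCha20 block size in bytes */
#define BLOCKS_PER_CHECK 4   /* "every few blocks"; arbitrary, needs tuning */

/* Placeholder for the real SIMD block function: XORs with a constant. */
static void process_blocks_stub(uint8_t *dst, const uint8_t *src, size_t nblocks)
{
	for (size_t i = 0; i < nblocks * BLOCK_SIZE; i++)
		dst[i] = src[i] ^ 0x5a;
}

/*
 * Process len bytes in chunks of at most BLOCKS_PER_CHECK blocks. Between
 * chunks, test the resched flag; if it is set, briefly leave the SIMD
 * section so that the scheduler would get a chance to run.
 */
static void crypt_with_resched_checks(uint8_t *dst, const uint8_t *src, size_t len)
{
	size_t blocks_left = len / BLOCK_SIZE;

	assert(len % BLOCK_SIZE == 0); /* whole blocks only, to keep the sketch short */

	simd_begin_stub();
	while (blocks_left) {
		size_t n = blocks_left < BLOCKS_PER_CHECK ? blocks_left : BLOCKS_PER_CHECK;

		process_blocks_stub(dst, src, n);
		dst += n * BLOCK_SIZE;
		src += n * BLOCK_SIZE;
		blocks_left -= n;

		/* Cheap check: one flag read per chunk, skipped after the last chunk. */
		if (blocks_left && need_resched_stub()) {
			simd_end_stub();   /* preemption point */
			simd_begin_stub();
		}
	}
	simd_end_stub();
}
```

With the flag clear, the whole buffer is handled in one non-preemptible section; with it set, the section is split at every chunk boundary, so the longest stretch without a preemption point is bounded by BLOCKS_PER_CHECK blocks. In the stub the flag is never cleared, whereas in the kernel the scheduler would clear it once it has run.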