From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ard Biesheuvel
Subject: Re: [PATCH net-next v6 07/23] zinc: ChaCha20 ARM and ARM64 implementations
Date: Wed, 26 Sep 2018 16:02:22 +0200
Message-ID:
References: <20180925145622.29959-1-Jason@zx2c4.com> <20180925145622.29959-8-Jason@zx2c4.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: Linux Kernel Mailing List , "" , "open list:HARDWARE RANDOM NUMBER GENERATOR CORE" , "David S. Miller" , Greg Kroah-Hartman , Samuel Neves , Andy Lutomirski , Jean-Philippe Aumasson , Russell King , linux-arm-kernel
To: "Jason A. Donenfeld" , Herbert Xu , Thomas Gleixner
Return-path:
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-Id: linux-crypto.vger.kernel.org

(+ Herbert, Thomas)

On Wed, 26 Sep 2018 at 15:33, Jason A. Donenfeld wrote:
>
> Hi Ard,
>
> On Wed, Sep 26, 2018 at 10:59 AM Ard Biesheuvel
> wrote:
> > > +static inline bool chacha20_arch(struct chacha20_ctx *state, u8 *dst,
> > > +                                 const u8 *src, size_t len,
> > > +                                 simd_context_t *simd_context)
> > > +{
> > > +#if defined(CONFIG_KERNEL_MODE_NEON)
> > > +       if (chacha20_use_neon && len >= CHACHA20_BLOCK_SIZE * 3 &&
> > > +           simd_use(simd_context))
> > > +               chacha20_neon(dst, src, len, state->key, state->counter);
> > > +       else
> > > +#endif
> >
> > Better to use IS_ENABLED() here:
> >
> > > +       if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON) &&
> > > +           chacha20_use_neon && len >= CHACHA20_BLOCK_SIZE * 3 &&
> > > +           simd_use(simd_context))
>
> Good idea. I'll fix that up.
>
> >
> > Also, this still has unbounded worst case scheduling latency, given
> > that the outer library function passes its entire input straight into
> > the NEON routine.
>
> The vast majority of crypto routines in arch/*/crypto/ follow this
> same exact pattern, actually. I realize a few don't -- probably the
> ones you had a hand in :) -- but I think this is up to the caller to
> handle.

Anything that uses the scatterwalk API (AEADs and skciphers) will
handle at most a page at a time. Hashes are different, which is why
some of them have to handle it explicitly.

> I made a change so that in chacha20poly1305.c, it calls
> simd_relax after handling each scatter-gather element, so a
> "construction" will handle this gracefully. But I believe it's up to
> the caller to decide on what sizes of information it wants to pass to
> primitives. Put differently, this also hasn't ever been an issue
> before -- the existing state of the tree indicates this -- and so I
> don't anticipate this will be a real issue now.

The state of the tree does not capture all relevant context or
history. The scheduling latency issue was brought up very recently by
the -rt folks on the mailing lists.

> And if it becomes one,
> this is something we can address *later*, but certainly there's no use
> of adding additional complexity to the initial patchset to do this
> now.
>

You are introducing a very useful SIMD abstraction, but it lets code
run with preemption disabled for unbounded amounts of time, and so now
is the time to ensure we get it right.

Part of the [justified] criticism of the current state of the crypto
API is about its complexity, and so I don't think it makes sense to keep
it simple now and add the complexity later (and the same concern
applies to async support btw).
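
To make the bounded-latency point concrete, here is a minimal userspace
sketch of the "at most a page at a time" pattern. It is not the Zinc or
crypto API code: SKETCH_CHUNK_SIZE, sketch_simd_relax(), sketch_simd_xor()
and sketch_process_bounded() are all hypothetical names made up for
illustration, the XOR routine merely stands in for a NEON primitive such as
chacha20_neon(), and the relax stub only counts the points at which a real
implementation could let the scheduler run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: stand-in for PAGE_SIZE; the real value is arch-dependent. */
#define SKETCH_CHUNK_SIZE 4096u

/* Counts how often the sketch offered a preemption point. */
static unsigned int sketch_relax_calls;

/*
 * Hypothetical stand-in for a relax hook: a real one would briefly leave
 * the SIMD region so that a pending reschedule can happen.
 */
static void sketch_simd_relax(void)
{
	sketch_relax_calls++;
}

/* Hypothetical stand-in for the SIMD primitive. This is NOT a cipher. */
static void sketch_simd_xor(uint8_t *dst, const uint8_t *src, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = src[i] ^ 0x5a;
}

/*
 * Feed the primitive at most SKETCH_CHUNK_SIZE bytes per call, so the time
 * spent without a preemption point is bounded by the chunk size rather
 * than by the caller's input length. Returns the number of chunks.
 */
static size_t sketch_process_bounded(uint8_t *dst, const uint8_t *src,
				     size_t len)
{
	size_t chunks = 0;

	assert(len == 0 || (dst != NULL && src != NULL));

	while (len) {
		size_t todo = len < SKETCH_CHUNK_SIZE ? len : SKETCH_CHUNK_SIZE;

		sketch_simd_xor(dst, src, todo);
		dst += todo;
		src += todo;
		len -= todo;
		chunks++;

		/* Offer a preemption point only if more work remains. */
		if (len)
			sketch_simd_relax();
	}
	return chunks;
}
```

With a 10000-byte input this makes three calls into the primitive (4096,
4096 and 1808 bytes) with two relax points in between. Whether this loop
belongs in the library function or in each caller is exactly the question
being argued above; the sketch only shows that the loop itself is small.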