From: "Jason A. Donenfeld"
Date: Wed, 3 Oct 2018 03:03:09 +0200
Subject: Re: [PATCH net-next v6 19/23] zinc: Curve25519 ARM implementation
To: Ard Biesheuvel
Cc: LKML, Netdev, Linux Crypto Mailing List, David Miller,
    Greg Kroah-Hartman, Samuel Neves, Andrew Lutomirski,
    Jean-Philippe Aumasson, Russell King - ARM Linux,
    linux-arm-kernel@lists.infradead.org, Peter Schwabe,
    "Daniel J . Bernstein"
References: <20180925145622.29959-1-Jason@zx2c4.com>
    <20180925145622.29959-20-Jason@zx2c4.com>

(+Dan,Peter in CC. Replying to: for context.)

Hi Ard,

On Tue, Oct 2, 2018 at 6:59 PM Ard Biesheuvel wrote:
> Shouldn't this use the new simd abstraction as well?

Yes, it probably should, thanks.

> I guess qhasm means generated code, right?
> Because many of these adds are completely redundant ...
> This looks odd as well.
> Could you elaborate on what qhasm is exactly?
> And, as with the other
> patches, I would prefer it if we could have your changes as a separate
> patch (although having the qhasm base would be preferred)

Indeed qhasm converts this -- -- into this. It's a thing from Dan (CC'd
now) -- . As you've requested, I can layer the patches to show our
changes on top.

> ... you can drop this add
> same here
> and here
> and here
> and here
> and here
> and here
> and here
> and here
> redundant add
> I'll stop here - let me just note that this code does not strike me as
> particularly well optimized for in-order cores (such as A7).
> For instance, the sequence
> can be reordered as
> and not have every other instruction depend on the output of the previous one.
> Obviously, the ultimate truth is in the benchmark numbers, but I'd
> thought I'd mention it anyway.

Yes indeed the output is suboptimal in a lot of places. We can gradually
clean this up -- slowly and carefully over time -- if you want. I can
also look into producing a new implementation within HACL* so that it's
verified. Assurance-wise, though, I feel pretty good about this
implementation considering its origins, its breadth of use (in
BoringSSL), the fuzzing hours it's incurred, and the actual
implementation itself.

Either way, performance-wise, it's really worth having. For example, on
a Cortex-A7, we get these results (according to get_cycles()):

neon: 23142 cycles per call
fiat32: 49136 cycles per call
donna32: 71988 cycles per call

And on a Cortex-A9, we get these results (according to get_cycles()):

neon: 5020 cycles per call
fiat32: 17326 cycles per call
donna32: 28076 cycles per call

Jason