linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] crypto: X25519 supports for ppc64le
@ 2024-05-14 17:38 Danny Tsen
  2024-05-14 17:38 ` [PATCH 1/3] crypto: X25519 low-level primitives " Danny Tsen
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Danny Tsen @ 2024-05-14 17:38 UTC (permalink / raw)
  To: linux-crypto
  Cc: herbert, leitao, nayna, appro, linux-kernel, linuxppc-dev, mpe,
	ltcgcw, dtsen, Danny Tsen

This patch series provide X25519 support for ppc64le with a new module
curve25519-ppc64le.

The implementation is based on CRYPTOGAMs perl output from x25519-ppc64.pl.
Modified and added 3 supporting functions.

This patch has passed the selftest by running modprobe
curve25519-ppc64le.

Danny Tsen (3):
  X25519 low-level primitives for ppc64le.
  X25519 core functions to handle scalar multiplication for ppc64le.
  Update Kconfig and Makefile.

 arch/powerpc/crypto/Kconfig                   |  11 +
 arch/powerpc/crypto/Makefile                  |   2 +
 arch/powerpc/crypto/curve25519-ppc64le-core.c | 299 ++++++++
 arch/powerpc/crypto/curve25519-ppc64le_asm.S  | 648 ++++++++++++++++++
 4 files changed, 960 insertions(+)
 create mode 100644 arch/powerpc/crypto/curve25519-ppc64le-core.c
 create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S

-- 
2.31.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
  2024-05-14 17:38 [PATCH 0/3] crypto: X25519 supports for ppc64le Danny Tsen
@ 2024-05-14 17:38 ` Danny Tsen
  2024-05-15  8:11   ` Andy Polyakov
                     ` (2 more replies)
  2024-05-14 17:38 ` [PATCH 2/3] crypto: X25519 core functions " Danny Tsen
  2024-05-14 17:38 ` [PATCH 3/3] crypto: Update Kconfig and Makefile for ppc64le x25519 Danny Tsen
  2 siblings, 3 replies; 21+ messages in thread
From: Danny Tsen @ 2024-05-14 17:38 UTC (permalink / raw)
  To: linux-crypto
  Cc: herbert, leitao, nayna, appro, linux-kernel, linuxppc-dev, mpe,
	ltcgcw, dtsen, Danny Tsen

Use the perl output of x25519-ppc64.pl from CRYPTOGAMs and added three
supporting functions, x25519_fe51_sqr_times, x25519_fe51_frombytes
and x25519_fe51_tobytes.

Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
---
 arch/powerpc/crypto/curve25519-ppc64le_asm.S | 648 +++++++++++++++++++
 1 file changed, 648 insertions(+)
 create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S

diff --git a/arch/powerpc/crypto/curve25519-ppc64le_asm.S b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
new file mode 100644
index 000000000000..8a018104838a
--- /dev/null
+++ b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
@@ -0,0 +1,648 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#
+# Copyright 2024- IBM Corp.  All Rights Reserved.
+#
+# This code is taken from CRYPTOGAMs[1] and is included here using the option
+# in the license to distribute the code under the GPL. Therefore this program
+# is free software; you can redistribute it and/or modify it under the terms of
+# the GNU General Public License version 2 as published by the Free Software
+# Foundation.
+#
+# [1] https://www.openssl.org/~appro/cryptogams/
+
+# Copyright (c) 2006-2017, CRYPTOGAMS by <appro@openssl.org>
+# All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions
+# are met:
+#
+#       * Redistributions of source code must retain copyright notices,
+#         this list of conditions and the following disclaimer.
+#
+#       * Redistributions in binary form must reproduce the above
+#         copyright notice, this list of conditions and the following
+#         disclaimer in the documentation and/or other materials
+#         provided with the distribution.
+#
+#       * Neither the name of the CRYPTOGAMS nor the names of its
+#         copyright holder and contributors may be used to endorse or
+#         promote products derived from this software without specific
+#         prior written permission.
+#
+# ALTERNATIVELY, provided that this notice is retained in full, this
+# product may be distributed under the terms of the GNU General Public
+# License (GPL), in which case the provisions of the GPL apply INSTEAD OF
+# those given above.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS
+# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# ====================================================================
+# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
+# project. The module is, however, dual licensed under OpenSSL and
+# CRYPTOGAMS licenses depending on where you obtain it. For further
+# details see https://www.openssl.org/~appro/cryptogams/.
+# ====================================================================
+
+#
+# ====================================================================
+# Written and Modified by Danny Tsen <dtsen@us.ibm.com>
+# - Added x25519_fe51_sqr_times, x25519_fe51_frombytes, x25519_fe51_tobytes
+#
+# X25519 lower-level primitives for PPC64.
+#
+
+#include <linux/linkage.h>
+
+.machine "any"
+.abiversion	2
+.text
+
+SYM_FUNC_START(x25519_fe51_mul)
+.align	5
+
+	stdu	1,-144(1)
+	std	21,56(1)
+	std	22,64(1)
+	std	23,72(1)
+	std	24,80(1)
+	std	25,88(1)
+	std	26,96(1)
+	std	27,104(1)
+	std	28,112(1)
+	std	29,120(1)
+	std	30,128(1)
+	std	31,136(1)
+
+	ld	6,0(5)
+	ld	7,0(4)
+	ld	8,8(4)
+	ld	9,16(4)
+	ld	10,24(4)
+	ld	11,32(4)
+
+	mulld	22,7,6
+	mulhdu	23,7,6
+
+	mulld	24,8,6
+	mulhdu	25,8,6
+
+	mulld	30,11,6
+	mulhdu	31,11,6
+	ld	4,8(5)
+	mulli	11,11,19
+
+	mulld	26,9,6
+	mulhdu	27,9,6
+
+	mulld	28,10,6
+	mulhdu	29,10,6
+	mulld	12,11,4
+	mulhdu	21,11,4
+	addc	22,22,12
+	adde	23,23,21
+
+	mulld	12,7,4
+	mulhdu	21,7,4
+	addc	24,24,12
+	adde	25,25,21
+
+	mulld	12,10,4
+	mulhdu	21,10,4
+	ld	6,16(5)
+	mulli	10,10,19
+	addc	30,30,12
+	adde	31,31,21
+
+	mulld	12,8,4
+	mulhdu	21,8,4
+	addc	26,26,12
+	adde	27,27,21
+
+	mulld	12,9,4
+	mulhdu	21,9,4
+	addc	28,28,12
+	adde	29,29,21
+	mulld	12,10,6
+	mulhdu	21,10,6
+	addc	22,22,12
+	adde	23,23,21
+
+	mulld	12,11,6
+	mulhdu	21,11,6
+	addc	24,24,12
+	adde	25,25,21
+
+	mulld	12,9,6
+	mulhdu	21,9,6
+	ld	4,24(5)
+	mulli	9,9,19
+	addc	30,30,12
+	adde	31,31,21
+
+	mulld	12,7,6
+	mulhdu	21,7,6
+	addc	26,26,12
+	adde	27,27,21
+
+	mulld	12,8,6
+	mulhdu	21,8,6
+	addc	28,28,12
+	adde	29,29,21
+	mulld	12,9,4
+	mulhdu	21,9,4
+	addc	22,22,12
+	adde	23,23,21
+
+	mulld	12,10,4
+	mulhdu	21,10,4
+	addc	24,24,12
+	adde	25,25,21
+
+	mulld	12,8,4
+	mulhdu	21,8,4
+	ld	6,32(5)
+	mulli	8,8,19
+	addc	30,30,12
+	adde	31,31,21
+
+	mulld	12,11,4
+	mulhdu	21,11,4
+	addc	26,26,12
+	adde	27,27,21
+
+	mulld	12,7,4
+	mulhdu	21,7,4
+	addc	28,28,12
+	adde	29,29,21
+	mulld	12,8,6
+	mulhdu	21,8,6
+	addc	22,22,12
+	adde	23,23,21
+
+	mulld	12,9,6
+	mulhdu	21,9,6
+	addc	24,24,12
+	adde	25,25,21
+
+	mulld	12,10,6
+	mulhdu	21,10,6
+	addc	26,26,12
+	adde	27,27,21
+
+	mulld	12,11,6
+	mulhdu	21,11,6
+	addc	28,28,12
+	adde	29,29,21
+
+	mulld	12,7,6
+	mulhdu	21,7,6
+	addc	30,30,12
+	adde	31,31,21
+
+.Lfe51_reduce:
+	li	0,-1
+	srdi	0,0,13
+
+	srdi	12,26,51
+	and	9,26,0
+	insrdi	12,27,51,0
+	srdi	21,22,51
+	and	7,22,0
+	insrdi	21,23,51,0
+	addc	28,28,12
+	addze	29,29
+	addc	24,24,21
+	addze	25,25
+
+	srdi	12,28,51
+	and	10,28,0
+	insrdi	12,29,51,0
+	srdi	21,24,51
+	and	8,24,0
+	insrdi	21,25,51,0
+	addc	30,30,12
+	addze	31,31
+	add	9,9,21
+
+	srdi	12,30,51
+	and	11,30,0
+	insrdi	12,31,51,0
+	mulli	12,12,19
+
+	add	7,7,12
+
+	srdi	21,9,51
+	and	9,9,0
+	add	10,10,21
+
+	srdi	12,7,51
+	and	7,7,0
+	add	8,8,12
+
+	std	9,16(3)
+	std	10,24(3)
+	std	11,32(3)
+	std	7,0(3)
+	std	8,8(3)
+
+	ld	21,56(1)
+	ld	22,64(1)
+	ld	23,72(1)
+	ld	24,80(1)
+	ld	25,88(1)
+	ld	26,96(1)
+	ld	27,104(1)
+	ld	28,112(1)
+	ld	29,120(1)
+	ld	30,128(1)
+	ld	31,136(1)
+	addi	1,1,144
+	blr
+SYM_FUNC_END(x25519_fe51_mul)
+
+SYM_FUNC_START(x25519_fe51_sqr)
+.align	5
+
+	stdu	1,-144(1)
+	std	21,56(1)
+	std	22,64(1)
+	std	23,72(1)
+	std	24,80(1)
+	std	25,88(1)
+	std	26,96(1)
+	std	27,104(1)
+	std	28,112(1)
+	std	29,120(1)
+	std	30,128(1)
+	std	31,136(1)
+
+	ld	7,0(4)
+	ld	8,8(4)
+	ld	9,16(4)
+	ld	10,24(4)
+	ld	11,32(4)
+
+	add	6,7,7
+	mulli	21,11,19
+
+	mulld	22,7,7
+	mulhdu	23,7,7
+	mulld	24,8,6
+	mulhdu	25,8,6
+	mulld	26,9,6
+	mulhdu	27,9,6
+	mulld	28,10,6
+	mulhdu	29,10,6
+	mulld	30,11,6
+	mulhdu	31,11,6
+	add	6,8,8
+	mulld	12,11,21
+	mulhdu	11,11,21
+	addc	28,28,12
+	adde	29,29,11
+
+	mulli	5,10,19
+
+	mulld	12,8,8
+	mulhdu	11,8,8
+	addc	26,26,12
+	adde	27,27,11
+	mulld	12,9,6
+	mulhdu	11,9,6
+	addc	28,28,12
+	adde	29,29,11
+	mulld	12,10,6
+	mulhdu	11,10,6
+	addc	30,30,12
+	adde	31,31,11
+	mulld	12,21,6
+	mulhdu	11,21,6
+	add	6,10,10
+	addc	22,22,12
+	adde	23,23,11
+	mulld	12,10,5
+	mulhdu	10,10,5
+	addc	24,24,12
+	adde	25,25,10
+	mulld	12,6,21
+	mulhdu	10,6,21
+	add	6,9,9
+	addc	26,26,12
+	adde	27,27,10
+
+	mulld	12,9,9
+	mulhdu	10,9,9
+	addc	30,30,12
+	adde	31,31,10
+	mulld	12,5,6
+	mulhdu	10,5,6
+	addc	22,22,12
+	adde	23,23,10
+	mulld	12,21,6
+	mulhdu	10,21,6
+	addc	24,24,12
+	adde	25,25,10
+
+	b	.Lfe51_reduce
+SYM_FUNC_END(x25519_fe51_sqr)
+
+SYM_FUNC_START(x25519_fe51_mul121666)
+.align	5
+
+	stdu	1,-144(1)
+	std	21,56(1)
+	std	22,64(1)
+	std	23,72(1)
+	std	24,80(1)
+	std	25,88(1)
+	std	26,96(1)
+	std	27,104(1)
+	std	28,112(1)
+	std	29,120(1)
+	std	30,128(1)
+	std	31,136(1)
+
+	lis	6,1
+	ori	6,6,56130
+	ld	7,0(4)
+	ld	8,8(4)
+	ld	9,16(4)
+	ld	10,24(4)
+	ld	11,32(4)
+
+	mulld	22,7,6
+	mulhdu	23,7,6
+	mulld	24,8,6
+	mulhdu	25,8,6
+	mulld	26,9,6
+	mulhdu	27,9,6
+	mulld	28,10,6
+	mulhdu	29,10,6
+	mulld	30,11,6
+	mulhdu	31,11,6
+
+	b	.Lfe51_reduce
+SYM_FUNC_END(x25519_fe51_mul121666)
+
+SYM_FUNC_START(x25519_fe51_sqr_times)
+.align	5
+
+	stdu	1,-144(1)
+	std	21,56(1)
+	std	22,64(1)
+	std	23,72(1)
+	std	24,80(1)
+	std	25,88(1)
+	std	26,96(1)
+	std	27,104(1)
+	std	28,112(1)
+	std	29,120(1)
+	std	30,128(1)
+	std	31,136(1)
+
+	ld	7,0(4)
+	ld	8,8(4)
+	ld	9,16(4)
+	ld	10,24(4)
+	ld	11,32(4)
+
+	mtctr	5
+
+.Lsqr_times_loop:
+	add	6,7,7
+	mulli	21,11,19
+
+	mulld	22,7,7
+	mulhdu	23,7,7
+	mulld	24,8,6
+	mulhdu	25,8,6
+	mulld	26,9,6
+	mulhdu	27,9,6
+	mulld	28,10,6
+	mulhdu	29,10,6
+	mulld	30,11,6
+	mulhdu	31,11,6
+	add	6,8,8
+	mulld	12,11,21
+	mulhdu	11,11,21
+	addc	28,28,12
+	adde	29,29,11
+
+	mulli	5,10,19
+
+	mulld	12,8,8
+	mulhdu	11,8,8
+	addc	26,26,12
+	adde	27,27,11
+	mulld	12,9,6
+	mulhdu	11,9,6
+	addc	28,28,12
+	adde	29,29,11
+	mulld	12,10,6
+	mulhdu	11,10,6
+	addc	30,30,12
+	adde	31,31,11
+	mulld	12,21,6
+	mulhdu	11,21,6
+	add	6,10,10
+	addc	22,22,12
+	adde	23,23,11
+	mulld	12,10,5
+	mulhdu	10,10,5
+	addc	24,24,12
+	adde	25,25,10
+	mulld	12,6,21
+	mulhdu	10,6,21
+	add	6,9,9
+	addc	26,26,12
+	adde	27,27,10
+
+	mulld	12,9,9
+	mulhdu	10,9,9
+	addc	30,30,12
+	adde	31,31,10
+	mulld	12,5,6
+	mulhdu	10,5,6
+	addc	22,22,12
+	adde	23,23,10
+	mulld	12,21,6
+	mulhdu	10,21,6
+	addc	24,24,12
+	adde	25,25,10
+
+	# fe51_reduce
+	li	0,-1
+	srdi	0,0,13
+
+	srdi	12,26,51
+	and	9,26,0
+	insrdi	12,27,51,0
+	srdi	21,22,51
+	and	7,22,0
+	insrdi	21,23,51,0
+	addc	28,28,12
+	addze	29,29
+	addc	24,24,21
+	addze	25,25
+
+	srdi	12,28,51
+	and	10,28,0
+	insrdi	12,29,51,0
+	srdi	21,24,51
+	and	8,24,0
+	insrdi	21,25,51,0
+	addc	30,30,12
+	addze	31,31
+	add	9,9,21
+
+	srdi	12,30,51
+	and	11,30,0
+	insrdi	12,31,51,0
+	mulli	12,12,19
+
+	add	7,7,12
+
+	srdi	21,9,51
+	and	9,9,0
+	add	10,10,21
+
+	srdi	12,7,51
+	and	7,7,0
+	add	8,8,12
+
+	std	9,16(3)
+	std	10,24(3)
+	std	11,32(3)
+	std	7,0(3)
+	std	8,8(3)
+	bdnz	.Lsqr_times_loop
+
+	ld	21,56(1)
+	ld	22,64(1)
+	ld	23,72(1)
+	ld	24,80(1)
+	ld	25,88(1)
+	ld	26,96(1)
+	ld	27,104(1)
+	ld	28,112(1)
+	ld	29,120(1)
+	ld	30,128(1)
+	ld	31,136(1)
+	addi	1,1,144
+	blr
+SYM_FUNC_END(x25519_fe51_sqr_times)
+
+SYM_FUNC_START(x25519_fe51_frombytes)
+.align	5
+
+	li	12, -1
+	srdi	12, 12, 13	# 0x7ffffffffffff
+
+	ld	5, 0(4)
+	ld	6, 8(4)
+	ld	7, 16(4)
+	ld	8, 24(4)
+
+	srdi	10, 5, 51
+	and	5, 5, 12	# h0
+
+	sldi	11, 6, 13
+	or	11, 10, 11	# h1t
+	srdi	10, 6, 38
+	and	6, 11, 12	# h1
+
+	sldi	11, 7, 26
+	or	10, 10, 11	# h2t
+
+	srdi	11, 7, 25
+	and	7, 10, 12	# h2
+	sldi	10, 8, 39
+	or	11, 11, 10	# h3t
+
+	srdi	9, 8, 12
+	and	8, 11, 12	# h3
+	and	9, 9, 12	# h4
+
+	std	5, 0(3)
+	std	6, 8(3)
+	std	7, 16(3)
+	std	8, 24(3)
+	std	9, 32(3)
+
+	blr
+SYM_FUNC_END(x25519_fe51_frombytes)
+
+SYM_FUNC_START(x25519_fe51_tobytes)
+.align	5
+
+	ld	5, 0(4)
+	ld	6, 8(4)
+	ld	7, 16(4)
+	ld	8, 24(4)
+	ld	9, 32(4)
+
+	li	12, -1
+	srdi	12, 12, 13	# 0x7ffffffffffff
+
+	# Full reducuction
+	addi	10, 5, 19
+	srdi	10, 10, 51
+	add	10, 10, 6
+	srdi	10, 10, 51
+	add	10, 10, 7
+	srdi	10, 10, 51
+	add	10, 10, 8
+	srdi	10, 10, 51
+	add	10, 10, 9
+	srdi	10, 10, 51
+
+	mulli	10, 10, 19
+	add	5, 5, 10
+	srdi	11, 5, 51
+	add	6, 6, 11
+	srdi	11, 6, 51
+	add	7, 7, 11
+	srdi	11, 7, 51
+	add	8, 8, 11
+	srdi	11, 8, 51
+	add	9, 9, 11
+
+	and	5, 5, 12
+	and	6, 6, 12
+	and	7, 7, 12
+	and	8, 8, 12
+	and	9, 9, 12
+
+	sldi	10, 6, 51
+	or	5, 5, 10	# s0
+
+	srdi	11, 6, 13
+	sldi	10, 7, 38
+	or	6, 11, 10	# s1
+
+	srdi	11, 7, 26
+	sldi	10, 8, 25
+	or	7, 11, 10	# s2
+
+	srdi	11, 8, 39
+	sldi	10, 9, 12
+	or	8, 11, 10	# s4
+
+	std	5, 0(3)
+	std	6, 8(3)
+	std	7, 16(3)
+	std	8, 24(3)
+
+	blr
+SYM_FUNC_END(x25519_fe51_tobytes)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/3] crypto: X25519 core functions for ppc64le
  2024-05-14 17:38 [PATCH 0/3] crypto: X25519 supports for ppc64le Danny Tsen
  2024-05-14 17:38 ` [PATCH 1/3] crypto: X25519 low-level primitives " Danny Tsen
@ 2024-05-14 17:38 ` Danny Tsen
  2024-05-15  8:29   ` Andy Polyakov
  2024-05-14 17:38 ` [PATCH 3/3] crypto: Update Kconfig and Makefile for ppc64le x25519 Danny Tsen
  2 siblings, 1 reply; 21+ messages in thread
From: Danny Tsen @ 2024-05-14 17:38 UTC (permalink / raw)
  To: linux-crypto
  Cc: herbert, leitao, nayna, appro, linux-kernel, linuxppc-dev, mpe,
	ltcgcw, dtsen, Danny Tsen

X25519 core functions to handle scalar multiplication for ppc64le.

Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
---
 arch/powerpc/crypto/curve25519-ppc64le-core.c | 299 ++++++++++++++++++
 1 file changed, 299 insertions(+)
 create mode 100644 arch/powerpc/crypto/curve25519-ppc64le-core.c

diff --git a/arch/powerpc/crypto/curve25519-ppc64le-core.c b/arch/powerpc/crypto/curve25519-ppc64le-core.c
new file mode 100644
index 000000000000..6a8b5efc40ce
--- /dev/null
+++ b/arch/powerpc/crypto/curve25519-ppc64le-core.c
@@ -0,0 +1,299 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright 2024- IBM Corp. All rights reserved.
+ *
+ * X25519 scalar multiplication with 51 bits limbs for PPC64le.
+ *   Based on RFC7748 and AArch64 optimized implementation for X25519
+ *     - Algorithm 1 Scalar multiplication of a variable point
+ */
+
+#include <crypto/curve25519.h>
+#include <crypto/internal/kpp.h>
+
+#include <linux/types.h>
+#include <linux/jump_label.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/scatterlist.h>
+
+#include <linux/cpufeature.h>
+#include <linux/processor.h>
+
+typedef uint64_t fe51[5];
+
+asmlinkage void x25519_fe51_mul(fe51 h, const fe51 f, const fe51 g);
+asmlinkage void x25519_fe51_sqr(fe51 h, const fe51 f);
+asmlinkage void x25519_fe51_mul121666(fe51 h, fe51 f);
+asmlinkage void x25519_fe51_sqr_times(fe51 h, const fe51 f, int n);
+asmlinkage void x25519_fe51_frombytes(fe51 h, const uint8_t *s);
+asmlinkage void x25519_fe51_tobytes(uint8_t *s, const fe51 h);
+
+#define fmul x25519_fe51_mul
+#define fsqr x25519_fe51_sqr
+#define fmul121666 x25519_fe51_mul121666
+#define fe51_tobytes x25519_fe51_tobytes
+#define fe51_frombytes x25519_fe51_frombytes
+
+static void cswap(fe51 p, fe51 q, unsigned int bit)
+{
+	u64 t, i;
+	u64 c = 0 - (u64) bit;
+
+	for (i = 0; i < 5; ++i) {
+		t = c & (p[i] ^ q[i]);
+		p[i] ^= t;
+		q[i] ^= t;
+	}
+}
+
+static void fadd(fe51 h, const fe51 f, const fe51 g)
+{
+	h[0] = f[0] + g[0];
+	h[1] = f[1] + g[1];
+	h[2] = f[2] + g[2];
+	h[3] = f[3] + g[3];
+	h[4] = f[4] + g[4];
+}
+
+/*
+ * Prime = 2 ** 255 - 19, 255 bits
+ *    (0x7fffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffed)
+ *
+ * Prime in 5 51-bit limbs
+ */
+static fe51 prime51 = { 0x7ffffffffffed, 0x7ffffffffffff, 0x7ffffffffffff, 0x7ffffffffffff, 0x7ffffffffffff};
+
+static void fsub(fe51 h, const fe51 f, const fe51 g)
+{
+	h[0] = (f[0] + ((prime51[0] * 2))) - g[0];
+	h[1] = (f[1] + ((prime51[1] * 2))) - g[1];
+	h[2] = (f[2] + ((prime51[2] * 2))) - g[2];
+	h[3] = (f[3] + ((prime51[3] * 2))) - g[3];
+	h[4] = (f[4] + ((prime51[4] * 2))) - g[4];
+}
+
+static void finv(fe51 o, const fe51 i)
+{
+	fe51 a0, b, c, t00;
+
+	fsqr(a0, i);
+	x25519_fe51_sqr_times(t00, a0, 2);
+
+	fmul(b, t00, i);
+	fmul(a0, b, a0);
+
+	fsqr(t00, a0);
+
+	fmul(b, t00, b);
+	x25519_fe51_sqr_times(t00, b, 5);
+
+	fmul(b, t00, b);
+	x25519_fe51_sqr_times(t00, b, 10);
+
+	fmul(c, t00, b);
+	x25519_fe51_sqr_times(t00, c, 20);
+
+	fmul(t00, t00, c);
+	x25519_fe51_sqr_times(t00, t00, 10);
+
+	fmul(b, t00, b);
+	x25519_fe51_sqr_times(t00, b, 50);
+
+	fmul(c, t00, b);
+	x25519_fe51_sqr_times(t00, c, 100);
+
+	fmul(t00, t00, c);
+	x25519_fe51_sqr_times(t00, t00, 50);
+
+	fmul(t00, t00, b);
+	x25519_fe51_sqr_times(t00, t00, 5);
+
+	fmul(o, t00, a0);
+}
+
+static void curve25519_fe51(uint8_t out[32], const uint8_t scalar[32],
+			    const uint8_t point[32])
+{
+	fe51 x1, x2, z2, x3, z3;
+	uint8_t s[32];
+	unsigned int swap = 0;
+	int i;
+
+	memcpy(s, scalar, 32);
+	s[0]  &= 0xf8;
+	s[31] &= 0x7f;
+	s[31] |= 0x40;
+	fe51_frombytes(x1, point);
+
+	z2[0] = z2[1] = z2[2] = z2[3] = z2[4] = 0;
+	x3[0] = x1[0];
+	x3[1] = x1[1];
+	x3[2] = x1[2];
+	x3[3] = x1[3];
+	x3[4] = x1[4];
+
+	x2[0] = z3[0] = 1;
+	x2[1] = z3[1] = 0;
+	x2[2] = z3[2] = 0;
+	x2[3] = z3[3] = 0;
+	x2[4] = z3[4] = 0;
+
+	for (i = 254; i >= 0; --i) {
+		unsigned int k_t = 1 & (s[i / 8] >> (i & 7));
+		fe51 a, b, c, d, e;
+		fe51 da, cb, aa, bb;
+		fe51 dacb_p, dacb_m;
+
+		swap ^= k_t;
+		cswap(x2, x3, swap);
+		cswap(z2, z3, swap);
+		swap = k_t;
+
+		fsub(b, x2, z2);		// B = x_2 - z_2
+		fadd(a, x2, z2);		// A = x_2 + z_2
+		fsub(d, x3, z3);		// D = x_3 - z_3
+		fadd(c, x3, z3);		// C = x_3 + z_3
+
+		fsqr(bb, b);			// BB = B^2
+		fsqr(aa, a);			// AA = A^2
+		fmul(da, d, a);			// DA = D * A
+		fmul(cb, c, b);			// CB = C * B
+
+		fsub(e, aa, bb);		// E = AA - BB
+		fmul(x2, aa, bb);		// x2 = AA * BB
+		fadd(dacb_p, da, cb);		// DA + CB
+		fsub(dacb_m, da, cb);		// DA - CB
+
+		fmul121666(z3, e);		// 121666 * E
+		fsqr(z2, dacb_m);		// (DA - CB)^2
+		fsqr(x3, dacb_p);		// x3 = (DA + CB)^2
+		fadd(b, bb, z3);		// BB + 121666 * E
+		fmul(z3, x1, z2);		// z3 = x1 * (DA - CB)^2
+		fmul(z2, e, b);		// z2 = e * (BB + (DA + CB)^2)
+	}
+
+	finv(z2, z2);
+	fmul(x2, x2, z2);
+	fe51_tobytes(out, x2);
+}
+
+void curve25519_arch(u8 mypublic[CURVE25519_KEY_SIZE],
+		     const u8 secret[CURVE25519_KEY_SIZE],
+		     const u8 basepoint[CURVE25519_KEY_SIZE])
+{
+	curve25519_fe51(mypublic, secret, basepoint);
+}
+EXPORT_SYMBOL(curve25519_arch);
+
+void curve25519_base_arch(u8 pub[CURVE25519_KEY_SIZE],
+			  const u8 secret[CURVE25519_KEY_SIZE])
+{
+	curve25519_fe51(pub, secret, curve25519_base_point);
+}
+EXPORT_SYMBOL(curve25519_base_arch);
+
+static int curve25519_set_secret(struct crypto_kpp *tfm, const void *buf,
+				 unsigned int len)
+{
+	u8 *secret = kpp_tfm_ctx(tfm);
+
+	if (!len)
+		curve25519_generate_secret(secret);
+	else if (len == CURVE25519_KEY_SIZE &&
+		 crypto_memneq(buf, curve25519_null_point, CURVE25519_KEY_SIZE))
+		memcpy(secret, buf, CURVE25519_KEY_SIZE);
+	else
+		return -EINVAL;
+	return 0;
+}
+
+static int curve25519_generate_public_key(struct kpp_request *req)
+{
+	struct crypto_kpp *tfm = crypto_kpp_reqtfm(req);
+	const u8 *secret = kpp_tfm_ctx(tfm);
+	u8 buf[CURVE25519_KEY_SIZE];
+	int copied, nbytes;
+
+	if (req->src)
+		return -EINVAL;
+
+	curve25519_base_arch(buf, secret);
+
+	/* might want less than we've got */
+	nbytes = min_t(size_t, CURVE25519_KEY_SIZE, req->dst_len);
+	copied = sg_copy_from_buffer(req->dst, sg_nents_for_len(req->dst,
+								nbytes),
+				     buf, nbytes);
+	if (copied != nbytes)
+		return -EINVAL;
+	return 0;
+}
+
+static int curve25519_compute_shared_secret(struct kpp_request *req)
+{
+	struct crypto_kpp *tfm = crypto_kpp_reqtfm(req);
+	const u8 *secret = kpp_tfm_ctx(tfm);
+	u8 public_key[CURVE25519_KEY_SIZE];
+	u8 buf[CURVE25519_KEY_SIZE];
+	int copied, nbytes;
+
+	if (!req->src)
+		return -EINVAL;
+
+	copied = sg_copy_to_buffer(req->src,
+				   sg_nents_for_len(req->src,
+						    CURVE25519_KEY_SIZE),
+				   public_key, CURVE25519_KEY_SIZE);
+	if (copied != CURVE25519_KEY_SIZE)
+		return -EINVAL;
+
+	curve25519_arch(buf, secret, public_key);
+
+	/* might want less than we've got */
+	nbytes = min_t(size_t, CURVE25519_KEY_SIZE, req->dst_len);
+	copied = sg_copy_from_buffer(req->dst, sg_nents_for_len(req->dst,
+								nbytes),
+				     buf, nbytes);
+	if (copied != nbytes)
+		return -EINVAL;
+	return 0;
+}
+
+static unsigned int curve25519_max_size(struct crypto_kpp *tfm)
+{
+	return CURVE25519_KEY_SIZE;
+}
+
+static struct kpp_alg curve25519_alg = {
+	.base.cra_name		= "curve25519",
+	.base.cra_driver_name	= "curve25519-ppc64le",
+	.base.cra_priority	= 200,
+	.base.cra_module	= THIS_MODULE,
+	.base.cra_ctxsize	= CURVE25519_KEY_SIZE,
+
+	.set_secret		= curve25519_set_secret,
+	.generate_public_key	= curve25519_generate_public_key,
+	.compute_shared_secret	= curve25519_compute_shared_secret,
+	.max_size		= curve25519_max_size,
+};
+
+
+static int __init curve25519_mod_init(void)
+{
+	return IS_REACHABLE(CONFIG_CRYPTO_KPP) ?
+		crypto_register_kpp(&curve25519_alg) : 0;
+}
+
+static void __exit curve25519_mod_exit(void)
+{
+	if (IS_REACHABLE(CONFIG_CRYPTO_KPP))
+		crypto_unregister_kpp(&curve25519_alg);
+}
+
+module_init(curve25519_mod_init);
+module_exit(curve25519_mod_exit);
+
+MODULE_ALIAS_CRYPTO("curve25519");
+MODULE_ALIAS_CRYPTO("curve25519-ppc64le");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Danny Tsen <dtsen@us.ibm.com>");
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/3] crypto: Update Kconfig and Makefile for ppc64le x25519.
  2024-05-14 17:38 [PATCH 0/3] crypto: X25519 supports for ppc64le Danny Tsen
  2024-05-14 17:38 ` [PATCH 1/3] crypto: X25519 low-level primitives " Danny Tsen
  2024-05-14 17:38 ` [PATCH 2/3] crypto: X25519 core functions " Danny Tsen
@ 2024-05-14 17:38 ` Danny Tsen
  2 siblings, 0 replies; 21+ messages in thread
From: Danny Tsen @ 2024-05-14 17:38 UTC (permalink / raw)
  To: linux-crypto
  Cc: herbert, leitao, nayna, appro, linux-kernel, linuxppc-dev, mpe,
	ltcgcw, dtsen, Danny Tsen

Defined CRYPTO_CURVE25519_PPC64 to support X25519 for ppc64le.

Added new module curve25519-ppc64le for X25519.

Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
---
 arch/powerpc/crypto/Kconfig  | 11 +++++++++++
 arch/powerpc/crypto/Makefile |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/arch/powerpc/crypto/Kconfig b/arch/powerpc/crypto/Kconfig
index 1e201b7ae2fc..09ebcbdfb34f 100644
--- a/arch/powerpc/crypto/Kconfig
+++ b/arch/powerpc/crypto/Kconfig
@@ -2,6 +2,17 @@
 
 menu "Accelerated Cryptographic Algorithms for CPU (powerpc)"
 
+config CRYPTO_CURVE25519_PPC64
+	tristate "Public key crypto: Curve25519 (PowerPC64)"
+	depends on PPC64 && CPU_LITTLE_ENDIAN
+	select CRYPTO_LIB_CURVE25519_GENERIC
+	select CRYPTO_ARCH_HAVE_LIB_CURVE25519
+	help
+	  Curve25519 algorithm
+
+	  Architecture: PowerPC64
+	  - Little-endian
+
 config CRYPTO_CRC32C_VPMSUM
 	tristate "CRC32c"
 	depends on PPC64 && ALTIVEC
diff --git a/arch/powerpc/crypto/Makefile b/arch/powerpc/crypto/Makefile
index fca0e9739869..59808592f0a1 100644
--- a/arch/powerpc/crypto/Makefile
+++ b/arch/powerpc/crypto/Makefile
@@ -17,6 +17,7 @@ obj-$(CONFIG_CRYPTO_AES_GCM_P10) += aes-gcm-p10-crypto.o
 obj-$(CONFIG_CRYPTO_CHACHA20_P10) += chacha-p10-crypto.o
 obj-$(CONFIG_CRYPTO_POLY1305_P10) += poly1305-p10-crypto.o
 obj-$(CONFIG_CRYPTO_DEV_VMX_ENCRYPT) += vmx-crypto.o
+obj-$(CONFIG_CRYPTO_CURVE25519_PPC64) += curve25519-ppc64le.o
 
 aes-ppc-spe-y := aes-spe-core.o aes-spe-keys.o aes-tab-4k.o aes-spe-modes.o aes-spe-glue.o
 md5-ppc-y := md5-asm.o md5-glue.o
@@ -29,6 +30,7 @@ aes-gcm-p10-crypto-y := aes-gcm-p10-glue.o aes-gcm-p10.o ghashp10-ppc.o aesp10-p
 chacha-p10-crypto-y := chacha-p10-glue.o chacha-p10le-8x.o
 poly1305-p10-crypto-y := poly1305-p10-glue.o poly1305-p10le_64.o
 vmx-crypto-objs := vmx.o aesp8-ppc.o ghashp8-ppc.o aes.o aes_cbc.o aes_ctr.o aes_xts.o ghash.o
+curve25519-ppc64le-y := curve25519-ppc64le-core.o curve25519-ppc64le_asm.o
 
 ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
 override flavour := linux-ppc64le
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
  2024-05-14 17:38 ` [PATCH 1/3] crypto: X25519 low-level primitives " Danny Tsen
@ 2024-05-15  8:11   ` Andy Polyakov
  2024-05-15 12:59     ` Danny Tsen
  2024-05-15  9:06   ` Andy Polyakov
  2024-05-16  4:53   ` Michael Ellerman
  2 siblings, 1 reply; 21+ messages in thread
From: Andy Polyakov @ 2024-05-15  8:11 UTC (permalink / raw)
  To: Danny Tsen, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, mpe, ltcgcw, dtsen

Hi,

Couple of remarks inline.

> +# [1] https://www.openssl.org/~appro/cryptogams/

https://github.com/dot-asm/cryptogams/ is arguably better reference.

> +SYM_FUNC_START(x25519_fe51_mul)
> +.align	5

The goal is to align the label, not the first instruction after the 
directive. It's not a problem in this spot, in the beginning of the 
module that is, but further below it's likely to inject redundant nops 
between the label and meaningful code. But since the directive in 
question is not position-sensitive one can resolve this by changing the 
order of the directive and the SYM_FUNC_START macro.

Cheers.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le
  2024-05-14 17:38 ` [PATCH 2/3] crypto: X25519 core functions " Danny Tsen
@ 2024-05-15  8:29   ` Andy Polyakov
  2024-05-15 13:06     ` Danny Tsen
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Andy Polyakov @ 2024-05-15  8:29 UTC (permalink / raw)
  To: Danny Tsen, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, mpe, ltcgcw, dtsen

Hi,

> +static void cswap(fe51 p, fe51 q, unsigned int bit)
> +{
> +	u64 t, i;
> +	u64 c = 0 - (u64) bit;
> +
> +	for (i = 0; i < 5; ++i) {
> +		t = c & (p[i] ^ q[i]);
> +		p[i] ^= t;
> +		q[i] ^= t;
> +	}
> +}

The "c" in cswap stands for "constant-time," and the problem is that 
contemporary compilers have exhibited the ability to produce 
non-constant-time machine code as result of compilation of the above 
kind of technique. The outcome is platform-specific and ironically some 
of PPC code generators were observed to generate "most" 
non-constant-time code. "Most" in sense that execution time variations 
would be most easy to catch. One way to work around the problem, at 
least for the time being, is to add 'asm volatile("" : "+r"(c))' after 
you calculate 'c'. But there is no guarantee that the next compiler 
version won't see through it, hence the permanent solution is to do it 
in assembly. I can put together something...

Cheers.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
  2024-05-14 17:38 ` [PATCH 1/3] crypto: X25519 low-level primitives " Danny Tsen
  2024-05-15  8:11   ` Andy Polyakov
@ 2024-05-15  9:06   ` Andy Polyakov
  2024-05-15 13:04     ` Danny Tsen
  2024-05-16  4:53   ` Michael Ellerman
  2 siblings, 1 reply; 21+ messages in thread
From: Andy Polyakov @ 2024-05-15  9:06 UTC (permalink / raw)
  To: Danny Tsen, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, mpe, ltcgcw, dtsen

Hi,

> +SYM_FUNC_START(x25519_fe51_sqr_times)
> ...
> +
> +.Lsqr_times_loop:
> ...
> +
> +	std	9,16(3)
> +	std	10,24(3)
> +	std	11,32(3)
> +	std	7,0(3)
> +	std	8,8(3)
> +	bdnz	.Lsqr_times_loop

I see no reason for why the stores can't be moved outside the loop in 
question.

> +SYM_FUNC_START(x25519_fe51_frombytes)
> +.align	5
> +
> +	li	12, -1
> +	srdi	12, 12, 13	# 0x7ffffffffffff
> +
> +	ld	5, 0(4)
> +	ld	6, 8(4)
> +	ld	7, 16(4)
> +	ld	8, 24(4)

Is there actual guarantee that the byte input is 64-bit aligned? While 
it is true that processor is obliged to handle misaligned loads and 
stores by the ISA specification, them being inefficient doesn't go 
against it. Most notably inefficiency is likely to be noted at the page 
boundaries. What I'm trying to say is that it would be more appropriate 
to avoid the unaligned loads (and stores).

Cheers.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
  2024-05-15  8:11   ` Andy Polyakov
@ 2024-05-15 12:59     ` Danny Tsen
  0 siblings, 0 replies; 21+ messages in thread
From: Danny Tsen @ 2024-05-15 12:59 UTC (permalink / raw)
  To: Andy Polyakov, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, mpe, ltcgcw, dtsen

Thank you Andy.  Will fix this.

On 5/15/24 3:11 AM, Andy Polyakov wrote:
> Hi,
>
> Couple of remarks inline.
>
>> +# [1] https://www.openssl.org/~appro/cryptogams/
>
> https://github.com/dot-asm/cryptogams/ is arguably better reference.
>
>> +SYM_FUNC_START(x25519_fe51_mul)
>> +.align    5
>
> The goal is to align the label, not the first instruction after the 
> directive. It's not a problem in this spot, in the beginning of the 
> module that is, but further below it's likely to inject redundant nops 
> between the label and meaningful code. But since the directive in 
> question is not position-sensitive one can resolve this by changing 
> the order of the directive and the SYM_FUNC_START macro.
>
> Cheers.
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
  2024-05-15  9:06   ` Andy Polyakov
@ 2024-05-15 13:04     ` Danny Tsen
  0 siblings, 0 replies; 21+ messages in thread
From: Danny Tsen @ 2024-05-15 13:04 UTC (permalink / raw)
  To: Andy Polyakov, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, mpe, ltcgcw, dtsen

See inline.

On 5/15/24 4:06 AM, Andy Polyakov wrote:
> Hi,
>
>> +SYM_FUNC_START(x25519_fe51_sqr_times)
>> ...
>> +
>> +.Lsqr_times_loop:
>> ...
>> +
>> +    std    9,16(3)
>> +    std    10,24(3)
>> +    std    11,32(3)
>> +    std    7,0(3)
>> +    std    8,8(3)
>> +    bdnz    .Lsqr_times_loop
>
> I see no reason for why the stores can't be moved outside the loop in 
> question.
>
Yeah.  I'll fix it.


>> +SYM_FUNC_START(x25519_fe51_frombytes)
>> +.align    5
>> +
>> +    li    12, -1
>> +    srdi    12, 12, 13    # 0x7ffffffffffff
>> +
>> +    ld    5, 0(4)
>> +    ld    6, 8(4)
>> +    ld    7, 16(4)
>> +    ld    8, 24(4)
>
> Is there actual guarantee that the byte input is 64-bit aligned? While 
> it is true that processor is obliged to handle misaligned loads and 
> stores by the ISA specification, them being inefficient doesn't go 
> against it. Most notably inefficiency is likely to be noted at the 
> page boundaries. What I'm trying to say is that it would be more 
> appropriate to avoid the unaligned loads (and stores).

Good point.  Maybe I can handle it with 64-bit aligned for the input.

Thanks.


>
> Cheers.
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le
  2024-05-15  8:29   ` Andy Polyakov
@ 2024-05-15 13:06     ` Danny Tsen
  2024-05-15 13:33     ` Andy Polyakov
  2024-05-16 19:28     ` Segher Boessenkool
  2 siblings, 0 replies; 21+ messages in thread
From: Danny Tsen @ 2024-05-15 13:06 UTC (permalink / raw)
  To: Andy Polyakov, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, mpe, ltcgcw, dtsen

Hi Andy,

Points taken.  And much appreciate for the help.

Thanks.

-Danny

On 5/15/24 3:29 AM, Andy Polyakov wrote:
> Hi,
>
>> +static void cswap(fe51 p, fe51 q, unsigned int bit)
>> +{
>> +    u64 t, i;
>> +    u64 c = 0 - (u64) bit;
>> +
>> +    for (i = 0; i < 5; ++i) {
>> +        t = c & (p[i] ^ q[i]);
>> +        p[i] ^= t;
>> +        q[i] ^= t;
>> +    }
>> +}
>
> The "c" in cswap stands for "constant-time," and the problem is that 
> contemporary compilers have exhibited the ability to produce 
> non-constant-time machine code as result of compilation of the above 
> kind of technique. The outcome is platform-specific and ironically 
> some of PPC code generators were observed to generate "most" 
> non-constant-time code. "Most" in sense that execution time variations 
> would be most easy to catch. One way to work around the problem, at 
> least for the time being, is to add 'asm volatile("" : "+r"(c))' after 
> you calculate 'c'. But there is no guarantee that the next compiler 
> version won't see through it, hence the permanent solution is to do it 
> in assembly. I can put together something...
>
> Cheers.
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le
  2024-05-15  8:29   ` Andy Polyakov
  2024-05-15 13:06     ` Danny Tsen
@ 2024-05-15 13:33     ` Andy Polyakov
  2024-05-15 13:58       ` Danny Tsen
  2024-05-16 19:28     ` Segher Boessenkool
  2 siblings, 1 reply; 21+ messages in thread
From: Andy Polyakov @ 2024-05-15 13:33 UTC (permalink / raw)
  To: Danny Tsen, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, mpe, ltcgcw, dtsen

>> +static void cswap(fe51 p, fe51 q, unsigned int bit)
>> +{
>> +    u64 t, i;
>> +    u64 c = 0 - (u64) bit;
>> +
>> +    for (i = 0; i < 5; ++i) {
>> +        t = c & (p[i] ^ q[i]);
>> +        p[i] ^= t;
>> +        q[i] ^= t;
>> +    }
>> +}
> 
> The "c" in cswap stands for "constant-time," and the problem is that 
> contemporary compilers have exhibited the ability to produce 
> non-constant-time machine code as result of compilation of the above 
> kind of technique. The outcome is platform-specific and ironically some 
> of PPC code generators were observed to generate "most" 
> non-constant-time code. "Most" in sense that execution time variations 
> would be most easy to catch.

Just to substantiate the point, consider 
https://godbolt.org/z/faYnEcPT7, and note the conditional branch in the 
middle of the loop, which flies in the face of constant-time-ness. In 
case you object 'bit &= 1' on line 7 in the C code. Indeed, if you 
comment it out, the generated code will be fine. But the point is that 
the compiler is capable of and was in fact observed to figure out that 
the caller passes either one or zero and generate the machine code in 
the assembly window. In other words 'bit &= 1' is just a reflection of 
what the caller does.

> ... the permanent solution is to do it 
> in assembly. I can put together something...

Though you should be able to do this just as well :-) So should I or 
would you?

Cheers.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le
  2024-05-15 13:33     ` Andy Polyakov
@ 2024-05-15 13:58       ` Danny Tsen
  2024-05-15 14:20         ` Andy Polyakov
  0 siblings, 1 reply; 21+ messages in thread
From: Danny Tsen @ 2024-05-15 13:58 UTC (permalink / raw)
  To: Andy Polyakov, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, mpe, ltcgcw, dtsen

Hi Andy,

Thanks for the info.  I should be able to do it.  I was hoping an 
assembly guru like you can show me some tricks here if there is :)

Thanks.

-Danny

On 5/15/24 8:33 AM, Andy Polyakov wrote:
>>> +static void cswap(fe51 p, fe51 q, unsigned int bit)
>>> +{
>>> +    u64 t, i;
>>> +    u64 c = 0 - (u64) bit;
>>> +
>>> +    for (i = 0; i < 5; ++i) {
>>> +        t = c & (p[i] ^ q[i]);
>>> +        p[i] ^= t;
>>> +        q[i] ^= t;
>>> +    }
>>> +}
>>
>> The "c" in cswap stands for "constant-time," and the problem is that 
>> contemporary compilers have exhibited the ability to produce 
>> non-constant-time machine code as result of compilation of the above 
>> kind of technique. The outcome is platform-specific and ironically 
>> some of PPC code generators were observed to generate "most" 
>> non-constant-time code. "Most" in sense that execution time 
>> variations would be most easy to catch.
>
> Just to substantiate the point, consider 
> https://godbolt.org/z/faYnEcPT7, and note the conditional branch in 
> the middle of the loop, which flies in the face of constant-time-ness. 
> In case you object 'bit &= 1' on line 7 in the C code. Indeed, if you 
> comment it out, the generated code will be fine. But the point is that 
> the compiler is capable of and was in fact observed to figure out that 
> the caller passes either one or zero and generate the machine code in 
> the assembly window. In other words 'bit &= 1' is just a reflection of 
> what the caller does.
>
>> ... the permanent solution is to do it in assembly. I can put 
>> together something...
>
> Though you should be able to do this just as well :-) So should I or 
> would you?
>
> Cheers.
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le
  2024-05-15 13:58       ` Danny Tsen
@ 2024-05-15 14:20         ` Andy Polyakov
  0 siblings, 0 replies; 21+ messages in thread
From: Andy Polyakov @ 2024-05-15 14:20 UTC (permalink / raw)
  To: Danny Tsen, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, mpe, ltcgcw, dtsen

> Thanks for the info.  I should be able to do it.  I was hoping an 
> assembly guru like you can show me some tricks here if there is :)

No tricks in cswap, it's as straightforward as it gets, so go ahead :-)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
  2024-05-14 17:38 ` [PATCH 1/3] crypto: X25519 low-level primitives " Danny Tsen
  2024-05-15  8:11   ` Andy Polyakov
  2024-05-15  9:06   ` Andy Polyakov
@ 2024-05-16  4:53   ` Michael Ellerman
  2024-05-16  8:38     ` Andy Polyakov
  2024-05-16 11:38     ` Danny Tsen
  2 siblings, 2 replies; 21+ messages in thread
From: Michael Ellerman @ 2024-05-16  4:53 UTC (permalink / raw)
  To: Danny Tsen, linux-crypto
  Cc: herbert, leitao, nayna, appro, linux-kernel, linuxppc-dev,
	ltcgcw, dtsen, Danny Tsen

Hi Danny,

Danny Tsen <dtsen@linux.ibm.com> writes:
> Use the perl output of x25519-ppc64.pl from CRYPTOGAMs and added three
> supporting functions, x25519_fe51_sqr_times, x25519_fe51_frombytes
> and x25519_fe51_tobytes.

For other algorithms we have checked-in the perl script and generated
the code at runtime. Is there a reason you've done it differently this time?

> Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
> ---
>  arch/powerpc/crypto/curve25519-ppc64le_asm.S | 648 +++++++++++++++++++
>  1 file changed, 648 insertions(+)
>  create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S
>
> diff --git a/arch/powerpc/crypto/curve25519-ppc64le_asm.S b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
> new file mode 100644
> index 000000000000..8a018104838a
> --- /dev/null
> +++ b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
> @@ -0,0 +1,648 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#
> +# Copyright 2024- IBM Corp.  All Rights Reserved.
 
I'm not a lawyer, but AFAIK "All Rights Reserved" is not required and
can be confusing - because we are not reserving all rights, we are
granting some rights under the GPL.

I also think the IBM copyright should be down below where your
modifications are described.

> +# This code is taken from CRYPTOGAMs[1] and is included here using the option
> +# in the license to distribute the code under the GPL. Therefore this program
> +# is free software; you can redistribute it and/or modify it under the terms of
> +# the GNU General Public License version 2 as published by the Free Software
> +# Foundation.
> +#
> +# [1] https://www.openssl.org/~appro/cryptogams/
> +
> +# Copyright (c) 2006-2017, CRYPTOGAMS by <appro@openssl.org>
> +# All rights reserved.
> +#
> +# Redistribution and use in source and binary forms, with or without
> +# modification, are permitted provided that the following conditions
> +# are met:
> +#
> +#       * Redistributions of source code must retain copyright notices,
> +#         this list of conditions and the following disclaimer.
> +#
> +#       * Redistributions in binary form must reproduce the above
> +#         copyright notice, this list of conditions and the following
> +#         disclaimer in the documentation and/or other materials
> +#         provided with the distribution.
> +#
> +#       * Neither the name of the CRYPTOGAMS nor the names of its
> +#         copyright holder and contributors may be used to endorse or
> +#         promote products derived from this software without specific
> +#         prior written permission.
> +#
> +# ALTERNATIVELY, provided that this notice is retained in full, this
> +# product may be distributed under the terms of the GNU General Public
> +# License (GPL), in which case the provisions of the GPL apply INSTEAD OF
> +# those given above.
> +#
> +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS
> +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +# ====================================================================
> +# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
> +# project. The module is, however, dual licensed under OpenSSL and
> +# CRYPTOGAMS licenses depending on where you obtain it. For further
> +# details see https://www.openssl.org/~appro/cryptogams/.
> +# ====================================================================
> +
> +#
> +# ====================================================================
> +# Written and Modified by Danny Tsen <dtsen@us.ibm.com>
> +# - Added x25519_fe51_sqr_times, x25519_fe51_frombytes, x25519_fe51_tobytes

ie. here.

> +# X25519 lower-level primitives for PPC64.
> +#
> +
> +#include <linux/linkage.h>
> +
> +.machine "any"
 
Please don't add new .machine directives unless they are required.

> +.abiversion	2

I'd prefer that was left to the compiler flags.

cheers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
  2024-05-16  4:53   ` Michael Ellerman
@ 2024-05-16  8:38     ` Andy Polyakov
  2024-05-16 11:39       ` Danny Tsen
  2024-05-16 12:06       ` Michael Ellerman
  2024-05-16 11:38     ` Danny Tsen
  1 sibling, 2 replies; 21+ messages in thread
From: Andy Polyakov @ 2024-05-16  8:38 UTC (permalink / raw)
  To: Michael Ellerman, Danny Tsen, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, ltcgcw, dtsen

Hi,

>> +.abiversion	2
> 
> I'd prefer that was left to the compiler flags.

Problem is that it's the compiler that is responsible for providing this 
directive in the intermediate .s prior invoking the assembler. And there 
is no assembler flag to pass through -Wa. If concern is ABI neutrality, 
then solution would rather be #if (_CALL_ELF-0) == 2/#endif. One can 
also make a case for

#ifdef _CALL_ELF
.abiversion _CALL_ELF
#endif

Cheers.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
  2024-05-16  4:53   ` Michael Ellerman
  2024-05-16  8:38     ` Andy Polyakov
@ 2024-05-16 11:38     ` Danny Tsen
  1 sibling, 0 replies; 21+ messages in thread
From: Danny Tsen @ 2024-05-16 11:38 UTC (permalink / raw)
  To: Michael Ellerman, linux-crypto
  Cc: herbert, leitao, nayna, appro, linux-kernel, linuxppc-dev, ltcgcw, dtsen


On 5/15/24 11:53 PM, Michael Ellerman wrote:
> Hi Danny,
>
> Danny Tsen <dtsen@linux.ibm.com> writes:
>> Use the perl output of x25519-ppc64.pl from CRYPTOGAMs and added three
>> supporting functions, x25519_fe51_sqr_times, x25519_fe51_frombytes
>> and x25519_fe51_tobytes.
> For other algorithms we have checked-in the perl script and generated
> the code at runtime. Is there a reason you've done it differently this time?

Hi Michael,

It's easier for me to read and use just assembly not mixed with perl and 
it's easier for me to debug and testing also I copied some code and made 
some modification.

>> Signed-off-by: Danny Tsen <dtsen@linux.ibm.com>
>> ---
>>   arch/powerpc/crypto/curve25519-ppc64le_asm.S | 648 +++++++++++++++++++
>>   1 file changed, 648 insertions(+)
>>   create mode 100644 arch/powerpc/crypto/curve25519-ppc64le_asm.S
>>
>> diff --git a/arch/powerpc/crypto/curve25519-ppc64le_asm.S b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
>> new file mode 100644
>> index 000000000000..8a018104838a
>> --- /dev/null
>> +++ b/arch/powerpc/crypto/curve25519-ppc64le_asm.S
>> @@ -0,0 +1,648 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +#
>> +# Copyright 2024- IBM Corp.  All Rights Reserved.
>   
> I'm not a lawyer, but AFAIK "All Rights Reserved" is not required and
> can be confusing - because we are not reserving all rights, we are
> granting some rights under the GPL.
>
> I also think the IBM copyright should be down below where your
> modifications are described.
Will change that.
>> +# This code is taken from CRYPTOGAMs[1] and is included here using the option
>> +# in the license to distribute the code under the GPL. Therefore this program
>> +# is free software; you can redistribute it and/or modify it under the terms of
>> +# the GNU General Public License version 2 as published by the Free Software
>> +# Foundation.
>> +#
>> +# [1] https://www.openssl.org/~appro/cryptogams/
>> +
>> +# Copyright (c) 2006-2017, CRYPTOGAMS by <appro@openssl.org>
>> +# All rights reserved.
>> +#
>> +# Redistribution and use in source and binary forms, with or without
>> +# modification, are permitted provided that the following conditions
>> +# are met:
>> +#
>> +#       * Redistributions of source code must retain copyright notices,
>> +#         this list of conditions and the following disclaimer.
>> +#
>> +#       * Redistributions in binary form must reproduce the above
>> +#         copyright notice, this list of conditions and the following
>> +#         disclaimer in the documentation and/or other materials
>> +#         provided with the distribution.
>> +#
>> +#       * Neither the name of the CRYPTOGAMS nor the names of its
>> +#         copyright holder and contributors may be used to endorse or
>> +#         promote products derived from this software without specific
>> +#         prior written permission.
>> +#
>> +# ALTERNATIVELY, provided that this notice is retained in full, this
>> +# product may be distributed under the terms of the GNU General Public
>> +# License (GPL), in which case the provisions of the GPL apply INSTEAD OF
>> +# those given above.
>> +#
>> +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS
>> +# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> +# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> +# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> +# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> +# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> +# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> +# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> +
>> +# ====================================================================
>> +# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
>> +# project. The module is, however, dual licensed under OpenSSL and
>> +# CRYPTOGAMS licenses depending on where you obtain it. For further
>> +# details see https://www.openssl.org/~appro/cryptogams/.
>> +# ====================================================================
>> +
>> +#
>> +# ====================================================================
>> +# Written and Modified by Danny Tsen <dtsen@us.ibm.com>
>> +# - Added x25519_fe51_sqr_times, x25519_fe51_frombytes, x25519_fe51_tobytes
> ie. here.
>
>> +# X25519 lower-level primitives for PPC64.
>> +#
>> +
>> +#include <linux/linkage.h>
>> +
>> +.machine "any"
>   
> Please don't add new .machine directives unless they are required.
>
>> +.abiversion	2
> I'd prefer that was left to the compiler flags.

Ok.

Thanks.

-Danny

>
> cheers
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
  2024-05-16  8:38     ` Andy Polyakov
@ 2024-05-16 11:39       ` Danny Tsen
  2024-05-16 12:06       ` Michael Ellerman
  1 sibling, 0 replies; 21+ messages in thread
From: Danny Tsen @ 2024-05-16 11:39 UTC (permalink / raw)
  To: Andy Polyakov, Michael Ellerman, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, ltcgcw, dtsen

Hi Andy,

I learned something here.  Will fix this.  Thanks.

-Danny

On 5/16/24 3:38 AM, Andy Polyakov wrote:
> Hi,
>
>>> +.abiversion    2
>>
>> I'd prefer that was left to the compiler flags.
>
> Problem is that it's the compiler that is responsible for providing 
> this directive in the intermediate .s prior invoking the assembler. 
> And there is no assembler flag to pass through -Wa. If concern is ABI 
> neutrality, then solution would rather be #if (_CALL_ELF-0) == 
> 2/#endif. One can also make a case for
>
> #ifdef _CALL_ELF
> .abiversion _CALL_ELF
> #endif
>
> Cheers.
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
  2024-05-16  8:38     ` Andy Polyakov
  2024-05-16 11:39       ` Danny Tsen
@ 2024-05-16 12:06       ` Michael Ellerman
  2024-05-16 13:42         ` Andy Polyakov
  2024-05-16 19:48         ` Segher Boessenkool
  1 sibling, 2 replies; 21+ messages in thread
From: Michael Ellerman @ 2024-05-16 12:06 UTC (permalink / raw)
  To: Andy Polyakov, Danny Tsen, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, ltcgcw, dtsen

Andy Polyakov <appro@cryptogams.org> writes:
> Hi,
>
>>> +.abiversion	2
>>
>> I'd prefer that was left to the compiler flags.
>
> Problem is that it's the compiler that is responsible for providing this
> directive in the intermediate .s prior invoking the assembler. And there
> is no assembler flag to pass through -Wa.

Hmm, right. But none of our existing .S files include .abiversion
directives.

We build .S files with gcc, passing -mabi=elfv2, but it seems to have no
effect.

So all the intermediate .o's generated from .S files are not ELFv2:

  $ find .build/ -name '*.o' | xargs file | grep Unspecified
  .build/arch/powerpc/kernel/vdso/note-64.o:                        ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  .build/arch/powerpc/kernel/vdso/sigtramp64-64.o:                  ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  .build/arch/powerpc/kernel/vdso/getcpu-64.o:                      ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  .build/arch/powerpc/kernel/vdso/gettimeofday-64.o:                ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  .build/arch/powerpc/kernel/vdso/datapage-64.o:                    ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped
  ...

But the actual code follows ELFv2, because we wrote it that way, and I
guess the linker doesn't look at the actual ABI version of the .o ?

So it currently works. But it's kind of gross that those .o files are
not ELFv2 for an ELFv2 build.

> If concern is ABI neutrality,
> then solution would rather be #if (_CALL_ELF-0) == 2/#endif. One can
> also make a case for
>
> #ifdef _CALL_ELF
> .abiversion _CALL_ELF
> #endif

Is .abiversion documented anywhere? I can't see it in the manual.

We used to use _CALL_ELF, but the kernel config is supposed to be the
source of truth, so we'd use:

  #ifdef CONFIG_PPC64_ELF_ABI_V2
  .abiversion 2
  #endif

And probably put it in a macro like:

  #ifdef CONFIG_PPC64_ELF_ABI_V2
  #define ASM_ABI_VERSION .abiversion 2
  #else
  #define ASM_ABI_VERSION
  #endif

Or something like that. But it's annoying that we need to go and
sprinkle that in every .S file.

Anyway, my comment can be ignored as far as this series is concerned,
seems we have to clean this up everywhere.

cheers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
  2024-05-16 12:06       ` Michael Ellerman
@ 2024-05-16 13:42         ` Andy Polyakov
  2024-05-16 19:48         ` Segher Boessenkool
  1 sibling, 0 replies; 21+ messages in thread
From: Andy Polyakov @ 2024-05-16 13:42 UTC (permalink / raw)
  To: Michael Ellerman, Danny Tsen, linux-crypto
  Cc: herbert, leitao, nayna, linux-kernel, linuxppc-dev, ltcgcw, dtsen

Hi,

>>>> +.abiversion	2
>>>
>>> I'd prefer that was left to the compiler flags.
>>
>> Problem is that it's the compiler that is responsible for providing this
>> directive in the intermediate .s prior invoking the assembler. And there
>> is no assembler flag to pass through -Wa.
> 
> Hmm, right. But none of our existing .S files include .abiversion
> directives.
> 
> We build .S files with gcc, passing -mabi=elfv2, but it seems to have no
> effect.
> 
> So all the intermediate .o's generated from .S files are not ELFv2:
> 
>    $ find .build/ -name '*.o' | xargs file | grep Unspecified
>    .build/arch/powerpc/kernel/vdso/note-64.o:                        ELF 64-bit LSB relocatable, 64-bit PowerPC or cisco 7500, Unspecified or Power ELF V1 ABI, version 1 (SYSV), not stripped

I would guess that contemporary linker is more forgiving than it was 
back then when the .abiversion directive was added. If it works now, 
then it of course can be omitted. I suppose my original remark should be 
viewed rather as "you can't replace it with a command line option" than 
"you can't make it work without it." :-)

> But the actual code follows ELFv2, because we wrote it that way, and I
> guess the linker doesn't look at the actual ABI version of the .o ?
> 
> So it currently works. But it's kind of gross that those .o files are
> not ELFv2 for an ELFv2 build.

Well, as far as passing base types and pointers to/from assembly goes, 
there are no differences between the versions. Then it's a question of 
meaning assigned to r2 and r13, but as long as you don't touch them, you 
can freely reuse the code with either ABI. With this in mind the 
.abiversion directive is effectively reduced to just a marker in the .o 
file. In other words the instruction sequences by themselves are 
customarily ABI-neutral, at least in "general calculation" modules such 
as the suggested one, so that if it works 100% without the .abiversion 
directive, then it can be safely omitted.

Cheers.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/3] crypto: X25519 core functions for ppc64le
  2024-05-15  8:29   ` Andy Polyakov
  2024-05-15 13:06     ` Danny Tsen
  2024-05-15 13:33     ` Andy Polyakov
@ 2024-05-16 19:28     ` Segher Boessenkool
  2 siblings, 0 replies; 21+ messages in thread
From: Segher Boessenkool @ 2024-05-16 19:28 UTC (permalink / raw)
  To: Andy Polyakov
  Cc: Danny Tsen, linux-crypto, herbert, dtsen, nayna, linux-kernel,
	ltcgcw, leitao, linuxppc-dev

On Wed, May 15, 2024 at 10:29:56AM +0200, Andy Polyakov wrote:
> >+static void cswap(fe51 p, fe51 q, unsigned int bit)
> 
> The "c" in cswap stands for "constant-time," and the problem is that 
> contemporary compilers have exhibited the ability to produce 
> non-constant-time machine code as result of compilation of the above 
> kind of technique.

This can happen with *any* comnpiler, on *any* platform.  In general,
you have to write machine code if you want to be sure what machine code
will eventually be executed.

>  The outcome is platform-specific and ironically some 
> of PPC code generators were observed to generate "most" 
> non-constant-time code. "Most" in sense that execution time variations 
> would be most easy to catch. One way to work around the problem, at 
> least for the time being, is to add 'asm volatile("" : "+r"(c))' after 
> you calculate 'c'. But there is no guarantee that the next compiler 
> version won't see through it, hence the permanent solution is to do it 
> in assembly. I can put together something...

Such tricks can help ameliorate the problem, sure.  But it is not a
solution ever.


Segher

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] crypto: X25519 low-level primitives for ppc64le.
  2024-05-16 12:06       ` Michael Ellerman
  2024-05-16 13:42         ` Andy Polyakov
@ 2024-05-16 19:48         ` Segher Boessenkool
  1 sibling, 0 replies; 21+ messages in thread
From: Segher Boessenkool @ 2024-05-16 19:48 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Andy Polyakov, Danny Tsen, linux-crypto, herbert, dtsen, nayna,
	linux-kernel, ltcgcw, leitao, linuxppc-dev

Hi!

On Thu, May 16, 2024 at 10:06:58PM +1000, Michael Ellerman wrote:
> Andy Polyakov <appro@cryptogams.org> writes:
> >>> +.abiversion	2
> >>
> >> I'd prefer that was left to the compiler flags.
> >
> > Problem is that it's the compiler that is responsible for providing this
> > directive in the intermediate .s prior invoking the assembler. And there
> > is no assembler flag to pass through -Wa.
> 
> Hmm, right. But none of our existing .S files include .abiversion
> directives.
> 
> We build .S files with gcc, passing -mabi=elfv2, but it seems to have no
> effect.

Yup.  You coulds include some header file, maybe?  Since you run the
assembler code through the C preprocessor anyway, for some weird reason :-)

> But the actual code follows ELFv2, because we wrote it that way, and I
> guess the linker doesn't look at the actual ABI version of the .o ?

It isn't a version.  It is an actual different ABI.

GNU LD allows linking together whatever, yes.

> Is .abiversion documented anywhere? I can't see it in the manual.

Yeah me neither.  https://sourceware.org/bugzilla/enter_bug.cgi ?
A commandline flag (to GAS) would seem best?


Segher

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2024-05-16 20:10 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-05-14 17:38 [PATCH 0/3] crypto: X25519 supports for ppc64le Danny Tsen
2024-05-14 17:38 ` [PATCH 1/3] crypto: X25519 low-level primitives " Danny Tsen
2024-05-15  8:11   ` Andy Polyakov
2024-05-15 12:59     ` Danny Tsen
2024-05-15  9:06   ` Andy Polyakov
2024-05-15 13:04     ` Danny Tsen
2024-05-16  4:53   ` Michael Ellerman
2024-05-16  8:38     ` Andy Polyakov
2024-05-16 11:39       ` Danny Tsen
2024-05-16 12:06       ` Michael Ellerman
2024-05-16 13:42         ` Andy Polyakov
2024-05-16 19:48         ` Segher Boessenkool
2024-05-16 11:38     ` Danny Tsen
2024-05-14 17:38 ` [PATCH 2/3] crypto: X25519 core functions " Danny Tsen
2024-05-15  8:29   ` Andy Polyakov
2024-05-15 13:06     ` Danny Tsen
2024-05-15 13:33     ` Andy Polyakov
2024-05-15 13:58       ` Danny Tsen
2024-05-15 14:20         ` Andy Polyakov
2024-05-16 19:28     ` Segher Boessenkool
2024-05-14 17:38 ` [PATCH 3/3] crypto: Update Kconfig and Makefile for ppc64le x25519 Danny Tsen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).