linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] RTAS changes for 6.4
@ 2023-03-06 21:33 Nathan Lynch via B4 Relay
  2023-03-06 21:33 ` [PATCH 1/8] powerpc/rtas: ensure 8-byte alignment for struct rtas_args Nathan Lynch via B4 Relay
                   ` (9 more replies)
  0 siblings, 10 replies; 32+ messages in thread
From: Nathan Lynch via B4 Relay @ 2023-03-06 21:33 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Nathan Lynch, Tyrel Datwyler, Nick Child, Andrew Donnellan,
	Scott Cheloha, Laurent Dufour, linuxppc-dev

Proposed changes for the RTAS subsystem and client code.

Fixes that are subject to backporting are at the front of the queue,
followed by documentation and cleanups, with enhancements at the end.

Noteworthy changes:
* Change sys_rtas() to consume -2/990x statuses instead of returning
  them to user space.
* Lockdep annotations for invariants in rtas.c.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
Nathan Lynch (8):
      powerpc/rtas: ensure 8-byte alignment for struct rtas_args
      powerpc/rtas: use memmove for potentially overlapping buffer copy
      powerpc/rtas: rtas_call_unlocked() kerneldoc
      powerpc/rtas: fix miswording in rtas_function kerneldoc
      powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call()
      powerpc/rtas: lockdep annotations
      powerpc/rtas: warn on unsafe argument to rtas_call_unlocked()
      powerpc/rtas: consume retry statuses in sys_rtas()

 arch/powerpc/include/asm/rtas-types.h |  5 +-
 arch/powerpc/kernel/rtas.c            | 92 +++++++++++++++++++++++++----------
 2 files changed, 69 insertions(+), 28 deletions(-)
---
base-commit: 422fbcbf91303706823bc3babceb1df1a42112bf
change-id: 20230220-rtas-queue-for-6-4-214eb2ba1407

Best regards,
-- 
Nathan Lynch <nathanl@linux.ibm.com>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/8] powerpc/rtas: ensure 8-byte alignment for struct rtas_args
  2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay
@ 2023-03-06 21:33 ` Nathan Lynch via B4 Relay
  2023-03-23  4:00   ` Andrew Donnellan
  2023-03-06 21:33 ` [PATCH 2/8] powerpc/rtas: use memmove for potentially overlapping buffer copy Nathan Lynch via B4 Relay
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Nathan Lynch via B4 Relay @ 2023-03-06 21:33 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Nathan Lynch, Tyrel Datwyler, Nick Child, Andrew Donnellan,
	Scott Cheloha, Laurent Dufour, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

CHRP and PAPR agree: "In order to make an RTAS call, the operating
system must construct an argument call buffer aligned on an eight byte
boundary in physically contiguous real memory [...]." (7.2.7 Calling
Mechanism and Conventions).

struct rtas_args is the type used for this argument call buffer. The
unarchitected 'rets' member happens to produce 8-byte alignment for
the struct on 64-bit targets in practice. But without an alignment
directive the structure will have only 4-byte alignment on 32-bit
targets:

  $ nm b/{before,after}/chrp32/vmlinux | grep rtas_args
  c096881c b rtas_args
  c0968820 b rtas_args

Add an alignment directive to the struct rtas_args declaration so all
instances have the alignment required by the specs. rtas-types.h no
longer refers to any spinlock types, so drop the spinlock_types.h
inclusion while we're here.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/include/asm/rtas-types.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas-types.h b/arch/powerpc/include/asm/rtas-types.h
index f2ad4a96cbc5..861145c8a021 100644
--- a/arch/powerpc/include/asm/rtas-types.h
+++ b/arch/powerpc/include/asm/rtas-types.h
@@ -2,7 +2,8 @@
 #ifndef _ASM_POWERPC_RTAS_TYPES_H
 #define _ASM_POWERPC_RTAS_TYPES_H
 
-#include <linux/spinlock_types.h>
+#include <linux/compiler_attributes.h>
+#include <linux/sizes.h>
 
 typedef __be32 rtas_arg_t;
 
@@ -12,7 +13,7 @@ struct rtas_args {
 	__be32 nret;
 	rtas_arg_t args[16];
 	rtas_arg_t *rets;     /* Pointer to return values in args[]. */
-};
+} __aligned(SZ_8);
 
 struct rtas_t {
 	unsigned long entry;		/* physical address pointer */

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/8] powerpc/rtas: use memmove for potentially overlapping buffer copy
  2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay
  2023-03-06 21:33 ` [PATCH 1/8] powerpc/rtas: ensure 8-byte alignment for struct rtas_args Nathan Lynch via B4 Relay
@ 2023-03-06 21:33 ` Nathan Lynch via B4 Relay
  2023-03-23  4:09   ` Andrew Donnellan
  2023-03-06 21:33 ` [PATCH 3/8] powerpc/rtas: rtas_call_unlocked() kerneldoc Nathan Lynch via B4 Relay
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Nathan Lynch via B4 Relay @ 2023-03-06 21:33 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Nathan Lynch, Tyrel Datwyler, Nick Child, Andrew Donnellan,
	Scott Cheloha, Laurent Dufour, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

Using memcpy() isn't safe when buf is identical to rtas_err_buf, which
can happen during boot before slab is up. Full context which may not
be obvious from the diff:

	if (altbuf) {
		buf = altbuf;
	} else {
		buf = rtas_err_buf;
		if (slab_is_available())
			buf = kmalloc(RTAS_ERROR_LOG_MAX, GFP_ATOMIC);
	}
	if (buf)
		memcpy(buf, rtas_err_buf, RTAS_ERROR_LOG_MAX);

This was found by inspection and I'm not aware of it causing problems
in practice. It appears to have been introduced by commit
033ef338b6e0 ("powerpc: Merge rtas.c into arch/powerpc/kernel"); the
old ppc64 version of this code did not have this problem.

Use memmove() instead.

Fixes: 033ef338b6e0 ("powerpc: Merge rtas.c into arch/powerpc/kernel")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 31175b34856a..9256cfaa8b6f 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -981,7 +981,7 @@ static char *__fetch_rtas_last_error(char *altbuf)
 				buf = kmalloc(RTAS_ERROR_LOG_MAX, GFP_ATOMIC);
 		}
 		if (buf)
-			memcpy(buf, rtas_err_buf, RTAS_ERROR_LOG_MAX);
+			memmove(buf, rtas_err_buf, RTAS_ERROR_LOG_MAX);
 	}
 
 	return buf;

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/8] powerpc/rtas: rtas_call_unlocked() kerneldoc
  2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay
  2023-03-06 21:33 ` [PATCH 1/8] powerpc/rtas: ensure 8-byte alignment for struct rtas_args Nathan Lynch via B4 Relay
  2023-03-06 21:33 ` [PATCH 2/8] powerpc/rtas: use memmove for potentially overlapping buffer copy Nathan Lynch via B4 Relay
@ 2023-03-06 21:33 ` Nathan Lynch via B4 Relay
  2023-03-23  4:15   ` Andrew Donnellan
  2023-03-06 21:33 ` [PATCH 4/8] powerpc/rtas: fix miswording in rtas_function kerneldoc Nathan Lynch via B4 Relay
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Nathan Lynch via B4 Relay @ 2023-03-06 21:33 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Nathan Lynch, Tyrel Datwyler, Nick Child, Andrew Donnellan,
	Scott Cheloha, Laurent Dufour, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

Add documentation for rtas_call_unlocked(), including details on how
it differs from rtas_call().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 9256cfaa8b6f..c73b01d722f6 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1016,6 +1016,23 @@ va_rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret,
 	do_enter_rtas(args);
 }
 
+/**
+ * rtas_call_unlocked() - Invoke an RTAS firmware function without synchronization.
+ * @args: RTAS parameter block to be used for the call, must obey RTAS addressing
+ *        constraints.
+ * @token: Identifies the function being invoked.
+ * @nargs: Number of input parameters. Does not include token.
+ * @nret: Number of output parameters, including the call status.
+ * @....: List of @nargs input parameters.
+ *
+ * Invokes the RTAS function indicated by @token, which the caller
+ * should obtain via rtas_function_token().
+ *
+ * This function is similar to rtas_call(), but must be used with a
+ * limited set of RTAS calls specifically exempted from the general
+ * requirement that only one RTAS call may be in progress at any
+ * time. Examples include stop-self and ibm,nmi-interlock.
+ */
 void rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret, ...)
 {
 	va_list list;

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/8] powerpc/rtas: fix miswording in rtas_function kerneldoc
  2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay
                   ` (2 preceding siblings ...)
  2023-03-06 21:33 ` [PATCH 3/8] powerpc/rtas: rtas_call_unlocked() kerneldoc Nathan Lynch via B4 Relay
@ 2023-03-06 21:33 ` Nathan Lynch via B4 Relay
  2023-03-23  0:17   ` Andrew Donnellan
  2023-03-06 21:33 ` [PATCH 5/8] powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call() Nathan Lynch via B4 Relay
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Nathan Lynch via B4 Relay @ 2023-03-06 21:33 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Nathan Lynch, Tyrel Datwyler, Nick Child, Andrew Donnellan,
	Scott Cheloha, Laurent Dufour, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

The 'filter' member is a pointer, not a bool; fix the wording
accordingly.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index c73b01d722f6..c29c38b1a55a 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -68,7 +68,7 @@ struct rtas_filter {
  *                            functions are believed to have no users on
  *                            ppc64le, and we want to keep it that way. It does
  *                            not make sense for this to be set when @filter
- *                            is false.
+ *                            is NULL.
  */
 struct rtas_function {
 	s32 token;

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 5/8] powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call()
  2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay
                   ` (3 preceding siblings ...)
  2023-03-06 21:33 ` [PATCH 4/8] powerpc/rtas: fix miswording in rtas_function kerneldoc Nathan Lynch via B4 Relay
@ 2023-03-06 21:33 ` Nathan Lynch via B4 Relay
  2023-03-23  4:17   ` Andrew Donnellan
  2023-03-29 12:24   ` Michael Ellerman
  2023-03-06 21:33 ` [PATCH 6/8] powerpc/rtas: lockdep annotations Nathan Lynch via B4 Relay
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 32+ messages in thread
From: Nathan Lynch via B4 Relay @ 2023-03-06 21:33 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Nathan Lynch, Tyrel Datwyler, Nick Child, Andrew Donnellan,
	Scott Cheloha, Laurent Dufour, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

The function name va_rtas_call_unlocked() is confusing: it may be
called with or without rtas_lock held. Rename it to va_rtas_call().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index c29c38b1a55a..96a10a0abe3a 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -996,9 +996,8 @@ static void __init init_error_log_max(void) {}
 #endif
 
 
-static void
-va_rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret,
-		      va_list list)
+static void va_rtas_call(struct rtas_args *args, int token, int nargs, int nret,
+			 va_list list)
 {
 	int i;
 
@@ -1038,7 +1037,7 @@ void rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret,
 	va_list list;
 
 	va_start(list, nret);
-	va_rtas_call_unlocked(args, token, nargs, nret, list);
+	va_rtas_call(args, token, nargs, nret, list);
 	va_end(list);
 }
 
@@ -1138,7 +1137,7 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 	args = &rtas_args;
 
 	va_start(list, outputs);
-	va_rtas_call_unlocked(args, token, nargs, nret, list);
+	va_rtas_call(args, token, nargs, nret, list);
 	va_end(list);
 
 	/* A -1 return code indicates that the last command couldn't

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 6/8] powerpc/rtas: lockdep annotations
  2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay
                   ` (4 preceding siblings ...)
  2023-03-06 21:33 ` [PATCH 5/8] powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call() Nathan Lynch via B4 Relay
@ 2023-03-06 21:33 ` Nathan Lynch via B4 Relay
  2023-03-23  6:01   ` Andrew Donnellan
  2023-03-06 21:33 ` [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked() Nathan Lynch via B4 Relay
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Nathan Lynch via B4 Relay @ 2023-03-06 21:33 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Nathan Lynch, Tyrel Datwyler, Nick Child, Andrew Donnellan,
	Scott Cheloha, Laurent Dufour, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

Add lockdep annotations for the following properties that must hold:

* Any error log retrieval must be atomically coupled with the prior
  RTAS call, without a window for another RTAS call to occur before the
  error log can be retrieved.

* All users of the core rtas_args parameter block must hold rtas_lock.

Move the definitions of rtas_lock and rtas_args up in the file so that
__do_enter_rtas_trace() can refer to them.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 96a10a0abe3a..633c925164e7 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -16,6 +16,7 @@
 #include <linux/init.h>
 #include <linux/kconfig.h>
 #include <linux/kernel.h>
+#include <linux/lockdep.h>
 #include <linux/memblock.h>
 #include <linux/of.h>
 #include <linux/of_fdt.h>
@@ -453,6 +454,16 @@ static struct rtas_function rtas_function_table[] __ro_after_init = {
 	},
 };
 
+/*
+ * Nearly all RTAS calls need to be serialized. All uses of the
+ * default rtas_args block must hold rtas_lock.
+ *
+ * Exceptions to the RTAS serialization requirement (e.g. stop-self)
+ * must use a separate rtas_args structure.
+ */
+static DEFINE_RAW_SPINLOCK(rtas_lock);
+static struct rtas_args rtas_args;
+
 /**
  * rtas_function_token() - RTAS function token lookup.
  * @handle: Function handle, e.g. RTAS_FN_EVENT_SCAN.
@@ -560,6 +571,9 @@ static void __do_enter_rtas(struct rtas_args *args)
 static void __do_enter_rtas_trace(struct rtas_args *args)
 {
 	const char *name = NULL;
+
+	if (args == &rtas_args)
+		lockdep_assert_held(&rtas_lock);
 	/*
 	 * If the tracepoints that consume the function name aren't
 	 * active, avoid the lookup.
@@ -619,16 +633,6 @@ static void do_enter_rtas(struct rtas_args *args)
 
 struct rtas_t rtas;
 
-/*
- * Nearly all RTAS calls need to be serialized. All uses of the
- * default rtas_args block must hold rtas_lock.
- *
- * Exceptions to the RTAS serialization requirement (e.g. stop-self)
- * must use a separate rtas_args structure.
- */
-static DEFINE_RAW_SPINLOCK(rtas_lock);
-static struct rtas_args rtas_args;
-
 DEFINE_SPINLOCK(rtas_data_buf_lock);
 EXPORT_SYMBOL_GPL(rtas_data_buf_lock);
 
@@ -951,6 +955,8 @@ static char *__fetch_rtas_last_error(char *altbuf)
 	u32 bufsz;
 	char *buf = NULL;
 
+	lockdep_assert_held(&rtas_lock);
+
 	if (token == -1)
 		return NULL;
 
@@ -1107,6 +1113,7 @@ static bool token_is_restricted_errinjct(s32 token)
  */
 int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 {
+	struct pin_cookie cookie;
 	va_list list;
 	int i;
 	unsigned long flags;
@@ -1133,6 +1140,8 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 	}
 
 	raw_spin_lock_irqsave(&rtas_lock, flags);
+	cookie = lockdep_pin_lock(&rtas_lock);
+
 	/* We use the global rtas args buffer */
 	args = &rtas_args;
 
@@ -1150,6 +1159,7 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 			outputs[i] = be32_to_cpu(args->rets[i + 1]);
 	ret = (nret > 0) ? be32_to_cpu(args->rets[0]) : 0;
 
+	lockdep_unpin_lock(&rtas_lock, cookie);
 	raw_spin_unlock_irqrestore(&rtas_lock, flags);
 
 	if (buff_copy) {
@@ -1781,6 +1791,7 @@ static bool block_rtas_call(int token, int nargs,
 /* We assume to be passed big endian arguments */
 SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 {
+	struct pin_cookie cookie;
 	struct rtas_args args;
 	unsigned long flags;
 	char *buff_copy, *errbuf = NULL;
@@ -1849,6 +1860,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	buff_copy = get_errorlog_buffer();
 
 	raw_spin_lock_irqsave(&rtas_lock, flags);
+	cookie = lockdep_pin_lock(&rtas_lock);
 
 	rtas_args = args;
 	do_enter_rtas(&rtas_args);
@@ -1859,6 +1871,7 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 	if (be32_to_cpu(args.rets[0]) == -1)
 		errbuf = __fetch_rtas_last_error(buff_copy);
 
+	lockdep_unpin_lock(&rtas_lock, cookie);
 	raw_spin_unlock_irqrestore(&rtas_lock, flags);
 
 	if (buff_copy) {

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked()
  2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay
                   ` (5 preceding siblings ...)
  2023-03-06 21:33 ` [PATCH 6/8] powerpc/rtas: lockdep annotations Nathan Lynch via B4 Relay
@ 2023-03-06 21:33 ` Nathan Lynch via B4 Relay
  2023-03-23  4:25   ` Andrew Donnellan
  2023-03-06 21:33 ` [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas() Nathan Lynch via B4 Relay
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 32+ messages in thread
From: Nathan Lynch via B4 Relay @ 2023-03-06 21:33 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Nathan Lynch, Tyrel Datwyler, Nick Child, Andrew Donnellan,
	Scott Cheloha, Laurent Dufour, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

Any caller of rtas_call_unlocked() must provide an rtas_args parameter
block distinct from the core rtas_args buffer used by the rtas_call()
path. It's an unlikely error to make, but the potential consequences
are grim, and it's trivial to check.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 633c925164e7..47a2aa43d7d4 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1042,6 +1042,13 @@ void rtas_call_unlocked(struct rtas_args *args, int token, int nargs, int nret,
 {
 	va_list list;
 
+	/*
+	 * Callers must not use rtas_args; otherwise they risk
+	 * corrupting the state of the rtas_call() path, which is
+	 * serialized by rtas_lock.
+	 */
+	WARN_ON(args == &rtas_args);
+
 	va_start(list, nret);
 	va_rtas_call(args, token, nargs, nret, list);
 	va_end(list);

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()
  2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay
                   ` (6 preceding siblings ...)
  2023-03-06 21:33 ` [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked() Nathan Lynch via B4 Relay
@ 2023-03-06 21:33 ` Nathan Lynch via B4 Relay
  2023-03-23  6:26   ` Andrew Donnellan
                     ` (2 more replies)
  2023-04-06  1:09 ` (subset) [PATCH 0/8] RTAS changes for 6.4 Michael Ellerman
  2023-04-26 12:12 ` Michael Ellerman
  9 siblings, 3 replies; 32+ messages in thread
From: Nathan Lynch via B4 Relay @ 2023-03-06 21:33 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Nathan Lynch, Tyrel Datwyler, Nick Child, Andrew Donnellan,
	Scott Cheloha, Laurent Dufour, linuxppc-dev

From: Nathan Lynch <nathanl@linux.ibm.com>

The kernel can handle retrying RTAS function calls in response to
-2/990x in the sys_rtas() handler instead of relaying the intermediate
status to user space.

Justifications:

* Currently it's nondeterministic and quite variable in practice
  whether a retry status is returned for any given invocation of
  sys_rtas(). Therefore user space code cannot be expecting a retry
  result without already being broken.

* This tends to significantly reduce the total number of system calls
  issued by programs such as drmgr which make use of sys_rtas(),
  improving the experience of tracing and debugging such
  programs. This is the main motivation for me: I think this change
  will make it easier for us to characterize current sys_rtas() use
  cases as we move them to other interfaces over time.

* It reduces the number of opportunities for user space to leave
  complex operations, such as those associated with DLPAR, incomplete
  and diffcult to recover.

* We can expect performance improvements for existing sys_rtas()
  users, not only because of overall reduction in the number of system
  calls issued, but also due to the better handling of -2/990x in the
  kernel. For example, librtas still sleeps for 1ms on -2, which is
  completely unnecessary.

Performance differences for PHB add and remove on a small P10 PowerVM
partition are included below. For add, elapsed time is slightly
reduced. For remove, there are more significant improvements: the
number of context switches is reduced by an order of magnitude, and
elapsed time is reduced by over half.

(- before, + after):

  Performance counter stats for 'drmgr -c phb -a -s PHB 23' (5 runs):

-          1,847.58 msec task-clock                       #    0.135 CPUs utilized               ( +- 14.15% )
-            10,867      cs                               #    9.800 K/sec                       ( +- 14.14% )
+          1,901.15 msec task-clock                       #    0.148 CPUs utilized               ( +- 14.13% )
+            10,451      cs                               #    9.158 K/sec                       ( +- 14.14% )

-         13.656557 +- 0.000124 seconds time elapsed  ( +-  0.00% )
+          12.88080 +- 0.00404 seconds time elapsed  ( +-  0.03% )

  Performance counter stats for 'drmgr -c phb -r -s PHB 23' (5 runs):

-          1,473.75 msec task-clock                       #    0.092 CPUs utilized               ( +- 14.15% )
-             2,652      cs                               #    3.000 K/sec                       ( +- 14.16% )
+          1,444.55 msec task-clock                       #    0.221 CPUs utilized               ( +- 14.14% )
+               104      cs                               #  119.957 /sec                        ( +- 14.63% )

-          15.99718 +- 0.00801 seconds time elapsed  ( +-  0.05% )
+           6.54256 +- 0.00830 seconds time elapsed  ( +-  0.13% )

Move the existing rtas_lock-guarded critical section in sys_rtas()
into a conventional rtas_busy_delay()-based loop, returning to user
space only when a final success or failure result is available.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/rtas.c | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 47a2aa43d7d4..c330a22ccc70 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs,
 /* We assume to be passed big endian arguments */
 SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 {
-	struct pin_cookie cookie;
 	struct rtas_args args;
 	unsigned long flags;
 	char *buff_copy, *errbuf = NULL;
@@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
 
 	buff_copy = get_errorlog_buffer();
 
-	raw_spin_lock_irqsave(&rtas_lock, flags);
-	cookie = lockdep_pin_lock(&rtas_lock);
+	do {
+		struct pin_cookie cookie;
 
-	rtas_args = args;
-	do_enter_rtas(&rtas_args);
-	args = rtas_args;
+		raw_spin_lock_irqsave(&rtas_lock, flags);
+		cookie = lockdep_pin_lock(&rtas_lock);
 
-	/* A -1 return code indicates that the last command couldn't
-	   be completed due to a hardware error. */
-	if (be32_to_cpu(args.rets[0]) == -1)
-		errbuf = __fetch_rtas_last_error(buff_copy);
+		rtas_args = args;
+		do_enter_rtas(&rtas_args);
+		args = rtas_args;
 
-	lockdep_unpin_lock(&rtas_lock, cookie);
-	raw_spin_unlock_irqrestore(&rtas_lock, flags);
+		/*
+		 * Handle error record retrieval before releasing the lock.
+		 */
+		if (be32_to_cpu(args.rets[0]) == -1)
+			errbuf = __fetch_rtas_last_error(buff_copy);
+
+		lockdep_unpin_lock(&rtas_lock, cookie);
+		raw_spin_unlock_irqrestore(&rtas_lock, flags);
+	} while (rtas_busy_delay(be32_to_cpu(args.rets[0])));
 
 	if (buff_copy) {
 		if (errbuf)

-- 
2.39.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/8] powerpc/rtas: fix miswording in rtas_function kerneldoc
  2023-03-06 21:33 ` [PATCH 4/8] powerpc/rtas: fix miswording in rtas_function kerneldoc Nathan Lynch via B4 Relay
@ 2023-03-23  0:17   ` Andrew Donnellan
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Donnellan @ 2023-03-23  0:17 UTC (permalink / raw)
  To: nathanl, Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Tyrel Datwyler, Scott Cheloha, Laurent Dufour, linuxppc-dev, Nick Child

On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch <nathanl@linux.ibm.com>
> 
> The 'filter' member is a pointer, not a bool; fix the wording
> accordingly.
> 
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>

Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

> ---
>  arch/powerpc/kernel/rtas.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index c73b01d722f6..c29c38b1a55a 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -68,7 +68,7 @@ struct rtas_filter {
>   *                            functions are believed to have no
> users on
>   *                            ppc64le, and we want to keep it that
> way. It does
>   *                            not make sense for this to be set when
> @filter
> - *                            is false.
> + *                            is NULL.
>   */
>  struct rtas_function {
>         s32 token;
> 

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/8] powerpc/rtas: ensure 8-byte alignment for struct rtas_args
  2023-03-06 21:33 ` [PATCH 1/8] powerpc/rtas: ensure 8-byte alignment for struct rtas_args Nathan Lynch via B4 Relay
@ 2023-03-23  4:00   ` Andrew Donnellan
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Donnellan @ 2023-03-23  4:00 UTC (permalink / raw)
  To: nathanl, Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Tyrel Datwyler, Scott Cheloha, Laurent Dufour, linuxppc-dev, Nick Child

On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
> > From: Nathan Lynch <nathanl@linux.ibm.com>
> > 
> > CHRP and PAPR agree: "In order to make an RTAS call, the operating
> > system must construct an argument call buffer aligned on an eight
> > byte
> > boundary in physically contiguous real memory [...]." (7.2.7
> > Calling
> > Mechanism and Conventions).
> > 
> > struct rtas_args is the type used for this argument call buffer.
> > The
> > unarchitected 'rets' member happens to produce 8-byte alignment for
> > the struct on 64-bit targets in practice. But without an alignment
> > directive the structure will have only 4-byte alignment on 32-bit
> > targets:
> > 
> >   $ nm b/{before,after}/chrp32/vmlinux | grep rtas_args
> >   c096881c b rtas_args
> >   c0968820 b rtas_args
> > 
> > Add an alignment directive to the struct rtas_args declaration so
> > all
> > instances have the alignment required by the specs. rtas-types.h no
> > longer refers to any spinlock types, so drop the spinlock_types.h
> > inclusion while we're here.
> > 
> > Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>

Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

> > ---
> >  arch/powerpc/include/asm/rtas-types.h | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/rtas-types.h
> > b/arch/powerpc/include/asm/rtas-types.h
> > index f2ad4a96cbc5..861145c8a021 100644
> > --- a/arch/powerpc/include/asm/rtas-types.h
> > +++ b/arch/powerpc/include/asm/rtas-types.h
> > @@ -2,7 +2,8 @@
> >  #ifndef _ASM_POWERPC_RTAS_TYPES_H
> >  #define _ASM_POWERPC_RTAS_TYPES_H
> >  
> > -#include <linux/spinlock_types.h>
> > +#include <linux/compiler_attributes.h>
> > +#include <linux/sizes.h>
> >  
> >  typedef __be32 rtas_arg_t;
> >  
> > @@ -12,7 +13,7 @@ struct rtas_args {
> >         __be32 nret;
> >         rtas_arg_t args[16];
> >         rtas_arg_t *rets;     /* Pointer to return values in
> > args[].
> > */
> > -};
> > +} __aligned(SZ_8);

Nowhere else in the kernel uses __aligned(SZ_8) over just __aligned(8),
which I suppose would also save an include, but I don't care either
way.

> >  
> >  struct rtas_t {
> >         unsigned long entry;            /* physical address pointer
> > */
> > 

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/8] powerpc/rtas: use memmove for potentially overlapping buffer copy
  2023-03-06 21:33 ` [PATCH 2/8] powerpc/rtas: use memmove for potentially overlapping buffer copy Nathan Lynch via B4 Relay
@ 2023-03-23  4:09   ` Andrew Donnellan
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Donnellan @ 2023-03-23  4:09 UTC (permalink / raw)
  To: nathanl, Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Tyrel Datwyler, Scott Cheloha, Laurent Dufour, linuxppc-dev, Nick Child

On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch <nathanl@linux.ibm.com>
> 
> Using memcpy() isn't safe when buf is identical to rtas_err_buf,
> which
> can happen during boot before slab is up. Full context which may not
> be obvious from the diff:
> 
>         if (altbuf) {
>                 buf = altbuf;
>         } else {
>                 buf = rtas_err_buf;
>                 if (slab_is_available())
>                         buf = kmalloc(RTAS_ERROR_LOG_MAX,
> GFP_ATOMIC);
>         }
>         if (buf)
>                 memcpy(buf, rtas_err_buf, RTAS_ERROR_LOG_MAX);
> 
> This was found by inspection and I'm not aware of it causing problems
> in practice. It appears to have been introduced by commit
> 033ef338b6e0 ("powerpc: Merge rtas.c into arch/powerpc/kernel"); the
> old ppc64 version of this code did not have this problem.
> 
> Use memmove() instead.
> 
> Fixes: 033ef338b6e0 ("powerpc: Merge rtas.c into
> arch/powerpc/kernel")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>

Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/8] powerpc/rtas: rtas_call_unlocked() kerneldoc
  2023-03-06 21:33 ` [PATCH 3/8] powerpc/rtas: rtas_call_unlocked() kerneldoc Nathan Lynch via B4 Relay
@ 2023-03-23  4:15   ` Andrew Donnellan
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Donnellan @ 2023-03-23  4:15 UTC (permalink / raw)
  To: nathanl, Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Tyrel Datwyler, Scott Cheloha, Laurent Dufour, linuxppc-dev, Nick Child

On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch <nathanl@linux.ibm.com>
> 
> Add documentation for rtas_call_unlocked(), including details on how
> it differs from rtas_call().
> 
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>

Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>


-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/8] powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call()
  2023-03-06 21:33 ` [PATCH 5/8] powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call() Nathan Lynch via B4 Relay
@ 2023-03-23  4:17   ` Andrew Donnellan
  2023-03-23 16:11     ` Nathan Lynch
  2023-03-29 12:24   ` Michael Ellerman
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Donnellan @ 2023-03-23  4:17 UTC (permalink / raw)
  To: nathanl, Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Tyrel Datwyler, Scott Cheloha, Laurent Dufour, linuxppc-dev, Nick Child

On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch <nathanl@linux.ibm.com>
> 
> The function name va_rtas_call_unlocked() is confusing: it may be
> called with or without rtas_lock held. Rename it to va_rtas_call().
> 
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>

Not a huge fan of the name, the va_ suggests that the only difference
between this function and rtas_call() is the varargs handling. Perhaps
something like __rtas_call()?

> ---
>  arch/powerpc/kernel/rtas.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index c29c38b1a55a..96a10a0abe3a 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -996,9 +996,8 @@ static void __init init_error_log_max(void) {}
>  #endif
>  
>  
> -static void
> -va_rtas_call_unlocked(struct rtas_args *args, int token, int nargs,
> int nret,
> -                     va_list list)
> +static void va_rtas_call(struct rtas_args *args, int token, int
> nargs, int nret,
> +                        va_list list)
>  {
>         int i;
>  
> @@ -1038,7 +1037,7 @@ void rtas_call_unlocked(struct rtas_args *args,
> int token, int nargs, int nret,
>         va_list list;
>  
>         va_start(list, nret);
> -       va_rtas_call_unlocked(args, token, nargs, nret, list);
> +       va_rtas_call(args, token, nargs, nret, list);
>         va_end(list);
>  }
>  
> @@ -1138,7 +1137,7 @@ int rtas_call(int token, int nargs, int nret,
> int *outputs, ...)
>         args = &rtas_args;
>  
>         va_start(list, outputs);
> -       va_rtas_call_unlocked(args, token, nargs, nret, list);
> +       va_rtas_call(args, token, nargs, nret, list);
>         va_end(list);
>  
>         /* A -1 return code indicates that the last command couldn't
> 

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked()
  2023-03-06 21:33 ` [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked() Nathan Lynch via B4 Relay
@ 2023-03-23  4:25   ` Andrew Donnellan
  2023-03-23 12:17     ` Nathan Lynch
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Donnellan @ 2023-03-23  4:25 UTC (permalink / raw)
  To: nathanl, Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Tyrel Datwyler, Scott Cheloha, Laurent Dufour, linuxppc-dev, Nick Child

On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch <nathanl@linux.ibm.com>
> 
> Any caller of rtas_call_unlocked() must provide an rtas_args
> parameter
> block distinct from the core rtas_args buffer used by the rtas_call()
> path. It's an unlikely error to make, but the potential consequences
> are grim, and it's trivial to check.
> 
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>

call_rtas_display_status() seems to do exactly this, or am I missing
something?

> ---
>  arch/powerpc/kernel/rtas.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 633c925164e7..47a2aa43d7d4 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1042,6 +1042,13 @@ void rtas_call_unlocked(struct rtas_args
> *args, int token, int nargs, int nret,
>  {
>         va_list list;
>  
> +       /*
> +        * Callers must not use rtas_args; otherwise they risk
> +        * corrupting the state of the rtas_call() path, which is
> +        * serialized by rtas_lock.
> +        */
> +       WARN_ON(args == &rtas_args);
> +
>         va_start(list, nret);
>         va_rtas_call(args, token, nargs, nret, list);
>         va_end(list);
> 

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/8] powerpc/rtas: lockdep annotations
  2023-03-06 21:33 ` [PATCH 6/8] powerpc/rtas: lockdep annotations Nathan Lynch via B4 Relay
@ 2023-03-23  6:01   ` Andrew Donnellan
  0 siblings, 0 replies; 32+ messages in thread
From: Andrew Donnellan @ 2023-03-23  6:01 UTC (permalink / raw)
  To: nathanl, Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Tyrel Datwyler, Scott Cheloha, Laurent Dufour, linuxppc-dev, Nick Child

On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch <nathanl@linux.ibm.com>
> 
> Add lockdep annotations for the following properties that must hold:
> 
> * Any error log retrieval must be atomically coupled with the prior
>   RTAS call, without a window for another RTAS call to occur before
> the
>   error log can be retrieved.
> 
> * All users of the core rtas_args parameter block must hold
> rtas_lock.
> 
> Move the definitions of rtas_lock and rtas_args up in the file so
> that
> __do_enter_rtas_trace() can refer to them.
> 
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>

I'm no lockdep expert and I haven't checked if every possible case that
can be annotated has been annotated, but these changes make sense as
far as I can tell from my limited inspection.

Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()
  2023-03-06 21:33 ` [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas() Nathan Lynch via B4 Relay
@ 2023-03-23  6:26   ` Andrew Donnellan
  2023-03-23 19:39     ` Nathan Lynch
  2023-03-23  9:44   ` Michael Ellerman
  2024-01-25 15:55   ` Christophe Leroy
  2 siblings, 1 reply; 32+ messages in thread
From: Andrew Donnellan @ 2023-03-23  6:26 UTC (permalink / raw)
  To: nathanl, Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Tyrel Datwyler, Scott Cheloha, Laurent Dufour, linuxppc-dev, Nick Child

On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
> From: Nathan Lynch <nathanl@linux.ibm.com>
> 
> The kernel can handle retrying RTAS function calls in response to
> -2/990x in the sys_rtas() handler instead of relaying the
> intermediate
> status to user space.
> 
> Justifications:
> 
> * Currently it's nondeterministic and quite variable in practice
>   whether a retry status is returned for any given invocation of
>   sys_rtas(). Therefore user space code cannot be expecting a retry
>   result without already being broken.
> 
> * This tends to significantly reduce the total number of system calls
>   issued by programs such as drmgr which make use of sys_rtas(),
>   improving the experience of tracing and debugging such
>   programs. This is the main motivation for me: I think this change
>   will make it easier for us to characterize current sys_rtas() use
>   cases as we move them to other interfaces over time.
> 
> * It reduces the number of opportunities for user space to leave
>   complex operations, such as those associated with DLPAR, incomplete
>   and diffcult to recover.
> 
> * We can expect performance improvements for existing sys_rtas()
>   users, not only because of overall reduction in the number of
> system
>   calls issued, but also due to the better handling of -2/990x in the
>   kernel. For example, librtas still sleeps for 1ms on -2, which is
>   completely unnecessary.

Would be good to see this fixed on the librtas side.

> 
> Performance differences for PHB add and remove on a small P10 PowerVM
> partition are included below. For add, elapsed time is slightly
> reduced. For remove, there are more significant improvements: the
> number of context switches is reduced by an order of magnitude, and
> elapsed time is reduced by over half.
> 
> (- before, + after):
> 
>   Performance counter stats for 'drmgr -c phb -a -s PHB 23' (5 runs):
> 
> -          1,847.58 msec task-clock                       #    0.135
> CPUs utilized               ( +- 14.15% )
> -            10,867      cs                               #    9.800
> K/sec                       ( +- 14.14% )
> +          1,901.15 msec task-clock                       #    0.148
> CPUs utilized               ( +- 14.13% )
> +            10,451      cs                               #    9.158
> K/sec                       ( +- 14.14% )
> 
> -         13.656557 +- 0.000124 seconds time elapsed  ( +-  0.00% )
> +          12.88080 +- 0.00404 seconds time elapsed  ( +-  0.03% )
> 
>   Performance counter stats for 'drmgr -c phb -r -s PHB 23' (5 runs):
> 
> -          1,473.75 msec task-clock                       #    0.092
> CPUs utilized               ( +- 14.15% )
> -             2,652      cs                               #    3.000
> K/sec                       ( +- 14.16% )
> +          1,444.55 msec task-clock                       #    0.221
> CPUs utilized               ( +- 14.14% )
> +               104      cs                               #  119.957
> /sec                        ( +- 14.63% )
> 
> -          15.99718 +- 0.00801 seconds time elapsed  ( +-  0.05% )
> +           6.54256 +- 0.00830 seconds time elapsed  ( +-  0.13% )
> 
> Move the existing rtas_lock-guarded critical section in sys_rtas()
> into a conventional rtas_busy_delay()-based loop, returning to user
> space only when a final success or failure result is available.
> 
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>

Should there be some kind of timeout? I'm a bit worried by sleeping in
a syscall for an extended period.

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()
  2023-03-06 21:33 ` [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas() Nathan Lynch via B4 Relay
  2023-03-23  6:26   ` Andrew Donnellan
@ 2023-03-23  9:44   ` Michael Ellerman
  2023-03-23 13:40     ` Nathan Lynch
  2024-01-25 15:55   ` Christophe Leroy
  2 siblings, 1 reply; 32+ messages in thread
From: Michael Ellerman @ 2023-03-23  9:44 UTC (permalink / raw)
  To: Nathan Lynch via B4 Relay, Nicholas Piggin, Christophe Leroy
  Cc: Nathan Lynch, Tyrel Datwyler, Nick Child, Andrew Donnellan,
	Scott Cheloha, Laurent Dufour, linuxppc-dev

Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> writes:
> From: Nathan Lynch <nathanl@linux.ibm.com>
>
> The kernel can handle retrying RTAS function calls in response to
> -2/990x in the sys_rtas() handler instead of relaying the intermediate
> status to user space.

This looks good in general.

One query ...

> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 47a2aa43d7d4..c330a22ccc70 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs,
>  /* We assume to be passed big endian arguments */
>  SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>  {
> -	struct pin_cookie cookie;
>  	struct rtas_args args;
>  	unsigned long flags;
>  	char *buff_copy, *errbuf = NULL;
> @@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>  
>  	buff_copy = get_errorlog_buffer();
>  
> -	raw_spin_lock_irqsave(&rtas_lock, flags);
> -	cookie = lockdep_pin_lock(&rtas_lock);
> +	do {
> +		struct pin_cookie cookie;
>  
> -	rtas_args = args;
> -	do_enter_rtas(&rtas_args);
> -	args = rtas_args;
> +		raw_spin_lock_irqsave(&rtas_lock, flags);
> +		cookie = lockdep_pin_lock(&rtas_lock);
>  
> -	/* A -1 return code indicates that the last command couldn't
> -	   be completed due to a hardware error. */
> -	if (be32_to_cpu(args.rets[0]) == -1)
> -		errbuf = __fetch_rtas_last_error(buff_copy);
> +		rtas_args = args;
> +		do_enter_rtas(&rtas_args);
> +		args = rtas_args;
>  
> -	lockdep_unpin_lock(&rtas_lock, cookie);
> -	raw_spin_unlock_irqrestore(&rtas_lock, flags);
> +		/*
> +		 * Handle error record retrieval before releasing the lock.
> +		 */
> +		if (be32_to_cpu(args.rets[0]) == -1)
> +			errbuf = __fetch_rtas_last_error(buff_copy);
> +
> +		lockdep_unpin_lock(&rtas_lock, cookie);
> +		raw_spin_unlock_irqrestore(&rtas_lock, flags);
> +	} while (rtas_busy_delay(be32_to_cpu(args.rets[0])));

rtas_busy_delay_early() has the successive_ext_delays case that will
break out eventually. But if we keep getting plain RTAS_BUSY back from
RTAS I *think* this loop will never terminate?

To avoid that, and just as good manners, I think we should have a
fatal_signal_pending() check, and if that returns true we bail out of
the syscall with -EINTR ?

cheers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked()
  2023-03-23  4:25   ` Andrew Donnellan
@ 2023-03-23 12:17     ` Nathan Lynch
  2023-03-24  0:56       ` Nathan Lynch
  0 siblings, 1 reply; 32+ messages in thread
From: Nathan Lynch @ 2023-03-23 12:17 UTC (permalink / raw)
  To: Andrew Donnellan
  Cc: Tyrel Datwyler, Nick Child, Scott Cheloha, Nicholas Piggin,
	Laurent Dufour, linuxppc-dev

Andrew Donnellan <ajd@linux.ibm.com> writes:

> On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
>> From: Nathan Lynch <nathanl@linux.ibm.com>
>> 
>> Any caller of rtas_call_unlocked() must provide an rtas_args
>> parameter
>> block distinct from the core rtas_args buffer used by the rtas_call()
>> path. It's an unlikely error to make, but the potential consequences
>> are grim, and it's trivial to check.
>> 
>> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
>
> call_rtas_display_status() seems to do exactly this, or am I missing
> something?

No you're right, the warning would be spurious in that case. May need to
drop this one, or refactor rtas_call():

  4456f4524604be2558e5f6a8e0f7cc9ed17c783e
  Author:     Michael Ellerman <mpe@ellerman.id.au>
  AuthorDate: Tue Nov 24 22:26:11 2015 +1100

  powerpc/rtas: Use rtas_call_unlocked() in call_rtas_display_status()

  Although call_rtas_display_status() does actually want to use the
  regular RTAS locking, it doesn't want the extra logic that is in
  rtas_call(), so currently it open codes the logic.

  Instead we can use rtas_call_unlocked(), after taking the RTAS lock.

aside: does anyone know if the display_status() code is worth keeping?
It looks like it is used to drive the 16-character wide physical LCD I
remember seeing on P4-era and older machines. Is it a vestige of
non-LPAR pseries that should be dropped, or is it perhaps useful for
chrp or cell?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()
  2023-03-23  9:44   ` Michael Ellerman
@ 2023-03-23 13:40     ` Nathan Lynch
  0 siblings, 0 replies; 32+ messages in thread
From: Nathan Lynch @ 2023-03-23 13:40 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Tyrel Datwyler, Nick Child, Andrew Donnellan, Scott Cheloha,
	Laurent Dufour, linuxppc-dev

Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org> writes:
>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>
>> The kernel can handle retrying RTAS function calls in response to
>> -2/990x in the sys_rtas() handler instead of relaying the intermediate
>> status to user space.
>
> This looks good in general.
>
> One query ...
>
>> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
>> index 47a2aa43d7d4..c330a22ccc70 100644
>> --- a/arch/powerpc/kernel/rtas.c
>> +++ b/arch/powerpc/kernel/rtas.c
>> @@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs,
>>  /* We assume to be passed big endian arguments */
>>  SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>>  {
>> -	struct pin_cookie cookie;
>>  	struct rtas_args args;
>>  	unsigned long flags;
>>  	char *buff_copy, *errbuf = NULL;
>> @@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>>  
>>  	buff_copy = get_errorlog_buffer();
>>  
>> -	raw_spin_lock_irqsave(&rtas_lock, flags);
>> -	cookie = lockdep_pin_lock(&rtas_lock);
>> +	do {
>> +		struct pin_cookie cookie;
>>  
>> -	rtas_args = args;
>> -	do_enter_rtas(&rtas_args);
>> -	args = rtas_args;
>> +		raw_spin_lock_irqsave(&rtas_lock, flags);
>> +		cookie = lockdep_pin_lock(&rtas_lock);
>>  
>> -	/* A -1 return code indicates that the last command couldn't
>> -	   be completed due to a hardware error. */
>> -	if (be32_to_cpu(args.rets[0]) == -1)
>> -		errbuf = __fetch_rtas_last_error(buff_copy);
>> +		rtas_args = args;
>> +		do_enter_rtas(&rtas_args);
>> +		args = rtas_args;
>>  
>> -	lockdep_unpin_lock(&rtas_lock, cookie);
>> -	raw_spin_unlock_irqrestore(&rtas_lock, flags);
>> +		/*
>> +		 * Handle error record retrieval before releasing the lock.
>> +		 */
>> +		if (be32_to_cpu(args.rets[0]) == -1)
>> +			errbuf = __fetch_rtas_last_error(buff_copy);
>> +
>> +		lockdep_unpin_lock(&rtas_lock, cookie);
>> +		raw_spin_unlock_irqrestore(&rtas_lock, flags);
>> +	} while (rtas_busy_delay(be32_to_cpu(args.rets[0])));
>
> rtas_busy_delay_early() has the successive_ext_delays case that will
> break out eventually. But if we keep getting plain RTAS_BUSY back from
> RTAS I *think* this loop will never terminate?

Yes, but if this happens, then there is a serious bug in Linux or
RTAS. The only time I've seen something like that on PowerVM is when
Linux corrupted internal RTAS state by not serializing calls correctly.

rtas_busy_delay_early() has a bail-out heuristic, not for RTAS_BUSY, but
for extended delay statuses (990x), which I suspect happen rarely (if
ever) that early. That's there in order to allow boot to proceed and
hopefully get useful messages out in a truly unexpected circumstance.

That said...

> To avoid that, and just as good manners, I think we should have a
> fatal_signal_pending() check, and if that returns true we bail out of
> the syscall with -EINTR ?

That probably makes sense. In its current state, I could see
this patch preventing or delaying OS shutdown in situations where it
wouldn't have occurred before.

I think I would want the bailout condition in this case to be
(fatal_signal_pending() && retries > some_threshold), to reduce the
likelihood of non-"stuck" operations from being left unfinished. And it
should dump a stack trace.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/8] powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call()
  2023-03-23  4:17   ` Andrew Donnellan
@ 2023-03-23 16:11     ` Nathan Lynch
  0 siblings, 0 replies; 32+ messages in thread
From: Nathan Lynch @ 2023-03-23 16:11 UTC (permalink / raw)
  To: Andrew Donnellan, Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Tyrel Datwyler, Scott Cheloha, Laurent Dufour, linuxppc-dev, Nick Child

Andrew Donnellan <ajd@linux.ibm.com> writes:

> On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
>> From: Nathan Lynch <nathanl@linux.ibm.com>
>> 
>> The function name va_rtas_call_unlocked() is confusing: it may be
>> called with or without rtas_lock held. Rename it to va_rtas_call().
>> 
>> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
>
> Not a huge fan of the name, the va_ suggests that the only difference
> between this function and rtas_call() is the varargs handling. Perhaps
> something like __rtas_call()?

I would be more inclined to agree if va_rtas_call() were a public API,
like rtas_call().

But it's not, so the convention you're appealing to shouldn't inform the
expectations of external users of the rtas_* APIs, at least.

__rtas_call() conveys strictly less information than va_rtas_call()
IMO. Most functions in the kernel that take a va_list have a "v" worked
into their name somehow.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()
  2023-03-23  6:26   ` Andrew Donnellan
@ 2023-03-23 19:39     ` Nathan Lynch
  0 siblings, 0 replies; 32+ messages in thread
From: Nathan Lynch @ 2023-03-23 19:39 UTC (permalink / raw)
  To: Andrew Donnellan, Michael Ellerman, Nicholas Piggin, Christophe Leroy
  Cc: Tyrel Datwyler, Scott Cheloha, Laurent Dufour, linuxppc-dev, Nick Child

Andrew Donnellan <ajd@linux.ibm.com> writes:

> On Mon, 2023-03-06 at 15:33 -0600, Nathan Lynch via B4 Relay wrote:
>> * We can expect performance improvements for existing sys_rtas()
>>   users, not only because of overall reduction in the number of
>> system
>>   calls issued, but also due to the better handling of -2/990x in the
>>   kernel. For example, librtas still sleeps for 1ms on -2, which is
>>   completely unnecessary.
>
> Would be good to see this fixed on the librtas side.

Filed an issue: https://github.com/ibm-power-utilities/librtas/issues/30

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked()
  2023-03-23 12:17     ` Nathan Lynch
@ 2023-03-24  0:56       ` Nathan Lynch
  2023-03-29 12:20         ` Michael Ellerman
  0 siblings, 1 reply; 32+ messages in thread
From: Nathan Lynch @ 2023-03-24  0:56 UTC (permalink / raw)
  To: Andrew Donnellan
  Cc: Tyrel Datwyler, Nick Child, Scott Cheloha, Nicholas Piggin,
	Laurent Dufour, linuxppc-dev

Nathan Lynch <nathanl@linux.ibm.com> writes:
>
> aside: does anyone know if the display_status() code is worth keeping?
> It looks like it is used to drive the 16-character wide physical LCD I
> remember seeing on P4-era and older machines. Is it a vestige of
> non-LPAR pseries that should be dropped, or is it perhaps useful for
> chrp or cell?

Never mind, I see the display-character token and associated properties
on a P8 LPAR and in a current PAPR.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked()
  2023-03-24  0:56       ` Nathan Lynch
@ 2023-03-29 12:20         ` Michael Ellerman
  2023-03-29 16:23           ` Nathan Lynch
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Ellerman @ 2023-03-29 12:20 UTC (permalink / raw)
  To: Nathan Lynch, Andrew Donnellan
  Cc: Tyrel Datwyler, Nick Child, Scott Cheloha, Nicholas Piggin,
	Laurent Dufour, linuxppc-dev

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>
>> aside: does anyone know if the display_status() code is worth keeping?
>> It looks like it is used to drive the 16-character wide physical LCD I
>> remember seeing on P4-era and older machines. Is it a vestige of
>> non-LPAR pseries that should be dropped, or is it perhaps useful for
>> chrp or cell?
>
> Never mind, I see the display-character token and associated properties
> on a P8 LPAR and in a current PAPR.

Or a P10 LPAR even.

The characters written using it are shown on the HMC, somewhere on the
partition info page.

cheers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/8] powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call()
  2023-03-06 21:33 ` [PATCH 5/8] powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call() Nathan Lynch via B4 Relay
  2023-03-23  4:17   ` Andrew Donnellan
@ 2023-03-29 12:24   ` Michael Ellerman
  1 sibling, 0 replies; 32+ messages in thread
From: Michael Ellerman @ 2023-03-29 12:24 UTC (permalink / raw)
  To: Nathan Lynch via B4 Relay, Nicholas Piggin, Christophe Leroy
  Cc: Nathan Lynch, Tyrel Datwyler, Nick Child, Andrew Donnellan,
	Scott Cheloha, Laurent Dufour, linuxppc-dev

Nathan Lynch via B4 Relay <devnull+nathanl.linux.ibm.com@kernel.org>
writes:
> From: Nathan Lynch <nathanl@linux.ibm.com>
>
> The function name va_rtas_call_unlocked() is confusing: it may be
> called with or without rtas_lock held. Rename it to va_rtas_call().

I'm not sure about this one.

The "unlocked" is meant to convey that it doesn't do any locking. The
caller has to be OK with that, or do its own locking.

Andrew is right that the common naming pattern is foo() that takes the
lock and __foo() that doesn't - but I agree that's not very pretty.

Can we just leave it as-is?

cheers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked()
  2023-03-29 12:20         ` Michael Ellerman
@ 2023-03-29 16:23           ` Nathan Lynch
  0 siblings, 0 replies; 32+ messages in thread
From: Nathan Lynch @ 2023-03-29 16:23 UTC (permalink / raw)
  To: Michael Ellerman, Andrew Donnellan
  Cc: Tyrel Datwyler, Nick Child, Scott Cheloha, Nicholas Piggin,
	Laurent Dufour, linuxppc-dev

Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> Nathan Lynch <nathanl@linux.ibm.com> writes:
>>>
>>> aside: does anyone know if the display_status() code is worth keeping?
>>> It looks like it is used to drive the 16-character wide physical LCD I
>>> remember seeing on P4-era and older machines. Is it a vestige of
>>> non-LPAR pseries that should be dropped, or is it perhaps useful for
>>> chrp or cell?
>>
>> Never mind, I see the display-character token and associated properties
>> on a P8 LPAR and in a current PAPR.
>
> Or a P10 LPAR even.
>
> The characters written using it are shown on the HMC, somewhere on the
> partition info page.

Yes, in the "Reference code" field. On the command line, you can see the
history, too:

hscroot@ltchmcv3:~> lsrefcode -r lpar -m ltczep4 --filter lpar_names=ltczep4-lp3 -n 10 -F 'time_stamp refcode'
"03/28/2023 18:38:20" "Linux ppc64le"
"03/28/2023 18:38:19" CA000093
"03/28/2023 18:38:19" CA00E891
"03/28/2023 18:38:19" CA260208
"03/28/2023 18:38:19" CA260208
"03/28/2023 18:38:19" CA00E890
"03/28/2023 18:38:19" CA000050
"03/28/2023 18:38:19" CA000040
"03/28/2023 18:38:19" CA00E880
"03/28/2023 18:38:19" CA00E879

https://www.ibm.com/docs/en/power10/000V-HMC?topic=commands-lsrefcode

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: (subset) [PATCH 0/8] RTAS changes for 6.4
  2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay
                   ` (7 preceding siblings ...)
  2023-03-06 21:33 ` [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas() Nathan Lynch via B4 Relay
@ 2023-04-06  1:09 ` Michael Ellerman
  2023-04-26 12:12 ` Michael Ellerman
  9 siblings, 0 replies; 32+ messages in thread
From: Michael Ellerman @ 2023-04-06  1:09 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Nathan Lynch
  Cc: Tyrel Datwyler, Nick Child, Andrew Donnellan, Scott Cheloha,
	Laurent Dufour, linuxppc-dev

On Mon, 06 Mar 2023 15:33:39 -0600, Nathan Lynch wrote:
> Proposed changes for the RTAS subsystem and client code.
> 
> Fixes that are subject to backporting are at the front of the queue,
> followed by documentation and cleanups, with enhancements at the end.
> 
> Noteworthy changes:
> * Change sys_rtas() to consume -2/990x statuses instead of returning
>   them to user space.
> * Lockdep annotations for invariants in rtas.c.
> 
> [...]

Applied to powerpc/next.

[6/8] powerpc/rtas: lockdep annotations
      https://git.kernel.org/powerpc/c/af8bc68263b2184e63ee67ca70cecff4636f7901

cheers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/8] RTAS changes for 6.4
  2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay
                   ` (8 preceding siblings ...)
  2023-04-06  1:09 ` (subset) [PATCH 0/8] RTAS changes for 6.4 Michael Ellerman
@ 2023-04-26 12:12 ` Michael Ellerman
  9 siblings, 0 replies; 32+ messages in thread
From: Michael Ellerman @ 2023-04-26 12:12 UTC (permalink / raw)
  To: Michael Ellerman, Nicholas Piggin, Christophe Leroy, Nathan Lynch
  Cc: Tyrel Datwyler, Nick Child, Andrew Donnellan, Scott Cheloha,
	Laurent Dufour, linuxppc-dev

On Mon, 06 Mar 2023 15:33:39 -0600, Nathan Lynch wrote:
> Proposed changes for the RTAS subsystem and client code.
> 
> Fixes that are subject to backporting are at the front of the queue,
> followed by documentation and cleanups, with enhancements at the end.
> 
> Noteworthy changes:
> * Change sys_rtas() to consume -2/990x statuses instead of returning
>   them to user space.
> * Lockdep annotations for invariants in rtas.c.
> 
> [...]

Patches 1-4, 6 applied to powerpc/next.

[1/8] powerpc/rtas: ensure 8-byte alignment for struct rtas_args
      https://git.kernel.org/powerpc/c/f40b0f6c5c27de167fdd10e541e0a4b5f2bc772b
[2/8] powerpc/rtas: use memmove for potentially overlapping buffer copy
      https://git.kernel.org/powerpc/c/271208ee5e335cb1ad280d22784940daf7ddf820
[3/8] powerpc/rtas: rtas_call_unlocked() kerneldoc
      https://git.kernel.org/powerpc/c/1792e46ed0cfc1fa27c8c805f8098f806bcc5fc3
[4/8] powerpc/rtas: fix miswording in rtas_function kerneldoc
      https://git.kernel.org/powerpc/c/32740fce09f98d30f3c71a09ee4e9d90b3965427
[6/8] powerpc/rtas: lockdep annotations
      https://git.kernel.org/powerpc/c/af8bc68263b2184e63ee67ca70cecff4636f7901

cheers

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()
  2023-03-06 21:33 ` [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas() Nathan Lynch via B4 Relay
  2023-03-23  6:26   ` Andrew Donnellan
  2023-03-23  9:44   ` Michael Ellerman
@ 2024-01-25 15:55   ` Christophe Leroy
  2024-01-25 16:33     ` Nathan Lynch
  2 siblings, 1 reply; 32+ messages in thread
From: Christophe Leroy @ 2024-01-25 15:55 UTC (permalink / raw)
  To: nathanl
  Cc: Tyrel Datwyler, Nick Child, Andrew Donnellan, Scott Cheloha,
	Nicholas Piggin, Laurent Dufour, linuxppc-dev

Hi Nathan,

Le 06/03/2023 à 22:33, Nathan Lynch via B4 Relay a écrit :
> From: Nathan Lynch <nathanl@linux.ibm.com>
> 
> The kernel can handle retrying RTAS function calls in response to
> -2/990x in the sys_rtas() handler instead of relaying the intermediate
> status to user space.

 From this series with still have patches 5, 7 and 8 awaiting in 
patchwork, see 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?submitter=85747 
and patch 8 doesn't apply anymore.

Are those 3 patches still relevant or should they be discarded ?

Thanks
Christophe


> 
> Justifications:
> 
> * Currently it's nondeterministic and quite variable in practice
>    whether a retry status is returned for any given invocation of
>    sys_rtas(). Therefore user space code cannot be expecting a retry
>    result without already being broken.
> 
> * This tends to significantly reduce the total number of system calls
>    issued by programs such as drmgr which make use of sys_rtas(),
>    improving the experience of tracing and debugging such
>    programs. This is the main motivation for me: I think this change
>    will make it easier for us to characterize current sys_rtas() use
>    cases as we move them to other interfaces over time.
> 
> * It reduces the number of opportunities for user space to leave
>    complex operations, such as those associated with DLPAR, incomplete
>    and diffcult to recover.
> 
> * We can expect performance improvements for existing sys_rtas()
>    users, not only because of overall reduction in the number of system
>    calls issued, but also due to the better handling of -2/990x in the
>    kernel. For example, librtas still sleeps for 1ms on -2, which is
>    completely unnecessary.
> 
> Performance differences for PHB add and remove on a small P10 PowerVM
> partition are included below. For add, elapsed time is slightly
> reduced. For remove, there are more significant improvements: the
> number of context switches is reduced by an order of magnitude, and
> elapsed time is reduced by over half.
> 
> (- before, + after):
> 
>    Performance counter stats for 'drmgr -c phb -a -s PHB 23' (5 runs):
> 
> -          1,847.58 msec task-clock                       #    0.135 CPUs utilized               ( +- 14.15% )
> -            10,867      cs                               #    9.800 K/sec                       ( +- 14.14% )
> +          1,901.15 msec task-clock                       #    0.148 CPUs utilized               ( +- 14.13% )
> +            10,451      cs                               #    9.158 K/sec                       ( +- 14.14% )
> 
> -         13.656557 +- 0.000124 seconds time elapsed  ( +-  0.00% )
> +          12.88080 +- 0.00404 seconds time elapsed  ( +-  0.03% )
> 
>    Performance counter stats for 'drmgr -c phb -r -s PHB 23' (5 runs):
> 
> -          1,473.75 msec task-clock                       #    0.092 CPUs utilized               ( +- 14.15% )
> -             2,652      cs                               #    3.000 K/sec                       ( +- 14.16% )
> +          1,444.55 msec task-clock                       #    0.221 CPUs utilized               ( +- 14.14% )
> +               104      cs                               #  119.957 /sec                        ( +- 14.63% )
> 
> -          15.99718 +- 0.00801 seconds time elapsed  ( +-  0.05% )
> +           6.54256 +- 0.00830 seconds time elapsed  ( +-  0.13% )
> 
> Move the existing rtas_lock-guarded critical section in sys_rtas()
> into a conventional rtas_busy_delay()-based loop, returning to user
> space only when a final success or failure result is available.
> 
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
> ---
>   arch/powerpc/kernel/rtas.c | 28 ++++++++++++++++------------
>   1 file changed, 16 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
> index 47a2aa43d7d4..c330a22ccc70 100644
> --- a/arch/powerpc/kernel/rtas.c
> +++ b/arch/powerpc/kernel/rtas.c
> @@ -1798,7 +1798,6 @@ static bool block_rtas_call(int token, int nargs,
>   /* We assume to be passed big endian arguments */
>   SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>   {
> -	struct pin_cookie cookie;
>   	struct rtas_args args;
>   	unsigned long flags;
>   	char *buff_copy, *errbuf = NULL;
> @@ -1866,20 +1865,25 @@ SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>   
>   	buff_copy = get_errorlog_buffer();
>   
> -	raw_spin_lock_irqsave(&rtas_lock, flags);
> -	cookie = lockdep_pin_lock(&rtas_lock);
> +	do {
> +		struct pin_cookie cookie;
>   
> -	rtas_args = args;
> -	do_enter_rtas(&rtas_args);
> -	args = rtas_args;
> +		raw_spin_lock_irqsave(&rtas_lock, flags);
> +		cookie = lockdep_pin_lock(&rtas_lock);
>   
> -	/* A -1 return code indicates that the last command couldn't
> -	   be completed due to a hardware error. */
> -	if (be32_to_cpu(args.rets[0]) == -1)
> -		errbuf = __fetch_rtas_last_error(buff_copy);
> +		rtas_args = args;
> +		do_enter_rtas(&rtas_args);
> +		args = rtas_args;
>   
> -	lockdep_unpin_lock(&rtas_lock, cookie);
> -	raw_spin_unlock_irqrestore(&rtas_lock, flags);
> +		/*
> +		 * Handle error record retrieval before releasing the lock.
> +		 */
> +		if (be32_to_cpu(args.rets[0]) == -1)
> +			errbuf = __fetch_rtas_last_error(buff_copy);
> +
> +		lockdep_unpin_lock(&rtas_lock, cookie);
> +		raw_spin_unlock_irqrestore(&rtas_lock, flags);
> +	} while (rtas_busy_delay(be32_to_cpu(args.rets[0])));
>   
>   	if (buff_copy) {
>   		if (errbuf)
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()
  2024-01-25 15:55   ` Christophe Leroy
@ 2024-01-25 16:33     ` Nathan Lynch
  2024-01-25 16:46       ` Christophe Leroy
  0 siblings, 1 reply; 32+ messages in thread
From: Nathan Lynch @ 2024-01-25 16:33 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Tyrel Datwyler, Nick Child, Andrew Donnellan, Scott Cheloha,
	Nicholas Piggin, Laurent Dufour, linuxppc-dev

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Hi Nathan,
>
> Le 06/03/2023 à 22:33, Nathan Lynch via B4 Relay a écrit :
>> From: Nathan Lynch <nathanl@linux.ibm.com>
>> 
>> The kernel can handle retrying RTAS function calls in response to
>> -2/990x in the sys_rtas() handler instead of relaying the intermediate
>> status to user space.
>
>  From this series with still have patches 5, 7 and 8 awaiting in 
> patchwork, see 
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?submitter=85747 
> and patch 8 doesn't apply anymore.
>
> Are those 3 patches still relevant or should they be discarded ?

Thanks for checking - 5 and 7 can be discarded.

I intend to return to 8/8 ("consume retry statuses...") when time
allows. So that could be put in "changes requested" state I suppose, but
if it's easier on the maintainer side to discard it that's fine with me
too.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()
  2024-01-25 16:33     ` Nathan Lynch
@ 2024-01-25 16:46       ` Christophe Leroy
  2024-01-25 17:23         ` Nathan Lynch
  0 siblings, 1 reply; 32+ messages in thread
From: Christophe Leroy @ 2024-01-25 16:46 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Tyrel Datwyler, Nick Child, Andrew Donnellan, Scott Cheloha,
	Nicholas Piggin, Laurent Dufour, linuxppc-dev



Le 25/01/2024 à 17:33, Nathan Lynch a écrit :
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Hi Nathan,
>>
>> Le 06/03/2023 à 22:33, Nathan Lynch via B4 Relay a écrit :
>>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>>
>>> The kernel can handle retrying RTAS function calls in response to
>>> -2/990x in the sys_rtas() handler instead of relaying the intermediate
>>> status to user space.
>>
>>   From this series with still have patches 5, 7 and 8 awaiting in
>> patchwork, see
>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?submitter=85747
>> and patch 8 doesn't apply anymore.
>>
>> Are those 3 patches still relevant or should they be discarded ?
> 
> Thanks for checking - 5 and 7 can be discarded.
> 
> I intend to return to 8/8 ("consume retry statuses...") when time
> allows. So that could be put in "changes requested" state I suppose, but
> if it's easier on the maintainer side to discard it that's fine with me
> too.

Ok, thanks for answering, I rejected 5 and 7 and requested changes to 8.

While at it, what about patch 2 of series 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=382195&state=*

Christophe

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas()
  2024-01-25 16:46       ` Christophe Leroy
@ 2024-01-25 17:23         ` Nathan Lynch
  0 siblings, 0 replies; 32+ messages in thread
From: Nathan Lynch @ 2024-01-25 17:23 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Tyrel Datwyler, Nick Child, Andrew Donnellan, Scott Cheloha,
	Nicholas Piggin, Laurent Dufour, linuxppc-dev

Christophe Leroy <christophe.leroy@csgroup.eu> writes:

> Le 25/01/2024 à 17:33, Nathan Lynch a écrit :
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>> Hi Nathan,
>>>
>>> Le 06/03/2023 à 22:33, Nathan Lynch via B4 Relay a écrit :
>>>> From: Nathan Lynch <nathanl@linux.ibm.com>
>>>>
>>>> The kernel can handle retrying RTAS function calls in response to
>>>> -2/990x in the sys_rtas() handler instead of relaying the intermediate
>>>> status to user space.
>>>
>>>   From this series with still have patches 5, 7 and 8 awaiting in
>>> patchwork, see
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?submitter=85747
>>> and patch 8 doesn't apply anymore.
>>>
>>> Are those 3 patches still relevant or should they be discarded ?
>> 
>> Thanks for checking - 5 and 7 can be discarded.
>> 
>> I intend to return to 8/8 ("consume retry statuses...") when time
>> allows. So that could be put in "changes requested" state I suppose, but
>> if it's easier on the maintainer side to discard it that's fine with me
>> too.
>
> Ok, thanks for answering, I rejected 5 and 7 and requested changes to 8.
>
> While at it, what about patch 2 of series 
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=382195&state=*

Drop it, thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2024-01-25 17:24 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-06 21:33 [PATCH 0/8] RTAS changes for 6.4 Nathan Lynch via B4 Relay
2023-03-06 21:33 ` [PATCH 1/8] powerpc/rtas: ensure 8-byte alignment for struct rtas_args Nathan Lynch via B4 Relay
2023-03-23  4:00   ` Andrew Donnellan
2023-03-06 21:33 ` [PATCH 2/8] powerpc/rtas: use memmove for potentially overlapping buffer copy Nathan Lynch via B4 Relay
2023-03-23  4:09   ` Andrew Donnellan
2023-03-06 21:33 ` [PATCH 3/8] powerpc/rtas: rtas_call_unlocked() kerneldoc Nathan Lynch via B4 Relay
2023-03-23  4:15   ` Andrew Donnellan
2023-03-06 21:33 ` [PATCH 4/8] powerpc/rtas: fix miswording in rtas_function kerneldoc Nathan Lynch via B4 Relay
2023-03-23  0:17   ` Andrew Donnellan
2023-03-06 21:33 ` [PATCH 5/8] powerpc/rtas: rename va_rtas_call_unlocked() to va_rtas_call() Nathan Lynch via B4 Relay
2023-03-23  4:17   ` Andrew Donnellan
2023-03-23 16:11     ` Nathan Lynch
2023-03-29 12:24   ` Michael Ellerman
2023-03-06 21:33 ` [PATCH 6/8] powerpc/rtas: lockdep annotations Nathan Lynch via B4 Relay
2023-03-23  6:01   ` Andrew Donnellan
2023-03-06 21:33 ` [PATCH 7/8] powerpc/rtas: warn on unsafe argument to rtas_call_unlocked() Nathan Lynch via B4 Relay
2023-03-23  4:25   ` Andrew Donnellan
2023-03-23 12:17     ` Nathan Lynch
2023-03-24  0:56       ` Nathan Lynch
2023-03-29 12:20         ` Michael Ellerman
2023-03-29 16:23           ` Nathan Lynch
2023-03-06 21:33 ` [PATCH 8/8] powerpc/rtas: consume retry statuses in sys_rtas() Nathan Lynch via B4 Relay
2023-03-23  6:26   ` Andrew Donnellan
2023-03-23 19:39     ` Nathan Lynch
2023-03-23  9:44   ` Michael Ellerman
2023-03-23 13:40     ` Nathan Lynch
2024-01-25 15:55   ` Christophe Leroy
2024-01-25 16:33     ` Nathan Lynch
2024-01-25 16:46       ` Christophe Leroy
2024-01-25 17:23         ` Nathan Lynch
2023-04-06  1:09 ` (subset) [PATCH 0/8] RTAS changes for 6.4 Michael Ellerman
2023-04-26 12:12 ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).